Planet Code4Lib

2021 AMIA Cross-Pollinator: Justine Thomas / Digital Library Federation

The Association of Moving Image Archivists (AMIA) and DLF will be sending Justine Thomas to attend the 2021 virtual DLF/AMIA Hack Day and AMIA spring conference! As this year’s “cross-pollinator,” Justine will enrich both the Hack Day event and the AMIA conference, sharing a vision of the library world from her perspective.

About the Awardee

Justine Thomas (@JustineThomasM) is currently a Digital Programs Contractor at the National Museum of American History (NMAH) focusing on digital asset management and collections information support. Prior to graduating in 2019 with a Master’s in Museum Studies from the George Washington University, Justine worked at NMAH as a collections processing intern in the Archives Center and as a Public Programs Facilitator encouraging visitors to discuss American democracy and social justice issues.


About Hack Day and the Award

The seventh AMIA+DLF Hack Day (online April 1-15) will be a unique opportunity for practitioners and managers of digital audiovisual collections to collaborate remotely with developers and engineers on solutions for digital audiovisual preservation and access.

The goal of the AMIA + DLF Award is to bring “cross-pollinators”–developers and software engineers who can provide unique perspectives to moving image and sound archivists’ work with digital materials, share a vision of the library world from their perspective, and enrich the Hack Day event–to the conference.

Find out more about this year’s Hack Day activities here.

The post 2021 AMIA Cross-Pollinator: Justine Thomas appeared first on DLF.

Evergreen 3.7-rc available / Evergreen ILS

The Evergreen Community is pleased to announce the availability of the release candidate for Evergreen 3.7. This release follows up on the recent beta release. The general release of 3.7.0 is planned for Wednesday, 14 April 2021. Between now and then, please download the release candidate and try it out.

Additional information, including a full list of new features, can be found in the release notes.

Intro to the fediverse / Jez Cope

Wow, it turns out to be 10 years since I wrote this beginner’s guide to Twitter. Things have moved on a loooooong way since then.

Far from being the interesting, disruptive technology it was back then, Twitter has become part of the mainstream, the establishment. Almost everyone and everything is on Twitter now, which has both pros and cons.

So what’s the problem?

It’s now possible to follow all sorts of useful information feeds, from live updates on transport delays to your favourite sports team’s play-by-play performance to an almost infinite number of cat pictures. In my professional life it’s almost guaranteed that anyone I meet will be on Twitter, meaning that I can contact them to follow up at a later date without having to exchange contact details (and they have options to block me if they don’t like that).

On the other hand, a medium where everyone’s opinion is equally valid regardless of knowledge or life experience has turned some parts of the internet into a toxic swamp of hatred and vitriol. It’s easier than ever to forget that we have more common ground with any random stranger than we have differences, and that’s led to some truly awful acts and a poisonous political arena.

Part of the problem here is that each of the social media platforms is controlled by a single entity with almost no accountability to anyone other than shareholders. Technological change has been so rapid that the regulatory regime has no idea how to handle them, leaving them largely free to operate how they want. This has led to a whole heap of nasty consequences that many other people have done a much better job of documenting than I could (Shoshana Zuboff’s book The Age of Surveillance Capitalism is a good example). What I’m going to focus on instead are some possible alternatives.

If you accept the above argument, one obvious solution is to break up the effective monopoly enjoyed by Facebook, Twitter et al. We need to be able to retain the wonderful affordances of social media but democratise control of it, so that it can never be dominated by a small number of overly powerful players.

What’s the solution?

There’s actually a thing that already exists, that almost everyone is familiar with and that already works like this.

It’s email.

There are a hundred thousand email servers, but my email can always find your inbox if I know your address because that address identifies both you and the email service you use, and they communicate using the same protocol, Simple Mail Transfer Protocol (SMTP)¹. I can’t send a message to your Twitter from my Facebook though, because they’re completely incompatible, like oil and water. Facebook has no idea how to talk to Twitter and vice versa (and the companies that control them have zero interest in such interoperability anyway).

Just like email, a federated social media service like Mastodon allows you to use any compatible server, or even run your own, and follow accounts on your home server or anywhere else, even servers running different software as long as they use the same ActivityPub protocol.
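To make the email analogy concrete, here is a minimal Python sketch of how an address of the form user@server splits into an account name and a home server. The function name `split_address` and the example addresses are made up for illustration, and real fediverse software does considerably more than this (looking the account up on its home server, for instance).

```python
from typing import NamedTuple


class Address(NamedTuple):
    user: str    # who you are on that server
    server: str  # which service hosts the account


def split_address(address: str) -> Address:
    """Split 'user@server' or '@user@server' into its two parts."""
    # Fediverse handles are usually written with a leading '@';
    # email addresses are not. Accept both.
    trimmed = address.strip().lstrip("@")
    user, sep, server = trimmed.rpartition("@")
    if not sep or not user or not server:
        raise ValueError(f"not a user@server address: {address!r}")
    # Server names are case-insensitive, so normalise them.
    return Address(user, server.lower())


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    for handle in ("alice@mail.example", "@bob@social.example"):
        print(split_address(handle))
```

The point is only that the address itself tells any other server where to deliver a message or find an account, which is what makes federation possible.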

There’s no lock-in because you can move to another server any time you like, and interact with all the same people from your new home, just like changing your email address. Smaller servers mean that no one server ends up with enough power to take over and control everything, as the social media giants do with their own platforms. But at the same time, a small server with a small moderator team can enforce local policy much more easily and block accounts or whole servers that host trolls, nazis or other poisonous people.

How do I try it?

I have no problem with anyone choosing to continue to use what we’re already calling “traditional” social media; frankly, Facebook and Twitter are still useful for me to keep in touch with a lot of my friends. However, I do think it’s useful to know some of the alternatives if only to make a more informed decision to stick with your current choices. Most of these services only ask for an email address when you sign up and use of your real name vs a pseudonym is entirely optional so there’s not really any risk in signing up and giving one a try. That said, make sure you take sensible precautions like not reusing a password from another account.

Instead of…                          Try…
Twitter, Facebook                    Mastodon, Pleroma, Misskey
Slack, Discord, IRC                  Matrix
WhatsApp, FB Messenger, Telegram     Also Matrix
Instagram, Flickr                    PixelFed
YouTube                              PeerTube
The web                              Interplanetary File System (IPFS)

  1. Which, if you can believe it, was formalised nearly 40 years ago in 1982 and has only had fairly minor changes since then!

Third English round table on next generation metadata: investing in the utility of authorities and identifiers / HangingTogether

Thanks to George Bingham, UK Account Manager at OCLC, for contributing this post as part of the Metadata Series blog posts. 

OCLC metadata discussion series

As part of the OCLC Research Discussion Series on Next Generation Metadata, this blog post reports back from the third English language round table discussion held on March 23, 2021.  The session was scheduled to facilitate a UK-centric discussion with a panel of library representatives from the UK with backgrounds in bibliographic control, special collections, collections management, metadata standards and computer science – a diverse and engaged discussion group.

Mapping exercise

Map of next-gen metadata projects (third English session)

As with other round table sessions, the group started with mapping next generation metadata projects that participants were aware of, on a 2×2 matrix characterizing the application area: bibliographic data, cultural heritage data, research information management (RIM) data, and for anything else, the category, “Other”. The resulting map gave a nice overview of some of the building blocks of the emerging next generation metadata infrastructure, focussing in this session on the various national and international identifier initiatives – ISNI, VIAF, FAST, LC/NACO authority file and LC/SACO subject lists, and ORCID – and metadata and linked data infrastructure projects such as Plan-M (an initiative, facilitated by Jisc, to rethink the way that metadata for academic and specialist libraries is created, sold, licensed, shared, and re-used in the UK), BIBFRAME and OCLC’s Shared Entity Management Infrastructure.

The map also raises interesting questions about some of the potential or actual obstacles to the spread of next generation metadata:

  • What to do about missing identifiers?
  • How to incorporate extant regional databases and union catalogs into the national and international landscape?
  • How “open” are institutions’ local archive management systems?
  • Who is willing to pay for linked data?

Contributing to Library of Congress authorities

The discussion panel agreed that there is a pressing need for metadata to be less hierarchical, which linked data delivers, and that a collaborative approach is the best way forward. One example is the development of the UK funnel for NACO and SACO, which has reinforced the need for a more national approach in the UK. The funnel allows the UK Higher Education institutions to contribute to the LC name and subject authorities using a single channel – rather than each library setting up its own channel. Because they work together as a group to make their contributions to the authority files, the quality and the “authority” of their contributions are significantly increased.

Registering and seeding ISNIs

One panelist reported on a one-year trial with ISNI for the institution’s legal deposit library, as a first step into working with linked data. It is hoped that it will prove to be a sustainable way forward. There is considerable enthusiasm and interest for this project amongst the institution’s practitioners, a vital ingredient for a successful next generation metadata initiative.

Another panelist expanded on several ongoing projects with the aim of embedding ISNI identifiers within the value chain and getting them out to where cataloguers can pick them up. For example, publishers are starting to use them in their ONIX feeds to enable them to create clusters of records. Also, cataloging agencies in the UK are being supplied with ISNI identifiers so that they can embed them in the metadata at source, in the cataloging-in-publication (CIP) metadata that they supply to libraries in the UK.
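When identifiers are embedded at source like this, a cheap first check is that each one is well formed. As a rough sketch (not something discussed at the round table): ISNI, like ORCID, ends in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped identifier can often be caught before it spreads. The function names below are hypothetical, and the sample value is the example identifier used in ORCID's documentation, chosen only because its check digit is known to be valid.

```python
def mod11_2_check_character(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the leading digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    # A value of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_well_formed(identifier: str) -> bool:
    """Check length, characters, and check character of a 16-character ID."""
    # Identifiers are often displayed in blocks of four, split by spaces or hyphens.
    compact = identifier.replace("-", "").replace(" ", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return mod11_2_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_well_formed("0000-0002-1825-0097"))  # sample ID from ORCID's documentation
    print(is_well_formed("0000-0002-1825-0098"))  # same ID with the last character altered
```

A passing check only shows that the identifier was not mistyped; it says nothing about whether it points at the right person or organisation.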

Efforts are also under way to systematically match ISNI entries against VIAF entries, and to provide a reconciliation file to enable OCLC to update the VIAF with the most recent ISNI. These could then be fed through to the Library of Congress, who can then use these to update NACO files.

With 6 million files to update, this is a perfect example of a leading-edge, dynamic next generation metadata initiative that will have to overcome the considerable challenge of scalability for it to succeed at a global level.

Challenges faced by identifiers

The discussion moved on to the other challenges faced by identifier schemes. It was noted that encouraging a more widespread collaborative approach would rely on honesty amongst the contributors. There would need to be built-in assurances that the tags/data come from a trusted source. Would the more collaborative approach introduce too much scope for duplicate identifiers being created, and too many variations on preferred names? Cultural expectations would have to be clearly defined and adhered to. And last but by no means least is the challenge of providing the resources needed to scale up to a national and international scope.

Obstacles in moving towards next generation metadata 

Participants raised concerns that library management systems are not keeping pace with current discussions on next generation metadata or with real world implementations, to the extent that they may be the biggest obstacle in the move towards next generation metadata. It was recognized that moving to linked data involves a big conceptual and technical leap from the current string-based metadata creation, sharing and management practices, tools and methodologies.

Progress can only be made in small steps, and there is still much work to be done to demonstrate the benefits of next generation metadata, a prerequisite if we are to complete the essential step of gaining the support of senior management and buy-in from system suppliers.  

If we don’t lead, will someone else take over?

Towards the end of the session, a brief discussion arose around the possibility (and danger) of organizations outside the library sector “taking over” if we can’t manage the transition ourselves. Amazon was cited as already becoming regarded as a good model to follow for metadata standards, despite what we know to be its shortcomings: it does not promote high quality data, and there are numerous problems concealed within the data that are not evident to non-professionals. These quality issues would become very problematic if they are allowed to become pervasive in the global metadata landscape.

“Our insistence on ‘perfect data’ is a good thing, but are people just giving up on it because it’s too difficult to attain?”

About the OCLC Research Discussion Series on Next Generation Metadata

In March 2021, OCLC Research conducted a discussion series focused on two reports: 

  1. “Transitioning to the Next Generation of Metadata”
  2. “Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project”

The round table discussions were held in different European languages and participants were able to share their own experiences, get a better understanding of the topic area, and gain confidence in planning ahead.

The Opening Plenary Session opened the forum for discussion and exploration and introduced the theme and its topics. Summaries of all eight round table discussions are published on the OCLC Research blog, Hanging Together. This post is preceded by the posts reporting on the first English session, the Italian session, the second English session, the French session, the German session, and the Spanish session.

The Closing Plenary Session on April 13 will synthesize the different round table discussions. Registration is still open for this webinar: please join us.

The post Third English round table on next generation metadata: investing in the utility of authorities and identifiers appeared first on Hanging Together.

More Thoughts on Pre-recording Conference Talks / Peter Murray

Over the weekend, I posted an article here about pre-recording conference talks and sent a tweet about the idea on Monday. I hoped to generate discussion about recording talks to fill in gaps—positive and negative—about the concept, and I was not disappointed. I’m particularly thankful to Lisa Janicke Hinchliffe and Andromeda Yelton along with Jason Griffey, Junior Tidal, and Edward Lim Junhao for generously sharing their thoughts. Daniel S and Kate Deibel also commented on the Code4Lib Slack team. I added to the previous article’s bullet points and am expanding on some of the issues here. I’m inviting everyone mentioned to let me know if I’m mischaracterizing their thoughts, and I will correct this post if I hear from them. (I haven’t found a good comments system to hook into this static site blog.)

Pre-recorded Talks Limit Presentation Format

Lisa Janicke Hinchliffe made this point early in the feedback:

Jason described the “flipped classroom” model that he had in mind as the NISOplus2021 program was being developed. The flipped classroom model is one where students do the work of reading material and watching lectures, then come to the interactive time with the instructors ready with questions and comments about the material. Rather than the instructor lecturing during class time, the class time becomes a discussion about the material. For NISOplus, “the recording is the material the speaker and attendees are discussing” during the live Zoom meetings.

In the previous post, I described how having the speaker respond in text chat while the recording replays is beneficial. Lisa went on to say:

She described an example: the SSP preconference she ran at CHS. I’m paraphrasing her tweets in this paragraph. The preconference had a short keynote and an “Oprah-style” panel discussion (not pre-prepared talks). This was done live; nothing was recorded. After the panel, people worked in small groups using Zoom and a set of Google Slides to guide the group work. The small groups reported their discussions back to all participants.

Andromeda points out (paraphrasing twitter-speak): “Presenters will need much more— and more specialized—skills to pull it off, and it takes a lot more work.” And Lisa adds: “Just so there is no confusion … I don’t think being online makes it harder to do interactive. It’s the pre-recording. Interactive means participants co-create the session. A pause to chat isn’t going to shape what comes next on the recording.”

Increased Technical Burden on Speakers and Organizers

Andromeda also agreed with this: “I will say one of the things I appreciated about NISO is that @griffey did ALL the video editing, so I was not forced to learn how that works.” She continued, “everyone has different requirements for prerecording, and in [Code4Lib’s] case they were extensive and kept changing.” And later added: “Part of the challenge is that every conference has its own tech stack/requirements. If as a presenter I have to learn that for every conference, it’s not reducing my workload.”

It is hard not to agree with this; a high-quality (stylistically and technically) recording is not easy to do with today’s tools. This is also a technical burden for meeting organizers. The presenters will put a lot of work into talks—including making sure the recordings look good; whatever playback mechanism is used has to honor the fidelity of that recording. For instance, presenters who have gone through the effort to ensure the accessibility of the presentation color scheme want the conference platform to display the talk “as I created it.”

The previous post noted that recorded talks also allow for the creation of better, non-real-time transcriptions. Lisa points out that presenters will want to review that transcription for accuracy, which Jason noted adds to the length of time needed before the start of a conference to complete the preparations.

Increased Logistical Burden on Presenters

This is a consideration I hadn’t thought through—that presenters have to devote more clock time to the presentation because first they have to record it and then they have to watch it. (Or, as Andromeda added, “significantly more than twice the time for some people, if they are recording a bunch in order to get it right and/or doing editing.”)

No. Audience. Reaction.

Wow, yes. I imagine it would take a bit of imagination to get in the right mindset to give a talk to a small camera instead of an audience. I wonder how stand-up comedians are dealing with this as they try to put on virtual shows. Andromeda summed this up:

Also in this heading could be “No Speaker Reaction”—or the inability for subsequent speakers at a conference to build on something that someone said earlier. In the Code4Lib Slack team, Daniel S noted: “One thing comes to mind on the pre-recording [is] the issue that prerecorded talks lose the ‘conversation’ aspect where some later talks at a conference will address or comment on earlier talks.” Kate Deibel added: “Exactly. Talks don’t get to spontaneously build off of each other or from other conversations that happen at the conference.”

Currency of information

Lisa points out that pre-recording talks before an event means there is a delay between the recording and the playback. In the example she pointed out, there was a talk at RLUK that, if pre-recorded, would have been about the University of California working on an Open Access deal with Elsevier; live, it was able to be “the deal we announced earlier this week”.


Near the end of the discussion, Lisa added:

…and Andromeda added: “Strong agree here. I understand that this year everyone was making it up as they went along, but going forward it’d be great to know that in advance.”

That means conferences will need to take these needs into account well before the Call for Proposals (CfP) is published. A conference that is thinking now about pre-recording their talks must work through these issues and set expectations with presenters early.

As I hoped, the Twitter replies tempered my eagerness for the all-recorded style with some real-world experience. There could be possibilities here, but adapting face-to-face meetings to a world with less travel won’t be simple and will take significant thought beyond the issues of technology platforms.

Edward Lim Junhao summarized this nicely: “I favor unpacking what makes up our prof conferences. I’m interested in recreating that shared experience, the networking, & the serendipity of learning sth you didn’t know. I feel in-person conferences now have to offer more in order to justify people traveling to attend them.”

Related, Andromeda said: “Also, for a conf that ultimately puts its talks online, it’s critical that it have SOMEthing beyond content delivery during the actual conference to make it worth registering rather than just waiting for youtube. realtime interaction with the speaker is a pretty solid option.”

If you have something to add, reach out to me on Twitter. Given enough responses, I’ll create another summary. Let’s keep talking about what that looks like and sharing discoveries with each other.

The Tree of Tweets

It was a great discussion, and I think I pulled in the major ideas in the summary above. With some guidance from Ed Summers, I’m going to embed the Twitter threads below using Treeverse by Paul Butler. We might be stretching the boundaries of what is possible, so no guarantees that this will be viewable for the long term.

Should All Conference Talks be Pre-recorded? / Peter Murray

The Code4Lib conference was last week. That meeting used all pre-recorded talks, and we saw the benefits of pre-recording for attendees, presenters, and conference organizers. Should all talks be pre-recorded, even when we are back face-to-face?

Note! After I posted a link to this article on Twitter, there was a great response of thoughtful comments. I've included new bullet points below and summarized the responses in another blog post.

As an entirely virtual conference, I think we can call Code4Lib 2021 a success. Success ≠ Perfect, of course, and last week the conference coordinating team got together on a Zoom call for a debriefing session. We had a lengthy discussion about what we learned and what we wanted to take forward to the 2022 conference, which we’re anticipating will be something with a face-to-face component.

That last sentence was tough to compose: “…will be face-to-face”? “…will be both face-to-face and virtual”? (Or another fully virtual event?) Truth be told, I don’t think we know yet. I think we know with some certainty that the COVID pandemic will become much more manageable by this time next year—at least in North America and Europe. (Code4Lib draws from primarily North American library technologists with a few guests from other parts of the world.) I’m hearing from higher education institutions, though, that travel is going to be severely curtailed…if not for health risk reasons, then because budgets have been slashed. So one has to wonder what a conference will look like next year.

I’ve been to two online conferences this year: NISOplus21 and Code4Lib. Both meetings recorded talks in advance and started playback of the recordings at a fixed point in time. This was beneficial for a couple of reasons. For organizers and presenters, pre-recording allowed technical glitches to be worked through without the pressure of a live event happening. Technology is not nearly perfect enough or ubiquitously spread to count on it working in real-time.¹ NISOplus21 also used the recordings to get transcribed text for the videos. (Code4Lib used live transcriptions on the synchronous playback.) Attendees and presenters benefited from pre-recording because the presenters could be in the text chat channel to answer questions and provide insights. Having the presenter free during the playback offers new possibilities for making talks more engaging: responding in real-time to polls, getting advance knowledge of topics for subsequent real-time question/answer sessions, and so forth. The synchronous playback time meant that there was a point when (almost) everyone was together watching the same talk, just as in face-to-face sessions.

During the Code4Lib conference coordinating debrief call, I asked the question: “If we saw so many benefits to pre-recording talks, do we want to pre-record them all next year?” In addition to the reasons above, pre-recorded talks benefit those who are not comfortable speaking English or are first-time presenters. (They have a chance to re-do their talk as many times as they need in a much less stressful environment.) “Live” demos are much smoother because a recording can be restarted if something goes wrong. Each year, at least one presenter needs to use their own machine (custom software, local development environment, etc.), and swapping out presenter computers in real-time is risky. And it is undoubtedly easier to impose time requirements with recorded sessions. So why not pre-record all of the talks?

I get it—it would be different to sit in a ballroom watching a recording play on big screens at the front of the room while the podium is empty. But is it so different as to dramatically change the experience of watching a speaker at a podium? In many respects, we had a dry-run of this during Code4Lib 2020. It was at the early stages of the coming lockdowns when institutions started barring employee travel, and we had to bring in many presenters remotely. I wrote a blog post describing the setup we used for remote presenters, and at the end, I said:

I had a few people comment that they were taken aback when they realized that there was no one standing at the podium during the presentation.

Some attendees, at least, quickly adjusted to this format.

For those with the means and privilege of traveling, there can still be face-to-face discussions in the hall, over meals, and social activities. For those that can’t travel (due to risks of traveling, family/personal responsibilities, or budget cuts), the attendee experience is a little more level—everyone is watching the same playback and in the same text backchannels during the talk. I can imagine a conference tool capable of segmenting chat sessions during the talk playback to “tables” where you and close colleagues can exchange ideas and then promote the best ones to a conference-wide chat room. Something like that would be beneficial as attendance grows for events with an online component, and it would be a new form of engagement that isn’t practical now.

There are undoubtedly reasons not to pre-record all session talks (beyond the feels-weird-to-stare-at-an-unoccupied-ballroom-podium reasons). During the debriefing session, one person brought up that having all pre-recorded talks erodes the justification for in-person attendance. I can see a manager saying, “All of the talks are online…just watch it from your desk. Even your own presentation is pre-recorded, so there is no need for you to fly to the meeting.” That’s legitimate.

So if you like bullet points, here’s how it lays out. Pre-recording all talks is better for:

  • Accessibility: better transcriptions for recorded audio versus real-time transcription (and probably at a lower cost, too)
  • Engagement: the speaker can be in the text chat during playback, and there could be new options for backchannel discussions
  • Better quality: speakers can re-record their talk as many times as needed
  • Closer equality: in-person attendees are having much the same experience during the talk as remote attendees

Downsides for pre-recording all talks:

  • Feels weird: yeah, it would be different
  • Erodes justification: indeed a problem, especially for those for whom giving a speech is the only path to getting the networking benefits of face-to-face interaction
  • Limits presentation format: it forces every session into being a lecture. For two decades CfPs have emphasized “how will this session be engaging/not just a talking head?” (Lisa Janicke Hinchliffe)
  • Increased Technical Burden on Speaker and Organizers: conference organizers asking presenters to do their own pre-recording is a barrier (Junior Tidal), and organizers have added new requirements for themselves
  • No Audience Feedback: pre-recording forces the presenter into an unnatural state relative to the audience (Andromeda Yelton)
  • Currency of information: pre-recording talks before an event naturally introduces a delay between the recording and the playback. (Lisa Janicke Hinchliffe)

I’m curious to hear of other reasons, for and against. Reach out to me on Twitter if you have some. The COVID-19 pandemic has changed our society and will undoubtedly transform it in ways that we can’t even anticipate. Is the way that we hold professional conferences one of them?

  1. Can we just pause for a moment and consider the decades of work and layers of technology that make a modern teleconference call happen? For you younger folks, there was a time when one couldn’t assume the network to be there. As in: the operating system on your computer couldn’t be counted on to have a network stack built into it. In the earliest years of my career, we were tickled pink to have Macintoshes at the forefront of connectivity through GatorBoxes. Go read the first paragraph of that Wikipedia article on GatorBoxes…TCP/IP was tunneled through LocalTalk running over PhoneNet on unshielded twisted pairs no faster than about 200 kbit/second. (And we loved it!) Now the network is expected; needing to know about TCP/IP is pushed so far down the stack as to be forgotten…assumed. Sure, the software on top now is buggy and bloated—is my Zoom client working? has Zoom’s service gone down?—but the network…we take that for granted. 

Upcoming DIG Sprint / Islandora

agriffith, Thu, 04/08/2021 - 20:03

The Islandora Documentation Interest Group is holding a sprint!

To support the upcoming release of Islandora, the DIG has planned a 2-week documentation writing-and-updating sprint to occur as part of the release process. To prepare for that effort, we’re going to spend April 19 – 30th on an Auditing Sprint, where volunteers will review existing documentation and complete this spreadsheet, providing a solid overview of the current status of our docs so we know where to best deploy our efforts during the release. This sprint will run alongside the upcoming Pre-Release Code Sprint, so if you’re not up for coding, auditing docs is a great way to contribute during sprint season!

We are looking for volunteers to sign up to take on two sprint roles:

Auditor: Review a page of documentation and fill out a row in the spreadsheet indicating things like the current status (‘Good Enough’ or ‘Needs Work’), the goal for that particular page (e.g., “Explain how to create an object,” or “Compare Islandora 7 concepts to Islandora 8 concepts”), and the intended audience (beginners, developers, etc.).

Reviewer: Read through a page that has been audited and indicate if you agree with the auditor’s assessment, add additional notes or suggestions as needed; basically, give a second set of eyes on each page.

 You can sign up for the sprint here, and sign up for individual pages here.


Registration now open for Samvera Virtual Connect, April 20 – 21 / Samvera

Registration is now open for Samvera Virtual Connect 2021! Samvera Virtual Connect will take place April 20th – 21st from 11am – 2pm EDT. Registration is free and open to anyone with an interest in Samvera.

This year’s program is packed with presentations and lightning talks of interest to developers, managers, librarians, and other current or potential Samvera Community participants and technology users.

Register and view the full program on the Samvera wiki.

The post Registration now open for Samvera Virtual Connect, April 20 – 21 appeared first on Samvera.

2021 DLF Forum, DigiPres, and Learn@DLF Calls for Proposals / Digital Library Federation

Join us online

We’re delighted to share that it’s CFP season for CLIR’s annual events.

Based on community feedback, we’ve made the decision to take our events online again in 2021. We look forward to new and better ways to come together—as always, with community at the center.

Our events will take place on the following dates:

For all events, we encourage proposals from members and non-members; regulars and newcomers; digital library practitioners and those in adjacent fields such as institutional research and educational technology; and students, early-career professionals and senior staff alike. Proposals to more than one event are permitted, though please submit different proposals for each. 

The DLF Forum and Learn@DLF CFP is here: 

NDSA’s Digital Preservation 2021: Embracing Digitality CFP is here:

Session options range from 5-minute lightning talks at the Forum to half-day workshops at Learn@DLF, with many options in between.

The deadline for all opportunities is Monday, May 17, at 11:59pm Eastern Time.

If you have any questions, please write to us at, and be sure to subscribe to our Forum newsletter to stay up on all Forum-related news. We’re looking forward to seeing you this fall.

-Team DLF

The post 2021 DLF Forum, DigiPres, and Learn@DLF Calls for Proposals appeared first on DLF.

What did you do in the lockdowns PT? Part 1 - Music Videos / Peter Sefton

Post looks too long? Don't want to read? Here's the summary. Last year Gail McGlinn* and I did the lockdown home-recording thing. We put out at least one song video per week for a year (and counting - we're up to 58 over 53 weeks). Searchable, sortable website here. We learned some things, got better at performing for the phone camera and our microphones and better at mixing and publishing the result.

* Disclosure: Gail's my wife. We got married; she proposed, I accepted.

I may I might - Is this the world's best marriage proposal acceptance song? (It did win a prize at a Ukulele festival for best song)

(This post is littered with links to our songs, sorry but there are 58 of them and someone has to link to them.)

In the second quarter of 2020 Gail McGlinn and I went from playing and singing in community music events (jams, gigs, get togethers) at least once a week to being at home every evening, like everyone else. Like lots of people we decided to put our efforts into home recording, not streaming cos that would be pointless for people with basically no audience, but we started making videos and releasing them under our band name Team Happy.

By release I mean "put on Facebook" and "sometimes remember to upload to YouTube".

This post is about that experience and what we learned.

Team Happy is the name we use to perform as a duo at open mic events and the odd community or ukulele festival. We were originally called "The Narrownecks" in honour of where we live, for one gig, but then we found out there's another group with that name. Actually they're much better than us, just go watch them.

Coming in to 2020 we already had a YouTube channel and it had a grand total of two videos on it with a handful of views - as in you could count them on your fingers. It's still a sad thing to behold, how many views we have - but it's not about views it's about getting discovered and having our songs performed by, oh I dunno, Casey Chambers? Keith Urban? (Oh yeah, that would mean we'd need views. Bugger.) Either that or it's about our personal journey and growth as people. Or continuing to contribute to our local music communities in lockdown (which is what Gail says it's about). Seriously though, we think I called your name and Dry Pebbles would go well on someone else's album.

Dry Pebbles, by Gail McGlinn - a song written tramping through the bush.

I called your name by Peter Sefton

Anyway, in late March we got out our recording gear and started. While phone cameras are fine for the quality of video we need, we wanted to do better than phone-camera sound. (Here's an example of that sound from one of our first recordings on my song Seventeen - it's pretty muddy, like the lighting.)

Seventeen by Peter Sefton

Initial attempts to get good audio involved feeding USB-audio from a sound mixer with a built in audio interface (a Yamaha MX10) into the phone itself and recording an audio track with the video - but this is clunky and you only get two tracks even though the mixer has multiple inputs. We soon graduated to using a DAW - a Digital Audio Workstation with our mixer, still only two tracks but much less mucking around with the phone.

So this is more or less what we ended up with for the first few weeks - We'd record or "track" everything on the computer and then use it again to mix.

Our first-generation recording rig with annoying recording via a laptop

There's a thing you have to do to audio files called mastering which means getting them to a suitable volume level and dynamic range for distribution. Without it loud stuff is too quiet and quiet stuff is too quiet, and the music has no punch. This was a complete mystery to me to start with so I paid for online services that use AI to master tracks - kind of but not really making everything louder. At some point I started doing it myself, beginning the long process of learning the mysteries of compression and limiting and saving money. Haven't mastered it yet, though. Mastering is an actual profession, by the way and I'm not going to reach those heights.
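For the curious, here is a rough Python sketch of what "compression and limiting" boil down to. It is a toy under stated assumptions, not what the online mastering services or any DAW plugin actually do: samples are plain floats between -1.0 and 1.0, the compressor is static (no attack or release smoothing), and the function names, threshold and ratio are all made up for illustration.

```python
import math


def db_to_gain(db):
    """Convert decibels to a linear gain factor."""
    return 10 ** (db / 20)


def compress(samples, threshold_db=-12.0, ratio=4.0):
    """Static hard-knee compression: squash whatever pokes above the threshold."""
    threshold = db_to_gain(threshold_db)
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            # How far over the threshold we are, in dB, divided by the ratio.
            over_db = 20 * math.log10(level / threshold)
            level = threshold * db_to_gain(over_db / ratio)
        out.append(math.copysign(level, s))
    return out


def normalize_peak(samples, target_db=-1.0):
    """Scale the whole track so its loudest sample sits at target_db."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    gain = db_to_gain(target_db) / peak
    return [s * gain for s in samples]


def toy_master(samples):
    """Compress first, then bring the result back up to a sensible peak."""
    return normalize_peak(compress(samples))
```

The point of doing it in that order: compression narrows the gap between loud and quiet, and the make-up gain afterwards lifts everything, so the quiet parts end up louder than they started while the peaks stay under control.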

In May, we got a new bit of gear, the Tascam Model 12, an all in one mixer-recorder-interface that lets you track (that is, record tracks) without a computer - much easier to deal with. A bit later we got a Zoom H5 portable recorder with built in mics and a couple of extra tracks for instruments so we can do stuff away from home - this got used on our month-long holiday in March 2021. Well it was almost a month, but there was a Rain Event and we came home a bit early. These machines let you capture tracks, including adding new ones, without touching the computer, which is a big win as far as I am concerned.

Gail singing Closer to fine on The Strand in Townsville, in North Queensland, recorded on the H5 and (partly) mixed in the car on holidays.

After a bit, and depending on the level of lockdown we'd have guests around to visit and when that was happening, we kept our distance at either end of our long lounge room and used a phone camera and microphone at each end.

Our second-generation recording rig with stand-alone laptop-free tracking

This new setup made it much easier to do overdubs - capture more stuff into the Model 12 and make videos each time, like on this song of mine They Say Dancing where I overdubbed guitar and bass over a live track.

They Say Dancing by Peter Sefton

So what did we learn?

  1. Perfect is the enemy of Done. Well, we knew that, but if you've decided to release a song every week, even if you're away on a holiday, or there are other things going on then there's no time to obsess over details - you have to get better at getting a useable take quickly or you won't be able to keep going for a year or more.

  2. Practice may not make perfect, but it's a better investment than new gear, or doing endless takes with the cameras rolling. We got better at picking a song (or deciding to write one or finish one off), playing it for a week or two and then getting the take.

  3. Simplify! We learned that to get a good performance sometimes it was better for only one of us to play or sing, that fancy parts increased the chance of major errors, meaning yet another take. If in doubt (like my harmony singing that's always in doubt) we're learning to leave it out.

  4. Nobody likes us! Actually we know that's not true, some of the songs get hundreds of plays on Facebook but not many people actually click the like button, maybe twenty or so. But then you run into people in the supermarket; they say "love the songs keep it up"! And there are quite a few people who listen every week on FB we just can't tell they're enjoying it. There are complex reasons for this lack of engagement - some people don't like to like things so that (they think) the evil FB can't track them. I think the default auto-play for video might be a factor too - the video starts playing, and that might not be a good time, so people skip forward to something else.

    It's kind of demoralizing that it is MUCH easier to get likes with pictures of the dog.

    Puppies win every time

    Our spoiled covid-hound, Floki - about 18 months old. Much more likeable on the socials than our music.

  5. YouTube definitely doesn't like us. I figured that some of the songs we sang would attract some kind of YouTube audience - we often search to see what kinds of covers of songs are out there and thought others might find us the same way, but we get almost no views on that platform. I also thought that adding some text about the gear we used might bring in some views. For example we were pretty early adopters of the Tascam Model 12. I had tried to find out what one sounded like in real life before I bought, with no success - and I thought people might drop by to hear us, but I don't think Google/YouTube is giving us any search-juice at all.

Our personal favourites

Our Favourite cover we did (and we actually agree on this - Team Happy is NOT an ironic name) was Colour my World. We'd just got the Tascam and Gail was able to double track herself - no mucking around with computers. We had fun that night.

Colour my World - one of our fave covers to perform

And my favourite original? Well I'm very proud of All L'Amour for you with lots of words and a bi-lingual pun - I wanted to do that on the local community radio just last weekend when we were asked in, but the host Richard 'Duck' Keegan kind of mentioned the aforementioned I Called Your Name so we did that instead along with Dry Pebbles and Seventeen.

All L'Amour for you The last word on love and metaphors for love? By Peter Sefton.

Gail's fave original? I may I might, the song that snagged her the best husband in South Katoomba over 1.95m tall. And she likes the tear jerker Goodbye Mongrel dog I wrote, on which she plays some pumpin' banjo.

Goodbye Mongrel dog - a song that says goodbye to a (deceased) Mongrel dog who went by the name of Spensa.

Music-tech stuff and mixing tips

For those of you who care, here's a roundup of the main bits of kit that work well. We've reached the point where there's actually nothing on the shopping list - we can do everything for the foreseeable future with what we have.

I have mentioned that we track using the Tascam Model 12 and the Zoom H5 - these are both great. The only drawback of the Zoom is that you can't see the screen (and thus the levels) from performance position. It also needed a better wind shield - I bought a dead-cat, shaggy thing to go over the mics that works if the wind is moderate.

When I bought the Tascam I thought it was going to be all analogue through the mixer stage like their Model 16 and Model 24, but no, it's all digital. I don't think this is an issue having used it but it was not something they made all that explicit at launch. There's a digital Zoom equivalent (the L12) which is a bit smaller, and has more headphone outputs but at the expense of having to do mode-switching to access all the functions. I think the Tascam will be easier to use for live shows when those start happening again.

For video we just use our phones - for a while we had matching Pixel 4XLs then a Pixel 5 which drowned in a tropical stream. Yes they're waterproof, those models, but not when they have tiny cracks in the screen. No more $1000 phones for me.

Reaper is bloody marvelous software. It's cheap for non-commercial use, incredibly powerful and extensible. I have not used any Digital Audio Workstation other than Garage Band, which comes free on the Apple platform, but as far as I can see there's no reason for non-technophobic home producers to pay any more than the Reaper fee for something else.

Our mainstay mics are a slightly battered pair of Audio Technica AT2020s - we had these for performing live with Gail's band U4ria - everyone gathered around a condenser mic, bluegrass style. For recording we either put one at either end of the room or mount them vertically in an X/Y configuration - 90° to get stereo. They're fairly airy and have come to be a big part of our sound. We tried some other cheap things that didn't work very well, and I got a pair of Australian Rode M5 pencil condenser mics, not expensive, that I hoped might be easier to mount X/Y but we didn't like them for vocals at all, though they're great on stringed instruments. We do have an SM58 and SM57 -- gotta love a microphone with a Wikipedia page -- which see occasional use as vocal mics if we want a more rock 'n roll sound, or the guest singer is more used to a close-mic. And the SM57 for guitar amps sometimes.

We tend to play our favourite acoustic instruments but when we have bass we use the Trace Elliot Elf amp which has a great compressor and a DI output (it can send a signal to the mixer/interface without going via the speaker). Sometimes we run the speaker and try not to let it bleed too much into the AT2020s, very occasionally we wear headphones for the first track and go direct so there's no bass bleed. I have done a bit of electric guitar with the Boss Katana 50 - to me it sounds good in the room that amp, but has not recorded well either via the headphone out or via an SM57. I get better results thru the bass amp. I don't have any kind of actual electric guitar tone sorted though I have seen a lot of videos about how to achieve the elusive tone. Maybe one day.

One thing that I wasn't expecting to happen - I dropped the top E of my little Made in Mexico Martin OOO Jr guitar to D (you know, like Keef) some time early in 2020 and it ended up staying there. Gives some nice new chord voicings (9ths mostly) and it's the same top 4 strings as a 5 string banjo with some very easy-to-grab chords. Have started doing it to Ukuleles too, putting them in open C.

A note on the bass: Playing bass is fun (we knew that before we started) but mixing it so it can be heard on a phone speaker is a real challenge. One approach that helps is using an acoustic bass, which puts out a lot more high frequency than a solid body electric. This also helps because you don't have to have an amp on while you're tracking it live, but you can take a direct input from a pickup (or two) AND mic the bass, giving you lots of signals with different EQ to play with. I gaffa-taped a guitar humbucker into my Artist Guitars 5 string acoustic and it sounds huge.

The basic (ha!) trick I try to use for getting more high frequency for tiny speakers is to create a second track, saturate the signal with distortion and/or saturation effects to boost the upper harmonic content and then cut all the low frequency out and mix that so it can just be heard and imply the fundamental bass frequency in addition to the real bassy bass. Helps if you have some bridge pickup or under-saddle pickup in the signal if those are available and if you remember.
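If you would rather see that trick as code than as knob-twiddling, here is a minimal Python sketch. It assumes the bass track is a list of float samples between -1.0 and 1.0; the tanh soft-clipper, the one-pole high-pass filter, the 300 Hz cutoff and the mix level are all stand-ins picked for illustration, not the actual plugins or settings used on any of the songs.

```python
import math


def saturate(samples, drive=5.0):
    """Soft-clip with tanh, which adds harmonics above the fundamental."""
    return [math.tanh(drive * s) for s in samples]


def high_pass(samples, cutoff_hz, sample_rate):
    """One-pole high-pass filter: cut the low end out of the distorted copy."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for s in samples:
        prev_out = alpha * (prev_out + s - prev_in)
        prev_in = s
        out.append(prev_out)
    return out


def add_harmonics(bass, sample_rate=44100, cutoff_hz=300.0, mix=0.2):
    """Mix a quiet, distorted, high-passed copy in with the clean bass."""
    copy = high_pass(saturate(bass), cutoff_hz, sample_rate)
    return [dry + mix * wet for dry, wet in zip(bass, copy)]
```

Feed it a low note and the result still has the real bassy bass in it, plus a thin layer of upper harmonics that a phone speaker can actually reproduce.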

I also like to add some phaser effect that gives some motion in the upper frequencies - for example my Perfect Country Pop Song - too much phaser? Probably, but I can hear the bass on my phone and it bounces :). Phaser is Team Happy's favourite effect, nothing says perfect country pop (which is what we are, right?) like a phaser.

Perfect Country Pop Song - is it perfect or merely sublime? (This one has a cute puppy in it).

Everything I know about music production is from YouTube. Everything I know about song writing is from deep in my soul. Thank you for reading all the way to the bottom. Normal service will resume next week.

Collaborations Workshop 2021: collaborative ideas & hackday / Jez Cope

My last post covered the more “traditional” lectures-and-panel-sessions approach of the first half of the SSI Collaborations Workshop. The rest of the workshop was much more interactive, consisting of a discussion session, a Collaborative Ideas session, and a whole-day hackathon!

The discussion session on day one had us choose a topic (from a list of topics proposed leading up to the workshop) and join a breakout room for that topic with the aim of producing a “speed blog” by the end of 90 minutes. Those speed blogs will be published on the SSI blog over the coming weeks, so I won’t go into that in more detail.

The Collaborative Ideas session is a way of generating hackday ideas, by putting people together at random into small groups to each raise a topic of interest to them before discussing and coming up with a combined idea for a hackday project. Because of the serendipitous nature of the groupings, it’s a really good way of generating new ideas from unexpected combinations of individual interests.

After that, all the ideas from the session, along with a few others proposed by various participants, were pitched as ideas for the hackday and people started to form teams. Not every idea pitched gets worked on during the hackday, but in the end 9 teams of roughly equal size formed to spend the third day working together.

My team’s project: “AHA! An Arts & Humanities Adventure”

There’s a lot of FOMO around choosing which team to join for an event like this: there were so many good ideas and I wanted to work on several of them! In the end I settled on a team developing an escape room concept to help Arts & Humanities scholars understand the benefits of working with research software engineers for their research.

Five of us rapidly mapped out an example storyline for an escape room, got a website set up with GitHub and populated it with the first few stages of the game. We decided to focus on a story that would help the reader get to grips with what an API is and I’m amazed how much we managed to get done in less than a day’s work!

You can try playing through the escape room (so far) yourself on the web, or take a look at the GitHub repository, which contains the source of the website along with a list of outstanding tasks to work on if you’re interested in contributing.

I’m not sure yet whether this project has enough momentum to keep going, but it was a really valuable way both of getting to know and building trust with some new people and demonstrating the concept is worth more work.

Other projects

Here’s a brief rundown of the other projects worked on by teams on the day.

Coding Confessions
Everyone starts somewhere and everyone cuts corners from time to time. Real developers copy and paste! Fight imposter syndrome by looking through some of these confessions or contributing your own.
A template to set up a Raspberry Pi with everything you need to run a Carpentries data science/software engineering workshop in a remote location without internet access.
Research Dugnads
A guide to running an event that is a coming together of a research group or team to share knowledge, pass on skills, tidy and review code, among other software and working best practices (based on the Norwegian concept of a dugnad, a form of “voluntary work done together with other people”)
Collaborations Workshop ideas
A meta-project to collect together pitches and ideas from previous Collaborations Workshop conferences and hackdays, to analyse patterns and revisit ideas whose time might now have come.
Integrate existing tools to improve the machine-readable metadata attached to open research projects by integrating projects like SOMEF, codemeta.json and HowFAIRIs. Complete with CI and badges!
Software end-of-project plans
Develop a template to plan and communicate what will happen when the fixed-term project funding for your research software ends. Will maintenance continue? When will the project sunset? Who owns the IP?
Habeas Corpus
A corpus of machine readable data about software used in COVID-19 related research, based on the CORD19 dataset.
Extend the all-contributors GitHub bot to include rich information about research project contributions such as the CASRAI Contributor Roles Taxonomy.

I’m excited to see so many metadata-related projects! I plan to take a closer look at what the Habeas Corpus, Credit-all and howDescribedIs teams did when I get time. I also really want to try running a dugnad with my team or for the GLAM Data Science network.

FAIR Data Management; It's a lifestyle not a lifecycle / Peter Sefton

I have been working with my colleague Marco La Rosa on summary diagrams that capture some important aspects of Research Data Management, and include the FAIR data principles; that data should be Findable, Accessible, Interoperable and Reusable.

But first, here's a rant about some modeling and diagramming styles and trends that I do not like.

I took part in a fun Twitter thread recently kicked off by Fiona Tweedie.

Fiona Tweedie @FCTweedie So my current bugbear is university processes that seem to forget that the actual work of higher ed is doing research and/ or teaching. This "research lifecycle" diagram from @UW is a stunning example:

The UW MyResearch Lifecycle with the four stages: Plan/Propose, Setup, Manage, and Closeout

In this tweet Dr Tweedie has called out Yet Another Research Lifecycle Diagram That Leaves Out The Process Of You Know, Actually Doing Research. This process-elision happened more than once when I was working as an eResearch manager - management would get in the consultants to look at research systems, talk to the research office and graduate school and come up with a "journey map" of administrative processes that either didn't mention the actual DOING research or represented it as a tiny segment, never mind that it's, you know, the main thing researchers do when they're being researchers rather than teachers or administrators.

At least the consultants would usually produce a 'journey map' that got you from point A to Point B using chevrons to >> indicate progress and didn't insist that everything was a 'lifecycle'.

Something like:

Plan / Propose  >> Setup  >> Manage / Do Research >> Closeout

But all too commonly processes are represented using the tired old metaphor of a lifecycle.

Reminder: A lifecycle is a biological process; how organisms come into existence, reproduce and die via various means including producing seeds, splitting themselves in two, um, making love, laying eggs and so on.

It's really stretching the metaphor to talk about research in this way - maybe the research outputs in the UW "closeout" phase are eggs that hatch into new bouncing baby proposals?

Regrettably, arranging things in circles and using the "lifecycle" metaphor is very common - see this Google image search for "Research Lifecycle":

I wonder if the diagramming tools that are available to people are part of the issue - Microsoft Word, for example can build cycles and other diagrams out of a bullet list.

(I thought it would be amusing to draw the UW diagram from above as a set of cogs but this happened - you can only have 3 cogs in a Word diagram.)

Attempt to use Microsoft Word to make a diagram with 4 cogs for Plan/Propose, Setup, Manage, and Closeout but it will only draw three of them

Research Data Management as a Cycle

Now that I've got that off my chest let's look at research data management. Here's a diagram which is in fairly wide use, from The University of California.

(This image has a CC-BY logo which means I can use it if I attribute it - but I'm not 100% clear on the original source of the diagram - it seems to be from UC somewhere.)

Marco used this one in some presentations we gave. I thought we could do better.

The good part of this diagram is that it shows research data management as a cyclical, recurring activity - which for FAIR data it needs to be.

What I don't like:

  1. I think it is trying to show a project (ie grant) level view of research with data management happening in ONE spot on the journey. Typically researchers do research all the time (or in between teaching or when they can get time on equipment) not at a particular point in some administrative "journey map". We often hear feedback that their research is a lifetime activity and does not happen the way administrators and IT think it does.

  2. "Archive" is shown as a single-step pre-publication. This is a terrible message; if we are to start really doing FAIR data then data need to be described and made findable and accessible ASAP.

  3. The big so-called lifecycle is (to me) very contrived and looks like a librarian view of the world with data searching as a stand-alone process before research data management planning.

  4. "Data Search / Reuse" is a type of "Collection", and why is it happening before data management planning? "Re-Collection" is also a kind of collection, so we can probably collapse all those together (the Findable and Accessible in FAIR).

  5. It’s not clear whether Publication means articles or data or both.

  6. Most research uses some kind of data storage but very often not directly; people might be interacting with a lab notebook system or a data repository - at UTS we arrived at the concept of "workspaces" to capture this.

The "Minimum Viable FAIR Diagram"

Marco and I have a sketch of a new diagram that attempts to address these issues and covers what needs to be in place for broad-scale FAIR data practice.

Two of the FAIR principles suggest services that need to be in place; ways to Find and Access data. The I and R in FAIR are not something that can be encapsulated in a service, as such, rather they imply that data are well described for re-use and Interoperation of systems and in Reusable formats.

As it happens, there is a common infrastructure component which encapsulates finding data and accessing; the repository. Repositories are services which hold data and make it discoverable and accessible, with governance that ensures that data does not change without notice and is available for access over agreed time frames - sometimes with detailed access control. Repositories may be general purpose or specialized around a particular type of data: gene sequences, maps, code, microscope images etc. They may also be ad-hoc - at a lab level they could be a well laid out, well managed file system.

Some well-funded disciplines have established global or national repositories and workflows for some or all of their data, notably physics and astronomy, bioinformatics, geophysical sciences, climate and marine science. Some of these may not be thought of by their community as repositories - but according to our functional definition they are repositories, even if they are "just" vast shared file systems or databases where everyone knows what's what and data managers keep stuff organized. Also, some institutions have institutional data repositories but it is by no means common practice across the whole of the research sector that data find their way into any of these repositories.

Remember: data storage is not all files-on-disks. Researchers use a very wide range of tools which may make data inaccessible outside of the tool. Examples include: cloud-based research (lab) notebook systems in which data is deposited alongside narrative activity logs; large shared virtual laboratories where data are uploaded; Secure eResearch Platforms (SERPs) which allow access only via virtualized desktops with severely constrained data ingress and egress; survey tools; content management systems; digital asset management systems; email (yes, it's true some folks use email as project archives!); and custom-made code for a single experiment.

Our general term for all of the infrastructures that researchers use for RDM day to day including general purpose storage is “workspaces”.

Many, if not most, workspaces do not have high levels of governance, and data may be technically or legally inaccessible over the long term. They should not be considered as suitable archives or repositories - hence our emphasis on making sure that data can be described and deposited into general purpose, standards-driven repository services.

The following is a snapshot of the core parts of an idealised FAIR data service. It shows the activities that researchers undertake, acquiring data from observations, instruments and by reuse, conducting analysis and data description in a working environment, and depositing results into one or more repositories.

We wanted it to show:

  • That infrastructure services are required for research data management - researchers don't just "Archive" their data without support - they and those who will reuse data need repository services in some form.

  • That research is conducted using workspace environments - more infrastructure.

A work-in-progress sketch of FAIR research data management.

We (by which I mean Marco) will make this prettier soon.

And yes, there is a legitimate cycle in this diagram: it's the FIND -> ACCESS -> REUSE -> DESCRIBE -> DEPOSIT cycle that's inherent in the FAIR lifestyle.
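To make that cycle concrete, here is a toy Python sketch of one trip around it. Everything in it is hypothetical: the `Dataset` and `Repository` classes, their method names and the rainfall example are invented for illustration and do not correspond to any real repository software or to the services in the diagram. The only ideas borrowed from the text are that data must be described before deposit and that a repository should not let deposited data change without notice.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    identifier: str
    description: str  # DESCRIBE: the metadata that makes the data findable
    payload: bytes
    derived_from: list = field(default_factory=list)


class Repository:
    """A hypothetical in-memory repository offering find and access services."""

    def __init__(self):
        self._items = {}

    def deposit(self, dataset):
        # DEPOSIT: refuse undescribed data and silent overwrites.
        if not dataset.description.strip():
            raise ValueError("describe the data before depositing it")
        if dataset.identifier in self._items:
            raise ValueError("deposited data must not change without notice")
        self._items[dataset.identifier] = dataset

    def find(self, term):
        # FIND: naive search over descriptions.
        term = term.lower()
        return [d.identifier for d in self._items.values()
                if term in d.description.lower()]

    def access(self, identifier):
        # ACCESS: retrieve a deposited dataset by identifier.
        return self._items[identifier]


# One trip around the cycle, done in a researcher's workspace.
repo = Repository()
repo.deposit(Dataset("rain-2020", "Daily rainfall observations, 2020", b"1,0,12"))
found = repo.find("rainfall")                      # FIND
source = repo.access(found[0])                     # ACCESS
total = sum(int(x) for x in source.payload.decode().split(","))  # REUSE
summary = Dataset(                                 # DESCRIBE
    "rain-2020-total",
    "Total rainfall for 2020, derived from rain-2020",
    str(total).encode(),
    derived_from=[source.identifier],
)
repo.deposit(summary)                              # DEPOSIT
```

The derived dataset goes back into the repository with its own description and a pointer to its source, which is what lets the next person start the cycle again from FIND.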

Things that might still be missing:

  • Some kind of rubbish bin - to show that workspaces are ephemeral and working data that doesn't make the cut may be culled, and that some data is held only for a time.

  • What do you think's missing?

Thoughts anyone? Comments below or take it up on twitter with @ptsefton.

(I have reworked parts of a document that Marco and I have been working on with Guido Aben for this document, and thanks to recent graduate Florence Sefton for picking up typos and sense-checking).

Elon Musk: Threat or Menace? / David Rosenthal

Although both Tesla and SpaceX are major engineering achievements, Elon Musk seems completely unable to understand the concept of externalities, unaccounted-for costs that society bears as a result of these achievements.

First, in Tesla: carbon offsetting, but in reverse, Jaime Powell reacted to Tesla taking $1.6B in carbon offsets which provided the only profit Tesla ever made and putting them into Bitcoin:
Looked at differently, a single Bitcoin purchase at a price of ~$50,000 has a carbon footprint of 270 tons, the equivalent of 60 ICE cars.

Tesla’s average selling price in the fourth quarter of 2020? $49,333.

We’re not sure about you, but FT Alphaville is struggling to square the circle of “buy a Tesla with a bitcoin and create the carbon output of 60 internal combustion engine cars” with its legendary environmental ambitions.

Unless, of course, that was never the point in the first place.
Below the fold, more externalities Musk is ignoring.

Second, there is Musk's obsession with establishing a colony on Mars. Even assuming SpaceX can stop their Starship second stage exploding on landing, and do the same with the much bigger first stage, the Mars colony scheme would have massive environmental impacts. Musk envisages a huge fleet of Starships ferrying people and supplies to Mars for between 40 and 100 years. The climate effects of dumping this much rocket exhaust into the upper atmosphere over such a long period would be significant. The idea that a world suffering the catastrophic effects of climate change could sustain such an expensive program over many decades simply for the benefit of a minuscule fraction of the population is laughable.

These externalities are in the future. But there is a more immediate set of externalities.

Back in 2017 I expressed my skepticism about "Level 5" self-driving cars in Techno-hype part 1, stressing that the problem was that to get to Level 5, or as Musk calls it "Full Self-Driving", you need to pass through the levels where the software has to hand-off to the human. And the closer you get to Level 5, the harder this problem becomes:
Suppose, for the sake of argument, that self-driving cars three times as good as Waymo's are in wide use by normal people. A normal person would encounter a hand-off once in 15,000 miles of driving, or less than once a year. Driving would be something they'd be asked to do maybe 50 times in their life.

Even if, when the hand-off happened, the human was not "climbing into the back seat, climbing out of an open car window, and even smooching" and had full "situational awareness", they would be faced with a situation too complex for the car's software. How likely is it that they would have the skills needed to cope, when the last time they did any driving was over a year ago, and on average they've only driven 25 times in their life? Current testing of self-driving cars hands-off to drivers with more than a decade of driving experience, well over 100,000 miles of it. It bears no relationship to the hand-off problem with a mass deployment of self-driving technology.
Mack Hogan's Tesla's "Full Self Driving" Beta Is Just Laughably Bad and Potentially Dangerous starts:
A beta version of Tesla's "Full Self Driving" Autopilot update has begun rolling out to certain users. And man, if you thought "Full Self Driving" was even close to a reality, this video of the system in action will certainly relieve you of that notion. It is perhaps the best comprehensive video at illustrating just how morally dubious, technologically limited, and potentially dangerous Autopilot's "Full Self Driving" beta program is.
Hogan sums up the lesson of the video:
Tesla's software clearly does a decent job of identifying cars, stop signs, pedestrians, bikes, traffic lights, and other basic obstacles. Yet to think this constitutes anything close to "full self-driving" is ludicrous. There's nothing wrong with having limited capabilities, but Tesla stands alone in its inability to acknowledge its own shortcomings.
Hogan goes on to point out the externalities:
When technology is immature, the natural reaction is to continue working on it until it's ironed out. Tesla has opted against that strategy here, instead choosing to sell software it knows is incomplete, charging a substantial premium, and hoping that those who buy it have the nuanced, advanced understanding of its limitations—and the ability and responsibility to jump in and save it when it inevitably gets baffled. In short, every Tesla owner who purchases "Full Self-Driving" is serving as an unpaid safety supervisor, conducting research on Tesla's behalf. Perhaps more damning, the company takes no responsibility for its actions and leaves it up to driver discretion to decide when and where to test it out.

That leads to videos like this, where early adopters carry out uncontrolled tests on city streets, with pedestrians, cyclists, and other drivers unaware that they're part of the experiment. If even one of those Tesla drivers slips up, the consequences can be deadly.
Of course, the drivers are only human so they do slip up:
the Tesla arrives at an intersection where it has a stop sign and cross traffic doesn't. It proceeds with two cars incoming, the first car narrowly passing the car's front bumper and the trailing car braking to avoid T-boning the Model 3. It is absolutely unbelievable and indefensible that the driver, who is supposed to be monitoring the car to ensure safe operation, did not intervene there.
An example of the kinds of problems that can be caused by autonomous vehicles behaving in ways that humans don't expect is reported by Timothy B. Lee in Fender bender in Arizona illustrates Waymo’s commercialization challenge:
A white Waymo minivan was traveling westbound in the middle of three westbound lanes on Chandler Boulevard, in autonomous mode, when it unexpectedly braked for no reason. A Waymo backup driver behind the wheel at the time told Chandler police that "all of a sudden the vehicle began to stop and gave a code to the effect of 'stop recommended' and came to a sudden stop without warning."

A red Chevrolet Silverado pickup behind the vehicle swerved to the right but clipped its back panel, causing minor damage. Nobody was hurt.
The Tesla in the video made a similar unexpected stop. Lee stresses that, unlike Tesla's, Waymo's responsible test program has resulted in a generally safe product, but not one that is safe enough:
Waymo has racked up more than 20 million testing miles in Arizona, California, and other states. This is far more than any human being will drive in a lifetime. Waymo's vehicles have been involved in a relatively small number of crashes. These crashes have been overwhelmingly minor with no fatalities and few if any serious injuries. Waymo says that a large majority of those crashes have been the fault of the other driver. So it's very possible that Waymo's self-driving software is significantly safer than a human driver.
The more serious problem for Waymo is that the company can't be sure that the idiosyncrasies of its self-driving software won't contribute to a more serious crash in the future. Human drivers cause a fatality about once every 100 million miles of driving—far more miles than Waymo has tested so far. If Waymo scaled up rapidly, it would be taking a risk that an unnoticed flaw in Waymo's programming could lead to someone getting killed.
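The gap between 20 million test miles and one fatality per 100 million miles can be made concrete with a little arithmetic. The following is a rough sketch only: it assumes fatal crashes arrive as a Poisson process at the quoted human rate, and both the Poisson model and the 95% threshold are assumptions added here for illustration, not anything Lee or Waymo state.

```python
import math

HUMAN_FATALITY_RATE = 1 / 100_000_000  # fatalities per mile (figure quoted above)
WAYMO_TEST_MILES = 20_000_000          # testing miles (figure quoted above)

def prob_zero_fatalities(miles, rate_per_mile):
    """Poisson probability of seeing no fatal crash over `miles` (assumed model)."""
    return math.exp(-miles * rate_per_mile)

def miles_needed(confidence, rate_per_mile):
    """Miles after which a driver at `rate_per_mile` would have had at least
    one fatal crash with probability `confidence` (assumed Poisson model)."""
    return -math.log(1 - confidence) / rate_per_mile

expected = WAYMO_TEST_MILES * HUMAN_FATALITY_RATE
p_zero = prob_zero_fatalities(WAYMO_TEST_MILES, HUMAN_FATALITY_RATE)
needed = miles_needed(0.95, HUMAN_FATALITY_RATE)

print(f"Expected fatalities for a human-rate driver: {expected:.2f}")
print(f"Chance such a driver has none in 20M miles:  {p_zero:.0%}")
print(f"Fatality-free miles needed for 95% confidence: {needed / 1e6:.0f} million")
```

Under these assumptions a driver at the human rate would get through 20 million miles without a fatality about 82% of the time, so a clean record over that distance says little either way; roughly 300 million fatality-free miles would be needed to reach 95% confidence.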
I'm a pedestrian, cyclist and driver in an area infested with Teslas owned, but potentially not actually being driven, by fanatical early adopters and members of the cult of Musk. I'm personally at risk from these people believing that what they paid good money for was "Full Self Driving". When SpaceX tests Starship at their Boca Chica site they take precautions, including road closures, to ensure innocent bystanders aren't at risk from the rain of debris when things go wrong. Tesla, not so much.

Of course, Tesla doesn't tell the regulators that what the cult members paid for was "Full Self Driving"; that might cause legal problems. As Timothy B. Lee reports, Tesla: “Full self-driving beta” isn’t designed for full self-driving:
"Despite the "full self-driving" name, Tesla admitted it doesn't consider the current beta software suitable for fully driverless operation. The company said it wouldn't start testing "true autonomous features" until some unspecified point in the future.
Tesla added that "we do not expect significant enhancements" that would "shift the responsibility for the entire dynamic driving task to the system." The system "will continue to be an SAE Level 2, advanced driver-assistance feature."

SAE level 2 is industry jargon for a driver-assistance systems that perform functions like lane-keeping and adaptive cruise control. By definition, level 2 systems require continual human oversight. Fully driverless systems—like the taxi service Waymo is operating in the Phoenix area—are considered level 4 systems."
There is an urgent need for regulators to step up and stop this dangerous madness:
  • The NHTSA should force Tesla to disable "Full Self Driving" in all its vehicles until the technology has passed an approved test program
  • Any vehicles taking part in such a test program on public roads should be clearly distinguishable from Teslas being driven by actual humans, for example with orange flashing lights. Self-driving test vehicles from less irresponsible companies such as Waymo are distinguishable in this way; Teslas in which some cult member has turned on "Full Self Driving Beta" are not.
  • The FTC should force Tesla to refund, with interest, every dollar paid by their customers under the false pretense that they were paying for "Full Self Driving".

Collaborations Workshop 2021: talks & panel session / Jez Cope

I’ve just finished attending (online) the three days of this year’s SSI Collaborations Workshop (CW for short), and once again it’s been a brilliant experience, as well as mentally exhausting, so I thought I’d better get a summary down while it’s still fresh in my mind.

Collaborations Workshop is, as the name suggests, much more focused on facilitating collaborations than a typical conference, and has settled into a structure that starts off with longer keynotes and lectures, and progressively gets more interactive, culminating with a hack day on the third day.

That’s a lot to write about, so for this post I’ll focus on the talks and panel session, and follow up with another post about the collaborative bits. I’ll also probably need to come back and add in more links to bits and pieces once slides and the “official” summary of the event become available.


2021-04-07 Added links to recordings of keynotes and panel sessions


The first day began with two keynotes on this year’s main themes: FAIR Research Software and Diversity & Inclusion, and day 2 had a great panel session focused on disability. All three were streamed live and the recordings remain available on YouTube.

FAIR Research Software

Dr Michelle Barker, Director of the Research Software Alliance, spoke on the challenges to recognition of software as part of the scholarly record: software is not often cited. The FAIR4RS working group has been set up to investigate and create guidance on how the FAIR Principles for data can be adapted to research software as well; as they stand, the Principles are not ideally suited to software. This work will only be the beginning though, as we will also need metrics, training, career paths and much more. ReSA itself has 3 focus areas: people, policy and infrastructure. If you’re interested in getting more involved in this, you can join the ReSA email list.

Equality, Diversity & Inclusion: how to go about it

Dr Chonnettia Jones, Vice President of Research, Michael Smith Foundation for Health Research spoke extensively and persuasively on the need for Equality, Diversity & Inclusion (EDI) initiatives within research, as there is abundant robust evidence that all research outcomes are improved.

She highlighted the difficulties current approaches to EDI have in effecting structural change, and in changing not just individual behaviours but the cultures & practices that perpetuate inequity. Where initiatives are often constructed around making up for individual deficits, a better framing is to start from an understanding of individuals having equal stature but different lived experiences. Commenting on the current focus on “research excellence”, she pointed out that the hyper-competition this promotes is deeply unhealthy, suggesting instead that true excellence requires diversity, and that we should focus on an inclusive excellence driven by inclusive leadership.

Equality, Diversity & Inclusion: disability issues

Day 2’s EDI panel session brought together five disabled academics to discuss the problems of disability in research.

  • Dr Becca Wilson, UKRI Innovation Fellow, Institute of Population Health Science, University of Liverpool (Chair)
  • Phoenix C S Andrews (PhD Student, Information Studies, University of Sheffield and Freelance Writer)
  • Dr Ella Gale (Research Associate and Machine Learning Subject Specialist, School of Chemistry, University of Bristol)
  • Prof Robert Stevens (Professor and Head of Department of Computer Science, University of Manchester)
  • Dr Robin Wilson (Freelance Data Scientist and SSI Fellow)

NB. The discussion flowed quite freely, so the following summary mixes up input from all the panel members.

Researchers are often assumed to be single-minded in following their research calling, and aptness for jobs is often partly judged on “time served”, which disadvantages any disabled person who has been forced to take a career break. On top of this disabled people are often time-poor because of the extra time needed to manage their condition, leaving them with less “output” to show for their time served on many common metrics. This can particularly affect early-career researchers, since resources for these are often restricted on a “years-since-PhD” criterion. Time poverty also makes funding with short deadlines that much harder to apply for. Employers add more demands right from the start: new starters are typically expected to complete a health and safety form, generally a brief affair that will suddenly become an 80-page bureaucratic nightmare if you tick the box declaring a disability.

Many employers claim to be inclusive yet utterly fail to understand the needs of their disabled staff. Wheelchairs are liberating for those who use them (despite the awful but common phrase “wheelchair-bound”) and yet employers will refuse to insure a wheelchair while travelling for work, classifying it as a “high value personal item” that the owner would take the same responsibility for as an expensive camera. Computers open up the world for blind people in a way that was never possible without them, but it’s not unusual for mandatory training to be inaccessible to screen readers. Some of these barriers can be overcome, but doing so takes yet more time that could and should be spent on more important work.

What can we do about it? Academia works on patronage whether we like it or not, so be the person who supports people who are different to you rather than focusing on the one you “recognise yourself in” to mentor. As a manager, it’s important to ask each individual what they need and believe them: they are the expert in their own condition and their lived experience of it. Don’t assume that because someone else in your organisation with the same disability needs one set of accommodations, it’s invalid for your staff member to require something totally different. And remember: disability is unusual as a protected characteristic in that anyone can acquire it at any time without warning!

Lightning talks

Lightning talk sessions are always tricky to summarise, and while this doesn’t do them justice, here are a few highlights from my notes.

Data & metadata

Learning & teaching/community

Wrapping up

That’s not everything! But this post is getting pretty long so I’ll wrap up for now. I’ll try to follow up soon with a summary of the “collaborative” part of Collaborations Workshop: the idea-generating sessions and hackday!

MarcEdit 7.5 Update / Terry Reese



Preview Changes

One of the most requested features over the years has been the ability to preview changes prior to running them.  As of 7.5.8 – a new preview option has been added to many of the global editing tools in the MarcEditor.  Currently, you will find the preview option attached to the following functions:

  1. Replace All
  2. Add New Field
  3. Delete Field
  4. Edit Subfield
  5. Edit Field
  6. Edit Indicator
  7. Copy Field
  8. Swap Field

Functions that include a preview option will be denoted with the following button:

Add/Delete Field Option -- showing the Preview Button -- a button with a black down arrow

When this button is pressed, the following option is made available

Add/Delete Field -- Black button with an arrow -- shows Preview menu

When Preview Results is selected, the program will execute the defined action, and display the potential results in a display screen.  For example:

Preview Results page -- Grid Results

To protect performance, only 500 results at a time will be loaded into the preview grid, though users can keep adding results to the grid and continue to review items.  Additionally, users have the ability to search for items within the grid as well as jump to a specific record number (not row number). 
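The batching behaviour described above can be illustrated with a short sketch. This is not MarcEdit's implementation; the function names and the (record number, text) row shape are hypothetical, and only the batch size of 500 comes from the description.

```python
from itertools import islice

BATCH_SIZE = 500  # batch size mentioned in the description above

def batches(results, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(results)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def find_record(rows, record_number):
    """Return the grid row index holding `record_number`, or None.

    Lookup is by record number, which need not match the row position.
    """
    for row_index, (number, _text) in enumerate(rows):
        if number == record_number:
            return row_index
    return None

# Hypothetical preview results: (record_number, preview_text) pairs.
preview = ((n, f"record {n} after edit") for n in range(1, 1201))
loader = batches(preview)

grid = []
grid.extend(next(loader))  # initial load: first 500 rows
grid.extend(next(loader))  # user asks for more: grid now holds 1000 rows

print(len(grid), find_record(grid, 750))
```

Because the results are pulled lazily from a generator, rows beyond those requested are never computed, which is one simple way to keep a preview responsive on large files.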

These new options will show up first in the Windows version of MarcEdit, but will be added to the MarcEdit Mac 3.5.x branch in the coming weeks. 

New JSON => XML Translation

To better support the translation of data from JSON to MARC, I’ve included a JSON => XML algorithm in the MARCEngine.  This will allow JSON data to be serialized into XML.  The benefit of including this option is that I’ve been able to update the XML Functions options to allow JSON to be a starting format.  This will be specifically useful for users that want to make use of linked data vocabularies to generate MARC Authority records.  Users can direct MarcEdit to facilitate the translation from JSON to XML, and then create XSLT translations that can then be used to complete the process to MARCXML and MARC.  I’ve demonstrated how this process works using a vocabulary of interest to the #critcat community, the Homosaurus vocabulary (see “How do I generate MARC authority records from the Homosaurus vocabulary?” on Terry’s Worklog).

OCLC API Interactions

Working with the OCLC API is sometimes tricky.   MarcEdit utilizes a specific authentication process that requires OCLC keys be set up and configured to work a certain way.  When issues come up, it is sometimes very difficult to debug them.  I’ve updated the process and error handling to surface more information – so when problems occur and XML debugging information isn’t available, the actual exception and inner exception data will be surfaced instead.  This often can provide information to help understand why the process isn’t able to complete.

Wrap up

As noted, there have been a number of updates.  While many fall under the category of house-keeping (updating icons, UX improvements, actions, default values, etc.) – this update does include a number of often asked for, significant updates, that I hope will improve user workflows.


How do I generate MARC authority records from the Homosaurus vocabulary? / Terry Reese

Step by step instructions here:

Ok, so last week, I got an interesting question on the listserv where a user asked specifically about generating MARC records for use in one’s ILS system from a JSONLD vocabulary.  In this case, the vocabulary in question was Homosaurus (Homosaurus Vocabulary Site) – and the questioner was specifically looking for a way to pull individual terms for generation into MARC Authority records to add to one’s ILS to improve search and discovery.

When the question was first asked, my immediate thought was that this could likely be accommodated using the XML/JSON profiling wizard in MarcEdit.  This tool can review a sample XML or JSON file and allow a user to create a portable processing file based on the content in the file.  However, there were two issues with this approach:

  1. The profile wizard assumes that data format is static – i.e., the sample file is representative of other files.  Unfortunately, for this vocabulary, that isn’t the case. 
  2. The profile wizard was designed to work with JSON – JSON LD is actually a different animal due to the inclusion of the @ symbol. 

While I updated the Profiler to recognize and work better with JSON-LD – the first challenge is one that doesn’t make this a good fit to create a generic process.  So, I looked at how this could be built into the normal processing options.

To do this, I added a new default serialization, JSON=>XML, which MarcEdit now supports.  This allows the tool to take a JSON file and deserialize the data so that it is output reliably as XML.  So, for example, here is a sample JSON-LD file:

{
  "@context": {
    "dc": "",
    "skos": "",
    "xsd": ""
  },
  "@id": "",
  "@type": "skos:Concept",
  "dc:identifier": "adoptiveParents",
  "dc:issued": {
    "@value": "2019-05-14",
    "@type": "xsd:date"
  },
  "dc:modified": {
    "@value": "2019-05-14",
    "@type": "xsd:date"
  },
  "skos:broader": {
    "@id": ""
  },
  "skos:hasTopConcept": [
    { "@id": "" },
    { "@id": "" }
  ],
  "skos:inScheme": {
    "@id": ""
  },
  "skos:prefLabel": "Adoptive parents",
  "skos:related": [
    { "@id": "" },
    { "@id": "" },
    { "@id": "" },
    { "@id": "" }
  ]
}

In MarcEdit, the new JSON=>XML process can take this file and output it in XML like this:

<?xml version="1.0"?>
        <prefLabel>Adoptive parents</prefLabel>

The ability to reliably convert JSON/JSONLD to XML means that I can now allow users to utilize the same XSLT/XQUERY process MarcEdit utilizes for other library metadata format transformation.  All that was left to make this happen was to add a new origin data format to the XML Function template – and we are off and running.
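For readers who want a feel for what a JSON-LD to XML serialization involves, here is a minimal sketch in Python. It is not the algorithm MarcEdit uses: the element-naming rule (drop the leading @ and any prefix such as skos:), the root element name, and the function names are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

def tag_for(key):
    """Turn a JSON-LD key into a legal XML element name (illustrative rule)."""
    return key.lstrip("@").split(":")[-1] or "value"

def build(parent, key, value):
    """Append `value` under `parent`, repeating the element for list items."""
    if isinstance(value, list):
        for item in value:
            build(parent, key, item)
        return
    node = ET.SubElement(parent, tag_for(key))
    if isinstance(value, dict):
        for child_key, child_value in value.items():
            build(node, child_key, child_value)
    else:
        node.text = "" if value is None else str(value)

def json_to_xml(text, root_name="record"):
    """Serialize a JSON object (given as text) to an XML string."""
    root = ET.Element(root_name)
    for key, value in json.loads(text).items():
        build(root, key, value)
    return ET.tostring(root, encoding="unicode")

sample = json.dumps({
    "@type": "skos:Concept",
    "skos:prefLabel": "Adoptive parents",
    "dc:issued": {"@value": "2019-05-14", "@type": "xsd:date"},
})
xml_text = json_to_xml(sample)
print(xml_text)
```

The point of the sketch is the @ problem mentioned earlier: JSON-LD keys such as @id and @type are not valid XML element names, so any serializer has to pick a mapping rule before XSLT can be applied to the result.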

The end result is users could utilize this process with any JSON-LD vocabulary (assuming they created the XSLT) to facilitate the automation of MARC Authority data.  In the case of this vocabulary, I’ve created an XSLT and added it to my GitHub space, but have also included the XSLT in the MarcEdit XSLT directory in current downloads.

In order to use this XSLT and allow your version of MarcEdit to generate MARC Authority records from this vocabulary – you would use the following steps:

  1. Be using MarcEdit 7.5.8+ or MarcEdit Mac 3.5.8+ (Mac version will be available around 4/8).  I have not decided if I will backport to 7.3-
  2. Open the XML Functions Editor in MarcEdit
  3. Add a new Transformation – using JSON as the original format, and MARC as the final.  Make sure the XSLT path is pointed to the location where you saved the downloaded XSLT file.
  4. Save

That should be pretty much it.  I’ve recorded the steps and placed them here:, including some information on values you may wish to edit should you want to localize the XSLT. 

Publishers going-it-alone (for now?) with GetFTR / Peter Murray

In early December 2019, a group of publishers announced Get-Full-Text-Research, or GetFTR for short. I read about this first in Roger Schonfeld’s “Publishers Announce a Major New Service to Plug Leakage” piece in The Scholarly Kitchen via Jeff Pooley’s Twitter thread and blog post. Details about how this works are thin, so I’m leaning heavily on Roger’s description. I’m not as negative about this as Jeff, and I’m probably a little more opinionated than Roger. This is an interesting move by publishers, and, as the title of this post suggests, I am critical of the publishers’ “go-it-alone” approach.

First, some disclosure might be in order. My background has me thinking of this in the context of how it impacts libraries and library consortia. For the past four years, I’ve been co-chair of the NISO Information Discovery and Interchange topic committee (and its predecessor, the “Discovery to Delivery” topic committee), so this is squarely in what I’ve been thinking about in the broader library-publisher professional space. I also traced the early development of RA21 and more recently am volunteering on the SeamlessAccess Entity Category and Attribute Bundles Working Group; that’ll become more important a little further down this post.

I was nodding along with Roger’s narrative until I stopped short here:

The five major publishing houses that are the driving forces behind GetFTR are not pursuing this initiative through one of the major industry collaborative bodies. All five are leading members of the STM Association, NISO, ORCID, Crossref, and CHORUS, to name several major industry groups. But rather than working through one of these existing groups, the houses plan instead to launch a new legal entity. 

While [Vice President of Product Strategy & Partnerships for Wiley Todd] Toler and [Senior Director, Technology Strategy & Partnerships for the American Chemical Society Ralph] Youngen were too politic to go deeply into the details of why this might be, it is clear that the leadership of the large houses have felt a major sense of mismatch between their business priorities on the one hand and the capabilities of these existing industry bodies. At recent industry events, publishing house CEOs have voiced extensive concerns about the lack of cooperation-driven innovation in the sector. For example, Judy Verses from Wiley spoke to this issue in spring 2018, and several executives did so at Frankfurt this fall. In both cases, long standing members of the scholarly publishing sector questioned if these executives perhaps did not realize the extensive collaborations driven through Crossref and ORCID, among others. It is now clear to me that the issue is not a lack of knowledge but rather a concern at the executive level about the perceived inability of existing collaborative vehicles to enable the new strategic directions that publishers feel they must pursue. 

This is the publishers going-it-alone. To see Roger describe it, they are going to create this web service that allows publishers to determine the appropriate copy for a patron and do it without input from the libraries. Librarians will just be expected to put this web service widget into their discovery services to get “colored buttons indicating that the link will take [patrons] to the version of record, an alternative pathway, or (presumably in rare cases) no access at all.” (Let’s set aside for the moment the privacy implications of having a fourth-party web service recording all of the individual articles that come up in a patron’s search results.) Librarians will not get to decide the “alternative pathway” that is appropriate for the patron: “Some publishers might choose to provide access to a preprint or a read-only version, perhaps in some cases on some kind of metered basis.” (Roger goes on to say that he “expect[s] publishers will typically enable some alternative version for their content, in which case the vast majority of scholarly content will be freely available through publishers even if it is not open access in terms of licensing.” I’m not so confident.)

No, thank you. If publishers want to engage in technical work to enable libraries and others to build web services that determine the direct link to an article based on a DOI, then great. Libraries can build a tool that consumes that information as well as takes into account information about preprint services, open access versions, interlibrary loan and other methods of access. But to ask libraries to accept this publisher-controlled access button in their discovery layers, their learning management systems, their scholarly profile services, and their other tools? That sounds destined for disappointment.

I am only somewhat encouraged by the fact that RA21 started out as a small, isolated collaboration of publishers before they brought in NISO and invited libraries to join the discussion. Did it mean that it slowed down deployment of RA21? Undoubtedly yes. Did persnickety librarians demand transparent discussions and decisions about privacy-related concerns like what attributes the publisher would get about the patron in the Shibboleth-powered backchannel? Yes, but because the patrons weren’t there to advocate for themselves. Will it likely mean wider adoption? I’d like to think so.

Have publishers learned that forcing these kinds of technologies onto users without consultation is a bad idea? At the moment it would appear not. Some of what publishers are seeking with GetFTR can be implemented with straight-up OpenURL or, at the very least, limited-scope additions to OpenURL (the Z39.88 open standard!). That they didn’t start with OpenURL, a robust existing standard, is both concerning and annoying. I’ll be watching and listening for points of engagement, so I remain hopeful.
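To make the OpenURL point concrete, here is a minimal sketch of a DOI-based OpenURL 1.0 (Z39.88-2004) link in key/encoded-value form. The resolver base URL and the DOI are hypothetical placeholders, and how a given library's resolver picks the appropriate copy from such a request is outside the sketch.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical link resolver address; each library runs its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def openurl_for_doi(doi, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 KEV link that identifies the referent by DOI."""
    params = {
        "url_ver": "Z39.88-2004",                   # OpenURL version
        "url_ctx_fmt": "info:ofi/fmt:kev:mtx:ctx",  # KEV context object format
        "rft_id": f"info:doi/{doi}",                # the referent, by DOI
    }
    return f"{base}?{urlencode(params)}"

link = openurl_for_doi("10.1000/example.doi")  # placeholder DOI
print(link)
```

The request goes to a resolver the library controls, so the library, not the publisher, decides which access options to offer the patron.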

A few words about Jeff Pooley’s five-step “laughably creaky and friction-filled effort” that is SeamlessAccess. Many of the steps Jeff describes are invisible and well-established technical protocols. What Jeff fails to take into account is the very visible and friction-filled effect of patrons accessing content beyond the boundaries of campus-recognized internet network addresses. Those patrons get stopped at step two with a “pay $35 please” message. I’m all for removing that barrier entirely by making all published content “open access”. It is folly to think, though, that researchers and readers can enforce an open access business model on all publishers, so solutions like SeamlessAccess will have a place. (Which is to say nothing of the benefit of inter-institutional resource collaboration opened up by a more widely deployed Shibboleth infrastructure powered by SeamlessAccess.)

What is known about GetFTR at the end of 2019 / Peter Murray

In early December 2019, a group of publishers announced Get-Full-Text-Research, or GetFTR for short. There was a heck of a response on social media, and the response was—on the whole—not positive from my librarian-dominated corner of Twitter. For my early take on GetFTR, see my December 3rd blog post “Publishers going-it-alone (for now?) with GetFTR.” As that post title suggests, I took the five founding GetFTR publishers to task on their take-it-or-leave-it approach. I think that is still a problem. To get you caught up, here is a list of other commentary.

If you are looking for a short list of what to look at, I recommend these posts.

GetFTR’s Community Update

On December 11—after the two posts I list below—an “Updating the Community” web page was posted to the GetFTR website. From a public relations perspective, it was…interesting.

We are committed to being open and transparent

This section goes on to say, “If the community feels we need to add librarians to our advisory group we will certainly do so and we will explore ways to ensure we engage with as many of our librarian stakeholders as possible.” If the GetFTR leadership didn’t get the indication between December 3 and December 12 that librarians feel strongly about being at the table, then I don’t know what will. And it isn’t about being on the advisory group; it is about being seen and appreciated as important stakeholders in the research discovery process. I’m not sure who the “community” is in this section, but it is clear that librarians are—at best—an afterthought. That is not the kind of “open and transparent” that is welcoming.

Later on in the Questions about library link resolvers section is this sentence:

We have, or are planning to, consult with existing library advisory boards that participating publishers have, as this enables us to gather views from a significant number of librarians from all over the globe, at a range of different institutions.

As I said in my previous post, I don’t know why GetFTR is not engaging in existing cross-community (publisher/technology-supplier/library) organizations to have this discussion. It feels intentional, which colors the perception of what the publishers are trying to accomplish. To be honest, I don’t think the publishers are using GetFTR to drive a wedge between library technology service providers (who are needed to make GetFTR a reality for libraries) and libraries themselves. But I can see how that interpretation could be made.

Understandably, we have been asked about privacy.

I punted on privacy in my previous post, so let’s talk about it here. It remains to be seen what is included in the GetFTR API request between the browser and the publisher site. Sure, it needs to include the DOI and a token that identifies the patron’s institution. We can inspect that API request to ensure nothing else is included. But the fact that the design of GetFTR has the browser making the call to the publisher site means that the publisher site knows the IP address of the patron’s browser, and the IP address can be considered personally identifiable information. This issue could be fixed by having the link resolver or the discovery layer software make the API request, and according to the Questions about library link resolvers section of the community update, this may be under consideration.

So, yes, an auditable privacy policy and implementation is key for GetFTR.

GetFTR is fully committed to supporting third-party aggregators

This is good to hear. I would love to see more information published about this, including how discipline-specific repositories and institutional repositories can have their holdings represented in GetFTR responses.

My Take-a-ways

In the second to last paragraph: “Researchers should have easy, seamless pathways to research, on whatever platform they are using, wherever they are.” That is a statement that I think every library could sign onto. This Updating the Community is a good start, but the project has dug a deep hole of trust and it hasn’t reached level ground yet.

Lisa Janicke Hinchliffe’s “Why are Librarians Concerned about GetFTR?”

Posted on December 10th in The Scholarly Kitchen, Lisa outlines a series of concerns from a librarian perspective. I agree with some of these; others are not an issue in my opinion.

Librarian Concern: The Connection to Seamless Access

Many librarians have expressed a concern about how patron information can leak to the publisher through ill-considered settings at an institution’s identity provider. Seamless Access can ease access control because it leverages a campus’ single sign-on solution—something that a library patron is likely to be familiar with. If the institution’s identity provider is overly permissive in the attributes about a patron that get transmitted to the publisher, then there is a serious risk of tying a user’s research activity to their identity and the bad things that come from that (patrons self-censoring their research paths, commoditization of patron activity, etc.). I’m serving on a Seamless Access task force that is addressing this issue, and I think there are technical, policy, and education solutions to this concern. In particular, I think some sort of intermediate display of the attributes being transmitted to the publisher is most appropriate.

Librarian Concern: The Limited User Base Enabled

As Lisa points out, the population of institutions that can take advantage of Seamless Access, a prerequisite for GetFTR, is very small and weighted heavily towards well-resourced institutions. To the extent that projects like Seamless Access (spurred on by a desire to have GetFTR-like functionality) help with the adoption of SAML-based infrastructure like Shibboleth, the whole academic community benefits from a shared authentication/identity layer that can be assumed to exist.

Librarian Concern: The Insertion of New Stumbling Blocks

Of the issues Lisa mentioned here, I’m not concerned about users being redirected to their campus single sign-on system in multiple browsers on multiple machines. This is something we should be training users about—there is a single website to put your username/password into for whatever you are accessing at the institution. That a user might already be logged into the institution single sign-on system in the course of doing other school work and never see a logon screen is an attractive benefit to this system.

That said, it would be useful for an API call from a library’s discovery layer to a publisher’s GetFTR endpoint to be able to say, “This is my user. Trust me when I say that they are from this institution.” If that were possible, then the Seamless Access Where-Are-You-From service could be bypassed for the GetFTR purpose of determining whether a user’s institution has access to an article on the publisher’s site. It would sure be nice if librarians were involved in the specification of the underlying protocols early on so these use cases could be offered.


Lisa reached out on Twitter to say (in part): “Issue is GetFTR doesn’t redirect and SA doesnt when you are IPauthenticated. Hence user ends up w mishmash of experience.” I went back to read her Scholarly Kitchen post and realized I did not fully understand her point. If GetFTR is relying on a Seamless Access token to know which institution a user is coming from, then that token must get into the user’s browser. The details we have seen about GetFTR don’t address how that Seamless Access institution token is put in the user’s browser if the user has not been to the Seamless Access select-your-institution portal. One such case is when the user is coming from an IP-address-authenticated computer on a campus network. Do the GetFTR indicators appear even when the Seamless Access institution token is not stored in the browser? If at the publisher site the GetFTR response also uses the institution IP address table to determine entitlements, what does a user see when they have neither the Seamless Access institution token nor the institution IP address? And, to Lisa’s point, how does one explain this disparity to users? Is the situation better if the GetFTR determination is made in the link resolver rather than in the user browser?

Librarian Concern: Exclusion from Advisory Committee

See previous paragraph. That librarians are not at the table offering use cases and technical advice means that the developers are likely closing off options that meet library needs. Addressing those needs would ease the acceptance of the GetFTR project as mutually beneficial. So an emphatic “AGREE!” with Lisa on her points in this section. Publishers—what were you thinking?

Libraries and library technology companies are making significant investments in tools that ease the path from discovery to delivery. Would the library’s link resolver benefit from a real-time API call to a publisher’s service that determines the direct URL to a specific DOI? Oh, yes—that would be mighty beneficial. The library could put that link right at the top of a series of options that include a link to a version of the article in a Green Open Access repository, redirection to a content aggregator, one-click access to an interlibrary-loan form, or even an option where the library purchases a copy of the article on behalf of the patron. (More likely, the link resolver would take the patron right to the article URL supplied by GetFTR, but the library link resolver needs to be in the loop to be able to offer the other options.)

My Take-a-ways

The patron is affiliated with the institution, and the institution (through the library) is subscribing to services from the publisher. The institution’s library knows best what options are available to the patron (see above section). Want to know why librarians are concerned? Because publishers are inserting themselves as the arbiter of access to content, whether it is in the patron’s best interest or not. It is also useful to reinforce Lisa’s closing paragraph:

Whether GetFTR will act to remediate these concerns remains to be seen. In some cases, I would expect that they will. In others, they may not. Publishers’ interests are not always aligned with library interests and they may accept a fraying relationship with the library community as the price to pay to pursue their strategic goals.

Ian Mulvany’s “thoughts on GetFTR”

Ian’s entire post from December 11th in ScholCommsProd is worth reading. I think it is an insightful look at the technology and its implications. Here are some specific comments:

Clarifying the relation between SeamlessAccess and GetFTR

There are a couple of things that I disagree with:

OK, so what is the difference, for the user, between seamlessaccess and GetFTR? I think that the difference is the following - with seamless access you the user have to log in to the publisher site. With GetFTR if you are providing pages that contain DOIs (like on a discovery service) to your researchers, you can give them links they can click on that have been setup to get those users direct access to the content. That means as a researcher, so long as the discovery service has you as an authenticated user, you don’t need to even think about logins, or publisher access credentials.

To the best of my understanding, this is incorrect. With SeamlessAccess, the user is not “logging into the publisher site.” If the publisher site doesn’t know who a user is, the user is bounced back to their institution’s single sign-on service to authenticate. If the publisher site doesn’t know where a user is from, it invokes the SeamlessAccess Where-Are-You-From service to learn which institution’s single sign-on service is appropriate for the user. If a user follows a GetFTR-supplied link to a publisher site but the user doesn’t have the necessary authentication token from the institution’s single sign-on service, then they will be bounced back for the username/password and redirected to the publisher’s site. GetFTR signaling that an institution is entitled to view an article does not mean the user can get it without proving that they are a member of the institution.

What does this mean for Green Open Access

A key point that Ian raises is this:

One example of how this could suck, lets imagine that there is a very usable green OA version of an article, but the publisher wants to push me to using some “e-reader limited functionality version” that requires an account registration, or god forbid a browser exertion, or desktop app. If the publisher shows only this limited utility version, and not the green version, well that sucks.

Oh, yeah…that does suck, and it is because the library—not the publisher of record—is better positioned to know what is best for a particular user.

Will GetFTR be adopted?

Ian asks, “Will google scholar implement this, will other discovery services do so?” I do wonder if GetFTR is big enough to attract the attention of Google Scholar and Microsoft Research. My gut tells me “no”: I don’t think Google and Microsoft are going to add GetFTR buttons to their search results screens unless they are paid a lot. As for Google Scholar, it is more likely that Google would build something like GetFTR to get the analytics rather than rely on a publisher’s version.

I’m even more doubtful that the companies pushing GetFTR can convince discovery layer makers to embed GetFTR into their software. Since the two widely adopted discovery layers (in North America, at least) are also aggregators of journal content, I don’t see the discovery-layer/aggregator companies devaluing their product by actively pushing users off their site.

My Take-a-ways

It is also useful to reinforce Ian’s closing paragraph:

I have two other recommendations for the GetFTR team. Both relate to building trust. First up, don’t list orgs as being on an advisory board, when they are not. Secondly it would be great to learn about the team behind the creation of the Service. At the moment its all very anonymous.

Where Do We Stand?

Wow, I didn’t set out to write 2,500 words on this topic. At the start I was just taking some time to review everything that happened since this was announced at the start of December and see what sense I could make of it. It turned into a literature review of sorts.

While GetFTR has some powerful backers, it also has some pretty big blockers:

  • Can GetFTR help spur adoption of Seamless Access enough to convince big and small institutions to invest in identity provider infrastructure and single sign-on systems?
  • Will GetFTR grab the interest of Google, Google Scholar, and Microsoft Research (where admittedly a lot of article discovery is already happening)?
  • Will developers of discovery layers and link resolvers prioritize GetFTR implementation in their services?
  • Will libraries find enough value in GetFTR to enable it in their discovery layers and link resolvers?
  • Would libraries argue against GetFTR in learning management systems, faculty profile systems, and other campus systems if its own services cannot be included in GetFTR displays?

I don’t know, but I think it is up to the principals behind GetFTR to make more inclusive decisions. The next step is theirs.

Managing Remote Conference Presenters with Zoom / Peter Murray

Bringing remote presenters into a face-to-face conference is challenging and fraught with peril. In this post, I describe a scheme using Zoom that had in-person attendees forgetting that the presenter was remote!

The Code4Lib conference was this week, and with the COVID-19 pandemic breaking through, many individuals and institutions decided not to travel to Pittsburgh for the meeting. We had an unprecedented nine presentations that were brought into the conference via Zoom. I was chairing the livestream committee for the conference (as I have done for several years, skipping last year), so it made the most sense for me to arrange a scheme for remote presenters. With the help of the on-site A/V contractor, we were able to pull this off with minimal requirements for the remote presenter.

List of Requirements

  • 2 Zoom Pro accounts
  • 1 PC/Mac with video output, as if you were connecting an external monitor (the “Receiving Zoom” computer)
  • 1 PC/Mac (the “Coordinator Zoom” computer)
  • 1 USB audio interface
  • Hardwired network connection for the Receiving Zoom computer (recommended)

The Pro-level Zoom accounts were required because we needed to run a group call for longer than 40 minutes (to include setup time). And two were needed: one for the Coordinator Zoom machine and one for the dedicated Receiving Zoom machine. It would have been possible to consolidate the two Zoom Pro accounts and the two PC/Mac machines into one, but we had back-to-back presenters at Code4Lib, and I wanted to be able to help one remote presenter get ready while another was presenting.

In addition to this equipment, the A/V contractor was indispensable in making the connection work. We fed the remote presenter’s video and audio from the Receiving Zoom computer to the contractor’s A/V switch through HDMI, and the contractor put the video on the ballroom projectors and audio through the ballroom speakers. The contractor gave us a selective audio feed of the program audio minus the remote presenter’s audio (so they wouldn’t hear themselves come back through the Zoom meeting). This becomes a little clearer in the diagram below.

Physical Connections and Setup

This diagram shows the physical connections between machines.

Diagram of parts

The Audio Mixer and Video Switch were provided and run by the A/V contractor. The Receiving Zoom machine was connected to the A/V contractor’s Video Switch via an HDMI cable coming off the computer’s external monitor connection. In the Receiving Zoom computer’s control panel, we set the external monitor to mirror what was on the main monitor. The audio and video from the computer (i.e., the Zoom call) went out the HDMI cable to the A/V contractor’s Video Switch. The A/V contractor took the audio from the Receiving Zoom computer through the Video Switch and added it to the Audio Mixer as an input channel. From there, the audio was sent out to the ballroom speakers the same way audio from the podium microphone was amplified to the audience. We asked the A/V contractor to create an audio mix that included all of the audio sources except the Receiving Zoom computer (e.g., in-room microphones) and plugged that into the USB Audio interface. That way, the remote presenter could hear the sounds from the ballroom (ambient laughter, questions from the audience, etc.) in their Zoom call. (Note that it was important to remove the remote presenter’s own speaking voice from this audio mix; there was a significant, distracting delay between the time the presenter spoke and the audio was returned to them through the Zoom call.)

We used a hardwired network connection to the internet, and I would recommend that—particularly with tech-heavy conferences that might overflow the venue wi-fi. (You don’t want your remote presenter’s Zoom to have to compete with what attendees are doing.) Be aware that the hardwired network connection will cost more from the venue, and may take some time to get functioning since this doesn’t seem to be something that hotels often do.

In the Zoom meeting, we unmuted the microphone and selected the USB Audio interface as the microphone input. As the Zoom meeting was connected, we made the meeting window full-screen so the remote presenter’s face and/or presentation were at the maximum size on the ballroom projectors.

Setting Up the Zoom Meetings

The two Zoom accounts came from the Open Library Foundation. (Thank you!) As mentioned in the requirements section above, these were Pro-level accounts. The two accounts were olf_host2 and olf_host3. The olf_host2 account was used for the Receiving Zoom computer, and the olf_host3 account was used for the Coordinator Zoom computer. The Zoom meeting edit page looked like this:

Screen capture of the Zoom meeting edit page

This is for the “Code4Lib 2020 Remote Presenter A” meeting with the primary host as olf_host2. Note these settings:

  • A recurring meeting that ran from 8:00am to 6:00pm each day of the conference.
  • Enable join before host is checked in case the remote presenter got on the meeting before I did.
  • Record the meeting automatically in the cloud to use as a backup in case something goes wrong.
  • Alternative Hosts is olf_host3

The “Code4Lib 2020 Remote Presenter B” meeting was exactly the same except the primary host was olf_host3, and olf_host2 was added as an alternative host. The meetings were set up with each other as the alternative host so that the Coordinator Zoom computer could start the meeting, seamlessly hand it off to the Receiving Zoom computer, then disconnect.

Preparing the Remote Presenter

Remote presenters were given this information:

Code4Lib will be using Zoom for remote presenters. In addition to the software, having the proper audio setup is vital for a successful presentation.

  • Microphone: The best option is a headset or earbuds so a microphone is close to your mouth. Built-in laptop microphones are okay, but using them will make it harder for the audience to hear you.
  • Speaker: A headset or earbuds are required. Do not use your computer’s built-in speakers. The echo cancellation software is designed for small rooms and cannot handle the delay caused by large ballrooms.

You can test your setup with a test Zoom call. Be sure your microphone and speakers are set correctly in Zoom. Also, try sharing your screen on the test call so you understand how to start and stop screen sharing. The audience will see everything on your screen, so quit/disable/turn-off notifications that come from chat programs, email clients, and similar tools.

Plan to connect to the Zoom meeting 30 minutes before your talk to work out any connection or setup issues.

At the 30-minute mark before the remote presentation, I went to the ballroom lobby and connected to the designated Zoom meeting for the remote presenter using the Coordinator Zoom computer. I used this checklist with each presenter:

  • Check presenter’s microphone level and sound quality (make sure headset/earbud microphone is being used!)
  • Check presenter’s speakers and ensure there is no echo
  • Test screen-sharing (start and stop) with presenter
  • Remind presenter to turn off notifications from chat programs, email clients, etc.
  • Remind the presenter that they need to keep track of their own time; there is no way for us to give them cues about timing other than interrupting them when their time is up

The critical item was making sure the audio worked (that their computer was set to use the headset/earbud microphone and audio output). The result was excellent sound quality for the audience.

When the remote presenter was set on the Zoom meeting, I returned to the A/V table and asked a livestream helper to connect the Receiving Zoom to the remote presenter’s Zoom meeting. At this point, the remote presenter can hear the audio in the ballroom of the speaker before them coming through the Receiving Zoom computer. Now I would lock the Zoom meeting to prevent others from joining and interrupting the presenter (from the Zoom Participants panel, select More then Lock Meeting). I hung out on the remote presenter’s meeting on the Coordinator Zoom computer in case they had any last-minute questions. As the speaker in the ballroom was finishing up, I wished the remote presenter well and disconnected the Coordinator Zoom computer from the meeting. (I always selected Leave Meeting rather than End Meeting for All so that the Zoom meeting continued with the remote presenter and the Receiving Zoom computer.)

As the remote presenter was being introduced—and the speaker would know because they could hear it in their Zoom meeting—the A/V contractor switched the video source for the ballroom projectors to the Receiving Zoom computer and unmuted the Receiving Zoom computer’s channel on the Audio Mixer. At this point, the remote speaker is off-and-running!

Last Thoughts

This worked really well. Surprisingly well. So well that I had a few people comment that they were taken aback when they realized that there was no one standing at the podium during the presentation.

I’m glad I had set up the two Zoom meetings. We had two cases where remote presenters were back-to-back. I was able to get the first remote presenter set up and ready on one Zoom meeting while preparing the second remote presenter on the other Zoom meeting. The most stressful part was at the point when we disconnected the first presenter’s Zoom meeting and quickly connected to the second presenter’s Zoom meeting. This was slightly awkward for the second remote presenter because they didn’t hear their full introduction as it happened and had to jump right into their presentation. This could be solved by setting up a second Receiving Zoom computer, but this added complexity seemed to be too much for the benefit gained.

I would definitely recommend making this setup a part of the typical A/V preparations for future Code4Lib conferences. We don’t know when an individual’s circumstances (much less a worldwide pandemic) might cause a last-minute request for a remote presentation capability, and the overhead of the setup is pretty minimal.

Tethering a Ubiquiti Network to a Mobile Hotspot / Peter Murray

I saw it happen.

The cable-chewing device

The contractor in the neighbor’s back yard with the Ditch Witch trencher burying a cable. I was working outside at the patio table and just about to go into a Zoom meeting. Then the internet dropped out. Suddenly, and with a wrenching feeling in my gut, I remembered where the feed line was buried between the house and the cable company’s pedestal in the right-of-way between the properties. Yup, he had just cut it.

To be fair, the utility locator service did not mark my cable’s location, and he was working for a different cable provider than the one we use. (There are three providers in our neighborhood.) It did mean, though, that our broadband internet would be out until my provider could come and run another line. It took an hour of moping about the situation to figure out a solution, then another couple of hours to put it in place: an iPhone tethered to a Raspberry Pi that acted as a network bridge to my home network’s UniFi Security Gateway 3P.

Network diagram with tethered iPhone

A few years ago I was tired of dealing with spotty consumer internet routers and upgraded the house to UniFi gear from Ubiquiti. Rob Pickering, a college comrade, had written about his experience with the gear and I was impressed. It wasn’t a cheap upgrade, but it was well worth it. (Especially now with four people in the household working and schooling from home during the COVID-19 outbreak.) The UniFi Security Gateway has three network ports, and I was using two: one for the uplink to my cable internet provider (WAN) and one for the local area network (LAN) in the house. The third port can be configured as another WAN uplink or as another LAN port. And you can tell the Security Gateway to use the second WAN as a failover for the first WAN (or as load balancing the first WAN). So that is straightforward enough, but how do I get the Personal Hotspot on the iPhone to the second WAN port? That is where the Raspberry Pi comes in.

The Raspberry Pi is a small computer with USB, ethernet, HDMI, and audio ports. The version I had lying around is a Raspberry Pi 2, an older model but plenty powerful enough to be the network bridge between the iPhone and the home network. The toughest part was bootstrapping the operating system packages onto the Pi with only the iPhone Personal Hotspot as the network. That is what I’m documenting here for future reference.

Bootstrapping the Raspberry Pi

The Raspberry Pi runs its own operating system called Raspbian (a Debian/Linux derivative) as well as more mainstream operating systems. I chose to use the Ubuntu Server for Raspberry Pi instead of Raspbian because I’m more familiar with Ubuntu. I tethered my MacBook Pro to the iPhone to download the Ubuntu 18.04.4 LTS image and followed the instructions for copying that disk image to the Pi’s microSD card. That allowed me to boot the Pi with Ubuntu and a basic set of operating system packages.

The Challenge: Getting the required networking packages onto the Pi

It would have been really nice to plug the iPhone into the Pi with a USB-Lightning cable and have it find the tethered network. That doesn’t work, though. Ubuntu needs at least the usbmuxd package in order to see the tethered iPhone as a network device. That package isn’t a part of the disk image download. And of course I can’t plug my Pi into the home network to download it (see first paragraph of this post).

My only choice was to tether the Pi to the iPhone over WiFi with a USB network adapter. And that was a bit of Ubuntu voodoo. Fortunately, I found instructions on configuring Ubuntu to use a WPA-protected wireless network (like the one that the iPhone Personal Hotspot is providing). In brief:

sudo -i
cd /root
wpa_passphrase my_ssid my_ssid_passphrase > wpa.conf
screen -q
wpa_supplicant -Dwext -iwlan0 -c/root/wpa.conf
<control-a> c
dhclient -r
dhclient wlan0

Explanation of lines:

  1. Use sudo to get a root shell
  2. Change directory to root’s home
  3. Use the wpa_passphrase command to create a wpa.conf file. Replace my_ssid with the wireless network name provided by the iPhone (your iPhone’s name) and my_ssid_passphrase with the wireless network passphrase (see the “Wi-Fi Password” field in Settings -> Personal Hotspot).
  4. Start the screen program (quietly) so we can have multiple pseudo terminals.
  5. Run the wpa_supplicant command to connect to the iPhone wifi hotspot. We run this in the foreground so we can see the status/error messages; this program must continue running to stay connected to the wifi network.
  6. Use the screen hotkey to create a new pseudo terminal. This is control-a followed by a letter c.
  7. Use dhclient to clear out any DHCP network parameters
  8. Use dhclient to get an IP address from the iPhone over the wireless network.
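One detail about step 3 is worth knowing: wpa_passphrase writes the cleartext passphrase into its output as a commented #psk= line next to the hashed psk= value, so the wpa.conf created above contains the hotspot password in readable form. A minimal sketch of filtering that line out follows. The strip_cleartext_psk function name is invented here for illustration and is not part of any package.

```shell
# Read wpa_passphrase output on stdin and drop the commented cleartext
# passphrase line (#psk="..."), keeping the hashed psk= line.
# strip_cleartext_psk is a hypothetical helper name.
strip_cleartext_psk() {
  grep -v '^[[:space:]]*#psk='
}

# Example use as root, with the same placeholders as in the steps above:
#   umask 077
#   wpa_passphrase my_ssid my_ssid_passphrase | strip_cleartext_psk > /root/wpa.conf
```

Setting umask 077 first keeps the new file readable by root only, which seems a reasonable precaution for a file holding network credentials.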

Now I was at the point where I could install Ubuntu packages. (I ran ping to verify network connectivity.) To install the usbmuxd and network bridge packages (and their prerequisites):

apt-get install usbmuxd bridge-utils

If your experience is like mine, you’ll get an error back:

couldn't get lock /var/lib/dpkg/lock-frontend

The Ubuntu Pi machine is now on the network, and the automatic process to install security updates is running. That locks the Ubuntu package registry until it finishes. That took about 30 minutes for me. (I imagine this varies based on the capacity of your tethered network and the number of security updates that need to be downloaded.) I monitored the progress of the automated process with the htop command and tried the apt-get command when it finished. If you are following along, now would be a good time to skip ahead to Configuring the UniFi Security Gateway if you haven’t already set that up.
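Rather than watching htop and retrying by hand, a small retry loop could keep attempting the install until the lock is released. This is a sketch under that assumption, not the commands used in this post. The retry function and its two arguments (maximum attempts, seconds between attempts) are hypothetical names for illustration.

```shell
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds, waiting DELAY_SECONDS between attempts.
# Returns 0 on the first success, or 1 after MAX_TRIES failed attempts.
retry() {
  local max_tries="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_tries" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example use as root: try once a minute for up to an hour.
#   retry 60 60 apt-get install -y usbmuxd bridge-utils
```

The attempt count and delay in the example are guesses sized to the roughly 30 minutes the security updates took here; adjust them for your own connection.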

Turning the Raspberry Pi into a Network Bridge

With all of the software packages installed, I restarted the Pi to complete the update: shutdown -r now. While it was rebooting, I pulled out the USB wireless adapter from the Pi and plugged in the iPhone’s USB cable. The Pi now saw the iPhone as eth1, but the network did not start until I went to the iPhone to say that I “Trust” the computer that it is plugged into. When I did that, I ran these commands on the Ubuntu Pi:

dhclient eth1
brctl addbr iphonetether
brctl addif iphonetether eth0 eth1
brctl stp iphonetether on
ifconfig iphonetether up

Explanation of lines:

  1. Get an IP address from the iPhone over the USB interface
  2. Add a network bridge (the iphonetether is an arbitrary string; some instructions simply use br0 for the zero-ith bridge)
  3. Add the two ethernet interfaces to the network bridge
  4. Turn on the Spanning Tree Protocol (I don’t think this is actually necessary, but it does no harm)
  5. Bring up the bridge interface

The bridge is now live! Thanks to Amitkumar Pal for the hints about using the Pi as a network bridge. More details about the bridge networking software are on the Debian Wiki.
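The brctl and ifconfig tools used above come from older packages. On a system that has only iproute2, the same bridge can likely be built with ip link instead. The sketch below only prints the commands so they can be reviewed before being run as root. bridge_commands is a hypothetical helper name, and the dhclient step is not covered.

```shell
# bridge_commands BRIDGE IFACE [IFACE...]
# Print iproute2 commands corresponding to the brctl/ifconfig steps:
# create the bridge, attach each interface, enable STP, bring it up.
# Nothing is executed here; the output is meant to be reviewed first.
bridge_commands() {
  local bridge="$1"
  shift
  echo "ip link add name ${bridge} type bridge"
  local iface
  for iface in "$@"; do
    echo "ip link set dev ${iface} master ${bridge}"
  done
  echo "ip link set dev ${bridge} type bridge stp_state 1"
  echo "ip link set dev ${bridge} up"
}

# Example: print the commands for the bridge described in this post.
#   bridge_commands iphonetether eth0 eth1
```

Printing instead of executing is a deliberate choice: changing bridge membership can interrupt a network session, so it is safer to read the commands first and then run them from a local console.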

Note! I'm using a hardwired keyboard/monitor to set up the Raspberry Pi. I've heard from someone that was using SSH to run these commands, and the SSH connection would break off at brctl addif iphonetether eth0 eth1.

Configuring the UniFi Security Gateway

I have a UniFi Cloud Key, so I could change the configuration of the UniFi network with a browser. (You’ll need to know the IP address of the Cloud Key; hopefully you have that somewhere.) I connected to my Cloud Key at its IP address and clicked through the self-signed certificate warning.

First I set up a second Wide Area Network (WAN—your uplink to the internet) for the iPhone Personal Hotspot: Settings -> Internet -> WAN Networks. Select “Create a New Network”:

  • Network Name: Backup WAN
  • IPV4 Connection Type: Use DHCP
  • IPv6 Connection Types: Use DHCPv6
  • DNS Server: and (CloudFlare’s DNS servers)
  • Load Balancing: Failover only

The last selection is key…I wanted the gateway to only use this WAN interface as a backup to the main broadband interface. If the broadband comes back up, I want to stop using the tethered iPhone!

Second, assign the Backup WAN to the LAN2/WAN2 port on the Security Gateway (Devices -> Gateway -> Ports -> Configure interfaces):

  • Port WAN2/LAN2 Network: WAN2
  • Speed/Duplex: Autonegotiate

Apply the changes to provision the Security Gateway. After about 45 seconds, the Security Gateway failed over from “WAN iface eth0” (my broadband connection) to “WAN iface eth2” (my tethered iPhone through the Pi bridge). These showed up as alerts in the UniFi interface.

Performance and Results

So I’m pretty happy with this setup. The family has been running simultaneous Zoom calls and web browsing on the home network, and the performance has been mostly normal. Web pages do take a little longer to load, but whatever Zoom is using to dynamically adjust its bandwidth usage is doing quite well. This is chewing through the mobile data quota pretty fast, so it isn’t something I want to do every day. Knowing that this is possible, though, is a big relief. As a bonus, the iPhone is staying charged via the 1 amp power coming through the Pi.

Reflections on “Responsibilities of Citizenship for Immigrants and our Daughter” / Peter Murray

Eighteen years ago, on Friday, September 7th, 2001, I was honored to be asked to participate in a naturalization ceremony for 46 new citizens of the United States in a courtroom of Judge Alvin Thompson in Hartford, Connecticut. I published those remarks on a website that has long since gone dormant. In light of the politics of the day, I was thinking back to that ceremony and what it meant to me to participate. I regret the corny reference to Star Trek, but I regret nothing else I said on that day. I titled the remarks “Responsibilities of Citizenship for Immigrants and our Daughter”.

Good afternoon. I’m honored to be here as you take your final step to become a citizen of the United States of America. My wife Celeste, who will soon give birth to another new American citizen, is here to celebrate this joyous occasion with you. And if you’ll pardon the musings of a proud soon-to-be father, I would like to share some thoughts about citizenship inspired by this ceremony and the impending arrival of our first child.

Our daughter will be a citizen by birth, but you have made a choice to become an American. This choice may or may not have been easy for you, but I have the utmost respect for you for making that choice.

I don’t know what compelled you to submit yourself to the naturalization process – perhaps economic, political, social, or religious reasons. I have to think that you did it to better your life and the lives of your family. But you should know that the process does not stop here.

Along with the rights of citizenship come the responsibilities expected of you. Perhaps you are more aware of these responsibilities than I given your choice to become a citizen, but please allow me to enumerate some of them. Exercise your right to be heard on matters of concern to you. Vote in every election that you can. When asked to do so, eagerly perform your duty as a member of a jury. Watch what is happening around you, and form your own opinions. Practice your religion and respect the right of others to do the same. These are the values we will try to instill in our daughter; I hope you take them to heart, instill them in your family members, and inspire your fellow citizens to do the same.

But as you take this final, formal step of citizenship, be aware that becoming an American does not mean you have to leave your native culture behind. A part of American culture is the 1960s show Star Trek, which promoted the concept of IDIC: Infinite Diversity in Infinite Combinations. In that futuristic world, diverse cultures and ideas are respected with the realization that society is stronger because of them. While we cannot claim to have reached that ideal world, one can say that the American Dream is best realized when our diversity is celebrated and shared by the members of this country. My daughter will be the celebration of that diversity: the product of Irish, German, Polish, and English immigrants. By adding your own history and experiences to the fabric of our country, you make America stronger. In addition to all of the formal responsibilities asked of you as a new citizen, I charge you to share with your fellow citizens that which makes you unique.

Our past honored citizens fought hard to make this country what it is today. As they showed courage, we too must be prepared to show courage. As they endured pain, we too must be prepared to make sacrifices for the good of our nation. Like them, we too must strive for liberty and justice for all. As Americans, we are all filled with these hopes and dreams.

On behalf of my wife and our daughter soon to be born, and my parents, brother, and sister, Celeste’s parents, two sisters and their families, and on behalf of the people of Hartford, the State of Connecticut, and the citizens of all 50 states, I congratulate you on your new role as citizens of the United States of America. Please use the power that is now vested in you to advance the cause of hope and opportunity and diversity. I invite you to be active participants in the next chapter of America’s history of progress toward the goals of freedom and equality for all.

Four days later—September 11, 2001—the trajectory of the lives of the people in that courtroom would change. We couldn’t know how much they would change. We still don’t know how much they will change.

To these newly naturalized citizens, I spoke of beliefs that I thought were universally American. They were the beliefs that I grew up with…that were infused in me by my parents and the communities I lived in.

Did I grow up in a bubble? Have there always been fellow citizens around me that wanted to block other people from coming to this country and throw out anyone that didn’t look like them? Were there always cruel agents of the government that thought it reasonable to lock fellow humans in cages, to separate children from caregiving adults, to single out people of another race for extraordinary scrutiny, and seem to find joy in doing so?

I’m now struggling with these questions. I’m struggling to understand how the election of a person to lead our country has been the focusing lens for division. (Trump? Obama?) I struggle to comprehend how the toxic mix of willful ignorance and arrogance of cultures has come to shape the way we look at each other, the way we hear each other, and the way we speak to each other. I want to believe there are common threads of humanity weaving around and between citizens and visitors of America: threads that bind us tight enough to work towards shared purposes and loose enough to allow for individual character.

I speak and I listen. I struggle and I believe. I have to…for my daughter, her brother that followed, and for the 46 new citizens I welcomed 18 years ago.

As a Cog in the Election System: Reflections on My Role as a Precinct Election Official / Peter Murray

I may nod off several times in composing this post the day after election day. Hopefully, in reading it, you won’t. It is a story about one corner of democracy. It is a journal entry about how it felt to be a citizen doing what I could do to make other citizens’ voices be heard. It needed to be written down before the memories and emotions are erased by time and naps.

Yesterday I was a precinct election officer (PEO—a poll worker) for Franklin County—home of Columbus, Ohio. It was my third election as a PEO. The first was last November, and the second was the election aborted by the onset of the coronavirus in March. (Not sure that second one counts.) It was my first as a Voting Location Manager (VLM), so I felt the stakes were high to get it right.

  • Would there be protests at the polling location?
  • Would I have to deal with people wearing candidate T-shirts and hats or not wearing masks?
  • Would there be a crash of election observers, whether official (scrutinizing our every move) or unofficial (that I would have to remove)?

It turns out the answer to all three questions was “no”—and it was a fantastic day of civic engagement by PEOs and voters. There were well-engineered processes and policies, happy and patient enthusiasm, and good fortune along the way.

This story is going to turn out okay, but it could have been much worse. Because of the complexity of the election day voting process, last year Franklin County started allowing PEOs to do some early setup on Monday evenings. The early setup started at 6 o’clock. I was so anxious to get it right that the day before I took the printout of the polling room dimensions from my VLM packet, scanned it into OmniGraffle on my computer, and designed a to-scale diagram of what I thought the best layout would be. The real thing only vaguely looked like this, but it got us started.

A schematic showing the voting position and the flow of voters through the polling place. What I imagined our polling place would look like

We could set up tables, unpack equipment, hang signs, and other tasks that don’t involve turning on machines or breaking open packets of ballots. One of the early setup tasks was updating the voters’ roster on the electronic poll pads. As happened around the country, there was a lot of early voting activity in Franklin County, so the update file must have been massive. The electronic poll pads couldn’t handle the update; they hung at step 8-of-9 for over an hour. I called the Board of Elections and got ahold of someone in the equipment warehouse. We tried some of the simple troubleshooting steps, and he gave me his cell phone number to call back if it wasn’t resolved.

By 7:30, everything was done except for the poll pad updates, and the other PEOs were wandering around. I think it was 8 o’clock when I said everyone could go home while the two Voting Location Deputies and I tried to get the poll pads working. I called the equipment warehouse and we hung out on the phone for hours…retrying the updates based on the advice of the technicians called in to troubleshoot. I even “went rogue” towards the end. I searched the web for the messages on the screen to see if anyone else had seen the same problem with the poll pads. The electronic poll pad is an iPad with a single, dedicated application, so I even tried some iPad reset options to clear the device cache and perform a hard reboot. Nothing worked—still stuck at step 8-of-9. The election office people sent us home at 10 o’clock. Even on the way out the door, I tried a rogue option: I hooked a portable battery to one of the electronic polling pads to see if the update would complete overnight and be ready for us the next day. It didn’t, and it wasn’t.

Picture of a text with the contents: '(Franklin County Board Of Elections) Franklin County is going to ALL Paper Signature Poll Books.  Open your BUMPER PACKET and have voters sign in on the Paper Signature Poll Books.  Use the Paper Authority To Vote Slips.  Go thru your Paper Supplemental Absentee List and record AB/PROV on the Signature Line of all voters on that list.  Mark Names off of the White and Green Register of Voters Lists.' Text from Board of Elections

Polling locations in Ohio open at 6:30 in the morning, and PEOs must report to their sites by 5:30. So I was up at 4:30 for a quick shower and packing up stuff for the day. Early in the setup process, the Board of Elections sent a text that the electronic poll pads were not going to be used and to break out the “BUMPer Packets” to determine a voter’s eligibility to vote. At some point, someone told me what “BUMPer” stood for. I can’t remember, but I can imagine it is Back-Up-something-something. “Never had to use that,” the trainers told me, but it is there in case something goes wrong. Well, it is the year 2020, so was something going to go wrong?

Fortunately, the roster judges and one of the voting location deputies tore into the BUMPer Packet and got up to speed on how to use it. It is an old fashioned process: the voter states their name and address, the PEO compares that with the details on the paper ledger, and then asks the voter to sign beside their name. With an actual pen…old fashioned, right? The roster judges had the process down to a science. They kept the queue of verified voters full waiting to use the ballot marker machines. The roster judges were one of my highlights of the day.

And boy did the voters come. By the time our polling location opened at 6:30 in the morning, they were wrapped around two sides of the building. We were moving them quickly through the process: three roster tables for checking in, eight ballot-marking machines, and one ballot counter. At our peak capacity, I think we were doing 80 to 90 voters an hour. As good as we were doing, the line never seemed to end. The Franklin County Board of Elections received a grant to cover the costs of two greeters outside that helped keep the line orderly. They did their job with a welcoming smile, as did our inside greeter that offered masks and a squirt of hand sanitizer. Still, the voters kept back-filling that line, and we didn’t see a break until 12:30.

The PEOs serving as machine judges were excellent. This was the first time that many voters had seen the new ballot equipment that Franklin County put in place last year. I like this new equipment: the ballot marker prints your choices on a card that it spits out. You can see and verify your choices on the card before you slide it into a separate ballot counter. That is reassuring for me, and I think for most voters, too. But it is new, and it takes a few extra moments to explain. The machine judges got the voters comfortable with the new process. And some of the best parts of the day were when they announced to the room that a first-time voter had just put their card into the ballot counter. We would all pause and cheer.

The third group of PEOs at our location were the paper table judges. They handle all of the exceptions.

  • Someone wants to vote with a pre-printed paper ballot rather than using a machine? To the paper table!
  • The roster shows that someone requested an absentee ballot? That voter needs to vote a “provisional” ballot that will be counted at the Board of Elections office if the absentee ballot isn’t received in the mail. The paper table judges explain that with kindness and grace.
  • In the wrong location? The paper table judges would find the correct place.

The two paper table PEOs clearly had experience helping voters with the nuances of election processes.

Rounding out the team were two voting location deputies (VLD). By law, a polling location can’t have a VLD and a voting location manager (VLM) of the same political party. That is part of the checks and balances built into the system. One VLD had been a VLM at this location, and she had a wealth of history and wisdom about running a smooth polling location. For the other VLD, this was his first experience as a precinct election officer, and he jumped in with both feet to do the visible and not-so-visible things that made for a smooth operation. He reminded me a bit of myself a year ago. My first PEO position was as a voting location deputy last November. The pair handled a challenging curbside voter situation where it wasn’t entirely clear if one of the voters in the car was sick. I’d be so lucky to work with them again.

The last two hours of the open polls yesterday were dreadfully dull. After the excitement of the morning, we may have averaged a voter every 10 minutes for those last two hours. Everyone was ready to pack it in early and go home. (Polls in Ohio close at 7:30, so counting the hour early for setup and the half an hour for tear down, this was going to be a 14 to 15 hour day.) Over the last hour, I gave the PEOs little tasks to do. At one point, I said they could collect the barcode scanners attached to the ballot markers. We weren’t using them anyway because the electronic poll pads were not functional. Then, in stages (as it became evident that there was no final rush of voters), they could pack up one or two machines and put away tables. Our second to last voter was someone in medical scrubs that just got off their shift. I scared our last voter because she walked up to the roster table at 7:29:30. Thirty seconds later, I called out that the polls are closed (as I think a VLM is required to do), and she looked at me startled. (She got to vote, of course; that’s the rule.) She was our last voter; 799 voters in our precinct that day.

Then our team packed everything up as efficiently as they had worked all day. We had put away the equipment and signs, done our final counts, closed out the ballot counter, and sealed the ballot bin. At 8:00, we were done and waving goodbye to our host facility’s office manager. One of the VLDs rode along with me to the board of elections to drop off the ballots, and she told me of a shortcut to get there. We were among the first reporting results for Franklin County. I was home again by a quarter of 10, exhausted but proud.

I’m so happy that I had something to do yesterday. After weeks of concern and anxiety for how the election was going to turn out, it was a welcome bit of activity to ensure the election was held safely and that voters got to have their say. It was certainly more productive than continually reloading news and election results pages. The anxiety of being put in charge of a polling location was set at ease, too. I’m proud of our polling place team and that the voters in our charge seemed pleased and confident about the process.

Maybe you will find inspiration here.

  • If you voted, hopefully it felt good (whether or not the result turned out as you wanted).
  • If you voted for the first time, congratulations and welcome to the club (be on the look-out for the next voting opportunity…likely in the spring).
  • If being a poll worker sounded like fun, get in touch with your local board of elections (here is information about being a poll worker in Franklin County).

Democracy is participatory. You’ve got to tune in and show up to make it happen.

Picture of certificate from Franklin County Board of Elections in appreciation for serving as a voting location manager for the November 3, 2020, general election. Certificate of Appreciation

Time for a new look... / Jez Cope

I’ve decided to try switching this website back to using Hugo to manage the content and generate the static HTML pages. I’ve been on the Python-based Nikola for a few years now, but recently I’ve been finding it quite slow, and very confusing to understand how to do certain things. I used Hugo recently for the GLAM Data Science Network website and found it had come on a lot since the last time I was using it, so I thought I’d give it another go, and redesign this site to be a bit more minimal at the same time.

The theme is still a work in progress so it’ll probably look a bit rough around the edges for a while, but I think I’m happy enough to publish it now. When I get round to it I might publish some more detailed thoughts on the design.

Diversity, Astrology and Inclusion: not a valid approach / Tara Robertson

horoscopes chart in cosmic blue and purple

Edit: Thanks Tim Smith for letting me know the ChartHop website now shows that this is all an April Fools’ day prank. I fell for it. Kudos to the commitment to write a 13 page fake report and for your social team’s convincing response.

I first learned about ChartHop’s Charting Better Galaxies product on Dr. Sarah Saska’s Instagram account. It is the first workplace diversity, equity and inclusion tool that’s based on astrology. I thought there was a 50-50 chance that this was an April Fools’ gag. It turns out it’s a real thing and that they got $14M in Series A funding last year.

Here are some of their key findings from a 13-page Guide to Workplace Astrology:

Key Findings: Virgos are the most highly represented sign across the industry; there is a representation gap in senior leadership, where on average Fire Signs make up 40% of leadership positions; Earth Signs are the lowest paid across industries and roles by sign type.

um…WHAT? This makes me really angry.

This product is problematic for several reasons:

  1. There are legitimate systemic inequalities in the workplace and in society. Racism, sexism, ableism, homophobia, transphobia and other systemic oppressions mean that our current systems are inequitable. Thinking about systemically oppressed astrological star signs trivializes real inequalities.
  2. Unless staff are explicitly opting in to have their employer derive their astrological sign from their date of birth, this is an unethical use of data.
  3. Extrapolating people’s personalities based on astrology stereotypes is garbage and not appropriate for the workplace. Imagine getting feedback in a performance review based on the stereotypes of your star sign? Gemini 5/21 - 6/20 You have big and creative ideas. Yet, you struggle to move forward with decisive action.

For me, astrology is one of many tools I use to reflect on myself and my life–I’ve got the CHANI app on my phone. However, it’s completely inappropriate for workplaces to do pay equity analyses based on star sign, determine an individual’s leadership potential based on star sign, or to do organizational design based on star sign. Also: astrology goes against some people’s religious or epistemological world views.

As a DEI practitioner and consultant who uses data driven and research backed approaches I see ChartHop using familiar words and phrases like: representation gaps, aggregated and anonymized data, data and resources to address workplace inequities. However, visualizing a work force’s astrological signs is not a valid methodology for diagnosing inclusion issues, nor is it a useful thing to help make the workplace more diverse, inclusive or equitable.

In 2003 it was estimated that the diversity, equity and inclusion industry was worth more than $8 billion. There has been huge growth since last year and there have been a lot of new companies with technology tools to surface bias, new job boards to connect with underrepresented talent, and lots of people entering the industry as consultants.

I’m glad that most organizations know this is something they need to develop a strategy for.  I’d encourage practitioners and business leaders to think critically about what they’re prioritizing and measuring. I’m excited about some of the products and services that I see, as I think they will contribute to long term, systemic change and more equitable workplaces. ChartHop’s product isn’t it though.



The post Diversity, Astrology and Inclusion: not a valid approach appeared first on Tara Robertson Consulting.

Evergreen 3.7-beta available / Evergreen ILS

The Evergreen Community is pleased to announce the availability of the beta release for Evergreen 3.7. This release contains various new features and enhancements, including:

  • Support for SAML-based Single Sign On
  • Hold Groups, a feature that allows staff to add multiple users to a named hold group bucket and place title-level holds for a record for that entire set of users
  • The Bootstrap public catalog skin is now the default
  • “Did you mean?” functionality for catalog search focused on making suggestions for single search terms
  • Holdings on the public catalog record details page can now be sorted by geographic proximity
  • Library Groups, a feature that allows defining groups of organizational units outside of the hierarchy that can be used to limit catalog search results
  • Expired staff accounts can now be blocked from logging in
  • Publisher data in the public catalog display is now drawn from both the 260 and 264 fields
  • The staff catalog can now save all search results (up to 1,000) to a bucket in a single operation
  • New opt-in settings for overdue and predue email notifications
  • A new setting to allow expired patrons to renew loans
  • Porting of additional interfaces to Angular, including Scan Item as Missing Pieces and Shelving Location Groups

Evergreen admins installing the beta or upgrading a test system to the beta should be aware of the following:

  • The minimum version of PostgreSQL required to run Evergreen 3.7 is PostgreSQL 9.6.
  • The minimum version of OpenSRF is 3.2.
  • This release adds a new OpenSRF service, open-ils.geo.
  • The release also adds several new Perl module dependencies, Geo::Coder::Google, Geo::Coder::OSM, String::KeyboardDistance, and Text::Levenshtein::Damerau::XS.
  • The database update procedure has more steps than usual; please consult the upgrade section of the release notes.
  • The beta release should not be used for production.

Additional information, including a full list of new features, can be found in the release notes.

A Retrospective with the Archives Unleashed Project / Archives Unleashed Project

Written by Samantha Fritz (Project Manager, Archives Unleashed Project) on behalf of the Archives Unleashed team.

This piece has been cross-posted with the International Internet Preservation Consortium Blog.

The web archiving world blends the work and contributions of many institutions, groups, projects, and individuals. The field is witnessing work and progress in many areas, from policies, to professional development and learning resources, to the development of tools that address replay, acquisition, and analysis.

For over two decades memory institutions and organizations around the world have engaged in web archiving to ensure the preservation of born-digital content that is vital to our understanding of post-1990s research topics. Increasingly, web archiving programs are adopted as part of institutional activities because librarians, archivists, scholars, and others generally recognize that web archives are critical, and vulnerable, resources for stewarding our cultural heritage.

The National Digital Stewardship Alliance has conducted surveys to “understand the landscape of web archiving activities in the United States.” In the most recent survey, from 2017, respondents indicated that the area in which they perceived the least progress over the past two years was access, use, and reuse. The 2017 report indicates that this could suggest “a lack of clarity about how Web archives are to be used post-capture” (Farrell et al. 2017 Report, p. 13). This finding makes complete sense given that focus has largely revolved around selection, appraisal, scoping and capture.

Ultimately, the active use of web archives by researchers, and by extension the development of tools to explore web archives, has lagged. As such, we see that institutions and service providers like librarians and archivists are tasked with figuring out how to “use” web archives.

We have petabytes of data, but we also have barriers

The amount of data captured is well into the petabyte range — and we can look at larger organizations like the Internet Archive, the British Library, the Bibliothèque Nationale de France, Denmark’s Netarchive, the National Library of Australia’s Trove platform, and Portugal’s, who have curated extensive web archive collections, but we still don’t see a mainstream or heavy use of web archives as primary sources in research. This is in part due to access and usability barriers. Essentially, the technical experience needed to work with web archives, especially at scale, is beyond the reach of most scholars.

It is this barrier that offers an opportunity for discussion and work in and beyond the web archiving community. As such, we turn to a reflection of contributions from the Archives Unleashed Project for lowering barriers to web archives.

About the Archives Unleashed Project

Archives Unleashed was established in 2017 with support from The Andrew W. Mellon Foundation. The project grew out of an earlier series of events which identified a collective need among researchers, scholars, librarians and archivists for analytics tools, community infrastructure, and accessible web archival interfaces.

In recognizing the vital role web archives play in studying topics from the 1990s forward, the team has focused on developing open-source tools to lower the barrier for working with and analyzing web archives at scale.

From 2017–2020 Archives Unleashed had a three-pronged strategy for tackling the computational woes of working with large data, and more specifically W/ARCs:

  1. Development of the Archives Unleashed Toolkit: apply modern big data analytics infrastructure to scholarly analysis of web archives
  2. Deployment of the Archives Unleashed Cloud: provide a one-stop, web-based portal for scholars to ingest their Archive-It collections and execute a number of analyses with the click of a mouse.
  3. Organization of Archives Unleashed Datathons: to build a sustainable user community around our open-source software.

Milestones + Achievements

If we look at how Archives Unleashed tools have developed, we have to reach back to 2013 when Warcbase was developed. It was the forerunner to the Toolkit and was built on Hadoop and HBase as an open-source platform to support temporal browsing and large-scale analytics of web archives (Ruest et al., 2020, p. 157).

The Toolkit moves beyond the foundations of Warcbase. Our first major transition was to replace Apache HBase with Apache Spark to modernize analytical functions. In developing the Toolkit, the team was able to leverage the needs of users to inform two significant development choices. The first was creating a Python interface that has functional parity with the Scala interface. Python is widely accepted, and more commonly known, among scholars in the digital humanities who engage in computational work. From a sustainability perspective, Python is stable, open-source, and ranked as one of the most popular programming languages.

Second, the Toolkit shifted from Spark’s resilient distributed datasets (RDDs), part of the Warcbase legacy, to support DataFrames. While this was part of the initial Toolkit roadmap, the team engaged with users to discuss the impact of alternatives to RDDs. Essentially, DataFrames offer the ability within Apache Spark to produce tabular output. The community unanimously accepted this approach, in large part because of its familiarity with pandas and because DataFrames made it easier to visually read the data outputs (Fritz et al., 2018, Medium Post).

Comparison between RDD and DataFrame outputs
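To make the readability difference concrete, here is a minimal sketch in plain Python. It does not use Spark or the Archives Unleashed Toolkit, and it is not the Toolkit's API: the sample rows, the column names, and the helper functions `rdd_style` and `dataframe_style` are all hypothetical, invented for illustration. It only contrasts how a collection of bare tuples reads next to the same rows laid out as a table with named columns.

```python
# Hypothetical sample rows: (crawl date, domain, count).
# These values are made up for illustration; they are not real Toolkit output.
records = [
    ("20180312", "example.org", 120),
    ("20180312", "example.com", 87),
    ("20180405", "example.org", 45),
]


def rdd_style(rows):
    """Render rows as bare tuples, one per line, with no column names."""
    return "\n".join(repr(row) for row in rows)


def dataframe_style(rows, columns):
    """Render rows as a bordered text table with a header row.

    Loosely modeled on the look of a tabular DataFrame printout; this is an
    assumption-laden imitation, not Spark's own formatting code.
    """
    cells = [[str(value) for value in row] for row in rows]
    # Each column is as wide as its longest entry, header included.
    widths = [
        max([len(col)] + [len(r[i]) for r in cells])
        for i, col in enumerate(columns)
    ]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def fmt(values):
        return "|" + "|".join(f" {v:<{w}} " for v, w in zip(values, widths)) + "|"

    return "\n".join([rule, fmt(columns), rule, *(fmt(r) for r in cells), rule])


if __name__ == "__main__":
    print(rdd_style(records))
    print()
    print(dataframe_style(records, ["crawl_date", "domain", "count"]))
```

In the tuple form, the reader has to remember what each position means. In the tabular form, the header carries that information, which is the kind of familiarity pandas users would recognize.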

The Toolkit is currently at a 0.90.0 release, and while the Toolkit offers powerful analytical functionality, it is still geared towards an advanced user. Recognizing that scholars often didn’t know where to start with analyzing W/ARC files, and the intimidating nature of the command line, we took a cookbook approach in developing our Toolkit user documentation. With it, researchers can modify dozens of example scripts for extracting and exploring information. Our team focused on designing documentation that presented possibilities and options, while at the same time guided and supported user learning.

Sparkshell for using the Archives Unleashed Toolkit

The work to develop the Toolkit provided the foundations for other platforms and experimental methods of working with web archives. The second large milestone reached by the project was the launch of the Archives Unleashed Cloud.

The Archives Unleashed Cloud, largely developed by project co-investigator Nick Ruest, is an open-source platform that was developed to provide a web-based front end for users to access the most recent version of the Archives Unleashed Toolkit. A core feature of the Cloud is that it uses the Archive-It WASAPI, which means that users are directly connected to their Archive-It collections and can proceed to analyze web archives without having to spend time delving into the technical world.

Archives Unleashed Cloud Interface for Analysis

Recognizing that the Toolkit, while flexible and powerful, may still be a little too advanced for some scholars, the Cloud offers a more user-friendly and familiar user interface for interacting with data. Users are presented with simple dashboards which provide insights into WARC collections, provide downloadable derivative files and offer simple in-browser visualizations.

In June of 2020, marking the end of our grant, the Cloud had analyzed just under a petabyte of data and had been used by individuals from 59 unique institutions across 10 countries. The Cloud remains an open-source project, with code available through a GitHub repository. The canonical instance will be deprecated as of June 30, 2021 and be migrated into Archive-It, but more on that project in a bit.

Datathons + Community Engagement

Datathons provided an opportunity to build a sustainable community around Archives Unleashed tools, scholarly discussion, and training for scholars with limited technical expertise to explore archived web content.

Adapting the hackathon model, these events saw participants from over fifty institutions from seven countries engage in a hands-on learning environment, working directly with web archive data and new analytical tools to produce creative and inventive projects that explore W/ARCs. In collaborating with host institutions, the datathons also highlighted web archive collections from those institutions, increasing visibility and use cases for their curated collections.

In a recently published article, “Fostering Community Engagement through Datathon Events: The Archives Unleashed Experience,” we reflected on the impact that our series of datathon events had on community engagement within the web archiving field, and on the professional practices of attendees. We conducted interviews with datathon participants to learn about their experiences and complemented this with an exploration of established models from the community engagement literature. Our article culminates in contextualizing a model for community building and engagement within the Archives Unleashed Project, with potential applications for the wider digital humanities field.

Our team has also invested and participated in the wider web archival community through additional scholarly activities, such as institutional collaborations, conferences, and meetings. We recognize that these activities bring together many perspectives, and have been a great opportunity to listen to the needs of users and engage in conversations that impact adjacent disciplines and communities.

(Archives Unleashed Datathon, Gelman Library, George Washington University)

Lessons Learned

  1. It takes a community

If there is one main takeaway we’ve learned as a team, and that all our activities point to, it’s that projects can’t live in silos! Be they digital humanities, digital libraries, or any other discipline, projects need communities to function, survive, and thrive.

We’ve been fortunate and grateful to have been able to connect with various existing groups including being welcomed by the web archiving and digital humanities communities. Community development takes time and focused efforts, but it is certainly worthwhile! Ask yourself, if you don’t have a community, who are you building your tools, services, or platforms for? Who will engage with your work?

We have approached community building through a variety of avenues. First and foremost, we have developed relationships with people and organizations. This is clearly highlighted through our institutional collaborations in hosting datathon events, but we’ve also used platforms like Slack and Twitter to support discussion and connection opportunities among individuals. For instance, in creating both general and specific Slack channels, new users are able to connect with the project team and user community to share information and resources, ask for help, and engage in broader conversations on methods, tools, and data.

Regardless of platform, successful community building relies on authentic interactions and an acknowledgment that each user brings unique perspectives and experiences to the group. In many cases we have connected with users who are new either to the field or to methods for analyzing web archives. This perspective has helped to inform an empathetic approach to the way we create learning materials, deliver reports and presentations, and share resources.

  2. Interdisciplinary teams are important

So often we see projects and initiatives that highlight an interdisciplinary environment — and we’ve found it to be an important part of why our project has been successful.

Each of our project investigators personifies a group of users that the Archives Unleashed Project aims to support, all of which converge around data, more specifically WARCs or web archive data. We have a historian who is broadly representative of digital humanists and researchers who analyze and explore web archives; a librarian who represents the curators and service providers of web archives; and a computer scientist who reflects tool builders.

A key strength of our team has been to look at the same problem from different perspectives, allowing each member to apply their unique skills and experiences in different ways. This has been especially valuable in developing underlying systems, processes and structures which now make up the Toolkit. For instance, triaging technical components offered a chance for team members to apply their unique skill sets, which often assisted in navigating issues and roadblocks.

We also recognized each sector has its own language and jargon that can be jarring to new users. In identifying the wide range of technical skills within our team, we leveraged (and valued) those “I have no idea what this means/ what this does.” moments. If these types of statements were made by team members or close collaborators, chances are they would carry through to our user community.

Ultimately, the interdisciplinary nature of our team and its wide range of technical expertise helped us to see and think like our users.

  3. Sustainability planning is really hard

Sustainability has been part question, part riddle. This is the case for many digital humanities projects. These sustainability questions speak to the long-term lifecycle of the project, and our primary goal has always been to ensure the project’s survival and continued efforts once the grant cycle has ended.

As such, the Archives Unleashed team has developed tools and platforms with sustainability in mind, specifically by using widely adopted and stable programming languages and best practices. We’ve also been committed to ensuring that all our platforms and tools are developed in the spirit of open access and are available in public GitHub repositories.

One overarching question remained as our project entered its final stages in the Spring of 2020: how will the Toolkit live on? Three years of development and use cases demonstrated not only the need for and adoption of the tools created under the Archives Unleashed Project, but also solidified the fact that, without these tools, there aren’t currently any simplified processes to adequately replace them.

Where we are headed (2020–2023)

Our team was awarded a second grant from The Andrew W. Mellon Foundation, which started in 2020 and will secure the future of Archives Unleashed. The goal of this second phase is the integration of the Cloud with Archive-It, so that as a tool it can succeed in a sustainable and long-term environment. The collaboration between Archives Unleashed and Archive-It also aims to continue to widen and enhance the accessibility and usability of web archives.

Priorities of the Project

First, we will merge the Archives Unleashed analytical tools with the Internet Archive’s Archive-It service to provide an end-to-end process for collecting and studying web archives. This will be completed in three stages:

  1. Build. Our team will be setting up the physical infrastructure and computing environment needed to kick start the project. We will be purchasing dedicated infrastructure with the Internet Archive.
  2. Integrate. Here we will be migrating the back end of the Archives Unleashed Cloud to Archive-It and paying attention to how the Cloud can scale to work within its new infrastructure. This stage will also see the development of a new user interface that will provide a basic set of derivatives to users.
  3. Enhance. The team will incorporate consultation with users to develop an expanded and enhanced set of derivatives and implement new features.

Secondly, we will engage the community by facilitating opportunities to support web archives research and scholarly outputs. Building on our earlier successful datathons, we will be launching the Archives Unleashed Cohort program to engage with and support web archives research. The Cohorts will see research teams participate in year-long intensive collaborations and receive mentorship from Archives Unleashed with the intention of producing a full-length manuscript.

We’ve made tremendous progress, and the close of our first year is in sight. Our major milestone will be to complete the integration of the Archives Unleashed Cloud/Toolkit into Archive-It. Users will soon see a beta release of the new interface for conducting analysis with their web archive collections, which will let them download over a dozen derivatives for further analysis and access simple in-browser visualizations.

Our team looks forward to the road ahead, and would like to express our appreciation for the support and enthusiasm Archives Unleashed has received!

We would like to recognize our 2017–2020 work was primarily supported by the Andrew W. Mellon Foundation, with financial and in-kind support from the Social Sciences and Humanities Research Council, Compute Canada, the Ontario Ministry of Research, Innovation, and Science, York University Libraries, Start Smart Labs, and the Faculty of Arts and David R. Cheriton School of Computer Science at the University of Waterloo.


Farrell, M., McCain, E., Praetzellis, M., Thomas, G., and Walker, P. 2018. Web Archiving in the United States: A 2017 Survey. National Digital Stewardship Alliance Report. DOI 10.17605/OSF.IO/3QH6N

Ruest, N., Lin, J., Milligan, I., and Fritz, S. 2020. The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ‘20). Association for Computing Machinery, New York, NY, USA, 157–166. DOI:

Fritz, S., Milligan, I., Ruest, N., and Lin, J. To DataFrame or Not, that is the Questions: A PySpark DataFrames Discussion. May 29, 2018. Medium.


We’ve provided some additional reading materials and resources that have been written by our team, and shared with the community over the course of our project work.

For a full list please visit our publications page:

Shorter blog posts can be found on our Medium site:





A Retrospective with the Archives Unleashed Project was originally published in Archives Unleashed on Medium, where people are continuing the conversation by highlighting and responding to this story.

Security releases: Evergreen 3.6.3 and 3.5.4 / Evergreen ILS

The Evergreen community is pleased to announce the release of Evergreen 3.6.3 and 3.5.4, both available from the downloads page.


It is recommended that all Evergreen sites upgrade as soon as possible.

These releases fix a critical cross-site scripting (XSS) vulnerability.

All of these new releases contain additional bug fixes unrelated to the security issue. For more information on the changes in these releases, please consult their release notes:

Podcast interview: Names, binaries and trans-affirming systems on Legacy Code Rocks! / Erin White

In February I was honored to be invited to join Scott Ford on his podcast Legacy Code Rocks!. I’m embedding the audio below. View the full episode transcript — thanks to trans-owned Deep South Transcription Services!

I’ve pulled out some of the topics we discussed and heavily edited/rearranged them for clarity.

Names in systems

Legal name vs. name of use

Let’s think about Facebook’s former Real name policy. Early on Mark Zuckerberg even said that having two names showed a lack of integrity.

The underlying assumption was that there’s one name that everybody always uses, and only people with malicious intent would do anything different. The notion that people are using different identities to “trick” others is also a common, harmful trope used to demonize and discredit trans people.

We now widely acknowledge that people are called different names in different circumstances either because of familial or professional relationships, different eras of their lives, different contexts, or because of a change in their gender identity.

People’s legal names may stay the same, but their names of use vary. That was the thing that got me thinking about trans-affirming systems design.

What would a world look like where trans folks actually see themselves in systems rather than simply being accommodated? What if they truly were affirmed and celebrated?

One way to do that is to allow people to say what their names are. There are very few contexts when we actually need folks’ full legal first names.

Not “edge cases”

Allowing for name flexibility is an example of a technology that helps a lot of different people. For example, of the 140 people on staff at our library, about a third of us are using names that are different from our full legal first name. People are going by middle names or by more familiar versions of first names, like Jimmy instead of James; or are using totally different names. While some people would see an errant name field as a minor annoyance, for other folks it’s a safety issue. It’s one change that’s a big quality of life increase for a lot of folks.


Then there’s the gender binary. Computers run on binaries. As technologists we love the idea of ones and zeros, simplifying things when possible: off/on, yes/no; and frequently we do that with gender too. You’ve got a form asking for gender (typically unnecessarily) and there’s only two options.

Gendered stereotypes serve no one

We know full well that when we provide gender data it is often used to sell us things based on gender stereotypes. When systems are actively reinforcing the gender binary, the result is reductive and uninspiring, and something that doesn’t reflect the lived gender experience of most people, whether they are trans or not.

Trans/cis binary

Another gender-related binary: either you’re trans or you’re cis. That’s a false binary. People’s gender identities change throughout their lives. There’s valid expressions of gender identity that are neither/nor, that are both/and, so to create that wall between trans and cis is really harmful for all, and cashes out as violence against people who don’t conform.

So many trans people I know don’t think they are “trans enough.” And so many cis people spend so much time trying to prove that they are manly or womanly enough. It’s exhausting.

Everybody has a gender

It’s important for folks who identify as cisgender to think about and question their genders. You have gender(s)!

Ask yourself, How does my gender impact how I move through the world? How does it impact how I interact with people, and how I present myself, how I dress? It’s not just trans people that should be thinking about this. Just reflect on what your gender is, and how you do it. There’s so much richness there, even within the cisgender and transgender buckets, there’s just so much.

Binaries create inequalities

Binaries in themselves can be violent. As humans, we categorize things as a survival mechanism so that we don’t have to spend all our energy processing every single sensory input.

At the same time, when we have categories that pit things against each other with a clear delineating line between, those differentiations create inequality.

One harmful binary at the root of American culture: either you’re white or you’re not, and you’re less than. The foundation of the U.S. is the exploitation and oppression of nonwhite people, Black and brown people. In technology, a binary might be “technical” and “non-technical” people. Those types of less-than/greater-than binaries occur across identities and sectors including gender.

Once you start to perceive all the binaries you can’t unsee them. Understanding how detrimental they are helps us understand how the systems we build can reject them and instead reflect the rich bouquet of lived human experience.

Making trans-affirming systems

Audit how names are handled. Do you require a legal name for anything? If not, let people choose their name, let people update it. Does that name cascade to their username? Are they able to change a username? If I signed up 10 years ago and now I need to change my username, and I want to bring over my entire history, am I able to do that?
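To make the name audit concrete, here is a minimal sketch of one way to model it, not a description of any particular system. It assumes the name of use is the primary field, a legal name is optional, and history is keyed to a stable internal ID rather than to the username. The names Account, Activity, and AccountStore are hypothetical and exist only for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Account:
    """Hypothetical account record; all field names are illustrative."""
    username: str
    name_of_use: str                   # the name shown everywhere in the UI
    legal_name: Optional[str] = None   # only collected if genuinely required
    # Stable internal key: history points at this, never at the username.
    account_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class Activity:
    account_id: str
    description: str


class AccountStore:
    """In-memory store, assumed here only to keep the sketch self-contained."""

    def __init__(self) -> None:
        self._accounts: dict[str, Account] = {}
        self._activity: list[Activity] = []

    def _username_taken(self, username: str) -> bool:
        return any(a.username == username for a in self._accounts.values())

    def create(self, username: str, name_of_use: str,
               legal_name: Optional[str] = None) -> Account:
        if self._username_taken(username):
            raise ValueError("username already taken")
        account = Account(username, name_of_use, legal_name)
        self._accounts[account.account_id] = account
        return account

    def record(self, account: Account, description: str) -> None:
        self._activity.append(Activity(account.account_id, description))

    def rename(self, account: Account, new_username: Optional[str] = None,
               new_name_of_use: Optional[str] = None) -> None:
        # Changing either name never touches account_id, so history follows.
        if new_username is not None and new_username != account.username:
            if self._username_taken(new_username):
                raise ValueError("username already taken")
            account.username = new_username
        if new_name_of_use is not None:
            account.name_of_use = new_name_of_use

    def history(self, account: Account) -> list[str]:
        return [a.description for a in self._activity
                if a.account_id == account.account_id]
```

The design choice that matters is the last one: because activity is attached to account_id, someone who signed up ten years ago can change both username and name of use and still keep their entire history.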

Follow that up with a gender audit. Are you asking for gender anywhere? Why do you actually need it? Are you asking for people to indicate gender or a title? Add the gender-inclusive Mx. to the honorifics field and if possible make it optional because some folks are just not into it.
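As a small illustration of the honorifics point, the sketch below assumes a form that stores an optional title and no gender field at all. The HONORIFICS list and the salutation function are hypothetical; a real system would pick its own options.

```python
from typing import Optional

# Illustrative options only; "Mx." is the gender-inclusive honorific.
HONORIFICS = ("Mx.", "Ms.", "Mr.", "Mrs.", "Dr.")


def salutation(name_of_use: str, honorific: Optional[str] = None) -> str:
    """Build a greeting without ever needing to know someone's gender."""
    if not honorific:
        # The honorific is optional: some folks are just not into it.
        return f"Dear {name_of_use},"
    if honorific not in HONORIFICS:
        raise ValueError(f"unsupported honorific: {honorific!r}")
    return f"Dear {honorific} {name_of_use},"
```

Called as salutation("Sam Lee") this produces a greeting with no title, and salutation("Sam Lee", "Mx.") includes the chosen one. Neither path asks for or infers a gender.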

Images. If you’re using stock photography or other images on your site, do they represent diversity of lived experiences? Do you have folks who are not white, who are not young, who are disabled, who maybe aren’t conventionally gender presenting? Folks dressed in different types of clothing or with different gender presentation? There’s a few different open photo libraries on the web; Broadly’s Gender Spectrum Collection comes to mind.

Content. Think about the content of the web and how users are communicated with in the language that we use. Singular “they” instead of “he or she.”

More on my A List Apart article, Trans-inclusive design.


I recently read Design Justice and can’t recommend it highly enough. Constant learning is our life’s work. We can’t stay stagnant. We have to keep pushing ourselves, talking to people, and making sure that what we’re building is something that’s going to serve everybody.

Community Announcement / Islandora

Posted by agriffith on Wed, 03/31/2021 - 16:49

As you know, the Islandora Foundation has recently updated its governance structure to remain compliant with Canadian non-profit regulations. Islandora Foundation members approved these changes at the Annual General Meeting in early March. A summary of these changes is provided here, as well as our emerging roadmap for moving forward.

A newly formed “Leadership Group”, composed of representatives from our Partner-level member organizations, replaces the pre-existing Board of Directors, and a smaller Board of Directors remains responsible for Islandora’s administrative and fiscal responsibilities. This Leadership Group met for the first time on Friday, March 26th to begin to discuss their goals going forward, and the ways the Leadership Group will interact with the other governance structures of the Islandora community. The Leadership Group immediately affirmed their commitment to transparent communication and collaboration with the vibrant, robust Islandora community and will be creating a Terms of Reference over the next month. The Terms of Reference will be written with agility and transformation in mind, as we work together to secure a strong future for both the community and codebase.

In the meantime, please let us know if you have any questions regarding the formation of the Leadership Group, and stay tuned to hear more about the initial goals of this group.

Ethical Financial Stewardship: One Library’s Examination of Vendors’ Business Practices / In the Library, With the Lead Pipe

By Katy DiVittorio and Lorelle Gianelli

In Brief

The evaluation of library collections rarely digs into the practices or other business ventures of the companies that create or sell library resources. As financial stewards, academic Acquisition Librarians are in a unique position to consider the business philosophy and practices of our vendors as they align with the institutions we serve. This article shares one academic library’s research and assessment of library vendors’ corporate practices, a review that involved purchasing Consumer Sustainability Rating Scorecards and Accessibility Reports. Challenges the library faced include a lack of vendor involvement and the question of how to move forward when it is discovered that a provider’s business ventures could harm our library patrons or their families. Because the library serves two official Hispanic-Serving Institutions (HSIs) and one emerging HSI, this evaluation also considered how vendor practices may impact Hispanic/Latinx students.


The ethical behavior of businesses is often referred to as Corporate Social Responsibility (CSR), or conscious capitalism. CSR is the idea that businesses have a responsibility to consider collective values and contribute to society in a positive way. For some companies it is a fundamental part of their business model; for example, TOMS donates a pair of shoes for each pair purchased. For others, it means allowing employees to volunteer during the year, contributing to charity, or developing more sustainable business practices. Contributing to the greater societal good has been shown to have many benefits for businesses, including increased revenue, improved employee retention, and a strengthened supply chain (Trotter, 2017). This concept has even been incorporated into business curricula. Villanova University offers four types of CSR programs and Harvard Business School has a Corporate Responsibility Initiative (CRI) with the goal of studying and promoting responsible business practices (Villanova University, 2020; Harvard Kennedy School, n.d.).

A 2019 article published in the International Journal of Corporate Social Responsibility points out that “business’ concern for society” can be traced back centuries to Roman Law, but the concept was not written about until the 1930s (Agudelo et al., 2019). In the 1950s CSR was mostly philanthropic, while the growth of social movements in the 1960s highlighted issues impacted by corporate decision making such as pollution, employee safety, labor laws, and civil rights. It was in this environment that companies, such as Ben & Jerry’s, started to integrate social concerns into their foundations (Agudelo et al., 2019).

CSR has continued to evolve as investors demand that companies “serve a social purpose” (Proulx, 2018). In 2018, the founder of the investment firm BlackRock sent an ultimatum to several large companies demanding that they either contribute in a positive way to society or lose his firm’s financial support. Given that BlackRock is the world’s largest investment firm, with $6 trillion in assets, this move was considered “a lightning rod” moment by the Associate Dean at the Yale School of Management, an expert on corporate leadership (Proulx, 2018). Legal changes also reflect this trend. Beginning in 2018, under EU law, public companies are required to report company details that go beyond their finances, such as their diversity policies (Agudelo et al., 2019).

Contributing to society in a positive way involves more than just financial support; social responsibility also includes issues related to ethics and equality. In 2019, the Interfaith Center on Corporate Responsibility (ICCR), an investor advocacy group, submitted ten proposals to Amazon, encouraging the company to refrain from selling facial-recognition software and to directly link executive compensation to improved diversity and sustainability practices (Romano, 2019). More recently, COVID has put a strain on many consumers, and this combined with racial unrest is “bringing about rapid change and heightened consumer expectations” (Moore, 2020). As companies fight to stay viable in these uncertain times, CSR can be an important part of staying in business, especially as people “see through platitudes and hold companies accountable when their stated values and actions do not align” (Moore, 2020). Library acquisitions staff are consumers too, though they purchase content for the library and university rather than for themselves. Just as individual consumers base buying decisions on personal principles, librarians operate within the values of their institutions and profession.


Auraria Library serves three institutions of higher education on one campus: University of Colorado Denver (CU Denver), Metropolitan State University of Denver (MSU Denver), and Community College of Denver (CCD). The authors would like to acknowledge that in order to create the Auraria campus in the early 1970s, families, homes, and businesses were displaced (Gallegos, 2011). Eminent domain was invoked to remove hundreds of families and their homes; most of the families living in the area at the time were Hispanic/Latinx, and they protested their forced relocation (Rael, 2019). As a form of reparation, the three institutions offer free tuition to the families, children, and grandchildren of those who were forcibly displaced, and some of the houses were kept for use as part of the campus (Rael, 2019).

The Auraria campus was designed as a place where a student could attend community college, transfer to a four-year college, and go on to earn a graduate degree all on the same campus. While that vision has changed over the years and each of the institutions has carved out its own identity, the library continues to be one of the few shared resources and services. Auraria Library serves a diverse patron population and offers resources that support curricula from the vocational to the PhD level. MSU Denver & CCD are Hispanic-Serving Institutions (HSIs), while CU Denver is an emerging HSI. HSIs are public and private, two and four-year not-for-profit institutions that have at least 25% full time enrollment of Hispanic undergraduate students. 

Implications for Hispanic-Serving Institutions

City University of New York Law Professor Sarah Lamdan published two articles that explore “the ethical issues that arise when lawyers buy and use legal research services sold by the vendors that build ICE’s surveillance systems” (Lamdan, 2019, p.1). Westlaw, a leading legal database used by lawyers, libraries, and other private industries to conduct legal research, is owned by Thomson Reuters, a company that also creates CLEAR Investigation software. CLEAR is Thomson Reuters’ software that law enforcement agencies, including U.S. Immigration and Customs Enforcement (ICE), use to collect thousands of data points on people in order to assist with their investigations and identify community threats. The funds that libraries pay Westlaw support the creation and operations of CLEAR. These library funds are supporting surveillance, including surveillance of our most vulnerable communities.

ICE was created in part as a response to the September 11, 2001 attacks. While unauthorized immigration levels have decreased since 2007, immigration detention and removals have increased since 2015 (Krogstad et al., 2019; Guo & Baugh, 2019). ICE has been criticized for its policies and abuses of power. ICE tracks down immigrants who have no criminal record, even when nothing illegal is happening (Lamdan, 2019). In 2017, ICE requested that the National Archives and Records Administration (NARA) destroy documentation for abuse allegations related to violent assault, sexual assault, and death going back to the creation of ICE in 2003 (Eagle, 2019). While NARA initially approved this request, public and professional outcry pushed NARA to reevaluate this decision (Eagle, 2019).

Because of these practices, people within various professions have pushed back against working with ICE and the companies that build products for ICE. Employees at Microsoft (including its GitHub subsidiary), Google, and Amazon have all pushed back against their own companies working with ICE (Bergen & Bass, 2019; Chao, 2018; Shahani, 2019; Shaban, 2018). Colleges and universities around the U.S. are seeing demonstrations from their students against ICE and companies that do business with them (McLean, 2019). For example, at Johns Hopkins, Associate Professor Drew Daniel, who started the campaign for his university to cut ties with ICE stated, “I think there’s a very strong feeling across the board from undergraduates that it was deeply inconsistent that you wanted an inclusive and diverse campus while partnering with ICE, because of the racism in the way ICE targets black and brown people” (McLean, 2019, pp.2-3).

The Deferred Action for Childhood Arrivals (DACA) program started in 2012 and grants deferred removal from the country to people who were brought to the U.S. without authorization before June 2007 while under the age of 16. This allows them to stay in the U.S. and attend school and work with authorization. This policy was put in place under the Obama administration. In 2017, the Trump administration announced it would end DACA, but in 2020 the Supreme Court ruled that Trump could not immediately end the program, and President Biden took steps in 2021 to protect the policy (Liptak & Shear, 2020; Redden, 2021).

Most undocumented immigrants live in twenty cities across the United States. Denver has been on this list since 2005 (Passel & Cohn, 2019). Colorado has implemented several programs that support its undocumented student population. MSU Denver serves more Hispanic/Latinx students (5,469) than any other higher education institution in Colorado (Phare, 2019). All three institutions on the Auraria campus have support services for DACA students. MSU Denver was also the first institution in Colorado to offer in-state tuition to undocumented students, and Colorado passed the ASSET (Advancing Students for a Stronger Economy Tomorrow) bill, which gives undocumented students the opportunity to pay in-state tuition at public institutions if they fit certain criteria (Metropolitan State University of Denver, n.d.). In October 2019, the University of Colorado System and MSU Denver co-signed a U.S. Supreme Court brief that defends young people who immigrated illegally as children and supports their ability to pursue higher education (Langford, 2019; Presidents’ Alliance on Higher Education and Immigration, 2019). When the U.S. Supreme Court heard oral arguments on ending DACA in November 2019, CU Denver and MSU Denver reconfirmed their commitment to supporting DACA and undocumented students (DeWind, 2019; Watson, 2019).

Nationwide enrollment of college students who self-identify as Hispanic/Latinx is surging, and the number of HSIs is increasing as a result (Garcia, 2019). According to predictions, U.S. high school graduation rates will peak in 2025 with a national dip following in 2026 (WICHE, 2016). White high school graduates are decreasing while it is predicted that non-white high school graduates will increase from 42% to 49% of the U.S. population by 2023 (WICHE, 2016).

In Colorado there is very little state support for higher education (State Support for Higher Education per Full-Time Equivalent Student, 2019). Because of this, universities and colleges rely heavily on tuition monies, and any dip in enrollment can mean a crisis for the institution. Higher education administration needs to be aware of how their practices affect their growing Hispanic/Latinx population. If colleges and universities work with companies that could harm these students or their families then they risk losing a segment of this growing student population. 

Vendor Ethics Taskforce (VET)

As Auraria Library staff became more aware of the undocumented student experience, their experiences with ICE, and the increasing importance of CSR, they wanted to act. In September 2018, Auraria Library created the Vendor Ethics Taskforce (VET). The charge of VET was to research and evaluate its learning materials vendors using values-based metrics. VET’s assessments would be used in renewal or new subscription/purchasing decisions and negotiations and to start conversations with vendors about areas of concern. As good stewards of the institutions’ funds, its goal was to avoid working with companies that are out of alignment with the library’s and institutions’ values. 

To start, VET selected a small number of vendors to assess, including a variety based on size and products offered. VET wanted a broad representation in its pilot project and so included vendors that VET had potential ethical concerns about, vendors it thought would score well overall, and vendors for which VET was unsure of the outcome.

VET consisted of members from Collections Strategies, Researcher Support Services, and Education and Outreach Services departments. VET selected five metrics after reviewing the library’s values and mission statement; its three institutions’ values, vision, and mission statements; and the American Library Association’s professional values. The metrics are as follows: 

1. Diversity: This metric seeks to examine the internal hiring practices of the company, paying particular attention to the diversity of the companies’ high-level staff and board members and pay equity. 

2. Ethics: Ethics refer to what the company values and how this informs its decision making. This is commonly referred to as a company’s Code of Conduct, Standards of Business, Core Values, or Code of Ethics.

3. Data Privacy: What data does the company collect on patrons? How is it used and by whom? Is the company also an information broker? Does it actively collect and sell data?  

4. Accessibility: Does the company have an accessibility statement, and does it indicate they follow national accessibility standards? Do the company’s products meet national accessibility guidelines (WCAG, Section 508)? If not, how are they addressing the shortfalls? 

5. Environment/Sustainability: Does the company have a statement on sustainability? Does it give to an environmental charity? Is it winning sustainability awards?

After several months of researching and determining the metrics that it would use, VET recognized the need for outside expertise. VET researched outside consultants and obtained funding to procure their services. In early 2019 VET selected two companies with which to contract: EcoVadis and Michigan State University’s Usability/Accessibility Research and Consulting Services (MSU UARC). Because transparency was an important component of this project, VET informed each library vendor via email before the vendor was assessed.

MSU UARC has been used by the Big 10 Academic Alliance. VET elected to pilot this service with six vendors and spent $3,000. During this assessment MSU UARC checked each vendor’s website against Web Content Accessibility Guidelines (WCAG) 2.0 AA. MSU UARC was given temporary access via a guest login to do this work and then provided a report listing the major accessibility issues on the site. The Big 10 Academic Alliance had reports on some of the library’s vendors, but they were a year or more out-of-date. VET planned to compare the older Big 10 reports against our newer ones to see which vendors were making improvements and which were taking no action to improve their accessibility.

The second company, EcoVadis, conducts Consumer Sustainability Ratings on vendors using international standards and produces a vendor “scorecard.” Each scorecard has four categories: Environment, Labor & Human Rights, Ethics, and Sustainable Procurement. During our pilot VET had access to a limited number of vendors’ scorecards. If the vendor was already in the EcoVadis database, VET received immediate access. If not, it would take EcoVadis a month or two to get the vendor added and collect all the data points to create a scorecard. In addition, an EcoVadis representative planned to work with VET on having follow-up conversations with vendors if the scorecard revealed areas of concern. They could help VET plan, for example, challenging conversations in which VET asked a vendor to consider changing specific practices that had been uncovered in the report. This EcoVadis pilot cost the Auraria Library $2,000. 

Each of the EcoVadis vendors was sent a letter via email inviting them to participate in this initiative in May 2019. See Appendix A for an example. The letter, signed by Auraria Library’s Director, explained that VET wanted to conduct this assessment to help its library demonstrate excellent financial stewardship, while also ensuring that the social and environmental performances of its vendors aligned with its institutions’ values. The vendors were also notified that there would be a small cost to them to participate in addition to the fee Auraria had already paid. Depending on the size of their company the vendor would need to pay somewhere between $500 and $2,000. The authors have chosen only to share the names of the vendors for whom we did receive scorecards.

Outcomes and Findings


Overall, VET discovered that library vendors are not ready to participate in a program like EcoVadis. Out of all the vendors contacted, only two shared their EcoVadis scorecards. One was Clarivate, and it already had a scorecard in place. Informa was the only vendor willing to go through the steps to have itself assessed. Some of the other companies said they were not able or willing to participate because of the labor and cost involved. However, there was one company that already had an EcoVadis scorecard but refused to allow EcoVadis to share it with VET. VET members ended up talking with vendors’ lawyers in a few cases when the company was deciding whether to participate. Some companies said they would reconsider participating in the future. 

Since Auraria Library had paid for access to scorecards and was not getting them due to lack of vendor participation, VET decided to try another tactic. VET reached out to University Procurement staff to let them know about this pilot and to see if they had any vendors they would be interested in asking for scorecards. As it happened, the Director of Strategic Procurement had been considering EcoVadis for some time. University Procurement added their own suppliers and were able to get reports for Adobe, Agilent Technologies, AVIS, CISCO, DELL, Enterprise, Fastenal, Lenovo, MedLine, SAP SE, Staples, Thermo Fisher Scientific, UPS, and WW Grainger. 

The two EcoVadis scorecards for the library vendors varied in detail. The one for Clarivate did not have enough information to be helpful due to a lack of documented policies or procedures shared by Clarivate with EcoVadis. The Informa scorecard showed Informa in a high percentile (good). Overall, Informa had many more strength areas than areas that needed improvement. VET also had various conversations with its Informa representative and was able to learn about the work Informa is doing and awards they have received around sustainability. 

Despite not being able to obtain EcoVadis scorecards for most vendors, the project resulted in a stronger relationship between University Procurement and the library. The EcoVadis scorecards for University Procurement, on the whole, contained more detail and information that could help the University identify companies that support the institution’s values. One reason for this could be that these larger companies are expected to provide this information to their customers. Libraries have not historically been asking for this type of information from library vendors, and so perhaps these companies were unprepared to give it out.

Accessibility Reports

After receiving the accessibility reports in August 2019, VET shared them with our Tri-institutional Accessibility Committee, a committee that consists of accessibility experts from each school on the Auraria campus and several library representatives. Each report described accessibility issues the vendors had resolved based on the Big 10 reports, issues that still existed, and new accessibility issues. The Tri-institutional Accessibility Committee suggested a couple of tactics: 1) include the areas that are below WCAG 2.0 AA standards in the next license, with the statement that they must be resolved by the next renewal or the library will cancel; and/or 2) negotiate a lower price, since accessibility is below standards. Following that meeting, we sent the accessibility reports to our vendors and asked for a response on how they planned to resolve the areas of concern. 
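The three-way comparison described above (issues resolved since the older report, issues that persist, and issues that are new) can be illustrated with plain set operations. This is a minimal sketch, not the method VET or MSU UARC actually used: the function name `compare_reports` and the issue labels are hypothetical, and it assumes each report has already been reduced to a list of consistently worded issue labels.

```python
def compare_reports(older_issues, newer_issues):
    """Classify issues by comparing an older report with a newer one."""
    older, newer = set(older_issues), set(newer_issues)
    return {
        "resolved": sorted(older - newer),    # only in the older report
        "persisting": sorted(older & newer),  # in both reports
        "new": sorted(newer - older),         # only in the newer report
    }


# Hypothetical issue labels, for illustration only.
older = ["missing alt text", "low color contrast", "unlabeled form fields"]
newer = ["low color contrast", "keyboard trap in viewer"]

result = compare_reports(older, newer)
for status, issues in result.items():
    print(f"{status}: {', '.join(issues) or 'none'}")
```

In practice the hard part is likely to be normalizing how two different reports describe the same problem; the set comparison only works once the labels match.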

The first vendor to respond said they were planning a platform audit in 2020 and would incorporate VET’s accessibility report findings to ensure that areas that require action will be improved by their product team. The next vendor we heard from would not add the additional accessibility clause we suggested and refused to add even the library’s standard accessibility language in the license (see Appendix B), which most vendors are willing to add. Finally, several vendors never replied at all despite repeated attempts at contact by our library. 

Internal VET Templates

Since most of the vendors VET contacted were reluctant to participate in the EcoVadis assessment process, VET created an internal template to gather information itself. See Appendix C for the template. Through this process, VET found many positive steps that vendors are taking and awards they are winning for their CSR efforts. 

For example, Informa has won multiple awards for its sustainability work. It was named a 2018 industry mover in the Dow Jones Sustainability Index; a member of the FTSE4Good index, which is a group of ethical investment stock choices based on a range of corporate responsibility criteria; and a constituent of the Ethibel Sustainability Index for Excellence in Europe, which is a list of the 200 top performing companies for corporate responsibility in Europe (G. Howcroft, personal communication, March 21, 2017). Through this template, VET also found that Cambridge installed a large solar array to cut CO2 emissions by 20% (Cambridge University Press, 2019). This was especially positive and consistent with Auraria’s values, as the Auraria Library had just installed solar panels on the roof, which cover two-thirds of the Library’s current energy usage and distribute surplus power back to the campus grid (Evans, 2019). The Coalition for Diversity and Inclusion in Scholarly Communications (C4DISC), which officially launched in 2020, promotes the diversity, equity, inclusion, and accessibility work being done by scholarly communications associations and societies. 

VET used the research it gathered when holding conversations with vendors. Thomson Reuters’ CLEAR Investigation software, and specifically ICE’s use of its data, was still a major concern for a library that serves so many undocumented students. During a phone call with Westlaw representatives, VET addressed some of its concerns, including the concern that CLEAR software relied on artificial intelligence and facial recognition software, which has been shown to be racist and sexist (Lohr, 2018; Buolamwini & Gebru, 2018). VET was also concerned that the Thomson Reuters CEO was on the board of the ICE Foundation. This was an uncomfortable conversation and indicated that many of our library priorities were out of alignment with priorities for Thomson Reuters. VET received follow-up information that addressed some VET concerns, specifically that CLEAR does not use facial recognition technology and that the CEO is no longer on the ICE Foundation Board (the ICE Foundation is no longer in operation as of the writing of this paper).

One of VET’s library colleagues wanted to test removing his data from CLEAR, but when VET looked at the criteria, he did not qualify. Thomson Reuters allows judges, public officials, or members of law enforcement to request their personal data be removed if they can prove that having the data in CLEAR exposes them to risk or physical harm and/or they are a victim of identity theft (Thomson Reuters, n.d.). As an alternative, our colleague submitted an Information Request Form to find out what information CLEAR has collected on him. After several weeks he received 41 pages of data from CLEAR. The data that CLEAR had on him included name, gender, Social Security number, phone number, date of birth, spouse, addresses (11 previous going back 15 years), ownership of four different cars, utility records, voting participation, and political party. There were also around 40 categories they collect data on for which nothing was returned for our colleague. Thomson Reuters collects this type of data on all of us. 

They also have incorrect data. The cover letter misgendered the individual and included incorrect addresses, an incorrect marital status, and an incorrect age range (Swauger, 2019). This is an example of the data Thomson Reuters sells to ICE and other law enforcement agencies. The consequences of ICE and other law enforcement agencies using incorrect data when detaining and arresting people are chilling.

One outcome from the creation of VET and its many discussions was shifting print book purchasing away from Amazon to local and independent bookstores. There was unanimous support for this within the library, and the move has received positive feedback from faculty. While most of the library’s print books are purchased via the vendor GOBI, around 13% have historically been purchased via Amazon. Auraria Library moved that 13% to independent and local bookstores, companies that the library wants to support. There are still a small number of books purchased via Amazon if the library is unable to get them elsewhere. VET made this move because Amazon operations are out of alignment with Auraria Library’s values around supporting the health and wellbeing of people, especially those from marginalized communities. Amazon is in the surveillance business and has created and used facial recognition software with its product Rekognition, which has been shown to incorrectly identify people (Snow, 2018; Williams, 2020). In June 2020 Amazon put a one-year moratorium on selling Rekognition to police in response to community protests against police brutality and the deaths of many people of color at the hands of law enforcement (Amazon, 2020). Amazon is also repeatedly accused of having poor and unsafe working conditions for its employees (Spitznagel, 2019; Tims, 2019). 

Despite not being able to leverage outside expertise such as EcoVadis, VET will continue gathering reports on vendors using our internal template and sharing them with our librarians who make collection decisions. This research has given us better insight into the companies with which the library works. An added benefit of the research is that staff stay informed about industry changes and current events in the publishing world. 

VET also hopes to work with other libraries to combine our efforts. For example, another University of Colorado campus library is planning on reviewing diversity classification types, such as women, minority and small businesses, for its suppliers. This library will add the supplier type into its integrated library system so it can easily run reports to see where its funds are going. The library hopes to see how much it is spending on women-owned businesses or small businesses. 
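The kind of report described above can be sketched as a simple aggregation of spending by supplier classification. This is a minimal illustration only: the supplier names, classifications, and amounts are invented sample data, `spend_by_classification` is a hypothetical helper, and a real report would be built from whatever export the integrated library system provides.

```python
from collections import defaultdict

# Hypothetical sample data: (supplier, classification, amount spent in USD).
# Real figures would come from an export of the integrated library system.
purchases = [
    ("Supplier A", "women-owned", 1200.00),
    ("Supplier B", "small business", 800.00),
    ("Supplier C", "unclassified", 3000.00),
    ("Supplier A", "women-owned", 300.00),
]


def spend_by_classification(rows):
    """Return (total spend, share of overall spend) per classification."""
    totals = defaultdict(float)
    for _supplier, classification, amount in rows:
        totals[classification] += amount
    grand_total = sum(totals.values())
    return {
        label: (total, total / grand_total if grand_total else 0.0)
        for label, total in totals.items()
    }


report = spend_by_classification(purchases)
for label, (total, share) in sorted(report.items()):
    print(f"{label}: ${total:,.2f} ({share:.0%})")
```

A supplier that holds more than one classification would need a rule for how its spending is counted; this sketch assumes exactly one classification per purchase.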

Challenges and Limitations

During this project, VET experienced numerous challenges. The biggest challenge was developing a response when discovering a concerning policy or practice. All three institutions on the Auraria campus are committed to supporting undocumented students. The funds our library spends on Thomson Reuters products may go to support CLEAR Investigation Software, which is sold to ICE for millions of dollars (Lamdan, 2019). The information collected by the CLEAR software may be incorrect, as we saw from our colleague’s report, and using it to target undocumented immigrants is concerning. 

Nonetheless, our paralegal and other students need Westlaw to succeed in school and compete in the workforce. If our library cancelled our Westlaw subscription, this would put our students at a disadvantage, and they might not be able to secure employment in the legal field. Our librarians talked with legal and paralegal professors on campus and trialed other products, but Westlaw is what law firms use. As Lamdan (2019) points out, Westlaw and Lexis are the dominant resources used within the legal profession. 

It put us in a position of having to support one student group over another. While cancelling Westlaw would be a strong statement of our library’s values, it would not change Thomson Reuters’ work. It would continue to develop CLEAR and sell it to ICE and other law enforcement agencies. In the end, the library reluctantly renewed its Westlaw subscription.

Another challenge was getting vendors to participate in a formal project that uses internationally recognized standards. While VET was able to research on its own, the level and amount of information the authors were able to gather was very limited compared to what a program like EcoVadis offers. Although CSR has a long history across industries, many vendors were reluctant to provide information addressing sustainability, diversity, privacy, accessibility, or ethics. It is our hope that persistent discussions about these issues will encourage our vendors to make changes that benefit libraries and the communities we serve. This is an approach summarized by the CEO of Newground Social Investment, a Seattle investment firm: “You have to have consistent applied pressure to gradually change. But because [the companies are] so big, that change of trajectory leads to immensely better outcomes” (Romano, 2019).

While VET hoped to demonstrate ethical financial stewardship with this project, the authors recognize the library’s budget is just a small percentage of an institution’s expenditures. The library may be working towards conscious consumerism, but other departments may still have problematic business relationships. Many higher education institutions hold contracts with prison industries to use prison labor (Burke, 2020). For example, MSU Denver and, until recently, CU Denver had to purchase office furniture from the Colorado Correctional Industries (CCI) (Byars, 2020; Metropolitan State University of Denver Purchasing Manual, 2017). CU Denver is currently reexamining this business relationship after protests from students, staff, and faculty (Hernandez, 2020). Prison labor falls outside the Fair Labor Standards Act and is overwhelmingly made up of people of color, perpetuating oppression and worker exploitation (Leung, 2018).


Conclusion

The authors call on other librarians and national library organizations to advocate that library vendors proactively address and share work around sustainability, diversity, privacy, accessibility, and ethics in their companies. Lack of documentation or little to no work towards ethical practices from a vendor does not necessarily mean that a library should stop doing business with them. In these cases, there is potential to have productive conversations between the vendors and the library to encourage companies to incorporate a CSR model. When a library sees a vendor take positive actions, it is important to reinforce the value of that work. If a library sees a vendor out of alignment with its institution’s values, it should hold conversations first with the vendor and, if that goes nowhere, then hold conversations with appropriate individuals within its institution. In addition, libraries should support companies that are doing ethical work, look for alternatives, and create their own resources. We hope to see librarians continue to identify and address vendor business practices that hurt our students, especially those who come from marginalized communities.


Acknowledgements

The authors would like to thank all of the current and past members of the Vendor Ethics Taskforce: Gayle Bradbeer, Karen Sobel, Katherine Brown, Molly Rainard, and Shea Swauger. We are also grateful to Sommer Browning, Lando Archibeque, and Meg Brown-Sica for reviewing early drafts, and to internal peer reviewer Ikumi Crocoll and Publishing Editor Ian Beilin.


References

Amazon. (2020, June 10). We are implementing a one-year moratorium on police use of Rekognition.

Bergen, M., & Bass, D. (2019, October 10). Microsoft employees call to end GitHub ICE contract. Bloomberg.

Buolamwini, J. & Gebru, T. (2018, February 23-24). Gender shades: Intersectional accuracy disparities in commercial gender classification. Conference on Fairness, Accountability and Transparency, New York, NY, United States.

Burke, L. (2020, February 14). Public universities, prison-made furniture. Inside Higher Ed.

Byars, M. (2020, August 18). CU to no longer exclusively buy inmate-made furniture from Colorado Correctional Industries. Daily Camera.

Cambridge University Press. (2019, October 07). Cambridge University Press cuts its carbon emissions through one of the UK’s largest flat roof solar installations. News.

Chao, S. (2018, July 26). MIT professors spearhead petition in support of Microsoft employees protesting contract with ICE. The Tech.

DeWind, A. (2019, November 12). CU Denver stands with DACA students and families. CU Denver News.

Eagle, J. H. (2019, March 05). I want them to know we suffer here: preserving records of migrant detention in opposition to racialized immigration enforcement structures. Journal of Radical Librarianship, 5, 16-40.

Evans, L. (2019, September 19). A new look and new services for the Auraria Library. Metropolitan State University Early Bird.

Gallegos, M., Garcia, A. J., & Valdez, D. (2011). Where the rivers meet: the story of Auraria, Colorado. Su Teatro, Inc.

Garcia, G. A. (2019). Becoming Hispanic-serving institutions: Opportunities for colleges and Universities. Johns Hopkins University Press. 

Guo, M. & Baugh, R. (2019, October). Annual Flow Report Immigration Enforcement Actions: 2018. Department of Homeland Security Office of Immigration Statistics.

Harvard Kennedy School. (n.d.). Corporate responsibility initiative about. Retrieved February 12, 2020, from

Hernandez, E. (2020, June 17). CU to re-examine buying furniture made with prison labor after petition from students, faculty. Denver Post.

Krogstad, J.M., Passel, J.S., & Cohn D. (2019, June 12). 5 Facts about illegal immigration in the U.S. Pew Research.

Lamdan, S. (2019). When Westlaw fuels ICE surveillance: Legal ethics in the era of big data policing. New York University Review of Law & Social Change, 43(2), 255.

Langford, K. (2019). CU system signals support for DACA. Daily Camera.

Latapí Agudelo, M. A., Jóhannsdóttir, L., & Davídsdóttir, B. (2019). A literature review of the history and evolution of corporate social responsibility. International Journal of Corporate Social Responsibility, 4(1), 1-23. https://doi.org/10.1186/s40991-018-0039-y

Leung, K. E. (2018). Prison labor as a lawful form of race discrimination. Harvard Civil Rights-Civil Liberties Law Review, 53(2), 681.

Liptak, A. & Shear, M.D. (2020, June 18). Trump Can’t Immediately End DACA, Supreme Court Rules. The New York Times.

Lohr, S. (2018, February 09). Facial recognition is accurate, if you’re a White guy. The New York Times.

McLean, D. (2019, November 05). Student activists are pushing back against immigration policy. For some, it’s personal. The Chronicle of Higher Education.

Metropolitan State University of Denver. (n.d.). ASSET and applying for financial aid. Retrieved February 12, 2020, from  

Metropolitan State University of Denver. (2017). Purchasing manual. 

Moore, K.B. (2020, July 31). Corporate Social Responsibility: Consumers Will Remember Companies That Led In 2020. Forbes.

Passel, J. S., & Cohn, D. (2019, March 11). 20 metro areas are home to six-in-ten unauthorized immigrants in U.S. Pew Research Center.

Phare, C. (2019, May 13). New Colorado law extends state financial aid to dreamers. RED.

Presidents’ Alliance on Higher Education and Immigration. (2019, October 7). 165 universities and Colleges file amicus brief urging Supreme Court to protect DACA.

Proulx, N. (2018). Do companies have a responsibility to contribute positively to society? The New York Times.

Rael, A. (2019, September). Let’s talk: Auraria displacement. [Presentation]. University of Colorado Ethnic Studies Program, Plaza Building 102L, Auraria Campus, Denver, CO, United States.

Redden, E. (2021, January 21). Biden makes immigration day 1 priority. Inside Higher Ed.

Romano, B. (2019, March 8). Activist shareholders push Amazon from everything from facial recognition to climate change. Seattle Times.

Shaban, H. (2018). Amazon employees demand company cut ties with ICE. The Washington Post.

Shahani, A. (2019, August 20). Employees demand Google publicly commit to not work with ICE. National Public Radio.

Snow, J. (2018, July 26). Amazon’s face recognition falsely matched 28 members of Congress with mugshots. ACLU.

Spitznagel, E. (2019, July 13). Inside the hellish workday of an Amazon warehouse employee. New York Post.

State Support for Higher Education per Full-Time Equivalent Student – map view (2019): State Indicators: NSF – National Science Foundation.

Swauger, S. [@SheaSwauger]. (2019, December 13). The fact that they misgendered me in the letter even though they’re sending a report with my gender information [Tweet]. Twitter.

Swauger, S. [@SheaSwauger]. (2019, December 13). It also had wrong information. I’ve been divorced for over 2 years, and the report had 13 different data points [Tweet]. Twitter.

Thomson Reuters. (n.d.). Legal notices public records privacy statement. Retrieved February 14, 2020, from

Tims, A. (2019, April 14). Fines and frantic life on the road- the lot of Amazon’s harried staff. The Guardian.

Trotter, G. (2017, December 08). More companies find spending on corporate responsibility increases the bottom line. Chicago Tribune.

Villanova University. (2020, January 23). Types of corporate social responsibility programs and career options. Retrieved February 14, 2020, from

Watson, M. (2019). Defending DACA in Denver and D.C. RED. 

Western Interstate Commission for Higher Education (WICHE). (2016). Knocking at the college door: Projections of high school graduates through 2032.

Williams, M. (2020, Winter). The trouble with facial recognition. ACLU Magazine.

Appendix A

Dear Vendor, 

Auraria Library would like to invite you to partner with us on a vendor assessment pilot project. Inspired by the work of MIT and the University of California, we are beginning a project to go beyond cost-per-use assessment of our learning materials and explore values-based metrics. This kind of assessment will help Auraria Library continue to demonstrate excellent financial stewardship while also ensuring that the social and environmental performances of our vendors align with our institutions’ values. 

In order to complete this project, we have selected the EcoVadis Corporate Social Responsibility (CSR) monitoring platform. The University of California also uses EcoVadis to assess the vendors with whom they work. The EcoVadis platform combines CSR assessment expertise and data management tools which will allow you to demonstrate your best practices in areas of sustainability, diversity, and ethics.

Several vendors we already work with have completed this process. We have chosen you for this pilot project because we highly value the content you provide to our patrons. As a large academic library, we believe that by taking part in this assessment you are signalling to your customers that you care about social responsibility and sustainability. 

The EcoVadis CSR monitoring platform is co-financed by Auraria Library but also requires vendors to pay an annual subscription fee. This scorecard will be available to other institutions that work with EcoVadis. You will soon receive an invitation from EcoVadis to activate your account. Upon registration, the first stage will be to complete a CSR performance assessment. 

We thank you in advance for your time and willingness to embark on this exciting pilot project. 

Best Regards,

Appendix B 

Example 1

Licensor shall comply with the Americans with Disabilities Act (ADA), by supporting assistive software or devices such as large print interfaces, text-to-speech output, voice-activated input, refreshable braille displays, and alternate keyboard or pointer interfaces, in a manner consistent with the Web Accessibility Initiative Web Content Accessibility Guidelines 2.0 AA. Licensor shall ensure that product maintenance and upgrades are implemented in a manner that does not compromise product accessibility. Licensor shall provide to Licensee a current, accurate completed Voluntary Product Accessibility Template (VPAT) to demonstrate compliance with accessibility standards. If the product does not comply, the Licensor shall adapt the Licensed Materials in a timely manner and at no cost to the Licensee in order to comply with applicable law.

Source: Big Ten Academic Alliance standardized accessibility language

Example 2

Licensor shall comply with the Americans with Disabilities Act (ADA), by supporting assistive software or devices such as large print interfaces, text-to-speech output, voice-activated input, refreshable braille displays, and alternate keyboard or pointer interfaces, in a manner consistent with the Web Accessibility Initiative Web Content Accessibility Guidelines 2.0.

Source: “Soft” privacy clause modified from Liblicense

Example 3

The university affords equal opportunity to individuals in its employment, services, programs and activities in accordance with federal and state laws. This includes effective communication and access to electronic and information communication technology resources for individuals with disabilities. [Supplier] shall: (1) deliver all applicable services and products in reasonable compliance with applicable university standards (for example, Web Content Accessibility Guidelines 2.0, Level AA or Section 508 Standards for Electronic and Information Technology as applicable); (2) upon request, provide the university with its accessibility testing results and written documentation verifying accessibility; (3) promptly respond to and resolve accessibility complaints; and (4) indemnify and hold the university harmless in the event of claims arising from inaccessibility.

Source: University of Colorado Boulder’s mandated language

Appendix C

VET MASTER Template 


Subscription Period: 

This profile last updated: 


Parent Company: 

☐Public Company or ☐Private Company

☐For Profit Company or ☐Non-Profit Company

VET Metric | Summarized/Highlighted Findings | Links/Shared Drive Paths | Take Note!
Data Privacy | | |

Take part in EU Open Data Days, an event focused on the benefits of open data and its reuse in the EU / Open Knowledge Foundation

Open Knowledge Foundation are partnering with the Publications Office of the European Union for EU Open Data Days, an event to bring the benefits of open data and its reuse to the EU public sector. Below you can find details about the event in a press release republished from the Publications Office.

Participate in the first edition of the EU Open Data Days from 23-25 November 2021. This unique event will serve as a knowledge hub, bringing the benefits of open data to the EU public sector, and through it to people and businesses.

This fully online event will start with EU DataViz 2021, a conference on open data and data visualisation, on 23 and 24 November. It will close with the finale of EU Datathon, the annual open data competition, on 25 November.

Speak at EU DataViz 2021

The EU Open Data Days organising team are looking for speakers to help shape a highly relevant conference programme. Are you an expert on open data and/or data visualisation? We encourage you to share your ideas, successful projects and best practices, which can be actionable in the setting of the EU public sector.

We welcome proposals from all over the world, and from all sectors: academia, private entities, journalists, data visualisation freelancers, EU institutions, national public administrations and more. For more information, visit the EU DataViz website.

Submit your proposal for a conference contribution by 21 May 2021 here.

Compete in EU Datathon 2021

Propose your idea for an application built on open data and compete for your share of the prize fund of EUR 99 000. Demonstrate the value of open data and address a challenge related to the European Commission’s priorities.

We welcome ideas from data enthusiasts from all around the world. Check the rules of the competition and, to participate, submit your proposal for an application by 21 May 2021 here.

Follow us for more information

The EU Open Data Days are organised by the Publications Office of the European Union with the support of the ISA2 programme. Find out more on the EU Open Data Days website and follow updates on Twitter @EU_opendata.

Meet our panel of experts for the Net Zero Challenge pitch contest / Open Knowledge Foundation

The Net Zero Challenge is a global competition to answer the following question – how can you advance climate action using open data? Our aim is to identify, promote, support and connect innovative, practical and scalable projects.

Having selected our shortlist of projects competing for the $1,000 USD prize, we have now invited all the teams to pitch their projects to our panel of experts during a live streamed virtual event on Tuesday 13th April 2021 from 15:00 to 16:00 London time. Register now to watch the event.

Our panel of experts hail from four different organisations which are leading players in the field of using open data for climate action:

Mengpin Ge is an Associate with WRI’s Global Climate Program, where she provides analytical and technical support for the Open Climate Network (OCN) and CAIT 2.0 projects. Her work focuses on analysing and communicating national and international climate policies and data to inform climate decision making towards the 2015 climate agreement.


Natalia Carfi is the Interim Executive Director for the Open Data Charter. She previously worked as the Open Government Director for the Undersecretary of Public Innovation and Open Government of Argentina where she coordinated the co-creation of the 3rd Open Government National Action Plan. She was also Open Government coordinator for the Digital Division of the Government of Chile and for the City of Buenos Aires. She is part of the Open Data Leaders Network and the Academic Committee of the International Open Data Conference. Within ODC she’s been leading the open data for climate action work, collaborating with Chile and Uruguay.


Bruno Sanchez-Andrade Nuño is the Principal Scientist at Microsoft “AI for Earth”, building the “Planetary Computer”. He has a PhD in Astrophysics and did a postdoc in rocket science. Bruno has led Big Data innovation at the World Bank Innovation Labs, and served as VP Social Impact at the satellite company Satellogic and as Chief Scientist at Mapbox. He published the book “Impact Science” on the role of science and research for social and environmental impact. He was named a Mirzayan Science Policy Fellow of the US National Academies of Science and a Young Global Leader of the World Economic Forum.


Eleanor Stewart is the Data Protection Officer & Head of Transparency at the Foreign, Commonwealth and Development Office, where she is responsible for driving the institutional change needed within the department to achieve and maintain compliance with GDPR/DPA 2018, for the release of its information, and for supporting the UK Government’s international programmes and objectives in Transparency and Open Data through the Open Government Partnership and other initiatives. She also works to embed digital methodologies and processes in the day-to-day work of a foreign affairs ministry.

Please register here to watch the Net Zero Challenge pitch contest.

This is a virtual event taking place on Tuesday 13th April 2021 from 15:00 to 16:00 London time.


Slack-like tools in the online classroom / Coral Sheldon-Hess

This post serves two purposes: 1) to give other people running online courses (synchronous or asynchronous; semester-long or otherwise) ideas about how they might build community and better support students while also decreasing their email load and 2) to ask if there’s a tool besides Slack that achieves all of the same things.

A Slack-type tool fills in a really important gap in student-student and student-professor communication, both for online (any mode) courses and for courses that only meet once per week. Without something Slack-like, your choices for communication are the learning management system (LMS) discussion boards or email. I think it’s uncontroversial to say we all hate LMS discussion boards. Getting to and posting in the discussion board is time-consuming; there’s a lot of friction there that makes people less likely to ask their questions. And then the notification mechanism is clunky, so they’re really not a good bet for students who need timely help. As for email, I’ll go out on a limb and say that I suspect most professors do not enjoy answering multiple versions of the same question over and over, one by one, especially when they have to choose between knowing in their hearts that some students aren’t asking and won’t know, versus making yet another LMS announcement to address any given issue. Something Slack-like lets you have those public-to-the-whole-course discussions, where multiple people can contribute to and be helped by a single conversation (like a discussion board!), but it’s also fast (like emailing the professor!). And as a bonus, it serves as a lightweight announcement mode (e.g. “I forgot to mention something in class this week” or “here’s a clarification on this homework problem”) that doesn’t flood your students’ inboxes.

Having identified this need early, because so many of my in-person courses were three-hour blocks held one night per week, I’ve been using Slack in my courses since before the pandemic. Students need a bit more contact with each other and with their professor than a one-meeting-per-week structure allows. This is especially true of first-time programmers, and Python 1 is my favorite course and my blueprint for everything else that I teach. And while my focus, here, is online teaching, I think it’s important to say this next bit: even when classes were in-person, students reported that Slack helped them. Both then and now, people who are too shy to seek help can see and benefit from the conversations their peers and I have; and, vitally, seeing other people’s questions helps everyone feel less alone when they run into trouble. I still end up having at least one conversation where I’m talking a student through some combination of impostor syndrome and stereotype threat each semester, but I credit Slack, in part, for the fact that I now lose fewer students to “I’m just not cut out for this” than I used to.

Since all of my courses moved online, Slack has come to feel necessary. My asynchronous students, in particular, tell me that Slack makes the course feel more “real” than their other online-asynchronous classes. They actually get to know some of their colleagues. They get to know me. And they get semi-real-time support, which is incredibly useful in a coding course.

Slack, combined with a reservation system that creates half-hour one-on-one web conferences (Calendly, which links to Zoom), has made it so that my office hours are more pleasant and more useful than they were in Spring 2020, when this next bit hadn’t occurred to me yet. Instead of sitting on Zoom for my mandatory 5 hours of scheduled “office time,” waiting for people to pop in, feeling uncomfortable about walking away from the computer even for a moment, and dealing with the inevitable Poisson distribution of help-seekers (“nobody for 2 hours and then 5 people at once”), I now hold my office hours on Slack, with an option to set up video chat appointments for more in-depth discussion. In addition to students reserving time with Calendly, I’ve also had office hour conversations start out on Slack and move to Zoom. It’s been pretty great.

I did a 25-minute talk about this (and it would not be hard to convince me to record a video version of the talk for the internet), with slides showing well-anonymized examples of student interactions on Slack, so that you can see for yourself how some of this plays out. There’s a whole thread, in those slides, that I’m not addressing here, which is about “flipped classroom”/“lab time,” something I’m just not having the same opportunity to do this semester. (Several of my “synchronous” courses are actually synchronous/asynchronous hybrids, which leaves little to no time for live “labs.” And as for Python, I’m spending more time live-coding with my Pythonistas and giving them less time for “in-class” work time, which may or may not turn out to be the right call.)

A few lessons learned

First of all, there has to be some kind of incentive to the students to get them using it. I use a couple of methods, the most blunt of which is making it part of their “engagement” grade for the semester (the whole “engagement grade” thing would make an interesting post of its own, honestly). Also, if anyone emails me a question that “should” go on Slack, I anonymize and post it in Slack, then reply in-thread, and then my email reply is merely a gentle reminder and a link to the Slack thread. Maybe the number one incentive I offer, though, is being available and helpful in the Slack workspace, even outside of office hours.

Second lesson learned: you really have to make posting in Slack part of the first week’s assignment, to get them over the initial hurdle of (hopefully) installing the app or (at least) logging in with their browser, before the course gets challenging. I have them post a self-intro and reply in-thread to at least one other student’s self-intro. As part of that, you’ll also need to give them good directions and links to the service’s getting started guides (Slack’s are very good). The whole “first you need an account, THEN you need the app” issue does cause some initial difficulties, but everyone eventually gets it.

And my third lesson learned is that, especially if you are as bothered by notifications as I am, or if you just want to have some “non-work” time in your life, you’re going to have to work pretty hard in the first couple of weeks to teach your students some rules about etiquette. My rule is that it’s only acceptable to @-mention me during my office hours, and nobody should ever DM me (and they should not DM anyone else without their explicit permission); private questions go in email. You’d think an etiquette document would do the job, but it does not, even if you quiz them on it and also pin it to the channel. Some people will need reminders occasionally throughout the semester. This is, I’ll be honest, deeply frustrating. But 1) the pros outweigh this con, for me, and 2) I think it’s also doing them a service: hopefully they won’t constantly @-mention or DM their bosses when they start using a Slack-like tool in the office, because they will have learned that professional etiquette is different from social media etiquette.

Slack alternatives?

Slack has honestly worked great for me. If I were confident that there would continue to be a free version, I might just keep using it, despite its not-insignificant downsides: there’s no blocking feature, there’s no way to prevent direct messages, and there are occasional signals that the development team doesn’t understand human beings. But I rely on threading to keep conversations manageable; I rely on locking my workspaces down to college email addresses to help maintain privacy; and I rely on the fact that every Slack workspace is distinct to help students and me (oh goodness, certainly also me) maintain some separation between our professional and social personas. All of this is to say: I’m aware of Discord, yes; and no, it is absolutely not an option. (Though I do like that you can block people and choose “nobody can DM me from this server” as a setting.)

My list of services to look at more closely: Zulip (does it support threading? also, is it free in the cloud, or would I need to convince our IT department to install and manage it? if the latter, it is probably a no-go), Rocket.Chat (same questions), Teams (is there a way to make it not-user-hostile?), maybe some kind of modern IRC frontend?, something else I haven’t heard of?

I would dearly miss threading if I lost it, but if I have to choose between that and the ability to lock down the server only to college email addresses, I’d have to pick the latter. “Free and cloud-based” is probably a requirement, alas. And the interface has to be relatively friendly: I can’t go old-school IRC, because even though I teach in Computer Information Technology and Data Analytics, not all of my students are up to that, certainly not at the outset of the semester.

Anyway, if you have suggestions, or more information about any service I’m currently considering, please leave a comment!

The image in the header photo belongs to Slack; it’s on their distance education page.

Spanish-language round table on next generation metadata: managing researcher identities is what matters most / HangingTogether

Many thanks to Francesc García Grimau, OCLC, for translating this blog post, which was originally written in English.

As part of the OCLC Research Discussion Series on Next Generation Metadata, this blog post reports on the Spanish-language round table discussion held on March 8, 2021.

OCLC metadata discussion series

Librarians (mostly metadata specialists) and representatives from heritage, research, and government institutions, as well as from service and software providers, joined the session from various regions of Spain. With so many prominent stakeholders from the field gathered around the virtual round table, the conversation was dynamic and engaged, and it offered an opportunity for much interaction on a topic that the group considered very important and timely.

The mapping exercise

Map of next generation metadata projects (Spanish session)

As in all the other round table discussions, participants started by taking stock of next generation metadata projects in their region and of initiatives elsewhere. The resulting map was full of sticky notes with names of projects, services, standards, identifier hubs, and more. The upper-left quadrant listed several local and regional authority files. An example of the latter is the cooperative catalog of name authorities of Catalonia (CÀNTIC). The upper-right quadrant was filled with portals, systems, and projects related to Research Information Management (RIM). One example is GREC, the CRIS (Current Research Information System) developed by the University of Barcelona, currently used in various institutions and research organizations. Another is brújulaUAL, the researcher profile service of the University of Almería, which collects all identifiers, publications, citation metrics, and h-indexes of its scholars. The cultural heritage projects filled the lower-left quadrant and overflowed into the lower-right quadrant. Most of them concerned digitization and aggregation efforts, such as: Galiciana, the digital library of Galicia; the Biblioteca Virtual de Prensa Histórica; and Hispana, the national aggregator and portal for Spain’s digitized heritage.

Leveraging local authority files to manage researcher identities

The map sparked a lively discussion. Starting with the upper-left quadrant, several participants explained their efforts to take their local authority files to the next level. One university librarian mentioned plans to publish the name authorities of their researchers as a Linked Open Data (LOD) set, with links to the corresponding names in the authority file of the Biblioteca Nacional de España (BNE). This was one example among many. All the major academic libraries in Spain are currently focusing on leveraging their local authority files: enriching them with identifiers (ORCID, BNE, VIAF, etc.), publishing them as LOD, and also feeding external systems, such as the university’s Research Portal or the ORCID database, with authority and bibliographic data. In doing so, they run into some practical difficulties. One of them (is a given researcher still active or retired?) illustrates the need to integrate systems across campus, in this particular case with the university’s Human Resources system. The group made two important observations about this current trend:

  1. academic libraries are focusing on their own bibliographic and authority file data, which is a good starting point for managing researcher identities, but they are paying very little attention to CRIS projects, which are parallel systems recording similar data;
  2. academic libraries do the same thing, for the same purpose, but they act locally, so their approaches and implementations tend to differ, leading to a range of idiosyncratic solutions across the country.

Prioritizing digitization over the production of next generation metadata

The group was surprised to find so many heritage projects (the lower half of the map) and so few RIM projects (upper-right quadrant). One participant offered the explanation that this might be due to “the way libraries are organized” in Spain, where much more attention is given to the digitization of heritage collections. Others agreed and were of the opinion that the Spanish Government’s funding policy prioritizes the digitization of works on paper over the production of next generation metadata. This has determined the landscape of library projects in the country. In terms of metadata, quite a few cultural heritage projects are building ontologies (also added to the map), but, as one of the participants put it, these are often complex. There are many untapped opportunities to use existing resources and tools such as Wikidata and/or Wikibase, which could also help improve multilingual access to online collections. However, the group also noted that the technology to move from linked data projects to production at scale was not yet mature.

Creating opportunities for a more concerted effort

One of the participants remarked:

“There are lots of projects to keep track of. How many of them involve several institutions? We have little guidance on where we are heading. There are few collaborative projects, and hardly any multilingual efforts.”

This was the signal for the participants to focus on the desirability of moving next generation metadata efforts forward in a more collaborative way. Research portals were seen as an important application area where collaboration was needed. In this context, the group mentioned Dialnet, a large Spanish and Latin American collaborative library initiative that aggregates the metadata of its members’ scholarly collections, including full texts and doctoral theses. It allows the retrieval and enrichment of bibliographic (publications) and bibliometric (citations, coauthors) data. The service is of particular importance to Spain because of the many Spanish and Social Sciences and Humanities publications and authors it contains, compared to SCOPUS, for example. Some of the participants suggested that the service could act as an important linking hub for library data and RIM data in Spain.

Another important initiative is the development of cataloging rules for the production of authority records by the RDA working group of REBIUN (Red de Bibliotecas Universitarias Españolas). The profiles allow and recommend the addition of many relevant persistent identifiers (PIDs) to authority records (ORCID, ISNI, VIAF, BNE, Dialnet ID, Web of Science ResearcherID, SCOPUS ID, Wikidata, etc.) and indicate the best way to do so. The plan is to build a union catalog of all the authority files covering all the scientific authors of Spanish universities and to publish it as an open dataset. Someone observed that the thinking behind this initiative was inspired by Karen Smith-Yoshimura’s report on the transition to next generation metadata, which was nice to hear.

In short, the group definitely saw possibilities for interconnecting their local initiatives and creating more synergies. They looked forward to learning more about OCLC’s Shared Entity Management Infrastructure and expressed the wish that OCLC would help them organize more discussions like this one, to continue analyzing the landscape of next generation metadata projects and the conversation about collaboration in Spain.

About the OCLC Research Discussion Series on Next Generation Metadata

In March 2021, OCLC Research conducted a discussion series focused on two reports:

  1. Transitioning to the Next Generation of Metadata
  2. Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project

The round table discussions were held in different European languages, and participants were able to share their own experiences, gain a better understanding of the topic, and build confidence for future planning.

The opening plenary session opened the floor for discussion and exploration and introduced the subject area and its themes. Summaries of the eight round table discussions are published on the OCLC Research blog, Hanging Together. This post is the fourth, preceded by the posts reporting on the first English session, the Italian session, the second English session, the French session, and the German session.

The closing plenary session on April 13 will synthesize the different round table discussions. Registration is still open for this online event: please join us!

The post Mesa redonda sobre metadatos de próxima generación en español: la gestión de las identidades de los investigadores es lo más importante appeared first on Hanging Together.

Islandorans Unite! It's Release Time / Islandora

dlamb Mon, 03/29/2021 - 19:07

It's that time again everyone!  Our amazing community contributors have made all sorts of improvements and upgrades to Islandora.  Some have been merged, but some are still hanging out, waiting for the love they need to make it into the code base.  We're calling on you - yes you! - to help us get things merged, tested, documented, and released to the world.

I would like to kick off this release cycle with a sprint to mop up some of the amazing improvements that have unmerged pull requests.  Did you know that we have pull requests for an advanced search module and a basic batch ingest form just lounging around?  And that's not all.  There are all kinds of great improvements that just need some time and attention. A little code review and some basic testing by others are all that is needed before we freeze the code and start turning the crank on the release process.

Here's a rough timetable for the release:

  • April 19 - 30th: Code Sprint
  • May 3rd: Code Freeze
  • May 3rd - 14th: Testing, bug fixing, responding to feedback
  • May 17th - 28th: Documentation sprint
  • May 31st - June 18th: More testing, bug fixing, and responding to feedback
  • June 21st - July 2nd: Testing sprint
  • Release!

This is, of course, an optimistic plan.  If major issues are discovered, we will take the time to address them, which can affect the timeline.  I also plan on liaising with the Documentation Interest Group and folks from the Users' Call / Open Meetings for the documentation and testing sprints, and their availability may nudge things a week in either direction.

An open and transparent release process is one of the hallmarks of our amazing community. If you or your organization have any interest in helping out, please feel free to reach out or sign up for any of the upcoming sprints.  There are plenty of opportunities to contribute regardless of your skill set or level of experience with Islandora.  There's something for everyone!

We'll make further announcements for the other sprints, but you can sign up for the code sprint now using our sign up sheet.  Hope to see you there!


Making strategic choices about library collaboration in RDM / HangingTogether

Academic libraries are responding to a host of disruptions – emerging technologies, changing user expectations, evolving research and learning practices, economic pressures, and of course, the COVID-19 pandemic. While these disruptions create challenges, they can also present opportunities to support research and learning in new ways.

Pursuing these opportunities often obliges libraries to invest in new capacities to support evolving roles and shifting priorities. A case in point is research data management (RDM). Securing the long-term persistence of research data, along with enabling its discovery, accessibility, and re-use, is now widely acknowledged as responsible scholarly practice across many disciplines. In response, many academic libraries are investing heavily to acquire RDM capacities as part of a broader effort to support an evolving scholarly record and growing interest in open science.

The recent OCLC Research project Realities of RDM defined a general RDM service space (see figure) and found that academic libraries have adopted a range of strategies for sourcing capacity within this space, including building in-house, contracting with external providers, and collaborating with other institutions around shared programs, expertise, services, and infrastructure.

Examining multi-institutional collaboration in RDM

Among the various sourcing options, multi-institutional collaboration – collective action to meet mutual needs – can be an inviting, or even preferred choice. A number of factors motivate this, ranging from the practical – e.g., recognition that prospective investments are beyond the means of a single institution – to the principled – e.g., arguments that universities should collaboratively own and manage scholarly infrastructure currently controlled by commercial enterprises.

But the decision to source RDM capacity through multi-institutional collaborative arrangements is a strategic choice that must be evaluated on a case-by-case basis. Collaboration is a sourcing option that entails both benefits and costs, and different libraries will make different choices. Our OCLC colleague Lorcan Dempsey recently observed that:

“it is not simply more collaboration that is needed – it is a strategic view of collaboration … There should be active, informed decision-making about what needs to be done locally and what would benefit from stronger coordination or consolidation within collaborative organizations.”

This observation is the starting point for a new OCLC Research project, Library Collaboration in RDM, in which we will examine what “active, informed decision-making” means in the context of choosing to source RDM capacities collaboratively across institutions. What circumstances suggest that cross-institutional collaboration is a better choice for RDM than other sourcing options? RDM is an excellent context for exploring these questions, in that it is both an emerging area of strategic interest to academic libraries, and one in which the decision to collaborate is highly relevant. Our goal in conducting this project is to help academic libraries make intentional, strategic choices about cross-institutional collaboration in RDM.

The project will consist of two phases.

  • In the first, we will explore relevant academic literatures such as economics, political science, and organizational theory to see what they have to say about the factors and decision points that figure most prominently in the decision to source capacity through collaborative arrangements. In reviewing these literatures and synthesizing their major findings, we hope to build a framework of “strategic principles” guiding the decision to collaborate: the key elements to consider when evaluating the pros and cons of the collaboration option. Our inspiration here again comes from Lorcan, who has noted that “there is a large literature on organizations across several disciplines which would provide a valuable context for the examination of collaborative library work.”
  • With these strategic principles in hand, the second phase of the project will explore how they play out in real-world decision-making about multi-institutional collaboration in RDM. We will conduct a set of interview-based case studies in which we will use the strategic principles as an analytical frame, taking a “deep dive” into how they shape the process of evaluating the collaboration option. Combining the results from both phases of the work, we will then work out some general recommendations for academic libraries as they consider which RDM activities are usefully pursued through cross-institutional collaborative arrangements, and which are best accomplished through other sourcing options. Although our recommendations will be developed in the context of RDM decision-making, we expect that they will also inform any situation where collaboration is being considered as a sourcing option. 

The Library Collaboration in RDM project benefits from the deep expertise of an international advisory committee:

  • Donna Bourne-Tyson, Dean of Libraries, Dalhousie University
  • David Groenewegen, Director, Research, Monash University Library
  • David Minor, Program Director for Research Data Curation, University of California San Diego Library
  • Judy Ruttenberg, Senior Director of Scholarship and Policy, Association of Research Libraries
  • Michael Witt, Interim Associate Dean for Research, Purdue University Libraries and School of Information Studies
  • Maurice York, Director of Library Initiatives, Big Ten Academic Alliance

We are grateful for the opportunity to consult with this group as the research moves forward.

Why is this work important?

Library collaboration has always been an important topic, but never more so than today. Advances in digital and network technologies have amplified the benefits and lowered the costs of collaborations that lift capacity above the scale of a single institution, at the same time that economic pressures have called into question the feasibility of duplicating capacity across institutions at sub-optimal scales. The onset of the COVID-19 pandemic has introduced new uncertainties into the mix, and collaboration may be seen as an opportunity to blunt the impact of risk and economic burdens by spreading them across multiple institutions. As interest in the collaboration option grows, it becomes correspondingly more important for academic libraries to be purposeful and strategic in their collaboration choices.    

Stay tuned for more information on this project as it unfolds. Please get in touch with me or my colleague Chris Cyr with any questions; we’d be happy to hear from you!

The post Making strategic choices about library collaboration in RDM appeared first on Hanging Together.

Spanish round table on next generation metadata: managing researcher identities is top of mind / HangingTogether

As part of the OCLC Research Discussion Series on Next Generation Metadata, this blog post reports back from the Spanish language round table discussion held on March 8, 2021. (A Spanish translation is available here).

OCLC metadata discussion series

Librarians – mostly metadata specialists – and representatives from heritage, research, and government institutions, as well as from service and software providers, joined the session from various regions in Spain. With so many prominent stakeholders from the field around the table, the conversation was engaged and dynamic and offered an opportunity for much interaction on a topic that was considered very important and timely by the group.

The mapping exercise

As in all the other round table discussions, participants started with taking stock of next generation metadata projects in their region or initiatives elsewhere. The resulting map was chock-full of sticky notes with names of projects, services, standards, identifier hubs, and more. The left-upper quadrant listed several local and regional authority files.

Map of next-gen metadata projects (Spanish session)

An example of the latter is the cooperative catalog of name authorities from Cataluña (CÀNTIC). The right-upper quadrant was filled with Research Information Management (RIM) related portals, systems, and projects. One example is GREC, the CRIS (Current Research Information System) developed by the University of Barcelona, currently used in various institutions and research organizations. Another is brújulaUAL, the researcher profile service of the University of Almería, which collects all identifiers, publications, citation metrics, and h-indexes of its scholars. The cultural heritage related projects filled the left-lower quadrant and overflowed to the lower-right quadrant. Most of them referred to digitization and aggregation efforts, such as: Galiciana, the digital library of Galicia; Biblioteca Virtual de Prensa Histórica, the Virtual Library of Historical Newspapers; and Hispana, the national aggregator and portal for Spain’s digitized heritage.

Leveraging the local authority file to manage researcher identities

The map sparked off a lively discussion. Starting with the upper-left quadrant, several participants explained their efforts to bring their local authority file to the next level. One university librarian mentioned plans to publish the name authorities of their researchers as a Linked Open Data (LOD) set, with links to the corresponding names from the authority file of the Biblioteca Nacional de España (BNE). This was one example among many. All the major academic libraries in Spain are currently focusing on leveraging their local authority file: enriching it with identifiers (ORCID, BNE-ids, VIAF, etc.), publishing it as LOD, and also feeding external systems, such as the university’s Research Portal or the ORCID database, with authority and bibliographic data. In doing so, they are encountering some practical difficulties. One of them (is a given scholar still active or retired?) illustrates the need to integrate systems across campus, in this particular case with the university’s Human Resources system. The group made two important observations about this current trend:

1) academic libraries are focusing on their own authority file and bibliographic data – which is a good starting point for managing researcher identities, but they are paying very little attention to CRIS-projects – which are parallel systems recording similar data;

2) academic libraries do the same thing, for the same purpose, but they act locally and so their approaches and implementations tend to differ, leading to a range of idiosyncratic solutions across the country.
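
To make the idea of linking a local authority record to external identifiers concrete, here is a minimal sketch in shell that writes such links as N-Triples statements using owl:sameAs. It is an illustration only, not any participant's implementation: the helper name `same_as_triple` and every URI in the example are invented placeholders.

```shell
#!/bin/sh
# Emit one N-Triples statement declaring that a local authority URI and an
# external identifier URI denote the same person (owl:sameAs).
# $1 = local URI, $2 = external URI
same_as_triple() {
  printf '<%s> <http://www.w3.org/2002/07/owl#sameAs> <%s> .\n' "$1" "$2"
}

# All URIs below are invented placeholders, not real records.
LOCAL="https://authorities.example.edu/person/0001"
same_as_triple "$LOCAL" "https://viaf.org/viaf/000000000"
same_as_triple "$LOCAL" "https://orcid.org/0000-0000-0000-0000"
```

If libraries agreed on one such pattern, the locally produced links would be easier to combine across institutions, which is the concern raised in the second observation.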

Privileging digitization above the production of next generation metadata

The group was surprised to find so many heritage collection projects (lower half of the map) and so few RIM-projects (upper-right quadrant). One participant offered the explanation that this might be due to “the way libraries are organized” in Spain, where much more attention is given to the digitization of heritage collections. Others agreed and were of the opinion that the Spanish Government funding policy is privileging the digitization of paper above the production of next generation metadata. This has determined the landscape of library projects in the country. In terms of metadata, quite a few cultural heritage projects are building ontologies (also added to the map), but – as one of the participants put it – these are often complex. There are many untapped opportunities to use existing resources and tools such as Wikidata and/or Wikibase – which could also help improve multilingual access to collections online. However, the group also noted that the technology to move from linked data projects to production at scale was not yet mature.

Creating opportunity for more concerted effort

One of the participants remarked:

“There are lots of projects to keep track of. How many of them are institutional? We have little guidance for where we are going. There are few collaborative projects, and hardly any multilingual efforts.”

This was the signal for the participants to focus on the desirability of moving their next generation metadata efforts forward in a more concerted way. The research portals were seen as an important application area where collaboration was needed. In this context, the group mentioned Dialnet, a large Spanish-Latin American library cooperative aggregating the metadata of the scholarly collections of its members, including full-texts and doctoral theses. It allows the retrieval and enrichment of bibliographic (publications) and bibliometric (citations, coauthors) data. The service is of particular importance to Spain because it contains many more Spanish and Social Sciences and Humanities publications and authors than SCOPUS, for example. Some of the participants suggested this service could serve as an important link between library data and RIM-data in Spain.

Another important initiative is the development of cataloging rules for producing authority files by the RDA working group of REBIUN (the Network of Spanish University Libraries). The rules allow and recommend the addition of many relevant PIDs to the authority files (ORCID, ISNI, VIAF, BNE-ids, Dialnet-ids, Web of Science ResearcherID, SCOPUS-ids, Wikidata, etc.) and indicate the best way to do this. The plan is to build a union catalog of all the authority files covering all scientific authors at Spanish universities and publish it as an open dataset. Someone observed that the thinking behind this initiative was inspired by Karen Smith-Yoshimura’s report Transitioning to the Next Generation of Metadata, which was nice to hear.

To sum up, the group definitely saw possibilities to interconnect their local initiatives and to create more synergies. They hoped to learn more about OCLC’s Shared Entity Management Infrastructure and expressed the wish to see OCLC help them organize more discussions like this one, to continue the analysis of the next generation metadata projects landscape and the conversation on collaboration in Spain.

About the OCLC Research Discussion Series on Next Generation Metadata  

In March 2021, OCLC Research conducted a discussion series focused on two reports: 

  1. “Transitioning to the Next Generation of Metadata”
  2. “Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project”.

The round table discussions were held in different European languages and participants were able to share their own experiences, get a better understanding of the topic area, and gain confidence in planning ahead.

The Opening Plenary Session opened the forum for discussion and exploration and introduced the theme and its topics. Summaries of all eight round table discussions are published on the OCLC Research blog, Hanging Together. This post is the sixth one, preceded by the posts reporting on the first English session, the Italian session, the second English session, the French session and the German session.

The Closing Plenary Session on April 13 will synthesize the different round table discussions. Registration is still open for this webinar: please join us.

The post Spanish round table on next generation metadata: managing researcher identities is top of mind appeared first on Hanging Together.

French round table on next-generation metadata: the challenge is to manage multiple scales in concert / HangingTogether

Thanks to Arnaud Delivet, OCLC, for translating the original English article.

This blog post reports on the French-language round table held by OCLC Research on March 3, 2021, as part of its Discussion Series on Next Generation Metadata.

OCLC metadata discussion series

The participants, with backgrounds in bibliographic control, cataloging services, research information management (RIM), archives and heritage collections, joined the session from France, Belgium, Germany and Italy, forming a very diverse group. As in all the other round tables, participants began by taking stock of the next-generation metadata projects under way in their region.

The mapping exercise

Map of next-generation metadata projects (French session)

The resulting map gives a good picture of what was top of mind for the group. It highlights the French National Entities File (Fichier National d’Entités, FNE), the flagship project of the national Bibliographic Transition Program, led jointly by the Bibliothèque nationale de France (BnF) and the Agence bibliographique de l’enseignement supérieur (Abes). A group of projects belonging to the same program occupies the upper-left quadrant of the map. An interesting cluster of labels appeared around the center of the map, all relating to persistent identifiers, showing the importance attached to them and reflecting their areas of application (bibliographic data; research information management (RIM) and scholarly communications data; heritage data). By comparison, relatively few projects appear in the RIM or heritage quadrants.

Potential for cross-pollination

The group then shared information about the main projects on the map. One participant explained how the two main French bibliographic agencies, the BnF and Abes, came to collaborate in response to a less strict boundary between metadata production and use workflows. This results from the French open government data policy, which encourages the publication and reuse of such data. As a result, the two agencies decided to pool their data and to co-produce bibliographic data in the future, so as to benefit from centralization and standardization and gain efficiency in production and publication. The FNE is the central database in which the two organizations will co-create entities, and it is considered the first large-scale implementation of the IFLA Library Reference Model. Other French libraries and archives will eventually be able to contribute to this knowledge base so that it becomes a truly national undertaking.

Another participant clarified the state of the French RIM landscape and described two infrastructure initiatives that collect metadata to support the monitoring and evaluation of scientific research:

  1. CapLab, a cloud-based system deployed by Amue (Agence de Mutualisation des Universités et Établissements) to evaluate research projects.
  2. HALliance, a major investment project led by the CCSD (Centre pour la Communication Scientifique Directe) that aims to develop the next generation of the HAL open archive, with a centralized approach to research analysis and evaluation.

There seems to be little cross-pollination between national initiatives on research information management data and those on bibliographic data in France, and it was observed that bridging this gap could open up new opportunities. euroCRIS, the international organization for the development of research information systems (CRIS), which also maintains the CERIF data model, brings together different stakeholders from the RIM community and could be a route to further collaborative exploration. And that was, of course, exactly the purpose of this session: to share, connect and bring together.

Transcending the institutional level

The Belgian landscape was described as very different from the French one, even though the goal of opening up collections on the web of data is the same. There is less centralization, and national institutions such as the State Archives or the Royal Library have tended to pursue their own traditional authority work. However, it was also noted that the Royal Library has taken on the role of ISNI registration agency for authors in Belgium and plans to publish a national authority file for author names with their ISNI, VIAF and other identifiers. This effort will be extended further through collaboration with other Belgian libraries.

Moving to a higher level of metadata management was described as problematic for individual university libraries, for several reasons. First, author name authorities need to be harmonized across different systems, such as the institutional repository, the catalog and the digital collections. However, setting up a centralized local authority file raises synchronization problems of its own. Added to this is the difficulty of choosing among the many existing author identifiers and deciding which are relevant to your collections. A practical and attractive way to overcome these problems is to connect to one of the large international authority files, such as the Library of Congress Name Authority File (NAF), which are enriched with all the relevant identifiers and updated automatically.

The challenge of managing different scales

From the French perspective, managing next-generation metadata at the national scale makes sense for several reasons. First, because of the need to respect French bibliographic principles and missions. More specifically, the BnF and Abes are committed to the distinction between the “work” and the “expression”. This distinction is less clear-cut in the Anglo-American bibliographic tradition. In addition, in the large international knowledge bases, the definitions of concepts relating to the identity of persons (for example, “public identity”) are still evolving. Second, France is at the forefront of authority management. Links between authorities and bibliographic records are based not on character strings but on identifiers. The move to the paradigm of interconnected entities is therefore relatively easy to make. Group members made clear that investing in a bibliographic infrastructure at the national scale did not mean the data were siloed. On the contrary, the data will be interoperable, accessible and reusable internationally. That is what international standards are for. Ultimately, the national scale is the best place to start because, as one participant put it:

“The global can only be fed by the local, in terms of culture, in terms of consolidating the entity (…) we have the responsibility, the knowledge and the know-how at the local level to manage the entities that are French authors (…), we are best qualified to identify and consolidate these entities that require local knowledge.”

In the RIM domain, the national scale is also unavoidable, for French historical and cultural reasons. However, because of the nature of scientific research, two other scales were described as equally important: the European scale and that of the disciplinary community. The former is driven by open science policies and “a nebula of EU-funded projects and working groups that are shaping open research data infrastructures”. The latter is characterized by extremely specialized metadata practices. All these different scales must be managed in concert, which is a real challenge.

This was a revealing conclusion to the round table, reminding everyone that there are multiple coexisting “right scales” for next-generation metadata, which need to be interconnected and managed as a whole.

About the OCLC Research Discussion Series on Next Generation Metadata

In March 2021, OCLC Research conducted a discussion series based on two reports:

  1. “Transitioning to the Next Generation of Metadata”
  2. “Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project”.

The round tables were held in different languages, and participants were able to share their own experiences, better understand the topic and prepare for the future with confidence.

The Opening Plenary Session launched the forum for discussion and exploration and introduced the theme and its topics. Summaries of all eight round tables are published on the OCLC Research blog, Hanging Together. This post is the fourth in the series and follows those on the first English session, the Italian session and the second English session.

The Closing Plenary Session on April 13 will synthesize the different round tables. Registration is still open for this webinar: please join us!

The post Table ronde française sur les métadonnées de nouvelle génération: le défi consiste à gérer de concert de multiples échelles appeared first on Hanging Together.

How to write a static site generator in 30 lines or less / Hugh Rundle

This weekend I played around with Gemini. I became aware of Gemini in the last six months or so, via various techie people talking about it on Mastodon. Even though the tech stack for Mastodon could barely be further from what Gemini is trying to do, there is a lot of overlap in the interests of people using each technology. I'm very attracted to the forced simplicity of Gemini: it has a lot in common with concepts I've written about before, like Brutalist web design, decentralisation, the Hundred Rabbits philosophy and even to an extent some of the things that drive the Rust community - it's not surprising that a lot of the Gemini server software is written in Rust (or that Hundred Rabbits have a gemini site).

I found Gemini a little confusing at first, since it's more a "souped up gopher" than a slimmed-down HTTPS, and gopher is something my mother once used to access files over the university network, rather than something I ever remember using myself. But with a few hours reading and tinkering I got a site up and running, where I'll post notes about what I'm reading and thinking, in the spirit of the Classic Era of blogging. The gemtext syntax is disarmingly simple, and significantly simpler and cleaner than HTML. Essentially it's a very tight and rigid subset of Markdown with one exception: the way URLs are formatted. This means that the gemini file format is extremely close to plain text.
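
To illustrate that one exception: Markdown writes a link inline as `[label](url)`, while gemtext gives each link its own line, starting with `=>`, then the URL, then an optional label. A minimal sketch in shell follows; the helper name `gemtext_link` and the file name and title in the example are made up for illustration.

```shell
#!/bin/sh
# Print one gemtext link line: "=>", the URL, then an optional label.
# $1 = URL or relative path, remaining arguments = label words
gemtext_link() {
  url="$1"
  shift
  if [ "$#" -gt 0 ]; then
    printf '=> %s %s\n' "$url" "$*"
  else
    printf '=> %s\n' "$url"
  fi
}

# Hypothetical note file and title
gemtext_link 2021-04-03.gmi "2021-04-03 (A hypothetical note title)"
```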

Even so, I don't really want to futz around manually adding a new listing to an index page every time I write a note in a new file. There's no need to transpile formatting like I do for this blog (writing in markdown and publishing in HTML), but the whole point of Gemini for me is to focus exclusively on writing prose rather than code or markup, so I wanted the index taken care of automatically.

Fortuitously, just as I was thinking about this, Ed Summers published a nice little note about a shell script he uses for journaling. Ed's post inspired me to create my own tiny "static site generator" for Gemini:


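#!/bin/sh
# NOTE: DIR, REMOTE and FILE were not defined in this listing; the values
# below are placeholder assumptions, not the author's actual settings.
DIR="$HOME/gemini/notes"            # hypothetical local capsule directory
REMOTE="user@example.org:capsule/"  # hypothetical rsync destination
DATE=`date +"%Y-%m-%d"`
YEAR=`date +"%Y"`
FILE="$DIR/$YEAR/$DATE.gmi"         # assumed: one note per day, filed by year
LISTING="\n=> $DATE.gmi $DATE ($*)"

# push to server if run with argument `up`
if [ "$1" = "up" ]; then
  rsync -aqz "$DIR" "$REMOTE"
  echo "🚀 capsule launched!"
else
  if [ ! -f "$FILE" ]; then
    if [ ! -f "$DIR/$YEAR/index.gmi" ]; then
      # make new directory and index page for current year if there isn't one
      mkdir -p "$DIR/$YEAR"
      printf '# %s Notes\n\n' "$YEAR" > "$DIR/$YEAR/index.gmi"
    fi
    # Create new file with title as header
    printf '# %s\n\n' "$*" > "$FILE"
    # add new file to the top of the index listing for current year
    # (`sed -i ""` is the BSD/macOS form; GNU sed takes plain `-i`)
    sed -i "" "2 s/^/$LISTING/" "$DIR/$YEAR/index.gmi"
  fi
  # open today's note in VS Code, whether it is new or already existed
  code "$FILE"
fi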

I saved this as an executable called "notes", so now I can run:

notes Title of today's post

And that will open a new file named with today's date, with a heading, "Title of today's post". It will also add a link to that page with that title, on the index page for the year's posts. If it's the first post for the year, it will create a new directory and index page for that year before saving the new file. If I already wrote a note today, instead of writing over the top of it, the script simply opens the file in VSCode.

Once I've written my note, I just run notes up and the site is updated 🚀. Thanks for the inspiration, Ed!

$ j / Ed Summers

You may have noticed that I try to use this static website as a journal. But, you know, not everything I want to write down is really ready (or appropriate) to put here.

Some of these things end up in actual physical notebooks–there’s no beating the tactile experience of writing on paper for some kind of thinking.

But I also spend a lot of time on my laptop, and at the command line in some form or another. So I have a directory of time stamped Markdown files stored on Dropbox, for example:


Sometimes these notes migrate into a blog post or some other writing I’m doing. I used this technique quite a bit when writing my dissertation when I wanted to jot down things on my phone when an idea arrived.

I’ve tried a few different apps for editing Markdown on my phone, but have mostly settled on iA Writer, which just gets out of the way.

But when editing on my laptop I tend to use my favorite text editor Vim with the vim-pencil plugin for making Markdown fun and easy. If Vim isn’t your thing and you use another text editor keep reading since this will work for you too.

The only trick to this method of journaling is that I just need to open the right file. With command completion on the command line this isn’t so much of a chore. But it does take a moment to remember the date, and craft the right path.

Today while reflecting on how nice it is to still be using Unix, it occurred to me that I could create a little shell script to open my journal for that day (or a previous day). So I put this little file j in my PATH:
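
The script itself is not shown here. As a rough idea of what such a helper might look like (this is not Ed's actual script), here is a minimal sketch. The journal location, the `YYYY-MM-DD.md` file naming, the `JOURNAL_DIR` variable and the `journal_path` function are all assumptions made for illustration.

```shell
#!/bin/sh
# Print the path of the journal entry for a date (default: today).
# The directory and the YYYY-MM-DD.md naming are assumptions for this sketch.
journal_path() {
  printf '%s/%s.md\n' "${JOURNAL_DIR:-$HOME/Dropbox/Journal}" "${1:-$(date +%Y-%m-%d)}"
}

# Open the entry in $EDITOR, falling back to vi.
# $1 = optional date, e.g. 2021-04-01, to open a previous day's entry
j() {
  "${EDITOR:-vi}" "$(journal_path "$1")"
}
```

Saved as a file named j on the PATH, the script would end with a line calling `j "$@"`. With a layout like this, typing `j` would open today's entry and `j 2021-04-01` an earlier one, in whatever editor `$EDITOR` names.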

So now when I’m in the middle of something else and want to jot a note in my journal I just type j.

Unix, still crazy after all these years.

Internet Archive Storage / David Rosenthal

The Internet Archive is a remarkable institution, which has become increasingly important during the pandemic. It has been for many years in the world's top 300 Web sites and is currently ranked #209, sustaining almost 60Gb/s outbound bandwidth from its collection of almost half a trillion archived Web pages and much other content. It does this on a budget of under $20M/yr, yet maintains 99.98% availability.

Jonah Edwards, who runs the Core Infrastructure team, gave a presentation on the Internet Archive's storage infrastructure to the Archive's staff. Below the fold, some details and commentary.

Among the highlights:
  • 750 servers, some up to 9 years old
  • 1,300 VMs
  • 30K storage devices
  • >20K spinning disks (in paired storage), a mix of 4, 8, 12 and 16TB drives; about 40% of the bytes are on 16TB drives
  • almost 200PB of raw storage
  • growing the size of the archive >25%/yr.
  • adding 10-12PB of raw storage per quarter
  • with 16TB drives it would need 15 racks to hold a copy
  • currently running ~75 racks
  • currently serving about 55GB/s, planning for ~80GB/s soon
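
As a quick back-of-the-envelope check on the storage figures above, the sketch below turns the quarterly additions into a yearly number and compares it with current raw capacity. It rounds "almost 200PB" up to 200PB, which is an assumption made for simplicity; the function names are hypothetical.

```shell
#!/bin/sh
# Raw storage added per year, given the amount added per quarter (in PB).
annual_pb() {
  echo $(( $1 * 4 ))
}

# Yearly additions as a whole-number percentage of current raw capacity.
# $1 = PB added per quarter, $2 = current raw capacity in PB.
# Integer division, so the result is rounded down.
growth_pct() {
  echo $(( 100 * $(annual_pb "$1") / $2 ))
}

# 10-12PB per quarter against roughly 200PB of raw storage
echo "low:  $(annual_pb 10)PB/yr, $(growth_pct 10 200)% of raw capacity"
echo "high: $(annual_pb 12)PB/yr, $(growth_pct 12 200)% of raw capacity"
```

Under this rounding, yearly raw additions come to 40-48PB, or about 20-24% of current raw capacity.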
Edwards reports that the primary outage causes are:
  • Fiber cuts
  • power quality issues
  • power outages
Going forward, Edwards is asking whether paired storage is the right model. The current constraints are:
  • items in the archive are directories on disk
  • the basic unit of storage is the disk
  • disks are replicated across datacenters
  • content is served from all (=both?) copies
The big issue with treating disk as the unit of paired storage is that when a disk fails, a new member of the pair has to be created by reading the whole of the good member and writing the whole of the new member. This takes time, during which the good member is under high load and thus likely to suffer a correlated failure. The new member will be at the start of its life so subject to infant mortality, although it is fair to say that drive manufacturers have paid a lot of attention to reducing infant mortality. Edwards reports that the more recent drives are enough faster than the 8TB drives that the risk is manageable, but as the drives get bigger, architectural change will be required to manage this.
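
To give a feel for the time involved, rebuilding a pair member means copying the whole drive, so the time is roughly capacity divided by sustained throughput. The sketch below works this out in shell. The 200MB/s rate is an assumed figure for illustration only, not a number from Edwards' talk, and it is applied to both drive sizes even though the newer drives are reported to be faster.

```shell
#!/bin/sh
# Seconds needed to copy a whole drive end to end.
# $1 = capacity in TB (decimal, 10^12 bytes), $2 = sustained MB/s (10^6 bytes)
rebuild_seconds() {
  echo $(( $1 * 1000000 / $2 ))
}

# 200 MB/s is an assumed sustained rate, not a measured figure.
for tb in 8 16; do
  secs=$(rebuild_seconds "$tb" 200)
  echo "${tb}TB at 200MB/s: ${secs}s (about $(( secs / 3600 )) hours)"
done
```

At a fixed rate the window doubles with capacity: about 11 hours for 8TB and about 22 hours for 16TB under this assumption, all of it time during which the surviving member is under heavy read load.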

Another issue is that the servers in the Archive's racks provide both the storage and the processing needed. The CPUs are getting faster, but not fast enough to keep up with the disks getting denser. More storage per server and per rack also increases the demand for per-rack bandwidth.

German-language round table on next-generation metadata: formats, contexts and gaps / HangingTogether

Many thanks to Petra Löffel, OCLC, for translating this blog post, which was originally written in English.

OCLC metadata discussion series

As part of the discussion series on Next Generation Metadata, this blog post reports on the German round table on the topic, held on March 10, 2021.

Participants from Germany, Switzerland and Hungary represented national libraries, state libraries, university libraries and special libraries. They brought experience in metadata and collection development, open access and automated subject indexing, metadata concepts and entity management: taken together, a good recipe for a lively and varied discussion.


Overview of projects (German-language round table)

As in the other round tables, taking stock of existing projects in the regions was the first step. The resulting overview of projects showed strong activity in the area of bibliographic data, complemented by activity in the sections for research data and scholarly communications, and cultural heritage data.

Formats and contexts

The note “MARC21 –> BIBFRAME” on the overview immediately sparked a discussion about the suitability of data “formats” in different contexts. The group agreed that BIBFRAME is more suitable and more flexible than MARC, but that it also has its weaknesses. To exchange data, there must be agreements (which are actually followed!) on how the standard is used. And bridging different types of data is not one of BIBFRAME’s strengths.

One participant put it this way, in essence:

Separating bibliographic and authority data is no longer appropriate, because in the future all types of data will be part of one single large graph.

Participants saw a need for bridges between different data sources. New platforms must be modular and scalable enough to accommodate the particularities of the various participating institutions in detail. The step from authority data to identity management allows libraries to link different types of data, for example research data with traditional library data. Other libraries create cross-references to other systems, such as the coli-conc project, or enrich their catalog with links to supplementary information from external sources. Users want to find information regardless of where it is or where it comes from. Meaningful links can be created without introducing new rules and without building a complex new infrastructure.

The Hungarian National Library Platform, currently under construction (and also on the overview), focuses on a graph model and stores triples, not MARC data. This way the data are not tied to a particular format and the platform can serve many different needs. At the same time, exchange formats can be generated as needed.

Another relevant project, listed in the research information management quadrant of the overview, is Metagrid, a project that links data from the digital humanities with other data, including authority files such as the GND. However, authority files currently do not contain the comprehensive and detailed information that historians would need. This again underlines the need to build bridges between data sources in order to benefit from work others have already done. We cannot all do everything, one participant cautioned.

Library-specific formats still have their role in certain contexts: national libraries that publish national bibliographies must do so according to a reliable set of rules, even if those very rules may become obsolete in other contexts.

At the same time, library data sits alongside data of a completely different kind. One example is the tax-funded Swiss e-government initiative, the E-government Schweiz portal: all data that are not confidential must be made accessible to all citizens. Library data are then published alongside weather data, for example. The data are published in RDF, so the triples can be reused by any other application, even if it is not yet foreseeable what users will one day do with these data, perhaps in combination with entirely different data. Which is also very exciting!

How can we integrate the unique data and the strengths of libraries into the linked data world?

Automated subject indexing needs language tagging

Another topic that emerged very strongly was automated subject indexing and the requirements it places on the underlying data.

Metadata often have major quality deficits when it comes to machine readability. For example, author information, abstracts and so on are needed in the metadata records to make automated subject indexing possible. This calls for a rethink in how data are handled: what kind of data is needed, how it is stored and how it is typed.

Multilingualism is another major challenge in this context. Current authority data are modeled to have one preferred language. Future authority data must be modeled more flexibly, as in Wikidata for example, where a concept has labels in more than one language (as in the FIFA example mentioned during the session).

For automated subject indexing, all metadata elements must be language-coded, so that it is obvious to a machine, and not only to the human eye, which language is used for a given element or string. Librarians sometimes think that stating the language of the document should be enough, but it is not. Solving this is a matter of coordination as well as a matter of staffing.
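
One common way to make the language of an individual string explicit to a machine is an RDF literal with a language tag, as in the minimal shell sketch below, which writes N-Triples statements. This is an illustration under stated assumptions rather than anything presented in the session: the helper name `lang_literal` and the subject URI are invented, and the sketch does no escaping, so the text must not contain double quotes or backslashes.

```shell
#!/bin/sh
# Emit an N-Triples statement whose object is a language-tagged literal.
# $1 = subject URI, $2 = predicate URI, $3 = text, $4 = language tag
# No escaping is done: $3 must not contain double quotes or backslashes.
lang_literal() {
  printf '<%s> <%s> "%s"@%s .\n' "$1" "$2" "$3" "$4"
}

LABEL="http://www.w3.org/2000/01/rdf-schema#label"
ENTITY="http://example.org/entity/fifa"   # placeholder URI, not a real identifier

lang_literal "$ENTITY" "$LABEL" "International Federation of Association Football" en
lang_literal "$ENTITY" "$LABEL" "Fédération Internationale de Football Association" fr
```

Each label carries its own tag, so the language is stated per string instead of once for the whole record.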

Automated language-detection scripts are part of the solution, although they bring a certain fuzziness, participants noted. One participant suggested:

If we can get automated subject indexing to work well, library staff could be relieved of that work and concentrate on language coding.

More networking around these activities could also be beneficial. At present, initiatives are often local, and collaboration with other libraries can be slow and protracted, participants said. Collaborating at the international level has great advantages, especially when cooperating with those who are already much further along. The National Library of Finland, for example, develops solutions in this area and makes them available for local use.

Linked data efforts, too, should not be confined to the local or regional level but should, where possible, take place at the national level with a strong connection to an international infrastructure. The fact that, at least in Germany, many initiatives are traditionally tied to library networks and are therefore regionally oriented can sometimes be an obstacle to extending their reach, one participant felt.

Librarians need to rethink their understanding of their role

The discussion about next-generation metadata is often about priorities. Can we reuse more of the data generated upstream by publishers, producers and universities, without spending much time re-creating it in our libraries, in order to free up staff for other tasks? This is often a sensitive topic in conversations with catalogers, participants noted.

And it is not only about catalogers … As a profession, we need to challenge and question the positions of libraries and take a broader perspective, one participant suggested. Administration is often slow and sluggish. Unlike other fields, the library world has not changed that much over the past ten years.

Finally, participants agreed that, when it comes to moving metadata to the next generation, we should say goodbye to the concept of the “project” and instead acknowledge that this is an ongoing task that needs adequate staffing, permanent positions and sufficient funding! We should no longer work on a project basis.

About the OCLC Research Discussion Series on Next Generation Metadata

In March 2021, OCLC Research conducted a discussion series based on two reports:

  1. “Transitioning to the Next Generation of Metadata”
  2. “Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project”.

The round tables were held in different European languages, and participants were able to share their own experiences, gain a better understanding of the topic area and gain confidence in planning ahead.

The Opening Plenary Session opened the forum for discussion and exchange and introduced the theme and its topics. Summaries of all eight round tables are published on the OCLC Research blog, Hanging Together.

The Closing Plenary Session on April 13 will synthesize the discussions of the different groups. Registration is still open for this webinar: please join us!

The post Deutschsprachige Gesprächsrunde zu Metadaten der nächsten Generation: Formate, Kontexte und Lücken appeared first on Hanging Together.

German round table on next-generation metadata: Formats, contexts and deficits / HangingTogether

OCLC metadata discussion series

As part of the discussion series on Next Generation Metadata, this blog post reports back from the German language round table discussion held in the morning of March 10, 2021. (A German translation of this post is available here.)

Participants from Germany, Switzerland, and Hungary represented national libraries, state libraries, university libraries, and special libraries; combined, they had backgrounds in metadata and collection development, open access and automated subject indexing, metadata concepts and entity management – all the ingredients for a lively and varied discussion.  

Mapping exercise

Map of next-gen metadata projects (German session)

As in all other round table discussions, taking stock of projects in the regions was a first step and resulted in a map mural of projects, which indicated strong activity in the quadrant of bibliographic data and some additional activity in the other sections, research information management (RIM), scholarly communications and cultural heritage. 

Formats and contexts

The “MARC21 –> BIBFRAME” note on the map immediately sparked a general discussion about the suitability of data “formats” in different contexts. While there was agreement that BIBFRAME was more suitable and flexible than MARC, it, too, has its limitations. To actually exchange data, agreements need to be in place (and adhered to!) on how the standard is used. And bridging different types of data is not one of BIBFRAME’s strengths.  

As one participant noted: 

The separation of title and authority data is no longer valid, as in the future all types will be part of one big graph. 

Participants noted a necessity to create gangways between different data sources. New platforms need to be modular and scalable enough in order to accommodate the subtleties of various participating institutions. Moving away from authority files to identity management allows libraries to link e.g., research data with classic library data. Other libraries create cross-references to other systems, like the coli-conc project, or enrich their catalog with links to additional information from external sources. Customers want to find information, regardless of where it is and where it comes from. Meaningful links can be created without introducing new rules and building a complex new infrastructure.  

The nascent Hungarian National Library Platform (also shown on the map) focuses on a graph model that stores triples and not MARC data; that way, the data then is not tied to a specific format and the platform can serve multiple sectors; at the same time, exchange formats can be created as needed to accommodate specific needs.  
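To make the idea of "triples first, exchange formats on demand" a little more concrete, here is a minimal sketch in plain Python. It is not the Hungarian platform's actual design: the `ex:` identifiers, the predicate names, and both export functions are hypothetical and only illustrate how a format-neutral store could feed different outputs.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); all identifiers are made up.
triples = [
    ("ex:book1", "ex:title", "Example Title"),
    ("ex:book1", "ex:creator", "ex:person1"),
    ("ex:person1", "ex:name", "Example Author"),
]


def to_flat_record(triples, subject):
    """Derive a flat, field-oriented record for one subject on demand."""
    record = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            record[p].append(o)
    return dict(record)


def to_line_format(triples):
    """Serialize every triple as one tab-separated line (an ad hoc exchange format)."""
    return "\n".join(f"{s}\t{p}\t{o}" for s, p, o in triples)


print(to_flat_record(triples, "ex:book1"))
print(to_line_format(triples))
```

The stored data stays the same in both cases; only the export function changes, which is the sense in which the data is not tied to one format.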

Another relevant project in this area, listed in the research information management quadrant of the map, is Metagrid – a project that links data from the digital humanities with other data, including authority files such as the GND. However, authority files never have enough historical detail or the fine-grained information that historians would need. This again emphasizes the need for creating gangways between data sources so that everyone can benefit from one another’s work. We cannot all do everything, a participant warned.

Library specific formats still have their role in specific contexts. National libraries publishing national bibliographies need to do so following a reliable set of rules, even though these very rules might become obsolete in other contexts. 

At the same time, library data finds itself next to data of a very different kind. One example is the tax-funded Swiss E-government data initiative, the E-government Schweiz portal: all data that is not confidential has to be made available to all citizens. Library data is published next to weather data and the like; it is published as RDA data, and these triples can be used for any application. There is no way to foresee what users might one day do with this data, perhaps in combination with other data. Which is also very exciting!

How can we integrate the libraries’ unique assets and strengths into the linked data world?  

Auto-indexing needs language tagging 

Another theme that emerged quite strongly was that of automated subject indexing and resulting data requirements.  

Current metadata has significant deficits in machine-readability. For example, author keywords, abstracts etc. are needed in the metadata records to enable auto-indexing. This calls for a shift in the way data is handled: which types of data are needed, how they are stored, and how they are typed.

Multi-lingualism is another big challenge in this context. Current authority data is modelled to have one preferred language. Future authority data needs to be modelled more flexibly, like in Wikidata, where a term has labels in more than one language (as in the example of FIFA mentioned during the session).  
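As a rough illustration of the difference, the sketch below models an authority entry whose labels are keyed by language code, with no single preferred language built into the data. The `label_for` helper and its fallback order are assumptions made for this example, not a description of how Wikidata or any authority file actually resolves labels.

```python
# Labels keyed by language code; no label is privileged in the data itself.
fifa_labels = {
    "en": "International Federation of Association Football",
    "fr": "Fédération Internationale de Football Association",
}


def label_for(labels, preferred, fallback="en"):
    """Return the label in the preferred language, else the fallback, else any label."""
    if preferred in labels:
        return labels[preferred]
    if fallback in labels:
        return labels[fallback]
    # Last resort: any available label, or None if the entry has no labels at all.
    return next(iter(labels.values()), None)


print(label_for(fifa_labels, "fr"))
print(label_for(fifa_labels, "de"))  # no German label here, so the fallback is used
```

The choice of display language moves out of the record and into the application that consumes it.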

For auto-indexing, all metadata elements need to be language-coded so that it is obvious to machines, not just users, which language is used for a given element or string. Librarians sometimes think that indicating the language of the document should be sufficient but that is not the case. This is both a coordination and a staffing problem.  
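A small sketch may help show what element-level language coding means in practice. The record below is invented for illustration, and the `LangString` type and `values_in` helper are hypothetical names; the point is only that each string carries its own language tag rather than inheriting one from the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LangString:
    """A text value paired with its own language tag (e.g. "de", "en")."""
    text: str
    lang: str


# Invented record: a German title with an English abstract and mixed keywords.
record = {
    "title": [LangString("Metadaten der nächsten Generation", "de")],
    "abstract": [LangString("A report on next-generation metadata.", "en")],
    "keywords": [LangString("Metadaten", "de"), LangString("metadata", "en")],
}


def values_in(record, lang):
    """Collect, per element, the values tagged with the given language."""
    return {
        field: [v.text for v in values if v.lang == lang]
        for field, values in record.items()
    }


print(values_in(record, "en"))
```

A document-level language code of "de" would have mislabelled the English abstract and keyword here, which is exactly the case an auto-indexer needs to tell apart.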

Automatic language-detection scripts are part of the solution, but they bring a certain fuzziness, participants noted. Maybe, one participant suggested:

If we can get automatic subject tagging to work well, library staff could be freed up for language tagging.

Scaling the effort could also be beneficial. Currently, auto-indexing initiatives are often just local, and cooperation with library networks can be slow and tedious, participants observed. Cooperating internationally has its benefits, especially when cooperating with those much further ahead. The Finnish National Library, for example, develops solutions in this area and provides them for local deployment.  

Linked data efforts, too, should not be limited to local or regional scales but, if possible, take place at the national level with strong links to an international infrastructure. At least in Germany, many initiatives are traditionally linked to library networks and are thus regional in scale, which can sometimes be a barrier to scaling up, one participant observed.

Librarians need to revisit their understanding of their role.  

Often, when discussing next generation metadata topics, it comes down to priorities. Can we re-use more of the data generated upstream, by publishers, producers, universities, without spending much time on creating it again in our libraries, to free up staff for other work? A difficult topic to raise with cataloguers, though, participants felt.

And it is not just cataloguers … As a profession, we need to challenge and question positions of libraries which often do not have a broad perspective, one participant suggested. Administration is often slow and sluggish. The library world has not changed that much in the past ten years, unlike other sectors.  

Finally, participants agreed: let us drop the “project” concept and instead acknowledge that this is an ongoing effort which needs appropriate staffing, permanent positions, and sufficient financial resources! In the realm of next generation metadata at least, we should no longer be working on a “project” basis.

About the OCLC Research Discussion Series on Next Generation Metadata   

In March 2021, OCLC Research conducted a discussion series focused on two reports:  

  1. “Transitioning to the Next Generation of Metadata”
  2. “Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project”.

The round table discussions were held in different European languages and participants were able to share their own experiences, get a better understanding of the topic area, and gain confidence in planning ahead.

The Opening Plenary Session opened the forum for discussion and exploration and introduced the theme and its topics. Summaries of all eight round table discussions are published on the OCLC Research blog, Hanging Together. This post is preceded by the posts reporting out on the first English session, the Italian session, the second English session and the French session.

The Closing Plenary Session on April 13 will synthesize the different round table discussions. Registration is still open for this webinar: please join us!  

The post German round table on next-generation metadata: Formats, contexts and deficits appeared first on Hanging Together.

New edition of Data Journalism Handbook published open access / Open Knowledge Foundation

This blogpost is republished from the original with kind permission from Liliana Bounegru.

The first edition of the Data Journalism Handbook started life at the 2011 Mozilla Festival
(see Open Knowledge Foundation’s Flickr album for more photos from the event)

Today The Data Journalism Handbook: Towards a Critical Data Practice (which I co-edited with Jonathan Gray) is published by Amsterdam University Press. It is published as part of a new book series on Digital Studies which is also being launched today. You can find the book here, including an open access version:

The book provides a wide-ranging collection of perspectives on how data journalism is done around the world. It is published a decade after the first edition (available in 14 languages) began life as a collaborative draft at the Mozilla Festival 2011 in London.

The new edition, with 54 chapters from 74 leading researchers and practitioners of data journalism, gives a “behind the scenes” look at the social lives of datasets, data infrastructures, and data stories in newsrooms, media organisations, startups, civil society organisations and beyond.

The book includes chapters by leading researchers around the world and by practitioners at organisations including Al Jazeera, BBC, BuzzFeed News, Der Spiegel, The Engine Room, Global Witness, Google News Lab, Guardian, the International Consortium of Investigative Journalists (ICIJ), La Nacion, NOS, OjoPúblico, Rappler, United Nations Development Programme and the Washington Post.

An online preview of various chapters from the book was launched in collaboration with the European Journalism Centre and the Google News Initiative and can be found here.

The book draws on over a decade of professional and academic experience engaging with the field of data journalism, including through my role as Data Journalism Programme Lead at the European Journalism Centre; my research on data journalism with the Digital Methods Initiative; my PhD research on “news devices” at the universities of Groningen and Ghent; and my research, teaching and collaborations around data journalism at the Department of Digital Humanities at King’s College London.

Further background about the book can be found in our introduction. Following is the full table of contents and some quotes about the book. We’ll be organising various activities around the book in coming months, which you can follow with the #ddjbook hashtag on Twitter.

If you adopt the book for a class we’d love to hear from you so we can keep track of how it is being used (and also update this list of data journalism courses and programmes around the world) and to inform future activities in this area. Hope you enjoy it!