Planet Code4Lib

An update from the 2020 Frictionless Tool Fund grantees / Open Knowledge Foundation

We are excited to share project updates from our 2020 Frictionless Data Tool Fund! Our five grantees are about half-way through their projects and have written updates below to share with the community. These grants have been awarded to projects using Frictionless Data to improve reproducible data workflows in various research contexts. Read on to find out what they have been working on and ways that you can contribute!

Carles Pina Estany: Schema Collaboration

The goal of the schema-collaboration tool fund is to create an online platform that enables data managers and researchers to collaborate on describing their data by writing Frictionless data package schemas. The basics can be seen and tested on the online instance of the platform: the data manager can create a package, assign data packages to researchers, add comments, and send a link to the researchers, who use datapackage-ui to edit the package and save it, making it available to the data manager. The next steps are to add extra fields to datapackage-ui and to work on the integration between schema-collaboration and datapackage-ui to make maintenance easier. Carles also plans to add a PDF export of the data package to help data managers and researchers spot errors. Progress can be followed through the project Wiki, and feedback is welcome through GitHub issues.
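To make the object of that collaboration concrete, here is a minimal sketch of the kind of datapackage.json descriptor that datapackage-ui edits and that schema-collaboration passes between data manager and researcher. The dataset, field names and values are invented for illustration; they are not taken from the project.

    # A minimal, hypothetical datapackage.json descriptor; all names and values
    # here are illustrative only.
    import json

    descriptor = {
        "name": "sea-ice-observations",
        "title": "Sea Ice Observations 2019",
        "resources": [{
            "name": "observations",
            "path": "data/observations.csv",
            "profile": "tabular-data-resource",
            "schema": {
                "fields": [
                    {"name": "station_id", "type": "string",
                     "description": "Identifier of the observing station"},
                    {"name": "date", "type": "date"},
                    {"name": "ice_thickness_m", "type": "number",
                     "description": "Ice thickness in metres"},
                ],
                "primaryKey": ["station_id", "date"],
            },
        }],
    }

    with open("datapackage.json", "w") as f:
        json.dump(descriptor, f, indent=2)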

Read more about Carles’ project here: https://frictionlessdata.io/blog/2020/07/16/tool-fund-polar-institute/

 

Simon Tyrrell: Frictionless Data for Wheat

As part of the Designing Future Wheat project, Simon and team have repositories containing a wide variety of heterogeneous data, and they are trying to standardise how these datasets and their associated metadata are exposed. The first of their portals stores its data in an iRODS (https://irods.org/) repository. They have recently completed additions to their web module, eirods-dav, that use the files, folders and metadata stored within this repository to automatically generate Data Packages for the datasets. The next steps are to expand the data that is added to the Data Packages and, similarly, to automatically expose tabular data as Tabular Data Packages. The eirods-dav GitHub repository is at https://github.com/billyfish/eirods-dav and any feedback or queries are very welcome.
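eirods-dav itself does this work inside a WebDAV module talking to iRODS, but the general idea of deriving a Data Package from files, folders and key/value metadata can be sketched in a few lines of Python. This is an illustration only, with hypothetical paths and metadata keys, not how eirods-dav is implemented.

    # Illustrative sketch: describe every file under a folder as a resource of
    # one Data Package, with dataset-level metadata supplied separately.
    import json
    from pathlib import Path

    def build_datapackage(dataset_dir, metadata):
        """Describe every file under dataset_dir as a resource of one Data Package."""
        root = Path(dataset_dir)
        resources = []
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            resources.append({
                "name": path.stem.lower().replace(" ", "-"),
                "path": str(path.relative_to(root)),
                "bytes": path.stat().st_size,
            })
        return {
            "name": metadata.get("dataset_name", root.name),
            "title": metadata.get("title", ""),
            "resources": resources,
        }

    descriptor = build_datapackage(
        "wheat-dataset",  # hypothetical folder of data files
        {"dataset_name": "wheat-phenotypes", "title": "Wheat phenotype scores"},
    )
    print(json.dumps(descriptor, indent=2))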

Read more about Simon’s project here: https://frictionlessdata.io/blog/2020/08/17/frictionless-wheat/

 

Stephen Eglen: Analysis of spontaneous activity patterns in developing neural circuits using Frictionless Data tools

Stephen and Alexander have been busy over the summer integrating the Frictionless tools into a workflow for the analysis of electrophysiological datasets. They have written converters to read in their ASCII- and HDF5-based data and convert them to Frictionless containers. Along the way, they have given helpful feedback to the team about the core packages. They have settled on the Python interface as the most feature-rich implementation to work with. Alexander has now completed his analysis of the data, and they are currently working on a manuscript to highlight their research findings.
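As a rough illustration of what such a converter does (this is not the project's actual code, and the file name and dataset layout are hypothetical), an HDF5 recording can be dumped to CSV and described as a Tabular Data Package:

    # Rough sketch only: dump one HDF5 dataset to CSV and describe it in a
    # datapackage.json descriptor.
    import csv
    import json
    import h5py  # pip install h5py

    def hdf5_to_datapackage(h5_path, dataset_name, out_csv="spikes.csv"):
        with h5py.File(h5_path, "r") as h5:
            spikes = h5[dataset_name][:]  # assume an N x 2 array: channel, spike time
        with open(out_csv, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["channel", "spike_time_s"])
            writer.writerows(spikes)
        descriptor = {
            "name": "spontaneous-activity",
            "resources": [{
                "name": "spikes",
                "path": out_csv,
                "profile": "tabular-data-resource",
                "schema": {"fields": [
                    {"name": "channel", "type": "integer"},
                    {"name": "spike_time_s", "type": "number"},
                ]},
            }],
        }
        with open("datapackage.json", "w") as f:
            json.dump(descriptor, f, indent=2)

    hdf5_to_datapackage("recording.h5", "spikes")  # hypothetical file and dataset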

Read more about Stephen’s project here: https://frictionlessdata.io/blog/2020/08/03/tool-fund-cambridge-neuro/

 

Asura Enkhbayar: Metrics in Context

How much do we know about the measurement tools used to create scholarly metrics? While data models and standards are neither new nor uncommon in the scholarly space, “Metrics in Context” is all about the very apparatuses we use to capture the scholarly activity embedded in those metrics. In order to confidently use citations and altmetrics in research assessment or hiring and promotion decisions, we need to be able to provide standardized descriptions of the digital infrastructure and acts of capturing involved. Asura is currently refining the conceptual model for scholarly events in the digital space in order to account for various types of activities (both traditional and alternative scholarly metrics). After a review of the existing digital landscape of scholarly infrastructure projects, he will dive into the implementation using Frictionless. You can find more details in the open roadmap on GitHub; feel free to submit questions and comments as issues!

Read more about Asura’s project here: https://frictionlessdata.io/blog/2020/09/17/tool-fund-metrics/

 

Nikhil Vats: Adding Data Package Specifications to InterMine’s im-tables

Nikhil is working with InterMine to add data package specifications to im-tables (a library for querying biological data) so that users can export metadata along with query results. Right now, the metadata contains field names, their description links, types, paths, class description links and primary key(s). Nikhil is currently figuring out ways to get links for data sources, attribute descriptions and class descriptions from their FAIR terms (or description links). Next steps for the project include building the frontend for this feature in im-tables and adding the rest of the required information about the data, such as the result file format (CSV, TSV, etc.), to the datapackage.json (metadata) file. You can contribute to this project by opening an issue here or reaching out at chat.intermine.org.
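As a purely illustrative sketch, the exported datapackage.json might carry metadata along these lines; the paths, links and class names below are invented rather than InterMine's actual output:

    # Hypothetical example of metadata accompanying an im-tables query export.
    export_metadata = {
        "name": "gene-query-results",
        "resources": [{
            "name": "results",
            "path": "results.tsv",
            "format": "tsv",  # result file format (CSV, TSV, etc.)
            "schema": {
                "fields": [
                    {
                        "name": "Gene.primaryIdentifier",  # InterMine-style path
                        "type": "string",
                        "class_description_link": "https://example.org/model#Gene",
                    },
                    {
                        "name": "Gene.symbol",
                        "type": "string",
                        "attribute_description_link": "https://example.org/model#Gene.symbol",
                    },
                    {
                        "name": "Gene.length",
                        "type": "integer",
                    },
                ],
                "primaryKey": ["Gene.primaryIdentifier"],
            },
        }],
    }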

Read more about Nikhil’s project here: https://frictionlessdata.io/blog/2020/07/10/tool-fund-intermine/

Liability In The Software Supply Chain / David Rosenthal

My post Atlantic Council Report On Software Supply Chains was already rather long when I got to the last of the report's recommendations that I wanted to discuss, the one entitled Bring Lawyers, Guns and Money. It proposes imposing liability on actors in the software supply chain, and I wrote:
The fact that software vendors use licensing to disclaim liability for the functioning of their products is at the root of the lack of security in systems. These proposals are plausible but I believe they would either be ineffective or, more likely, actively harmful. There is so much to write about them that they deserve an entire post to themselves.
Below the fold is the post they deserve. The recommendation in question states:
The US Congress should extend final goods assembler liability to operators of major open-source repositories, container managers, and app stores. These entities play a critical security governance role in administering large collections of open-source code, including packages, libraries, containers, and images. Governance of a repository like GitHub or an app hub like the PlayStore should include enforcing baseline life cycle security practices in line with the NIST Overlay, providing resources for developers to securely sign, distribute, and alert users for updates to their software. This recommendation would create a limited private right of action for entities controlling app stores, hubs, and repositories above a certain size to be determined. The right would provide victims of attacks caused by code, which failed to meet these baseline security practices, a means to pursue damages against repository and hub owners. Damages should be capped at $100,000 per instance and covered entities should include, at minimum, GitHub, Bitbucket, GitLab, and SourceForge, as well as those organizations legally responsible for maintaining container registries and associated hubs, including Docker, OpenShift, Rancher, and Kubernetes.
The recommendation links to two posts:
  • A report by the Paul Weiss law firm entitled The Cyberspace Solarium Commission’s Final Report and Recommendations Could Have Implications for Business:
    Charged with developing a comprehensive and strategic approach to defending the United States in cyberspace, the Commission is co-chaired by Sen. Angus King (I-Maine) and Rep. Mike Gallagher (R-Wisconsin) and has 14 commissioners, including four U.S. legislators, four senior executive agency leaders, and six nationally recognized experts from outside of government.
    and the relevant recommendation in their March 11, 2020 final report is:
    The recommended legislation would hold final goods assemblers (“FGAs”) of software, hardware, and firmware liable for damages from incidents that exploit vulnerabilities that were known at the time of shipment or discovered and not fixed within a reasonable amount of time. An FGA is any entity that enters into an end user license agreement with the user of the product or service and is most responsible for the placement of a product or service into the stream of commerce. The legislation would direct the Federal Trade Commission (“FTC”) to promulgate regulations subjecting FGAs to transparency requirements, such as disclosing known, unpatched vulnerabilities in a good or service at the time of sale.
  • Trey Herr's Software Liability Is Just a Starting Point:
    To make meaningful change in the software ecosystem, a liability regime must also:
    • Apply to the whole software industry, including cloud service providers and operational technology firms such as manufacturing and automotive companies. These firms are important links in the software supply chain.
    • Produce a clear standard for the “duty of care” that assemblers must exercise—the security practices and policies that software providers must adopt to produce code with few, and quickly patched, defects.
    • Connect directly to incentives for organizations to apply patches in a timely fashion.
    The duty of care becomes critically important in defining the standard of behavior expected of final goods assemblers. An effective standard might well create legal obligations to set “end-of-life” dates for software, remove copyright protections that inhibit security research, or block the use of certain software languages that have inherent flaws or make it difficult to produce code with few errors.

What Exactly Are The Proposals?

Because they aren't the same, let's distinguish between these three proposals: the Atlantic Council (AC), the Cyberspace Solarium (CS) Commission, and Trey Herr (TH).

AC

  • Who is liable? Major distributors of open source software.
  • What are they liable for? Enforcing good security practices on the open source developers using their services.
  • What can they do to avoid liability? Not clear, because it isn't clear what enforcement mechanisms repositories can employ against their developers.
  • Who imposes the liability? Victims of negligence by open source developers (not by the repository upon whom the liability is imposed), or more likely their class action lawyers. But the $100K per incident limit would discourage class actions.
Two key points are that liability applies only to open source software, and that the repositories are liable for actions by others over whom they have little or no control. Both are unlikely to pass constitutional muster; it is true that open source licenses disclaim liability, but so do closed source licenses. Nor, given the international nature of open source development and infrastructure, would they be effective. Repositories could simply move to other jurisdictions, without restricting access by US developers.

CS

  • Who is liable? Vendors of products including software requiring users to agree to an "end user license".
  • What are they liable for? "damages from incidents that exploit vulnerabilities that were known at the time of shipment or discovered and not fixed within a reasonable amount of time".
  • What can they do to avoid liability? Not clear, but it includes disclosing known vulnerabilities at the "time of sale".
  • Who imposes the liability? Victims of covered incidents, or more likely their class action lawyers. Note the lack of any limit to liability.
It isn't clear whether an open source license, which is between the developer and the user of the software, counts as an "end user license", which is between the FGA and the user. This proposal is extraordinarily broad, covering many products with embedded software but no network connectivity, such as a Furby.

Note the distinction between "time of shipment" and "time of sale". Many FGAs of physical products including software are in China, and the time between them shipping the product and it percolating through the retail distribution chain may be many months. The FGA has no way to identify the eventual purchaser to notify them of vulnerabilities discovered in that time. Imported cars contain much software but also have months between shipment and sale, although in this case the dealer network allows for the customer to be identified.

TH

  • Who is liable? The entire software industry, including those using software such as cloud providers and manufacturers of products including software.
  • What are they liable for? Not clear.
  • What can they do to avoid liability? FGAs must observe a "duty of care". How non-FGA participants in the supply chain avoid liability isn't clear.
  • Who imposes the liability? Presumably, the class action lawyers for customers who believe that the "duty of care" was not observed.
This proposal is both far-reaching and vague. It assumes that NIST and others can set uniform, legally enforceable standards for software development, vulnerability detection and reporting, and patch creation and deployment for all industries that include software in their products. In practice, vastly different standards already apply to, for example, avionics software and embedded software in toys. Simply for economic reasons, applying avionics standards to toys is infeasible. This isn't to say standards of the kind envisaged should not be promulgated. They would be a good thing, but they cannot apply uniformly to the entire industry, nor cause rapid improvement.

Specifying The Problem

The goal of these proposals is only vaguely specified, but it is presumably to reduce the average number of vulnerabilities per installed system. This depends upon a number of factors; decreasing any of them will theoretically move the world closer to the goal:
  1. The average rate of newly created vulnerabilities times the average number of systems to which they are deployed.
  2. The average time between the creation and the detection of newly created vulnerabilities.
  3. The average time between detection and development of a patch.
  4. The average time between development of a patch and its deployment to a vulnerable system.
Imposing liability on software providers is primarily intended to force them to adopt better software development practices, thereby affecting the factors as follows:
  1. Better practices should decrease the average rate of newly created vulnerabilities.
  2. Liability for software providers will tend to increase the average time between creation and detection. Because providers aren't liable for vulnerabilities they don't know about, they will be motivated to prevent security researchers from inspecting their systems, perhaps via the Computer Fraud and Abuse Act. It will also decrease the proportion of the software base that is open source, and thus open to inspection.
  3. Better practices will probably increase the average time for patches to be developed, as they will impose extra overhead on the development process.
  4. Liability for software vendors, as opposed to users, is likely to have no effect on the average rate of patching, since once a patch is available the vendor is off the hook.
In the absence of hard data as to the relative contribution of each factor, my guess would be that imposing liability on vendors would have a net negative effect.
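A back-of-envelope sketch makes the shape of the argument concrete. Every number below is made up purely for illustration, not a measurement, but it shows why the deployment delay tends to dominate the exposure window and why a modest liability-driven improvement in development practice buys little if patching stays slow:

    # Back-of-envelope sketch with invented numbers; none are measurements.
    def exposed_days_per_system(vulns_per_year, detect_days, patch_days, deploy_days):
        """Rough expected vulnerable-days per system per year."""
        return vulns_per_year * (detect_days + patch_days + deploy_days)

    baseline      = exposed_days_per_system(5, detect_days=90, patch_days=30, deploy_days=150)
    better_dev    = exposed_days_per_system(4, detect_days=90, patch_days=40, deploy_days=150)
    faster_deploy = exposed_days_per_system(5, detect_days=90, patch_days=30, deploy_days=30)
    print(baseline, better_dev, faster_deploy)  # 1350 1120 750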

The greatest leverage would be somehow to increase the rate at which patches are installed in vulnerable systems. Slow patching has been responsible for many of the worst data breaches, including the massive Equifax disaster:
Although a patch for the code-execution flaw was available during the first week of March, Equifax administrators didn't apply it until July 29,
My post Not Whether But When has a sampling of similar incidents, to reinforce this point:
You may be thinking Equifax is unusually incompetent. But this is what CEO Smith got right. It isn't possible for an organization to restrict security-relevant operations to security gurus who never make mistakes; there aren't enough security gurus to go around, and even security gurus make mistakes.
However, it must be noted that an effort to increase the rate of patching is a double-edged sword. Organizations need to test patches before deploying them because a patch in one sub-system can impact the functions of other sub-systems, including security-related functions. Rushing the process, especially the security-related testing, will lead security gurus to make mistakes.

Even the most enthusiastic proponents of imposing liability would admit that doing so would only reduce, not eliminate, the incidence of newly created vulnerabilities. So, as I argued in Not Whether But When, we must plan for a continuing flow of vulnerabilities. Worse, a really important 2010 paper by Sandy Clarke, Matt Blaze, Stefan Frei and Jonathan Smith entitled Familiarity Breeds Contempt: The Honeymoon Effect and the Role of Legacy Code in Zero-Day Vulnerabilities showed that the rate of discovery of vulnerabilities in a code base increases with time. As I wrote in Familiarity Breeds Contempt:
Clarke et al analyze databases of vulnerabilities to show that the factors influencing the rate of discovery of vulnerabilities are quite different from those influencing the rate of discovery of bugs. They summarize their findings thus:
We show that the length of the period after the release of a software product (or version) and before the discovery of the first vulnerability (the ’Honeymoon’ period) is primarily a function of familiarity with the system. In addition, we demonstrate that legacy code resulting from code re-use is a major contributor to both the rate of vulnerability discovery and the numbers of vulnerabilities found; this has significant implications for software engineering principles and practice.

The Internet of Things

In 2018's The Island of Misfit Toys I critiqued Jonathan Zittrain's New York Times op-ed entitled From Westworld to Best World for the Internet of Things which, among other things, proposed a "network safety bond" to be cashed in if the vendor abandoned maintenance for a product, or folded entirely. Insurers can price bonds according to companies’ security practices. There’s an example of such a system for coal mining, to provide for reclamation and cleanup should the mining company leave behind a wasteland:
[Image: $15.99 home router]
The picture shows the problem with these, and every proposal to impose regulation on the Things in the Internet. It is a screengrab from Amazon showing today's cheapest home router, a TRENDnet TEW-731BR for $15.99:
  • There's no room in a $15.99 home router's bill of materials for a "network safety bond" that bears any relation to the cost of reclamation after TRENDnet abandons it.
  • ...
Anyone who has read Bunnie Huang's book The Hardware Hacker will understand that TRENDnet operates in the "gongkai" ecosystem; it assembles the router from parts including the software, the chips and their drivers from other companies. Given that after assembly, shipping and Amazon's margin, TRENDnet's suppliers probably receive only a few dollars, the idea that they could be found, let alone bonded, is implausible. If they were, the price would need to be increased significantly. So on Amazon's low-to-high price display the un-bonded routers would show up on the first page and the bonded ones would never be seen.
The whole point of the "Internet of Things" is that most of the Things in the Internet are cheap enough that there are lots of them. So either the IoT isn't going to happen, or it is going to be rife with unfixed vulnerabilities. The economic pressure for it to happen is immense.

Karl Bode's House Passes Bill To Address The Internet Of Broken Things reports on an effort in Congress to address the problem:
the House this week finally passed the Internet of Things Cybersecurity Improvement Act, which should finally bring some meaningful privacy and security standards to the internet of things (IOT). Cory Gardner, Mark Warner, and other lawmakers note the bill creates some baseline standards for security and privacy that must be consistently updated (what a novel idea), while prohibiting government agencies from using gear that doesn't pass muster. It also includes some transparency requirements mandating that any vulnerabilities in IOT hardware are disseminated among agencies and the public quickly:
"Securing the Internet of Things is a key vulnerability Congress must address. While IoT devices improve and enhance nearly every aspect of our society, economy and everyday lives, these devices must be secure in order to protect Americans’ personal data. The IoT Cybersecurity Improvement Act would ensure that taxpayers dollars are only being used to purchase IoT devices that meet basic, minimum security requirements. This would ensure that we adequately mitigate vulnerabilities these devices might create on federal networks."
Setting standards enforced by Federal purchasing power is a positive approach. But the bill seems unlikely to pass the Senate. Even if we could wave a magic wand and force all IoT vendors to conform to these standards and support their future products with prompt patches for the whole of their working life, it wouldn't address the problem that the IoT is already populated with huge numbers of un-patched things like the $250 coffee maker whose firmware can be replaced by a war-driver over WiFi. Worse, many of them are home WiFi routers, with their hands on all the home's traffic. Even worse, lots of them are appliances such as refrigerators, whose working lives are 10-20 years. How much 20-year-old hardware do you know of that is still supported?

App Stores

Mobile phone operators have somewhat more control over the devices that connect to their networks, and to defend those networks they need the devices to be less insecure. So they introduced the "walled gardens" called App Stores. The idea, as with the idea of imposing liability on software providers, was that apps in the store would have been carefully vetted, and that insecure apps would have been excluded. Fundamentally, this is the same idea as "content moderation" on platforms such as Facebook and Twitter: the idea that humans can review content and classify it as acceptable or unacceptable.

Recent experience with moderation of misinformation on social media platforms bears out Mike Masnick's Masnick's Impossibility Theorem: Content Moderation At Scale Is Impossible To Do Well. His third point is:
people truly underestimate the impact that "scale" has on this equation. Getting 99.9% of content moderation decisions at an "acceptable" level probably works fine for situations when you're dealing with 1,000 moderation decisions per day, but large platforms are dealing with way more than that. If you assume that there are 1 million decisions made every day, even with 99.9% "accuracy" (and, remember, there's no such thing, given the points above), you're still going to "miss" 1,000 calls. But 1 million is nothing. On Facebook alone a recent report noted that there are 350 million photos uploaded every single day. And that's just photos. If there's a 99.9% accuracy rate, it's still going to make "mistakes" on 350,000 images. Every. Single. Day. So, add another 350,000 mistakes the next day. And the next. And the next. And so on.
The vetting problem facing app stores is very similar, in that the combination of scale and human fallibility leads to too many errors. The catalog of malware detected in the Apple and Android app stores testifies to this. So, just like the IoT, the app store ecosystem will be rife with vulnerabilities.

Code Repositories

The idea of making code repositories liable for the software they host has the same vetting problem as app stores, except worse. App stores are funded by taking 30% of the gross. Code repositories have no such income stream, and no way of charging the users who download code from them: the code is open source, so the repository has no way to impose a paywall to fund the vetting process. Nor can they charge contributors; if they did, contributors would switch to a free competitor. So, just like the IoT and the app store ecosystem, code repositories will be rife with vulnerabilities.

Downsides of Liability

It looks as though imposing liability on software vendors wouldn't be effective. But it is likely to be worse than that:
  • Without the explicit disclaimer of liability in open source licenses the unpaid individual contributors would be foolish to take part in the ecosystem. Only contributors backed by major corporations and their lawyers would be left, and while they are important they aren't enough to support a viable ecosystem. Killing off open source, at least in the US, would not improve US national security or the US economy.
  • The proposals to impose liability assume that the enforcement mechanism would be class action lawsuits. Class action lawyers take a large slice of any penalties in software cases, leaving peanuts for the victims. In half a century of using software, I have never received a penny from a software-related class action settlement. I believe there were a couple in which, after filing difficult-to-retrieve documentation, I might have been lucky enough to be rewarded with a coupon for a few dollars. Not enough to be worth the trouble. Further, in order that the large slice be enough, class action lawyers only target deep pockets. The deep pockets in the software business are not, in general, the major source of the problems liability is supposed to address.
  • Software is a global business, one which is currently dominated by US multinational companies. They are multinational because the skilled workforce they need is global. If the US imposes harsher conditions on the software business than competing countries, the business will migrate overseas. While physical goods can be controlled by Customs at the limited ports of entry, there are no such choke points on the Internet to prevent US customers acquiring foreign software. Driving the software industry overseas would not improve US national security or the US economy.

How To Fix The Problem

If imposing liability on software providers is not likely to be either effective or advantageous, what could we do instead? Let's start by stipulating that products with embedded software that lack the physical means to connect to the Internet cannot pose a threat to other devices or their users' security via the Internet, and so can be excluded; they are appropriately covered by current product liability laws.

Thus the liability regime we are discussing is for software in devices connected to the Internet. In the good old days of The Phone Company, connection to the network was regulated; only approved devices could connect. This walled garden was breached by the Carterfone decision, which sparked an outburst of innovation in telephony and greatly benefited both consumers and the economy. It is clear that the Internet could not have existed without the freedom to connect provided by the Carterfone decision. Even were legislation enforcing a "permission to connect" regime for the Internet passed in some countries, it would be impossible to enforce. The Internet Protocols specify how to interconnect diverse networks. They permit situations such as the typical home network, in which a multitude of disparate devices are connected to the Internet via a gateway router performing Network Address Translation, thus rendering the devices invisible to the ISP.

We need to increase the speed of patching vulnerabilities in devices connected to the Internet. We cannot issue devices "passports" permitting them to connect. We cannot impose liability on individual users who do not patch promptly for the same reason that copyright trolls often fail. All we have is an IP address, which is not an identification of the responsible individual.

The alternative is to encourage vendors to support automatic updates. But this also is a double-edged sword. If everything goes right, it ensures that devices are patched soon after the patch becomes available. But if not, it ensures that a supply chain attack compromises the entire set of vulnerable devices at once. As discussed in Securing The Software Supply Chain, current techniques for securing automatic updates are vulnerable, for example to compromise of the vendor's signing key, or malfeasance of the certificate authority. Use of Certificate Transparency can avert these threats, but as far as I know no software vendor is yet using it to secure their updates.

A "UL-style" label certifying that the device would be automatically updated with cryptographically secured patches for at least 5 years would be a simple, easily-understood product feature encouraging adoption. It would have two advantages, making clear that:
  • the software provider assumed responsibility for providing updates to fix known problems, and protecting the certificates that secured the updates.
  • the customer who disabled the automatic patch mechanism assumed responsibility for promptly installing available patches.

Conclusion

The idea of imposing liability on software providers is seductive but misplaced. It would likely be both ineffective and destructive. Ineffective in that it assumes a business model that does not apply to the vast majority of devices connected to the Internet. Destructive in that it would be a massive disincentive to open source contributors, exacerbating the problem I discussed in Open Source Saturation. The focus should primarily be on ensuring that available patches are applied promptly. Automating the installation of patches has risks, but they seem worth accepting. Labeling products that provide a 5-year automated patch system would both motivate adoption, and clarify responsibilities.

Liability up the chain might increase Mean Time Between Failures, but it is Mean Time To Repair that is the real problem. Fixing that with liability in the current state of the chain is setting users up to fail.

LISTSERV 16.5 - CODE4LIB Archives / pinboard

RT @kiru: I forgot to post the call earlier: The Code4Lib Journal is looking for volunteers to join its editorial committee. Deadline: 12 Oct. #code4lib

First SOTA activation / Mark Matienzo

About a month ago, I got my ham radio license, and soon after I got pretty curious about Summits on the Air (SOTA), an award scheme focused on safe and low impact portable operation from mountaintops. While I like to hike, I’m arguably a pretty casual hiker, and living in California provides a surprising number of options within 45 minutes driving time for SOTA newbies.

Evergreen Community Spotlight: Jessica Woolford / Evergreen ILS

The Evergreen Outreach Committee is pleased to announce that September’s Community Spotlight is Jessica Woolford of Bibliomation, where she is the Evergreen System Manager. Jessica has been involved with the Evergreen community since she started at Bibliomation in 2010. Her first position there was as an Application Support Specialist, directly assisting Bibliomation’s libraries with training, helpdesk, and other front-end concerns.

While working as a Support Specialist, she explored the Islandora project, which was her first foray into Linux and back-end work. She learned about OPAC development from a Bibliomation colleague, and leveraged these new skills when she was promoted to Evergreen System Manager.

Even though she works in a more technical environment now, Jessica has maintained her connection with the libraries and end users. “I can see things from both sides now, from the developer angle and the user angle,” she says.

Jessica is well known in the Evergreen community as the long-time leader of the Reports Interest Group, recently reconvened after a hiatus, though her affiliation with Reports was almost an accident. “I attended the Grand Rapids conference, and [my boss] asked me to speak on a panel about Reports,” Jessica relates. “I didn’t really know about Evergreen Reports but it turned out I had a knack for it – so it’s a complete accident that I ended up working with Reports!” Since that first conference, Jessica has given presentations or facilitated discussions at 8 other Evergreen conferences – only missing the 2019 conference, right after her son was born.

Jessica’s affinity for Reports has strengthened and in turn been strengthened by her knowledge of PostgreSQL. “Reports have helped me understand Postgres – and the more I know about SQL the more I know about how the Reporter works,” she says. 

As her technical skill set has expanded, Jessica has also become more active in Evergreen bug reporting and testing via Launchpad. “The community has done a great job making Launchpad more accessible and offering training to people,” she says. “There’s been amazing efforts in the last few years to take that fruit and make it hang lower.”

Jessica recommends interest groups as a way for new community members to get involved with Evergreen. “There’s so many interest groups now – just pick one, and jump in,” she says. “Take a topic you’re interested in and join an interest group – if there’s a topic you’re interested in and there isn’t an interest group, make one!”

Do you know someone in the community who deserves a bit of extra recognition? Please use this form to submit your nominations. We ask for your email in case we have any questions, but all nominations will be kept confidential.

Any questions can be directed to Andrea Buntz Neiman via abneiman@equinoxinitiative.org or abneiman in IRC.

Charting a path to a more open future. . . together / HangingTogether

Last week, representatives from OCLC Research and LIBER (the Association of European Research Libraries) presented a webinar to kick off the OCLC-LIBER Open Science Discussion Series. This discussion series, which takes place from 24 September through 5 November 2020, is based upon the LIBER Open Science Roadmap, and will help guide research libraries in envisioning the support infrastructure for Open Science (OS) and their role at local, national, and global levels.

OCLC and LIBER had initially planned a collaborative in-person workshop to take place at the OCLC Library Futures Conference (EMEARC 2020) on March 3 in Vienna. But with COVID rapidly advancing globally at that time, the event was cancelled, and we took some time to plan a larger series of webinars and discussions. 

There are a couple of key goals for our collaboration. First of all, our organizations want to jointly offer a forum for discussion and exploration, and to collectively stimulate the exchange of ideas. But secondly, we want this activity to also inform us as we seek to identify research questions that OCLC and LIBER can collaboratively address to advance Open Science. 

The LIBER Open Science Roadmap provides an excellent, well. . . roadmap. . . for this effort. The report calls upon libraries to “advocate for Open Science locally and internationally, to support Open Science through tools and services and to expand the impact of their work through collaboration and partnerships.” It also states that 

“A revolution is required: one which opens up research processes and changes mindsets in favour of a world where policies, tools and infrastructures universally support the growth and sharing of knowledge.” 

LIBER Open Science Roadmap, page 4.

The LIBER Open Science Roadmap

In the September 24 kick-off webinar, Jeannette Frey, LIBER President and Director of the Bibliothèque Cantonale et Universitaire (BCU) Lausanne, provided an overview of the seven focus areas on the LIBER Roadmap, which I will briefly sketch out here. 

Scholarly Publishing

Open access is still not the default publishing model in scholarly communications. Libraries can help move us toward that goal by initiating and supporting institutional Open Science policies, implementing library publishing efforts, and applying LIBER’s five principles for negotiations with publishers.

FAIR Data

Making data findable, accessible, interoperable, and reusable (FAIR) is essential to Open Science, and Frey urged libraries to support FAIR data by investing in training and hiring to ensure we have the skills on hand. She also encouraged ongoing education on the FAIR principles, advocacy to governmental bodies, implementing local data management plan (DMP) policies, and collective work to improve metadata and ensure it is machine readable.

Research Infrastructure and the European Open Science Cloud (EOSC)

The European Open Science Cloud (EOSC) is an initiative of the European Commission to build the infrastructure necessary to support Open Science. Once it is rolled out, the EOSC will serve as a single source for discovering, accessing, and reusing research data from across European countries, disciplines, and platforms. Frey encouraged libraries to educate campus stakeholders, harmonize institutional policies to the EOSC, and particularly advocate for EOSC training for early career researchers. 

Metrics and Rewards

The LIBER Open Science Roadmap urges openness and transparency as the default drivers for scholarly metrics. Libraries can support the responsible development and use of research metrics by endorsing the San Francisco Declaration on Research Assessment (DORA) and the Leiden Manifesto for research metrics. Additionally, libraries play an important role in the development of next-generation metrics upholding the DORA and Leiden ideals, ultimately resulting in new methods for assessing and rewarding researchers in their careers, particularly in ways that support Open Science. 

Open Science Skills

The Report from the European Commission’s 2017 Open Science Skills Working Group emphasizes the need for researchers to acquire Open Science skills to support 21st century knowledge sharing. This includes skills such as open access publishing and creating and reusing FAIR data. Libraries have the opportunity to play a key role by developing multidisciplinary Open Science workflows, and providing training and support, particularly for early career researchers. 

Research Integrity

Research and scholarship should not only be open; they should also maintain standards of research integrity, ethics, and conduct. Libraries can support research integrity through partnership with others in their institutions to establish Codes of Conduct for Research Integrity, particularly through advocacy of core OS principles like transparency and openness. They can also play an important role in training researchers about the legal and ethical aspects of scholarly communication and Open Science.

Citizen Science

Citizen Science is the participation of the general public in the scientific research process. It is widely practiced worldwide but it is not always open. The LIBER Roadmap recommends greater alignment between Citizen Science and Open Science, with opportunities for libraries to support through infrastructure, training, and policy development. 

OCLC Is Working to Prioritize Open Content 

Rachel Frick, Executive Director of the OCLC Research Library Partnership, spoke about OCLC’s efforts to support Open Science. These include the OCLC Research publication Open Content Activities in Libraries: Same Direction, Different Trajectories—Findings from the 2018 OCLC Global Council Survey, which seeks to answer a question posed by the OCLC Global Council: “What is the status of open access and open content in libraries around the globe?” The resulting report examines the current and planned open content activities of more than 500 research and university libraries worldwide and confirms that research libraries are highly involved in open content activities (97%). Furthermore, these libraries report significant plans for additional activities in the future, particularly in the areas of research data management and interactions with (digitized) open collections through statistical and machine learning techniques, i.e., “collections as data.”

For this report, and for OCLC in general, we use a broad definition of Open Content, inclusive of content that is digital, accessible immediately and online (without technical barriers), freely available, and fully reusable. Naturally, this includes Open Science outputs, but is also expansive enough to embrace other non-scholarly content managed by libraries. 

OCLC is committed to privileging open content in order to deliver on our mission as a library cooperative.

That means increasing access to diverse open content, integrating open content into OCLC services, supporting libraries and library-driven open content, and dedicating staffing resources to supporting these efforts. 

The goal is to prioritize open content alongside licensed content, better connecting researchers quickly and seamlessly to the knowledge they need. OCLC is working with publishers worldwide to ensure that their open content is readily identified and accessible in WorldCat and OCLC’s library services. We are also working with open access directories like the Directory of Open Access Journals, HathiTrust, and the Internet Archive. Furthermore, the OCLC Digital Collection Gateway provides a tool for libraries to harvest the metadata from their local, open access repositories into WorldCat. And through our collaboration with Unpaywall, WorldCat users can now be seamlessly directed to the open resource. Today WorldCat users can filter for only open access content, a faceting option also available in other OCLC services.

Throughout much of 2020, OCLC staff members have worked to provide extended and free e-content during the pandemic. Additionally over 80 library collections have been made openly available during the pandemic, one of the many ways that OCLC is working to support the library community during this time. 

Optimizing open content discoverability and access is heavily dependent upon metadata. In partnership with the German National Library, OCLC has worked to establish improved ways of indicating open and restricted access in MARC 21. Being able to express openness in a machine-readable way will have a big impact on discoverability. This metadata feature is now in place for new bibliographic MARC 21 records. OCLC continues to explore how to retrospectively upgrade existing records, as well as how to transition to a 21st century shared entity management infrastructure for library linked data work, work currently being undertaken with financial support from the Andrew W. Mellon Foundation.

Register for the next webinar

We will be hosting seven small, interactive group discussions over the course of the next six weeks, providing an opportunity for in-depth discussion on the seven focus areas in the LIBER Roadmap. The goal of these discussions is to collectively explore a vision and path forward for the future role of libraries in these areas. Unfortunately, seating is limited for these sessions and we’ve already reached capacity. However, we will be synthesizing these discussions and sharing with the community via blog posts here on the OCLC Research Hanging Together blog and also through the LIBER website.  We also invite you to attend the concluding wrap-up webinar on 5 November, which will provide an overview of the discussions and proposed next steps. You may register for that event here, and it is open to everyone in the scholarly communications community. 


Pandoc / Ed Summers

I never got much past dabbling a bit in Haskell. But learning about functional programming (FP), the importance of types, lazy evaluation, and declarative programming more generally really changed how I approached programming in any language.

Previously I (along with much of the computing industry in the 1990s) had been kind of taken with object oriented programming (OOP). In OOP there is a similar attention to types, but things get kind of weird when it comes to how those types get used and orchestrated.

Functional programming starts with the very simple idea of functions, which have a name, can take arguments, and return a value. Functions can use other functions within them. Functions can be composed together, where the return value of one becomes an argument for another. Things get really interesting because in FP languages like Haskell functions are themselves values. This means that functions can take other functions as arguments, and can also return functions.
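The same ideas carry over to any language with first-class functions. Here is a small Python sketch of functions as values and of composition; the function names are just examples:

    # Functions are values: they can be stored, passed as arguments, returned,
    # and composed so that one's output feeds the next.
    def compose(f, g):
        """Return a new function equivalent to f(g(x))."""
        return lambda x: f(g(x))

    def double(x):
        return x * 2

    def increment(x):
        return x + 1

    double_then_increment = compose(increment, double)  # a function built from functions
    print(double_then_increment(10))            # 21
    print(list(map(double, [1, 2, 3])))         # a function passed as an argument: [2, 4, 6]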

The difference between FP and OOP gets a bit blurry when you consider that values and functions are both sorts of objects. FP languages can implement OOP, as in the Common Lisp Object System. But in FP objects aren't the center of attention; they are incidental to the operation of functions. I think FP concepts helped me design software because functions do things whereas objects just are. In OOP you try to design the correct hierarchy of object types (classes), thinking that it will help your program work if it's a good model of what you need to do. But in FP you focus on the different transformations of data that need to happen. It's almost like the philosophical change in perspective that comes from seeing the world less as a set of objects to be manipulated and more as a set of interrelated processes in motion.

Anyway, all this is a digression because I started this post only wanting to say that as much as I’ve liked learning about Haskell I never really used it in any programming projects. But I have used software written in Haskell before, most notably the amazing Pandoc. I wrote my dissertation in Markdown, and have written articles in Markdown, even this blog uses Pandoc to generate references. I’ve come to appreciate how Markdown frees me from a choice of editor, and lets me focus on the words. But it wouldn’t be possible without a tool like Pandoc that makes it easy to combine my Markdown text with a database of my citations, and generate PDF and Word Documents (sometimes EPUBs) and a host of other formats I haven’t fully taken advantage of.
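For readers who haven't used it, the workflow boils down to invocations along these lines, wrapped in Python here purely for illustration. The paper.md and refs.bib files are hypothetical, and older pandoc versions use --filter pandoc-citeproc instead of the newer --citeproc flag.

    # The kind of pandoc invocation this workflow relies on; file names are
    # hypothetical placeholders.
    import subprocess

    subprocess.run([
        "pandoc", "paper.md",
        "--bibliography=refs.bib",   # citation database
        "--citeproc",                # resolve citations (older pandoc: --filter pandoc-citeproc)
        "-o", "paper.pdf",           # or paper.docx, paper.epub, ...
    ], check=True)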

One remarkable thing about Pandoc is it has felt very stable over the years. It does this one thing (document conversion and generation) very well, and it hasn’t wavered. As I’ve perused the documentation I’ve often caught myself wondering as a side thought, who is this jgm? But until today I haven’t looked him up.

It gives me such pleasure to know that jgm is John MacFarlane, a philosophy professor at the University of California at Berkeley, who studies the history and philosophy of logic. Of course he does. It makes me wonder how much of the stability of Pandoc I experienced as a user comes from the FP approach to software development that Haskell provides. Another John, John Backus, famously said in his 1977 Turing Award lecture (Backus, 2007) that FP offers a new more sustainable approach to computation:

Conventional programming languages are growing ever more enormous, but not stronger. Inherent defects at the most basic level cause them to be both fat and weak: their primitive word-at-a-time style of programming inherited from their common ancestor–the von Neumann computer, their close coupling of semantics to state transitions, their division of programming into a world of expressions and a world of statements, their inability to effectively use powerful combining forms for building new programs from existing ones, and their lack of useful mathematical properties for reasoning about programs. An alternative functional style of programming is founded on the use of combining forms for creating programs. Functional programs deal with structured data, are often nonrepetitive and nonrecursive, are hierarchically constructed, do not name their arguments, and do not require the complex machinery of procedure declarations to become generally applicable. Combining forms can use high level programs to build still higher level ones in a style not possible in conventional languages.

I don’t know if Pandoc is a good example of this or not. But with 345 contributors and 13,578 commits (and counting) it’s hard not to see it as a quiet open source software success story. Now I want to read one of Professor MacFarlane’s books :)

References

Backus, J. (2007). Can programming be liberated from the von Neumann style?: A functional style and its algebra of programs. In ACM Turing Award lectures. Association for Computing Machinery. https://doi.org/10.1145/1283920.1283933

Fuzzy Matching / Ed Summers

This is just a quick post to bookmark an interesting discussion about why it’s difficult to archive Facebook, at least with current web archiving tech. Ilya Kreymer notes that Facebook’s user interface is heavily driven by HTTP POSTs to just one URL:

https://facebook.com/api/graphql/

As you can see this is the endpoint for Facebook’s GraphQL API. Unlike the typical REST API, where there are different URL names for resources, GraphQL requires a client to HTTP POST a query expressed as a JSON object in order to get back a JSON response that matches the shape of the query.

This pattern is generally known as query-by-example: the exchange is kind of like a fill-in-the-blank Mad Libs game, where the client provides the statement with blanks and the service fills them in and returns it. Maybe Cards Against Humanity is a better, more contemporary example. Facebook promulgated the GraphQL standard and it's used quite a bit now, notably on GitHub.
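Here is a generic illustration of the pattern using the Python requests library. The endpoint, query and variables are hypothetical, and Facebook's real payloads are far more elaborate, but the shape of the exchange is the same.

    # Generic GraphQL exchange: one URL, a POSTed query, a mirrored response.
    import requests

    query = """
    query PostsByUser($userId: ID!) {
      posts(userId: $userId) { id text createdAt }
    }
    """

    response = requests.post(
        "https://example.org/api/graphql/",  # every request hits this one URL
        json={"query": query, "variables": {"userId": "12345"}},
    )
    print(response.json())  # the response mirrors the shape of the query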

Anyway, most web archiving tools include a bot that wanders around some region of the web following URL links, saving what is retrieved and looking for more URLs, rinse-lather-repeat. The crawlers use HTTP GET requests to fetch representations of resources using the URL. Crawling the web with GET requests won't really work with Facebook because the bot needs to do a POST, and needs to know what data to post. Archiving bots or tools like Webrecorder and Brozzler that load and interact with the DOM using some set of user-driven or automated behaviors have much more luck recording.

But even Webrecorder has trouble playing back the archived data because it needs to determine which response in the archive is appropriate for a given user interaction in the browser. The usual lookup of a record in the WARC file fails because the index it uses is URL-based, and (remember) all the URLs are the same. The playback software needs to factor in the POST data that was used in the request. But that POST data is generated at runtime during playback, and could be slightly different from the POST data that was used during recording.

Hence the need for fuzzy matching the POST data in order to locate the correct resource to serve up from the archived data. The problem is that rules for fuzziness need to change as Facebook changes their applications. So if the rule is to look for a particular id by name, and the name for that id changes, then the fuzzy matching will break. Or if the data includes some kind of timestamp generated at runtime during playback then that would cause a match to fail unless it was ignored.
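A toy sketch of the idea (not pywb's or Webrecorder's actual rules, and with hypothetical field names): canonicalize the POST body by dropping fields known to vary at runtime, then compare the canonical forms of the recorded and playback requests.

    # Toy fuzzy matching of POST bodies: ignore volatile fields, normalise order.
    import json

    VOLATILE_FIELDS = {"timestamp", "client_mutation_id", "session_id"}

    def canonicalize(post_body):
        """Drop runtime-varying fields and normalise key order."""
        data = json.loads(post_body)
        stable = {k: v for k, v in data.items() if k not in VOLATILE_FIELDS}
        return json.dumps(stable, sort_keys=True)

    def fuzzy_match(recorded_body, playback_body):
        return canonicalize(recorded_body) == canonicalize(playback_body)

    recorded = '{"query_id": "42", "variables": "{}", "timestamp": 1601510400}'
    playback = '{"query_id": "42", "variables": "{}", "timestamp": 1601597000}'
    print(fuzzy_match(recorded, playback))  # True: the volatile field is ignored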

Andy Jackson ventured that Facebook may be intentionally designing their JavaScript this way to ensure that their content doesn’t get archived. It’s hard to say for sure, but they certainly have always tried to keep users in their platform, so it wouldn’t be surprising. It’s sort of fun to imagine what catchy marketing-speak name they might use for the technique in meetings, like brand-loyalty-policy, customer-content-guards, content-protection-framework, boundary-rules or…build-that-wall. Ok, it’s not really that much fun.

I’m not actually sure if Facebook’s GraphQL interface is something that Webrecorder has tackled yet, since I don’t see /api/graphql in the current list of rules. But there does seem to be an /api/graphqlbatch in there. But even if there are good rules if the archive is large there could be a lot of records to sift through if the fuzzy parameters aren’t somehow baked into the index. Obviously I’m just thinking out loud here, but I thought I’d jot this all down as a reminder to dig a little deeper in the future.

One last note is that Ilya led a presentation at a recent IIPC meeting to socialize some of the problems around reliably being able to crawl the social web. He suggested that the problem isn't just technical and has a social component. People interested in archiving the social web need to work together to maintain the tooling as the inevitable changes occur on the platforms which complicate recording and playback. He described how the Webrecorder project is setting up continuous integration tooling to regularly run a test suite checking that captures are working. When they fail, we need a group of dedicated people on hand who can notice the failure, work on a fix, and get it deployed.

I think Ilya is right. But part of the opportunity here is to make this community a bit broader than the national libraries and cultural heritage organizations in the IIPC, although starting there makes a lot of sense. One possible partner would be media organizations who routinely cite social media in their own content. Having accurate, authoritative web archives of the content they are citing seems like a very important thing to have. Innovations like Webrecorder's replayweb.page could offer newsrooms a rich way of presenting social media content from platforms like Twitter and Facebook without being entirely reliant on the content staying available.

Endangering Data Interview with Thomas Padilla / Digital Library Federation

Thomas Padilla is Interim Head, Knowledge Production at the University of Nevada Las Vegas. He consults, publishes, presents, and teaches widely on digital strategy, cultural heritage collections, data literacy, digital scholarship, and data curation. He is Principal Investigator of the Andrew W. Mellon Foundation supported Collections as Data: Part to Whole and past Principal Investigator of the Institute of Museum and Library Services supported Always Already Computational: Collections as Data. He is the author of the library community research agenda Responsible Operations: Data Science, Machine Learning, and AI in Libraries.


Tell us a bit about your projects and how you became interested in cultural heritage data and algorithmic and AI approaches to curation and research?

I am interested in cultivating GLAM community capacity around responsible, ethically grounded computational engagement with data. Some of that interest has to do with positionality – me being a mixed race, first generation college student, from a working class background. I’m constantly trying to find ways for my labor to address historic and contemporary marginalization.  

Always Already Computational: Collections as Data was an Institute of Museum and Library Services supported effort that iteratively developed a range of deliverables meant to spark capacity around principles-driven creation of computationally amenable collections. In that work I was very lucky to be joined by Laurie Allen, Stewart Varner, Hannah Frost, Elizabeth Russey Roke, and Sarah Potvin. With a better sense of community need I later embarked on Collections as Data: Part to Whole – an effort supported by the Andrew W. Mellon Foundation. Part to Whole is essentially a regranting and cohort development program. Hannah Scates Kettler, Stewart Varner, Yasmeen Shorish, and I are currently working with 12 institutions (large R1s, historical societies, museums, State-based digital libraries, and more) to develop models that guide collections as data production and models that help organizations develop sustainable services around collections as data.

Over the course of 2019 I worked as a Practitioner Researcher in Residence at OCLC Research, interviewing and holding convenings for professionals within and outside of libraries in the United States. This work culminated in the community research agenda Responsible Operations: Data Science, Machine Learning, and AI in Libraries. I felt a lot of pressure to get this work right. I did not want to write some breathless utopian endorsement of AI. Any success I have in that regard is due to the wisdom of the community; any failures are mine. The library community in the United States feels like it has reached a certain level of awareness regarding the pitfalls of AI, helped considerably by the work of Safiya Noble, practitioners like Jason Clark, and an understanding that library community practices have long held the potential to systematically impact communities in a discriminatory manner.

Rumman Chowdhury introduced me to the concept of responsible operations, which was a perfect way to encapsulate where it feels like we are as a community. A number of us want to use AI to strengthen library services but only if it doesn't compromise commitments to cultivating a more equitable society. Of course, no community is uniform in its beliefs, and libraries are no exception. Some at junior and senior levels have quietly – and not so quietly – expressed the view that preoccupation with responsibility or ethics is orthogonal to progress and allows the library community to be beaten in some imagined race with the private sector. These are dangerous views and the stakes are real. We must act accordingly.

 

For years, many in the library and cultural heritage world have critiqued digitization efforts as replicating (or even accelerating) long-standing biases that center on white, male, and US/Eurocentric collection patterns, viewpoints, and catalog descriptions. In both the Santa Barbara Statement on Collections as Data and the Always Already Computational: Collections as Data final report, you and your partners have pointed to a crucial need for critical engagement with biases and shortcomings and an intention to address the needs of vulnerable communities represented in the materials. What are some examples of these approaches that you’ve found to be successful?

Collections as Data: Part to Whole requires that regrantees demonstrate capacity to serve underrepresented communities – a consideration that spans thematic coverage of the collection in question, community buy-in, and a demonstrated commitment to ethical principles that work against the potential for harm. Examples of Part to Whole work addressing your questions include but are not limited to Kim Pham’s effort at the University of Denver to develop a terms of use for collections as data and Amanda Henley and Maria Estorino’s effort at the University of North Carolina Chapel Hill to discover and increase access to Jim Crow laws and other racially-based legislation in North Carolina between Reconstruction and the Civil Rights Movement. 

More broadly, there is so much good work being done. I am super inspired by Dorothy Berry's advocacy at Harvard, resulting in a 2020-2021 exclusive focus on the digitization of Black American History. I am inspired by the Global Indigenous Data Alliance's CARE Principles, co-led by Stephanie Russo Carroll and Maui Hudson. A response to the FAIR Principles, CARE problematizes FAIR's "focus on characteristics of data that will facilitate increased data sharing among entities while ignoring power differentials and historical contexts." A CARE principle like indigenous "Authority to Control" presents a difficult and needed challenge to the cultural heritage community. What could it look like for more institutions to relinquish control of collections to their rightful owners? It is not often the case that capital – stolen or not – is returned, and I imagine even the most well-meaning libraries will struggle mightily within their own hierarchies to make this happen. I appreciate Eun Seo Jo and Timnit Gebru's effort to bridge the archives community and machine learning community. Attempting to thread the needle on cross-domain work is always tough, but it is definitely needed. T-Kay Sangwand's Preservation is Political: Enacting Contributive Justice and Decolonizing Transnational Archival Collaborations is a must read. Michelle Caswell's work – as a whole – is fundamental to improving efforts in these spaces.

 

In your Responsible Operations: Data Science, Machine Learning, and AI in Libraries report, you cite Nicole Coleman’s suggestion that, in regard to machine learning, libraries might be better served to “manage bias” rather than attempt (or claim) to eliminate it. Can you talk a little bit more about that framing and why you feel it’s productive in the library world?

I think people heard enough from me about it in Responsible Operations. I encourage folks to read Nicole's subsequently published article, Managing Bias When Library Collections Become Data.

 

What do you think the role for library and information professionals is in larger conversations about “endangering data” and algorithmic and data justice?

I think there are many of us doing this work. While former Illinois University Librarian Paula Kaufman’s testimony before Congress (pg. 77) against a Federal surveillance program gives me chills every time I read it, I often end up thinking about what combination of colleagues, mentors, institutional culture, and personal and professional ethics were in place to make that act of bravery possible. That naturally leads to thinking about what it would take to cultivate similarly principled acts, large and small, among my colleagues. That seems like a promising road to head down. 

 

Is there anything else you want to add, or any work or other projects you want readers to know about?

I appreciate the opportunity to share thoughts during Endangered Data Week. In addition to the people and projects mentioned above, I encourage folks to check out the Indigenous Protocol and Artificial Intelligence Position Paper, Mozilla's recent work on Data for Empowerment, and Ruha Benjamin's incredibly powerful Data4BlackLives keynote.

The post Endangering Data Interview with Thomas Padilla appeared first on DLF.

Lucidworks Announces “Easy Button” to Simplify Kubernetes Operations for Fusion (Powered by Platform9 technologies) / Lucidworks

Running Kubernetes for enterprise-scale search is a huge operational challenge. Customers who don't have the capability or operations expertise to manage Kubernetes in house can now deploy Fusion 5 and offload operational complexity by using Lucidworks Kubernetes Service.

The post Lucidworks Announces “Easy Button” to Simplify Kubernetes Operations for Fusion (Powered by Platform9 technologies) appeared first on Lucidworks.

Tech Tip: Sorting Cases on Analysis Fields / Harvard Library Innovation Lab

Last month we announced seven new data fields in the Caselaw Access Project. Here are API calls to the cases endpoint that demonstrate how to sort on these fields. Note the query strings, especially the use of the minus sign (-) to reverse order.

All cases ordered by PageRank, a measure of significance, in reverse order, so the most significant come first:

?ordering=-analysis.pagerank.percentile

All cases sorted by word count, from longest to shortest:

?ordering=-analysis.word_count
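For readers who want to try these orderings from a script, here is a minimal Python sketch. It assumes the public CAP cases endpoint at https://api.case.law/v1/cases/ and the field names commonly seen in CAP JSON responses (results, name_abbreviation, analysis); treat those as assumptions and check the current API documentation before relying on them.

import requests

# Fetch the five longest cases by sorting on word count in reverse order.
# For the PageRank ordering shown above, use "-analysis.pagerank.percentile" instead.
resp = requests.get(
    "https://api.case.law/v1/cases/",
    params={"ordering": "-analysis.word_count", "page_size": 5},
)
resp.raise_for_status()
for case in resp.json()["results"]:
    print(case["name_abbreviation"], case["analysis"]["word_count"])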

Introducing the Evergreen Bootstrap OPAC / Evergreen ILS

Guest post by Chris Burton of the Niagara Falls Public Library, who did most of the design work for the Bootstrap OPAC skin that will be available in the upcoming release of Evergreen 3.6. The current TPAC skin will remain the default OPAC skin for 3.6, but it is expected that the new Bootstrap skin will become the default in a future release of Evergreen.

This is a redesign of the current OPAC in the Evergreen ILS meant to enhance the navigation, responsiveness, and accessibility (WCAG 2.0 AA) of the OPAC.

W3C Validator was used to ensure the HTML is valid and the WAVE Accessibility Evaluation Tool was used to assist in ensuring accessibility of the OPAC. If you are interested in either, or the guidelines for accessibility, they are referenced below.

What’s New?

This update includes a major functionality change to the My Account area: navigation has been combined into one place, and what used to be select-list options are now buttons.

Also, the advanced search filters have been changed from select lists to checkboxes for ease of use and to make the options accessible to all.

There are a number of less notable changes that I’m sure you will notice just by using the OPAC. Made with full responsiveness in mind, it works on any size device.

This change utilizes a number of packages that are well documented and offer a wide range of options for easy OPAC customization. (References Below)

  • Bootstrap 4 for style and responsiveness to allow for a smooth experience on any device size. With Bootstrap enabled, changes are even easier to make without needing to customize as much.
  • Font Awesome adds icons to the OPAC to make options more contextual, helping break down language barriers.
  • Tooltips are added along with Bootstrap 4 and its dependencies. This allows the ? (help) areas in the OPAC to hold helpful information.
  • Bootstrap Datepicker, to ensure easy entry of dates and consistent formatting so the data comes through correctly.

Dev Tips

 

Enabling the Template

Open the virtual host configuration in apache at /etc/apache2/eg_vhost.conf

Find this line

# Templates will be loaded from the following paths in reverse order.
PerlAddVar OILSWebTemplatePath "/openils/var/templates"

And add below it 

PerlAddVar OILSWebTemplatePath "/openils/var/templates-bootstrap"

Then restart Apache: systemctl restart apache2.service.

jQuery

jQuery is required for some of the added functionality so it has been moved from optional to required. The existing simple JS function $(s) had to be changed to get(s) to stop conflicting with jQuery.

Grid System

The new template is built on the Bootstrap 4 Grid which helps keep elements responsive for all device sizes. Take a look to learn more about it here: https://getbootstrap.com/docs/4.0/layout/grid/

Snippets

Tooltips were updated to Bootstrap tooltips. The script should be present only on pages that run them.  https://getbootstrap.com/docs/4.0/components/tooltips/   

<!-- data-html allows use of HTML tags in the tooltip -->
<a href="#" title="text to show on tooltip" data-html="true" data-toggle="tooltip">
<i class="fas fa-question-circle" aria-hidden="true"></i>
</a>
<!-- This is needed to activate the tooltips on the page and is activated globally by default in js.tt2 -->
<script>
$jQuery(document).ready(function(){ $jQuery('[data-toggle="tooltip"]').tooltip(); });
</script>

Bootstrap Datepicker is supported. Here is a simple date example: https://bootstrap-datepicker.readthedocs.io/en/latest/

<div class="input-group date" data-provide="datepicker">
<input type="text" class="form-control" name="expire_time" value="[% expire_time | html %]" data-date-format="mm/dd/yyyy">
<div class="input-group-addon">
<span class="glyphicon glyphicon-th"></span>
</div>
</div>

Reference

Delete all S3 key versions with ruby AWS SDK v3 / Jonathan Rochkind

If your S3 bucket is versioned, then deleting an object from s3 will leave a previous version there, as a sort of undo history. You may have a “noncurrent expiration lifecycle policy” set which will delete the old versions after so many days, but within that window, they are there.

What if you were deleting something that accidentally included some kind of sensitive or confidential information, and you really want it gone?

To make matters worse, if your bucket is public, the version is public too, and can be requested by an unauthenticated user that has the URL including a versionID, with a URL that looks something like: https://mybucket.s3.amazonaws.com/path/to/someting.pdf?versionId=ZyxTgv_pQAtUS8QGBIlTY4eKmANAYwHT To be fair, it would be pretty hard to “guess” this versionID! But if it’s really sensitive info, that might not be good enough.

It was a bit tricky for me to figure out how to do this with the latest version of ruby SDK (as I write, “v3“, googling sometimes gave old versions).

It turns out you need to first retrieve a list of all versions with bucket.object_versions. With no arg, that will return ALL the versions in the bucket, which could be a lot to retrieve, not what you want when focused on just deleting certain things.

If you wanted to delete all versions in a certain “directory”, that’s actually easiest of all:

s3_bucket.object_versions(prefix: "path/to/").batch_delete!

But what if you want to delete all versions from a specific key? As far as I can tell, this is trickier than it should be.

# danger! This may delete more than you wanted
s3_bucket.object_versions(prefix: "path/to/something.doc").batch_delete!

Because of how S3 "paths" (which are really just prefixes) work, that will ALSO delete all versions for path/to/something.doc2 or path/to/something.docdocdoc etc, for anything else with that as a prefix. There probably aren't keys like that in your bucket, but that seems dangerously sloppy to assume; that's how we get super weird bugs later.

I guess there’d be no better way than this?

key = "path/to/something.doc"
s3_bucket.object_versions(prefix: key).each do |object_version|
  # only delete versions whose key matches exactly, not ones that merely share the prefix
  object_version.delete if object_version.object_key == key
end

Is there anyone reading this who knows more about this than me, and can say if there’s a better way, or confirm if there isn’t?

Catching up with past NDSA Innovation Awards Winners: AIMS Project / Digital Library Federation

The AIMS Project (An Inter-Institutional Model for Stewardship) won a 2012 Innovation Award in the Project category. AIMS participants were recognized for their work developing a framework for stewarding born-digital content and filling the gap between applying standards such as OAIS and the necessary workflows and tools for implementation. The responses to this Q&A were provided by AIMS Project participants from Stanford University, University of Hull, and University of Virginia.

What have you/project teams been doing since receiving an NDSA Innovation Award?

Stanford: Made the digital archivist position continuing (aka "real"); 2+ years ago we added another full-time digital archivist. DLSS & Special Collections collaborated to build our capacity and procedures for acquiring, processing, and delivering born-digital materials. Received 3 grants to develop our open-source email processing/delivery platform (ePADD project, discovery online). This last has morphed into a new grant application by Harvard & the Univ. of Manchester (w/ us as consultants) to further develop ePADD with more preservation elements.

  • Total born-digital collections acquired since 2012: ~140 accessions and ~250 TB. Born-digital processing projects (processed and in progress) include: Amos Gitai, Dorothy Fadiman, Helen & Newton Harrison, Ted Nelson, New Dimensions, Silicon Genesis, Ruth Asawa, Lourdes Portillo. Other collection acquisition highlights (unprocessed) include: Rebecca Solnit, Lynn Hershman-Leeson, Marlon Riggs, Bob Stein, David Bohrman.
  • Through the born-digital program, Stanford and Virginia are members of the Software Preservation Network and are both nodes for the Emulation as a Service Infrastructure (EaaSI) project.
  • Stanford DLSS and Special Collections have also worked together with a number of other institutions, including University of Michigan, Duke University, Indiana University, and Princeton University, to develop ArcLight, an open source discovery and delivery environment for archives.
  • After Yale, Mark Matienzo served as the Director of Technology for the Digital Public Library of America, and joined Stanford in 2016.

Virginia: We have also made digital preservation and management a priority by making the AIMS position permanent.  We have been fortunate to have both digital archivists and a digital preservation librarian as full time positions.

Hull: Simon Wilson retained responsibility for born-digital archives when he returned to his substantive role as Senior Archivist. Hull retained a high profile across the UK with lots of advocacy for encouraging organisations to take practical steps with digital preservation and proposed that digital archives could be undertaken as a shared-service between multiple archive services.

  • The project gave us a huge boost of confidence with increased advocacy within the institution and led to the inclusion of born-digital archives as a key activity for the library service
  • Colleagues from Hull collaborated with the University of York in a project funded by JISC to look at the suitability of Archivematica to support research data management activity – an opportunity to review and identify similarities and differences between research data and born-digital archives
  • Advocated and secured funding from a range of sources including The National Archives to create an archive for Hull UK City of Culture 2017

What did receiving the NDSA Innovation award in 2012 for AIMS mean to you and/or the project team?

Recognition of work that was critical to the basic operations within archives then and now. This was an international group that came together, identified significant challenges, and developed strategies to address them.

The Award also helped introduce and integrate our work into the larger preservation community. Since 2012, Virginia, for example, has been very active in the NDSA with two staff being elected as Coordinating Committee Chairs and several others being chairs of Interest and Working Groups.

The encouragement of working with others for mutual benefit – a legacy that has remained central to our philosophy. Simon Wilson served on the Digital Preservation Coalition's Partnership and Sustainability Sub-committee (2016-2019) and contributed to the international curatorial team reviewing the NDSA Levels of Digital Preservation.

What efforts, advances, or ideas over the last 5-8 years have caught your attention or interest in the area of digital stewardship?

There are too many to note, but the rise of Distributed Digital Preservation Services has made significant advances to help many organizations understand and implement digital preservation in a cost-effective manner. Software preservation and emulation have also risen to the fore, building on the scholarly foundations of folks like those at MITH. With the rise of cloud services, emulated environments are now much more standardized than they were in the AIMS years.

The AIMS project was a significant collaborative and technical endeavor. What components of the project do you think have sustained or grown in the digital stewardship community over time? What ideas or work from the project had you hoped would gain traction in the community, but did not quite catch on?

We still live in hope of an integrated hierarchical collections discovery platform and UI. Entities like the DPLA, though one of the largest digital portals in the world, still lack the means to represent hierarchical collections. Much of our archival materials (including born digital) are difficult to discover and access.

What are some priorities or challenges you see for digital stewardship?

Better integration of new technologies such as augmented reality (which includes artificial intelligence and machine learning). There is too much data being produced for humans to manage themselves.

Metadata is still largely siloed by organization, and efforts to integrate and iterate on metadata are still a major challenge for the library and archives professions.

Digital preservation is still a major challenge for any organization that manages digital content. Much of the funding still comes from collections budgets and a shift to consider preservation akin to infrastructure (like electricity) is the only way we will be able to scale to meet the challenge of preserving the cultural record.

Hull's experience has been very dependent on project funding, which has led to phases of activity and inactivity. This has demonstrated the need for dedicated resources to transition into a service that can be maintained; for the long term, it should be considered part of business as usual, with all members of the team contributing to this strand of activity.

The post Catching up with past NDSA Innovation Awards Winners: AIMS Project appeared first on DLF.

Moxie Marlinspike On Decentralization / David Rosenthal

The Ecosystem Is Moving: Challenges For Distributed And Decentralized Technology is a talk by Moxie Marlinspike that anyone interested in the movement to re-decentralize the Internet should watch and think about. Marlinspike concludes "I'm not entirely optimistic about the future of decentralized systems, but I'd also love to be proven wrong".

I spent nearly two decades building and operating in production the LOCKSS system, a small-ish system that was intended, but never quite managed, to be completely decentralized. I agree with Marlinspike's conclusion, and have been writing with this attitude since at least 2014's Economies Of Scale In Peer-to-Peer Networks. It is always comforting to find someone coming to the same conclusion via a completely different route, as with scalability expert Todd Hoff in 2018 and now Moxie Marlinspike based on his experience building the Signal encrypted messaging system. Below the fold I contrast his reasons for skepticism with mine.

Marlinspike's talk is in two parts. The theme of the first is that [4:33] "user expectations of software are evolving rapidly, and evolving rapidly is in conflict with decentralization". He uses a raft of examples of centralized systems that have out-evolved their decentralized competitors, including Slack vs. IRC, Facebook vs. e-mail, WhatsApp vs. XMPP. The [6:04] decentralized protocols are stuck in time, whereas the centralized protocols are constantly iterating.

What he doesn't say, but that reinforces his point, is that many of the techniques routinely used by centralized systems to improve the user experience, such as A/B testing, are difficult if not impossible to apply to decentralized systems. Further, decentralization imposes significant overheads compared to a centralized version of the same system. The idea that, for example, "Twitter but decentralized" would take market share away from Twitter is implausible. It would lack Twitter's ways of finding out what its users want, it would be much slower than Twitter at implementing and deploying those features, and once deployed they would be slower.

One major reason that it would be slower was pointed out by Paul Vixie in his 2014 article Rate-limiting State. Unless decentralized systems implement rate limits, as Bitcoin does, they are vulnerable to what are, in effect, DDoS attacks of various kinds. I discussed this problem in Rate-Limits. The result is that decentralized systems will be slowed not merely by the overheads of decentralization, but by the need to artificially slow the system as a defense measure.

Marlinspike's second theme starts when he asks [7:22] "why do we want decentralization anyway?". He lays out [7:32] four goals touted by advocates of decentralization:
  • Privacy
  • Censorship resistance
  • Availability
  • Control
He examines each in turn, making the case that centralized systems deliver a better user experience than decentralized systems along each of these axes. I can only summarize his arguments; you should watch each segment carefully.

Privacy [8:01]

Marlinspike starts by pointing out that "Most decentralized systems in the world are not encrypted by default". This is because the key and certificate management problems are significantly harder in a P2P system, which needs to implement something like PGP's Web of Trust, than in a centralized system.

But what advocates of decentralization mean by privacy is both data privacy implemented by encryption, and metadata privacy implemented by "data ownership". This implies that each user owns and operates her own service which contains her data. Marlinspike comments that this seems antiquated, "left over from a time when computers were for computer people". The vast majority of users lack the skills necessary to do this. Although Marlinspike does have the necessary skills, and [9:57] does run his own e-mail server, this doesn't provide meaningful data or metadata privacy because "every e-mail I send or receive has gmail on the other end of it". Thus Google has a copy of (almost) every one of his e-mails.

As he says, real data protection requires end-to-end encryption, but metadata protection requires innovation. Both will happen faster in centralized systems because they can change faster. Signal provides metadata protection in the form of private groups, private contact discovery and sealed sender, so the centralized service has no visibility into group state or membership, or who is talking to whom. Marlinspike provides [10:52] a fascinating description of the cryptography behind private groups.

He says [16:01] "P2P is not necessarily privacy-preserving". Originally, Signal's voice and video calls operated on a P2P basis, with direct contact between the parties. But users said "do you mean someone can just call me and get my IP address?" So now they are routed via the service but, since they are end-to-end encrypted, it cannot see the content. It does know the parties' IP addresses, but an attacker would have to compromise the server to identify their IP addresses unambiguously.

Censorship resistance [17:09]

Marlinspike's model of censorship is that the censor, for example the Great Firewall of China, blocks access to services of which it disapproves. The problem is that if a service can be discovered by the user, it can be discovered by the censor. And, given automation, even if there are multiple providers of the service, it is likely that the censor can discover them at least as fast as the users, leading to a game of whack-a-mole. But if users are identified by each provider, whenever they are forced by the censor to switch they have to reconstruct their social network. This is an asymmetric game, where the cost to the censor is much less than the cost to the users.

Centralized services such as WhatsApp and Signal use techniques such as proxy sharding (each user can discover only a small subset of the access points) to make it hard for the censor to discover all the service access points quickly, and domain fronting to make it costly to block the access points the censor does discover. But the basic requirement for defending against this kind of censorship is [21:07] rapid response, which is difficult in a decentralized system.

Marlinspike doesn't discuss the other type of censorship resistance, resistance to data being deleted from decentralized systems, such as blockchains.

Availability [21:31]

In his brief discussion, Marlinspike uses the example of sharding a database between two data centers, which halves the mean time between failures for the system as a whole.

This is somewhat misleading. In his example, the system has gone from a binary failure model, it is either up or down, to a model where failures degrade the system rather than cause complete failure. In many cases this is preferable, especially if there are large numbers of shards so a failure degrades the system only slightly. Fault-tolerance can be an important feature of decentralized systems (e.g. LOCKSS). But the fault-tolerance comes at two kinds of cost, the cost of replication, and the cost of coordinating between the replicas. Done right, decentralization can improve both fault-tolerance and resilience against attack, but only at significant cost in resources and performance.

Control [22:26]

Marlinspike starts "People feel the Internet is this terrible place, in ways I don't think people used to feel, ... and a lot of this comes down to a feeling that we have a lack of control." He continues by discussing two ways decentralization advocates suggest users can exert control, switching among federated services, and extensibility.

In a federated environment, different services can behave differently, so when one no longer satisfies a user's need, she can switch to another. Marlinspike assumes that the user's identity is per-service, as it is for example with e-mail (user@example.com). This does make switching difficult as doing so requires the user to rebuild their social graph. He observes that many people still use Yahoo mail!

His assumption does cover many cases, but it is possible for decentralized systems to share a single user-generated identity (Self-sovereign identity). An example is the use of a public key as an identity.

His example of a [25:24] "protocol that's designed to be extended, so that people can modify the technology in ways that meet their needs" is XMPP, which as he says ended up as a morass of XEPs. The result was a lot of uncertainty in the user experience - "you want to send a video, there's a XEP for that, does the recipient support that?". And despite its extensibility, it couldn't adapt to major changes like mobile environments. The result wasn't control, since XEP extensions provided little value unless they were adopted everywhere. Similarly, he points to Bitcoin, where extensibility takes the form of forks, leading to fragmentation. This has more to do with open source than decentralization, which the cryptocurrency world has failed at.

Conclusion

Marlinspike concludes that the problem is that developing and deploying technology involves "buildings full of rooms full of people sitting in front of computers 8 hours/day every day forever". To change technology so it serves our needs better, what is needed is to make developing and deploying technology easier, which isn't what decentralization does.

Marlinspike vs. Me

My skepticism was laid out in, among others, It Isn't About The Technology, Decentralized Web Summit 2018: Quick Takes and Special Report on Decentralizing the Internet. Then I was asked to summarize what would be needed for success apart from working technology (which we pretty much have). My answer, in What Does The Decentralized Web Need? was four things:
  • A sustainable business model. A decentralized system in which all nodes run the same software isn't decentralized. A truly decentralized system needs to be supported by an ecosystem with multiple suppliers, each with a viable business model, and none big enough to dominate the market. As W. Brian Arthur demonstrated in 1994, increasing returns to scale make this hard to achieve in technology markets. And almost the only semi-viable business model for small Web companies is advertising, with really strong increasing returns to scale.
  • Anti-trust enforcement. As Steve Faktor wrote:
    It turns out that startups are Trojan horses. We think of them as revolutionaries when in fact, they’re the farm team for the establishment.
    These days startups get bought by the incumbent giants before they can become big and very profitable, and thus pose a threat to the incumbents. Without a return to effective anti-trust enforcement, this is what would happen if, despite the odds, a decentralized system succeeded.
  • The killer app. I wrote:
    The killer app will not be "[centralized app] but decentralized", because it won't be as good as [centralized app]. Even if it were, these days who needs Second Life, let alone "Second Life, but on the blockchain"? It has to be something that users need, but that cannot be implemented by a centralized system.
    It is really hard to find an application that can't be implemented on a centralized system, and even harder to find one of them that users would actually want.
  • A way to remove content. I wrote:
    Unfortunately, politicians love to pose as defending their constituents from bad people by passings laws censoring content on the Web, preferably by forcing the incumbent platforms to do it for them. Laws against child porn and "terrorism", and for the "right to be forgotten", "protection" of personal information, and "protection" of intellectual property all require Web publishing systems to implement some means for removing content.
    ...
    In the absence of mechanisms that enable censorship, it won't just be the incumbent platforms trying to kill our new, small companies, it will be governments.
    Removing content from a well-designed decentralized system is hard to implement, which is why the advocates believe they are censorship-resistant. But succeeding in the face of both the incumbent platforms and governments is unlikely.
[Image: Ether miners, 07/09/19]
Finally, if the decentralized system is implemented, deployed and becomes successful, it needs to stay decentralized. As we see with by far the most prominent decentralized technology, blockchains, this never happens. As I described in 2014's Economies of Scale in Peer-to-Peer Networks, very powerful economic forces drive centralization of a successful decentralized system.

As you can see, Marlinspike's arguments are based largely on technical issues, whereas mine are based largely on economic issues. But we agree that the fundamental problem is that decentralized systems inherently provide users a worse experience than centralized systems along the axes that the vast majority of users care about. We each place stress on a different set of factors causing this. Marlinspike makes a strong case that they provide a worse experience even along the axes that the decentralized advocates claim to care about. I make the case that even if they defeat the odds and succeed, like blockchains they will not remain actually decentralized.

Open Data Day 2021 will take place on Saturday 6th March / Open Knowledge Foundation

Open Data Day 2021

We are pleased to announce that Open Data Day 2021 will take place on Saturday 6th March.

Open Data Day is the annual global celebration of open data facilitated by the Open Knowledge Foundation. The Open Data Day website is opendataday.org.

Groups from around the world create local events on the day where they will use open data in their communities. It is an opportunity to show the benefits of open data and encourage the adoption of open data policies in government, business and civil society.

In March 2020, more than 300 events took place across the world to mark the tenth Open Data Day despite some events having to shift online due to event restrictions relating to the COVID-19 pandemic.

Thanks to the generous support of our funders – Datopian, the Foreign & Commonwealth Office, Hivos, the Latin American Open Data Initiative (ILDA), Mapbox, Open Contracting Partnership and Resource Watch – we were able to give out more than 60 mini-grants to support the running of great community events on Open Data Day 2020. 

Learn all about those events and discover organisations celebrating open data near you by reading our round-up blogpost.

If you or your organisation would like to give financial support for Open Data Day or would be interested in sponsoring our mini-grant scheme, please get in touch by emailing opendataday@okfn.org. We will announce more details about the 2021 mini-grant scheme in the coming months.

For Open Data Day 2021, you can connect with others and spread the word using the #OpenDataDay or #ODD2021 hashtags. Alternatively you can join the Google Group to ask for advice or share tips.

By March 2021, we hope that in-person events will be able to take place in many locations but we know that differing levels of COVID-19 restrictions will be in force in a number of countries so we are looking at how best we can support the organisation of more virtual events.

Find out more about Open Data Day by visiting opendataday.org where you can also add your event to the global map, find recommended data resources and use a free logo generator to create a logo to help your city mark the event.

Less is (sometimes) More / Ed Summers

Below is a short presentation that I prepared for iPRES 2020 (a.k.a #WeMissiPRES) which was held remotely due to the Coronavirus pandemic.


In the next 9 minutes I hope to convince you that a text file of numbers is an important resource for archiving the web. Yes, that's right, just a list of numbers like this:

My hat is off to the organizers because I couldn’t have asked for a better person to speak after than Rhiannon, since she just presented on the topic of significant properties and OAIS. Hopefully you will see the connection too in a moment.

There doesn’t appear to be anything significant about these numbers. What possible preservation value could a list of numbers like this have for archiving the web?

Of course significant properties have been the topic of significant critique from the digital preservation community. In 2006 Chris Rusbridge noted that:

.. there is no way of precisely defining the designated community, and similarly no way of foretelling the properties that future users might deem significant. This leads to pressure for preservation that must be faithful to the original in all respects. (Rusbridge, 2006)

And this pressure to remain faithful to the “original” can sometimes work perniciously to guarantee that instead nothing is preserved. It’s all or nothing – and most of the time that means nothing.

15 years ago John Kunze (who has an uncanny ability for naming things) gave a talk here at iPRES titled Future Proofing the Web in which he introduced the idea of "preservation through desiccation". He drew attention to the properties of paper that made it such a successful preservation medium, and asked us to consider the venerable IETF RFC standards archive which used plain text files without fonts, graphics, colors, diacritics, but which retained "essential cultural value". Part of the argument John made was:

The simplest technologies to maintain and understand today are the simplest to carry forward and to recreate in the future.

Today this principle is known as minimal computing–at least in some digital humanities circles. But the idea goes back further to the earlyish days of the web, when in 1998 Tim Berners-Lee wrote down the Principle of Least Power to describe his process for designing web standards like HTML:

When designing computer systems, one is often faced with a choice between using a more or less powerful language for publishing information, for expressing constraints, or for solving some problem. This finding explores tradeoffs relating the choice of language to reusability of information. The “Rule of Least Power” suggests choosing the least powerful language suitable for a given purpose.

Ok, so what does all this have to do with a list of numbers? To understand that I need to quickly tell you about three interrelated problems we encountered on the Documenting the Now project (they should sound familiar). Documenting the Now is a project (thank you Mellon Foundation) that is cultivating a community of practice for social media and web archiving that centers the rights, safety and voices of content creators. The project started in 2014 in the wake of the murder of Michael Brown in Ferguson, Missouri with the recognition that:

  1. Social media presents a huge opportunity for documenting previously undocumented historical events. However, cultural organizations often (rightly) steer clear of engaging in it because of concerns about how to provide meaningful access without harming the people doing the documentation. You may remember this subject being described last year at iPRES in Michelle Caswell's keynote: Whose Digital Preservation?
  2. Researchers of all disciplinary stripes routinely create collections of social media for use as data in their research. But by and large they do not provide access to these collections because social media platforms forbid it.
  3. Content creators in social media have little control over how their data is being used in archives, and instead are the subject of widespread surveillance capitalism (Zuboff, 2015).

Why would we want to wade into this river you might ask? Honestly, it was the voices of the activists in Ferguson that kept us going as we tried to find what we could do so that their work was not forgotten. It is worth stating clearly at this point, that there is no technical-fix for this problem. Memory is a people problem. Tools can help (and hurt), but there is no silver bullet (Brooks, 1975; Stiegler, 2012 ).

Over the past five years we’ve developed a few tools that can be used separately or in combination to address parts of these problems given the right set of actors to use them responsibly. Here’s the basic intervention we made while focused on the social media platform Twitter, which was so critical to documenting the events in Ferguson:

  1. Twitter does not allow data to be collected from it and then reshared with third parties. That's bad for business, because Twitter wants to sell the data itself. But it does allow the sharing of tweet identifiers (long numbers like above), and explicitly encourages academic researchers to do this. Why not encourage the sharing of tweet id datasets in digital repositories and provide a view into them as a whole? That's why we created The Catalog.

  2. But how do you create these lists of tweet identifiers? And what would you do with a list once you downloaded them? We created a few tools, mostly twarc for collecting data from the Twitter API and Hydrator, which lets you turn those identifiers back into data again (a minimal sketch of the hydration step follows this list).

  3. Ok, fine. But what about the rights of content creators? What say do they have in how their data is collected? First, twarc only collects public tweets. So if an account is protected it won't show up in the filter stream or search API endpoints that twarc uses. But the same is also true of the API endpoint that the Hydrator uses. If a tweet id dataset is published and then the creator decides to delete their tweets or protect their account, the data can no longer be "hydrated". This gives some measure of agency back to content creators.
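To make the hydration step mentioned above concrete, here is a minimal sketch using the twarc Python library (the 1.x interface). The API credentials and the ids.txt file name are placeholders of my own, not part of the project's documentation:

from twarc import Twarc

# Placeholder Twitter API credentials; a real run needs your own keys.
t = Twarc("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# ids.txt: one tweet identifier per line, e.g. a dataset discovered via the Catalog.
with open("ids.txt") as f:
    ids = (line.strip() for line in f)
    for tweet in t.hydrate(ids):
        # Deleted tweets and tweets from protected accounts simply don't come back.
        print(tweet["id_str"], tweet["full_text"])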

This obviously isn't a perfect solution because many content creators need more control, and some need less (we're working on that too). Researchers studying things like disinformation campaigns won't be happy with the deletes that go missing from hydrated datasets. But the Catalog's primary purpose is to serve as a clearinghouse for where these datasets live in fuller representations in repositories. I'm normally neutral on OAIS but in this case I think it's actually useful to consider the tweet identifiers as an OAIS Dissemination Information Package (DIP). Using the contact information in the Catalog, it's within the realm of possibility to gain access to the original data by reaching out and becoming a project partner rather than a third party.

But rather than convincing you that the work we've done on Documenting the Now is the bee's knees, the cat's meow, or a real humdinger (sorry, I got lost in a thesaurus) I hope to have convinced you that (sometimes) less is more. Strategically sharing less data can serve the interests of digital preservation and access. Less isn't just a matter of technical sustainability but it's also a lever (Shilton, 2012) that we have at our disposal when we consider the positionality of our memory work. Digital preservation isn't always about the highest resolution representation with the most significant properties. Use this value lever wisely!


Here is an audio version of this post with some slides. Spoiler Alert: there are slides containing a bee, a cat and a bell.


References

Brooks, F. (1975). The mythical man month. Addison-Wesley.

Rusbridge, C. (2006). Excuse me ... some digital preservation fallacies. Ariadne, (46). Retrieved from http://www.ariadne.ac.uk/issue/46/rusbridge/

Shilton, K. (2012). Values levers: Building ethics into design. Science, Technology & Human Values, 374–397.

Stiegler, B. (2012). Relational ecology and the digital pharmakon. Culture Machine, 13. Retrieved from https://culturemachine.net/wp-content/uploads/2019/01/464-1026-1-PB.pdf

Zuboff, S. (2015). Big other: Surveillance capitalism and the prospects of an information civilization. Journal of Information Technology, 30(1), 75–89.

Harnessing the Power of OrCam / Information Technology and Libraries

The OrCam reader is an AI-enabled device that helps sight-challenged readers access print materials. This article is a first-person account of a public library's experience in employing the OrCam technology.

What More Can We Do to Address Broadband Inequity and Digital Poverty? / Information Technology and Libraries

While libraries have always worked to help bridge the digital divide by providing free Internet access, public access computers, and media literacy instruction, the current pandemic has made it abundantly clear that much more needs to be done. This article proposes ways that libraries might work with community, state, national, and even global partners to help promote universal broadband.

A Collaborative Approach to Newspaper Preservation / Information Technology and Libraries

This column explores a collaborative undertaking between the Denton Public Library in Denton, Texas, and the University of North Texas Libraries (UNT) to build digital access to the city of Denton's newspaper of record, the Denton Record-Chronicle (DRC). The process included coordination with the newspaper publisher, solidifying agreements between the libraries, obtaining grant funding for the project, and ensuring scheduled uploads to build digital access to the DRC via The Portal to Texas History's Texas Digital Newspaper Program (TDNP). TDNP builds open access to Texas newspapers, and the partnership between the Denton Public Library and UNT exemplifies the value of collaboration in preserving history and building digital access to research materials.

Evaluating the Impact of the Long-S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results / Information Technology and Libraries

This research compares automatic subject metadata generation when the pre-1800s Long-S character is corrected to a standard "s". The test environment includes entries from the third edition of the Encyclopedia Britannica and the HIVE automatic subject indexing tool. A comparative study of metadata generated before and after correction of the Long-S demonstrated that an average of 26.51 percent of potentially relevant terms per entry are omitted from results if the Long-S is not corrected. Results confirm that correcting the Long-S increases the availability of terms that can be used for creating quality metadata records. A relationship is also demonstrated between shorter entries and an increase in omitted terms when the Long-S is not corrected.
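As a quick illustration (my own, not from the article), correcting the Long-S amounts to a simple character normalization; the sample string here is hypothetical:

# U+017F is the LATIN SMALL LETTER LONG S used in pre-1800s typography.
entry = "Obſervations on the uſefulneſs of natural hiſtory"
corrected = entry.replace("\u017f", "s")
print(corrected)  # Observations on the usefulness of natural history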

Analytics and Privacy / Information Technology and Libraries

When selecting a web analytics tool, academic libraries have traditionally turned to Google Analytics for data collection to gain insights into the usage of their web properties. As the valuable field of data analytics continues to grow, concerns about user privacy rise as well, especially when discussing a technology giant like Google. In this article, the authors explore the feasibility of using Matomo, a free and open-source software application, for web analytics in their library’s discovery layer. Matomo is a web analytics platform designed around user-privacy assurances. This article details the installation process, makes comparisons between Matomo and Google Analytics, and describes how an open-source analytics platform works within a library-specific application, EBSCO’s Discovery Service.

Likes, Comments, Views / Information Technology and Libraries

This article presents a content analysis of academic library Instagram accounts at eleven land-grant universities. Previous research has examined personal, corporate, and university use of Instagram, but fewer studies have used this methodology to examine how academic libraries share content on this platform and the engagement generated by different categories of posts. Findings indicate that showcasing posts (highlighting library or campus resources) accounted for more than 50 percent of posts shared, while a much smaller percentage of posts reflected humanizing content (emphasizing warmth or humor) or crowdsourcing content (encouraging user feedback). Crowdsourcing posts generated the most likes on average, followed closely by orienting posts (situating the library within the campus community), while a larger proportion of crowdsourcing posts, compared to other post categories, included comments. The results of this study indicate that libraries should seek to create Instagram posts that include various types of content while also ensuring that the content shared reflects their unique campus contexts. By sharing a framework for analyzing library Instagram content, this article will provide libraries with the tools they need to more effectively identify the types of content their users respond to and enjoy as well as make their social media marketing on Instagram more impactful.

Applying Gamification to the Library Orientation / Information Technology and Libraries

By providing an overview of library services as well as the building layout, the library orientation can help newcomers make optimal use of the library. The benefits of this outreach can be curtailed, however, by the significant staffing required to offer in-person tours. One academic library overcame this issue by turning to user experience research and gamification to provide an individualized online library orientation for four specific user groups: undergraduate students, graduate students, faculty, and community members. The library surveyed 167 users to investigate preferences regarding orientation format, as well as likelihood of future library use as a result of the gamified orientation format. Results demonstrated a preference for the gamified experience among undergraduate students as compared to other surveyed groups.

Using the Harvesting Method to Submit ETDs into ProQuest / Information Technology and Libraries

The following case study describes an academic library’s recent experience implementing the harvesting method to submit electronic theses and dissertations (ETDs) into the ProQuest Dissertations & Theses Global database (PQDT). In this lesser-known approach, ETDs are deposited first in the institutional repository (IR), where they get processed, to be later harvested for free by ProQuest through the IR’s Open Archives Initiative (OAI) feed. The method provides a series of advantages over some of the alternative methods, including students’ choice to opt-in or out from ProQuest, better control over the embargo restrictions, and more customization power without having to rely on overly complicated workflows. Institutions interested in adopting a simple, automated, post-IR method to submit ETDs into ProQuest, while keeping the local workflow, should benefit from this method. 

Making Disciplinary Research Audible / Information Technology and Libraries

Academic libraries have long consulted with faculty and graduate students on ways to measure the impact of their published research, which now include altmetrics. Podcasting is becoming a more viable method of publicizing academic research to a broad audience. Because individual academic departments may lack the ability to produce podcasts, the library can serve as the most appropriate academic unit to undertake podcast production on behalf of researchers. The article identifies what library staff and equipment are required, describes the process needed to produce and market the published episodes, and offers preliminary assessments of the podcast impact.

Integrated Technologies of Blockchain and Biometrics Based on Wireless Sensor Network for Library Management / Information Technology and Libraries

The Internet of Things (IoT) is built on a strong internet infrastructure and many wireless sensor devices. Presently, Radio Frequency Identification embedded (RFID-embedded) smart cards are ubiquitous, used for many things including student ID cards, transportation cards, bank cards, prepaid cards, and citizenship cards. One example of places that require smart cards is libraries. Each library, such as a university library, city library, local library, or community library, has its own card and the user must bring the appropriate card to enter a library and borrow material. However, it is inconvenient to bring various cards to access different libraries. Wireless infrastructure has been well developed and IoT devices are connected through this infrastructure. Moreover, the development of biometric identification technologies has continued to advance. Blockchain methodologies have been successfully adopted in various fields. This paper proposes the BlockMetrics library based on integrated technologies using blockchain and finger-vein biometrics, which are adopted into a library collection management and access control system. The library collection is managed by image recognition, RFID, and wireless sensor technologies. In addition, a biometric system is connected to a library collection control system, enabling the borrowing procedure to consist of only two steps. First, the user adopts a biometric recognition device for user authentication and then performs a collection scan with the RFID devices. All the records are recorded in a personal borrowing blockchain, which is a peer-to-peer transfer system and permanent data storage. In addition, the user can check the status of his collection across various libraries in his personal borrowing blockchain. The BlockMetrics library is based on an integration of technologies that include blockchain, biometrics, and wireless sensor technologies to improve the smart library.

Teaching Digital Curation / Ed Summers

I’m lucky to be teaching a class about digital curation for undergraduate information studies students this semester. I struggled a bit over the summer with how to structure the class. Of course it is important to focus on the concepts and theories of digital curation. But I think it’s also important to provide practical exercises that make those concepts concrete. I didn’t want to privilege either approach. A sociotechnical framing of information studies has been extremely important in my own research, but asking undergraduates to dive into the STS literature is asking for a lot I think. I want to whet their appetite for STS approaches (maybe they will take a senior seminar or go on to grad school). But I also want them to gain some knowledge and skills that they can use in their current and future work.

I think it is also critically important to situate the study of digital curation so that it isn’t simply an esoteric matter that’s only relevant for cultural heritage organizations. Digital curation is set of concerns and practices that are present in students’ every day lives, and can be found all throughout society–and these practices have real, social and political consequences. Jobs in the GLAM sector or digital humanities can be difficult to find, and the practices of digital curation extend into several fields: data science, data visualization, computer security, software development, etc. I think it’s important to show the continuity between these areas of expertise when teaching digital curation at the undergraduate level–but this is pretty challenging.

Fortunately the class requires students to have completed introductions to computer programming (Python) and information studies–so those are things I can build on in the class. After some genuinely helpful advice on Twitter, and a review of other digital curation courses (some at UMD, and some elsewhere) I decided to formulate my learning outcomes using a layered approach where each of the modules build up from basic principles and competencies:

  1. Files and File Systems
  2. Formats and Standards
  3. Internal Metadata
  4. External Metadata: Description
  5. Platforms
  6. Community
  7. Infrastructure

Each module is two weeks long. In the first week we focus on a reading. I’m mostly using Trevor Owens’ Theory and Craft of Digital Preservation and Data Feminism by Catherine D’Ignazio and Lauren Klein, but I’ve got a few other readings sprinkled in there like The Joy of Standards by Andrew Russell and Lee Vinsel, etc. The second week is focused on a Jupyter notebook exercise that tests out the ideas we’ve learned about in the discussion.

For example we just finished the first module where we focused on files and file systems. In week 1 we discussed what the characteristics of digital objects are according to chapter 2 of Owens. Then in week 2 we explored how to interact with the file system and calculate file sizes. I generated extracts from the Digital Corpora datasets so that each student would have a unique set of files to work with in their notebook. I've been leaning on Google's Jupyter environment Colab to get out of having to play sysadmin to everyone on their personal computers, although students are welcome to use their own Jupyter environments if they want.
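To give a sense of the kind of notebook exercise involved (this is my own illustrative sketch, not the actual assignment), walking a directory of extracted files and totalling their sizes only takes a few lines of Python; the directory name is a placeholder:

from pathlib import Path

extract = Path("digital-corpora-extract")  # placeholder for a student's extract folder
total = 0
for path in sorted(extract.rglob("*")):
    if path.is_file():
        size = path.stat().st_size
        total += size
        print(f"{size:>12,}  {path}")
print(f"{total:>12,}  bytes in total")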

I’m going to try to keep some notes here about how the class is going as I move through the semester. This is mostly just for me to keep track of how to improve things, but maybe it’s of interest to others involved in teaching digital curation and digital preservation. The class is being taught asynchronously, although I do have live Zoom office hours which have been pretty lightly attended so far.

I actually started this post just wanting to jot down one interesting aspect of the first discussion we had about digital objects. After reading chapter 2 in Owens and reflecting on what the characteristics of digital objects are, I asked them to give an example of a digital object that they didn’t see mentioned in the chapter. Here is a ranked list of the types of digital objects the ~50 students mentioned (I did the conceptual grouping):

  • Images (GIF, Vector, PNG, PSD, Camera Roll) (9)
  • Software (Apps, Video Games, Media Player, source code, software updates) (7)
  • Video (mp4, avi, mov) (5)
  • Audio (voicemail, music, playlists) (4)
  • Social Media (photos, video, text, profiles, interaction) (3)
  • Money (blockchain, bank account) (3)
  • Instant Messages (iMessage, Facebook Messenger, Instagram DMs, Twitter DMs) (2)
  • Computer Hardware (hard drive, pixels, tapes) (2)
  • Articles (PDF) (2)
  • Calendars (iCal)
  • Contacts (vCard)
  • Clipboard (copy/paste)
  • Slide Presentations (Powerpoint)

The focus on images, video and audio did not surprise me, and I was expecting to explore these a bit more when we examine internal metadata. But I was surprised and encouraged to see that they identified software itself as a digital object. I think there’s a tendency to think of digital objects as files, and tools for rendering them (software) as somehow separate. But when it comes to computation, code is data and data is code. I also thought it was great to see the students grappling with the physical and logical aspects of digital objects as they tested whether various types of hardware were digital objects.

In addition to exploring multimedia formats I am planning on discussing complex objects like social media posts and profiles when we discuss Platforms. But it could be interesting to look at SMS messages, given some of the interest around messaging. How best to do this in a useful and accessible way in a Jupyter notebook is a work in progress. I need to introduce their final project in a few weeks and am planning to give them the option of choosing either to write a prose “data story” that explores the various aspects of a specific digital object or format, or to write a Jupyter notebook that does the same with a more computational approach. I still need to work out the details there, so if you have ideas please let me know.

Wikidata, Wikibase and the library linked data ecosystem: an OCLC Research Library Partnership discussion / HangingTogether

In late July the OCLC Research Library Partnership convened a discussion that reflected on the current state of linked data. The discussion format was (for us) experimental — we invited participants to prepare by viewing a pre-recorded presentation, Re-envisioning the fabric of the bibliographic universe – From promise to reality.* The presentation covers experiences of national and research libraries as well as OCLC’s own journey in linked data exploration. OCLC Researchers Annette Dortmund and Karen Smith-Yoshimura looked at relevant milestones in the journey from entity-based description research, prototypes, and on to actual practices, based on work that has been undertaken with library partners right up to the present day.

Discussion participants joined from a wide variety of backgrounds: people who were new to linked data work, those who were more experienced, those who had participated in Project Passage and are participating in the OCLC Shared Entity Management Infrastructure project, and those engaged in the Program for Cooperative Cataloging’s Wikidata pilot as well as in the Art & Rare Materials BIBFRAME Ontology Extension (ARM) work. Several people had worked in Wikidata directly; others are at institutions running stand-alone Wikibase instances, or considering using Wikibase for linked data experimentation.

Overall themes covered in the meeting:

  • Entity management is seen as a pathway to engagement with and access to digital collections.
  • Creating entities for faculty and graduate students is an activity of some, and one area of focus is uploading researcher ORCID iDs to Wikidata.
  • Articulating the value of contributing to Wikidata, particularly to institutional leaders, is a challenge for many.
  • On the other hand, some have experimented with contributing to Wikidata on behalf of an institution as a “work from home” project during the COVID-19 pandemic.
  • There is still a need for training and exemplars especially for specific material types and to support specific workflows.
  • Is there a fallacy in the holy grail of a single hub, or can we see opportunities for surfing the semantic web?

The discussion revealed some areas of concern or opportunity:

  • Data modeling is a major challenge, for all types of material.
  • Identifiers for subject strings need to be minted as needed.
  • Community vs local control: Participants anticipate tensions in managing descriptions and entities by a community rather than by a small group of approved experts.
  • Art collections have special descriptive requirements that need to be accommodated.
  • The concept of Federated Wikibases sounds promising, but there is a need to understand how this will work in practice.

Resources shared during the discussion included:

*The title of this session is inspired by kalan Knudson Davis’s blog post, An insider’s look at “Project Passage” in seven linked data lessons, six constants, five changes … and four webcomics in which she characterized the linked data endeavor as “re-envisioning the very fabric of the Bibliographic Universe.”

The post Wikidata, Wikibase and the library linked data ecosystem: an OCLC Research Library Partnership discussion appeared first on Hanging Together.

Evergreen 3.6-beta1 available / Evergreen ILS

The Evergreen Community is pleased to announce the availability of the first beta release for Evergreen 3.6. This release contains various new features and enhancements, including:

  • A new experimental public catalog skin based on the Bootstrap framework that offers improved responsiveness and accessibility.
  • The default staff interface for catalog searching is now the Angular one. The older AngularJS staff catalog remains available but is now labelled the “traditional” catalog.
  • A new course materials/course reserves module.
  • A new interface for managing “hopeless” holds.
  • Interfaces for sending test emails and SMS messages to patrons.
  • Enhancements for printing and emailing records from the public catalog.
  • The conversion of more staff interfaces to Angular, including acquisitions search and providers, manage authorities, MARC batch edit, and booking capture.
  • New web APIs for supporting patron authentication by EZProxy and PatronAPI.
  • A new Action/Trigger reactor that can make web service requests via GET and POST.
  • A new interface for managing curbside pickup appointments.
  • Support for the v3 Stripe API for credit card payments.
  • Certain types of reports can now calculate subtotals.
  • Support for the open source website analytics tool Matomo.
  • SIP2 patron lookup can now accept the patron username in addition to the barcode.

For more information on these and more, please read the initial draft of the release notes.

Evergreen admins installing the beta or upgrading a test system to the beta should be aware of the following:

  • The minimum version of PostgreSQL required to run Evergreen 3.6 is PostgreSQL 9.6.
  • The minimum version of OpenSRF is 3.2.
  • This release adds two new OpenSRF services, open-ils.curbside and open-ils.courses.
  • The release also adds a new Perl module dependency, Config::General.
  • The beta release should not be used for production.

Since the next Evergreen Bug Squashing Week runs from 21 through 25 September 2020, we highly encourage testing of this beta release as soon as possible.

Don't Say We Didn't Warn You / David Rosenthal

Just over a quarter-century ago, Stanford Libraries' HighWire Press pioneered the switch of academic journal publishing from paper to digital when they put the Journal of Biological Chemistry on-line. Even in those early days of the Web, people understood that Web pages, and links to them, decayed over time. A year later, Brewster Kahle founded the Internet Archive to preserve them for posterity.

One difficulty was that although academic journals contained some of the Web content that  was most important to preserve for the future, the Internet Archive could not access them because they were paywalled. Two years later, Vicky Reich and I started the LOCKSS (Lots Of Copies Keep Stuff Safe) program to address this problem. In 2000's Permanent Web Publishing we wrote:
Librarians have a well-founded confidence in their ability to provide their readers with access to material published on paper, even if it is centuries old. Preservation is a by-product of the need to scatter copies around to provide access. Librarians have an equally well-founded skepticism about their ability to do the same for material published in electronic form. Preservation is totally at the whim of the publisher.

A subscription to a paper journal provides the library with an archival copy of the content. Subscribing to a Web journal rents access to the publisher's copy. The publisher may promise "perpetual access", but there is no business model to support the promise. Recent events have demonstrated that major journals may vanish from the Web at a few months notice.

This poses a problem for librarians, who subscribe to these journals in order to provide both current and future readers with access to the material. Current readers need the Web editions. Future readers need paper; there is no other way to be sure the material will survive.
Now, Jeffrey Brainard’s “Dozens of scientific journals have vanished from the internet, and no one preserved them” and Diana Kwon’s “More than 100 scientific journals have disappeared from the Internet” draw attention to this long-standing problem. Below the fold I discuss the paper behind the Science and Nature articles.

Brainard writes:
Eighty-four online-only, open-access (OA) journals in the sciences, and nearly 100 more in the social sciences and humanities, have disappeared from the internet over the past 2 decades as publishers stopped maintaining them, potentially depriving scholars of useful research findings, a study has found.

An additional 900 journals published only online also may be at risk of vanishing because they are inactive, says a preprint posted on 3 September on the arXiv server. The number of OA journals tripled from 2009 to 2019, and on average the vanished titles operated for nearly 10 years before going dark, which “might imply that a large number … is yet to vanish,” the authors write.
The preprint he refers to is Open is not forever: a study of vanished open access journals by Mikael Laakso, Lisa Matthias and Najko Jahn. Their abstract reads:
The preservation of the scholarly record has been a point of concern since the beginning of knowledge production. With print publications, the responsibility rested primarily with librarians, but the shift towards digital publishing and, in particular, the introduction of open access (OA) have caused ambiguity and complexity. Consequently, the long-term accessibility of journals is not always guaranteed, and they can even disappear from the web completely. The purpose of this exploratory study is to systematically study the phenomenon of vanished journals, something that has not been done before. For the analysis, we consulted several major bibliographic indexes, such as Scopus, Ulrichsweb, and the Directory of Open Access Journals, and traced the journals through the Internet Archive’s Wayback Machine. We found 176 OA journals that, through lack of comprehensive and open archives, vanished from the web between 2000–2019, spanning all major research disciplines and geographic regions of the world. Our results raise vital concern for the integrity of the scholarly record and highlight the urgency to take collaborative action to ensure continued access and prevent the loss of more scholarly knowledge. We encourage those interested in the phenomenon of vanished journals to use the public dataset for their own research.
The preprint provides an excellent overview of the state of formal preservation efforts for open access journals, and what I expect will be an invaluable dataset for future work. But this isn't news, see for example the 2014 overview in The Half-Empty Archive.

We and others have been taking "collaborative action to ensure continued access and prevent the loss of more scholarly knowledge" for more than two decades. I covered the early history of this effort in 2011's A Brief History of E-Journal Preservation. In particular, the "long tail" of smaller, open-access journals, especially non-English ones, has been a continuing concern:
Over the years, the LOCKSS team have made several explorations of the long tail. Among these were a 2002 meeting of humanities librarians that identified high-risk content such as World Haiku Review and Exquisite Corpse, and work funded by the Soros Foundation with South African librarians that identified fascinating local academic journals in fields such as dry-land agriculture and AIDS in urban settings. Experience leads to two conclusions:
  • Both subject and language knowledge is important to identifying the worthwhile long-tail content.
  • Long-tail content in English is likely to be open access; in other languages much more is subscription.
Both were part of the motivation behind the LOCKSS Program's efforts to implement National Hosting networks.
What have we learned in the last two decades that illuminates the work of Laakso et al? First, they write:
While all digital journals are subject to the same threats, OA journals face unique challenges. Efforts around preservation and continued access are often aimed at securing postcancellation access to subscription journals—content the library has already paid for. The same financial incentives do not exist when journals are freely available.
Two decades ago the proportion of open access journals was very low. Our approach to marketing journal preservation to librarians was to treat it as "subscription insurance":
Libraries have to trade off the cost of preserving access to old material against the cost of acquiring new material. They tend to favor acquiring new material. To be effective, subscription insurance must cost much less than the subscription itself.
Even though we managed to keep the cost of participating in the distributed LOCKSS program very low, relatively few libraries opted in. A greater, but still relatively small proportion of libraries opted into the centralized Portico archive. Doing so involved simply signing a check, as opposed to the LOCKSS program that involved signing a smaller check plus actually running a LOCKSS box. Nevertheless, as I wrote in 2011:
Despite these advantages, Portico has failed to achieve economic sustainability on its own. As Bill Bowen said discussing the Blue Ribbon Task Force Report:
"it has been more challenging for Portico to build a sustainable model than parts of the report suggest."
Libraries proved unwilling to pay enough to cover its costs. It was folded into a single organization with JSTOR, in whose $50M+ annual cash flow Portico's losses could be buried.
Thus we have:
Lesson 1: libraries won't pay enough to preserve even subscription content, let alone open-access content.
This is understandable because libraries, even national libraries, have been under sustained budget pressure for many years:
The budgets of libraries and archives, the institutions tasked with acting as society's memory, have been under sustained attack for a long time. ... I drew this graph of the British Library's annual income in real terms (year 2000 pounds). It shows that the Library's income has declined by almost 45% in the last decade.

Memory institutions that can purchase only half what they could 10 years ago aren't likely to greatly increase funding for acquiring new stuff; it's going to be hard for them just to keep the stuff (and the staff) they already have.
The budget pressures are exacerbated by the inexorable rise of the subscriptions libraries must pay to keep the stockholders of the oligopoly academic publishers happy. Their content is not at risk; Elsevier (founded 1880) is not going away, nor is the content that keeps them in business.

The participants in the academic publishing ecosystem with the money and the leverage are the government and philanthropic funders. Laakso et al write:
Over the last decade, an increasing number of research funders have implemented mandates that require beneficiaries to ensure OA to their publications by either publishing in OA journals or, when choosing subscription journals, depositing a copy of the manuscript in an OA repository. In addition, many of these mandates also require publications to be deposited in a repository when publishing in OA journals to secure long-term access.... Recently, cOAlition S has proposed a rather radical stance on preservation, which requires authors to only publish in journals with existing preservation arrangements. If implemented, such a mandate would prevent authors from publishing in the majority of OA journals indexed in the DOAJ (10,011 out of 14,068; DOAJ, 2019).
Note that funders are mandating both open access and preservation, without actually funding the infrastructure that make both possible.

Even if funding were available for preserving open-access journals, how would the at-risk journals be identified? In the days of paper journals, librarians were in the path between scholars and the articles they needed, both because the librarians needed to know which journals to pay for, and the scholars needed help finding the relevant journals. The evolution of the Web has largely removed them from this path. Scholars access the articles they need not via the journal, but via general (Google) or specialized (Google Scholar) search engines. Thus librarians' awareness of the universe of journals has atrophied. Note the difficulties Laakso et al had in identifying open-access journals that had died:
A journal-level approach, on the other hand, is challenging because no single data source exists that tracks the availability and accessibility of journals over time. Large indexes, for example, primarily hold records of active journals, and journal preservation services only maintain records of participating journals. To solve this problem and to create a dataset that is as comprehensive as possible, we consulted several different data sources—title lists by the DOAJ, Ulrichsweb, Scopus title lists, and previously created datasets that might point to vanished OA journals ... We collected the data manually, and each data source required a unique approach for detecting potential vanished journals
Thus we have:
Lesson 2: No-one, not even librarians, knows where most of the at-risk open-access journals are.
None of the human participants (authors, reviewers, publishers, librarians, scholars) in the journal ecosystem places a priority on preservation, and the funds available per journal are scarce. Thus human intervention in the preservation process must be eliminated, both because it is unaffordable and because it is error-prone.

Thus we have:
Lesson 3: The production preservation pipeline must be completely automated.
Just as with Web archiving in general, e-journal preservation is primarily an economic problem. There is way too much content per dollar of budget. The experience of the traditional e-journal preservation systems shows that ingest, and in particular quality assurance, is the most expensive part of the system, because it is human-intensive. There is a trade-off between quality and quantity. If quality is prioritized, resources will flow to the high-profile journals that are at lower risk. Lesson 3 shows that effective preservation systems have to be highly automated, trading quality for quantity. This is the only way to preserve the lower-profile, high-risk content.

Thus we have:
Lesson 4: Don't make the best the enemy of the good, i.e. get as much as possible with the available funds; don't expect to get everything.
Based on this experience, Vicky and I decided that, since traditional efforts were not preserving the at-risk content, we needed to try something different. Last February, I posted The Scholarly Record At The Internet Archive describing this new approach:
The Internet Archive has been working on a Mellon-funded grant aimed at collecting, preserving and providing persistent access to as much of the open-access academic literature as possible. The motivation is that much of the "long tail" of academic literature comes from smaller publishers whose business model is fragile, and who are at risk of financial failure or takeover by the legacy oligopoly publishers. This is particularly true if their content is open access, since they don't have subscription income. This "long tail" content is thus at risk of loss or vanishing behind a paywall.

The project takes two opposite but synergistic approaches:
  • Top-Down: Using the bibliographic metadata from sources like CrossRef to ask whether that article is in the Wayback Machine and, if it isn't, trying to get it from the live Web. Then, if a copy exists, adding the metadata to an index.
  • Bottom-up: Asking whether each of the PDFs in the Wayback Machine is an academic article, and if so extracting the bibliographic metadata and adding it to an index.
Although they are both focused on the open-access literature, note the key differences between this and the traditional approaches assessed by Laakso et al:
  • The focus is on preserving articles, not journals. This makes sense because the evolution of the Web means that the way scholars access articles is no longer via the journal, but via links and searches directly to the individual article.
  • Neither librarians nor publishers are involved in identifying content for preservation. It is found via the normal Web crawling technique of following links to a page, extracting the links on that page, and following them in turn.
  • Neither libraries nor publishers are involved in funding preservation. Because the processes at the Internet Archive are entirely automated, their cost increment in production over the Web crawling that the Internet Archive already does is small. Thus the preservation effort is economically sustainable, it is a byproduct of the world's premier Web archiving program.
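To make the "Top-Down" check concrete, here is a rough sketch of how one might ask, for a single DOI, whether the Wayback Machine already holds a capture of the corresponding URL. It uses the public CrossRef works API and the Internet Archive's availability API; it is my own illustration, not the Archive's production pipeline:

    import requests

    def wayback_has_copy(doi):
        # Resolve the DOI to the URL recorded by CrossRef (typically the DOI
        # link, which redirects to the publisher's landing page).
        work = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30).json()
        article_url = work["message"]["URL"]

        # Ask the Wayback Machine's availability API whether a snapshot exists.
        avail = requests.get(
            "https://archive.org/wayback/available",
            params={"url": article_url},
            timeout=30,
        ).json()
        closest = avail.get("archived_snapshots", {}).get("closest")
        capture = closest["url"] if closest and closest.get("available") else None
        return article_url, capture

    # CrossRef's long-standing example DOI; substitute a DOI of interest.
    url, capture = wayback_has_copy("10.5555/12345678")
    print(url, "->", capture or "no capture found; a candidate for crawling")

Run across the bibliographic metadata for millions of articles, a check along these lines separates what is already captured from what still needs to be fetched from the live Web.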
Recently, Bryan Newbold posted an update on progress with this work on the Internet Archive's blog, entitled How the Internet Archive is Ensuring Permanent Access to Open Access Journal Articles:
Of the 14.8 million known open access articles published since 1996, the Internet Archive has archived, identified, and made available through the Wayback Machine 9.1 million of them (“bright” green in the chart above). In the jargon of Open Access, we are counting only “gold” and “hybrid” articles which we expect to be available directly from the publisher, as opposed to preprints, such as in arxiv.org or institutional repositories. Another 3.2 million are believed to be preserved by one or more contracted preservation organizations, based on records kept by Keepers Registry (“dark” olive in the chart). These copies are not intended to be accessible to anybody unless the publisher becomes inaccessible, in which case they are “triggered” and become accessible.

This leaves at least 2.4 million Open Access articles at risk of vanishing from the web (“None”, red in the chart). While many of these are still on publisher’s websites, these have proven difficult to archive.
The difficulties Newbold refers to are mostly the normal "difficulties" encountered in crawling the Web: 404s, paywalls, server outages, etc. Despite the difficulties, and the limited resources available, this effort has collected and preserved 61% of the known open access articles, whereas the formal preservation efforts have collected and preserved 22%.

You can both access and contribute to the results of this effort:
we built an editable catalog (https://fatcat.wiki) with an open API to allow anybody to contribute. As the software is free and open source, as is the data, we invite others to reuse and link to the content we have archived. We have also indexed and made searchable much of the literature to help manage our work and help others find if we have archived particular articles. We want to make scholarly material permanently available, and available in new ways– including via large datasets for analysis and “meta research.”
Arguably, the Internet Archive is inadequate as a preservation service, for two reasons:
  • Storage reliability. There are two issues. First, the Archive maintains two full replicas of its 45+ petabytes of content, plus some partial replicas. Technically, two replicas is not enough for safety. But the Archive is so big that maintaining a replica is a huge cost, enough that a third would break the budget. Second, as I discussed in A Petabyte For A Century, the combination of huge scale and long timeframe places datasets the size of the Archive's beyond our ability to ensure perfect storage.
  • Format obsolescence. In 1995 Jeff Rothenberg wrote an article in Scientific American that first drew public attention to the fact that digital media have none of the durable properties of paper. But his focus was not on "bit-rot" but on the up-to-then rapid obsolescence of digital formats. This has remained a major concern among digital preservationists ever since but, as I have been pointing out since 2007, it is overblown. Firstly, formats on the Web are in effect network protocols, the most stable standards in the digital domain. Research shows the rate of obsolescence is extremely slow. Secondly, if preserved Web formats ever do become obsolete, we have two different viable techniques to render them; the LOCKSS Program demonstrated Transparent Format Migration of Preserved Web Content in 2005, and Ilya Kreymer's oldweb.today shows how they can be rendered using preserved browsers.
While the Internet Archive may not be a perfect preservation system, Lesson 4 teaches that the search for perfection is self-defeating. Laakso et al write:
the Internet Archive’s Wayback Machine once again proved to be an invaluable tool, which enabled us to access the journal websites, or most often fragments thereof, and record the year of the last published OA issue and when the journal was last available online.
...
The Internet Archive, and especially the Wayback Machine, have proven to be invaluable resources for this research project since following the traces of vanished journals would have been much more uncertain and imprecise otherwise. In some cases, the Internet Archive also saves cached snapshots of individual articles, so they remain accessible, yet the snapshots do not necessarily amount to complete journal volumes
Again, it is important to focus on preserving the articles, not the journals, which these days are relevant to scholars only as an unreliable quality tag. Many of the articles from Laakso et al's "vanished journals" are already preserved, however imperfectly, by the Internet Archive. Improving the process that collects and identifies such articles, as the Archive is already doing, is a much more fruitful approach than:
collaborative action in preserving digital resources and preventing the loss of more scholarly knowledge
which is the approach that, over the past two decades, has proven to be inadequate to the task.

The Return of the DIG (Documentation Interest Group) / Islandora


The Islandora Documentation Interest Group is coming back! 

It has been on hiatus for a few years now, but the DIG is being relaunched as a place where we can have discussions about the best approaches for building and maintaining Islandora documentation, identify and prioritize gaps and opportunities for improvement, and coordinate community work to write and edit our shared Islandora documentation. 

The updated Terms of Service for this group are here: https://github.com/islandora-interest-groups/Islandora-Documentation-Interest-Group

Along with my co-convenors Mirko Hanke and Jeff Rubin, I would like to invite you to fill out this Doodle poll to find a time for a kick-off meeting. One of the first orders of business at the initial meeting will be to establish a day and time for a recurring monthly meeting.

Noting well / Mita Williams

Scribble, scribble, scribble (Eh! Mr Gibbon?)

Last week I read an article that made me very uncomfortable. I had been diagnosed by the author and was found to be diseased.

The Twittering Machine is powered by an insight at once obvious and underexplored: we have, in the world of the social industry, become “scripturient—possessed by a violent desire to write, incessantly.” Our addiction to social media is, at its core, a compulsion to write. Through our comments, updates, DMs, and searches, we are volunteers in a great “collective writing experiment.” Those of us who don’t peck out status updates on our keyboards are not exempt. We participate too, “behind our backs as it were,” creating hidden (written) records of where we clicked, where we hovered, how far we scrolled, so that even reading, within the framework of the Twittering Machine, becomes a kind of writing.

Going Postal: A psychoanalytic reading of social media and the death drive, Max Read for Bookforum

The scripturient among us cannot stop writing even though social media brings no joy. Some of us opted for a lesser evil and have Waldenponded to the cozyweb.

Unlike the main public internet, which runs on the (human) protocol of “users” clicking on links on public pages/apps maintained by “publishers”, the cozyweb works on the (human) protocol of everybody cutting-and-pasting bits of text, images, URLs, and screenshots across live streams. Much of this content is poorly addressable, poorly searchable, and very vulnerable to bitrot. It lives in a high-gatekeeping slum-like space comprising slacks, messaging apps, private groups, storage services like dropbox, and of course, email.

from Cozyweb by Venkatesh Rao

In other words, some of us have opted to keep writing compulsively but mostly to ourselves.

I’ve found Notion to be welcome respite from the public square of Twitter or even the water-cooler of Slack. While I used to plan trips on Pinterest, I now find myself saving inspirational images to Notion. Instead of relying on Facebook or Linkedin to catalog my connections, I’ve been building my own relationship tracker in Notion.

Like the living room, Notion appeals to both the introverted and extroverted sides of my personality. It’s a place where I can create and test things out in private. Then, when I’m craving some external validation, I can show off a part of my workspace to as many or as few people as I want. It’s a place where I can think out loud without worrying about the judgement of strangers or the tracking of ad targeting tools.

Notion is the living room of the cozyweb by Nick deWilde

Exhausted by my own doomscrolling, I recently pledged to myself to spend less time on social media. But I still had a scribbling habit that needed to be maintained. I found myself researching why so many of the few remaining bloggers that I knew were so obsessed with Notion and other tools that were unfamiliar to me.

It’s the worldwideweb. Let’s share what we know.

The tools of the notearazzi

Notion describes itself as ‘the all-in-one workspace’ for all of “your notes, tasks, and wikis”. That sounds more compelling than the way that I would describe it: Notion allows you to build workflows from documents using linked, invisible databases.

For example, here is a set of pages that can be arranged as a task board, a kanban board, a calendar, or a list, just by changing your view of the information at hand.

(In this way Notion reminds me of Drupal except all of the database scaffolding is invisible to the user.)

There are other note taking tools that promise to revolutionize the work and the workflow of the user: Roam Research (that turns your “graph connected” notes into a ‘second brain’), RemNote (that turns your study notes into spaced repetition-flashcards), and Obsidian (that turns your markdown notes into a personal wiki / second brain on your computer).

And there is still Evernote.

Personal Knowledge Management

These types of note-taking systems are also known as personal knowledge management or PKM.

https://mobile.twitter.com/Bopuc/status/1305469230725431296

The Digital Garden

From the above diagram, you can see that PKM systems are also called Digital Gardens. Patrick Tanguay wrote a short backgrounder on this concept with a great set of links to explore.

In short: brief notes from your own thinking, heavily linked back and forth, continually added to and edited.

The goal is to have a library of notes of your own thinking so you can build upon what you read and write, creating your own ideas, advancing your knowledge.

Digital Gardens, Patrick Tanguay

The word garden was chosen carefully to describe this concept. We find ourselves in a world in which almost all of our social media systems are algorithm-influenced streams. To find the contemplative space we need to think, we need to find a slower landscape.

Remember a couple months ago when I wrote about Mike Caulfield’s alternative to CRAAP called SIFT? Well, I’m invoking him again for his 2015 post called The Garden and the Stream: A Technopastoral.

I don’t want people to get hung up on the technology angle. I think sometimes people hear “Federated Thingamabob” and just sort of tune out thinking “Oh, he’s talking about a feature of Federated Thingamabob.” But I’m not. I’m really not. I’m talking about a different way to think your online activity, no matter what tool you use. And relevant to this conference, I’m talking about a different way of collaborating as well.

Without going to much into what my federated wiki journal is, just imagine that instead of blogging and tweeting your experience you wiki’d it. And over time the wiki became a representation of things you knew, connected to other people’s wikis about things they knew.

So when I see an article like this I think — Wow, I don’t have much in my wiki about gun control, this seems like a good start to build it out and I make a page.

The first thing I do is “de-stream” the article. The article is about Oregon, but I want to extract a reusable piece out of it in a way that it can be connected to many different things eventually. I want to make a home page for this idea or fact. My hub for thinking about this.

The Garden and the Stream: A Technopastoral, Mike Caulfield

I used to think of blog posts as part of a growing garden, but my framing has shifted and now I think of the blog as the headwaters of the first sluggish stream (and the beginning of the end of the web as we know it):

Whereas the garden is integrative, the Stream is self-assertive. It’s persuasion, it’s argument, it’s advocacy. It’s personal and personalized and immediate. It’s invigorating. And as we may see in a minute it’s also profoundly unsuited to some of the uses we put it to.

The stream is what I do on Twitter and blogging platforms. I take a fact and project it out as another brick in an argument or narrative or persona that I build over time, and recapitulate instead of iterate.

The Garden and the Stream: A Technopastoral, Mike Caulfield

Caulfield alludes to the associative power of links after he compares the original vision of Vannevar Bush’s MEMEX and the topology of the World Wide Web:

Each memex library contains your original materials and the materials of others. There’s no read-only version of the memex, because that would be silly. Anything you read you can link and annotate. Not reply to, mind you. Change. This will be important later.

Links are associative. This is a huge deal. Links are there not only as a quick way to get to source material. They aren’t a way to say, hey here’s the interesting thing of the day. They remind you of the questions you need to ask, of the connections that aren’t immediately evident.

Links are made by readers as well as writers. A stunning thing that we forget, but the link here is not part of the author’s intent, but of the reader’s analysis. The majority of links in the memex are made by readers, not writers. On the world wide web of course, only an author gets to determine links. And links inside the document say that there can only be one set of associations for the document, at least going forward.

The Garden and the Stream: A Technopastoral, Mike Caulfield

Mike Cauldfield’s own digital garden was a personal wiki and there some reader/writers who have opted to go this route using Tiddlywiki or a variation.

There is no one way to grow your own digital garden. Gardens are personal and they grow to suit the space and time that you are able to give them. There are digital gardens that are wild and overgrown like a verdant English garden and then there are the closely controlled and manicured gardens known as BASB.

The Second Brain

BASB stands for Building A Second Brain. Unlike our own feeble wetware, these BASB systems exist so we do not forget passing notions. They are also promoted as environments that lend themselves to creative thinking because, just like our own minds, they encourage the generation of new thoughts by the association of disparate ideas from different fields, places, or times.

To be honest, during most of the time I spent researching for this post, every time I read the phrase second brain, I immediately dismissed it as glib marketing and not as a concept worth serious consideration. But then I watched a YouTube video of a medical student who had taken a $1500 course on second brain building and could not stop singing its praises.

From that video, I learned that Second Brain building wasn’t just making links between concepts and waiting for creativity to descend or a book to emerge. The framing of the activities that it prescribes is closer to a Project Management System in which efforts are directed ultimately to outcomes and outputs. That system is also known as PARA.

Image from: Building a Second Brain: The Illustrated Notes by Maggie Appleton

Not every building a second brain (BASB) system is built on the foundations of PARA. There are those who decide to populate their new Roam Research space using the Smart Note system or the Zettelkasten approach.

Zettelkasten

When I was doing research for my 2015 Access talk about index cards and bibliographic systems, I dimly remember coming across the note-taking system of sociologist Niklas Luhmann, who turned a 90,000+ card zettelkasten into over 70 books. I distinctly remember coming across the system again when I was reading about Beck Trench’s Academic Workflow:

I use the Zettelkasten method of note-taking, by which I mean that I create notes that contain a single idea or point that is significant to me. These notes are usually linked to other notes, authors, and citations, allowing me to understand that single idea in the context of the larger literature that I’m exploring. I use the knowledge management software Tinderbox to write these notes and map their associations. I’ve created a series of videos that explain exactly how I do this. I also sync my Tinderbox zettels with DEVONthink using these scripts so that I can search my own notes alongside my articles to find connections I might otherwise miss.

Academic Workflow: Reading, Beck Trench

From what I can tell, many people’s first introduction to the zettelkasten method has been through this website or the book How to Take Smart Notes by Sonke Ahrens (2017). I haven’t read the book yet, but I was so intrigued that I have ordered a copy. From a review of the work:

The book is written in an essayistic and very readable style, humorous and anecdotal, which makes both the practical advice as well as the underlying philosophy very accessible and convincing. Ahrens offers a compelling meta-reflection on the pivotal role of writing in – and as – thinking, and as such, he also formulates a timely and important advocacy of the humanities. It is therefore regrettable that in his emphasis on proliferating personal productivity and ‘boosting’ written output with Luhmann’s slip box system, Ahrens neglects to critically reflect upon the luring dangers of academic careerism for truly original scholarship… The explosion of publishing outlets is in turn tightly connected with the increasing governmentalization and commodification of academic life (Miller 2015), and while Ahrens continually emphasizes the potential of increasing written output with Luhmann’s method, he unfortunately misses the opportunity to reflect on the very conditions of academic life that create a demand for a book like his own in the first place.

Book review: How to Take Smart Notes, Reviewed by Melanie Schiller, Journal of Writing Research (2017)

How might academic libraries figure into these systems?

While keeping in mind that the knowledge workers who commit strongly to a holistic note-taking system are a minority of our patrons, how can academic libraries support those students, faculty, and academic staff who use specialized note-taking software?

Personally, I think at a minimum, we must try to keep as much of our material as copy-able as possible. In other words, we should keep our investments in DRM-locked material as small as possible.

But I’ll boil it down to this. It came down to who had the power to change things. It came down to the right to make copies.

On the web, if you wanted to read something you had to read it on someone else’s server where you couldn’t rewrite it, and you couldn’t annotate it, you couldn’t copy it, and you couldn’t add links to it, you couldn’t curate it.

These are the verbs of gardening, and they didn’t exist on the early web.

The Garden and the Stream: A Technopastoral, Mike Caulfield

What might happen if we try on the idea that a library is a type of stock that both readers and writers can draw upon for their respective knowledge flow?

Stock and flow are just different ways of expressing garden and stream. Mike Caulfield looks at OER in this context, and I found this framing very useful.

Everything else is either journal articles or blog posts making an argument about local subsidies. Replying to someone. Building rapport with their audience. Making a specific point about a specific policy. Embedded in specific conversations, specific contexts.

Everybody wants to play in the Stream, but no one wants to build the Garden.

Our traditional binary here is “open vs. closed”. But honestly that’s not the most interesting question to me anymore. I know why textbook companies are closed. They want to make money.

What is harder to understand is how in nearly 25 years of the web, when people have told us what they THINK about local subsidies approximately one kajillion times we can’t find one — ONE! — syllabus-ready treatment of the issue.

You want ethics of networked knowledge? Think about that for a minute — how much time we’ve all spent arguing, promoting our ideas, and how little time we’ve spent contributing to the general pool of knowledge.

Why? Because we’re infatuated with the stream, infatuated with our own voice, with the argument we’re in, the point we’re trying to make, the people in our circle we’re talking to.

The Garden and the Stream: A Technopastoral, Mike Caulfield

Conclusion

A scholar reads texts from the library and thoughtfully creates personal notes from their reading. Those notes grow, get connected to other notes, help generate new notes and associations, and, in time, help generate the scholar’s own text that — hopefully — will become part of that same library. “A scholar is just a library’s way of making another library” (Daniel C. Dennett, Consciousness Explained).

Once again, it makes me wonder whether our institutions should consider adopting the professional mission that Dan Chudnov made for himself in 2006: Help people build their own libraries.

Because those scholar’s notes? They are also a library.

Goodtables: Expediting the data submission and submitter feedback process / Open Knowledge Foundation

by Adam Shepherd, Amber York, Danie Kinkade, and Lilly Winfree

This post, originally published on the BCO-DMO blog, describes the second part of our Frictionless Data Pilot collaboration.

 

Logos for Goodtables and BCO-DMO

 

Earlier this year, the Biological and Chemical Oceanography Data Management Office (BCO-DMO) completed a pilot project with the Open Knowledge Foundation (OKF) to streamline the data curation processes for oceanographic datasets using Frictionless Data Pipelines (FDP). The goal of this pilot was to construct reproducible workflows that transformed the original data submitted to the office into archive-quality, FAIR-compliant versions. FDP lets a user define an ordered series of processing steps to perform on some data, and the project developed new processing steps specific to the needs of these oceanographic datasets. The ordered steps are saved into a configuration file that is then available to be used any time the archived version of the dataset must be reproduced. The primary value of these configuration files is that they capture the curation process at BCO-DMO and make it transparent. Subsequently, we found additional value internally from using FDP in three other areas. First, it made the curation process across our data managers much more consistent than the ad hoc data processing scripts they individually produced before FDP. Second, we found that data managers saved time because they could reuse pre-existing pipelines to process newer versions submitted for pre-existing datasets. Finally, the configuration files helped us keep track of which processes were used in case a bug or error was ever found in the processing code. The project exceeded our goal of using FDP on at least 80% of data submissions to BCO-DMO; we now use it almost 100% of the time.
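
For readers who have not seen one of these pipelines, here is a minimal sketch using the dataflows Python library that underpins Frictionless Data Pipelines. The file name, field names, and steps are made up for illustration; they are not BCO-DMO's actual processors:

    from dataflows import Flow, load, set_type, update_resource, dump_to_path

    def fix_negative_depths(row):
        # Hypothetical custom step: depths should be positive, so flip
        # obvious sign errors introduced at data entry.
        if row.get("depth") is not None and row["depth"] < 0:
            row["depth"] = abs(row["depth"])

    Flow(
        load("submitted_data.csv", name="observations"),  # original submission
        set_type("depth", type="number"),                 # enforce a numeric type
        fix_negative_depths,                              # custom row-level step
        update_resource("observations", title="Cleaned observations"),
        dump_to_path("archive/observations-datapackage"), # archive-ready Data Package
    ).process()

Because the ordered steps live in a saved definition rather than an ad hoc script, re-running the pipeline on a resubmitted file reproduces the archived version of the dataset.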

As a major deliverable from BCO-DMO’s recent NSF award, the office planned to refactor its entire data infrastructure using techniques that would allow BCO-DMO to respond more rapidly to technological change. Using Frictionless Data as a backbone for data transport is a large piece of that transformation. Both groups therefore sought to continue the collaboration by focusing on how to improve the data submission process at BCO-DMO.

 

Goodtables detects a duplication error

Goodtables noticed a duplicate row in an uploaded tabular data file.

 

Part of what makes BCO-DMO a successful data curation office is our hands-on work helping researchers achieve compliance with the Sample and Data Policy of the NSF’s Ocean Sciences division. Yet a steady queue of data submissions means that it can take some weeks before our data managers can thoroughly review a submission and provide the necessary feedback to submitters. In response, BCO-DMO has been creating a lightweight web application for submitting data, while ensuring that such a tool preserves the easy submission experience that presently exists. Working with OKF, we wanted to expedite the data review process by providing data submitters with as much immediate feedback as possible using Frictionless Data’s Goodtables project.

Through a data submission platform, researchers would be able to upload data to BCO-DMO and, if the data are tabular, get immediate feedback from Goodtables about whether they are correctly formatted or have other quality issues. With these reports at their disposal, submitters could update their submissions without having to wait for a BCO-DMO data manager to review them. For small and minor changes this saves the submitter the headache of having to wait for simple feedback. The goal is to catch submitters at a time when they are focused on this data submission so that they don’t have to return weeks later and reconstitute their headspace around these data. We catch them when their head is in the game.

Goodtables provides us with a framework to branch out beyond simple tabular validation by developing data profiles. These profiles would let a submitter specify the type of data they are submitting. Is the data a bottle or CTD file? Does it contain latitude, longitude, time, or depth observations? These questions, optional for submitters to answer, would provide even further validation steps and improved immediate feedback. For example, specifying that a file contains latitude or longitude columns could detect whether all values fall within valid bounds, whether a depth column contains values above the surface, or whether the column recording the time of an observation is formatted inconsistently across some of the rows. BCO-DMO can expand on this platform to continue to add new and better quality checks that submitters can use.
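
As a rough sketch of what such a check might look like, the snippet below validates an uploaded file against a Table Schema whose constraints encode the bounds described above, using the goodtables Python library. The file name, column names, and bounds are illustrative rather than BCO-DMO's actual profile definitions:

    from goodtables import validate

    # Hypothetical profile for a file expected to carry position and depth columns.
    schema = {
        "fields": [
            {"name": "latitude", "type": "number",
             "constraints": {"minimum": -90, "maximum": 90}},
            {"name": "longitude", "type": "number",
             "constraints": {"minimum": -180, "maximum": 180}},
            {"name": "depth", "type": "number",
             "constraints": {"minimum": 0}},
        ]
    }

    report = validate("submission.csv", schema=schema)

    print("valid:", report["valid"])
    for table in report["tables"]:
        for error in table["errors"]:
            # Each error names the violated check and the offending row,
            # e.g. a maximum-constraint error for a longitude of 500.
            print(error["code"], error.get("row-number"), error["message"])

A report like this can be returned to the submitter immediately, so that obvious problems are fixed before a data manager ever looks at the file.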

Goodtables detects incorrect longitudes

Goodtables noticed a longitude that is outside the range of -180 to 180. This happened because BCO-DMO recommends using decimal degrees between -180 and 180 and has defined a Goodtables check for longitude fields.

Power and Status (and Lack Thereof) in Academe: Academic Freedom and Academic Librarians / In the Library, With the Lead Pipe

In Brief

Academic librarians do not experience full academic freedom protections, despite the fact that they are expected to exercise independent judgment, be civically engaged, and practice applied scholarship. Academic freedom for academic librarians is not widely studied or well understood. To learn more, we conducted a survey which received over 600 responses from academic librarians on a variety of academic freedom measures. In this article, we focus specifically on faculty status for librarians and the ways this intersects with academic freedom perceptions and experiences. Even though all librarians who answered our survey share similar experiences when it comes to infringements on their freedom, faculty librarians are more likely to feel they are protected in their free expression. We find it useful to situate librarians within a growing cohort of “third space” academic professionals who perform similar duties to traditional faculty but lack tenure and its associated academic freedom protections. We argue that more attention needs to be paid in the library profession to academic freedom for librarians, and that solidarity with other non-traditional faculty on campus is a potential avenue for allyship and advocacy.

Introductory Note

In November 2016, some colleagues and I made a LibGuide based on a popular hashtag syllabus, Trump Syllabus 2.0.1 The syllabus, drafted in response to Trump Syllabus, was crowdsourced by Black academics seeking to counter the limited vision of the first syllabus, written by primarily white scholars, in an attempt to historicize how we arrived at a Donald Trump presidency. Where the Trump Syllabus centered political, labor, and populist movements as the lineage of Trump’s ascendence, Trump Syllabus 2.0 highlights the genealogy of white supremacy: anti-blackness, homophobia, misogyny, transphobia, ableism, and settler colonialism. The works cited in the syllabus are predominantly scholarly texts, along with popular press articles and key primary sources, and the vast majority of the titles were held by the library I worked in at the time. My colleagues and I set up a guide with tabs for each week of the syllabus and linked to the catalog records for each title we already held, and to various licensed and unlicensed versions of other materials on the syllabus. Some of our library colleagues were not on board with us publishing this guide, fearing backlash, but we were not prohibited from doing so. Fast forward to January 2017: a right-wing student blog, backed by a conservative think tank, wrote a hit piece about our LibGuide, which received so much attention in the right-wing mediasphere that it eventually captured the attention of our campus public relations team. We were initially not asked to take the guide down, but when Fox News called our college president’s office to inquire about the guide the following week, the President told the library to remove the guide immediately. I did push back, cautiously, against the decision, but ultimately realized I was powerless to change the situation without risking my job. I had always assumed I was protected in my work product by academic freedom, but I learned that week that I wasn’t. As an at-will employee at a private liberal arts college, academic freedom very clearly didn’t extend to me or any of my staff colleagues.

–Alexis Logsdon

Introduction

The LibGuide experience of one of this article’s authors led to conversations between the coauthors, early in 2017, about academic freedom for academic librarians in the United States. Specifically, what protections do we really have and why does academic freedom matter to us? The election of Trump sparked a moment of professional introspection in academic libraries that continues to this day: what were our public engagement obligations? Do academic librarians need freedom of expression, and if so, what are the lived experiences of academic freedom for librarians across social identities? These questions led us to conduct a national survey of academic librarians in the fall of 2018. The resulting data has allowed us to study academic freedom for librarians, and its relationship to other factors like social identity and job status. In previous outputs of our research, we have discussed the history and state of academic freedom for academic librarians more broadly, and also highlighted findings related to race, sexuality, gender, and more.2

In this article, we will focus on the relationship between academic freedom and faculty status for librarians and how this surfaced in our survey findings. Faculty status is the factor most associated in the common imagination and the literature with academic freedom protections. Yet many librarians lack faculty status, have partial status, or are unsure of their protections regardless of their official status. Even when classified or considered faculty, academic librarians are rarely treated as peers by other disciplinary faculty or university administrators. For these reasons, academic librarians are members of the academy with a markedly more tenuous hold on academic freedom claims. We hypothesized at the outset of our research that when librarians’ job status is precarious, they will feel less free to express themselves in the workplace and will be highly attuned to penalties for academic expression. Our survey did find interesting distinctions between faculty and non-faculty librarians when it came to a variety of measures around academic freedom. Indeed, faculty status affected respondents’ perceptions of academic freedom more than any other variable we studied.

Before sharing and discussing our survey results around this topic, we seek to contextualize librarians’ academic freedom within the context of the widespread, growing precarity of higher education workers. Academic librarians experience significant insecurity that is related to their membership in an ever-growing class of higher education workers who occupy a liminal space between faculty and clerical staff. Budgetary challenges and neoliberalism in higher education have led institutions to retreat from offering stable, tenure-protected employment and instead increasingly rely on academic professional staff and contingent faculty.3 This enables administrators to scale back autonomy, equitable pay, and protections like academic freedom. Academic librarians have long occupied less stable and powerful positions on their campuses than traditional faculty.4 Therefore, we believe academic librarians’ experiences with academic freedom are worth investigating further in their own right. However, librarians are also situated within a larger ecosystem of growing precarity on campus. Understanding our role in this context can help us identify allies and avenues for advocacy.

In the absence of tenure, academic librarians and other academic staff experience insecurity in their jobs that impedes their academic freedom. Our research is interested in more than just policies but also how freedom of expression plays out (or not) in a variety of lived workplace experiences. We will share findings that suggest that faculty status truly matters for librarians to feel protected in their work activities. Yet we will also describe a higher education landscape in which faculty status is available only to some librarians and certainly not the majority of library workers. The trajectory is toward fewer faculty-classified library positions, not more. Our article offers a question as well as an argument: if faculty status is critical to academic freedom, but is only available to some of us, how can we advocate for better freedoms apart from that?

Methods and Scope

We conducted a survey in Fall 2018 to study librarians’ perceptions of how protected they were by their institution’s academic freedom policies. We asked about a wide spectrum of “silencing” actions for academic librarians, from being skipped over for a promotion to being demoted to being fired outright, and inquired about how these formal and informal punishments impacted librarians’ lives. We also asked our respondents to share their demographic information, which enabled us to correlate their experiences with their social identities.

Our research project overall is a mixed-methods study, with an initial survey that we plan to follow up with interviews and textual analysis later this year. The survey was designed to gather information about academic librarians’ job status, experiences of academic freedom, and socioeconomic positionality. We asked approximately 30 questions that were a mix of closed, multiple-choice, and open-ended questions. The questions about academic freedom were primarily matrix table questions for which respondents could rank their experiences on a scale. Many of the social identity questions allowed for “other” and filled-in textual responses if respondents felt that none of the offered categories applied to them. We also provided space for open-ended comments at the end of the survey and for respondents to provide their contact information if they were willing to be interviewed at a later date.

We issued the survey via national listservs and social media in the Fall of 2018. We intentionally promoted the survey on a wide variety of professional listservs and using hashtags on social media to reach librarians of color.5 We had over 700 people start and just over 600 people complete the survey. We filtered out respondents who did not agree to our IRB-approved consent form and those who stated that they did not currently work in an academic library. Our survey questions were based on our hypotheses and also modeled after similar surveys.6 Our previously published ACRL 2019 conference paper provides a summary of responses to our survey as well as deeper dives into how responses corresponded to race and financial insecurity.7 In this article, we will focus primarily on librarian faculty status and how this corresponds to lived experiences of academic freedom.
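As a purely illustrative aside, the kind of cross-tabulation this analysis involves could be sketched as follows in Python with pandas; the field names, response labels, and toy data below are hypothetical stand-ins, not the survey’s actual variables or results.

import pandas as pd

# Toy stand-in for a cleaned survey export; column names and values are
# hypothetical placeholders, not the study's real instrument.
responses = pd.DataFrame({
    "faculty_status": ["faculty", "faculty", "non-faculty", "non-faculty"],
    "protected_research": ["Agree", "Disagree", "Agree", "Disagree"],
    "protected_instruction": ["Agree", "Agree", "Disagree", "Agree"],
})

protection_items = ["protected_research", "protected_instruction"]

# Percentage of each status group answering "Agree" for each item
summary = (
    responses.groupby("faculty_status")[protection_items]
    .agg(lambda col: (col == "Agree").mean() * 100)
    .round(1)
)
print(summary)

Running this sketch prints, for each status group, the percentage of respondents answering “Agree” on each item, which is the same general shape of summary reported in Figure 1 below.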

Defining Academic Freedom

Academic freedom is a contested concept, so it is important for the purposes of our article to state that we align with those who believe scholarship and civic engagement—especially in librarianship—are inextricably linked. Generally, the core principles of academic freedom referenced in most U.S. institutional policies adhere to the 1940 statement on academic freedom and tenure from the American Association of University Professors (AAUP).8 This statement proposes three primary protections: the right to freely teach without interference, the right to research without interference, and the right to express oneself in the community without interference.9 While scholars and administrators generally agree on these basic precepts, they diverge on to whom, how, and to what extent these protections apply. One school of thought asserts that scholarship should be “pure” and remain disengaged from the civic sphere. In this formulation, academic freedom applies only to teaching and scholarship that is allegedly devoid of politics and “neutral.”10 Another faction, with whom we are aligned, points to the origins of modern academic freedom as an important project intended to protect faculty (particularly those from the social sciences) whose scholarship engages directly with society. Historian Joan W. Scott, in her essay exploring this claim, argues that distinguishing between politics and scholarship is “easier in theory than in practice” and “the tension between professorial commitments and academic responsibility is an ongoing one that the principle of academic freedom is meant to adjudicate.”11 As an applied profession, librarianship presumes a link between scholarship and civic engagement. Academic freedom is thus a deeply relevant issue for our field.

Academic Freedom and Librarians

The concept of academic freedom in libraries is complicated by the library profession’s focus on the parent concept of intellectual freedom and the heterogeneous nature of library employment. The Association for College & Research Libraries (ACRL) has issued a number of statements in defense of academic freedom for academic librarians.12 However, the devotion to intellectual freedom for our users gets conflated with and obscures advocacy for our own academic freedom.13 Academic librarians have one foot in academia and another in librarianship, with academic freedom a norm in the former but not the latter. Indeed, supporters of library neutrality—the focus of a battle that mostly plays out in public libraries—often uphold intellectual freedom at the expense of other rights and freedoms. Similar to the purity arguments put forth for academic work, some librarians claim neutrality as a core library value rooted in the American Library Association’s guiding values, the Enlightenment, and political liberalism.14 Others, including us, argue that library neutrality is conceptually impossible and also puts workers and the public at risk.15

Media reports, anecdotes on social media, and the library literature all confirm ongoing barriers to librarians’ academic freedom. Attempts by community members to censor or ban materials in libraries are so commonplace that the American Library Association promotes an annual “Banned Books Week” and collects statistics from libraries on the issue.16 However, librarians experience other forms of infringement on their academic freedom that receive less attention from the profession. Library science scholar Noriko Asato provides a long history of infringements on librarians’ academic freedom, not just in collection development decisions, but also when librarians question library policy, engage politically, or make choices in their personal lives; these same infringements were reported in our survey.17 These are not simply problems of the past. Librarians in the University of California system learned during their union contract negotiations in 2018 that their institution believed academic freedom did not apply to them; this became the primary issue during their ultimately successful negotiations.18 A recent survey of Canadian librarians found that they face restrictions on what they research and struggle to pursue scholarship in light of their other responsibilities.19 Even when librarians are not directly restricted in their research or personal expression, they face structural inequities in terms of funding and time to do research compared to disciplinary faculty, leading indirectly to infringements on their autonomy.20 Only half of all liberal arts college librarians report feeling “protected in their work as a librarian,” according to a survey conducted by librarian Meghan Dowell in 2018.21 Dowell’s findings echo what non-faculty librarians reported in our survey, which is not surprising given how many liberal arts college librarians are classified as staff/non-faculty. Librarians who stage exhibits are also regularly confronted with pushback and are forced to take them down.22 Perhaps most alarmingly, librarians—especially librarians of color—have also been subject to harassment and abuse from the public for their workplace choices or public positions.23 These experiences are reflected in our survey findings as well, discussed in more detail below: more than 20% of respondents reported fear that their identity put them at personal risk.

Third Space Professionals

Academic librarians are situated within a broader context of academic professionals, beyond traditional faculty, on campus. Generally, however, there is little in the academic freedom literature that specifically studies non-faculty higher education workers. Despite our absence from the scholarship, research and anecdotal evidence from the media indicate that academic freedom issues surface regularly for academic professional staff on campus. Sometimes professional staff perform duties similar to faculty’s but are unprotected when their pedagogy is questioned or they protest institutional policy. Other times, because these are problems for staff rather than faculty, the issues are not considered to have anything to do with academic freedom in the first place.

As some of the longest-serving quasi-academic professionals on campus—not traditional faculty, but also not clerical or facilities staff—academic librarians serve as a bellwether and a proxy for issues that undoubtedly resonate for our academic support professional peers.24 The number of academic support professionals grew rapidly in the late 1990s and we continue to comprise a significant portion of higher education workforces.25 There is cross-disciplinary literature on the complicated roles and identities of academic support staff, who occupy what educational studies scholar Celia Whitchurch calls a “third space” on their campuses.26 Despite the growth of this group of higher education workers, the persistent and predominant characterization of the academic workforce is a simple binary of either professors or clerical staff. However, academic support professionals, perhaps most notably librarians and academic technologists, increasingly assume duties that were once reserved solely for traditional professors: teaching, research, and service.27 Even though these staff are often doing this faculty-like work, traditional faculty protections—including academic freedom—do not apply.28

Without the protections of tenure and its associated governance, academic freedom as a right and protection is arguably toothless. With the erosion of tenure protections, in part through the dispersal of traditional faculty work to contingent faculty and professional academic support staff, “academic freedom today may be as endangered as it has been at almost any moment since the AAUP’s inception.”29 Tenure was never just about protecting research, according to academic freedom expert Hank Reichman, but instead must be championed for all involved in teaching and research on campus.30 Yet we now have a class of workers on higher education campuses who are expected to be educators and lead students in traditional paths of learning, but could easily lose their jobs and livelihood if there is blowback to their speech or other professional choices. Even when institutional policies around academic freedom are broad and inclusive of staff, in the absence of tenure, staff do not have the same meaningful freedom as faculty with tenure protections. If one can be fired at will, then one will almost certainly be guarded. Further, even if workers are covered by academic freedom policies in principle, it is usually unclear if all their activities are protected. This is why many advocates believe the core of the academic freedom fight goes beyond having the right policy in place and is actually about extending tenure protections on campuses.31 In the following sections, we aim to bring a librarian-centered lens to this conversation, to make the case that in addition to contingent faculty, librarians and other academic professional staff must be brought into protection as well, given the nature of their work on campus.

Librarians as Third Space Professionals

Academic freedom as it manifests for traditional faculty does not map neatly onto librarians’ jobs and experiences. Like some faculty, academic librarians often engage in applied scholarship and are enacting professional expertise on a day-to-day basis in the academic sphere. However, academic librarians typically work within rigidly hierarchical library workplaces. Unlike traditional faculty—who operate with significant autonomy and whose spheres (teaching, research, and service) are fairly well-defined—academic librarians also engage in a wide variety of professional activities well beyond just research and teaching and are usually directly supervised in this work. They are usually reviewed against a different set of performance metrics than traditional faculty. Because librarians are more closely supervised and tend to have less power in their workplaces, many duties of academic librarians might be subject to penalties and pushback to a greater degree than those of disciplinary faculty.32 As we will discuss more in our article, academic librarians also occupy a wide range of job classifications and only some are in traditional, tenure-protected positions. Many librarians are at-will employees or have some faculty-like rights but not all. Unpacking academic freedom for librarians, therefore, requires a different and broader picture than looking only at institutional policies and rigidly defined cases.

Librarians occupy myriad job classifications on their campuses, complicating research and understanding around this topic. For instance, when ACRL collects data from libraries on librarians’ faculty status, they ask an additional eight questions to establish clarity on the nature of that status. Additionally, ACRL then asks respondents to further detail whether or not librarians are “fully, partially, or not at all” included in policies such as “eligible for leaves of absence or sabbaticals on the same basis as other faculty” or “have access to funding on the same basis as faculty.” ACRL’s data from 2017 indicates that, out of 1,645 responding academic libraries, FTE librarians at about half (838, or 51%) had faculty status. However, 38% of libraries reported that their librarians have faculty status but not tenure. Interestingly, more libraries reported that their librarians fully have “the same protections of academic freedom as other faculty” than reported that their librarians have faculty status (70% compared to 51%).33 This can likely be explained by the fact that some institutions do apply academic freedom policies to staff and students, but it could also be because respondents made assumptions about their protections when those protections might not actually be present in policy or in practice.

When it comes to librarians’ professional identity, institutional context therefore plays a key role. Approximately 60% of our survey respondents claim to be “faculty or faculty-like” in their status.34 Our findings do not tell us what that means to our respondents and this label is open to interpretation, especially for librarians who often have some kind of quasi-faculty status that is understood or experienced differently by individual librarians on the same campus. In their article on the role of academic librarians in their institutions, Rachel Fleming-May and Kimberly Douglass write, “The lack of consensus on the meaning and value of librarianship to academic institutions is also a likely contributor to the disparate treatment of librarians with faculty status from institution to institution.”35 In studying the professional identity of librarians as related to their job classification, Shin Freedman found, and our respondents reported the same, that librarians’ self-identity is closely correlated to institutional context, rather than broader professional norms and understandings. In other words, whether or not you identify as faculty-like has a lot to do with how your institution and library administration categorizes and treats you.36 This may seem like an obvious point, but it is worth calling out the distinction between traditional faculty identity and norms, which tend to be national in scope and much simpler to define—either tenure-track or contingent with clear rights understood to align or not with these two categories—and librarians’ roles and identities, which are much more locally bounded. While our survey relied on self-identification, we conjecture that self-perception as being “faculty-like” is the strongest indicator of how librarians feel their autonomous work life is respected on their campuses. This has implications for librarians’ ability to advocate for their rights or even imagine alternatives to their current situations, likely compounded by how competitive the job market is for librarians.37 Many librarians accept the classifications as they are wherever they can get a job, which is unsurprising given that challenging market and how much murkiness surrounds this issue in the literature and in practice.

While the pros and cons of tenure for librarians are widely debated in the academic library literature, there is consensus that tenure is valuable when it comes to defending librarians’ academic freedom. Indeed, academic freedom is regularly cited as a primary reason for academic librarians to maintain or seek faculty status and tenure.38 Librarians publish on controversial topics to advance the field of librarianship and must regularly make potentially unpopular decisions in library operations. According to librarians Catherine Coker et al., “If a librarian’s academic freedom is not protected, then, like teaching faculty, he or she might give a guarded and abridged version of the thoughts and ideas in his or her research. In addition, librarians may also guard against purchasing and disseminating controversial informational resources to help answer users’ questions, if they feel under threat that their job could be on the line.”39 Joshua Kim, an academic technologist who writes a regular column for Inside Higher Ed, asserts that he would accept a lower salary in exchange for tenure because of the freedom he would have to do critical, applied research in learning innovation.40 Librarians with tenure and with clarity around their status report higher job satisfaction, including when it comes to academic freedom protections.41

Our Findings

Survey Responses

According to our survey respondents, academic librarians with faculty status differ greatly in their perceptions of academic freedom protections from librarians who do not identify as faculty. In every category of job duty we asked about, librarians who identified as faculty-like reported feeling protected in their work at higher rates (Figure 1). Perhaps predictably, some of the biggest disparities were in areas that are most faculty-like in function: research and publishing (72% of faculty-identified librarians vs. 58% of non-faculty librarians), instruction (67% vs. 56%), and interactions with faculty (69% vs. 55%). But there was also a stark contrast in responses about non-library campus activities (63% vs. 49%) and library programming work (68% vs. 54%), both arguably central functions of librarianship and crucial sites of outreach and relationship building for academic librarians.

Even with the higher numbers for faculty librarians, our findings offer confirmation of what we saw in the literature in terms of the heterogeneity of librarian faculty status and relative power on campus. Indeed, the figures are remarkable: a quarter of our faculty respondents did not feel well protected in their research and publishing activities, with similar responses for instruction and programming. These are the very “third space” areas where librarian innovation and creativity are seemingly most encouraged, and yet many of us do not feel like we can freely choose how we go about these tasks. According to the literature, faculty status for librarians varies widely from institution to institution in terms of what protections it affords. Our findings appear to confirm that faculty status for librarians does not in and of itself equate to feeling fully protected.

While there was a wide gap in the sense of safety for librarians of differing status, they report feeling silenced to the same general degree and by many of the same things (Figure 3). By far the largest number of respondents, 50% of faculty librarians and 45% of non-faculty librarians, reported feeling silenced by “fear that speaking up will hurt my career.” More than 20% in both categories felt silenced by “fear that my identity will put me at personal risk,” suggesting that certain social identities put people at a greater risk for targeted harassment, regardless of faculty status. In addition, 18% of our respondents (both faculty and non-faculty) also reported feeling afraid for their personal safety if they were to speak out about their beliefs. These two findings resonate with Lara Ewen’s article on librarians and targeted harassment. Citing an ALA panel from 2018 called “Bullying, Trolling, and Doxxing, Oh My! Protecting Our Advocacy and Public Discourse around Diversity and Social Justice,” Ewen describes the divergent experiences of two librarians:

Sweeney, who is white, said she was challenged mainly for the presumed content of the research, while Cooke, who is African American, was harassed in a way that made it clear that her race was a factor. Cooke was bombarded with hate mail and threatening voicemails. Both researchers feared that Cooke’s photograph, email address, and phone number had been copied from UIUC’s website and distributed throughout racist communities online.42

While librarians of various social identities are targeted for their research, the magnitude of the threats is often much higher for librarians from marginalized communities.

Why does fear of punishment seem to outweigh actual experiences of reprimands? The literature we reviewed earlier in this article points to a number of possible answers, all likely contributing to this disparity. Academic librarians have any number of legitimate reasons to feel insecure, even in the absence of experiencing or witnessing direct penalties. Faculty status for librarians often comes with explicitly fewer protections than what is written into policy for disciplinary faculty. Even with faculty status, librarians typically have less security and power in their institutions than other faculty. Many librarians who are classified as staff are keenly aware that their positions are ultimately precarious, even though, like other “third space” academic professionals, they perform work that—were it being done by disciplinary faculty—would be protected by academic freedom policies. As discussed in the literature review, many librarians work within a rigid hierarchy under direct supervision with far less autonomy than traditional disciplinary faculty. It is understandable that librarians would have a sense of caution and insecurity in these settings. Finally, the dramatic transformation of the academic workforce in recent decades, referenced earlier in this article, itself presents an existential threat for academic librarians and our administrators. Already more precarious on our campuses, we can see from these trendlines (and many others) that academic libraries are in defense mode when it comes to our budgets and workforce. All of these factors likely contribute to academic librarians perceiving a wide variety of potential threats to their work even in the absence of direct punishment, while simultaneously recognizing that their managers and library leadership are feeling their own set of pressures to avoid institutional conflict and protect their budgets and staff.

The final set of questions we asked in our survey was about the impacts of punishment (Figure 4). Of those who had experienced punishments, a substantial number said it had affected their engagement and motivation at work, their mental well-being, their relationships with co-workers, and their sense of belonging in their position. About 80% of non-faculty librarians reported that the punishments they had experienced had impacted their mental health, and a nearly identical number said the punishment had a negative effect on their motivation and engagement at work. The numbers were only slightly lower for librarians in faculty positions: around 70% reported these same impacts. Around 60% of respondents in both groups said their experience with punishments had influenced their relationships with colleagues and students, and more than half reported that the experiences had made them question whether they belonged in their positions. Other impacts that were reported by more than 40% of both faculty and non-faculty librarians included feeling that they could not adequately do their jobs, and, disturbingly, considering whether they belonged in the profession at all. These responses resonate with what librarian Kaetrena Davis Kendrick terms “the low morale experience.”43 In Kendrick’s study, “participants reported emotional, physiological, or cognitive responses to low morale” after a trigger event, which in turn led to “a negative effect on [their] daily practice of librarianship.”44 While Kendrick studied abuse in the workplace as the trigger for the low morale experience and our survey asked about the impacts of academic freedom infringements, there is significant overlap in both experiences and impacts.

Figure 1. Librarians’ Perceptions of Free Expression by Faculty Status

Figure 1. A bar chart visualizing librarians’ perceptions of their protections for free expression, according to faculty status.
Accessible equivalent of this chart as a table.

Figure 2. Librarians’ Experiences of Academic Freedom Infringements by Faculty Status

Figure 2. A bar chart visualizing librarians’ experiences of infringements of academic freedom, according to faculty status.
Accessible equivalent of this chart as a table.

Figure 3. Experiences of Feeling Silenced by Faculty Status

Figure 3. A bar chart visualizing librarians’ experiences of being silenced, shown according to faculty status.
Full text equivalent of this chart as a list.

Figure 4. Respondents who reported being “somewhat” or “significantly” impacted by punishments, by faculty status.

Figure 4. A bar chart visualizing respondents who reported being “somewhat” or “significantly” impacted by punishments, shown according to faculty status.
Full text equivalent of this chart as a list.

Open-Ended Comments

In the survey’s open-ended comments field, many librarians offered insights into how their work environment failed to protect them. Their reasons were complex and varied, but overall they describe workplaces where managers and library directors make unpopular decisions and librarians feel afraid to question these decisions. When they did question them, many librarians told stories of being informally punished by being given fewer opportunities or getting subpar reviews, and they feared “there will be subtle punishments for expressing beliefs that are [at] odds with the administration.” Several respondents talked about how research was treated in their faculty librarian positions: they had their research agenda questioned or outright denied, they had library leadership who sought to abolish research as a core job function, and they experienced informal punishments because of their research topics. Some commented that academic freedom seemed to apply most in their institutions when it was tied to research and publishing. Even with faculty status, many librarians feel they are treated differently from their peers in academic departments. More than one person reported that their research agenda was questioned by supervisors, that they had little control over their own schedules, and that they were “routinely” tone policed during performance reviews. The hierarchical workplaces in which librarians typically work, as described in our literature review, seem to complicate and sometimes seriously interfere with librarians’ freedom to pursue their research agendas.

In keeping with the rest of our survey results, non-faculty librarians felt less certain that their speech and actions in the workplace were protected, even while experiencing academic freedom infringements similar to those reported by faculty librarians. In open-ended responses, a striking number of non-faculty librarians discussed the lack of clarity around academic freedom protections and, worse, library leadership (both managers and deans) who claim to support free expression but then respond negatively to it in practice. Reading non-faculty librarians’ comments as a group reveals a consistent narrative of uncertainty, insecurity, and mixed messages:

[I]t feels like our library leadership wants it both ways: librarians that will be active in high-profile research, publishing, professional and community orgs, etc., but also never say anything leadership doesn’t like. And what gets considered “controversial” at my library often seems pretty unpredictable.

My university displays a wild mismatch between its stated policies and their application—academic freedom is not supported in general, especially at the library level.

My institution claims to uphold academic freedom, but there is a silent understanding that said freedom really only means “Freedom to uphold the ‘party’ line.”

I think in theory they defend academic freedom, but in practice they are scared of anything that they perceive will damage their image.

This is but a small sample of comments about mixed signals; this was one of the most common complaints in our responses. These librarians point to a pattern of denied agency, of contradictory messages about their academic freedom, and of managers unwilling or unable to defend their employees when the latter’s work product is questioned. The implication for many librarians is that outspokenness is something to avoid and to discourage in others. As one respondent eloquently described it,

[I]t feels as if librarians, whether faculty or not, are taught to be nice and congenial. Thus, the culture of the profession does not lend itself to speaking up without being labeled.

The culture of “niceness” in libraries goes well beyond the scope of the current research, but is worth exploring as a root cause of much confusion and conflict arising from academic freedom expectations.45 Niceness and neutrality work in tandem to create conditions that shut certain people out of the professional conversation, and even out of working in the library profession themselves.46

Faculty librarians reported some of the same experiences with unspoken restrictions and less-than-encouraging messages from supervisors, albeit in smaller numbers. Some of the comments echo what non-faculty librarians experience, but some point to specific inconsistencies between the rights they purportedly enjoy as faculty and how their libraries interpret those rights:

Most of the unfreedoms I experience are internal to the library. It is very conservative in comparison to the university. I don’t mean politically, I mean in risk taking and allowing a wide range of debate and speech. I have faced repercussions for things that are exceedingly trivial.

My institution embraces social justice, but the library does not. I have been here for [length of time redacted] and in that time I have contributed a great deal to the community, but it is lost in the micromanaging by the dean.

Another librarian reported having to change research topics in order to be granted a sabbatical, and their comments reveal a keen awareness of how this violated their academic freedom: “requiring me to research something [my dean] really likes is a violation of my academic freedom, but I’m tired of fighting him and just need a break.” These remarks point to a schism between institutional values about academic freedom and libraries’ more measured, cautious approach. This kind of fractured experience can happen in the other direction as well: in the University of California librarians’ recent contract negotiations, one of the sticking points for faculty librarians was to have academic freedom protections written into their contract. While UC librarians have faculty status, the pushback they experienced made it clear that the university saw librarians as excluded from essential protections that come with that status.47

Conclusion

Many librarians are living in a culture of fear on their campuses. Despite working in academic settings where academic freedom is held up as a value and is presumed by many to apply to librarians, our survey respondents reported significant limits to their free expression. Librarians are expected to enact independent, expert judgment frequently throughout their workdays. We are purchasing materials for our libraries, planning programs, teaching students, and have unique curricular insights. Yet, we learned in our survey, many librarians are in workplaces where free expression is discouraged and even punished. Indeed, as evidenced in Figures 2 and 3 above, more than a third of librarians surveyed said they’d been informally punished, and 45% said they worried that speaking up would hurt their career. Further, a culture of silencing and fear leads to a foreclosing of underrepresented voices, upholds the status quo, and hinders growth in our institutions.

Our research confirms some of our hypotheses about the role of faculty status in librarians’ academic freedom protections.  We were surprised, however, to discover that where faculty and non-faculty differ most is in their perceptions about their protections; we found a strong connection between faculty status and perceiving free expression to be protected. Respondents without faculty status reported feeling protected in their job functions at lower rates than faculty librarians in every category we asked about, often with differences of ten percentage points or more. When it came to infringement of academic freedom, however, faculty and non-faculty librarians reported similar experiences. This raises interesting questions: why do faculty librarians feel more protected, even as they report being punished for their actions and speech? Is there something about faculty status that empowers librarians to speak more freely, in spite of potential punishments? Or is there more security in these positions, so that the punishments are easier to bear? Whatever the reasons, it seems clear that faculty librarians are better positioned to speak out in their campus communities, take a critical approach to the core responsibilities of their positions, and generally be confident that they can approach their work without fear their views will get them fired.

While it is beyond our powers and the scope of this research to resolve the disparity in academic freedom of faculty and non-faculty librarians, we can offer a way to begin examining what leads some librarians to feel protected and others not to. In the third space continuum, faculty librarians reside closer to traditional faculty and feel less precarious. Conversely, non-faculty librarians, as evidenced in their survey responses and open-ended comments, are often forced to navigate complex, sometimes contradictory messages about their academic freedom from their managers. Other times, the message is quite clear: they are considered staff, and staff are explicitly not covered by academic freedom in their institutions.

How then do we advocate for more and clearer academic freedom protections for librarians of all job classes? As with any endemic problem, the solution needs both local and systemic dimensions. We offer some suggestions here, but we also encourage our colleagues to think about how these strategies would play out in their local contexts and whether there are others that might work better for them. When it comes to solutions, we believe the answer lies in raising awareness about this issue, understanding one’s role in the academic ecosystem locally and beyond, and identifying allies beyond our own ranks. On a systemic and national level, the path will involve calling upon national networks and following successful models of progressive change. Librarian professional organizations should attend to academic freedom as a distinct issue apart from book censorship and freedom for our users. We know from the ACRL statistics cited above that more than two-thirds of academic libraries believe their librarians to have the same academic freedom protections as faculty. Starting a conversation about how these stated norms conflict with our respondents’ reported experiences could lead to clearer protections. We should also participate in broader organizations like the AAUP and other groups agitating for academics, and push the issue of librarians within those bodies.

The local level has the most potential for meaningful change. It is imperative that librarians know exactly what, if anything, their employee handbooks, bylaws, union contracts, or other governing documents say about academic freedom for librarians. It may be that your handbook says nothing about academic freedom, but that is good information to have. If you have venues for discussing shared values around academic freedom within your library, try starting a conversation there. If there are explicit policies about academic freedom for faculty and students, but not for staff, are there official governance structures (such as faculty meetings or a university senate) through which this issue could be raised? Who on your campus outside the library are likely allies, such as contingent faculty or academic technologists?

By situating librarians within the framework of “third space professionals,” we can shift and clarify the conversations around academic freedom happening in our profession and on our campuses. When news stories tell us that even traditional faculty are at risk of losing their jobs over free expression, it follows that uncertainty and precarity are amplified the farther one is from the centers of power. Adjunct instructors, at-will staff, and others in more insecure positions on their campuses are particularly vulnerable. Organizing and agitating alongside other third space colleagues—academic technologists, staff researchers, lecturers—might be a more effective way to capture the attention and support of protected faculty and senior administrators. Third space academic professionals may be suffering the same self-censorship instinct because of their own employment precarity, but through allyship and solidarity, we all might secure greater freedoms. Building solidarity with local allies is an avenue toward greater power, such as organizing together into a union.48 While librarians often enjoy a stature on campus that other third space professionals do not (whether because of pay, additional benefits, or permanent employment status), the existential threat to higher education employment will be felt by us all.49 Relying on tenure alone limits access to academic freedom protections to a select few and seems to be a losing path forward. If we collaborate, through unionizing or otherwise, we have the best chance of highlighting the need for academic freedom protections that extend beyond the tenure framework.

Appendix

Perception of Protections, By Union Affiliation
  Union Non-union
Social Media 40% 47%
Interactions with other staff 73% 75%
Workplace policies 58% 59%
Off-campus activities 67% 66%
Programming 52% 51%
Interactions with students 69% 67%
Research and publishing 64% 62%
Instruction 65% 61%
Cataloging 36% 32%
Interactions with faculty 71% 64%
Non-library campus activities 62% 55%
Collection Development 69% 58%

Acknowledgements

Yupei Liu, a computer science and statistics major at the University of Minnesota, helped immensely with statistical analysis and creating charts from our data. We are grateful to Aaron Albertson for his help testing our survey design. We also wish to thank early readers of our draft, Heather Tompkins and Rachel Mattson. Finally, we also wish to thank the publishing editor of the submitted article, Ryan Randall, and peer reviewers, Meghan Dowell and Ian Beilin.


Accessible Equivalents

Figure 1 as a Table

Types of Expression
Faculty Non-faculty
Collection development 79% 67%
Cataloging 75% 66%
Research and publishing 72% 58%
Interactions with other staff 70% 62%
Interactions with faculty 69% 55%
Off-campus activities 68% 62%
Programming 68% 54%
Instruction 67% 56%
Interactions with students 66% 59%
Non-library on-campus activities 63% 49%
Questioning workplace policies 59% 52%
Social media 51% 47%

Return to Figure 1 caption.

Figure 2 as a Table

Types of Infringement
Faculty Non-faculty
Informally penalized for questioning workplace policies 31% 35%
Told not to participate in org. activity 18% 20%
Directed to change work 14% 16%
Formally penalized for questioning workplace policies 5% 5%

Return to Figure 2 caption.

Figure 3 as a Table

Types of Effects
Faculty Non-faculty
Fear that speaking up will hurt my career 49% 45%
Fear that my identity will put me at personal risk 21% 22%
Fear that speaking up could jeopardize my personal safety 17% 17%
Complaints from colleagues, students, or staff about my academic activities 11% 8%
Complaints from colleagues, students, or staff about my non-academic activities 5% 11%
Threats and harassment from coworkers, students, or faculty 5% 10%
Threats and harassment from the public 3% 3%
Complaints from the public about my non-academic activities 2% 3%
Complaints from the public about my academic activities 1% 4%

Return to Figure 3 caption.

Figure 4 as a Table

Effects of punishment: Responses of “somewhat” or “significantly”
Faculty Non-faculty
Motivation and engagement at work 71% 80%
Mental Well-being 70% 80%
Relationships with coworkers and students 60% 64%
Sense that I belong in this position 56% 64%
Ability to adequately do my job 46% 51%
Sense that I belong in this profession 37% 42%
Physical Well-being 30% 34%

Return to Figure 4 caption.

Footnotes

  1. N.D.B. Connolly and Keisha N. Blain, “Trump Syllabus 2.0,” Public Books, June 28, 2016, https://www.publicbooks.org/trump-syllabus-2-0/.
  2. Danya Leebaw and Alexis Logsdon, “The Cost of Speaking Out: Do Librarians Truly Experience Academic Freedom?” (Association of College & Research Libraries Annual Conference, Cleveland, OH, April 2019). http://hdl.handle.net/11299/203282
  3. Jennifer Washburn, University Inc.: The Corporate Corruption of Higher Education (New York: Basic Books, 2005).
  4.  Rachel A. Fleming-May and Kimberly Douglass, “Framing Librarianship in the Academy: An Analysis Using Bolman and Deal’s Model of Organizations,” College & Research Libraries 75, no. 3 (May 2014): 389-415, https://doi.org/10.5860/crl13-432.
  5. We are white women employed at a large research university library in management, reference, and instruction positions. We tried to share our survey with as wide and diverse a pool of respondents as possible, well beyond our own limited networks, in order to best understand how socioeconomic positionality correlates with academic freedom for library workers.
  6. Becky Marie Barger, “Faculty Experiences and Satisfaction with Academic Freedom,” Doctor of Philosophy, Higher Education, University of Toledo, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1279123430; “Inclusive and Functional Demographic Questions,” University of Arizona Office of Lesbian, Gay, Bisexual, Transgender, Queer and Questioning (LGBTQ) Affairs, accessed 1/23/2019, https://lgbtq.arizona.edu/sites/lgbtq.arizona.edu/files/Inclusive%20and%20Functional%20Demographic%20Questions.pdf; Meghan Dowell, “Academic Freedom & the Liberal Arts Librarian,” CAPAL18 Conference, University of Regina, Saskatchewan, Canada, 2018. https://capalibrarians.org/wp/wp-content/uploads/2018/07/6C_Dowell_slides-notes.pdf.
  7. Leebaw and Logsdon, “Cost of Speaking Out,” ACRL 2019.
  8. American Association of University Professors, “1940 Statement of Principles on Academic Freedom and Tenure,” 1940, https://www.aaup.org/report/1940-statement-principles-academic-freedom-and-tenure.
  9. Hank Reichman, The Future of Academic Freedom (Baltimore: Johns Hopkins University Press, 2019), xiv.
  10. Stanley Fish, “Academic Freedom and the Boycott of Israeli Universities,” in Who’s Afraid of Academic Freedom?, ed. Akeel Bilgrami and Jonathan R. Cole (New York: Columbia University Press, 2015), 275–92.
  11. Joan W Scott, “Knowledge, Power, and Academic Freedom,” in Bilgrami and Cole, Who’s Afraid, 78.
  12. For instance: Association for College & Research Libraries (ACRL), “ACRL Statement on Academic Freedom,” 2015, http://www.ala.org/acrl/standards/academicfreedom; Joint Committee on College Library Programs, “ACRL Joint Statement on Faculty Status of College and University Librarians,” 2012, http://www.ala.org/acrl/standards/jointstatementfaculty
  13. Gemma DeVinney, “Academic Librarians and Academic Freedom in the United States: A History and Analysis,” Libri 36, no. 1 (1986): 24-39; Noriko Asato, “Librarians’ Free Speech: The Challenge of Librarians’ Own Intellectual Freedom to the American Library Association, 1946-2007” Library Trends 63, no. 1 (Summer 2014): 75-105. http://doi.org/10.1353/lib.2014.0025; Richard A. Danner and Barbara Bintliff, “Academic Freedom Issues for Academic Librarians,” Legal Reference Services Quarterly 25, no. 4 (2007): 13-35. https://doi.org/10.1300/J113v25n04_03.
  14. See documents cited here: American Library Association, “Intellectual Freedom: Issues and Resources,” accessed February 3, 2020, http://www.ala.org/advocacy/intfreedom; John Wenzler, “Neutrality and Its Discontents: An Essay on the Ethics of Librarianship” portal: Libraries and the Academy 19, no. 1 (2019): 55–78. https://doi.org/10.1353/pla.2019.0004.
  15. Amelia N. Gibson, Renate L. Chancellor, Nicole A. Cooke, Sarah Park Dahlen, Shari A. Lee, and Yasmeen L. Shorish, “Libraries on the Frontlines: Neutrality and Social Justice,” Equality, Diversity and Inclusion: An International Journal 36, no. 8 (2017): 751-766. https://doi.org/10.1108/EDI-11-2016-0100. See also remarks from many of the panelists at the American Library Association 2018 Midwinter Meeting’s President’s Program as highlighted in “Are Libraries Neutral?” American Libraries, June 1, 2018, https://americanlibrariesmagazine.org/2018/06/01/are-libraries-neutral/.
  16. See https://bannedbooksweek.org/ from the American Library Association.
  17. Asato, “Librarians’ Free Speech.”
  18. Armando Carrillo, “UC Librarians Conclude Negotiations of Salary Increases and Academic Freedom Protections” Daily Bruin, April 9, 2019. https://dailybruin.com/2019/04/09/uc-librarians-conclude-negotiations-of-salary-increases-and-academic-freedom-protections/.
  19. Mary Kandiuk and Harriet M. Sonne de Torrens, “Academic Freedom and Librarians’ Research and Scholarship in Canadian Universities,” College & Research Libraries 79, no. 7 (November 2018): 931-947, https://doi.org/10.5860/crl.79.7.931
  20. Fleming-May and Douglass, “Framing Librarianship,” 395
  21. Meghan Dowell, “Academic Freedom & the Liberal Arts Librarian,” CAPAL18, University of Regina, Saskatchewan, Canada, 2018. https://capalibrarians.org/wp/wp-content/uploads/2018/07/6C_Dowell_slides-notes.pdf.
  22. Stephanie Beene and Cindy Pierard, “RESIST: A Controversial Display and Reflections on the Academic Library’s Role in Promoting Discourse and Engagement,” Urban Library Journal 24, no. 1 (January 1, 2018). https://academicworks.cuny.edu/ulj/vol24/iss1/6.
  23. Lara Ewen, “Target: Librarians: What Happens When Our Work Leads to Harassment—Or Worse,” American Libraries Magazine, June 3, 2019. https://americanlibrariesmagazine.org/2019/06/03/target-librarians-harassment-doxxing/.
  24. While outside the scope of this article, we encourage readers to review the literature on “critical university studies,” which explores how campus educators outside of the ever-shrinking category of tenure track faculty operate within university structure. See Stefano Harney and Fred Moten, The Undercommons: Fugitive Planning & Black Study (London: Minor Compositions, 2013) and la paperson, A Third University Is Possible (Minneapolis: University of Minnesota Press, 2017). Both Moten and Harney and paperson, among others, locate the spaces of radical transformation of the university outside of tenure track faculty positions. The work that these scholars see as central to injecting needed critiques of power and white supremacist, capitalist, patriarchal structures of the university resides almost wholly in the work done by educators (in the broadest sense) with the most precarious positions. https://manifold.umn.edu/projects/a-third-university-is-possible.
  25. See Judith E. Berman and Tim Pitman, “Occupying a ‘Third Space’: Research Trained Professional Staff in Australian Universities,” Higher Education 60, no. 2 (2010): 157–69. https://doi.org/10.1007/s10734-009-9292-z.
  26. Celia Whitchurch, “Shifting Identities and Blurring Boundaries: The Emergence of Third Space Professionals in UK Higher Education,” Higher Education Quarterly 62, no. 4 (October 2008): 377–96. https://doi.org/10.1111/j.1468-2273.2008.00387.x. The notion of a “third space” has been introduced and sometimes deeply studied in a number of disciplines with quite variable meanings and implications (i.e., place-based versus cultural versus professional). In libraries, see James Elborg for a place-based understanding of third space theory: “Libraries As the Spaces Between Us: Recognizing and Valuing the Third Space,” Reference & User Services Quarterly 50, no. 4 (2011): 338-350.
  27. Bruce Macfarlane, “The Morphing of Academic Practice: Unbundling and the Rise of the Para-Academic: The Morphing of Academic Practice,” Higher Education Quarterly 65, no. 1 (January 2011): 59–73. https://doi.org/10.1111/j.1468-2273.2010.00467.x ; Fiona Salisbury and Tai Peseta, “The ‘Idea of the University’: Positioning Academic Librarians in the Future University,” New Review of Academic Librarianship 24, no. 3/4 (July 2018): 244–64. https://doi.org/10.1080/13614533.2018.1472113; also see Reichman, The Future of Academic Freedom, 5.
  28. Related to these points is the literature on the academic identity that “third space” professionals bring to their roles, with disciplinary norms and an expectation of academic freedom baked into their ways of being an academic. See Celia Whitchurch, “The Rise of the Blended Professional in Higher Education: A Comparison between the United Kingdom, Australia and the United States,” Higher Education 58, no. 3 (September 1, 2009): 407–18. https://doi.org/10.1007/s10734-009-9202-4; Glen A. Jones, “The Horizontal and Vertical Fragmentation of Academic Work and the Challenge for Academic Governance and Leadership,” Asia Pacific Education Review 14, no. 1 (March 1, 2013): 75–83. https://doi.org/10.1007/s12564-013-9251-3; Berman and Pitman, “Occupying a Third Space;” and Macfarlane, “Morphing of Academic Practice,” 65.
  29. Reichman, Future of Academic Freedom, 4.
  30. Reichman, Future of Academic Freedom, 7.
  31. Reichman, Future of Academic Freedom, 8.
  32. Fleming-May and Douglass, “Framing Librarianship.”
  33. Mary Petrowski, Academic Library Trends and Statistics (Chicago: Association of College & Research Libraries, 2017) 5, 136, 246, & 400.
  34. We did not ask respondents who claimed faculty status whether or not they were tenured or pre-tenure. In retrospect, it would have been useful to further disaggregate the faculty librarians to learn if tenured status also affected their responses. However, it is also worth noting that even with pre-tenure librarians included, faculty librarians overall feel more secure in their academic freedom protections than non-faculty librarians.
  35. Fleming-May and Douglass, “Framing Librarianship,” 394.
  36. Shin Freedman, “Faculty Status, Tenure, and Professional Identity: A Pilot Study of Academic Librarians in New England,” portal: Libraries and the Academy 14, no. 4 (October 2014): 533–65. https://doi.org/10.1353/pla.2014.0023.
  37. Eamon Tewell, “Employment Opportunities for New Academic Librarians: Assessing the Availability of Entry Level Jobs,” portal: Libraries and the Academy 12, no. 4 (October 2012): 407-423.
  38. Catherine Coker, Wyoma vanDuinkerken, and Stephen Bales, “Seeking Full Citizenship: A Defense of Tenure Faculty Status for Librarians,” College & Research Libraries 71, no. 5 (September 2010): 406-420. https://doi.org/10.5860/crl-54r1;  Elise Silva, Quinn Galbraith, and Michael Groesbeck. “Academic Librarians’ Changing Perceptions of Faculty Status and Tenure,” College & Research Libraries 78, no. 4 (May 2017): 428-441. https://doi.org/10.5860/crl.78.4.428.
  39. Coker, vanDuinkerken, and Bales. “Seeking Full Citizenship.”
  40. Joshua Kim, “What Percent of Your (Academic) Salary Would You Trade for Tenure?” Inside Higher Ed (May 12, 2009), https://www.insidehighered.com/blogs/technology-and-learning/what-percent-your-academic-salary-would-you-trade-tenure
  41. Melissa Belcher, “Understanding the experience of full-time nontenure-track library faculty: Numbers, treatment, and job satisfaction,” The Journal of Academic Librarianship 45, no. 3 (May 2019): 213-219, https://doi.org/10.1016/j.acalib.2019.02.015.
  42. Lara Ewen, “Target: Librarians: What Happens When Our Work Leads to Harassment—Or Worse,” American Libraries Magazine, June 3, 2019. https://americanlibrariesmagazine.org/2019/06/03/target-librarians-harassment-doxxing/
  43. Kaetrena Davis Kendrick, “The Low Morale Experience of Academic Librarians,” Journal of Library Administration, November 17, 2017.
  44. Kendrick, “The Low Morale Experience of Academic Librarians.”
  45. See Fobazi Ettarh, “Vocational Awe and Librarianship: The Lies We Tell Ourselves,” In the Library with the Lead Pipe, January 18, 2018; and Gina Schlesselman-Tarango, “The Legacy of Lady Bountiful: White Women in the Library,” Library Trends 64, no. 4 (2016), both of which offer an intersectional critique of how libraries enforce a performative librarian identity that purports to be neutral, nurturing, and inoffensive.
  46. Gibson et al., “Libraries on the Frontlines: Neutrality and Social Justice.”
  47. Snowden Becker, Twitter thread, https://twitter.com/snowdenbecker/status/1044297787066671104; Martin Brennan, “UC Administration: ‘Academic Freedom is not a good fit for your unit,’” UC-AFT Librarians Blog, August 13, 2018, https://ucaftlibrarians.org/2018/08/13/uc-administration-academic-freedom-is-not-a-good-fit-for-your-unit/.
  48. We did ask about union status in our survey, but there was little difference in any area between union and non-union respondents (See Appendix).
  49. For more on librarian attitudes toward unionization, see Rachel Applegate, “Who Benefits? Unionization and Academic Libraries and Librarians,” Library Quarterly 79, no. 4 (October 2009): 443-463; Stephanie Braunstein and Michael F. Russo, “The Mouse That Didn’t Roar: The Difficulty of Unionizing Academic Librarians at a Public American University,” in In Solidarity: Academic Librarian Labour Activism and Union Participation in Canada, Mary Kandiuk and Jennifer Dekker, eds. (Sacramento: Litwin Books, 2013); and Chloe Mills and Ian McCollough, “Academic Librarians and Labor Unions: Attitudes and Experiences,” portal 18, no. 4 (October 2018): 805-829.

OpenSRF 3.2.1 released / Evergreen ILS

We are pleased to announce the release of OpenSRF 3.2.1, a message routing network that offers scalability and failover support for individual services and entire servers with minimal development and deployment overhead.

OpenSRF 3.2.1 is a small bugfix release that includes the following changes:

  • A fix to prevent certain requests that return chunked messages from timing out prematurely.
  • A fix to avoid some deprecation warnings when running autoreconf -i.
  • An improvement to the installation instructions for Debian Buster.

To download OpenSRF and view the full release notes, please visit the downloads page.

We would also like to thank the following people who contributed to the release:

  • Galen Charlton
  • Bill Erickson
  • Chris Sharp
  • Jason Stephenson

Catching up with past NDSA Innovation Awards Winners: Mid-Michigan Digital Practitioners / Digital Library Federation

The Mid-Michigan Digital Practitioners (MMDP) won a 2016 Innovation Award in the Organization category. MMDP was recognized for taking an innovative approach to providing support and guidance to the digital preservation community. The responses to this Q&A were provided by Rick Adler, Ed Busch, and Bryan Whitledge.

What have you/the project team been doing since receiving an NDSA Innovation Award?

Since receiving the award, we have continued to do what we do best – connecting archivists, librarians, curators, historians, digital humanities experts, and other kindred professionals and students across Michigan. Cultural heritage workers have a disposition to share knowledge with others. MMDP is all about sharing knowledge and our constituency is other cultural heritage workers. We connect via our semi-annual meetings (which, thanks to support from the Library of Michigan and other institutions, have remained free for attendees) and through our listserv list.

In light of the public health emergency, we didn’t hold a spring meeting, but we did hold some virtual check-ins to connect with the MMDP community and share experiences about working from home, dealing with job cuts at our institutions, or returning to the physical workspace. We are looking forward to a fully virtual fall meeting – we think that the Mid-Michigan Digital Practitioners should be able to pull off a great virtual meeting!

One effort we undertook a few years ago was to create a directory of experts. Conferences and meetings are great, and so is a listserv list, but sometimes it is nice for one person to connect with another to speak in-depth about a specific topic. The directory is a list of MMDP members who are willing to share their expertise in different skills and tools with other MMDP members on a one-on-one basis. If someone is looking for someone with policy-writing skills, we’ve got that. If another person needs some help with StoryMapJS, we’ve got that, too. And if another MMDP member needs some help cataloging Cherokee-language materials, there is an expert who can help with that!

We also have an MMDP member who led a pilot grant in Michigan to explore the creation of a statewide digital preservation network. While the MMDP wasn’t part of the grant, we definitely contributed to getting the word out across the state. MMDP members have been at the table every step of the way. The project is now moving to the next phase in creating a digital preservation network and the MMDP is one venue for sharing information about the project with the people most likely to work with it.

What did receiving the NDSA Innovation award in 2016 for MMDP mean to you and/or the project team?

Back when we started, we were an experiment… and it worked. So, the recognition was very meaningful. The award definitely raised our profile outside of Michigan. Hopefully, we have inspired other digital practitioners from around the country to form similar groups. For us, in terms of our Michigan constituency, it reinforced our conviction that what we are doing is valuable and needs to be sustained. Many of our more recent members might not know about the NDSA Innovation award, but the commitment, effort, and spirit that led NDSA to bestow the award upon us are still present in everything we strive to do for our community.

What efforts, advances, or ideas over the last 5-8 years have caught your attention or interest in the area of digital stewardship?

Lowering the barriers to entry—across the board—for digital culture. The barriers are numerous and they aren’t solely financial. The network we mentioned a moment ago is an example of that. Here in Michigan, we have some world-class institutions and they can create homegrown digital preservation environments that are second-to-none. But we also have many small historical societies with historical collections that are just as important, yet they don’t have the tools, the staff, or the finances to allow them to join a major digital-preservation endeavor. MMDP members can help to make digital preservation accessible to institutions of all stripes in Michigan. Our members have varying levels of know-how about a wide range of digital stewardship topics (advocacy, governance, technical infrastructure skills, developing training materials, etc.), and encouraging them to share what they know expands the potential of cultural heritage professionals around Michigan. Also, we can lean on the technological tools and skills at those institutions that support the network to make the essential technology of digital preservation accessible to all at a relatively low cost. Hopefully, through a project like this, every library, archives, museum, and historical society in the state can jump in and join the digital preservation effort. And we can get all of those historic photos off of old flash drives!

Another set of barriers that we hope to do away with is the limits to access that surround much of our digital cultural content. We are inspired by all of the various digital efforts across the state and the country. But there are so many fantastic resources that are buried behind paywalls and even more fantastic resources that don’t see the light of day because of the costs associated with making them available. One of our members works with cultural institutions all across the state to help them share their collection metadata through the Digital Public Library of America and a new state portal called Michigan Memories. But that isn’t enough. We also need to find resources for institutions with fantastic content but no means to host it, and help them preserve it or make it accessible with low-cost or free tools. That work includes developing K-12 lesson plans and curricula supported by freely available primary sources—we know how great this content is, but we also have to be aware that many of our target audiences are swamped with information and they might not have time to wade through hundreds of primary sources across several different platforms to develop the perfect lesson. If we can help with that, students across the state benefit.

The MMDP project provides a great example of a regional collective that represents a wide range of libraries, archives, and museums. What successes or challenges have emerged over the project’s seven years so far?

As with any endeavor, especially one operated solely by volunteers, it can be difficult to sustain. But we have been fortunate to keep this going with new volunteers who rotate on and off our planning team as they have time. Our leadership and governance are truly 100% flexible. This means that our planning team varies in size and composition all the time. We have had planning team members tell us that their other responsibilities in life have picked up, so they have to take a break from MMDP. One year later, they are back on the planning team conference calls and recruiting speakers for our next workshop.

Overall, one of the major successes has been the low-risk opportunity for leadership afforded to our members. Becoming a member of the planning team is as simple as saying, “I would like to help.” From there, the responsibilities are divvied up as needed. When we say “low-risk,” it doesn’t necessarily mean easy or not important. Putting together a conference for 80+ attendees is no simple feat. But we have such a great group and the low-pressure nature of the MMDP really allows a new leader to learn the ins and outs without fear of failure. And, of course, the veteran MMDP members are always available as a safety net to help out as needed. Dozens of cultural heritage workers in Michigan can include a stint with MMDP’s planning team as part of their leadership experience.

We have had another success in that our efforts have been recognized by the Library of Michigan and a few of the professional organizations in Michigan for librarians, archivists, and museum professionals. We have been offered space in the Library of Michigan’s facilities to host our conferences, and we have been able to partner with other professional organizations to host a one-day workshop or a panel on digital stewardship in their conferences. It is great that other organizations and institutions in Michigan recognize that we are a special group and they support us—it allows us to keep serving anyone in Michigan looking for more information about anything and everything related to digital stewardship. 

What are some priorities or challenges you see for digital stewardship?

2020 has definitely brought about many challenges in all aspects of life. Because of the current public health emergency, the resulting budget cuts, and calls for meaningful change in policies related to equity and inclusion, the priorities for digital stewardship will have to change, too. Digital cultural heritage seemed to many people like a nice “extra” thing in their lives. With remote learning, we saw how digital cultural heritage immediately became a necessity for students. And it became a comfort for people looking for a moment of peace—they could explore a museum’s holdings through a public-facing DAM or do some genealogy using digital newspapers. We also need to take stock of the work we are doing and how it can best serve all of our communities, which may mean reorienting some of the priorities we defined before March of 2020.

In light of the seismic upheavals on many fronts, MMDP foresees tough times in trying to execute our priority of continuing to facilitate the sharing of digital stewardship information among our members. With tightening budgets on the horizon and more demands for digital cultural heritage, our members need to be able to get the most out of the limited time and funds that we have. There are so many new tools, new initiatives, and new skills—every one of us could spend a lifetime learning about them (and spend a ton of money in the process). By sharing some information and offering advice like “try this, and avoid that,” we hopefully can save people a lot of time, effort, and money to accomplish their digital stewardship goals.

Another priority will be to continue to lower the barriers to entry to digital stewardship. Michigan is a big state with a wide variety of needs in that realm. MMDP is one helpful piece in a larger puzzle of knowledge-sharing and collaboration that will be needed to ensure that Michigan’s cultural heritage is preserved and made accessible to the people who could use it.

The post Catching up with past NDSA Innovation Awards Winners: Mid-Michigan Digital Practitioners appeared first on DLF.