Planet Code4Lib

These are the grantees of the Open Data Day 2019 mini-grant scheme! / Open Knowledge Foundation

Open Data Day is an important date for a broad community working towards a more open world, where information can benefit more people. To support the efforts made by different groups and organisations on this day, we have developed the Open Data Day mini-grants: along with other organisations interested in a more open world, we provide funds to events in different parts of the world.

Since we started the mini-grant program we have seen a growing number of events with great ideas to make open data work for more people. Each of these organisations will receive $300 to cover the costs of running their events.

Without further ado, we present the mini-grant supported events for this year and their organizers.

  1. OpenStreetMap Catalá will take to the streets to gather accessibility data for the city and add it to the map.
  2. Code for Columbus will leverage the city’s new open data portal as well as open data sources from the regional planning commission to gain insight into social problems in the city.
  3. Técnicas Rudas wants to disseminate the use of open datasets on the incidence of violence and criminality to map femicides at the municipal level for all of Mexico, as well as to visualize relationships with other types of violence such as kidnappings, robbery, or homicide.
  4. Open Knowledge Brasil / UG Wikimedia in Brazil will scrape several databases and, in the wake of the Brumadinho dam disaster, create fully described Wikidata items with extensive properties for all existing dams in Brazil.
  5. YouthMappers Kenya and OpenStreetMap Kenya will have an introduction on what open data is, training on how to contribute to the OpenStreetMap project by working on an existing project in Kenya and finally how this data can be accessed and used in various sectors e.g. urban planning, agriculture etc.
  6. The Center for Education and Transparency will host a discussion with Serbian citizens, local government, media and civil society representatives on the subject of opening geospatial data in Serbian municipalities.
  7. Code for Curitiba will train people to perform the collaborative data mapping of the city of Curitiba with real experiments.
  8. OpenStreetMap Taiwan and Wikimedia Taiwan will host a series of events to improve the quality of geographic data on Wikidata for hospital-, education- and women-related items, and will add the corresponding geographic information on Wikidata for galleries, archives, libraries, and museums, hoping to make a map of GLAM institutions in Taiwan.
  9. Open Data Delaware will use the day to progress on existing civic projects and to introduce the projects, as well as the ideas and key concepts of Open Data and transparency, to a broad audience from their local community.
  10. In Argentina, the Centro Latinoamericano de Derechos Humanos (CLADH), Fundación Nuestra Mendoza, the Journalism School of the Universidad Maza, and the Public Policy and Planning Direction of the Universidad Nacional de Cuyo will work with youth to analyze data and identify how many women hold official positions at different levels of government, and plan to publish a report on their findings.

  1. AfricArxiv wants to raise public awareness in French-speaking Africa (and Benin in particular) of the importance of joining the open science movement and getting involved in data openness.
  2. The librarians at the Bindura University of Science Education (BUSE) will work with faculty members to demystify the concept of open data. They will present and discuss the way forward and advocate for the establishment of an open data repository at BUSE.
  3. The Association for the Promotion of Open Science in Haiti and Francophone Africa will sensitize the researchers of the University of Yaoundé I to adopt best practices of open research data.
  4. The Young Academy of Slovenia will gather people to raise awareness of the importance of sharing open data among young researchers from different scientific fields, from natural science to social sciences and humanities.
  5. Africatech will bring together researchers, biologists, environmentalists, professors, policy-makers and experts to discuss issues related to open science in the Democratic Republic of Congo.
  6. Sociedad para la Ciencia Abierta y la Conservación de la Biodiversidad (SCiAC) in Costa Rica will develop a training workshop on the use of tools to release data. They will also train on coding to analyze data in environmental sciences to contribute to reproducibility and science teaching.
  7. HackBo and mutabiT want to bridge reproducible research and publishing techniques with data activism to create a booklet on those topics using the Panama Declaration on Open Science from the Global South, using such declaration as a use case, showcasing the Open Science tools and techniques for/from the Global South that we have been developing and using since 2014.
  8. Fundación Internet will create a collaborative repository of Bolivia’s social research databases as a backup, ensuring their permanent availability and bringing the databases together in one place.
  9. The Centro Latinoamericano de Investigaciones Sobre Internet in Venezuela will showcase the methods and resources of open science, showing them as a valid alternative for university students.
  10. Fundación Karisma will work to improve the quality of open biodiversity data available on citizen science platforms and create a campaign about the importance of these data for science and social action.

  1. The National University of Lesotho will discuss the use of data for sustainable development, focusing on gender equality and youth empowerment.
  2. The Committee for a Better New Orleans will host a game night for New Orleanians to “play mayor for a day” by balancing their city budget.
  3. Safety First for Girls Outreach Foundation will showcase a good example of how open data can bring positive change by sharing the learnings from Safety Report: Core Issues Affecting Safety of Girls in Zambia.
  4. iWatch Africa will explore how online data tools can use open data on violence and discrimination against women and girls in Ghana to promote gender equality.
  5. Escuela de Fiscales will run a workshop to introduce people to open data, show impact cases and debate the best approach to engaging people in the context of upcoming elections.
  6. EldoHub will work with young people to equip them with knowledge of how they can leverage open data and open government, find opportunities for meaningful employment (inclusive jobs for disadvantaged African youth), and help local government become more open to inclusive youth participation.
  7. Open Knowledge Colombia wants to demonstrate and to raise awareness on salary differences among genders in Colombia and start a debate on the rights and inequalities of women in the Colombian society through open data.
  8. NaimLab will work in the city of Chiclayo to decentralize the use of open data away from the capital and foster its use among a community of activists and civil society organizations, promoting transparency of public information and generating follow-up through citizen participation.
  9. Economía Femini(s)ta wants to estimate the annual cost of menstrual management products per person in 2019 in Argentina and to make an infographic with such estimations.
  10. The Women Economic and Leadership Transformation Initiative (WELTI) will explore what role data plays with regards to women’s health with an emphasis on cancer.

  1. Abriendo Datos Costa Rica will track the status of public works that use public money in different parts of the country.
  2. BudgIT Foundation will work with grassroots organisations and other people to show how publicly allocated funds are spent for the benefit of Nigerians.
  3. Datos Abiertos Medellín and Exploratorio Parque Explora will open public health data, data on diseases related to air quality, and figures such as the number of bicycle routes and trees planted compared with the main goals in the Local Government Development Plan, along with other public policy data related to air quality.
  4. Girolabs will bring the community together in an event to investigate the last 10 years of political party funding from the Electoral Tribunal and the impact of the Tribunal’s spending on publicity to improve participation in the elections.
  5. Datasketch will review the contracts of public servants in Colombia. By inferring their gender from their name, they will analyze gender salary gaps.
  6. ACCESA will gather the community to highlight the advancement of the Open Contracting agenda in Costa Rica. They will share the results for Costa Rica of the Transparent Public Procurement Rating (TPPR), the guide for Open Data in Public Procurement, and the new OGP commitment related to the implementation of OCDS.
  7. Connected Development will bring together data enthusiasts with social workers, journalists, government officials, community-based organizations (CBOs), activists and youth, and share skills with them around using data to enhance their work.
  8. School of Data in Guatemala will create a space for learning how to use public contracts data, guiding participants through the data of the State’s Law for Public Procurement in order to understand how contracts are carried out in practice.
  9. SocialTIC will showcase and use public open data with a special focus on new data releases from Mexico City’s new administration and national public expenses.
  10. School of Data in Bolivia will introduce people to the basics of open data and open contracting, including talks to learn about cash flows and topics related to the use of public funds through the lenses of gender equality.


Together with all the funders of this year’s ODD mini-grants [the Open Contracting Program of Hivos, Mapbox, Frictionless Data for Reproducible Research, the Foreign and Commonwealth Office of the United Kingdom, the Latin American Initiative for Open Data (ILDA) and the Open Contracting Partnership], Open Knowledge International wants to thank the community for all your applications. We encourage you all to register your event on the Open Data Day website.

To those who were not successful on this occasion, we encourage you to participate next time the scheme is available. To the grantees, we say congratulations and we look forward to working with you and sharing your successful event with the community!


Jennifer Pringle: Evergreen Contributor of the Month / Evergreen ILS

The Evergreen Outreach Committee is pleased to announce that February’s Contributor of the Month is Jennifer Pringle.  Jennifer works for BC Libraries Cooperative (The Co-op) as Support Helpdesk & Trainer. The Co-op supports the Sitka consortium of public, post-secondary, K-12, and special libraries across three Canadian provinces.

Jennifer’s Evergreen involvement began in 2008 when she worked at Whistler Public Library, which became the 5th Sitka public library to migrate to Evergreen and one of the earliest public libraries using Evergreen outside of Georgia PINES.  Jennifer later moved to the Co-op where she leveraged her frontline experience with Evergreen to support and train consortium members.

“My day to day is answering questions,” Jennifer says. “A lot of what I contribute to the community is from issues submitted by our libraries — if we can’t fix it, I go to Launchpad.”  

Jennifer is very active in Launchpad, having filed 93 Launchpad bugs and commented on 134 bugs. She has also participated in several Evergreen Community Bug Squashing events.  In addition, Jennifer has provided 27 testing signoffs for Evergreen code, and has contributed 5 documentation commits to the community.

Acquisitions has long been a focal point of Jennifer’s community work.  Her consortium was one of the first adopters of Evergreen Acquisitions. Over the years, Jennifer has been involved with the Acquisitions Interest Group, including drawing wider community attention to Acquisitions bugs and filing workflow improvement requests.

More recently, Jennifer has become involved with the Student Success Working Group, where she encourages the Co-op’s postsecondary libraries to provide feedback.  She has also been focusing on testing related to the new Web Staff Client. “Switching to the Web Client has been a big change, but a good change,” she states.  Jennifer and her coworkers put Evergreen releases through months of testing before an upgrade, allowing them to uncover issues that may be missed by others.

Jennifer encourages new community members to be brave.  “It may be scary to ask that first question, but the community isn’t scary,” she advises.  “No one person is working in isolation in the community. Our community is just so interconnected, which I think is fantastic.”

Do you know someone in the community who deserves a bit of extra recognition?  Please use this form to submit your nominations.  We ask for your email in case we have any questions, but all nominations will be kept confidential.

Any questions can be directed to Andrea Buntz Neiman, or to abneiman in IRC.

Custom-built keyboard / Jez Cope


I'm typing this post on a keyboard I made myself, and I'm rather excited about it!

Why make my own keyboard?

  1. I wanted to learn a little bit about practical electronics, and I like to learn by doing
  2. I wanted to have the feeling of making something useful with my own hands
  3. I actually need a small keyboard with good-quality switches now that I travel a fair bit for work, and this lets me completely customise it to my needs
  4. Just because!

While it is possible to make a keyboard completely from scratch, it makes much more sense to put together some premade parts. The parts you need are:

  • PCB (printed circuit board): the backbone of the keyboard, to which all the other electrical components attach, this defines the possible physical locations for each key
  • Switches: one for each key to complete a circuit whenever you press it
  • Keycaps: switches are pretty ugly and pretty uncomfortable to press, so each one gets a cap; these are what you probably think of as the "keys" on your keyboard, come in an almost limitless variety of designs (within the obvious size limitation), and are the easiest bit of personalisation
  • Controller: the clever bit, which detects open and closed switches on the PCB and tells your computer what keys you pressed via a USB cable
  • Firmware: the program that runs on the controller starts off as source code like any other program, and altering this can make the keyboard behave in loads of different ways, from different layouts to multiple layers accessed by holding a particular key, to macros and even emulating a mouse!

In my case, I've gone for the following:

  • Laplace: a very compact 47-key ("40%") board with no number pad, function keys or number row, but a lot of flexibility for key placement on the bottom row. One of my key design goals was small size, so I can just pop it in my bag and have it on my lap on the train.
  • Elite-C: a controller designed specifically for keyboard builds. It's physically compatible with the cheaper Pro Micro, but has a more robust USB port (the Pro Micro's has a tendency to snap off) and is easier to program thanks to a built-in reset button and a better bootloader.
  • Gateron Black switches: Gateron is one of a number of manufacturers of mechanical switches compatible with the popular Cherry range. The black switch is linear (no click or bump at the activation point) and slightly heavier-sprung than the more common red. Cherry also make a black switch, but the Gateron version is slightly lighter, and having tested a few I found them smoother too. My key goal here was to reduce noise, as the stronger spring will help me type accurately without audibly bottoming out each keystroke.
  • Blank grey PBT keycaps in DSA profile: this keyboard layout has a lot of non-standard-sized keys, so blank keycaps mean I'm not putting lots of keys out of their usual positions; they're also relatively cheap, fairly classy IMHO, and a good placeholder until I end up getting some really cool caps in a group buy or something. Oh, and they minimise the chance of someone else trying the keyboard and getting freaked out by the layout...
  • QMK (Quantum Mechanical Keyboard) firmware, with a work-in-progress layout based on Dvorak. QMK has a lot of features and allows you to fully program each and every key, with multiple layers accessed through several different routes. Because there are so few keys on this board, I'll need to make good use of layers to make all the keys of a usual keyboard available.

I'm grateful to the folks of the Leeds Hack Space, especially Nav & Mark, who patiently coached me in various soldering techniques and good practice, but also everyone else who was so friendly, welcoming and interested in my project.

I'm really pleased with the result, which is small, light and fully customisable. Playing with QMK firmware features will keep me occupied for quite a while! This isn't the end though, as I'll need a case to keep the dust out. I'm hoping to be able to 3D print this or mill it from wood with a CNC mill, for which I'll need to head back to the Hack Space!

UC San Diego Library Receives Mellon Grant for Joint Project / DuraSpace News

To improve communication and the exchange of data between local and national digital preservation repositories, UC San Diego has been awarded a one-year Mellon grant to collaborate with UC Santa Barbara, Emory University, Northwestern University and DuraSpace to design tools that will enable libraries and archives to seamlessly deposit content into distributed digital preservation systems (DDPs), update that content over time and reliably restore content if needed. Read the news story here.

The post UC San Diego Library Receives Mellon Grant for Joint Project appeared first on DuraSpace News.

What kind of person is suited to sales work? / Fiona Bradley

Sales work takes many forms, for example staff who sell products, sales and marketing, and so on. The core duty of the position is to sell the company's products and services to meet the targets that have been set. The job is easy in some ways and hard in others, because how well it goes depends on how well suited the person doing it is. The traits and qualities suited to this work are as follows:

1. Patience. A salesperson acts as the company's representative in meeting customers, and customers come in many types: demanding, difficult to talk to, and wilful, or easy to talk to and quick to understand. With easy-going customers, coordination is simple; with fussy, demanding customers who will not listen no matter what, a salesperson representing the company must stay patient, use polite language, and never respond with rude words, raised voices, or shouting at the customer, because that would lose the customer.

2. Quick thinking. In sales work we must present products to customers, and our products must answer their needs. Some products can be used in many ways, and when a customer asks, we must be able to respond. Every customer has different problems, so we must use our wits to connect how our product benefits that particular customer so that they will buy. Talking to customers therefore demands a great deal of quick thinking, all in the service of higher sales.

3. Honesty. When selling products and services, never exaggerate by claiming that ours is good in every way. Customers can think for themselves and can judge what is what. Any claims about a product must be truthful, covering both its strengths and its weaknesses.

4. Tolerance for pressure. In sales, performance is measured by sales figures, and if the figures fall short of the company's targets, pressure follows. Anyone who wants this kind of work must be able to cope with that pressure.

5. Perseverance. We cannot force buyers to buy from us; we must keep working to present products that match what customers need. If one customer does not buy, we move on and find the next, and sooner or later someone will be interested and make a purchase. Sales is, in short, a job that demands a great deal of persistence.

These are the qualities of someone well suited to sales work. If you have them, sales is the job for you, and it can earn you a substantial income.

The post "What kind of person is suited to sales work?" appeared first on ข่าวประกาศจัดหางาน เพื่อโอกาสที่ดีกว่า.

DLF Director Bethany Nowviskie Leaves CLIR, Joins JMU / Digital Library Federation

Digital Library Federation Director Bethany Nowviskie is leaving CLIR to serve as Dean of Libraries at James Madison University starting July 1, 2019. She will also join the tenured faculty of JMU’s Department of English. Over the coming months, Nowviskie will serve as senior advisor to DLF to ensure a smooth transition of her responsibilities, and has additionally been appointed a CLIR Distinguished Presidential Fellow.

“It has been a great honor to serve DLF’s passionate and dedicated community over the past four years,” Nowviskie said, “and to guide a period in which we expanded both our mission and the vital perspectives present in our membership.”

Under the auspices of its board of directors, CLIR will initiate a review of DLF to plan for the next iteration of the Federation, building on its strengths and assuring its continuity into the next decade. The review will involve the DLF advisory committee, CLIR Board, staff, and members of the DLF community. “As DLF approaches its 25th anniversary, we want to be sure that its programs and projects remain strong and strategically vital,” said CLIR Board Chairman Christopher Celenza.

Joanne Kossuth, founding director of 1Mountain Road consultancy, will lead the review that will conclude in early 2020. “1MountainRoad is excited to embark on this journey to the future with the DLF and CLIR communities,” said Kossuth. “The committed membership provides many opportunities for collaboration, expanded communities of practice, and innovation.”

Over the next few months, meetings will be held with DLF working groups and advisory boards to understand current activities, identify areas for growth, and explore ideas for future innovation. Meetings with staff and engagement in new planning activities for the Forum will focus on strengthening DLF as a CLIR program and as an international forum for collaboration and innovation. Working with CLIR leadership, strategic partnerships and participation will be reviewed, including the potential for a senior advisory committee on the review. At the conclusion of these meetings the feedback will be incorporated into planning for the next organizational steps and the 2020 Forum.

Since Nowviskie’s appointment to DLF in 2015, she has worked energetically to expand, diversify, and engage the DLF community, fostering connections to liberal arts colleges, museums, archives, historically black colleges and universities, and civic data groups. “The contributions of DLF will continue to be integral to its communities of practice and to the evolution of digital technologies as a public good,” said CLIR President Charles Henry. “As CLIR’s mission and responsibilities expand to a global scale, DLF’s portfolio will similarly broaden, seeking new opportunities attendant on an increasingly complex and interdependent world digital environment.”

More details of the review and its findings and recommendations will be published in the coming months.

The Council on Library and Information Resources is an independent, nonprofit organization that forges strategies to enhance research, teaching, and learning environments in collaboration with libraries, cultural institutions, and communities of higher learning. It is home to the Digital Library Federation, an international network of member institutions and a robust community of practice, advancing research, learning, social justice, and the public good through the creative design and wise application of digital library technologies.

The post DLF Director Bethany Nowviskie Leaves CLIR, Joins JMU appeared first on DLF.

The value of regional user Groups for a global community: Collaboration Update from Concytec and the Peruvian DSpace User Group / DuraSpace News

The Peruvian Council of Science, Technology and Technological Innovation (Concytec) has been collaborating closely with DuraSpace to advance and promote Open Access to scholarly publications in Peru.

After the successful completion of the first DSpace & DSpace-CRIS Workshop held in July 2018 (probably the best-attended DSpace workshop to date) and the incorporation of Concytec as a platinum member of DuraSpace in September, this joint effort has given birth to the Peruvian DSpace User Group, a vibrant community of research managers, librarians and IT staff from different institutions all around the country, working in support of scholarly communication. Following its membership, Concytec now sits in the DSpace Leadership Group, representing the needs, ideas and visions of the Peruvian community at the governance level.

Since the beginning of the National Network of Open Access Repositories in 2013, Concytec has recommended DSpace as the software platform supporting the now more than 160 institutional repositories integrated into the network. To keep its records on Peruvian users accurate, Concytec led the initiative to update the DuraSpace registry: over 120 Peruvian DSpace installations were added and 43 existing entries were updated. To date, the registry reports a total of 171 DSpace instances in Peru.

To promote the exchange of information and experiences in the community, the User Group has set up a wiki page, a dedicated Slack channel and a mailing list, and is already organizing both virtual and in-person meetings.

The introductory event of the community was a webinar held on December 4, 2018. More than one hundred participants attended a presentation given by Erin Tripp, Executive Director of DuraSpace, and César Olivares, Sub-Director of Information and Knowledge Management at Concytec, who shared the group's vision as well as its communication and collaboration channels. At the same event, attendees heard a presentation from a renowned Peruvian librarian, Libio Huaroto of the Universidad Peruana de Ciencias Aplicadas, who elaborated on best practices in the use of persistent identifiers in institutional repositories.

On January 2, 2019, Joan Caparrós of CSUC (Spain) was invited to present the group's second webinar, on the technical details of implementing the handle service in DSpace. Webinar recordings and presentation slides for all events are available on the group's wiki page.

In the meantime, a great deal of chat and community support has been going on in the Slack channel, which has become one of the most active in the Spanish language. Satisfaction surveys have been conducted and are being used to gather community members' needs and plan the next activities and collaboration initiatives.

It is also noteworthy that the Slack channel has enabled various collaboration opportunities with the fellow Brazilian DSpace User Group, with some users active in both communities.

On February 22, 2019, a third webinar is scheduled on implementing DSpace on cloud services, presented by Luis Maguiña, Technologies Coordinator at the Pontificia Universidad Católica del Perú. We welcome the community to attend this event. Pre-registration is required and is available here. Please note that if the session fills up, a recording will be made available on the wiki page.

The post The value of regional user Groups for a global community: Collaboration Update from Concytec and the Peruvian DSpace User Group appeared first on DuraSpace News.

Care, Code, and Digital Libraries: Embracing Critical Practice in Digital Library Communities / In the Library, With the Lead Pipe

In Brief

In this article, the author explores the necessity of articulating an ethics of care in the design, governance, and future evolution of digital library software applications. Long held as the primary technological platforms to advance the most radical values of librarianship, the digital library landscape has become a re-enactment of local power dynamics that privilege wealth, whiteness, and masculinity at the expense of meaningful inclusive practice and care work. This, in turn, has the net result of self-perpetuating open access digital repositories as tools which only a handful of research institutions can fully engage with, and artificially narrows the digital cultural heritage landscape. By linking local narratives to organizational norms and underlining the importance of considering who does the work, and where they can do it, the author explores manifestations of care in practice and intentional design, and proposes a reframing of digital library management and governance to encourage greater participation and inclusion, along with “user-first” principles of governance.

By Kate Dohe


Digital programs in research libraries, such as institutional repositories and digital collections of unique special collections materials, are deep in their second or even third decade. The broad swath of products, technologies, projects, and professional practices that undergird individual efforts are mature, even as individual libraries are subject to economic stratification that impedes full engagement with those technologies and practices (Dohe 2018). A variety of specializations within the digital library practitioner community continue to emerge each year–digital scholarship, digital curation, digital publishing, or digital strategies to name a few–and it is rare to find an academic library in the Association of Research Libraries (ARL) that does not include digital initiatives in its strategic plan or mission. Clearly, the profession places a great deal of value on such efforts, and the most idealistic and ambitious mission statements emphasize the power of digital libraries to bridge cultural and geopolitical divides (“ICDL Mission” n.d.), share “the record of human knowledge” (“HathiTrust Mission and Goals” n.d.), or facilitate global access to scholarly product (“Mission and Vision, Texas Digital Library” n.d.).  Digital projects have long been hailed as the ethical or even radical solution to our crises of the hour, whether those crises are journal pricing, original publishing, scientific reproducibility, research data management, or textbook affordability.

Yet here we are, twenty years later, and none of those crises have been solved. We built our digital repositories, invested our time and infrastructure, and struggle to reach users (Salo 2008). The contemporary digital library product landscape is currently reduced to commercial options owned by the same content owners and vendors (Schonfeld 2017b) that exuberantly pillage our collections budgets every year (MIT Libraries n.d.), and a handful of open source options with similar governance structures and substantial community dominance by a smattering of wealthy, historically white (Hathcock 2015) ARL member institutions. Digital library initiatives across the U.S. are reckoning with very real questions of financial or legal sustainability, while the doors to participation remain firmly closed to broad swaths of the higher education landscape. Even as a significant amount of the profession emphasizes the importance of digital projects work, the cloistered technical community that contributed to this state of affairs is poorly understood by many librarians outside “Digital Etc.” specialists. The end result is elite institutions making products for other elite institutions, and every year the technical and economic barriers to entry grow higher.

How did we get so far from the truly radical roots of digital libraries, when the Budapest Open Access Initiative urged libraries, governments, and scholars to “unit[e] humanity in a common intellectual conversation and quest for knowledge” (Chan et al. 2002)? Why are our technical products failing our users? How is so much talent and investment (Arlitsch and Grant 2018) producing such mediocre results? More importantly, how do we re-invigorate our own open source projects and fulfill the ultimate missions of digital libraries? How can we create truly participatory digital library project communities? Familiar wolves are at the door, slyly promising vertical product integration and improved discovery as they buy commercial digital projects platforms left and right (Schonfeld 2017a). This isn’t ground we can afford to cede back to the same commercial interests that have put libraries on the ropes financially for decades.

After an illuminating discussion over breakfast with a male colleague in my library’s technology division, I began to interrogate the ways this problem is a byproduct of social reproduction at our local institutions. He and I shared responsibility for digital library initiatives as peer department heads in our ARL library’s IT division. My librarians1 managed our digital collections, stakeholders, and users, while his developers were responsible for technical implementation of our portfolio of digital library applications, including code contributions to international projects. We had developed a mutually trusting work partnership and collegial friendship by the time we sat down at a diner on the second day of the Code4Lib national conference.  Our seemingly innocuous conversation over coffee prompted me to reflect upon the full weight of gendered assumptions regarding the divide between our positions and the value of our respective labor, and underscored the ways these assumptions between individuals ripple through communities. While this conversation occurred between two colleagues at one research library, we also occupy roles and operate within social systems reproduced throughout the profession that dictate and shape the nature of our work relationship. Open source digital library communities are largely driven by the priorities of technical staff like us at elite research libraries like ours, who frequently exist in a siloed, overwhelmingly white, predominantly cis-male micro-culture within their home libraries (Askey and Askey 2017), creating a masculinized environment that outsiders often negotiate through participation, emulation, or willful ignorance (Brandon, Ladenson, and Sattler 2018). The inherently gendered tensions between predominantly male IT groups and a feminized library workforce inevitably permeate the communities and applications imbued with our professional values. 
Radical change to community projects requires a codified framework for equitable, just, and caring interpersonal communication that begins at the local level.

Whose Community Projects?

Any given library’s digital collections, institutional repository, and digital scholarship projects are typically powered by a variety of software applications, components, and services, rather than a single monolithic “digital library” application capable of serving up all types of content and data effectively. Some of these services come packaged from commercial vendors, like OCLC’s CONTENTdm or Elsevier’s recently acquired Digital Commons, and the relationship between the software provider and customer library is similar to that of any other software or content package. Many more digital library technologies (including some of those implemented in commercial products) are community-supported open-source projects. Some of the most prominent examples include digital repository applications Fedora and DSpace, digital collections interface tools Samvera and Islandora, content viewer frameworks like the International Image Interoperability Framework (IIIF), content creation tools like Omeka and Open Journal Systems, and discovery services based on Blacklight. This is far from a complete picture of the digital library project landscape, but serves to highlight the complex nature of implementing and maintaining an open source digital library program.

It is fitting that collections and content intended to reach the global citizenry should be available with open source software applications. Moreover, many of these applications are created, customized, and maintained by staff at the research and cultural heritage institutions that also steward the content. These are among the few products that we make, that are most directly for us, our content, and our users. This should represent a shift in power dynamics from vended solutions that is nearly as significant as the shift to open access to information. To borrow an analogy from Safiya Noble’s dissertation “Searching for Black Girls: Old Traditions in New Media” (Noble 2012), open source digital library technologies are comparable to solar panels that “facilitate independent, democratic participation by citizens, and [show] that design impacts social relations at economic and political levels” in opposition to controlled and closed systems peddled as a “galaxy of knowledge” (Appleton 2019) even as they proclaim their openness and transparency.

Community–and consequently, community membership–is critical to understanding these open source digital library projects. As open-source applications, anyone may download, install, and run digital library applications, though the technical skills to effectively customize and maintain these applications are non-trivial and often out of reach for anyone but professional software developers. This technical overhead can be an exceptionally high barrier to clear for participation in the community. As an example, the Samvera community and toolkit requires adopters to make a staggering number of critical and frequently binding technical decisions before even getting started; production-level adoption of the latest version of Fedora is constrained to fewer than two dozen institutions worldwide at the time of this writing (“Fedora 4 Deployments – Fedora Repository” 2018); DSpace has a robust adoption base but, until the still-forthcoming release of DSpace 7, maintains two mutually exclusive user interfaces. Upgrading applications over time–a necessity for a professional digital library program that promises permanence and preservation as a core service–also proves to be a fraught, labor-intensive effort, as seen in the slow adoption of Fedora 4 (“Designing a Migration Path – Fedora Repository – DuraSpace Wiki” n.d.), or widely reported problems upgrading to OJS 3.

Governance structures of many of these projects tend to overlap in structure, reward systems, and membership. Institutions often have two avenues for participation in the governance and decision-making of these products—pay membership fees to secure a seat in leadership, and/or employ software developers who are talented enough to contribute code back to the application’s core source code. Skilled developers with a high degree of institutional support may become official “committers,” which is often a meritorious individual achievement on par with elected professional national service, and the committers themselves have a strong say in product development and roadmaps. Because this labor is extremely technical, administrative representatives on steering committees or in product leadership are often themselves technical department heads, division managers, or ADs/AULs. Institutions with the resources to participate in these application communities at this level are often further privileged with grant funding opportunities to develop new tools or applications within this digital content ecosystem, and thus reify their status as community leaders. Avenues for participation outside the programmer/management dyad within these open source product communities can be largely limited to programmer support roles like documentation, request management and release testing (as is the case with the DSpace Community Advisory Team), or specialist interest groups with no codified governance power, as is the case with the proliferation of groups in the Samvera (“Samvera IG/WG Framework” n.d.) community.

Largely absent in these communities are liaisons, curators, or actual end users, and consequently there is a fundamental disconnect between developers of these applications and the front line users who must navigate, curate, and use the contents of such systems. Many of the design discussions I have been privy to in local and organizational settings privilege the discussion of objects and data over people—the pursuit of a more perfect object model without centering and clearly articulating the user’s needs. Hand-waving at “more discoverable” often goes unexamined, never clearly arriving at discoverable by whom, for what purposes, and how we know that. The net result of this insular community development is that programmers and the people who supervise them at wealthy and historically white American institutions are making considerable product and implementation decisions about the most potent tools in our arsenal to resist neoliberalism. Excluding those who possess insight into the social, political, and experiential impacts of technology from the messy discursive process of making it undermines the value of a collaborative professional tradition, and protects institutional white supremacy and all its trappings of valorized productivity. “The master’s tools will never dismantle the master’s house” (Lorde 1984).

The economic and racial stratification resulting from this community insularity is counter to the self-proclaimed social justice spirit of early digital initiatives that emphasized their commitment to the public good of global open access to information, and open source technology to serve it. Access barriers for users cannot be lowered when the technological barriers for a diverse member community are simultaneously raised. No HBCUs are listed as Fedora adopters of any version outside consortial support. Only four community colleges have deployed and registered their own DSpace repositories (and two of the listed, registered repositories are defunct at the time of this writing) even as the Community College Consortium for Open Educational Resources (“Community College Consortium for Open Educational Resources.” n.d.) and similar initiatives emphasize the vital importance of access to locally-developed open educational resources for the (frequently non-white, non-traditional, poorer) student populations at community colleges. Accessibility, particularly compliance with WCAG 2.1 AA standards required at a growing number of institutions in response to lawsuits (Carlson n.d.), continues to be deflected with responses that individual members of “the community” need to identify and resolve such fundamental issues on their own (“DSpace 6 and WCAG Website Accessibility.” n.d.).

While repository registries are voluntary and therefore inherently incomplete, they do paint a picture of self-identified organizations more likely to engage actively in governance and community initiatives for those products. The explanations for this delta between applications that serve clear needs and a potential user base are often presented as self-evident—these institutions “lack the resources,” which is typically a euphemism for “can’t afford a full time developer and systems team inside a library” (Hamill 2015). Often in the same breath, applications like DSpace or the aging ePrints application are presented as “turnkey” solutions in literature (Maynard et al. 2013) and documentation (“What Is DSpace? – DSpace KnowledgeBase” 2011). Professional hosting for open source repository software is a comparably new phenomenon when one considers the lengthy history of such projects, many of which are designed as “single tenant” applications that can be difficult to scale for multiple institutions, or even multiple projects. Furthermore, while the software is open source, there are obvious risks regarding public service sustainability any time a vendor comes into the picture. Extensible repository systems like Fedora are of abstract utility outside a very limited community, and the talent to configure and manage those applications comes dear. Positions go unfilled, and issues can only be solved by a handful of developers at a few institutions.

Organizations without resources to participate in the open source communities may select vendor solutions for digital projects, which often prove to be less costly than the required FTE and skill set for supporting a major open source digital library initiative. It is no coincidence that major companies like Wiley and Elsevier are buying products like Atypon and Digital Commons respectively (Schonfeld 2017b), to integrate with the scholarly enterprise software suites each company is building and marketing to provosts, directors of research, and university presidents. In this landscape, open access and cultural heritage content ceases to be an ethical imperative and instead becomes a lucrative revenue stream for organizations that have nakedly demonstrated their opposition to the free and open exchange of information over decades of doing business with libraries.

The disadvantages of this state of affairs in the open source digital projects community are several-fold. Open source tools aren’t designed to be adopted by the communities they could theoretically best serve, as users and content creators. This in turn artificially narrows the cultural heritage landscape as digital content goes unshared or lacks long-term stewardship. The communities that do adopt these applications are frequently so small that only a few people are equipped to share expertise with each other (which undermines the advantages of having a community of practitioners). Ultimately, the products become worse over time and present market opportunities for the same commercial interests that are hollowing out the mission of academia (Seale 2013, Bourg 2014, Mountz et al. 2015), and put the entire open access and digital scholarship enterprise at risk.

Local Cultures

Examining “…the people building these systems and the environments in which the software is produced, as part of the software’s ecology” (Sadler and Bourg 2015) is essential to understanding how digital library applications evolved in the manner they have. Open source digital library architecture is not built by Silicon Valley techbros gleefully commodifying the labor of women and people of color (Hoffmann and Bloom 2016). It is built by our colleagues and friends; people we interact with every day on listservs, on calls, in Slack channels, and in the halls. Many developers and technical staff chose comparatively lower-paying positions in higher education, and libraries in particular, because they value the library’s mission and workplace, and care about work-life balance–a far cry from “brogrammer” culture (Crum et al. 2015). We are on the same side, and value the same things. Yet library IT culture is still a place apart within libraries, often very literally (Askey and Askey 2017)–a place with its own language, norms, rhythms, and priorities.

Libraries have long been understood as feminized workplaces, with (largely white) female librarians and non-technical support staff, and a higher proportion of (largely white) male managers (Schlesselman-Tarango 2016). Library IT, particularly in academic libraries, is often the opposite. Women occupy a minority of positions, are less likely to hold supervisory roles, and are less likely to be compensated comparably with male supervisors of equivalent experience and expertise (Lamont 2009). The work environments are rarely as openly hostile or sexist in the vein of Silicon Valley, but entire books (Brandon, Ladenson, and Sattler 2018) are dedicated to women who must navigate alienation, imposter syndrome, overt sexism, and unconscious bias throughout their careers in library IT. Gender is a vital dimension to understanding technological influence within libraries; as Roma Harris’s influential article on the topic states: “Given the strong cultural and ideological associations between masculinity and technology in Western society, it is impossible to consider the social shaping of technology in librarianship without taking into account the gendered nature of library work, particularly since studies of technological change in other sectors of the labor force reveal that the work of women and men is generally segregated, in part along lines structured by their association with or their use of particular technologies” (Harris 1999).

Institutions with the resources to hire “Digital Etc. Librarians” often rely on these positions to “bridge the gap” between librarians and library IT, or “collaborate” through internal marketing and external proselytizing about the merits of a system designed largely by technical staff. These librarians often end up in service provision roles to ameliorate systemic usability flaws (mediated institutional repository submission workflows are a prime example of this). This in turn limits the opportunities for these librarians to collect and advance user needs or participate in the creation of better systems and projects. The work of a Digital Etc. Librarian bears all the signifiers of carework typified by the broader profession of librarianship and explored at length in “on capacity and care” (Nowviskie 2015), “Library Technologies and the Ethics of Care” (Henry 2016), and others. It is also frequently composed of bullshit task completion (Schmidt 2018) generated by questionable user interface decisions in software applications. Furthermore, occupants of this role often feel most immediately the tensions between the patriarchal and technocratic “future of the library” and feminized care work explored in Mirza & Seale’s “Dudes Code, Ladies Coordinate” presentation at DLF 2017 (Mirza and Seale 2017b) as well as their “Who Killed The World?” chapter (Mirza and Seale 2017a) in Gina Schlesselman-Tarango’s Topographies of Whiteness. In both works, the authors examine the ways in which the technocratic libraries of “the future” (and present) elevate technological production at the expense of care work required to support the end users of those products. This valorization of final product over emotional process positions Digital Etc. Librarians as handmaidens to a vision of libraries that poorly emulates the commercial IT industry.

Moreover, as digital initiatives, maker spaces, and technology initiatives for libraries occupy a progressively more prominent place in the strategic objectives of a given library, this isolated microculture is increasingly pushed forward as “the future of libraries” (Mirza and Seale 2017a) at the expense of feminized labor and values of librarianship. This explicit valuing of technological solutionism by local institutions is then echoed in the committees and organizations responsible for maintaining and governing open access digital library projects. Just as technology is a reflection of the human values of its creators (Noble 2012, Winner 1986), the governance structures of digital library projects are a product of the values of the most influential adopters of these technologies, with explicit and nearly exclusionary value placed on functional code and technological work as an “in kind” contribution to those projects, as seen with Fedora (“Fedora Leadership Group In-Kind Guidelines” 2018) and Islandora (“Islandora and Fedora 4” 2014) as notable examples. These are the only products that “count” (Mountz et al. 2015) in this corner of the academic community. This, in turn, is underpinned by interpersonal dynamics within organizations, and the net result is that some of our worst biases manifest in the products we make.

“Just like all politics is local, all culture is local,” Dr. Chris Bourg stated in her Code4Lib 2018 keynote speech in Washington, DC (Bourg 2018). Aimed squarely and unapologetically at the ways white men can use their de facto positions of power and group belonging within library IT departments to create—or hinder—inclusive environments, the keynote combined evidence and sociological theory with blunt instructions for white cis men in library IT to be better. Vouch for colleagues. Make space. Reduce stereotypical and exclusionary cultural markers. Be cognizant of the bleed between social and professional.  Definitely don’t get beers with the fellas and talk about the womenfolk.2

My own professional background echoes many of the findings and narratives of workplace studies and examinations of library IT culture, including those described in Dr. Bourg’s keynote. I am a white female Digital Etc. Librarian by trade, accustomed to being described by others in terms of my interpersonal skills and characteristics, with my technical chops left as a vague afterthought. I currently supervise only white and male faculty librarians in my library’s IT division, and I worked with nearly exclusively white and male developers throughout my decade or so in the profession at ARL institutions and private companies–places with money. My current library is similar to the physically isolated IT spaces Askey and Askey describe, with our generally male technology division housed in a maze-like basement behind swipe card access points, and a highly collegial environment that relies heavily on technical knowledge and project-driven work that often seems disconnected from “the upstairs.”  I’ve always been “the woman in the room,” and even sometimes “one of the fellas” (always with an asterisk by my name), ready with an invitation to Game Night or a deep dive on George R. R. Martin’s A Song of Ice and Fire. Alienation in both the male spaces of library IT and female-dominated librarian communities has shadowed me for much of my career. Dr. Bourg’s Code4Lib keynote rang true to my lived experience as a woman marginalized within a system I recognized I was complicit in perpetuating.

I was deeply surprised, then, when my colleague and breakfast companion asked for my impressions of the keynote the next day, and then confided3 to me that he became emotional and somewhat defensive during Dr. Bourg’s speech. He continued that maybe some men needed to be spoken to as she had, but he felt put off by what he perceived as stereotypes and assumptions about men in her presentation. I had long perceived this colleague as both cool-as-they-came and a reliably empathetic ally, so this admission unsettled me for its resemblance to white fragility (DiAngelo 2011) from someone I had never found susceptible to it, and who had long ago earned my respect for his introspection. If this genuinely well-intentioned male colleague’s perceptions were so far from my own, I thought, then how on earth can our library technology departments become more approachable and accessible?

Whither the Ethics of Care?

Ethics of Care has emerged in recent decades as a powerful, intentionally feminist ethical framework centering relationships and emotion in moral development, typically credited to Carol Gilligan for originating the theory (Webteam n.d.). With time and effort, the understanding and application of this ethic has evolved to encompass a broader array of intersectional (Eugene 1992, Graham 2007), professional (Noddings 1990), and political (Tronto 1993, Hankivsky 2014) implications for care work.4

Joan Tronto delineates four components of an ethics of care (Tronto 2012):

  • Attentiveness: Observing a need within a relationship
  • Responsibility: Assuming a willingness to respond to a need within a relationship
  • Competence: Addressing a need effectively
  • Responsiveness: Empathy for the perspective of others

The explicit values of Tronto’s framework–equality, freedom from oppression, democracy–align handily with the mission of librarianship, and the relational, emotional, empathetic work of academic librarianship across disciplines is easily understood as care work. Tronto’s exploration of care work provides actionable criteria for characteristics of care. These criteria in turn make it possible to meaningfully assess the effectiveness of caring actions. In short, it helps us articulate the difference between simply telling colleagues and users “my door is always open” or “you can email me with any questions” and proactively working to reduce the psychological and interpersonal barriers that prevent people from taking those actions.

High-performing librarians exercise strong empathetic skills to identify needs, respond, assume responsibility, and seek effective solutions for reducing those barriers. In particular, Digital Etc. Librarians are often asked to do constant translation and code switching between programmers, curators, students, and faculty under the auspices of “bridging communication gaps,” yet frequently earn less than their programming counterparts, and have diminished influence in the direction and governance of digital library products. These librarians are doing the heavy lifting of emotional labor on behalf of the technical colleagues who are empowered to enact actual change in their repository communities, while they themselves call into yet another interest group to debate whether “Title” should be required or only recommended in a system workflow.

My breakfast companion and I were both aware (perhaps to varying degrees) that other staff in our library frequently approached me explicitly as this colleague’s “translator.” I often found myself assuming a disproportionate amount of emotional labor to explain technical concepts or how decisions were made by our software developers, and occasionally to provide encouragement and support for team members who felt apprehensive about talking to my colleague. I had long understood this dynamic as problematic, particularly as a female peer manager in a technology division, but minimized my own feelings of frustration over it as “part of my job.” Furthermore, much of my career as a Digital Etc. Librarian had involved the same work, codified as position responsibilities–who was I to be annoyed when someone would privately take me aside and ask “ok, can you tell me what he meant by this? I couldn’t follow the technical explanation and I’m embarrassed to ask him about it.”

Care in Our Home Institutions

When care work is denigrated by our own research libraries, through both our employment practices and our local interpersonal behaviors, we create patterns of behavior and exclusion that manifest both locally and in our products. If we continue to privilege coding over care as if the two are fully disconnected, and hand the reins of what should be our most intentional and accessible applications to a homogenous cohort of well-intentioned but isolated decision makers who are removed from direct and constant care work for end users or colleagues, then we are complicit in the neoliberal hollowing of the academic library mission to use our resources for the public good. We produce software that serves the needs of technologists employed at rich white universities first, and everyone else as an afterthought. This is solvable and avoidable. Locally, we can embrace and elevate the care work done by the librarians whose fates and careers are increasingly bound up in the viability of digital library software.

Stepping back to that fundamental question from my breakfast with my colleague—if reflective and helpful white men who want to be allies are struggling to respond competently to calls for more inclusive, caring spaces, what can be done? Like too many women in this #metoo moment (though one with the privileges of whiteness, financial security, sound health, and more), I am tired. I am tired of patiently explaining, or pulling back the curtain on my own experiences. I am tired of answering men who ask “why didn’t you tell me?” when I believe a better question they could ask themselves is “what could I have done differently to help others be comfortable confiding in me?” Moreover, this telling and retelling of what men can do to be better allies and why they need to take action may help with attentiveness to the problem and even assuming responsibility, but does little for developing competence or responsiveness on its own. “Doing the thing is doing the thing,” as Amy Poehler put it in her memoir (Poehler 2018), and our profession needs more opportunities for those who would support marginalized communities to practice the thing.

For the previous three years, I had worked with another former colleague on an improv workshop specifically for librarians and technologists, which took into account the shifting landscape of librarianship in higher ed and gave our players an informal space to practice the essential skills of collaboration without the pressure of real expectations (Pappas and Dohe 2017). Many of our workshop’s objectives echoed Dr. Bourg’s recommendations, with a performative twist—make your partner look good. Be present. Practice listening with undivided attention. Commit to affirmation as a means to develop the best ideas. Avoid assumptions about common knowledge. Decenter yourself and focus on the needs of the ensemble. While the intent of the workshop was to foster collaboration across domains of distributed expertise, the same skillset applies to both allyship and effective care work, and represents a low-stakes learning environment to develop communication competence. These are concrete abilities that one must practice like coding, not fuzzy personality-driven soft skills that are difficult to assess or articulate. The pursuit of professional development opportunities and training on these skills should be taken as seriously as any request to attend a coding workshop, and just as we would expect a programmer to share a new tool or language with the team, we can and must expect the same from participants in a communication workshop.

Furthermore, the care work performed by librarian-technologists and Digital Etc. Librarians can be emphasized and recognized within library IT departments and divisions in a number of ways.

  • De-emphasize and decouple quantity of submissions (especially faculty submissions) in repositories as a metric of performance.
  • Elevate and make visible the user research that informs local product decisions as an essential part of application research and documentation.
  • Emphasize demonstrable methods of emotional work, not “collaborate with stakeholders” as a panacea in position descriptions.
  • Stop treating diversity exclusively as a pipeline problem and reward efforts to connect with and meet the needs of underrepresented communities.
  • For the love of capybaras, get in front of users before decision points have whistled by.

COAR, Care, and the Evolution of Digital Library Communities

At the time of this writing, digital library applications are at a pivotal juncture in their development and future evolution. High-profile crises in major projects, notably the closure of the Digital Preservation Network, and layoffs at the Digital Public Library of America, are focusing community attention on the governance of digital library projects and sustainability of membership-driven initiatives. Questions of in-kind labor contributions are likely to rise as local library budgets continue to shrink, but so long as these contributions are limited only to coding and development activities, prospective participants and supporters will continue to be artificially limited.

The Confederation of Open Access Repositories (COAR) has requested comments on their “Next Generation Repositories” proposal (“COAR Next Generation Repositories | Draft for Public Comment” 2018), and the proposal does specify at a number of points that inclusivity and user engagement are guiding principles for the document. However, the user stories provided highlight a number of self-perpetuating assumptions about the nature of a human user as a high-level researcher that one would typically find at a high-level research institution in a Western nation. Students, public users motivated by personal interest, disabled users, and exclusively mobile users are nowhere to be found in the design of the “Next Generation Repository,” leaving one to wonder if the Next Generation User is expected to evolve as well.

[Image] Search results for “accessibility” on the COAR Draft for Public Comment, showing “Nothing found for ‘accessibility’. Try a different search?”

Shifting practice within a community requires reconceptualizing the values of that community, and in this regard Black feminists, womanists, and care scholars are instructive. In “To Be of Use,” Toinette M. Eugene emphasized connections, caring, and personal accountability, rather than the “arbitrary and fragile” market model of community (Eugene 1992). This humanist and explicitly Afrocentric centering of community broadens its scope beyond coders and managers, and instead encompasses the communal ways of knowing and doing work in this space. The organizations that sustain and steward digital repository products have a number of opportunities to engage with and support an ethics of care in the design and governance of their applications. One easy win is to establish parity between the influence of committers and non-programmers. What if quality end-user documentation, or design work, or user survey design, or accessibility assessments were credited and elevated by the projects in the same explicit way code is? What if those contributions shaped the strategic direction of those applications and communities? What if community outreach were baked into the charges of working groups, to seek new opportunities for growth and inclusive design? Put in the parlance advocated by the collective authors of “For Slow Scholarship: A Feminist Politics of Resistance through Collective Action in the Neoliberal University,” what if we counted differently (Mountz et al. 2015)?

The Mukurtu project (Christen, Merrill, and Wynne 2017) and community is emerging as a leader in inclusive digital cultural heritage practices. While the project’s primary application does not fulfill many of the essential tasks required of a repository, the content management system does accommodate behavioral metadata, cultural signifiers, and the expression of permissions aligned carefully to its community of indigenous peoples. Moreover, these features were not identified and prioritized in a vacuum, nor was development work undertaken with the expectation that a community on the receiving end of centuries of violence and oppression would be eager to accept an existing repository platform. Instead, the project originated as a grassroots program driven by community needs, evolved in response to the shared requirements of historically marginalized communities, and centered collaboration and consultation as the guiding principles of development. Ultimately, Mukurtu demonstrates the potential of an application and community with an inclusive ethics of care embodied in the mission of the platform and its evolution.


Four years after Bess Sadler and Chris Bourg’s Code4Lib Journal article calling for explicitly feminist discovery products, and twenty years after Roma Harris shone light on the gendered power differentials in library technology change management, little has meaningfully changed with regard to the participants and governance structures of our digital repository ecosystem. In fact, newly emerged technologies such as IIIF continue to mimic governance structures of other technical products, which in turn replicate the same imbalances in decision-making explored above.

What is now emerging is an unabashedly feminist and inclusive call to action as a critical mass of librarians interrogate the ecosystems of digital library participation and reproduction. The practitioners of emotional labor and care work continue to be de-emphasized in conversations about products with very real impacts on their users, their careers, and the health of a hugely important strategic initiative within libraries. Repositories and linked data platforms have the potential to be our most potent leveler of access and privilege, if we choose to embrace our responsibility and respond with intention. As Chris Bourg stated in her keynote at Code4Lib, this isn’t a pipeline problem, one that can be solved by just getting more “diverse humans” into the mix, as though it can be fixed with some magic combination of attributes. It’s an environmental problem that originates in our home institutions and the elevation of coding over collaboration, of objects over humans, and in-jokes over inclusion, and ultimately serves to starve our own digital repository applications.

Evolution of these communities without a rethinking of product governance may be slow. On a night during a conference when my “one of the fellas” asterisk was available to me, I spoke with a number of repository developers who proceeded to complain about the changes at the DLF Forum over the last few years, scoffing that “no one even puts code on the screen anymore.” As an individual who had co-taught improv at the DLF Forum as a means of strengthening collaboration between those who can teach, and those who can code, I found this to be a terribly myopic attitude. It came across to me as a distillation of the belief  that collaboration and soft skills and learning from users should be someone else’s skillset, or that there’s nothing to be gleaned from presentations that center the experiences of students, people of color, people with disabilities, public communities, and the complex, messy universe of invisible “end users” of our digital products if those presentations don’t also include an illegible (and inaccessible) screenshot of a JSON file.

DLF is where I saw “Dudes Code, Ladies Coordinate.” I attended that year’s Forum with my Code4Lib breakfast companion, and I remember at the time wishing that he had attended that particular session. I especially wished this a day later, when that same colleague forgot our prior plans to meet for lunch, and instead went out with repository developers from another institution to talk about emerging technical issues with strategic implications. I was not invited to that discussion, and instead I spent a few hours reflecting on how little I might be professionally or personally respected by the same people I needed to work with most closely. I understood his invitation to breakfast at Code4Lib, and the emotionally challenging conversation we shared, as a tacit effort to repair a fairly serious personal rent between us. I recognized it as one reciprocal act of care, in the bounds of one working relationship, at one ARL institution. One site of cultural change.


Huge thank yous to my reviewers Dr. Melissa Villa-Nicholas and Ian Beilin, and my publishing editor Kellee Warren at In the Library with the Lead Pipe, for your labor and thoughtfulness in helping to shape this piece. I’d like to thank a number of people for reading and engaging with the earliest versions of this article, especially Erin Pappas for encouraging me to seek publication, Joseph Koivisto and Vin Novara for extensive feedback, and Bria Parker, Joanne Archer, Rebecca Wack, Rachel Gammons, and Kelsey Corlett-Rivera for their suggestions and support throughout. And finally, I have to extend my gratitude to Ben Wallberg, whose generosity as a colleague, collaborator, and friend made much of this article possible.


Appleton, Gaby. 2019. “Guest Post: Supporting a Connected Galaxy of Knowledge.” The Scholarly Kitchen, January 28, 2019.

Arlitsch, Kenning, and Carl Grant. 2018. “Why So Many Repositories? Examining the Limitations and Possibilities of the Institutional Repositories Landscape.” Journal of Library Administration 58 (3): 264–81.

Askey, Dale, and Jennifer Askey. 2017. “One Library, Two Cultures.” In Feminists Among Us: Resistance and Advocacy in Library Leadership, edited by Shirley Lew and Baharak Yousefi. Library Juice Press.

Bourg, Chris. 2014. “The Neoliberal Library: Resistance Is Not Futile.” Feral Librarian (blog), January 16, 2014.

———. 2018. “For the Love of Baby Unicorns: My Code4Lib 2018 Keynote.” March 14, 2018.

Brandon, Jenny, Sharon Ladenson, and Kelly Sattler. 2018. We Can Do IT: Women in Library Information Technology.

Chan, Leslie, Darius Cuplinskas, Michael Eisen, Fred Friend, Yana Genova, Jean-Claude Guédon, Melissa Hagermann, et al. 2002. “Budapest Open Access Initiative.” Budapest Open Access Initiative. February 14, 2002.

Carlson, Laura L. n.d. “Higher Ed Accessibility Lawsuits, Complaints, and Settlements.” Information Technology Systems and Services, University of Minnesota Duluth. Accessed January 12, 2019.

Christen, Kimberly, Alex Merrill, and Michael Wynne. 2017. “A Community of Relations: Mukurtu Hubs and Spokes.” D-Lib Magazine 23 (5/6).

“COAR Next Generation Repositories | Draft for Public Comment.” 2018. Internet archive. March 24, 2018.

“Code4Lib Community Statement in Support of Chris Bourg.” 2018. C4l18-Keynote-Statement. March 19, 2018.

“Community College Consortium for Open Educational Resources.” n.d. Accessed July 2, 2018.

Crum, Janet, Aaron Dobbs, William Helman, and Kelly Sattler. 2015. “More than Money: Recruiting and Retaining Library IT Staff.” presented at the LITA Forum, November 14.

“Designing a Migration Path – Fedora Repository – DuraSpace Wiki.” n.d. Accessed January 6, 2019.

Dohe, Kate. 2018. “Linked Data, Unlinked Communities.” Lady Science (blog), “Libraries and Tech” Series.

“DSpace 6 and WCAG Website Accessibility.” n.d. DSpace Community – DSpace 6 and WCAG Website Accessibility. Accessed July 3, 2018.

Eugene, Toinette M. 1992. “To Be of Use.” Journal of Feminist Studies in Religion 8 (2): 138–47.

“Fedora 4 Deployments – Fedora Repository.” 2018. Wiki. DuraSpace Wiki. Accessed June 11, 2018.

“Fedora Leadership Group In-Kind Guidelines.” 2018. Wiki. Fedora Repository – DuraSpace Wiki. January 31, 2018.

Graham, Mekada. 2007. “The Ethics of Care, Black Women and the Social Professions: Implications of a New Analysis.” Ethics & Social Welfare 1 (2): 194–206.

Hamill, Lois. 2015. “So You Want an Institutional Repository but Don’t Have….” presented at the Midwest Archives Conference, Lexington, KY, May 9.

Hankivsky, Olena. 2014. “Rethinking Care Ethics: On the Promise and Potential of an Intersectional Analysis.” American Political Science Review 108 (2): 252–64.

Hathcock, April. 2015. “White Librarianship in Blackface: Diversity Initiatives in LIS.” In the Library with the Lead Pipe, October 7, 2015.

Henry, Ray Laura. 2016. “Library Technologies and the Ethics of Care.” The Journal of Academic Librarianship 42 (3): 284–85.

Hoffmann, Anna Lauren, and Raina Bloom. 2016. “Digitizing Books, Obscuring Women’s Work: Google Books, Librarians, and Ideologies of Access.” Ada New Media (blog). May 1, 2016.

“ICDL Mission.” International Children’s Digital Library (ICDL). Accessed February 17, 2019.

“Islandora and Fedora 4.” 2014. Islandora Website. December 15, 2014.

Lamont, Melissa. 2009. “Gender, Technology, and Libraries.” Information Technology and Libraries 28 (3): 137.

Lorde, Audre. (1984) 2007. “The Master’s Tools Will Never Dismantle the Master’s House.” In Sister Outsider: Essays and Speeches, 110–114. Berkeley, CA: Crossing Press.

Maynard, Aubrey, Laura Gentry, Adam Mosseri, Courtney Whitmore, Margaret Diaz, Camille Chidsey, and Kelly Kietur. 2013. “Fedora Commons or DSpace: A Comparison for Institutional Digital Content Repositories.” presented at the NDSA Annual Meeting.

Mirza, Rafia, and Maura Seale. 2017a. “Who Killed the World? White Masculinity and the Technocratic Library of the Future.” In Topographies of Whiteness: Mapping Whiteness in Library and Information Science, edited by Gina Schlesselman-Tarango. Library Juice Press.

———. 2017b. “Dudes Code, Ladies Coordinate: Gendered Labor in Digital Scholarship.” presented at the DLF Forum, Pittsburgh, PA, October 22.

“Mission and Goals, HathiTrust Digital Library.” HathiTrust Digital Library. Accessed February 17, 2019.

“Mission and Vision, Texas Digital Library.” Texas Digital Library (blog). Accessed February 17, 2019.

MIT Libraries. n.d. “Elsevier Fact Sheet.” Scholarly Publishing – MIT Libraries (blog). Accessed January 7, 2019.

Mountz, Alison, Anne Bonds, Becky Mansfield, Jenna Loyd, Jennifer Hyndman, Margaret Walton-Roberts, Ranu Basu, et al. 2015. “For Slow Scholarship: A Feminist Politics of Resistance through Collective Action in the Neoliberal University.” ACME: An International Journal for Critical Geographies 14 (4): 1235–59.

Noble, Safiya. 2012. “Searching for Black Girls: Old Traditions in New Media.”

Noddings, Nel. 1990. “Feminist Critiques in the Professions.” Review of Research in Education 16: 393–424.

Nowviskie, Bethany. 2015. “On Capacity and Care.” Bethany Nowviskie (blog). October 4, 2015.

Pappas, Erin, and Kate Dohe. 2017. “Lessons from the Field: What Improv Teaches Us About Collaboration.” Library Leadership & Management 32 (1).

Poehler, Amy. 2014. Yes Please. Dey Street Books.

Salo, Dorothea. 2008. “Innkeeper at the Roach Motel.” Library Trends 57 (2).

“Samvera IG/WG Framework – Samvera.” n.d. Wiki. DuraSpace Wiki.

Schlesselman-Tarango, Gina. 2016. “The Legacy of Lady Bountiful: White Women in the Library.” Library Trends 64 (4): 667–86.

Schmidt, Jane. 2018. “Innovate This ! Bullshit in Academic Libraries and What We Can Do about It.” Institutional Repository. RULA Digital Repository. May 29, 2018.

Schonfeld, Robert C. 2017a. “Cobbling Together the Pieces to Build a Workflow Business.” The Scholarly Kitchen, February 9, 2017.

———. 2017b. “Reflections on ‘Elsevier Acquires Bepress.’” Ithaka S+R (blog). August 7, 2017.

Seale, Maura. 2013. “The Neoliberal Library.” In Information Literacy and Social Justice: Radical Professional Praxis, edited by Lua Gregory and Shana Higgins, 39–61. Library Juice Press.

Tronto, Joan. 1993. Moral Boundaries: A Political Argument for an Ethic of Care. New York: Routledge.

———. 2012. “Partiality Based on Relational Responsibilities: Another Approach to Global Ethics.” Ethics and Social Welfare 6 (3): 303–16.

Winner, Langdon. 1986. The Whale and the Reactor: A Search for Limits in an Age of High Technology. University of Chicago Press.

Webteam. n.d. “Carol Gilligan.” Ethics of Care. Accessed July 5, 2018.

“What Is DSpace? – DSpace KnowledgeBase.” 2011. Wiki. DuraSpace Wiki. December 2, 2011.


  1. There is much more to explore in the demographic composition of “technical librarians” in systems, digital curation, data management, and other positions that require stronger IT skills as a function of their position. Further, people in these positions who may be perceived as “outsiders” to the majority cohort anecdotally take on masculine qualities in an effort to either fit in or establish dominance, which is surfaced in several narratives included in We Can Do I.T., edited by Jenny Brandon, Sharon Ladenson, and Kelly Sattler.
  2. Unsurprisingly, this keynote earned Dr. Bourg the vitriol of internet trolls who reduced these exhortations to “she’s saying girls can’t like Star Trek!” and decried the leftist takeover of libraries. The Code4Lib conference organizers and community issued a statement of support:
  3. This colleague has read earlier versions of this article, and has told me I may share this conversation as a part of the piece. We had multiple conversations about this article in which I asked him to affirm his consent and reflect on this conversation, and his feedback and changes have been helpful. However, in the development of the article, I frequently became anxious about what this would mean for him in particular and other colleagues more generally, which caused me to consider and reflect seriously on the ways in which I still elevate and prioritize white male feelings. All I can do is the work.
  4. While this article focuses on gendered dynamics within a specific community, it is also vitally important to consider the intersectional nature of racial, ableist, and economic systems that come to bear on care ethics within academic settings and the ways in which many people are excluded from the Digital Etc. practitioner community.

Open data governance and open governance: interplay or disconnect? / Open Knowledge Foundation

Authors: Ana Brandusescu, Carlos Iglesias, Danny Lämmerhirt, Stefaan Verhulst (in alphabetical order)

The presence of open data often gets listed as an essential requirement toward “open governance”. For instance, an open data strategy is reviewed as a key component of many action plans submitted to the Open Government Partnership. Yet little time is spent on assessing how open data itself is governed, or how it embraces open governance. For example, not much is known on whether the principles and practices that guide the opening up of government – such as transparency, accountability, user-centrism, ‘demand-driven’ design thinking – also guide decision-making on how to release open data.

At the same time, data governance has become more complex and open data decision-makers face heightened concerns with regards to privacy and data protection. The recent implementation of the EU’s General Data Protection Regulation (GDPR) has generated an increased awareness worldwide of the need to prevent and mitigate the risks of personal data disclosures, and that has also affected the open data community. Before opening up data, concerns of data breaches, the abuse of personal information, and the potential of malicious inference from publicly available data may have to be taken into account. In turn, questions of how to sustain existing open data programs, user-centrism, and publishing with purpose gain prominence.

To better understand the practices and challenges of open data governance, we have outlined a research agenda in an earlier blog post. Since then, and perhaps as a result, governance has emerged as an important topic for the open data community. The audience attending the 5th International Open Data Conference (IODC) in Buenos Aires deemed governance of open data to be the most important discussion topic. For instance, discussions around the Open Data Charter principles during and prior to the IODC acknowledged the role of an integrated governance approach to data handling, sharing, and publication. Some conclude that the open data movement has brought about better governance, skills, and technologies for public information management, which become an enormous long-term source of value for government. But what does open data governance look like?

Understanding open data governance

To expand our earlier exploration and broaden the community that considers open data governance, we convened a workshop at the Open Data Research Symposium 2018. Bringing together open data professionals, civil servants, and researchers, we focused on:

  • What is open data governance?
  • When can we speak of “good” open data governance, and
  • How can the research community help open data decision-makers toward “good” open data governance?

In this session, open data governance was defined as the interplay of rules, standards, tools, principles, processes and decisions that influence what government data is opened up, how and by whom. We then explored multiple layers that can influence open data governance.

In the following, we illustrate possible questions to start mapping the layers of open data governance. As they reflect the experiences of session participants, we see them as starting points for fresh ethnographic and descriptive research on the daily practices of open data governance in governments.

Figure: Schema of an open data governance model

The Management layer

Governments may decide about the release of data on various levels. Studying the management side of data governance could look at decision-making methods and devices. For instance, one might analyze how governments gauge public interest in their datasets – through data request mechanisms, user research, or participatory workshops? What routine procedures do governments put in place to interact with other governments and the public? For instance, how do governments design routine processes to open data requests? How are disputes over open data release settled? How do governments enable the public to address non-publication? One might also study cost-benefit calculations and similar methodologies to evaluate data, and how they inform governments what data counts as crucial and is expected to bring returns and societal benefits.

Understanding open data governance would also require studying how open data creation, cleaning, and publication are themselves managed. Governments may choose to organise open data publication and maintenance in house, or seek collaborative approaches, as known from data communities like OpenStreetMap.

Another key component is funding and sustainability. Funding might influence management on multiple layers – from funding capacity building, to investing in staff innovations and alternative business models for government agencies that generate revenue from high value datasets. What do these budget and sustainability models look like? How are open data initiatives currently funded, under what terms, for how long, by whom and for what? And how do governments reconcile the publication of high value datasets with the need to provide income for public government bodies? These questions gain importance as governments move towards assessing and publishing high value datasets.

Open governance and management: To what extent is management guided by open governance? For instance, how participatory, transparent, and accountable are decision-making processes and devices? How do governments currently make space for more open governance in their management processes? Do governments practice more collaborative data management with communities, for example to maintain, update, or verify government data?

The Legal and Policy layer

The interplay between legal and policy frameworks: Open data policies operate among other legal and policy frameworks, which can complement, enable, or limit the scope of open data. New frameworks such as GDPR, but also existing right to information and freedom of expression frameworks prompt the question of how the legal environment influences the behaviour and daily decision-making around open data. To address such questions, one could study the discourse and interplay between open data policies as well as tangential policies like smart city or digitalisation policies.

Implementation of law and policies: Furthermore, how are open data frameworks designed to guide the implementation of open data? How do they address governmental devolution? Open data governance needs to stretch across all government levels to unlock data from all government levels. What approaches are experimented with to coordinate the implementation of policies across jurisdictions and government branches? To what agencies do open data policies apply, and how do they enable or constrain choices around open data? What agencies define and move forward open data, and how does this influence adoption and sustainability of open data initiatives?

Open governance of law and policy: Besides studying the interaction of privacy protection, right to information, and open data policies, how could open data benefit from policies enabling open governance and civic participation? Do governments develop more integrated strategies for open governance and open data, and if so, what policies and legal mechanisms are in place? How do these laws and policies enable other aspects of open data governance, including more participatory management and more substantive, legally supported citizen participation?

The Technical and Standards layer

Governments may have different technical standards in place for data processing and publication, from producing data to quality assurance processes. Some research has looked into the ways data standards for open data alter the way governments process information. Others have argued that the development of data standards reflects how governments envisage citizens, primarily catering to tech-literate audiences.

(Data) standards do not only represent, but also intervene in, the way governments work. Therefore, they could substantially alter the ways government publishes information. Understood this way, how do standards enable resilience against change, particularly when facing shifting political leadership?

On the other hand, most government data systems are not designed for open data. Too often, governments are struggling to transform huge volumes of government data into open data using manual methods. Legacy IT systems that have not been built to support open data create additional challenges to developing technical infrastructure, but there is no single global solution to data infrastructure. How could then governments transform their technical infrastructure to allow them to publish open data efficiently?

Open governance and the technical / standards layer: If standards can be understood as bridge-building devices, or tools for cooperation, how could open governance inform the creation of technical standards? Do governments experiment with open standards, and if so, what standards are developed, to what end, using what governance approach?

The Capacity layer

Staff innovations may play an important role in open data governance. What is the role of chief data officers in improving open data governance? Could the usual informal networks of open data curators within government and a few open data champions alone make open data a success? What role do these innovations play in making decisions about open data and personal data protection? Could governments rely solely on senior government officials to execute open data strategies? Who else is involved in the decision-making around open data release? What are the incentives and disincentives for officials to increase data sharing? As one session participant mentioned: “I have never experienced that a civil servant got promoted for sharing data”. This raises the question of whether and how governments currently assess performance metrics that support opening up data. What other models could help reward data sharing and publication? In an environment of decreased public funding, are there opportunities for governments to integrate open data publication in existing engagement channels with the public?

Open governance and capacity: Open governance may require capacities in government, but could also contribute new capacities. This can apply to staff, but also resources such as time or infrastructure. How do governments provide and draw capacity from open governance approaches, and what could be learnt for other open data governance approaches?  


Next steps

With this map of data governance aspects as a starting point, we would like to conduct empirical research to explore how open data governance is practised. A growing body of ethnographic research suggests that tech innovations such as algorithmic decision-making, open data, or smart city initiatives are ‘multiples’ — meaning that they can be practiced in many ways by different people, arising in various contexts.

With such an understanding, we would like to develop empirical case studies to elicit how open data governance is practised. Our proposed research approach includes the following steps:

  • Universe mapping: identifying public sector officials and civil servants involved in deciding how data gets managed, shared and published openly (this helps to get closer to the actual decision-makers, and to learn from them).
  • Describing how and on what basis (legal, organisational & bureaucratic, technological, financial, etc.) people make decisions on what gets published and why.
  • Observing and describing different approaches to open data governance, looking at enabling and limiting factors of opening up data.
  • Describing gaps and areas for improvement with regard to open data governance, as well as best practices.

This may surface how open data governance becomes salient for governments, under what circumstances and why. If you are a government official, or civil servant working with (open) data, and would like to share your experiences, we would like to hear from you!  

Early Bird Registration for the 2019 Evergreen International Conference has been extended! / Evergreen ILS

Early Bird Registration for the 2019 Evergreen International Conference has been extended one week to February 22nd. Take advantage of the discounted registration rate of $220 while you can!

Here is the registration link:

General conference information can be found here:


Announcing the Frictionless Data Tool Fund / Open Knowledge Foundation

Apply for a mini-grant to build an open source tool for reproducible research using Frictionless Data tooling, specs, and code base

Today, Open Knowledge International is launching the Frictionless Data Tool Fund, a mini-grant scheme offering grants of $5,000 to support individuals or organisations in developing an open source tool for reproducible science or research built using the Frictionless Data specifications and software. We welcome submissions of interest until the 30th of April 2019.

The Tool Fund is part of the Frictionless Data for Reproducible Research project at Open Knowledge International. This project, funded by the Sloan Foundation, applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts. At its core, Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The core specification, the Data Package, is a simple and practical “container” for data and metadata.
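As a rough sketch of the “container” idea, a minimal Data Package pairs data files with a datapackage.json descriptor. The package name, file path, and field names below are hypothetical, illustrating only the general shape of the specification:

```python
import json

# Minimal Data Package descriptor: machine-readable metadata that travels
# alongside the data it describes. All names here are illustrative.
descriptor = {
    "name": "example-research-data",
    "title": "Example Research Data",
    "resources": [
        {
            "name": "observations",
            "path": "observations.csv",  # hypothetical data file
            "schema": {
                "fields": [
                    {"name": "site", "type": "string"},
                    {"name": "reading", "type": "number"},
                ]
            },
        }
    ],
}

# The descriptor is conventionally saved next to the data as datapackage.json.
with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```

Frictionless Data software libraries can then read such a descriptor to validate and process the data without additional configuration.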

With this announcement we are looking for individuals or organizations of scientists, researchers, developers, or data wranglers to build upon our existing open source tools and code base to create novel tooling for reproducible research. The fund will be accepting submissions from now until the end of April 2019 for work which will be completed by the end of the year.

This builds on the success of the first tool fund in 2017 which funded the creation of libraries for Frictionless Data specifications in a range of additional programming languages.

For this year’s Tool Fund, we would like the community to work on tools that can make a difference to researchers and scientists.

Applications can be submitted by filling out this form by 30 April 2019 at the latest.

The Frictionless Data team will notify all applicants whether they have been successful or not at the very latest by the end of May. Successful candidates will then be invited for interviews before the final decision is given. We will base our choice on evidence of technical capabilities and also favour applicants who demonstrate an interest in practical use of the Frictionless Data Specifications. Preference will also be given to applicants who show an interest working with and maintaining these tools going forward.

For more questions on the fund, speak directly to us on our forum, on our Gitter chat, or email us at

MarcEdit Custom Report Writer / Terry Reese

I periodically get requests for a variety of different types of custom, one-off reports for addition to MarcEdit.  Some of these can be accommodated in the current tooling, some of them can’t.  Oftentimes, I encourage folks to look at the COM or API components, as these provide full access to the records and allow users to create and prepare the data output in whatever format is most appropriate.  However, I do realize that this can be challenging for users without a programming background or someone to help create the custom reports.

Since I was already planning to make a small update to the program to correct a few odd issues when working with SRU data in Alma – I went ahead and added a custom report writer.  The tool is pretty simple.  Right now, it’s basically designed around creating counts.  You can search for specific data, either as a literal match or a regular expression, and get back a report noting the # of times the data occurs in the file and the # of records it appears in.

Here’s an example of how this could work on a recent request.  A user was interested in retrieving data in the 008 related to language and see how many different types of languages occurred within a record set.  To generate this report, the user would do the following:

1) Open the file in the MarcEditor

2) Select Reports/Custom Report.  This will generate the following Window.

3) Here, the user can search for data inside the record or by regular expression.  I envision that the lion’s share of queries built with this tool will be regular expressions.  To answer the language question, we need to read bytes 35, 36, and 37.  Since MARC starts counting at zero, that means in base 1 (which regex uses), we’ll be reading bytes 36, 37, and 38.  So, we create the expression: (=008.{2}.{35})(.{3})

Let’s see what this is actually doing.  (=008.{2}.{35}) is identifying that the search should be done in the 008 field, and that we can skip the first two bytes (blank spaces in MarcEdit’s mnemonic format).  Then the tool will read forward 35 bytes.  I could have written this as .{37} and not broken these into two separate read operations, but I personally like to keep them separate because the expression is then easier to read.  This ends group 1.

The second group, (.{3}), reads the next 3 bytes (the language code).

We now have three regular expression groups.
$0 – matches the entire field
$1 – matches =008 to byte 37
$2 – matches the 3 language codes
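To see how these groups fall out, here is a rough Python sketch of the same expression (outside MarcEdit); the sample 008 line is hypothetical and uses MarcEdit’s mnemonic format, with \ standing in for blanks:

```python
import re

# Group 1 spans "=008", the two blanks, and the first 35 data bytes;
# group 2 captures the 3-byte language code (008/35-37, zero-based).
pattern = re.compile(r"(=008.{2}.{35})(.{3})")

# A hypothetical 008 line: 40 data bytes, language code at bytes 35-37.
field = "=008  " + "190226s2019" + "\\" * 4 + "xxu" + "\\" * 17 + "eng" + "\\d"

m = pattern.match(field)
print(m.group(2))  # the language code: "eng"
```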

Assuming that we want to group on the language codes, we need to now set some values.  First, we check the Use Regular Expression option.  This will display a textbox next to the GroupBy checkbox.  This represents the regular expression group value to group data by.  In our case, we want to use group 2.  We then set a save file and check the desired output options.  The window should look like this:

When the user runs this operation, a tab-delimited report is generated at the specified save location.  For a sample file, that output would look like the following:

Key ((=008.{2}.{35})(.{3}))	Total	Total Records
|||	476	476
eng	748	748
ger	2	2

Data is output in tab-delimited format and includes a header with the search criteria, followed by the group-by value and the specified count values.
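The report itself is just a group-by count over regular expression matches.  A minimal re-implementation of that logic in Python (the function name and record format here are my assumptions, not MarcEdit’s code) might look like:

```python
import re
from collections import Counter

EXPR = re.compile(r"(=008.{2}.{35})(.{3})")

def custom_report(records, group=2):
    """Tally regex matches across mnemonic-format record strings:
    total occurrences and the number of records containing each key."""
    totals, per_record = Counter(), Counter()
    for rec in records:
        keys = [m.group(group) for m in EXPR.finditer(rec)]
        totals.update(keys)
        per_record.update(set(keys))  # each record counted once per key
    lines = ["Key\tTotal\tTotal Records"]
    for key in sorted(totals):
        lines.append(f"{key}\t{totals[key]}\t{per_record[key]}")
    return "\n".join(lines)
```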

I could envision the custom report writer being expanded based on user feedback.  The idea here is to create a tool that is a bit more flexible than the canned tools and provide users with one more tool for their toolbelt.


What Is AI-Powered Search? / Lucidworks

Search and ye shall find. But search with an AI-powered search platform and ye (or ye customer) shall find, learn, and even discover!

Although most vendors say their search platform is fortified with AI, “AI-powered” isn’t a phrase that should be used lightly. So what does AI-powered search really mean and what is the value it can generate for your business? Lucidworks’ Senior Solutions Architect, Karthik Chelladurai explains, “Basic search is matching the text of your search term with the text in the document database. AI-powered search allows us to bring in multiple dimensions of the user and data available to produce the most relevant results.”

Let’s zoom out to understand what those additional dimensions are, how they impact the user in their search journey, and how AI-powered search can create immense value for businesses.

Back to Basics: What Is Search?

Think of basic search as an ecosystem that takes your query, scans through all the information available, and then presents the items that have an exact text match to the keywords you entered. For example, if you go to a retailer’s site and search “iPads,” you expect the site to go through its product catalog and show you iPads. But what if you actually wanted to view iPad covers? Or what if it shows you the iPad 3, which you already purchased from the site? Or what if you put a space in between “i” and “pad” and hit the dead-end “No Results Match your Query” pop-up, or get back results for pads of paper or mouse pads?

There are so many things that can create friction and prevent users from finding the information they need, including missed opportunities to make smarter recommendations and a failure to learn from user behavior to better serve them and others the next time.

Put User Data to Work for a More Valuable Experience

Many search platforms, such as Google Search Appliance (GSA), don’t learn much from an individual user’s behavior or search history; you’ll be given the same results as anyone else who searched for those same words on the site, regardless of your previous queries and clicks. While collecting data on user behavior is already common practice for many companies, they’re missing out on the next important step: learning from the data in real time to produce more relevant results and recommendations based on things like user location, search history, and the behavior of users similar to them.

AI-Powered Search by the numbers:
• The 6% of e-commerce visits that include engagement with AI-powered recommendations drive 37% of revenue (Salesforce)
• Employees waste one day per working week (19.8% of work time) searching for the information they need to do their jobs effectively (Interact)

What Exactly Is AI Doing for Search?

Systems are already tracking an incredible number of inputs, considering that most of what we do online is driven by search. Even apps you don’t think of as ‘search’ rely on it at their core. The value of tools like Craigslist, Zillow, Amazon, and streaming radio relies entirely on their ability to let users easily search and find relevant information. AI-powered search provides the next generation of search result relevance, learning from user behavior in real time as people search to help bridge the gap between human and computer language.

“When we think of AI-powered search we’re referring to how we take user interaction data and wrap it into search to improve relevancy, improve poor queries, misspellings, etc.,” explains William Tseng, Lucidworks Regional Director, Sales Engineering. “Basically we’re building a search solution that empowers users to define what’s important to them.”

The value of AI-powered search is the constant loop of information that happens in the background of the user’s journey; it informs smarter recommendations in digital commerce, enables more personalization within the experience, and saves time for knowledge workers who rely on locating documents to do their jobs.

Building Blocks Supporting AI-powered Search

Norbert Krupa, Lucidworks Senior Solutions Engineer says, “AI has become a hyped-up term that can mean different things to different people. For me, AI-powered search means learning from the user to deliver the next best action, and the capability of the system to auto-tune results based on what it learns from users.” Here are a few examples of the building blocks behind AI-powered search that help a search app learn and improve:

Signals Boosting for More Relevant Results

The more data available to an AI-powered search engine, the more relevant results it can return to a user. Aggregate behavior such as click-throughs, conversions, and queries teach the engine which content is most relevant, making traditional keyword search smarter. AI-powered search leverages these signals to learn which results your users see as the most relevant for your more popular queries.

It is also able to learn what kind of product characteristics generally matter the most across all queries through building out machine-learned ranking models. AI-powered search weighs these models, in addition to other similar users’ behavior, location, and more, to calculate and present the most relevant results. Read more on how machine-learned ranking models result in better search results here…
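As a toy illustration of the idea (not Fusion’s actual implementation; the function name, data shapes, and the simple linear weighting are all my assumptions), click signals for a query can be aggregated and blended into the base relevance score:

```python
from collections import Counter

def boosted_ranking(query, base_scores, click_events, weight=0.1):
    """Re-rank documents for a query by blending keyword-relevance
    scores with aggregated click-through signals.

    base_scores:  {doc_id: base relevance score}
    click_events: iterable of (query, doc_id) pairs from user logs
    """
    clicks = Counter(doc for q, doc in click_events if q == query)
    return sorted(
        base_scores,
        key=lambda doc: base_scores[doc] + weight * clicks[doc],
        reverse=True,
    )
```

With enough click traffic, a document that users consistently choose rises above one that merely matches the keywords slightly better.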

Personalization and Recommendations that Understand Your Individual Users

According to a study from Infosys, 74 percent of consumers get frustrated with product information that’s not personalized. For example, if you just purchased an iPad and then searched “screen protector,” AI-powered search will rank iPad screen protectors higher than the Pixel 3 screen protectors. The engine is interpreting your query in the context of what it knows about you.

Recommendations rely on that same logic to suggest complementary items at checkout that you did not search for, but are still relevant product suggestions based on your behavior. An AI-powered platform can update these recommendations in real time, which can have a major impact on conversions and average order value. Read more on the power of recommendations in retail here…

Smarter Results Through Semantic Understanding

AI can power semantic search, which is a more nuanced and domain-specific understanding of what users are typing in and what those words mean within each user’s query and context. For example, synonym discovery and misspelling detection allow us to find the best smokehouse whether we search BBQ, barbecue, or even berbeque.

Clustering and classification techniques train the engine to understand different words that can be part of the same category, i.e., purse and handbag, sneakers and tennis shoes, outerwear and coats. Semantic Knowledge Graphs enable the engine to understand entities, disambiguate phrases with multiple potential meanings, and gain a nuanced understanding of the user’s intent in order to perform a conceptual search instead of just text-based matching. Additional natural language processing (NLP) techniques allow us to talk to Siri like we talk to our friends: “What’s the weather in San Francisco today?” and have her reply, “Here’s the weather for San Francisco today.” (You’ll probably want a light jacket.) Read more on the power of classification, clustering, and semantic search here…
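A crude sketch of the misspelling-plus-synonym step using only Python’s standard library (the vocabulary and synonym mappings are invented for illustration; production systems learn these from user signals rather than hard-coding them):

```python
import difflib

# Hypothetical learned vocabulary and synonym mappings.
VOCABULARY = ["bbq", "barbecue", "purse", "handbag", "sneakers"]
SYNONYMS = {"barbecue": "bbq", "handbag": "purse"}

def normalize_term(term):
    """Correct a likely misspelling against the vocabulary, then
    collapse synonyms onto a single canonical concept."""
    term = term.lower()
    if term not in VOCABULARY:
        close = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.6)
        if close:
            term = close[0]
    return SYNONYMS.get(term, term)
```

So a query for “berbeque” is nudged to “barbecue” by fuzzy matching and then mapped to the same concept as “bbq”.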

One more important thing to note: AI-powered search is best when kept transparent. Black box solutions where you have to trust a one-size-fits-all algorithm don’t allow you to control or customize results to fit your specific needs. Fusion’s AI-powered search puts relevancy in the control of the owner to make it easy to get “under the hood” and see the mechanics at work to tune results and business rules to best serve your customers.


The post What Is AI-Powered Search? appeared first on Lucidworks.

Editorial: Just Enough of a Shared Vision / Code4Lib Journal

What makes a vibrant community? A shared vision! When we live into a shared vision, we can accomplish big goals even when our motivations are not completely aligned.

A Principled Approach to Online Publication Listings and Scientific Resource Sharing / Code4Lib Journal

The Max Planck Institute (MPI) for Psycholinguistics has developed a service to manage and present the scholarly output of their researchers. The PubMan database manages publication metadata and full-texts of publications published by their scholars. All relevant information regarding a researcher's work is brought together in this database, including supplementary materials and links to the MPI database for primary research data. The PubMan metadata is harvested into the MPI website CMS (Plone). The system developed for the creation of the publication lists, allows the researcher to create a selection of the harvested data in a variety of formats.

Developing Weeding Protocols for Born Digital Collections / Code4Lib Journal

As collections continue to be digitized or even born digital, the way we handle collection development needs to shift towards a digital mindset. Digital collection development is not so much concerned with shelf or storage space, as expansion can be as simple as procuring a new hard drive. Digital collections, when not archival, need to focus on issues of access and accessibility. For a born-digital library, quality and usefulness must be the primary factors in the collection development policy. This article walks through the steps taken by one digital library to assess its collections with an eye to quality and user experience, as well as a multi-phase deaccessioning project that occurred and is ongoing. The process included the multi-iteration drafting of subject-specific rubrics targeted to the needs of the site’s core audience. It also included the quantitative assessment of thousands of items in the collection and the distribution of qualitative and quantitative data to stakeholders across the country. Special attention was paid to setting minimum required standards and communicating those standards. Finally, as this process is now an ongoing review schema for LearningMedia, the article discusses the issues faced in this project, recommendations for other organizations attempting their own digital weeding/deaccessioning projects, and plans for the future of the project.

Querying OCLC Web Services for Name, Subject, and ISBN / Code4Lib Journal

Using Web services, search terms can be sent to WorldCat's centralized authority and identifier files to retrieve authorized terminology that helps users get a comprehensive set of relevant search results. This article presents methods for searching names, subjects or ISBNs in various WorldCat databases and displaying the results to users. Exploiting WorldCat's databases in this way opens up future possibilities for more seamless integration of authority-controlled vocabulary lists into new discovery interfaces and a reduction in libraries’ dependence on local name and subject authority files.

Content Dissemination from Small-scale Museum and Archival Collections: Community Reusable Semantic Metadata Content Models for Digital Humanities / Code4Lib Journal

This paper highlights the challenges that digital humanities scholars and small museums and archival collections face in disseminating content from Cultural Heritage (CH) institutions. It showcases a solution based on Community Reusable Semantic Metadata Content Models (RMs) available for download from our community website. Installing the RMs will extend the functionality of a state-of-the-art Content Management Framework (CMF) towards numismatic collections. Furthermore, it encapsulates metadata using the Resource Description Framework in Attributes (RDFa), and the vocabulary. Establishing a community around RMs will help the development, upgrading and sharing of RM models and packages for the benefit of the Cultural Heritage community. A distributed model for Community Reusable Semantic Metadata Content Models will allow the community to grow and improve, serving the needs and enabling the infrastructure to scale for the next generation of humanities scholars.


Challenges in Sustainable Open Source: A Case Study / Code4Lib Journal

The Archivists' Toolkit is a successful open source software package for archivists, originally developed with grant funding. The author, who formerly worked on the project at a participating institution, examines some of the challenges in making an open source project self-sustaining past grant funding. A consulting group hired by the project recommended that -- like many successful open source projects -- they rely on a collaborative volunteer community of users and developers. However, the project has had limited success fostering such a community. The author offers specific recommendations for the project going forward to gain market share and develop a collaborative user and development community, with more open governance.

Never Best Practices: Born-Digital Audiovisual Preservation / Code4Lib Journal

Archivists specializing in time-based born-digital workflows walk through the technical realities of developing workflows for born-digital video. Through a series of use cases, they will highlight situations wherein video quality, subject matter, file size and stakeholder expectations decisively impact preservation decisions and considerations of "best practice" often need to be reframed as "good enough."

Using Cloud Services for Library IT Infrastructure / Code4Lib Journal

Cloud computing comes in several different forms and this article documents how service, platform, and infrastructure forms of cloud computing have been used to serve library needs. Following an overview of these uses the article discusses the experience of one library in migrating IT infrastructure to a cloud environment and concludes with a model for assessing cloud computing.

SCOPE: A digital archives access interface / Code4Lib Journal

The Canadian Centre for Architecture (CCA) identified certain technological issues, namely extensive reference workflows and under-utilizing existing metadata, as significant barriers to access for its born-digital archives. In collaboration with Artefactual Systems, the CCA built SCOPE, a digital archives access interface. SCOPE allows for granular file- and item-level searching within and across digital archives, and lets users download access copies of the collection material directly to a local machine. SCOPE is a free, open-source tool. The beta version is available to the public, and a second phase is under-development as of Spring 2019.

Creating an Institutional Repository for State Government Digital Publications / Code4Lib Journal

In 2008, the Library of Virginia (LVA) selected the digital asset management system DigiTool to host a centralized collection of digital state government publications. The Virginia state digital repository targets three primary user groups: state agencies, depository libraries and the general public. DigiTool's ability to create depositor profiles for individual agencies to submit their publications, its integration with the Aleph ILS, and product support by ExLibris were primary factors in its selection. As a smaller institution, however, LVA lacked the internal resources to take full advantage of DigiTool's full set of features. The process of cataloging a heterogenous collection of state documents also proved to be a challenge within DigiTool. This article takes a retrospective look at what worked, what did not, and what could have been done to improve the experience.

Making the Move to Open Journal Systems 3: Recommendations for a (mostly) painless upgrade / Code4Lib Journal

From June 2017 to August 2018, Scholars Portal, a consortial service of the Ontario Council of University Libraries, upgraded 10 different multi-journal instances of the Open Journal Systems (OJS) 3 software, building expertise on the upgrade process along the way. The final and the largest instance to be upgraded was the University of Toronto Libraries, which hosts over 50 journals. In this article, we will discuss the upgrade planning and process, problems encountered along the way, and some best practices in supporting journal teams through the upgrade on a multi-journal instance. We will also include checklists and technical troubleshooting tips to help institutions make their upgrade as smooth and worry-free as possible. Finally, we will go over post-upgrade support strategies and next steps in making the most out of your transition to OJS 3. This article will primarily be useful for institutions hosting instances of OJS 2, but those that have already upgraded, or are considering hosting the software, may find the outlined approach to support and testing helpful.

Wrangling Electronic Resources: A Few Good Tools / Code4Lib Journal

There are several freely available tools today that fill the needs of librarians tasked with maintaining electronic resources, that assist with tasks such as editing MARC records and maintaining web sites that contain links to electronic resources. This article gives a tour of a few tools the author has found invaluable as an Electronic Resources Librarian.

Improving the discoverability and web impact of open repositories: techniques and evaluation / Code4Lib Journal

In this contribution we experiment with a suite of repository adjustments and improvements performed on Strathprints, the University of Strathclyde, Glasgow, institutional repository powered by EPrints 3.3.13. These adjustments were designed to support improved repository web visibility and user engagement, thereby improving usage. Although the experiments were performed on EPrints it is thought that most of the adopted improvements are equally applicable to any other repository platform. Following preliminary results reported elsewhere, and using Strathprints as a case study, this paper outlines the approaches implemented, reports on comparative search traffic data and usage metrics, and delivers conclusions on the efficacy of the techniques implemented. The evaluation provides persuasive evidence that specific enhancements to technical aspects of a repository can result in significant improvements to repository visibility, resulting in a greater web impact and consequent increases in content usage. COUNTER usage grew by 33% and traffic to Strathprints from Google and Google Scholar was found to increase by 63% and 99% respectively. Other insights from the evaluation are also explored. The results are likely to positively inform the work of repository practitioners and open scientists.

CONFERENCE REPORT: Code4Lib 2010 / Code4Lib Journal

Conference reports from the 5th Code4Lib Conference, held in Asheville, NC, from February 22 to 25, 2010. The Code4Lib conference is a collective volunteer effort of the Code4Lib community of library technologists. Included are three brief reports on the conference from the recipients of conference scholarships.

A Systematic Approach to Collecting Student Work / Code4Lib Journal

Digital technology has profoundly changed design education over the past couple of decades. The digital design process generates design solutions from many different angles and points of views, captured and expressed in many file formats and file types. In this environment of ubiquitous digital files, what are effective ways for a design school to capture a snapshot of the work created within their school, and to create a long-term collection of student files for purposes of research and promotion, and for preserving the history of the school? This paper describes the recent efforts of the Harvard Graduate School of Design in creating a scalable and long-term data management solution for digital student work files. The first part describes the context and history of student work at the Harvard Graduate School of Design. The second section of the paper focuses on the functionality of the tool we created, and lastly, the paper looks at the library’s current efforts for the long-term archiving of the collected student files in Harvard’s digital repository.

EU’s chilling copyright crackdown an ‘attack on openness’ / Open Knowledge Foundation

EU negotiators have struck a deal over copyright reform that is an ‘attack on openness’, the new chief executive of Open Knowledge International has warned. Catherine Stihler, a former MEP and vice-chair of the European Parliament’s consumer protection committee, said the changes will restrict internet freedoms for millions of users.

The agreement will require platforms such as YouTube, Twitter or Google News to take down user-generated content that could breach intellectual property and to install filters to prevent people from uploading copyrighted material. That means memes, GIFs and music remixes may be taken down because the copyright does not belong to the uploader. It could also restrict the sharing of vital research and facts, allowing ‘fake news’ to spread.

The proposed changes will now head to the European Parliament for a vote among all MEPs in March or April.

Open Knowledge International is a non-profit organisation which fights for open data and helps groups access and use data to address social problems. Catherine Stihler, chief executive of Open Knowledge International, said:

“This deeply disappointing deal is an attack on openness. The copyright crackdown will lead to a chilling effect on freedom of speech across the EU. We want people to be empowered to build, share and reuse their own data and content freely and openly, and this move goes against that principle.

It does not enhance citizens’ rights, and could lead to Europe becoming a more closed society – restricting how we share research that could lead to medical breakthroughs or how we share facts to combat the spread of ‘fake news’.

I urge MEPs to vote down this proposal and fight for a future where our world is more open.”

For more information on how you can help save your internet, you can visit or sign the online petition along with millions of others.

Webinar Registration Open: “DSpace Docker for Repository Managers: Running Any Version of DSpace from your Desktop” / DuraSpace News

DuraSpace presents a Community Webinar, DSpace Docker for Repository Managers: Running Any Version of DSpace from your Desktop

On Tuesday, March 5, 2019 at 11:00 AM ET (convert to your timezone), join Terry Brady, Georgetown University Library, and Pascal Becker, The Library Code, when they present “DSpace Docker for Repository Managers: Running Any Version of DSpace from your Desktop.”

In 2018, the DSpace development team packaged DSpace to be run with Docker. This made it possible to start any version of DSpace from your desktop with a simple command line call.  The use of Docker has created a more flexible development environment for DSpace contributors. Docker also offers great potential for repository managers which will be the focus of this webinar.  Topics will include:

  • What is a Docker image and what images have been published for DSpace
  • How to install Docker
  • How to launch DSpace 6 and DSpace 7 using Docker
  • How to participate in DSpace testing using Docker

If you are a repository manager who is interested in previewing new DSpace functionality or would like to become more involved in DSpace development or are a potential DSpace contributor who would like to learn how you can get started with the project, we encourage you to attend.  Time will be reserved at the end of the presentation for your questions.

Space is limited and pre-registration is required.

Register today!

The post Webinar Registration Open: “DSpace Docker for Repository Managers: Running Any Version of DSpace from your Desktop” appeared first on

Supersize Apache Superset with Lucidworks Fusion — Part 2 / Lucidworks

If you haven’t read the first post in this series, jump to Supersize Apache Superset With Lucidworks Fusion — Part 1.

There are many challenges that users encounter when they try to build software on their machines. While I still find it helpful to work on some apps locally, it saves time to use software on a platform similar to the production platform. That’s why, this week, in a blog post with almost no code, I have included a link to Lucidworks Labs.

Using Lucidworks Labs you can launch an app that already includes data for restaurants in Sacramento. All you need to do is run the index workflow and the data will be transformed and loaded into a Fusion collection for use with Superset.

You will need a GitHub link to start a stream. Click here to create a Superset instance to try it out as a visualization engine for Fusion:

From that link, click “Create Instance” and select “Superset”.

Once Fusion is up and running, visit the URL listed in the dashboard and log in with the username and password provided there. Once in, you will see a preloaded app alongside the option to create or import a new app. Click the box labeled Sacramento Geospatial on the left to enter the app.

To connect to Superset, visit the IP of your Fusion application on port 8088 (for example, http://<your-instance-ip>:8088).

The username is admin and the password is superset.

Next, connect Superset to your Fusion app. Once you authenticate, click “Sources” in the top nav, and click “Databases” (Sources > Databases). Add the name GeospatialFusion. Then add a connection string.

Here is the structure of the connection string:
hive://admin:<password>@<Internal IP>:8768/default;transportMode=binary?auth=CUSTOM

The Internal IP can be found in your instance dashboard. To locate your Internal IP, click the “Console Log” tab to change your dashboard view. Once you do that, search the page for cloud-init[1111]: ci-info: | ens4 | True. The IP that follows is the one you need to use for the value of Internal IP. Then click “Save”.
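Since only the password and Internal IP vary between instances, it can help to see the string’s fixed and variable parts separated out. A hypothetical helper (the function name is mine, not part of Fusion or Superset):

```python
def fusion_connection_string(password, internal_ip, port=8768):
    """Assemble the Hive connection string Superset needs to reach
    Fusion; only the password and internal IP change per instance."""
    return (
        f"hive://admin:{password}@{internal_ip}:{port}/default"
        ";transportMode=binary?auth=CUSTOM"
    )
```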

To add a table click “Sources” in the top nav and click “Tables” (Sources > Tables). Below, you will find the settings for the Tables.

Databases: GeospatialFusion
Schema: default
Tables: sacramento_geospatial

Click “Save”.

Creating a Basic Pie Chart with the Sample Data

Let’s quickly create a pie chart that graphs the share of restaurants by city in the Sacramento Metro region. To jump in, click “Charts” at the top nav. In the upper right corner of the List Charts view, click the green plus button.

In the create-a-new-chart view, select default.sacramento_geospatial as the data source and Pie Chart as the visualization type. Then click “Create new chart.” In this view, change the Time range on the left to No Filter. For Metrics, select COUNT(*). In the Filters section, modify the SQL statement to COUNT(*) > 25, so we eliminate cities that don’t have many eateries. For GROUP BY, add the CITY_s field. Change the Row limit value to 10,000. And, voila:

(Pie chart rendered in Apache Superset.)

As you continue your journey to improve access to information for your customers with an AI-powered search engine, visualization will be an asset at every stage of your search development lifecycle. When you are thinking about what words to boost, or collaborating with your business partners, having visualizations available to support your analysis can be very helpful.

After you get everything set up, feel free to load your own data into Lucidworks Fusion to check out visualizations in your lab instance.

Stay tuned for PART 3 when we will be exploring geospatial visualizations for our search corpus.

Learn More

The post Supersize Apache Superset with Lucidworks Fusion — Part 2 appeared first on Lucidworks.

February 12, 1809, and Wikipedia’s Evolution / Dan Cohen

Abraham Lincoln and Charles Darwin were both born on February 12, 1809, and this odd fact used to be featured at the top of their Wikipedia entries. As Roy Rosenzweig noted 15 years ago in his groundbreaking essay “Can History be Open Source? Wikipedia and the Future of the Past,” this “affection for surprising, amusing, or curious details” was a key marker separating popular and academic history. At the time, Wikipedia was firmly on the popular side of that line.

Whereas history professors highlighted larger historical themes and the broad context of an individual’s life—placing the arc of one person’s existence within the complex patterns of historiography—the editors of Wikipedia often obsessed about single points and unusual coincidences, such as Al Jolson and Mary Pickford being in the same Ohio town during the 1920 presidential campaign, or Woodrow Wilson having written his initials on the underside of a table in the Johns Hopkins University history department.

Since Roy wrote that essay, I’ve kept an informal log of the lifespan of historical oddities on Wikipedia, which acts as an anecdotal measure of the online encyclopedia’s evolution, or perhaps convergence, with more “serious” history. When Roy gave Wikipedia that serious look in the pages of the Journal of American History—at a time when there was still furious opposition to its use in academic settings, with dire warnings from faculty to undergraduates who relied on it—the Lincoln/Darwin factoid had been on Darwin’s page for over a year, since July 18, 2004. It was placed there by an enthusiastic early Wikipedian with the handle Brutannica. (As Brutannica’s user page on Wikipedia helpfully notes, their handle was “an apparent misunderstanding of a character in the much-missed 18th episode of Pokemon, not from the world’s most renowned encyclopaedia.”)

The line about Charles Darwin having the exact same birthday as Abraham Lincoln lasted almost six years, until June 15, 2010, when Wikipedian Intelligentsium ruthlessly removed it over the objections of Playdagame6991. (Intelligentsium to Playdagame6991: “I don’t see how the bit about Lincoln is relevant.”)

Wikipedia’s early, long-lasting, and more shameful historical problems were of course massive omissions rather than trivial additions like the shared Lincoln/Darwin birthday. The lack of entries for many important women and the overemphasis on Pokemon and Star Wars over entire genres of culture have been far more problematic than the appearance of Woodrow Wilson’s graffiti, and critical efforts have arisen to correct these imbalances.

But the slow-burn effort to correct the nature of historical writing on Wikipedia has been subtler yet still discernible over the last decade, evident in countless small contests like the one between Intelligentsium and Playdagame6991. It would be interesting to do a more systematic analysis of such battles to see how historical writing on Wikipedia has evolved into a form that seems today more recognizable and acceptable to those in the academy.
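One rough way such a systematic analysis could start is by counting revert-style edits in an article's revision summaries (in practice these would come from the MediaWiki revisions API; the sample summaries and marker words below are purely illustrative):

```python
# Rough proxy for editorial "contests" over an article: count edit
# summaries that look like reverts. Real summaries would be fetched
# from the MediaWiki revisions API; this sample list is illustrative.

REVERT_MARKERS = ("revert", "rv ", "undid", "undo")

def count_reverts(summaries):
    """Count summaries containing a revert-like marker (case-insensitive)."""
    return sum(
        any(marker in s.lower() for marker in REVERT_MARKERS)
        for s in summaries
    )

summaries = [
    "Undid revision 12345 by Playdagame6991",
    "I don't see how the bit about Lincoln is relevant",
    "rv unsourced trivia",
    "copyedit",
]
print(count_reverts(summaries))  # 2
```

Tracking how this count per year changes across an article's history would give one crude measure of how contested its framing has been.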

IT Improves Productivity! / David Rosenthal

In The Productivity Paradox David Rotman writes:
Productivity growth in most of the world’s rich countries has been dismal since around 2004. Especially vexing is the sluggish pace of what economists call total factor productivity—the part that accounts for the contributions of innovation and technology. In a time of Facebook, smartphones, self-driving cars, and computers that can beat a person at just about any board game, how can the key economic measure of technological progress be so pathetic? Economists have tagged this the “productivity paradox.”

Some argue that it’s because today’s technologies are not nearly as impressive as we think. The leading proponent of that view, Northwestern University economist Robert Gordon, contends that compared with breakthroughs like indoor plumbing and the electric motor, today’s advances are small and of limited economic benefit. Others think productivity is in fact increasing but we simply don’t know how to measure things like the value delivered by Google and Facebook, particularly when many of the benefits are “free.”
My view is that IT is only one of the factors driving the decrease of productivity in the general economy, but that there are some areas of the economy in which IT is greatly increasing productivity. An explanation is below the fold.

The original productivity paradox was described by Erik Brynjolfsson in 1993's The productivity paradox of information technology:
One of the core issues for economists in the past decade has been the productivity slowdown that began in the early 1970s. Even after accounting for factors such as the oil price shocks, most researchers find that there is an unexplained residual drop in productivity as compared with the first half of the post-war period. The sharp drop in productivity roughly coincided with the rapid increase in the use of IT ... Although recent productivity growth has rebounded somewhat, especially in manufacturing, the overall negative correlation between economy-wide productivity and the advent of computers is behind many of the arguments that IT has not helped US productivity or even that IT investments have been counter-productive.
In The Puzzle of the US Productivity Slowdown Timothy Taylor runs through a number of possible explanations put forward by the Congressional Budget Office, and the CBO's explanation for why they don't apply:
  • Is the productivity slowdown a matter of measurement issues?
  • Is the productivity slowdown a result of slower growth feeding back to reduced productivity growth?
  • Is it a result of less human capital for US workers, either as a result of less experience on the job or reduced growth in education?
  • Is the problem one of overregulation?
  • Is the scientific potential for long-term innovation declining?
The CBO's skepticism of the last of these is based on this observation:
no evidence exists of an abrupt change around 2005 connected to such developments.
But as I discussed in Falling Research Productivity, based on Scott Alexander's Considerations on Cost Disease and Are Ideas Getting Harder to Find? by Nicholas Bloom et al, it is certainly the case that R&D productivity is falling. The simple explanation is in this comment:
Kelvin Stott's 2-part series Pharma's broken business model: Part 1: An industry on the brink of terminal decline and Part 2: Scraping the barrel in drug discovery uses a simple economic model to show that the Internal Rate of Return (IRR) of Pharma companies is already less than their cost of capital, and will become negative in 2020. Stott shows that this is a consequence of the Law of Diminishing Returns; because the most promising research avenues (i.e. the ones promising the greatest return) are pursued first, the returns on a research dollar decrease with time.
It is likely that the slowdown around 2004 was due to a combination of factors, none large in isolation, combining to exceed a critical level. One of them was very probably IT, because the notorious failure rate of large IT projects was driving up the cost while driving down the benefits. These pie charts, showing that the odds of success decay rapidly with size, are based on a study of over 50,000 software projects over 8 years by the Standish Group.

But there are clearly some areas of the economy where IT has greatly improved productivity. One recent example is documented in a report from blockchain analytics firm Chainalysis (hat tip to Technology Review's The Download):
We took a look at hacks that target cryptocurrency organizations such as exchanges. These hacks involve large thefts, often stealing tens or even hundreds of millions of dollars directly from exchanges. Hacking dwarfs all other forms of crypto crime, and it is dominated by two prominent, professional hacking groups. Together, these two groups are responsible for stealing around $1 billion to date, at least 60% of all publicly reported hacks. And given the potential rewards, there’s no question hacking will continue; it is the most lucrative of all crypto crimes.
So, thanks to IT, two small groups mounted heists yielding "around $1B to date". This level of productivity would have been impossible before the advent of IT. Crypto crime is but one small example of the extraordinary productivity of IT-enabled criminals. Others include:
  • I have written before about the immensely profitable business of ransomware. Two years ago, discussing the losses from Internet crime, Quinn Norton wrote:
    The predictions for this year from some analysis is that we’ll hit seventy-five billion in ransomware alone by the end of the year.
    These are total losses; the fraction realized by the ransomware gangs is much less, probably only a few percent. But that's still a level of productivity impossible without IT.
  • Probably even more profitable is advertising click fraud. The Association of National Advertisers wrote about 2017:
    The third annual Bot Baseline Report reveals that the economic losses due to bot fraud are estimated to reach $6.5 billion globally in 2017. This is down 10 percent from the $7.2 billion reported in last year's study. The fraud decline is particularly impressive recognizing that this is occurring when digital advertising spending is expected to increase by 10 percent or more.
    That's $6.5B/year revenue for the click fraudsters. Before IT, criminal gangs grossing $6.5B/year would have had much lower margins. Drug smugglers, for example, would have to spend on planes, boats, staff, and bribes, not to mention raw materials. None of these are needed for click fraud.
  • But these amounts are small change compared to Wall Street's ill-gotten gains in the Global Financial Crisis:
    To begin with, a number of big hedge funds figured it out. Unlike investment banks, however, they couldn't make serious money by securitising loans and selling CDOs (collateralised debt obligations), so they had to wait until the bubble was about to burst and make their money from the collapse. And this they did. Major hedge funds including Magnetar, Tricadia, Harbinger Capital, George Soros, and John Paulson made billions of dollars each by betting against mortgage securities as the bubble ended, and all of them worked closely with Wall Street in order to do so.
    The CDOs and even the underlying mortgages depended upon IT systems such as that operated by Mortgage Electronic Recording Systems (MERS):
    It is the company created and owned by all of the big banks to process title to property in the U.S. Approximately 60% of the nation’s residential mortgages are recorded in the name of MERS.

    MERS is a shell corporation with no employees, but thousands of officers.
    MERS, the banks and the mainstream financial press all say that it was simply to save fees by digitizing mortgage records electronically.

    But as Ellen Brown notes, there is in reality a very different reason that the big banks created MERS:
    The rating agencies required that the conduit be “bankruptcy remote,” which meant it could hold title to nothing ….
    These criminal schemes would have been impossible without the IT systems on Wall Street. The Financial Crisis Cost Every American $70,000, Fed Study Says. The US population is around 325 million, so the cost to Americans alone was around twenty-three trillion dollars. The proportion that ended up with the perpetrators was small, but still vastly exceeds the proceeds of crypto-crime, ransomware, and click-fraud.
Some crime, typically sex and drugs, is included in some countries' GDP computation, but the US:
has no plans “for now” to start counting illegal sex and drugs. “We need to look at the issue more closely to see what data are available before any decision could be made,” said Jeannine Aversa, chief of public affairs and outreach at the United States Bureau of Economic Analysis, in a statement. “We haven’t done any research yet, so we don’t know how much this would add to the U.S. economy as measured by G.D.P.”
So, what we have is a small number of people (the denominator) generating a large amount of income (the numerator), which implies high productivity. But since their income is excluded from the numerator while they themselves are counted in the denominator, the effect is to reduce GDP and thus reported productivity.

In The market for cyber-insurance is growing, The Economist writes that companies are increasingly incurring losses from cyber-crime:
Such mishaps are feeding a fast-growing market for specialist cyber-insurance. Solid numbers are in short supply, but Munich Re, a reinsurer, reckons that a market that wrote $4bn of premiums in 2018 could be writing $8bn-9bn by 2020. Rob Smart of Mactavish, a firm that works with big British insurers, says that “almost all” the firms’ clients have inquired about cyber-insurance in the past couple of years.
Cyber-insurance, and the expenditures incurred in recovering from attacks, are included in the GDP figures, so at least a little of the effect of Internet criminality increases GDP. But, as The Economist makes clear, cyber-insurance is not a panacea. For example:
working out who was behind a particular hack has already made the news. Mondelez, an American food company hit by the NotPetya malware, is suing Zurich, a big insurance firm, for refusing to pay out under a general insurance policy. Zurich cites an exclusion clause for losses related to war, on the ground that the NotPetya attack is thought to have been carried out by Russia.
As I discussed in Correlated Cryptojacking, a much bigger problem is that of correlated risk:
Perhaps the biggest difficulty for insurers is that the risks posed by cyber-attacks are not independent of each other. If an oil refinery in Texas floods, that does not mean one in Paris is any more likely to do so. Insurers build that independence into their risk models, and depend upon it in their calculations of the maximum they may have to pay out in a single year. But a newly found flaw in software can make all users vulnerable simultaneously. Insurers fret that a single big attack could hit many of their clients at once. In the worst case, the value of claims might be more than they could meet. ... Whether the industry can figure out a way to deal with such “risk aggregation” is an open question. As one insider says, it “sort of breaks the whole concept of insurance a bit”.
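The correlated-risk point can be made concrete with a toy Monte Carlo simulation. All parameters here are made up for illustration: the contrast to note is between independent breaches (worst year stays near the mean) and a shared software flaw that hits every client at once:

```python
# Toy simulation: why correlated cyber risk breaks the usual
# insurance model. All parameters are illustrative, not real data.
import random

random.seed(0)
N_CLIENTS = 1000      # insured firms
P_BREACH = 0.01       # annual breach probability per firm
CLAIM = 1.0           # payout per breached firm (arbitrary units)
YEARS = 2000          # simulated years

def worst_year(correlated):
    """Largest total payout seen across all simulated years."""
    worst = 0.0
    for _ in range(YEARS):
        if correlated:
            # One shared flaw: all clients are hit together, or none.
            hits = N_CLIENTS if random.random() < P_BREACH else 0
        else:
            # Independent risks, as with floods in Texas and Paris.
            hits = sum(random.random() < P_BREACH for _ in range(N_CLIENTS))
        worst = max(worst, hits * CLAIM)
    return worst

print("independent worst year:", worst_year(correlated=False))
print("correlated worst year: ", worst_year(correlated=True))
```

Expected payouts are identical in the two regimes, but the correlated tail occasionally demands the full exposure of all 1,000 clients at once, which is exactly the "risk aggregation" insurers worry about.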
What all this shows is that "GDP" and "productivity" are pretty silly things to measure. This is also emphasized by Adam Tooze's tweet of this graph:
Just one more reality check on the relative performance of the advanced economies in terms of labour productivity. Nothing to choose btw Germany, US, Switzerland AND France. The difference is in hours worked/unemployment rate.
The point being the difference between GDP per hour worked, and the more normal graphs of GDP per head. Oh, and ignore Ireland, whose GDP is mostly made up of tax avoidance by large US companies, and Norway, whose GDP is mostly North Sea oil, neither of which involves a lot of work by their population.

UPDATE: LYRASIS and DuraSpace Intent to Merge / DuraSpace News

On January 24, 2019 LYRASIS and DuraSpace announced their intent to merge, to form a robust new home for Community-Supported Programs and Services.

Since then the Boards and staffs of LYRASIS and DuraSpace have been engaged in a due diligence fact-finding process aimed at gathering and synthesizing feedback from stakeholders to aid in leadership’s decision-making.

To date, DuraSpace Executive Director Erin Tripp has met with the Fedora Steering Group, VIVO Leadership Group, DSpace Leadership Group, and Certified DuraSpace Partners of DSpace, and has begun meeting with individual platinum and gold members to share information and answer questions. We will also be holding an open informational webinar:

“Amplifying Impact: LYRASIS and DuraSpace Town Hall Meeting”

We hope our members and stakeholders will join this online event. A recording will be made available following the live event. Also, if you are part of a community of practice or community governance group, we encourage you to compile questions and feedback on behalf of the group and share them with us. Feel free to send to Erin Tripp, Executive Director, DuraSpace at and/or Robert Miller, CEO, LYRASIS at by Friday, February 22.

The post UPDATE: LYRASIS and DuraSpace Intent to Merge appeared first on

Fusion In Action Via Manning’s MEAP Program / Lucidworks

As the big data architect for the County of Sacramento, Guy Sperry was part of the decision-making team that implemented Lucidworks Fusion to unite data across over 35 lines of business. It was 2015 and he was responsible for building an enterprise content management system that would allow relevant search across regional and local government.

“We had to bring together content from fragmented technologies that crossed departmental lines, budgets, regulatory environments, you name it,” he recalled. “And all of it needed to be delivered, through access controls — both internally and externally to the public.”

Guy had been turned on to Fusion through his years of working with Solr. Solr, distributed by the Apache Software Foundation, is largely built by Lucidworks committers. It is also one of the many staples that makes Fusion — well, Fusion.

“While at ‘Sac County,’ we used Fusion to do things beyond the scope of its initial value,” he said.

At the time, Lucidworks actively marketed Fusion as a powerful search platform — but not as a solution for data delivery and analytics. Meanwhile, Guy and his team were busy leveraging Fusion for security analytics, geospatial data delivery, ECM data delivery, aggregate KPI analysis, as well as the all-important property tax fees assessment and analysis.

After attending back-to-back Lucidworks customer advisory board (CAB) meetings and sharing his ideas (and possibly even having some influence on the roadmap), he decided to pitch Manning Publications about the vast possibilities of Fusion. They agreed and early last year, Fusion in Action was set in motion. The book aims to provide instructions on setting up and optimizing performance of the Fusion search platform. And then how to let Fusion loose for all sorts of atypical uses.

Continuous Delivery Via MEAP

When you have a full-time job, writing sometimes has to take a back seat, which is why Manning’s MEAP (Manning Early Access Program) is ideal. The program allows writers to deliver chapters as they are completed.

Through Manning’s MEAP program, you can purchase an advanced copy of Fusion in Action, entitling you to an electronic copy of each chapter, which is sent to you as Sperry completes it. While books can take a year or more to write and publish, with the MEAP program, you don’t need to waste valuable time waiting for its content to be wrapped up with a bow. You can leverage tips from the chapters Guy has already written to make Fusion perform. Any subsequent revisions made to chapters are also sent to you.

Another benefit of the program is that MEAP readers are invited to contribute to the writing process. As Fusion in Action chapters are released, MEAP readers can provide Guy feedback through Manning’s Author Online Forum. Readers’ insights will benefit the author as well as future readers by ultimately improving the content of the published book.

Once Fusion in Action is finalized, MEAP readers will be first to receive the published eBook, before it’s made available to the general public.

Customer Empowerment

Guy joined the Lucidworks team last fall as Director of Technical Enablement. The role, like the book, is all about empowering customers to get maximum value from our products.

Gain access to Guy’s in-depth knowledge of Fusion by signing up for Fusion in Action with the MEAP program here:

For a limited time, use the code fusionlw for a 40% discount.



The post Fusion In Action Via Manning’s MEAP Program appeared first on Lucidworks.

Meta-munging with hammer and file / Library Tech Talk (U of Michigan)


Paul Schaffner gives an introduction to batch editing metadata using tools that have worked well for him in his role in the Text Creation Unit (TCU) within the University of Michigan Library's Digital Content and Collections Department. The instructions and guidance, while originally aimed at cataloguers, can be used by anyone following along with Paul's walkthrough and referring to the suggested resources and links within the article.