Planet Code4Lib

Fighting the Web Flab / David Rosenthal

Source: Frederic Filloux
At Monday Note, Frederic Filloux's Bloated HTML, the best and the worse starts where I've started several times, with the incredibly low density of actual content in today's Web:
When reading this 800-word Guardian story — about half a page of text — your web browser loads the equivalent of 55 pages of HTML code, almost half a million characters. To be precise: an article of 757 words (4,667 characters and spaces) requires 485,527 characters of code ... “useful” text (the human-readable article) weighs less than one percent (0.96%) of the underlying browser code. The rest consists of links (more than 600) and scripts of all types (120 references), related to trackers, advertising objects, analytics, etc.
But he ends on a somewhat less despairing note. Follow me below the fold for a faint ray of hope.

Filloux continues:
In due fairness, this cataract of code loads very fast on a normal connection.
His "normal" connection must be much faster than my home's 3Mbit/s DSL. But then the hope kicks in:
The Guardian technical team was also the first one to devise a solid implementation of Google's new Accelerated Mobile Page (AMP) format. In doing so, it eliminated more than 80% of the original code, making it blazingly fast on a mobile device.
Great, but AMP is still 20 bytes of crud for each byte of content. What's the word for 20 times faster than "blazingly"? The Accelerated Mobile Page project has three components. First, some JavaScript that:
implements all of AMP's best performance practices, manages resource loading and gives you [custom tags], all to ensure a fast rendering of your page. Among the biggest optimizations is the fact that it makes everything that comes from external resources asynchronous, so nothing in the page can block anything from rendering. Other performance techniques include the sandboxing of all iframes, the pre-calculation of the layout of every element on page before resources are loaded and the disabling of slow CSS selectors.
Among the things the JavaScript implements are extended HTML tags that can be used to improve performance:
some HTML tags are replaced with AMP-specific tags (see also HTML Tags in the AMP spec). These custom elements, called AMP HTML components, make common patterns easy to implement in a performant way.
Finally, Google supports the use of AMP with a proxy cache that:
fetches AMP HTML pages, caches them, and improves page performance automatically. When using the Google AMP Cache, the document, all JS files and all images load from the same origin that is using HTTP 2.0 for maximum efficiency.
The cache also validates the pages it caches confirming that:
the page is guaranteed to work, and that it doesn't depend on external resources. The validation system runs a series of assertions confirming the page’s markup meets the AMP HTML specification.
Source: Frederic Filloux
Filloux then reports some interesting comparisons:
As an admittedly biased reference point, I took one of the first texts, World Wide Web Summary, written in HTML by its inventor Tim Berners-Lee. Published in 1991, it probably is one of the purest, most barebones forms of hypertext markup language: less than 4,200 characters of readable text for less than 4,600 characters of code. That’s a 90% usefulness rate as shown in the table below (you can also refer to my original Google Sheet here, to get precise numbers, story URLs and formulae).
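Filloux's "usefulness rate" is simple to approximate yourself. The sketch below, using only Python's standard-library HTML parser, strips markup and scripts from a page and divides the readable characters by the total. It is a rough approximation of his method, not his actual spreadsheet formula:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the human-readable text of an HTML document."""
    def __init__(self):
        super().__init__()
        self.skip = 0          # depth inside <script>/<style> elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.chunks.append(data)

def usefulness_rate(html: str) -> float:
    """Readable characters as a fraction of total page characters."""
    parser = TextExtractor()
    parser.feed(html)
    readable = "".join(parser.chunks).strip()
    return len(readable) / len(html)

# A toy page: only "Hello" is readable, everything else is markup and script.
page = "<html><head><script>var x=1;</script></head><body><p>Hello</p></body></html>"
rate = usefulness_rate(page)
```

Run against a real news page, the same function reproduces the kind of single-digit percentages in Filloux's table.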
The table (click on the image above) is interesting for the wide range of "usefulness rate", from 91% to 1%:
The big surprise (at least for me) comes from the Progressive Web App implemented by the Washington Post. The Plain HTML page offers roughly the same content as the PWA version, but with a huge gain in HTML size.
The Washington Post PWA page uses less than one-tenth as many bytes to deliver equivalent content. That's double the improvement The Guardian got with AMP. Progressive Web Apps are a technique created by Google about a year ago for building Web pages that use local storage, service workers and asynchronous behavior to provide app-like user experiences:
Google is just starting to promote the PWA on a large scale and the tools are already available. ... Because it supports Push notifications and other features until now reserved to native apps, PWA has great potential for publishers
It is clearly capable of impressive performance gains, with only about 1 byte of crud for each byte of content. Filloux is equivocal about the prospects for AMP and PWA. Although Google has ways of punishing sites that don't get with the program, I'm more pessimistic. The tools people use to generate their pages emit HTML that is just brain-dead (e.g. the same enormous <div> specification on adjacent phrases). Only people who simply don't care could put out stuff like this.

[Updated to correct ungrammatical sentence]

The Signal is Evolving / Library of Congress: The Signal


Minerva, Roman goddess of wisdom. Mosaic by Elihu Vedder within central arched panel leading to the Visitor’s Gallery. Library of Congress Thomas Jefferson Building, Washington, D.C. Digital photograph by Carol Highsmith, 2007. LC call number LOT 13860.

When The Signal debuted over eight years ago, its focus was exclusively on the then-new challenge of digital preservation, a focus its original URL reflected. The Signal was a forum for news and information about digital preservation — unique problems and solutions, standards, collaborations and achievements. The Signal’s authors interviewed leaders in the field, profiled colleagues and drew attention to exemplary projects.

In time, The Signal became a leading source of information about digital preservation. The success of The Signal’s community engagement was evident in the volume of responses to our blog posts and the dialogs they sparked; some posts still attract readers and comments years after their original publication.

The scope of The Signal has grown organically beyond digital preservation, and we are reflecting that growth by changing The Signal’s URL. Old links will still work but will redirect to the new URL. If you subscribe to an RSS feed, please update it to the new address.

We will continue to share information about Library of Congress digital initiatives and cover broad topics such as digital humanities, digital stewardship, crowdsourcing, computational research, scholar labs, data visualization, digital preservation and access, eBooks, rights issues, metadata, APIs, data hosting, technology sharing and innovative trends.


Improving Search for Rackspace Email / SearchHub

As we count down to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting George Bailey and Cameron Baker’s talk, “Rackspace Email’s solution for indexing 50k documents per second”.

Customers of Rackspace Email have always needed the ability to search and review login, routing, and delivery information for their emails. In 2008, a solution was created using Hadoop MapReduce to process all of the logs from hundreds of servers and create Solr 1.4 indexes that would provide the search functionality. Over the next several years, the number of servers generating the log data grew from hundreds to several thousand, which required the cluster of Hadoop and Solr 1.4 servers to grow to ~100 servers. This growth caused the MapReduce jobs for indexing the data to take anywhere from 20 minutes to several hours.

In 2015, Rackspace Email set out to solve this ever-growing need to index and search billions of events from thousands of servers and decided to leverage SolrCloud 5.1. This talk covers how Rackspace replaced ~100 physical servers with 10 and improved functionality to allow documents to be indexed and searchable within 5 seconds.

George Bailey is a Software Developer for Rackspace Email Infrastructure.

Cameron Baker is a Linux Systems Engineer for Rackspace Email Infrastructure.

Rackspace Email’s Solution for Indexing 50K Documents per Second: Presented by George Bailey & Cameron Baker, Rackspace (slides from Lucidworks)

Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…


Evergreen 2014 / Equinox Software

This past weekend I visited a farm in Central Washington and was able to see the full life cycle of crop production.  In one area, closed and light controlled, seeds germinate into small seedlings.  When large enough, the seedlings are tempered and prepared for movement out to the greenhouse.  In the greenhouse, the plants are carefully monitored and cultivated as they grow.  The last phase is moving the plants, now hardy and larger, out into the open air where, under the sun, they grow and fully develop for harvest.  My visit to the farm came at just the right time—there were fully grown plants ready for harvesting within the next few weeks and new seedlings, which will become next year’s crop, were just starting to grow.  While taking in this cyclical process of growth and harvest I couldn’t help but think about the growth of Evergreen over the years.

2014 is the year that saw the first seeds planted for the next generation of Evergreen.  While we all know and love the XUL staff client, the power and flexibility of newer technologies, such as AngularJS, was leading the Evergreen community to explore new options for a web-based staff interface.  In January 2014, a prototype for the web client was released to the Evergreen community, thanks to the research and work of Bill Erickson and the rest of the Equinox development team.  Bill planted those first seeds and gave the community something to cultivate.  After evaluating the prototype, the community came together to move forward with the project. With the support of many development partners (BC Libraries Cooperative, Bibliomation, C/W MARS, GPLS, Grand Rapids Public Library, Howe Library, Kenton County Public Library, MassLNC, NC Cardinal, OhioNET, PaILS, Pioneer Library System, and SC LENDS), the web client project became a reality.  And with that, the project moved into the greenhouse, where real growth and careful cultivation could occur.

Like staging the crop on the farm, development of the web client was broken up into sprints, tackling the modules individually to allocate proper time for each stage of growth and development.  Since 2014, Equinox has continued steady development on the web client sprints.  The goal of the web client was to maintain feature parity with the old client by porting over newer HTML interfaces and re-writing the older XUL interfaces.  Happily, and with much input from the users, many improvements to use and usability have been incorporated throughout the process.  In order to allow the web client to grow, the community decided to stop accepting new features into the XUL client, but development did not cease.  New features have been developed alongside the web client, and upon implementation there will be some new features, such as customizable copy alerts and statistical popularity badges, along with the new browser-based interface.

The web client is currently in the last stages of the greenhouse phase of development.  Sprints 1, 2, and 3, Circulation, Cataloging, and Administration/Reports, respectively, are complete.  Sprint 4, Acquisitions and Serials, is currently in development and will be completed this fall. Sprints 5 (Booking, Offline Mode, etc.) and 6 (Bug Fixing) will round out the development phase and, upon completion, the Evergreen web client will move out of the greenhouse and into the community for use where it will continue to grow organically to meet the needs of the community.

As a trainer, I introduce new libraries to Evergreen and the Evergreen community and help translate their workflows to a new interface and ILS.  Evergreen feels like home to me and I hope that I have been able to help other libraries feel at home with Evergreen as well.  Through community support and development, Evergreen has undergone tremendous growth in the past 10 years.  It is constantly evolving and becoming a stronger ILS that meets the needs of its users.  The web client is the next phase of this evolution and it is a big step forward.  I’m looking forward to getting to know “Webby” and seeing what the harvest will bring in the next 10 years.  

–Angela Kilsdonk, Education Manager

This is the ninth in our series of posts leading up to Evergreen’s Tenth birthday.

Linked Open Data Visualisation at #GLAMVR16 / Conal Tuohy

On Thursday last week I flew to Perth, in Western Australia, to speak at an event at Curtin University on visualisation of cultural heritage. Erik Champion, Professor of Cultural Visualisation, who organised the event, had asked me to talk about digital heritage collections and Linked Open Data (“LOD”).

The one-day event was entitled “GLAM VR: talks on Digital heritage, scholarly making & experiential media”, and combined presentations and workshops on cultural heritage data (GLAM = Galleries, Libraries, Archives, and Museums) with advanced visualisation technology (VR = Virtual Reality).

The venue was the Curtin HIVE (Hub for Immersive Visualisation and eResearch); a really impressive visualisation facility at Curtin University, with huge screens and panoramic and 3d displays.

There were about 50 people in attendance, and there would have been over a dozen different presenters, covering a lot of different topics, though with common threads linking them together. I really enjoyed the experience, and learned a lot. I won’t go into the detail of the other presentations, here, but quite a few people were live-tweeting, and I’ve collected most of the Twitter stream from the day into a Storify story, which is well worth a read and following up.

My presentation

For my part, I had 40 minutes to cover my topic. I’d been a bit concerned that my talk was more data-focused and contained nothing specifically about VR, but I think on the day the relevance was actually apparent.

The presentation slides are available here as a PDF: Linked Open Data Visualisation

My aims were:

  • At a tactical level, to explain the basics of Linked Data from a technical point of view (i.e. to answer the question “what is it?”); to show that it’s not as hard as it’s usually made out to be; and to inspire people to get started with generating it, consuming it, and visualising it.
  • At a strategic level, to make the case for using Linked Data as a basis for visualisation; that the discipline of adopting Linked Data technology is not at all a distraction from visualisation, but rather a powerful generic framework on top of which visualisations of various kinds can be more easily constructed, and given the kind of robustness that real scholarly work deserves.

Linked Data basics

I spent the first part of my talk explaining what Linked Open Data means; starting with “what is a graph?” and introducing RDF triples and Linked Data. Finally I showed a few simple SPARQL queries, without explaining SPARQL in any detail, but just to show the kinds of questions you can ask with a few lines of SPARQL code.

What is an RDF graph?

While I explained about graph data models, I saw attendees nodding, which I took as a sign of understanding and not that they were nodding off to sleep; it was still pretty early in the day for that.

One thing I hoped to get across in this part of the presentation was just that Linked Data is not all that hard to get into. Sure, it’s not a trivial technology, but barriers to entry are not that high; the basics of it are quite basic, so you can make a start and do plenty of useful things without having to know all the advanced stuff. For instance, there are a whole bunch of RDF serializations, but in fact you can get by with knowing only one. There are a zillion different ontologies, but again you only need to know the ontology you want to use, and you can do plenty of things without worrying about a formal ontology at all. I’d make the case for university eResearch agencies, software carpentry, and similar efforts, to be offering classes and basic support in this technology, especially in library and information science, and the humanities generally.

Linked Data as architecture

People often use the analogy of building, when talking about making software. We talk about a “build process”, “platforms”, and “architecture”, and so on. It’s not an exact analogy, but it is useful. Using that analogy, Linked Data provides a foundation that you can build a solid edifice on top of. If you skimp on the foundation, you may get started more quickly, but you will encounter problems later. If your project is small, and if it’s a temporary structure (a shack or bivouac), then architecture is not so important, and you can get away with skimping on foundations (and you probably should!), but the larger the project is (an office building), and the longer you want it to persist (a cathedral), the more valuable a good architecture will be. In the case of digital scholarly works, the common situation in academia is that weakly-architected works are being cranked out and published, but being hard to maintain, they tend to crumble away within a few years.

Crucially, a Linked Data dataset can capture the essence of what needs to be visualised, without being inextricably bound up with any particular genre of visualisation, or any particular visualisation software tool. This relative independence from specific tools is important because a dataset which is tied to a particular software platform needs to rely on the continued existence of that software, and experience shows that individual software packages come and go depressingly quickly. Often only a few years are enough for a software program to be “orphaned”, unavailable, obsolete, incompatible with the current software environment (e.g. requires Windows 95 or IE6), or even, in the case of software available online as a service, for it to completely disappear into thin air, if the service provider goes bust or shuts down the service for reasons of their own. In these cases you can suddenly realise you’ve been building your “scholarly output” on sand.

By contrast, a Linked Data dataset is standardised, and it’s readable with a variety of tools that support that standard. That provides you with a lot of options for how you could go on to visualise the data; that generic foundation gives you the possibility of building (and rebuilding) all kinds of different things on top of it.

Because of its generic nature and its openness to the Web, Linked Data technology has become a broad software ecosystem which already has a lot of people’s data riding on it; that kind of mass investment (a “bandwagon”, if you like) is insurance against it being wiped out by the whims or vicissitudes of individual businesses. That’s the major reason why a Linked Data dataset can be archived and stored long term with confidence.

Linked Open Data is about sharing your data for reuse

Finally, by publishing your dataset as Linked Open Data (independently of any visualisations you may have made of it), you are opening it up to reuse not only by yourself, but by others.

The graph model allows you to describe the meaning of the terms you’ve used (i.e. the analytical categories used in your data can themselves be described and categorised, because everything is a node in a graph). This means that other people can work out what your dataset actually means.

The use of URIs for identifiers means that others can easily cite your work and effectively contribute to your work by creating their own annotations on it. They don’t need to impinge on your work; their annotations can live somewhere else altogether and merely refer to nodes in your graph by those nodes’ identifiers (URIs). They can comment; they can add cross-references; they can assert equivalences to nodes in other graphs, elsewhere. Your scholarly work can break out of its box, to become part of an open web of knowledge that grows and ramifies and enriches us all.

Statistical Popularity Badges / Equinox Software

Statistical Popularity Badges allow libraries to set popularity parameters that define popularity badges, which bibliographic records can earn if they meet the set criteria.  Popularity badges can be based on factors such as circulation and hold activity, bibliographic record age, or material type.  The popularity badges that a record earns are used to adjust catalog search results to display more popular titles (as defined by the badges) first.  Within the OPAC there is a new sort option called “Sort by Popularity” which will allow users to sort records based on the popularity assigned by the popularity badges.

Popularity Rating and Calculation

Popularity badge parameters define the criteria a bibliographic record must meet to earn the badge, as well as which bibliographic records are eligible to earn the badge.  For example, the popularity parameter “Circulations Over Time” can be configured to create a badge that is applied to bibliographic records for DVDs.  The badge can be configured to look at circulations within the last 2 years, but assign more weight or popularity to circulations from the last 6 months.

Multiple popularity badges may be applied to a bibliographic record.  For each applicable popularity badge, the record will be rated on a scale of 1-5, where a 5 indicates the most popularity.  Evergreen will then assign an overall popularity rating to each bibliographic record by averaging all of the popularity badge points earned by the record.  The popularity rating is stored with the record and will be used to rank the record within search results when the popularity badge is within the scope of the search.  The popularity badges are recalculated on a regular and configurable basis by a cron job.  Popularity badges can also be recalculated by an administrator directly on the server.
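The overall rating described above is a straightforward average. A minimal sketch (the function name is illustrative, not Evergreen's actual schema or code):

```python
# Each badge rates a record on a 1-5 scale; the record's overall
# popularity is the average of the ratings it earned.
def overall_popularity(badge_ratings):
    """Average the 1-5 ratings a record earned from its badges."""
    if not badge_ratings:
        return 0.0          # no badges earned, no popularity boost
    return sum(badge_ratings) / len(badge_ratings)

rating = overall_popularity([5, 4, 3])  # a record holding three badges
```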

Creating Popularity Badges

There are two main types of popularity badges:  point-in-time popularity (PIT), which looks at the popularity of a record at a specific point in time—such as the number of current circulations or the number of open hold requests; and temporal popularity (TP), which looks at the popularity of a record over a period of time—such as the number of circulations in the past year or the number of hold requests placed in the last six months.

The following popularity badge parameters are available for configuration:

  • Holds Filled Over Time (TP)
  • Holds Requested Over Time (TP)
  • Current Hold Count (PIT)
  • Circulations Over Time (TP)
  • Current Circulation Count (PIT)
  • Out/Total Ratio (PIT)
  • Holds/Total Ratio (PIT)
  • Holds/Holdable Ratio (PIT)
  • Percent of Time Circulating (Takes into account all circulations, not specific period of time)
  • Bibliographic Record Age (days, newer is better) (TP)
  • Publication Age (days, newer is better) (TP)
  • On-line Bib has attributes (PIT)
  • Bib has attributes and copies (PIT)
  • Bib has attributes and copies or URIs (PIT)
  • Bib has attributes (PIT)

To create a new Statistical Popularity Badge:

  1. Go to Administration > Local Administration > Statistical Popularity Badges.
  2. Click on Actions > Add badge.
  3. Fill out the following fields as needed to create the badge:

(Note: only Name, Scope, Weight, Recalculation Interval, Importance Interval, and Discard Value Count are required)

  • Name: Library assigned name for badge.  Each name must be unique.  The name will show up in the OPAC record display.  For example: Most Requested Holds for Books-Last 6 Months.  Required field.
  • Description: Further information to provide context to staff about the badge.
  • Scope: Defines the owning organization unit of the badge.  Badges will be applied to search result sorting when the Scope is equal to, or an ancestor, of the search location.  For example, a branch specific search will include badges where the Scope is the branch, the system, and the consortium.  A consortium level search will include only badges where the Scope is set to the consortium.  Item specific badges will apply only to records that have items owned at or below the Scope.  Required field.
  • Weight:  Can be used to indicate that a particular badge is more important than the other badges that the record might earn.  The weight value serves as a multiplier of the badge rating.  Required field with a default value of 1.
  • Age Horizon:  Indicates the time frame during which events should be included for calculating the badge.  For example, a popularity badge for Most Circulated Items in the Past Two Years would have an Age Horizon of ‘2 years’.   The Age Horizon should be entered as a number followed by ‘day(s)’, ‘month(s)’, ‘year(s)’, such as ‘6 months’ or ‘2 years’.  Use with temporal popularity (TP) badges only.
  • Importance Horizon: Used in conjunction with Age Horizon, this allows more recent events to be considered more important than older events.  A value of zero means that all events included by the Age Horizon will be considered of equal importance.  With an Age Horizon of 2 years, an Importance Horizon of ‘6 months’ means that events, such as checkouts, that occurred within the past 6 months will be considered more important than the circulations that occurred earlier within the Age Horizon.
  • Importance Interval:  Can be used to further divide up the timeframe defined by the Importance Horizon.  For example, if the Importance Interval is ‘1 month’, Evergreen will combine all of the events within that month for adjustment by the Importance Scale (see below).  The Importance Interval should be entered as a number followed by ‘day(s)’, ‘week(s)’,  ‘month(s)’, ‘year(s)’, such as ‘6 months’ or ‘2 years’.  Required field.
  • Importance Scale: The Importance Scale can be used to assign additional importance to events that occurred within the most recent Importance Interval.  For example, if the Importance Horizon is ‘6 months’ and the Importance Interval is ‘1 month’, the Importance Scale can be set to ‘6’ to indicate that events that happened within the last month will count 6 times, events that happened 2 months ago will count 5 times, etc.
  • Percentile:  Can be used to assign a badge to only the records that score above a certain percentile.  For example, it can be used to indicate that you only want to assign the badge to records in the top 5% of results by setting the field to ‘95’.  To optimize the popularity badges, percentile should be set between 95-99 to assign a badge to the top 5%-1% of records.
  • Attribute Filter:  Can be used to assign a badge to records that contain a specific Record Attribute.  Currently this field can be configured by running a report (see note below) to obtain the JSON data that identifies the Record Attribute.  The JSON data from the report output can be copied and pasted into this field.   A new interface for creating Composite Record Attributes will be implemented with future development of the web client.
    • To run a report to obtain JSON data for the Attribute Filter, use SVF Record Attribute Coded Value Map as the template Source.  For Displayed Fields, add Code, ID, and/or Description from the Source; also display the Definition field from the Composite Definition linked table.  This field will display the JSON data in the report output.  Filter on the Definition from the Composite Definition linked table and set the Operator to ‘Is not NULL’.
  • Circ Mod Filter: Apply the badge only to items with a specific circulation modifier.  Applies only to item related badges as opposed to “bib record age” badges, for example.
  • Bib Source Filter:  Apply the badge only to bibliographic records with a specific source.
  • Location Group Filter:  Apply the badge only to items that are part of the specified Copy Location Group.  Applies only to item related badges.
  • Recalculation Interval: Indicates how often the popularity value of the badge should be recalculated for bibliographic records that have earned the badge.  Recalculation is controlled by a cron job.  Required field with a default value of 1 month.
  • Fixed Rating: Can be used to set a fixed popularity value for all records that earn the badge.  For example, the Fixed Rating can be set to 5 to indicate that records earning the badge should always be considered extremely popular.
  • Discard Value Count:  Can be used to prevent certain records from earning the badge to make Percentile more accurate by discarding titles that are below the value indicated.   For example, if the badge looks at the circulation count over the past 6 months, Discard Value Count can be used to eliminate records that had too few circulations to be considered “popular”.  If you want to discard records that only had 1-3 circulations over the past 6 months, the Discard Value Count can be set to ‘3’.  Required field with a default value of 0.
  • Last Refresh Time: Displays the last time the badge was recalculated based on the Recalculation Interval.
  • Popularity Parameter: Types of TP and PIT factors described above that can be used to create badges to assign popularity to bibliographic records.
  4. Click OK to save the badge.
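The interaction of Age Horizon, Importance Interval, and Importance Scale can be sketched in a few lines. The weighting below is inferred from the ‘6 months / 1 month / scale 6’ example in the field descriptions above (most recent interval counts 6x, the next 5x, and so on, never below 1x); it is an illustration, not Evergreen's actual implementation:

```python
# Weight each interval's event count by how recent the interval is.
# events_per_interval[0] is the most recent interval (e.g. last month).
def weighted_event_count(events_per_interval, importance_scale):
    """Sum event counts, weighting recent intervals more heavily."""
    total = 0
    for intervals_ago, count in enumerate(events_per_interval):
        weight = max(importance_scale - intervals_ago, 1)
        total += count * weight
    return total

# 10 checkouts last month, 10 the month before, over a 6-month horizon:
# 10*6 + 10*5 = 110 weighted events.
weighted = weighted_event_count([10, 10, 0, 0, 0, 0], importance_scale=6)
```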

New Global Flags

OPAC Default Sort:  can be used to set a default sort option for the catalog.  Users can always override the default by manually selecting a different sort option while searching.

Maximum Popularity Importance Multiplier:  used with the Popularity Adjusted Relevance sort option in the OPAC.  Provides a scaled adjustment to relevance score based on the popularity rating earned by bibliographic records.  See below for more information on how this flag is used.

Sorting by Popularity in the OPAC

Within the stock OPAC template there is a new option for sorting search results called “Most Popular”.  Selecting “Most Popular” will first sort the search results based on the popularity rating determined by the popularity badges and will then apply the default “Sort by Relevance”.  This option will maximize the popularity badges and ensure that the most popular titles appear higher up in the search results.

There is a second new sort option called “Popularity Adjusted Relevance” that can be turned on by editing the ctx.popularity_sort setting in the OPAC template configuration.  The “Popularity Adjusted Relevance” sort option can be used to find a balance between popularity and relevance in search results.  For example, it can help ensure that records that are popular, but not necessarily relevant to the search, do not supersede records that are both popular and relevant in the search results.  It does this by sorting search results using an adjusted version of Relevance sorting.

When sorting by relevance, each bibliographic record is assigned a baseline relevance score between 0 and 1, with 0 being not relevant to the search query and 1 being a perfect match.  With “Popularity Adjusted Relevance” the baseline relevance is adjusted by a scaled version of the popularity rating assigned to the bibliographic record.  The scaled adjustment is controlled by a Global Flag called “Maximum Popularity Importance Multiplier” (MPIM).  The MPIM takes the average popularity rating of a bibliographic record (1-5) and creates a scaled adjustment that is applied to the baseline relevance for the record.  The adjustment can be between 1.0 and the value set for the MPIM.

For example, if the MPIM is set to 1.2, a record with an average popularity badge score of 5 (maximum popularity) would have its relevance multiplied by 1.2—in effect giving it the maximum increase of 20% in relevance.  If a record has an average popularity badge score of 2.5, the baseline relevance of the record would be multiplied by 1.1 (due to the popularity score scaling the adjustment to half way between 1.0 and the MPIM of 1.2) and the record would receive a 10% increase in relevance.  A record with a popularity badge score of 0 would be multiplied by 1.0 (due to the popularity score being 0) and would not receive a boost in relevance.
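The MPIM arithmetic described above reduces to a linear interpolation between 1.0 and the MPIM, driven by the popularity rating. A sketch of that calculation (the function and parameter names are illustrative, not Evergreen's internals):

```python
# Scale a record's baseline relevance by its popularity rating.
# popularity runs 0-5; the multiplier runs from 1.0 (popularity 0)
# up to mpim (popularity 5), matching the worked examples above.
def adjusted_relevance(baseline, popularity, mpim):
    """baseline in [0, 1]; popularity in [0, 5]; mpim >= 1.0."""
    multiplier = 1.0 + (popularity / 5.0) * (mpim - 1.0)
    return baseline * multiplier

# With MPIM 1.2: popularity 5 -> x1.2, popularity 2.5 -> x1.1,
# popularity 0 -> x1.0, as in the examples in the text.
boosted = adjusted_relevance(0.5, 5, 1.2)
```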

Popularity Badge Example

A popularity badge called “Long Term Holds Requested” has been created which has the following parameters:

Popularity Parameter:  Holds Requested Over Time

Scope: CONS

Weight: 1 (default)

Age Horizon: 5 years

Percentile: 99

Recalculation Interval: 1 month (default)

Discard Value Count: 0 (default)

This popularity badge will rate bibliographic records based on the number of holds that have been placed on them over the past 5 years, and will only apply the badge to the top 1% of records (99th percentile).
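
The percentile cutoff works roughly like this sketch (a simplified illustration in Python; the real calculation happens inside Evergreen’s database):

```python
def top_percentile(hold_counts, percentile=99):
    """Return the hold counts in the top (100 - percentile)% of
    records; only these records would earn the badge."""
    ranked = sorted(hold_counts, reverse=True)
    keep = max(1, round(len(ranked) * (100 - percentile) / 100))
    return ranked[:keep]

# 100 records with 1..100 holds each: a 99th-percentile badge goes
# only to the single busiest record.
top_percentile(range(1, 101))  # [100]
```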

If a keyword search for harry potter is conducted and the sort option “Most Popular” is selected, Evergreen will apply the popularity rankings earned from badges to the search results.

harry potter search

Title search: harry potter. Sort by: Most Popular.

harry potter popularity title search

The popularity badge also appears in the bibliographic record display in the catalog. The name of the badge earned by the record and the popularity rating are displayed in the Record Details.

A popularity badge of 5.0/5.0 has been applied to the most popular bibliographic records where the search term “harry potter” is found in the title. In the image above, the popularity badge has identified records from the Harry Potter series by J.K. Rowling as the most popular titles matching the search and has listed them first in the search results.

harry potter record detail badge display

Copy Alerts / Equinox Software

The Copy Alerts feature allows library staff to add customized alert messages to copies. The copy alerts will appear when a specific event takes place, such as when the copy is checked in, checked out, or renewed. Alerts can be temporary or persistent: temporary alerts will be disabled after the initial alert and acknowledgement from staff, while persistent alerts will display each time the alert event takes place. Copy Alerts can be configured to display at the circulating or owning library only or, alternatively, when the library at which the alert event takes place is not the circulating or owning library. Copy Alerts at check in can also be configured to provide options for the next copy status that should be applied to an item. Library administrators will have the ability to create and customize Copy Alert Types and to suppress copy alerts at specific org units.

Adding a Copy Alert

Copy Alerts can be added to new copies or existing copies using the Volume/Copy Editor. They can also be added directly to items through the Check In, Check Out, Renew, and Item Status screens.

To add a Copy Alert in the Volume/Copy Editor:

1. Within the Volume/Copy Editor, scroll to the bottom of the screen and click on Copy Alerts.


2. A New Copy Alert window will pop up.


3. Select an alert Type and enter an additional alert message if needed. Check the box next to Temporary if this alert should not appear after the initial alert is acknowledged. Leaving the Temporary box unchecked will create a persistent alert that will appear each time the action to trigger the alert occurs, such as check in or check out.


4. Click OK to save the new Copy Alert. After a Copy Alert has been added, clicking on the Copy Alerts button in the Volume/Copy Editor will allow you to add another Copy Alert and to view and edit Existing Copy Alerts.


5. Make any additional changes to the item record and click Store Selected to store these changes and the new copy alert(s) to the Completed Copies tab. If you are done modifying the copy, click Save & Exit to finalize the changes.


To add a Copy Alert from the Check In, Check Out, or Renewal screens:

1. Navigate to the appropriate screen, for example to Circulation>Check In.
2. Scan in the item barcode.
3. Select the item row and go to Actions>Add Copy Alerts or right click on the item row and select Add Copy Alerts.


4. The Add Copy Alert window will pop up. Select the alert Type, add an additional alert message if needed, and click OK to save. This alert will be added to the copy.


To add a Copy Alert from the Item Status screen:

1. Go to the Detail View of the Item Status screen.
2. In the bottom left-hand corner of the item record there is a Copy Alerts option. Click Add to create a new copy alert.

3. The Add Copy Alert window will pop up. Select the alert Type, add an additional alert message if needed, and click OK to save. This alert will be added to the copy.

Triggering a Copy Alert

The Copy Alert will appear when the action required to trigger the alert occurs. For example, the Normal Checkin Alert will appear when the item is checked in:


If Next Status options have been configured for the Checkin Alert, staff will see a drop-down menu that allows them to select the next Status for the copy:


Managing Copy Alerts

Copy Alerts can be managed from the Item Status screen. Within the Quick Summary tab of the Detailed View of an item, click on Manage to view and remove copy alerts.



Administration of Copy Alerts

Copy Alert Types

Copy Alert Types are created and managed in Administration>Local Administration>Copy Alert Types. Copy Alert Types define the action and behavior of an alert message type. The Alert Types included in a stock installation of Evergreen are:

• Normal checkout
• Normal checkin
• Checkin of missing copy
• Checkin of lost-and-paid copy
• Checkin of damaged copy
• Checkin of claims-returned copy
• Checkin of long overdue copy
• Checkin of claims-never-checked-out copy
• Checkin of lost copy

To create a new Copy Alert Type:

1. Go to Administration>Local Administration>Copy Alert Types.
2. Click on Create and fill out the following fields as needed:
Name: name of the Copy Alert Type.
Active: indicates if the alert type is currently in use (Yes) or not (No).
State: indicates the Copy Status of the item at the time of the event.
Event: the action that takes place in the ILS to trigger the alert.
Scope Org Unit: indicates which org unit(s) the alert type will apply to.
Next Status: can be used with temporary Checkin Alerts only. If a next status is configured, staff will be presented with a list of statuses to choose from when the item is checked in. Next statuses should be configured by using the Copy Status ID # surrounded by curly brackets. For example {7, 11}.
Renewing?: indicates if the alert should appear during a renewal.
Invert location?: if set to yes, this setting will invert the following two settings. For example, if an alert is set to appear at the Circulating Library only, inverting the location will cause the alert to appear at all libraries except the Circulating Library.
At Circulation Library?: indicates if the alert should appear at the circulation library only.
At Owning Library?: indicates if the alert should appear at the owning library only.
3. Click Save.

To edit an existing Copy Alert Type:

1. Go to Administration>Local Administration>Copy Alert Types.
2. Click on the type and go to Actions>Edit or right-click and select Edit.
3. Make changes to the existing configuration and click Save.


Copy Alert Suppression

The Copy Alert Suppression interface can be used to suppress alert types at a specific org unit. Suppression of alerts will adhere to the organization unit hierarchy. For example, if an alert is suppressed at the System level, it will be suppressed for all descendant branches.

To suppress an alert type:

1. Go to Administration>Local Administration>Copy Alert Suppression.
2. Click Create and select the Alert Type that you want to suppress from the drop down menu.
3. Next, select the Org Unit at which the alert should be suppressed.
4. Click Save.


NEW RELEASE: Message-based Integrations for Fedora / DuraSpace News

From Aaron Coburn, Programmer and Systems Administrator, Amherst College

Amherst, MA  I would like to announce the immediate availability of version 4.6.0 of the Fedora Messaging Toolbox.

The messaging toolbox is designed to support a variety of asynchronous integrations with external tools and services, such as a Solr search engine or an external Triplestore. Version 4.6.0 of the messaging toolbox is compatible with both the forthcoming 4.6.0 release of the Fedora Commons server and previous releases of Fedora.

Learn More About Scholars@Duke / DuraSpace News

From Julia Trimmer, Manager, Faculty Data Systems & Analysis, Office of the Provost, Duke University

Durham, NC  Will you be attending the Symplectic User Conference at Duke University on September 13 and 14?  If you would like to get together around that event to learn more about VIVO at Duke University, members of the Scholars@Duke team are available to meet before or after the event.

NEW Fedora Repository Web Site / DuraSpace News

Austin, TX  DuraSpace is pleased to announce that the Fedora team recently completed a redesign of the Fedora web site. The site was designed in consultation with members of the Fedora Leadership Group and reflects a modern, mobile-friendly approach that makes it easy to find key items first.

Blueprint for a system surrounding Catholic social thought & human rights / Eric Lease Morgan

This posting elaborates upon one possible blueprint for comparing & contrasting various positions in the realm of Catholic social thought and human rights.

We here in the Center For Digital Scholarship have been presented with a corpus of documents which can be broadly described as position papers on Catholic social thought and human rights. Some of these documents come from the Vatican, and some of these documents come from various governmental agencies. There is a desire by researchers & scholars to compare & contrast these documents on the paragraph level. The blueprint presented below illustrates one way — a system/flowchart — this desire may be addressed:


The following list enumerates the flow of the system:

  1. Corpus creation – The system begins on the right with sets of documents from the Vatican as well as the various governmental agencies. The system also begins with a hierarchical “controlled vocabulary” outlined by researchers & scholars in the field and designed to denote the “aboutness” of individual paragraphs in the corpus.
  2. Manual classification – Reading from left to right, the blueprint next illustrates how subsets of document paragraphs will be manually assigned to one or more controlled vocabulary terms. This work will be done by people familiar with the subject area as well as the documents themselves. Success in this regard is directly proportional to the volume & accuracy of the classified documents. At the very least, a few hundred paragraphs need to be consistently classified under each of the controlled vocabulary terms in order for the next step to be successful.
  3. Computer “training” – Because the number of paragraphs from the corpus is too large for manual classification, a process known as “machine learning” will be employed to “train” a computer program to do the work automatically. If it is assumed the paragraphs from Step #2 have been classified consistently, then it can also be assumed that each set of similarly classified documents will have identifiable characteristics. For example, documents classified with the term “business” may often include the word “money”. Documents classified as “government” may often include “law”, and documents classified as “family” may often include the words “mother”, “father”, or “children”. By counting & tabulating the existence & frequency of individual words (or phrases) in each of the sets of manually classified documents, it is possible to create computer “models” representing each set. The models will statistically describe the probabilities of the existence & frequency of words in a given classification. Thus, the output of this step will be two representations, one for the Vatican documents and another for the governmental documents.
  4. Automated classification – Using the full text of the given corpus as well as the output of Step #3, a computer program will then be used to assign one or more controlled vocabulary terms to each paragraph in the corpus. In other words, the corpus will be divided into individual paragraphs, each paragraph will be compared to a model and assigned one or more classification terms, and the paragraph/term combinations will be passed on to a database for storage and ultimately an indexer to support search.
  5. Indexing – A database will store each paragraph from the corpus alongside metadata describing the paragraph. This metadata will include titles, authors, dates, publishers, as well as the controlled vocabulary terms. An indexer (a sort of database specifically designed for the purposes of search) will make the content of the database searchable, but the index will also be supplemented with a thesaurus. Because human language is ambiguous, words often have many and subtle differences in meaning. For example, when talking about “dogs”, a person may also be alluding to “hounds”, “canines”, or even “beagles”. Given the set of controlled vocabulary terms, a thesaurus will be created so when researchers & scholars search for “children” the indexer may also return documents containing the phrase “sons & daughters of parents”, or, as another example, when a search is done for “war”, documents (paragraphs) containing the words “battle” or “insurgent” may also be found.
  6. Searching & browsing – Finally, a Web-based interface will be created enabling readers to find items of interest, compare & contrast these items, identify patterns & anomalies between these items, and ultimately make judgments of understanding. For example, the reader will be presented with a graphical representation of controlled vocabulary. By selecting terms from the vocabulary, the index will be queried, and the reader will be presented with sortable and groupable lists of paragraphs classified with the given term. (This process is called “browsing”.) Alternatively, researchers & scholars will be able to enter simple (or complex) queries into an online form, the queries will be applied to the indexer, and again, paragraphs matching the queries will be returned. (This process is called “searching”.) Either way, the researchers & scholars will be empowered to explore the corpus in many and varied ways, and none of these ways will be limited to any individuals’ specific topic of interest.
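
Steps 3 and 4 above can be sketched with a tiny naive Bayes classifier (plain Python with made-up three-paragraph training data; a production system would use a mature machine learning library and far more training text):

```python
import math
from collections import Counter, defaultdict

def train(labeled_paragraphs):
    """Step 3: build per-term word counts -- the statistical
    'model' -- from manually classified (text, term) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labeled_paragraphs:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Step 4: assign the most probable controlled vocabulary
    term, using naive Bayes with add-one smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        score = math.log(class_counts[label] / total)
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

training = [
    ("money and business markets", "business"),
    ("law and government policy", "government"),
    ("mother father and children", "family"),
]
model = train(training)
classify("the law of the land", *model)  # -> "government"
```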

The text above only outlines one possible “blueprint” for comparing & contrasting a corpus of Catholic social thought and human rights. Moreover, there are at least two other ways of addressing the issue. For example, it is entirely possible to “simply” read each & every document. After all, that is the way things have been done for millennia. Another possible solution is to apply natural language processing techniques to the corpus as a whole. For example, one could automatically count & tabulate the most frequently used words & phrases to identify themes. One could compare the rise & fall of these themes over time, geographic location, author, or publisher. The same thing can be done in a more refined way using parts-of-speech analysis. Along these same lines there are well-understood relevancy ranking algorithms (such as term frequency / inverse document frequency) allowing a computer to output the more statistically significant themes. Finally, documents could be compared & contrasted automatically through a sort of geometric analysis in an abstract and multi-dimensional “space”. These additional techniques are considerations for a phase two of the project, if it ever comes to pass.
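
The term frequency / inverse document frequency weighting mentioned above fits in a dozen lines (a bare-bones sketch; real search engines use refined variants of this scheme):

```python
import math
from collections import Counter

def tf_idf(documents):
    """Score each word in each document by term frequency times
    inverse document frequency; high-scoring words mark the
    distinctive themes of a document."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc.split()))
    n = len(documents)
    scores = []
    for doc in documents:
        counts = Counter(doc.split())
        total = sum(counts.values())
        scores.append({word: (count / total) * math.log(n / doc_freq[word])
                       for word, count in counts.items()})
    return scores

docs = ["war battle insurgent", "children mother father", "war children"]
scores = tf_idf(docs)
# "battle" appears in only one document, so it outscores "war",
# which appears in two.
```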

Evergreen 2013: Linus’s Law / Equinox Software

By 2013 Evergreen was, to coin a phrase, “nominally complete.”  It had gained the features needed to check off most of the right RFP boxes, and so be considered alongside other ILS’s with a significantly older code base.  Acquisitions and serials, along with circulation, cataloging, authority control, and the (underrated, in my opinion) booking functionality were all in place.  By this point it had a modern, pluggable OPAC infrastructure, integration with many 3rd party products to expand its functionality, and was attracting attention via non-traditional use cases such as publishing house backend systems.  So, we developers were done, right?

Not at all.

In years past, the development team working on Evergreen had been small, and grew slowly.  In important ways, though, that began to change around 2013.  Previously, having more than twelve distinct contributors in a month submitting code for inclusion in the master repository was quite rare, and usually happened right around the time when a new release was being polished.  But from late 2012 through all of 2013, 15-25 contributors became the rule and less than that was the exception.  That is a solid 20-30% increase, and is significant for any project.

At the software level this was a period of filing down rough edges and broadening the talent pool.  There were few truly massive technological advances but there were many, and varied, minor improvements made by a growing group of individuals taking time to dive deeper into a large and complex codebase.  Importantly, this included ongoing contributions from a Koha developer on a now-shared bit of infrastructure, the code we both use to parse searches against our respective catalogs.

In short, 2013 is the year that we began to truly realize one of the promises of Open Source, something that is attributed to Linus Torvalds of Linux fame.  Specifically that given enough eyeballs, all bugs are shallow.  What this means is that as your project adds users, testers, and developers, it becomes increasingly likely that bugs will be discovered early, classified quickly, and that the solution will be obvious to someone.

In some ways this can be a critical test for an Open Source project.  Many projects do not survive contact with an influx of new development talent.  For some projects, that is political.  For others, it is a consequence of early design decisions.  Fortunately, Evergreen passed that test, and that is in large part a credit to its community.  After seven years and significant scrutiny, Evergreen continued to improve and its community continued to grow.

— Mike Rylander, President

This is the eighth in our series of posts leading up to Evergreen’s Tenth birthday.

Transmission #8 – Return to Regularly Scheduled Programming / LITA

Thank you to everyone who participated in my feedback survey! I have parsed the results (a little less than 100 responses) and I’m currently thinking through format changes.

I’ll give a full update on the changes to come and more after we conclude our initial ten interviews in October. Stay tuned, faithful viewers.

In today’s webisode, I am joined by one of my personal all-time favorite librarians and colleagues, Michael Rodriguez. Michael is Electronic Resources Librarian at the University of Connecticut. Enjoy his perspectives on one of my favorite topics, librarianship in the intersection of collections, technology, and discovery.

Begin Transmission will return September 12th.

bittorrent for sharing enormous research datasets / Jonathan Rochkind

We’ve designed a distributed system for sharing enormous datasets – for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds.

There are data sets listed from researchers at several respected universities, including the University of Michigan and Stanford.

Filed under: General

technical debt/technical weight / Jonathan Rochkind

Bart Wronski writes a blog post about “technical weight”, a concept related to but distinct from “technical debt.”  I can relate some of what he’s talking about to some library-centered open source projects I’ve worked on.

Technical debt… or technical weight?

…What most post don’t cover is that recently huge amount of technical debt in many codebases comes from shifting to naïve implementations of agile methodologies like Scrum, working sprint to sprint. It’s very hard to do any proper architectural work in such environment and short time and POs usually don’t care about it (it’s not a feature visible to customer / upper management)…


…I think of it as a property of every single technical decision you make – from huge architectural decisions through models of medium-sized systems to finally way you write every single line of code. Technical weight is a property that makes your code, systems, decisions in general more “complex”, difficult to debug, difficult to understand, difficult to change, difficult to change active developer.…


…To put it all together – if we invested lots of thought, work and effort into something and want to believe it’s good, we will ignore all problems, pretend they don’t exist and decline to admit (often blaming others and random circumstances) and will tend to see benefits. The more investment you have and heavier is the solution – the more you will try to stay with it, making other decisions or changes very difficult even if it would be the best option for your project.…





Filed under: General

iCampMO Instructors Announced / Islandora

Islandora Camp is heading down to Kansas City, courtesy of our hosts at the University of Missouri Kansas City. Camp will consist of three days: one day of sessions taking a big-picture view of the project and where it's headed (including big updates about Islandora CLAW), one day of hands-on workshops for developers and front-end administrators, and one day of community presentations and deeper dives into Islandora tools and sites. The instructors for that second day have been selected and we are pleased to introduce them:


Rosie Le Faive started with Islandora in 2012 while creating a trilingual digital library for the Commission for Environmental Cooperation. With experience and - dare she say - wisdom gained from creating highly customized sites, she's now interested in improving the core Islandora code so that everyone can use it. Her interests are in mapping relationships between objects, and intuitive UI design. She is the Digital Infrastructure and Discovery librarian at UPEI, and develops for Agile Humanities.  This is her second Islandora Camp as an instructor.

Jared Whiklo began working with Islandora in 2012. After stumbling and learning for a year, he began to give back to the community in late 2013. He has since assisted in both Islandora and Fedora releases and (to his own disbelief) has become an Islandora 7.x-1.x, Islandora CLAW, and Fedora committer. His day job is Developer with Digital Initiatives at the University of Manitoba Libraries. His night job is at the Kwik-E-Mart.


Melissa Anez has been working with Islandora since 2012 and has been the Community and Project Manager of the Islandora Foundation since it was founded in 2013. She has been a frequent instructor in the Admin Track and developed much of the curriculum, refining it with each new Camp.

Sandy Rodriguez is the Digital Special Collections Coordinator at the University of Missouri—Kansas City.  She has been working with Islandora for almost three years and currently serves as a member of the Islandora Metadata Interest Group and the Metadata Tools Subgroup.

How to Write a User Experience Audit / LibUX

A User Experience Audit, or UX Audit for short, is something that should be conducted in the very beginning steps of a website, web application, dedicated app, or similar redesign project. Sometimes referred to as a deck or part of a design brief, UX Audits are typically done before user interface (UI) design occurs, and primarily consist of data intake, data compiling, research, and data visualization through presentation.

A UX Audit is simultaneously in-depth design research and a cursory presentation of data. The UX Audit doesn’t jump to conclusions or propose final UI and UX mechanics, but rather evaluates the project’s current state in order to:

  • compile qualitative data
  • conduct peer evaluation
  • discover interactive pain points
  • evaluate quantitative data
  • identify accessibility errors
  • survey information architecture
  • point out any branding violations
  • propose additional UX testing

Ultimately the UX Audit should serve as a compilation of the previously mentioned research, identify what data is missing or would need to be captured going forward, and function as a point-of-departure for the next steps – which commonly would be sketching, wireframing, interactive wireframing, or prototyping, depending on your development process.

The UX Audit is not a wireframe, it isn’t a design or user interface proposal, and it typically doesn’t highlight project requirements from stakeholders (although this is not uncommon). Additionally, a UX Audit’s summary does propose and provide recommendations based on the data compiled, but doesn’t do so in a way that graphically exceeds anything more than facilitating understanding (so no high-fidelity solutions or graphics). As such, a UX Audit acts as a timestamp or bookmark in a project’s history, and serves as documentation for improvement. UX Auditing is also the preferred mode for project development, as opposed to simply giving a project a ‘face lift’ without concern or regard for its history or incremental improvement.

Once completed, the UX Audit and its recommendations are then given to a UI designer, front-end dev, web designer, or similar position, who would begin designing, iterating, or wireframing (preferably in a medium that is as close as possible to the final deliverable). The UI professional would then be in a better position to move forward to wireframes (for example), aware of the target audience, previous errors, and what data is present and what data is missing.

Possible Parts of a UX Audit

Depending on the project – be it a web application, website, or native app redesign – and on what data is available, each UX Audit is going to be different. Take in as much data as possible: this data is a UX professional’s bread and butter, and they should spend a decent amount of time collecting, collating, filtering, interpreting, and visualizing it for stakeholders, developers, and UI professionals.

Although most UX Audits are essentially made from the same data and parts, they can follow any format or order. The following sections are some suggestions as to what can be included in a UX Audit:


Introduction

This is sometimes called an Executive Summary, a Project Overview, or even an Introduction to the Problem. Whatever it’s called, the Introduction serves the function of briefly and succinctly introducing the intent of the redesign, who and what departments are involved in the project, and the scope of the UX Audit. Also accompanying, or contained in, the Introduction are Research Objectives and a Table of Contents.

Research Objectives

The Research Objectives section highlights and presents the hard deliverables of the UX Audit, and sets up the reader’s expectations as to what research will be presented.

Competitor Analysis

The UX Audit’s competitor analysis section is usually derived from data about a parent organization or competitors. A parent organization could be a policy commission, an accrediting body, location peers, etc. As for competitors, these can be identified through competing sales, customers, consumers, goods, or services – all of which are examined for usage, with an eye toward increasing conversions.

The Competitor Analysis section can consist of a list of these peers, with hyperlinks to the similar projects’ websites, web applications, or native apps. It also contains a comparison of features, functionality, and interactive elements, in the form of percentages, and usually presents the comparisons through data visualization. This enables readers to see what percentage of peers have a responsive website, a homepage newsletter sign-up, sticky navigations, or other such features (which establishes baseline/industry UI standards).

Quantitative Data

Quantitative data refers to ‘the numbers’: project traffic, usage, device/browser statistics, referrals, ad blockers, social media sharing, and pathway mapping. All of this is quantitative data, is part of design research, and hopefully has already been set up on the project you are redesigning. Adobe Analytics and Google Analytics offer a lot of different solutions, but require a lot of customization, learning, or a significant financial investment. The following is a list of common industry software for this type of quantitative data:

Qualitative Data

Qualitative Data usually refers to the customer experience (CX) side of data, and can contain customer behavior, demographics, feedback, and search terms. This is usually derived from surveying mechanisms like Qualtrics or SurveyMonkey, embedded feedback tools like Adobe Experience Manager, search engine optimization (SEO) information from titles, advertising spend, and metadata tracking over time, and Google Trends and Google Insights.


Accessibility

The Accessibility portion of a UX Audit should contain both WCAG 2.0 AA and AAA errors, color contrast checking for fonts and UI mechanisms against their backgrounds, and even JavaScript errors that appear in the console log. Software used to help with this portion of the UX Audit includes the WebAIM Color Contrast Checker, the WAVE Web Accessibility Evaluation Tool, and the JavaScript console in any web browser’s developer tools.
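
Color contrast checking follows a published formula, so it is easy to automate. The sketch below implements the WCAG 2.0 relative luminance and contrast ratio definitions in Python (the thresholds come from the spec: 4.5:1 for AA and 7:1 for AAA body text):

```python
def relative_luminance(rgb):
    """WCAG 2.0 relative luminance of an sRGB color (0-255 channels)."""
    def linearize(c):
        c /= 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two colors, from 1:1 up to 21:1."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)),
        reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

ratio = contrast_ratio((0, 0, 0), (255, 255, 255))  # black on white: 21:1
passes_aa = ratio >= 4.5   # minimum for normal body text at AA
passes_aaa = ratio >= 7.0  # stricter AAA threshold
```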

Interactive Pain Points

Interactive Pain Points refers to egregious UI and UX errors, non-typical animations or interactions, and unexpected functionality. This can range from forms and buttons being too small to tap on a touch-based or mobile device, or dysfunctional carousel buttons, all the way to jerky hover navigations and counter-intuitive forms wherein the labels cover up the input fields. This is usually best presented in the UX Audit through screenshots or videos, with annotations about what is occurring in contrast to users’ expectations.

Brad Frost has an excellent Interface Inventory Checklist available on Google Docs; this is a great place to start to know what to look for, and what interactions to examine for improvement. A checklist like the one he shared is very helpful, but the most important thing is to demonstrate things like inconsistent button sizing, or interaction elements that are confusing or not functioning.

Information Architecture

Information Architecture (IA) is the structural design of information, interaction, and UX with the goals of making things both findable and discoverable. This part of the UX Audit focuses on the findability and discoverability of navigation items, general content strategy deficiencies, reading levels, and label auditing.

For example, analyzing the IA of a project could potentially identify that label auditing for primary and secondary navigation items, quick links, buttons, and calls-to-action is necessary. Analyzing IA could also demonstrate that the project’s content isn’t written at an 8th grade level – based on its Flesch readability score, which uses sentence length and the number of syllables per word in an equation to calculate reading ease – or that your content with specific instructions requires a 6th grade reading level. For more information, the Nielsen Norman Group has a great article about legibility, readability, and comprehension, and anyone can use Microsoft Word to analyze content’s Flesch readability scores.
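
The Flesch reading-ease equation itself is simple enough to compute directly (counting syllables accurately is the hard part; real tools use dictionaries or heuristics for that):

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch reading ease: higher is easier to read. Scores around
    60-70 correspond roughly to an 8th-9th grade reading level."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# A 100-word passage in 8 sentences with 130 syllables scores about 84,
# "easy to read" on the Flesch scale.
flesch_reading_ease(100, 8, 130)
```

Longer sentences and more syllables per word both drive the score down, which is why instructional content aimed at a 6th grade level needs short sentences and plain words.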

Branding Violations

This mainly depends on an organization or company’s established style guides and pattern library. If the project being redesigned is particularly old and in need of a UX Audit, there may be a lot of colors, font families, interaction elements, and UX patterns that are out of sync. If a company or organization doesn’t have a set style guide or pattern library, maybe that’s the best place to start before a UX Audit. The following are some really great style guides and pattern libraries from companies, entities, and organizations you might know already:


Performance

If the project that’s being redesigned is a website, web application, or dynamically pulls information through JavaScript data binding, performance should be factored into a UX Audit. Michael Schofield has a great article on LibUX about users having different connection speeds – millennials all the way to broadband users – and leading figures in the field of UX speak about the importance of performance all the time.

“Over the past few months, conversations about Responsive Web design have shifted from issues of layout to performance. That is, how can responsive sites load quickly – even on constrained mobile networks.” Luke Wroblewski
Product Director at Google

When conducting a UX Audit, the Chrome web browser’s DevTools has ‘Network’ and ‘Timeline’ views that analyze and display the loading of assets – images, scripts, code libraries, external resources, etc. – in real time. This information can and should be included in a UX Audit to document project load times, emulate different network conditions, verify any load time issues, and ultimately point out potential pain points for users.


Data from Google Insights or even PageFair is desirable. This is the place in the UX Audit where a UX professional really gets to shine: they have already demonstrated their data collection and presentation skills, and now they get to advise stakeholders, UI designers, and developers on what steps and UI decisions should be taken going forward.

How can UX Audits be used?

UX Audits can and should be incorporated as an unbiased and essential part of any redesign project. Many times a UX professional also has to be a librarian, a UI designer, or even a front-end developer – so, with limited staff and short deadlines, it’s easy to skip this important step of the redesign process.

However, performing a UX Audit will enable you to slow down, focus primarily on UX for a change, and perform an audit that will provide a lot of valuable information for stakeholders, designers, and developers. This will ultimately make everyone’s job easier, and what’s wrong with working smarter rather than harder?

Crafting Websites with Design Triggers / LibUX

A design trigger is a pattern meant to appeal to behavior and cognitive biases observed in users. Big data and the user experience boom have provided a lot of information about how people actually use the web, which designs work, and – although creepy – how it is possible to cobble together an effective site designed to social-engineer users.

This episode is drawn from a longer talk in which I introduce design triggers as a concept and explain their reason for being.

Help us out and say something nice. Your sharing and positive reviews are the best marketing we could ask for.

If you like, you can download the MP3 or subscribe to LibUX on Stitcher, iTunes, YouTube, Soundcloud, Google Play Music, or just plug our feed straight into your podcatcher of choice.



MarcEdit Mac Update–Inclusion of Console Mode / Terry Reese

One of the gaps in the Mac version of MarcEdit has been the lack of a console mode.  This update should correct that.  However, a couple of things about how this works…

1) Mac applications are bundles, so in order to run the console program you need to run against the application bundle.  What does this look like?   From the terminal, one would run
>>/Applications/ --console

The --console flag initializes the terminal application and prompts for file names.  You can pass the filenames (these must be fully qualified paths at this point) via command-line arguments rather than running in an interactive mode.  For example:
>>/Applications/ --s /users/me/Desktop/source.mrc --d /users/me/Desktop/output.mrk --break

The above would break a MARC file into the mnemonic format.  For a full list of console commands, enter:
>>/Applications/ --help

In the future, the MarcEdit install program will be setting an environment variable ($MARCEDIT_PATH) on installation.  For now, I recommend opening your .bash_profile and adding the following line:
export MARCEDIT_PATH=/Applications/
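As a sketch of how that variable could then be used (the bundle path and the exact flag spellings here are assumptions based on the examples above, so adjust to your installation), you might wrap the break operation in a small shell function in your .bash_profile:

```shell
# Sketch only: assumes $MARCEDIT_PATH was set by the installer (or by
# hand) to point at the MarcEdit console executable.
marcedit_break() {
  # $1 = fully qualified source .mrc file, $2 = destination .mrk file
  "$MARCEDIT_PATH" --s "$1" --d "$2" --break
}
```

Then `marcedit_break /users/me/Desktop/source.mrc /users/me/Desktop/output.mrk` would break the MARC file into mnemonic format without typing the full bundle path each time.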

You can get this download from: 


HTTPS (Almost) Everywhere / Library Tech Talk (U of Michigan)

The University of Michigan Library pledges to update its major websites to use secure (HTTPS) connections between the servers and web browsers by December 2016.

Google Policy Fellow: my OITP summer / District Dispatch

Guest post by Nick Gross, OITP’s 2016 Google Policy Fellow

This summer I worked as a Google Policy Fellow at the American Library Association’s Office for Information Technology Policy (OITP) in Washington, D.C. The Google Policy Fellowship gives undergraduate, graduate, and law students the opportunity to spend the summer working at public interest groups engaged in Internet and technology policy issues.


As a fellow, my primary role at OITP was to prepare tech policy memos to submit to the incoming presidential administration. The goal is to inform policymakers about ALA’s public policy concerns, including why, and to what extent, ALA has an interest in specific tech issues and what the next policies should look like. With balanced, future-looking information and tech policies, libraries can continue to enable Education, Employment, Entrepreneurship, Empowerment, and Engagement for their patrons— The E’s of Libraries. To that end, I drafted a brief on telecommunications issues and one on copyright issues.

The telecommunications brief addresses the importance of broadband Internet to libraries. In particular, a robust broadband infrastructure ensures that libraries can continue to provide their communities with equitable access to information and telecommunications services, as well as serve residents with digital services and content via “virtual branches.” Through the Federal Communications Commission’s Universal Service Fund (USF), which includes the E-Rate program, the Lifeline program, and the Connect America Fund, libraries and underserved or unserved communities are better able to enjoy access to affordable high-capacity broadband. And greater broadband competition and local choice increase broadband deployment, affordability, and adoption for libraries and their communities, while opening up more unlicensed spectrum for Wi-Fi expands broadband capacity so libraries can better serve their communities. Moreover, libraries sometimes provide the only Internet access points for some communities and they play an important role in digital inclusion efforts. Finally, because libraries use the Internet to research, educate, and create and disseminate content, as well as provide no-fee public access to it, they highly value the FCC’s 2015 Open Internet Order which helps guarantee intellectual freedom and free expression, thereby promoting innovation and the creation and exchange of ideas and content.

As copyright lies at the core of library operations, OITP advocates for law that fulfills the constitutional purpose of copyright—namely, a utilitarian system that grants “limited” copyright protection in order to “promote the progress of science and useful arts.” The copyright brief calls for a balanced copyright system in the digital age that realizes democratic values and serves the public interest. The first sale doctrine enables libraries to lend books and other materials. The fair use doctrine is critical to libraries’ missions, as it enables the “free flow of information,” fostering freedom of inquiry and expression; for instance, it enables libraries to use so-called “orphan works” without fear of infringement liability. Moreover, libraries are at the forefront of archiving and preservation, using copyright law’s exceptions to make reproductions and replacements of works that have little to no commercial market or that represent culturally valuable content in the public domain. Libraries also enjoy protections against liability under the Section 512 Safe Harbors in the Digital Millennium Copyright Act (DMCA).

My brief on copyright issues also highlights specific challenges that threaten libraries’ mission to provide the public with access to knowledge and upset the careful balance between copyright holders and users. For instance, e-licensing and digital rights management (DRM) under section 1201 of the DMCA, as well as the section 1201 rulemaking process, limit libraries’ ability to take full advantage of copyright exceptions, from fair use to first sale to preservation and archiving. ALA also advocates for the ratification and implementation of the World Intellectual Property Organization’s “Marrakesh Treaty” to facilitate access to published works for persons who are blind, visually impaired, or otherwise print disabled.

In addition to my policy work, Google’s bi-weekly meetings at its D.C. headquarters shed light on the public policy process. At each event, Google assembled a panel of experts composed of its own policy-oriented employees and other experts from public interest groups in D.C. Topics ranged from copyright law to broadband deployment and adoption to Net Neutrality. During the meetings, I also enjoyed the opportunity to meet the other Google fellows and learn about their work.

My experience as a Google Policy Fellow at OITP taught me a great deal about how public interest groups operate and advocate effectively. For instance, I learned how public interest groups collaborate and form partnerships to effect policy change. Indeed, ALA works, or has worked, with groups like the Center for Democracy & Technology to advocate for Net Neutrality, while advancing public access to information as a member of the Re:Create Coalition and the Library Copyright Alliance. As a founding member of the Schools, Health & Libraries Broadband Coalition and WifiForward, ALA promotes Internet policies, such as the modernization of the USF. Not only did I gain deeper insight into telecommunications law and copyright law, but I also developed an appreciation of how such laws can profoundly impact the public interest. I’d highly recommend the Google Policy Fellowship to any student interested in learning more about D.C. policymaking in the tech ecosystem.

The post Google Policy Fellow: my OITP summer appeared first on District Dispatch.

Software Carpentry: SC Config; write once, compile anywhere / Jez Cope

Nine years ago, when I first released Python to the world, I distributed it with a Makefile for BSD Unix. The most frequent questions and suggestions I received in response to these early distributions were about building it on different Unix platforms. Someone pointed me to autoconf, which allowed me to create a configure script that figured out platform idiosyncrasies. Unfortunately, autoconf is painful to use – its grouping, quoting and commenting conventions don’t match those of the target language, which makes scripts hard to write and even harder to debug. I hope that this competition comes up with a better solution — it would make porting Python to new platforms a lot easier!
Guido van Rossum, Technical Director, Python Consortium (quote taken from SC Config page)

On to the next Software Carpentry competition category, then. One of the challenges of writing open source software is that you have to make it run on a wide range of systems over which you have no control. You don’t know what operating system any given user might be using or what libraries they have installed, or even what versions of those libraries.

This means that whatever build system you use, you can’t just send the Makefile (or whatever) to someone else and expect everything to go off without a hitch. For a very long time, it’s been common practice for source packages to include a configure script that, when executed, runs a bunch of tests to see what it has to work with and sets up the Makefile accordingly. Writing these scripts by hand is a nightmare, so tools like autoconf and automake evolved to make things a little easier.
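To make concrete what one of those hand-written tests looks like, here is a minimal sketch of the classic trick a configure script relies on: compile a throwaway program and see whether it succeeds (the feature being probed, stdio.h, is chosen purely for illustration):

```shell
# Probe for a feature by test-compiling a tiny program, as a
# hand-rolled configure script would.
cat > conftest.c <<'EOF'
#include <stdio.h>
int main(void) { return 0; }
EOF
if cc -o conftest conftest.c 2>/dev/null; then
  echo "HAVE_STDIO_H=1"
fi
rm -f conftest conftest.c   # clean up the probe artifacts
```

A real configure script runs hundreds of probes of exactly this shape, which is why nobody wants to maintain them by hand.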

They did, and if the tests you want to use are already implemented they work very well indeed. Unfortunately they’re built on an unholy combination of shell scripting and the archaic GNU M4 macro language. That means if you want to write new tests you need to understand both of these as well as the architecture of the tools themselves — not an easy task for the average self-taught research programmer.

SC Config, then, called for a re-engineering of the autoconf concept, to make it easier for researchers to make their code available in a portable, platform-independent format. The second round configuration tool winner was SapCat, “a tool to help make software portable”. Unfortunately, this one seems not to have gone anywhere, and I could only find the original proposal on the Internet Archive.

There were a lot of good ideas in this category about making catalogues and databases of system quirks to avoid having to rerun the same expensive tests the way a standard ./configure script does. I think one reason none of these ideas survived is that they were overly ambitious, imagining a grand architecture in which their tool would provide some overarching source of truth. This is in stark contrast to the way most Unix-like systems work, where each tool does one very specific job well and tools are easy to combine in various ways.

In the end though, I think Moore’s Law won out here, making it easier to do the brute-force checks each time than to try anything clever to save time — a good example of avoiding unnecessary optimisation. Add to that the evolution of the generic pkg-config tool from earlier package-specific tools like gtk-config, and it’s now much easier to check for particular versions and features of common packages.

On top of that, much of the day-to-day coding of a modern researcher happens in interpreted languages like Python and R, which give you a fully-functioning pre-configured environment with a lot less compiling to do.

As a side note, Tom Tromey, another of the shortlisted entrants in this category, is still a major contributor to the open source world. He still seems to be involved in the automake project, contributes a lot of code to the Emacs community too and blogs sporadically at The Cliffs of Inanity.

Meaningfully Judging Performance in Terms of User Experience / LibUX

Much about user experience design is concerned with subjective improvements to language and structure, style, tone. The bulk of our quantitative data is used toward these purposes — and, of course, being user-centric is precisely what that data is for. The role of the user experience designer connotes a ton about the sorts of improvements at the surface of our websites, at the obvious touchpoints between patron and library. Unfortunately, this approach can neglect deep systemic or technical pain points to which “design” is wrongfully oblivious but which are fundamental to good user experience.

Speed is a major example. Website performance is crucial enough that, when it is poor, the potential for even the best designs to convert is diminished. The most “usable” website can have no effect if it fails to load when and in the way users expect it to.

One thing we can be thankful for when improving the performance of a website is that while “more speed” definitely has a strong impact on the user experience, it is also easy to measure. Look, feel, and the “oomph” of meaningful, quality content, navigability, usability, each have their own quantitative metrics like conversion or bounce rate, time watched, and so on. But at best these aspects of the web design are objective-ish: the numbers hint at a possible truth, but these measurements only weather scrutiny when derived from real, very human, users.

A fast site won’t make up for other serious usability concerns, but since simple performance optimization doesn’t necessarily require any actual users, it lends itself to projects constrained by time or budget, or those otherwise lacking the human resources needed to observe usage, gather feedback, and iterate. The ideal cycle of “tweak, test, rinse, and repeat” is in some cases not possible. Few user experience projects return as much bang for the buck as site optimization, and it can be baked into the design and development process early and with known—not guessed-at, nor situational—results.

The signals

When it comes to site optimization, there is no shortage of signals to watch. There is a glut of data right in the browser about the number of bytes in, script or style file size, network status codes, drop-shadow rendering, frames per second, and so on. Tim Kadlec, author of Implementing Responsive Design, broke a lot of these down into meaningful measurements in a series of articles over the last couple of years oriented around the “performance budget.”

A performance budget is just what it sounds like: you set a “budget” on your page and do not allow the page to exceed that. This may be a specific load time, but it is usually an easier conversation to have when you break the budget down into the number of requests or size of the page.
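A budget in this sense can be as simple as a couple of thresholds checked against measured numbers. The sketch below is purely illustrative — the thresholds and field names are invented, not Kadlec's:

```python
# Hypothetical performance budget: flag the page (in an audit, or a
# build step) when it exceeds either threshold.
BUDGET = {"requests": 50, "page_weight_kb": 800}

def over_budget(requests, page_weight_kb, budget=BUDGET):
    """Return the list of budget lines the page exceeds."""
    overages = []
    if requests > budget["requests"]:
        overages.append("requests")
    if page_weight_kb > budget["page_weight_kb"]:
        overages.append("page_weight_kb")
    return overages
```

A measured page making 64 requests and weighing 1,200 KB would report both overages; a page within both limits reports none.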

Such a strategy really took root in the #perfmatters movement, spurred by folks repulsed by just how fast the web was getting slower. Their observation was that because the responsive web was becoming increasingly capable and high pixel density screens were the new norm, developers making cool stuff sent larger and larger file sizes through the pipes. While by definition responsive websites can scale for any screen, they were becoming cumbersome herky-jerky mothras for which data was beginning to show negative impacts.

In his talk in 2013, “Breaking the 1000ms Time to Glass Mobile Barrier” — and, later, his book High Performance Browser Networking — Ilya Grigorik demonstrated users’ reactions to even milliseconds-long delays:

Delay User Reaction
0 – 100ms Instant
100 – 300ms Feels sluggish
300 – 1000ms Machine is working …
1s + Mental context switch
10s + I’ll come back later …

Since then, the average page weight has grown 134 percent, 186 percent since 2010. Poor performance is such a drag on what might otherwise be a positive user experience—encapsulated by a July 2015 article in The Verge, “The Mobile Web Sucks”—that the biggest players in the web game (Facebook and Google) have dramatically reacted by either enforcing design restrictions on the SEO-sensitive developer or removing the dev’s influence entirely.

Comparison of average bytes per content type in November 2010 (left) and November 2015 (right).

Self-imposed performance budgets are increasingly considered best practice, and—as mentioned—there are different ways to measure its success. In his write-up on the subject, Tim Kadlec identifies four major categories:

  • Milestone timings
  • Rule based metrics
  • Quantity based metrics
  • Speed index

Milestone Timings

A milestone in this context is a number like the time in seconds until the browser reaches the load event for the main document, or, for instance, the time until the page is visually complete. Milestones are easy to track, but there are arguments against their usefulness. Pat Meenan writes in the WebPagetest documentation that a milestone “isn’t a very good indicator of the actual end-user experience.”

As pages grow and load a lot of content that is not visible to the user or off the screen (below the fold) the time to reach the load event is extended even if the user-visible content has long-since rendered… [Milestones] are all fundamentally flawed in that they measure a single point and do not convey the actual user experience.

Rule Based and Quantity Based Metrics

Rule based metrics check a page or site against an existing checklist with a tool like YSlow or Google PageSpeed to grade your site. Quantity based metrics, on the other hand, include a lot of the data reported by outlets like the HTTP Archive: total number of requests, overall page weight, and even the size of the CSS file. Not all of these metrics indicate poor performance, but they are useful for conceptualizing the makeup of a page and where optimization efforts can be targeted. If the bulk of the page weight is chalked up to heavy image use, then perhaps there are image-specific techniques you can use for stepping up the pace.

Example of a library web page graded by YSlow.

Speed Index

Speed Index is set apart by its attempts to measure the experience (there is an algorithm) to which Pat Meenan referred, by determining how much above-the-fold content is visually complete over time and then assigning a score. This is not a timing metric, but Meenan explains:

the ‘area above the curve’ calculated in ms and using 0.0–1.0 for the range of visually complete. The calculation looks at each 0.1s interval and calculates IntervalScore = Interval * ( 1.0 – (Completeness/100)) where Completeness is the percent visually complete for that frame and Interval is the elapsed time for that video frame in ms… The overall score is just a sum of the individual intervals.
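The quoted calculation is short enough to sketch directly. Assuming a list of (interval_ms, percent_visually_complete) samples taken from video frames — the sample data below is made up — the sum looks like:

```python
def speed_index(frames):
    """Speed Index: sum of Interval * (1.0 - Completeness/100) over
    each video-frame interval, per the WebPagetest formula quoted above."""
    return sum(interval_ms * (1.0 - completeness / 100.0)
               for interval_ms, completeness in frames)

# Illustrative frames: a page blank for 100ms, half-painted for 100ms,
# 90% complete for 100ms, then fully complete.
frames = [(100, 0), (100, 50), (100, 90), (100, 100)]
```

With those made-up frames the score is about 160; the longer the page stays visually incomplete, the more each interval contributes to the total.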

View of a web page loading over time

Basically, the faster the website loads above the fold, the faster the user can start to interact with the content. A lower score is better, and the score reads as milliseconds: a score of “1000” roughly means that a user can start to use the website after just one second. So if other metrics measure the Time To Load (TTL), then Speed Index measures Time To Interact (TTI), which may be a more meaningful signal.

TTI encapsulates an important observation, even among quantitative-data nerds, that web performance is just as much tied to the psychology of time and the perception of speed as it is to the speed of the network. If we look at page speed as a period of waiting, then how the user waits plays a role in how that wait is experienced. As Denys Mishunov writes in an article about “Why Performance Matters,” the wait is either active or passive:

The period in which the user has no choice or control over the waiting time, such as standing in line or waiting for a loved one who is late for the date, is called a passive phase, or passive wait. People tend to estimate passive waiting as a longer period of time than active, even if the time intervals are objectively equal.

For example, during my recent involvement with an academic library homepage redesign, our intention was that it would serve as thin a buffer as possible between the students or faculty and their research. This not only involved bringing search tools and content from deeper in the website to the forefront, but also reducing any barrier or “ugh” factor when engaging with them—such as time. Speed Index has a user-centric bias in that its measurement approximates the time the user can interact with—thus experience—the site. And it is for this reason we adopted it as a focal metric for our redesign project.

A report from Google Pagespeed.

Quick tangent tutorial: measuring Speed Index with WebPagetest

Google develops and supports WebPagetest, an online open-source web performance diagnostic tool which uses virtual machines to simulate websites loading on various devices and with various browsers, throttle the network to demonstrate load times over slower or faster connections, and much more. Its convenience and ease of use make it an attractive tool. Generating a report requires neither browser extensions nor prior experience with in-browser developer tools. WebPagetest, like its alternatives, incorporates rule-based grading and quantity metrics, but it was also the first to introduce Speed Index, which can be measured by telling it to “Capture Video.”

WebPagetest Interface

WebPagetest returns a straightforward report card summarizing the performance results of its tests, including a table of milestones alongside speed indices. The tool provides results for “First View” and “Repeat View,” which demonstrates the role of the browser cache. These tests are remarkably thorough in other ways as well, including screen captures, videos, waterfall charts, content breakdowns, and optimization checklists.

WebPagetest Results

It’s worth noting that these kinds of diagnostics can be run by other tools on either end of development. Google PageSpeed Insights can be generated in the same way: type a URL and run the report. But folks can also install PageSpeed’s Apache and Nginx modules to optimize pages automatically, or otherwise integrate PageSpeed – or YSlow – into the build process with Grunt tasks. The bottom line is that these kinds of performance diagnostics can be run wherever it is most convenient, at different depths, whether you prefer to approach them as a developer or not. They can be as integrated, or as after-the-fact, as needed.

The order in which the elements load matters

Of course, the user’s experience of load times is not only about how long it takes any interactive elements of the page to load but how long it takes certain elements to load. Radware’s recent report “Speed vs. Fluency in Website Loading: What Drives User Engagement” shows that “simply loading a page faster doesn’t necessarily improve users’ emotional response to the page.” They outfitted participants with neuroimaging systems and eye-trackers (mounted on monitors) in an attempt to objectively measure things like cognitive load and motivation. In the study, the same web page was loaded using three different techniques:

  1. the original, unaltered loading sequence,
  2. the fastest option, where the techniques used provided the most demonstrably fast load times regardless of rendering sequence,
  3. a version where the parts of the page most important to what the user wanted to accomplish were loaded first.
Results of Radware’s study on how users process web pages during rendering

In six out of ten pages, sequencing the loading of elements based on their importance to a primary user task affected overall user engagement, as measured by total fixation time.

While not overwhelming, the results suggest that depending on the type of website, rendering sequence can play an important role on the “emotional and cognitive response and at which order [users] will look at different items.” Radware makes no suggestions about which rendering sequences work for which websites.

Still, the idea that cherry-picking the order in which things load on the page might decrease cognitive load (especially on an academic library homepage where the primary user task is search) is intriguing.

Earmark a Performance Budget

Anyway, this is getting a little long. All this is to say that there are all sorts of improvements that can be made to library websites that add value to the user experience. Prioritizing among these involves any number of considerations. But while it may take a little extra care to optimize performance, it’s worth the time for one simple reason: your users expect your site to load the moment they want it.

This sets the tone for the entire experience.

Copyrights. So, this article originally appeared in Weave: Journal of Library User Experience, in an issue alongside people I really respect writing about anticipatory design and performance. It’s licensed under a Creative Commons Attribution 3.0 license. I made some changes up there and embedded some links, but for the most part the article is in its original form.

New season, new entrepreneurship opportunities / District Dispatch

A young girl works in the “Fab Lab” at Orange County Library System’s Dorothy Lumley Melrose Center. Photo credit: Orange County Library System.

This is a strange time of year. The days are still long and hot – at least here in D.C. – but the Labor Day promos and pre-season football games signal the start of a new season. It’s around this time that I usually reflect on the waning summer. Having just gotten back from a long vacation at the beach, I’ve had plenty of time for reflection on the past year. Professionally, I’ve focused heavily on a single topic these past few months: entrepreneurship.

In late June, months of research, outreach, and writing culminated in OITP’s release of a white paper on the library community’s impact on the entrepreneurship ecosystem. The paper brought together data and cases from across the country to outline the bevy of services academic and public libraries offer entrepreneurs. We called the paper “The People’s Incubator.” You don’t have to read the text to recognize the accuracy of this metaphor for describing the role the library community plays in helping people bring innovative ideas to life. Libraries are, and have always been, creative spaces for everyone. Since the analog era, library programs and services have encouraged all people to convert notions into innovations.

But, the more time that passes since the paper’s release, the more I feel the “People’s Incubator” moniker isn’t quite adequate to describe what the modern library community does in today’s entrepreneurship space. It does justice to the creative power of library resources, but it doesn’t convey the steadiness of the support the library community offers entrepreneurs at every turn. At each stage of launching and running a business – planning, fundraising, market analysis and more – libraries are equipped to offer assistance. Business plan competitions, courses on raising capital, research databases, census records, prototyping and digital production equipment, business counseling and intellectual property information all combine to round out the picture of the entrepreneurship services available at the modern library.

A facility offering these services is not just an incubator – it’s a constant companion; a hand to hold while navigating a competitive and often unforgiving ecosystem. And the more I read about library entrepreneurship activities, the more convinced I become that influencers across all sectors should leverage the robust resources libraries provide entrepreneurs to encourage innovation across the country. In just the few months since we published the paper, I have found one after another example of libraries’ commitment to developing a more democratic and strong entrepreneurship ecosystem. In addition to the examples described in the paper, recent library partnerships illustrate the entrepreneurship synergies the library community can help create.

The New York Public Library (NYPL) recently partnered with the 3D printing service bureau Shapeways to develop curricula for teaching the entrepreneurial applications of 3D printing. The curricula will be piloted in a series of NYPL courses in the fall of 2016, and then publicly released under an open license. Continued partnerships between libraries and tech companies like this one will advance the capacity of libraries to build key skills for the innovation economy.

For over a year, the Memphis Public Library has been a key partner in a citywide effort to boost start-up activity. Working with colleges, universities and foundations, the library’s resources and programming have helped the Memphis entrepreneurship ecosystem create hundreds of jobs. Libraries can and should continue to be a major part of these sorts of collaborations.

With support from the Kendrick B. Melrose Family Foundation, the Orange County Library System in Orlando opened the Dorothy Lumley Melrose Center in 2014. The Center offers video and audio production equipment, 3D printers, Arduino and other electronics, and a host of tech classes – all of which individuals can use to launch new innovations and build key skills for the modern economy.

Through a partnership between the Montgomery County Public Library and the Food and Drug Administration (FDA), 80 teens had the opportunity to work in teams this summer to design their own mobile medical apps. The teens recently “pitched” their apps to a panel of judges at the FDA’s main campus in Silver Spring, Maryland. They’ve also gotten the chance to visit the White House.

Beyond partnerships between libraries, private firms, government agencies, academic institutions and foundations, library collaborations with Small Business Development Centers – federally supported entrepreneurship assistance facilities – continue to be publicly highlighted.

So, if I’ve learned anything from my summer of entrepreneurship, it’s this: libraries, as constant companions for entrepreneurs, are natural partners for the many public, private, non-profit and academic actors that work to advance the innovation economy. We will trumpet this important message in the coming weeks and months, as we work to alert policymakers to the important work of libraries ahead of the November elections. To do that, we need good examples of library efforts to advance start-up activities. Share yours in the comments section!

The post New season, new entrepreneurship opportunities appeared first on District Dispatch.

Evergreen 2012: ownership and interdependence / Equinox Software

"Cats that Webchick is herding" by Kathleen Murtagh on Flickr (CC-BY)

A challenge common to any large project is, of course, herding the cats. The Evergreen project has pulled off a number of multi-year projects, including completely replacing the public catalog interface, creating acquisitions and serials modules from scratch, creating a kid’s catalog, writing Evergreen’s manual, and instituting a unit and regression testing regime. As we speak, we’re in the middle of a project to replace the staff client with a web-based staff interface.

All of this happened — and continues to happen — in a community where there’s little room for anybody to dictate to another community member to do anything in particular. We have no dictator, benevolent or otherwise; no user enhancement committee; no permanent staff employed by the Evergreen Project.

How does anything get done? By the power of Voltron interdependence.

In 2011, Evergreen became a member project of the Software Freedom Conservancy, representing a culmination of the efforts started in 2010 (as Grace mentioned).

As a member project of Conservancy, Evergreen receives several benefits: Conservancy holds the project’s money, negotiates venue contracts for the annual conference and hack-a-way, and holds the project’s trademark. However, Conservancy does not run the project — nor do they want to.

As part of joining Conservancy, the Evergreen Project established an Oversight Board, and in 2012, I had the privilege of beginning a term as chair of the EOB. The EOB is Conservancy’s interface with the Evergreen Project, and the EOB is the group that is ultimately responsible for making financial decisions.

Aha! You might think to yourself: “So, if the Evergreen Project doesn’t have a dictator in the mold of Linus Torvalds, it has elected oligarchs in the form of the Oversight Board!”

And you would be wrong. The Evergreen Oversight Board does not run the project either. The EOB does not appoint the release managers; it does not dictate who is part of the Documentation Interest Group; it does not mandate any particular sort of QA.

What does the EOB do? In part, it does help establish policies for the entire project; for example, Evergreen’s decision to adopt a code of conduct in 2014 arose from the suggestions and actions of EOB members, including Kathy Lussier and Amy Terlaga. It also, in conjunction with Conservancy, helps to protect the trademark.

The trademark matters. It represents a key piece of collective ownership, ownership that is in the hands of the community via a nonprofit, disinterested organization. Evergreen is valuable, not just as a tool that libraries can use to help patrons get access to library resources, but in part as something that various institutions have built successful services (commercial or otherwise) on.  If you take nothing else away from this post, take this: if you plan to launch an open source project for the benefit of libraries, give a thought to how the trademark should be owned and managed.  The consequences of not doing so can end up creating a huge distraction from shipping excellent software… or worse.

But back to the question of governance: how does the day to day work of writing documentation, slinging code, updating websites, training new users, seeking additional contributors, unruffling feathers, and so forth get done? By constant negotiation in a sea of interdependence. This is complicated, but not chaotic. There are plenty of contracts helping protect the interests of folks contributing to and using Evergreen: contracts with non-profit and for-profit service providers like Equinox; contracts to join consortia; contracts to pool money together for a specific project. There are also webs of trust and obligation: a developer can become a committer by showing that they are committed to improving Evergreen and have a track record of doing so successfully.

Governance is inescapable in any project that has more than one person; it is particularly important in community-based open source projects. Evergreen has benefited from a lot of careful thought about formal and informal rules and lines of communication…. and will continue to do so.

— Galen Charlton, Added Services and Infrastructure Manager

This is the seventh in our series of posts leading up to Evergreen’s Tenth birthday.

Hydra Connect 2016 update / Hydra Project

Hydra Connect 2016 takes place from Monday October 3rd to Thursday October 6th in Boston, MA.  Full details of the conference can be found on its wiki page.

Poster Show and Tell

Tuesday afternoon at HC2016 will largely be given over to an open session of posters and possibly demos where attendees will have ample chance to mingle, find out what others are doing with Hydra and ask detailed questions.  We ask that all institutions with staff attending HC2016 try to provide at least one poster about what they have done/are doing/plan to do with Hydra!  There is a list of institutions on this page in the wiki; please sign up if you haven’t already done so.  The page also has details of local printing arrangements, should you wish to take advantage of them.  For display, the local Organizing Committee will provide easels; they will also provide stiff backing for 24″ x 36″ posters.  If you decide to produce 36″ x 48″ posters you will need to have them printed on foam core or otherwise provide a rigid backing of your own.

Registration and hotels

Registration is going well and the Sheraton hotel, where we arranged a discount room rate, sold out the block.  They have now added a small number of double rooms but we have arranged an “overflow” at the Charlesmark Hotel across from the Boston Public Library.  If you haven’t yet booked for the conference and/or for accommodation now would be a very good time!  Booking details and full details of the conference program can be found on the Hydra Connect 2016 wiki page.

Lightning talks and workshop sessions

The Program Committee has left space on the Wednesday morning for up to 24 lightning talks of no more than five minutes each.  If you’d like to take one of these slots to share some ideas/concerns/rants/raves or whatever, get planning – we’ll publish a sign-up list next Friday.  At the same time, we’ll mail registered delegates with details of how to sign up for Monday’s workshop sessions.  This process will allow us to judge the take-up for each one and assign rooms accordingly; it will also allow workshop coordinators to know who will be attending and to send out any advance information that they wish to provide.

Unconference sessions

Thursday morning at Hydra Connect will be given over to unconference sessions and we’ll provide details of the process for proposing these on 9th September.


We hope to see you in just about five weeks!

Evergreen 2.9.7 and 2.10.6 released / Evergreen ILS

We are pleased to announce the release of Evergreen 2.9.7 and 2.10.6, both bugfix releases.

Evergreen 2.9.7 fixes the following issues:

  • The claims never checked out counter on the patron record is now incremented correctly when marking a lost loan as claims-never-checked-out.
  • When a transit is canceled, the copy’s status is changed only if its status was previously “In Transit”.
  • Retrieving records with embedded holdings via SRU and Z39.50 is now faster.
  • The hold status message in the public catalog now uses better grammar.
  • The error message displayed when a patron attempts to place a hold but is prevented from doing so due to policy reasons is now more likely to be useful.
  • The public catalog now draws the edition statement only from the 250 field; it no longer tries to check the 534 and 775 fields.
  • Embedded microdata now uses “offeredBy” rather than “seller”.
  • The ContentCafe added content plugin now handles the “fake” ISBNs that Baker and Taylor assigns to media items.
  • Attempting to renew a rental or deposit item in the public catalog no longer causes an internal server error.
  • Various format icons now have transparent backgrounds (as opposed to white).
  • The staff client will no longer wait indefinitely for Novelist to supply added content, improving its responsiveness.
  • A few additional strings are now marked as translatable.

Evergreen 2.10.6 fixes the same issues fixed in 2.9.7, and also fixes the following:

  • Those stock Action Trigger event definitions that send email will now include a Date header.
  • Prorating invoice charges now works again.
  • A performance issue with sorting entries on the public catalog circulation history page is fixed.
  • Various style and responsive design improvements are made to the circulation and holds history pages in the public catalog.
  • The public catalog holds history page now indicates if a hold had been fulfilled.

Evergreen 2.10.6 also includes updated translations. In particular, Spanish has received a huge update with over 9,000 new translations, Czech has received a sizable update of over 800 translations, and additional smaller updates have been added for Arabic, French (Canada), and Armenian.

Please visit the downloads page to retrieve the server software and staff clients.

Evergreen 2011 / Equinox Software

This is the sixth in our series of posts leading up to Evergreen’s Tenth birthday.

Metamorphosis is the word that comes to mind when I think back on 2011. It was Evergreen’s five-year mark and we had graduated from version 1.6 to 2.0. With the addition of acquisitions, serials, authority control and many improvements to existing modules, Evergreen was now a fully-fledged ILS.

As Grace mentioned, this time period also marked a shift in the type of libraries we attracted to our community. It was no longer a group of insiders. Evergreen had become a contender on a short list of ILSs that could offer the functionality and flexibility many libraries, large and small, were seeking.

As a Project Manager in 2011, I was fortunate to work with an emerging consortium in my home state of North Carolina. NC Cardinal began the migration of several pilot libraries from disparate ILSs to Evergreen. The inaugural library system was Cleveland County. Grant, Carol, Sharon and I could not have imagined that the migration of this three-branch library system would be the birth of one of the largest and most active library groups in Evergreen history.

The amazing folks at the State Library, Cleveland County, Buncombe County, Davidson County and all the others that followed shortly after definitely set the stage for this dynamic consortium. NC Cardinal now has over 100 library outlets across the state and they are not slowing down.

As is the theme with many of our other posts, community is at the heart of Evergreen’s success. I think not only about the larger Evergreen community but also the internal community within a consortium. Support here is what drives implementations forward and ensures success within the larger Evergreen community.

Attending the Evergreen Conference this year in Raleigh, NC brought this full circle for me. It was rewarding to see the Cardinal members, which have grown exponentially, as well as some of our newer libraries participating and getting excited about what is to come. I can’t wait to see who joins us at the next conference and in the next ten years.

Shae Tetterton – Director of Sales

Large Scale Log Analytics with Solr / SearchHub

As we countdown to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Sematext’s Radu Gheorghe and Rafał Kuć’s talk, “Large Scale Log Analytics with Solr”. Radu and Rafał will also be presenting at Lucene/Solr Revolution 2016.

This talk is about searching and analyzing time-based data at scale. Documents ranging from blog posts and social media to application logs and metrics generated by smart-watches and other smart-things share a similar pattern: timestamp among their fields, rarely changeable, deletion when they become obsolete.

This kind of data is so large that it often causes scaling and performance challenges. This talk addresses these challenges, which include: properly designing collections architecture, indexing data fast and without documents waiting in queues for processing, being able to run queries that include time based sorting and faceting on enormous amounts of indexed data without killing Solr with out of memory problems, and many more.
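The time-based faceting the talk describes can be sketched with Solr's standard range-facet parameters. The collection name (`logs`), field name (`timestamp`) and query below are illustrative assumptions, not details from the talk itself:

```python
# Sketch: building a Solr time-range facet query for log-like data.
# "logs" collection and "timestamp" field are assumed names; the
# facet.range.* parameters are standard Solr request parameters.

def build_time_facet_params(query, start, end, gap="+1HOUR"):
    """Return Solr query params that bucket matching docs by time."""
    return {
        "q": query,
        "rows": 0,                    # only facet counts, no documents
        "facet": "true",
        "facet.range": "timestamp",
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,       # Solr date math: +1HOUR, +1DAY, ...
        "sort": "timestamp desc",     # newest first, typical for logs
        "wt": "json",
    }

params = build_time_facet_params(
    "level:ERROR", "NOW/DAY-7DAYS", "NOW/DAY+1DAY")
# Against a running server you would send these params to
# http://localhost:8983/solr/logs/select (e.g. with requests.get).
```

Keeping `rows=0` matters at this scale: the counts come from the index without dragging large documents through the response.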

Radu is a search and logging geek at Sematext, working daily with the likes of Elasticsearch, Logstash and rsyslog. He is a co-author of Elasticsearch in Action.

Rafał is a father, husband, Sematext software engineer, consultant, and co-founder, and the author of the Solr Cookbook and Elasticsearch Server books.

Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, Sematext Group Inc. from Lucidworks

Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Large Scale Log Analytics with Solr appeared first on

Evanescent Web Archives / David Rosenthal

Below the fold, discussion of two articles from last week about archived Web content that vanished.

At Urban Milwaukee Michail Takach reports that Journal Sentinel Archive Disappears:
Google News Archive launched [in 2008] with ambitious plans to scan, archive and release the world’s newspapers in a single public access database. ... When the project abruptly ended three years later, the project had scanned over a million pages of news from over 2,000 newspapers. Although nobody is entirely sure why the project ended, Google News Archive delivered an incredible gift to Milwaukee: free digital access to more than a century’s worth of local newspapers.
But now:
on Tuesday, August 16, the Milwaukee Journal, Milwaukee Sentinel, and Milwaukee Journal Sentinel listings vanished from the Google News Archive home page. This change came without any advance warning and still has no official explanation.
The result for Takach is:
For years, I’ve bookmarked thousands of articles and images for further exploration at a later date. In one lightning bolt moment, all of my Google News Archive bookmarks went from treasure to trash.
To be fair, this doesn't appear to be another case of Google abruptly canceling a service:
“Google News Archive no longer has permission to display this content.”
According to the Milwaukee Journal Sentinel:
“We have contracted with a new vendor (Newsbank.) It is unclear when or if the public will have access to the full inventory that was formerly available on Google News Archive.”
The owner of the content arbitrarily decided to vanish it.

At U.S. News & World Report Steven Nelson's Wayback Machine Won’t Censor Archive for Taste, Director Says After Olympics Article Scrubbed is an excellent, detailed and even-handed look at the issues raised for the Internet Archive when the Daily Beast's:
straight reporter created a gay dating profile and reported the weights, athletic events and nationalities of Olympians who contacted him, including those from "notoriously homophobic" countries. As furor spread last week, the Daily Beast revised and then retracted the article, sending latecomers to the controversy to the Wayback Machine.
The Internet Archive has routine processes that make content they have collected inaccessible, for example in response to DMCA takedown notices. It isn't clear exactly what happened in this case. Mark Graham is quoted:
“The page we’re talking about here was removed from the Wayback Machine out of a concern for safety and that’s it.”... Graham was not immediately able to think of a similar safety-motivated removal and declined to say if the Internet Archive retains a non-public copy. In fact, he says he has no proof, just circumstantial evidence, the article ever was in the Wayback Machine.
I would endorse Chris Bourg's stance on this issue:
Chris Bourg, director of libraries at the Massachusetts Institute of Technology, says the matter is a "a tricky situation where librarian/archivists values of privacy and openness come in to conflict" and says in an email the article simply could be stored in non-public form for as long as necessary.

"My personal opinion is that we should always look for answers that cause the least harm, which in this case would be to dark archive the article; and keep it archived for as long as needed to best protect the gay men who might otherwise be outed," she says. "That’s a difficult thing to do, and is no guarantee that the info won’t be released and available from other sources; but I think archivists/librarians have special responsibilities to the subjects in our collections to 'do no harm'."
These two stories bring up four points to consider:
  • The Internet Archive is the most-used, but only one among a number of Web archives, which will naturally have different policies. Portals to the archived Web that use Memento to aggregate their content could well find content the Wayback Machine had suppressed in other archives.
  • Copyright enables censorship. Anything on the public Web, or in public Web archives, can be rendered inaccessible without notice by the use or abuse of copyright processes, such as the DMCA takedown process.
  • Just because archived Web resources are in the custody of a major company, such as Google, or even what we may now thankfully call a major institution, the Internet Archive, does not guarantee them permanence.
  • Thus, scholars such as Takach are faced with a hard choice, either to risk losing access without notice to the resources on which their work is based, or to ignore the law and maintain a personal archive stored in their own equipment of all those resources.
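The Memento aggregation mentioned above works by datetime negotiation (RFC 7089): a client asks a TimeGate for the version of a URL closest to a given moment, and TimeMaps list the mementos each archive holds. A minimal sketch, with invented URLs and a deliberately simplified parser:

```python
# Sketch of Memento (RFC 7089) datetime negotiation, the mechanism an
# aggregator uses to query multiple web archives for a past version of
# a URL. The example URLs are invented.
from datetime import datetime, timezone
from email.utils import format_datetime

def accept_datetime_header(when):
    """Build the Accept-Datetime header sent to a TimeGate (RFC 1123 date)."""
    return {"Accept-Datetime": format_datetime(when, usegmt=True)}

def parse_timemap(link_header):
    """Collect memento URLs from a Link-format TimeMap entry list.

    Naive comma split: real TimeMap entries carry datetime="..."
    attributes whose embedded commas would need a proper parser.
    """
    mementos = []
    for entry in link_header.split(","):
        parts = [p.strip() for p in entry.split(";")]
        url = parts[0].strip("<>")
        if any('rel="memento"' in p for p in parts[1:]):
            mementos.append(url)
    return mementos

hdrs = accept_datetime_header(datetime(2016, 8, 16, tzinfo=timezone.utc))
# e.g. GET http://timegate.example.org/<original-url> with these headers
```

Because each archive exposes its own TimeGate, an aggregator can answer from whichever archive still holds a copy, even after one archive has suppressed it.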
While not specifically about Web archives, emptywheel's account of the removal of the Shadow Brokers files from GitHub, Reddit and Tumblr, and Roxane Gay's The Blog That Disappeared about Google's termination of Dennis Cooper's account, show that one cannot depend on what services such as these say in their Terms of Service.

New Titles in the LITA Guide Series / LITA

A new relationship between LITA and Rowman and Littlefield publishers kicks off with the announcement of 7 recent and upcoming exciting titles on library technology. The LITA Guide Series books from Rowman and Littlefield publishers contain practical, up-to-date, how-to information, and are usually under 100 pages. Proposals for new titles can be submitted to the Acquisitions editor using this link.

LITA members receive a 20% discount on all the titles. To get that discount, use promotion code RLLITA20 when ordering from the Rowman and Littlefield LITA Guide Series web site.


Here are the current new LITA Guide Series titles:

Integrating LibGuides into Library Websites
Edited by Aaron W. Dobbs and Ryan L. Sittler (October 2016)

Innovative LibGuides Application: Real World Examples
Edited by Aaron W. Dobbs and Ryan L. Sittler (October 2016)

Data Visualization: A Guide to Visual Storytelling for Libraries
Edited by Lauren Magnuson (September 2016)

Mobile Technologies in Libraries
Ben Rawlins (September 2016)

Library Service Design: A LITA Guide to Holistic Assessment, Insight, and Improvement
Joe J. Marquez and Annie Downey (July 2016)

The Librarian’s Introduction to Programming Languages
Edited by Beth Thomsett-Scott (June 2016)

Digitizing Flat Media: Principles and Practices
Joy M. Perrin (December 2015)

LITA publications help to fulfill its mission to educate, serve and reach out to its members, other ALA members and divisions, and the entire library and information community through its publications, programs and other activities designed to promote, develop, and aid in the implementation of library and information technology.

OpenTrials launch date + Hack Day / Open Knowledge Foundation

Exciting news! OpenTrials, a project in which Open Knowledge is developing an open, online database of information about the world’s clinical research trials, will officially launch its beta on Monday 10th October 2016 at the World Health Summit in Berlin. After months of work behind-the-scenes meeting, planning, and developing, we’re all really excited about demoing OpenTrials to the world and announcing how to access and use the site!

The launch will take place at the ‘Fostering Open Science in Global Health’ workshop, with OpenTrials being represented by our Community Manager, Ben Meghreblian. The workshop will be a great opportunity to talk about the role of open data, open science, and generally how being open can bring improvements in medicine and beyond!

As the workshop’s theme is public health emergencies, we’ll also be demoing Ebola Trials Tracker, another OpenTrials project showing how long it takes for the results of Ebola trials to be made available.

If you’ll be attending the conference or the workshop, we’d love to meet you – please do get in touch and let us know.

Hack Day

If that wasn’t enough, we also have a confirmed date and location for the OpenTrials Hack Day – it will take place on Saturday 8th October at the German office of Wikimedia in Berlin.

We’re inviting people from a range of backgrounds. So, if you’re a developer, data scientist, health technologist, open data advocate, or otherwise interested in health, medicine, and clinical trials, come along and learn more about the data that powers OpenTrials, how it’s structured, and how to use our API to search the OpenTrials database or build applications using the data.

On the day our technical lead and a domain expert will be on hand to explain the data and facilitate the day – we’re really looking forward to seeing what clever hacks and mini-projects you’ll create.

For those of you who have already asked, we’ll be releasing documentation on the OpenTrials API and database soon, but meanwhile if you’re interested in the event you’ll find more details on the OpenTrials Eventbrite page, or you can register quickly below.

OpenTrials is funded by The Laura and John Arnold Foundation and directed by Dr. Ben Goldacre, an internationally known leader on clinical transparency.

Twitter: @opentrials


Catalogs and Context: Part V / Karen Coyle

This entire series is available as a single file on my web site.

Before we can posit any solutions to the problems that I have noted in these posts, we need to at least know what questions we are trying to answer. To me, the main question is:

What should happen between the search box and the bibliographic display?

Or as Pauline Cochrane asked: "Why should a user ever enter a search term that does not provide a link to the syndetic apparatus and a suggestion about how to proceed?"[1] I really like the "suggestion about how to proceed" that she included there. Although I can think of some exceptions, I do consider this an important question.

If you took a course in reference work at library school (and perhaps such a thing is no longer taught - I don't know), then you learned a technique called "the reference interview." The Wikipedia article on this is not bad, and defines the concept as an interaction at the reference desk "in which  the librarian responds to the user's initial explanation of his or her information need by first attempting to clarify that need and then by directing the user to appropriate information resources." The assumption of the reference interview is that the user arrives at the library with either an ill-formed query, or one that is not easily translated to the library's sources. Bill Katz's textbook "Introduction to Reference Work" makes the point bluntly:

"Be skeptical of the information the patron presents" [2]

If we're so skeptical that the user could approach the library with the correct search in mind/hand, why then do we think that giving the user a search box in which to put that poorly thought out or badly formulated search is a solution? This is another mind-boggler to me.

So back to our question, what SHOULD happen between the search box and the bibliographic display? This is not an easy question, and it will not have a simple answer. Part of the difficulty of the answer is that there will not be one single right answer. Another difficulty is that we won't know a right answer until we try it, give it some time, open it up for tweaking, and carefully observe. That's the kind of thing that Google does when they make changes in their interface, but we haven't got either Google's money nor its network (we depend on vendor systems, which define what we can and cannot do with our catalog).

Since I don't have answers (I don't even have all of the questions) I'll pose some questions, but I really want input from any of you who have ideas on this, since your ideas are likely to be better informed than mine. What do we want to know about this problem and its possible solutions?

(Some of) Karen's Questions

Why have we stopped evolving subject access?

Is it that keyword access is simply easier for users to understand? Did the technology deceive us into thinking that a "syndetic apparatus" is unnecessary? Why have the cataloging rules and bibliographic description been given so much more of our profession's time and development resources than subject access has? [3]

Is it too late to introduce knowledge organization to today's users?

The user of today is very different to the user of pre-computer times. Some of our users have never used a catalog with an obvious knowledge organization structure that they must/can navigate. Would they find such a structure intrusive? Or would they suddenly discover what they had been missing all along? [4]

Can we successfully use the subject access that we already have in library records?

Some of the comments in the articles organized by Cochrane in my previous post were about problems in the Library of Congress Subject Headings (LCSH), in particular that the relationships between headings were incomplete and perhaps poorly designed.[5] Since LCSH is what we have as headings, could we make them better? Another criticism was the sparsity of "see" references, once dictated by the difficulty of updating LCSH. Can this be ameliorated? Crowdsourced? Localized?

We still do not have machine-readable versions of the Library of Congress Classification (LCC), and the machine-readable Dewey Decimal Classification (DDC) has been taken off-line (and may be subject to licensing). Could we make use of LCC/DDC for knowledge navigation if they were available as machine-readable files?

Given that both LCSH and LCC/DDC have elements of post-composition and are primarily instructions for subject catalogers, could they be modified for end-user searching, or do we need to develop a different instrument altogether?
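The syndetic apparatus Cochrane describes — "see" references resolving unused terms to established headings, plus broader/narrower links offering "a suggestion about how to proceed" — can be sketched as a tiny graph. The headings below are invented examples, not real LCSH entries:

```python
# Toy sketch of a syndetic apparatus: "see" references resolve unused
# terms to established headings; broader/narrower links suggest where
# to go next. All headings here are illustrative, not actual LCSH.

SEE = {"Movies": "Motion pictures"}            # unused term -> used heading
BROADER = {"Motion pictures": ["Performing arts"]}
NARROWER = {"Motion pictures": ["Silent films", "Documentary films"]}

def suggest(term):
    """Resolve a query term and return navigation suggestions."""
    heading = SEE.get(term, term)
    return {
        "use": heading,
        "broader": BROADER.get(heading, []),
        "narrower": NARROWER.get(heading, []),
    }

print(suggest("Movies"))
```

A catalog that did this between the search box and the result list would answer Cochrane's question: every entered term, used or not, would land the user somewhere in the structure with visible next steps.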

How can we measure success?

Without Google's user laboratory apparatus, the answer to this may be: we can't. At least, we cannot expect to have a definitive measure. How terrible would it be to continue to do as we do today and provide what we can, and presume that it is better than nothing? Would we really see, for example, a rise in use of library catalogs that would confirm that we have done "the right thing?"


[1] Cochrane, Pauline A., with Marcia J. Bates, Margaret Beckman, Hans H. Wellisch, Sanford Berman, Toni Petersen, and Stephen E. Wiberley, Jr. "Modern Subject Access in the Online Age: Lesson 3." American Libraries, Vol. 15, No. 4 (Apr., 1984), pp. 250-252, 254-255.

[2] Katz, Bill. Introduction to Reference Work: Reference Services and Reference Processes. New York: McGraw-Hill, 1992. p. 82 Cited in: Brown, Stephanie Willen. The Reference Interview: Theories and Practice. Library Philosophy and Practice 2008. ISSN 1522-0222

[3] One answer, although it doesn't explain everything, is economic: the cataloging rules are published by the professional association and are a revenue stream for it. That provides an incentive to create new editions of rules. There is no economic gain in making updates to the LCSH. As for the classifications, the big problem there is that they are permanently glued onto the physical volumes making retroactive changes prohibitive. Even changes to descriptive cataloging must be moderated so as to minimize disruption to existing catalogs, which we saw happen during the development of RDA, but with some adjustments the new and the old have been made to coexist in our catalogs.

[4] Note that there are a few places online, in particular Wikipedia, where there is a mild semblance of organized knowledge and with which users are generally familiar. It's not the same as the structure that we have in subject headings and classification, but users are prompted to select pre-formed headings, with a keyword search being secondary.

[5] Simon Spero did a now famous (infamous?) analysis of LCSH's structure that started with Biology and ended with Doorbells.

Catalogs and Content: an Interlude / Karen Coyle

This entire series is available as a single file on my web site.

"Editor's note. Providing subject access to information is one of the most important professional services of librarians; yet, it has been overshadowed in recent years by AACR2, MARC, and other developments in the bibliographic organization of information resources. Subject access deserves more attention, especially now that results are pouring in from studies of online catalog use in libraries."
American Libraries, Vol. 15, No. 2 (Feb., 1984), pp. 80-83
Having thought and written about the transition from card catalogs to online catalogs, I began to do some digging in the library literature, and struck gold. In 1984, Pauline Atherton Cochrane, one of the great thinkers in library land, organized a six-part "continuing education" series to bring librarians up to date on the thinking regarding the transition to new technology. (Dear ALA - please put these together into a downloadable PDF for open access. It could make a difference.) What is revealed here is both stunning and disheartening, as the quote above shows; in terms of catalog models, very little progress has been made, and we are still spending more time organizing atomistic bibliographic data while ignoring subject access.

The articles are primarily made up of statements by key library thinkers of the time, many of whom you will recognize. Some responses contradict each other, others fall into familiar grooves. The Library of Congress is criticized for not moving faster into the future, much as it is today, and yet respondents admit that the general dependency on LC makes any kind of fast turn-around of changes difficult. Some of the desiderata have been achieved, but not the overhaul of subject access in the library catalog.

The Background

If you think that libraries moved from card catalogs to online catalogs in order to serve users better, think again. Like other organizations that had a data management function, libraries in the late 20th century were reaching the limits of what could be done with analog technology. In fact, as Cochrane points out, by the mid-point of that century libraries had given up on the basic catalog function of providing cross references from unused to used terminology, as well as from broader and narrower terms in the subject thesaurus. It simply wasn't possible to keep up with these, not to mention that although the Library of Congress and service organizations like OCLC provided ready-printed cards for bibliographic entries, they did not provide the related reference cards. What libraries did (and I remember this from my undergraduate years) was place copies of the "Red Book" near the card catalog. This was the printed Library of Congress Subject Headings list, which by my time was in two huge volumes, and, yes, was bound in red. Note that this was the volume that was intended for cataloging librarians who were formulating subject headings for their collections. It was never intended for the end-users of the catalog. The notation ("x", "xx", "sa") was far from intuitive. In addition, for those users who managed to follow the references, it pointed them to the appropriate place in LCSH, but not necessarily in the catalog of the library in which they were searching. Thus a user could be sent to an entry that simply did not exist.

The "Red Book" today
From my own experience, when we brought up the online catalog at the University of California, the larger libraries had for years had difficulty keeping the card catalog up to date. The main library at the University of California at Berkeley regularly ran from 100,000 to 150,000 cards behind in filing into the catalog, which filled two enormous halls. That meant that a book would be represented in the catalog about three months after it had been cataloged and shelved. For a research library, this was a disaster. And Berkeley was not unusual in this respect.

Computerization of the catalog was both a necessary practical solution and a kind of holy grail. At the time that these articles were written, only a few large libraries had an online catalog, and that catalog represented only a recent portion of the library's holdings. (Retrospective conversion of the older physical card catalog to machine-readable form came later, culminating in the 1990's.) Abstracting and indexing databases (DIALOG, PRECIS, and others) had preceded libraries in automating, and these gave librarians their first experience in searching computerized bibliographic data.

This was the state of things when Cochrane presented her six-part "continuing education" series in American Libraries.

Subject Access

The series of articles was stimulated by an astonishingly prescient article by Marcia Bates in 1977. In that article she articulates both concerns and possibilities that, quite frankly, we should all take to heart today. In Lesson 3 of Cochrane's articles, Bates is quoted from 1977 as saying:
"...with automation, we have the opportunity to introduce many access points to a given book. We can now use a subject approach... that allows the naive user, unconscious of and uninterested in the complexities of synonymy and vocabulary control, to blunder on to desired subjects, to be guided, without realizing it, by a redundant but carefully controlled subject access system." 
"And now is the time to change -- indeed, with MARC already so highly developed, past time. If we simply transfer the austerity-based LC subject heading approach to expensive computer systems, then we have used our computers merely to embalm the constraints that were imposed on library systems back before typewriters came into use!"

This emphasis on subject access was one of the stimuli for the AL lessons. In the early 1980's, studies done at OCLC and elsewhere showed that over 50% of the searches being done in the online catalogs of that day were subject searches, even those going against title indexes or mixed indexes. (See footnotes to Lesson 3.) Known item searching was assumed to be under control, but subject searching posed significant problems. Comments in the article include:
"...we have not yet built into our online systems much of the structure for subject access that is already present in subject cataloging. That structure is internal and known by the person analyzing the work; it needs to be external and known by the person seeking the work."
"Why should a user ever enter a search term that does not provide a link to the syndetic apparatus and a suggestion about how to proceed?"
Interestingly, I don't see that any of these problems has been solved in today's systems.

As a quick review, here are some of the problems, some proposed solutions, and some hope for future technologies that are presented by the thinkers that contributed to the lessons.

Problems noted

Many problems were surfaced, some with fairly simple solutions, others that we still struggle with.
  • LCSH is awkward, if not nearly unusable, both for its vocabulary and for the lack of a true hierarchical organization
  • Online catalogs' use of LCSH lacks syndetic structure (see, see also, BT, NT). This is true not only for display but also in retrieval: a search on a broader term does not retrieve items with a narrower term (which would be logical to at least some users)
  • Libraries assign too few subject headings
  • For the first time, some users are not in the library while searching so there are no intermediaries (e.g. reference librarians) available. (One of the flow diagrams has a failed search pointing to a box called "see librarian", something we would not think to include today.)
  • Lack of a professional theory of information seeking behavior that would inform systems design. ("Without a blueprint of how most people want to search, we will continue to force them to search the way we want to search." Lesson 5)
  • Information overload, aka overly large results, as well as too few results on specific searches

Proposed solutions

Some proposed solutions were mundane (add more subject headings to records) while others would require great disruption to the library environment.
  • Add more subject headings to MARC records
  • Use keyword searching, including keywords anywhere in the record.
  • Add uncontrolled keywords to the records.
  • Make the subject authority file machine-readable and integrate it into online catalogs.
  • Forget LCSH, instead use non-library bibliographic files for subject searching, such as A&I databases.
  • Add subject terms from non-library sources to the library catalog, and/or do (what today we call) federated searching
  • LCSH must provide headings that are more specific as file sizes and retrieved sets grow (in the document, a retrieved set of 904 items was noted with an exclamation point)

Future thinking 

As is so often the case when looking to the future, some potential technologies were seen as solutions. Some of these are still seen as solutions today (c.f. artificial intelligence), while others have been achieved (storage of full text).
  • Full text searching, natural language searches, and artificial intelligence will make subject headings and classification unnecessary
  • We will have access to back-of-the-book indexes and tables of contents for searching, as well as citation indexing
  • Multi-level systems will provide different interfaces for experts and novices
  • Systems will be available 24x7, and there will be a terminal in every dorm room
  • Systems will no longer need to use stopwords
  • Storage of entire documents will become possible

End of Interlude

Although systems have allowed us to store and search full text, to combine bibliographic data from different sources, and to deliver world-wide, 24x7, we have made almost no progress in the area of subject access. There is much more to be learned from these articles, and it would be instructive to do an in-depth comparison of them to where we are today. I greatly recommend reading them; each is only a few pages long.

----- The Lessons -----

*Modern Subject Access in the Online Age: Lesson 1
Author(s): Pauline Atherton Cochrane
Source: American Libraries, Vol. 15, No. 2 (Feb., 1984), pp. 80-83
Stable URL:

*Modern Subject Access in the Online Age: Lesson 2
Author(s): Pauline A. Cochrane
Source: American Libraries, Vol. 15, No. 3 (Mar., 1984), pp. 145-148, 150
Stable URL:

*Modern Subject Access in the Online Age: Lesson 3
Author(s): Pauline A. Cochrane, Marcia J. Bates, Margaret Beckman, Hans H. Wellisch, Sanford Berman, Toni Petersen, and Stephen E. Wiberley, Jr.
Source: American Libraries, Vol. 15, No. 4 (Apr., 1984), pp. 250-252, 254-255
Stable URL:

*Modern Subject Access in the Online Age: Lesson 4
Author(s): Pauline A. Cochrane, Carol Mandel, William Mischo, Shirley Harper, Michael Buckland, Mary K. D. Pietris, Lucia J. Rather and Fred E. Croxton
Source: American Libraries, Vol. 15, No. 5 (May, 1984), pp. 336-339
Stable URL:

*Modern Subject Access in the Online Age: Lesson 5
Author(s): Pauline A. Cochrane, Charles Bourne, Tamas Doczkocs, Jeffrey C. Griffith, F. Wilfrid Lancaster, William R. Nugent and Barbara M. Preschel
Source: American Libraries, Vol. 15, No. 6 (Jun., 1984), pp. 438-441, 443
Stable URL:

*Modern Subject Access in the Online Age: Lesson 6
Author(s): Pauline A. Cochrane, Brian Aveney and Charles Hildreth
Source: American Libraries, Vol. 15, No. 7 (Jul. - Aug., 1984), pp. 527-529
Stable URL:

Catalog and Context, Part III / Karen Coyle

This entire series is available as a single file on my web site.

In the previous two parts, I explained that much of the knowledge context that could and should be provided by the library catalog has been lost as we moved from cards to databases as the technologies for the catalog. In this part, I want to talk about the effect of keyword searching on catalog context.


If you weren't at least a teenager in the 1960's you probably missed the era of KWIC and KWOC (neither a children's TV show nor a folk music duo). These stood for, respectively, KeyWords In Context and KeyWords Out of Context. They were concordance-like indexes to texts, but the first to be produced using computers. A KWOC index was simply a list of words and pointers (such as page numbers, since hyperlinks didn't exist yet). A KWIC index showed the keywords with a few words on either side, or rotated a phrase so that each term appeared once at the beginning of the string, with the entries then ordered alphabetically.

If you have the phrase "KWIC is an acronym for Key Word in Context", then your KWIC index display could look like:

 acronym for Key Word In Context KWIC is an
 Context KWIC is an acronym for Key Word In
 Key Word In Context KWIC is an acronym for
 KWIC is an acronym for Key Word In Context
 Word In Context KWIC is an acronym for Key

To us today these are unattractive and not very useful, but to the first users of computers these were an exciting introduction to the possibility that one could search by any word in a text.
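The rotation scheme described above is easy to sketch in code. The following is a minimal illustration only; the function name and the stopword list are my own assumptions, not part of any historical KWIC implementation:

```python
def kwic_rotations(phrase, stopwords=None):
    """Rotate a phrase so each significant word becomes an entry point,
    then order the entries alphabetically, as a KWIC index did."""
    # This stopword list is illustrative, not historical.
    stopwords = stopwords or {"is", "an", "for", "in", "the", "a", "of"}
    words = phrase.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in stopwords:
            continue
        # Wrap the phrase around so the keyword starts the entry.
        entries.append(" ".join(words[i:] + words[:i]))
    return sorted(entries, key=str.lower)

for entry in kwic_rotations("KWIC is an acronym for Key Word In Context"):
    print(entry)
```

A real KWIC display typically aligned the keywords in a central column; this sketch shows only the rotation and alphabetical ordering.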

It wasn't until the 1980's, however, that keyword searching could be applied to library catalogs.

Before Keywords, Headings

Before keyword searching, when users were navigating a linear, alphabetical index, they were faced with the very difficult task of deciding where to begin their entry into the catalog. Imagine someone looking for information on Lake Erie. That seems simple enough, but entering the catalog at L-A-K-E E-R-I-E would not actually yield all of the entries that might be relevant. Here are some headings with LAKE ERIE:

Boats and boating--Erie, Lake--Maps. 
Books and reading--Lake Erie region.
Lake Erie, Battle of, 1813.
Erie, Lake--Navigation

Note that the lake is entered under Erie, the battle under Lake, and some instances are fairly far down in the heading string. All of these headings follow rules that ensure a kind of consistency, but because users do not know those rules, the consistency here may not be visible. In any case, the difficulty for users was knowing with what terms to begin the search, which was done on left-anchored headings.
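The contrast between the card catalog's left-anchored entry and later keyword retrieval can be sketched as follows, using the headings above (a simplified illustration; the function names are mine):

```python
headings = [
    "Boats and boating--Erie, Lake--Maps",
    "Books and reading--Lake Erie region",
    "Lake Erie, Battle of, 1813",
    "Erie, Lake--Navigation",
]

def left_anchored(headings, prefix):
    """Card-catalog style: match only headings that begin with the string."""
    return [h for h in headings if h.lower().startswith(prefix.lower())]

def keyword_search(headings, *terms):
    """Keyword style: match headings containing every term, in any order."""
    return [h for h in headings
            if all(t.lower() in h.lower() for t in terms)]

print(left_anchored(headings, "Lake Erie"))      # only the Battle heading
print(keyword_search(headings, "lake", "erie"))  # all four headings
```

Entering the catalog at L-A-K-E E-R-I-E finds one heading; a keyword search finds all four, which is exactly the change discussed in the next section.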

One might assume that finding names of people would be simple, but that is not the case either. Names can be quite complex with multiple parts that are treated differently based on a number of factors having to do with usage in different cultures:

De la Cruz, Melissa
Cervantes Saavedra, Miguel de
Because it was hard to know where to begin a search, see and see also references existed to guide the user from one form of a name or phrase to another. However, it would inflate a catalog beyond utility to include every possible entry point that a person might choose, not to mention that this would make the cataloger's job onerous. Other than the help of a good reference librarian, searching in the card catalog was a kind of hit or miss affair.

When we brought up the University of California online catalog in 1982, you can imagine how happy users were to learn that they could type in LAKE ERIE and retrieve every record with those terms in it regardless of the order of the terms or where in the heading they appeared. Searching was, or seemed, much simpler. Because it feels simpler, we all have tended to ignore some of the downside of keyword searching. First, words are just strings, and in a search strings have to match (with some possible adjustment like combining singular and plural terms). So a search on "FRANCE" for all information about France would fail to retrieve other versions of that word unless the catalog did some expansion:

Cooking, French
Alps, French (France)
French American literature

The next problem is that retrieval with keywords, and especially the "keyword anywhere" search which is the most popular today, entirely misses any context that the library catalog could provide. A simple keyword search on the word "darwin" brings up a wide array of subjects, authors, and titles.

Darwin, Charles, 1809-1882 – Influence
Darwin, Charles, 1809-1882 — Juvenile Literature
Darwin, Charles, 1809-1882 — Comic Books, Strips, Etc
Darwin Family
Java (Computer program language)
Rivers--Great Britain
Mystery Fiction
DNA Viruses — Fiction
Women Molecular Biologists — Fiction

Darwin, Charles, 1809-1882
Darwin, Emma Wedgwood, 1808-1896
Darwin, Ian F.
Darwin, Andrew
Teilhet, Darwin L.
Bear, Greg
Byrne, Eugene

Darwin; A Graphic Biography : the Really Exciting and Dramatic 
    Story of A Man Who Mostly Stayed at Home and Wrote Some Books
Darwin; Business Evolving in the Information Age
Emma Darwin, A Century of Family Letters, 1792-1896
Java Cookbook
Canals and Rivers of Britain
The Crimson Hair Murders
Darwin's Radio

It wouldn't be reasonable for us to expect a user to make sense of this, because quite honestly it does not make sense.

 In the first version of the UC catalog, we required users to select a search heading type, such as AU, TI, SU. That may have lessened the "false drops" from keyword searches, but it did not eliminate them. In this example, using a title or subject search the user still would have retrieved items with the subjects DNA Viruses — Fiction, and Women Molecular Biologists — Fiction, and an author search would have brought up both Java Cookbook and Canals and Rivers of Britain. One could see an opportunity for serendipity here, but it's not clear that it would balance out the confusion and frustration. 

You may be right now thinking "But Google uses keyword searching and the results are good." Note that Google now relies heavily on Wikipedia and other online reference books to provide relevant results. Wikipedia is a knowledge organization system, organized by people, and it often has a default answer for search that is more likely to match the user's assumptions. A search on the single word "darwin" brings up:

In fact, Google has always relied on humans to organize the web by following the hyperlinks that they create. Although the initial mechanism of the search is a keyword search, Google's forte is in massaging the raw keyword result to bring potentially relevant pages to the top. 

Keywords, Concluded

The move from headings to databases to un-typed keyword searching has all but eliminated the visibility and utility of headings in the catalog. The single search box has become the norm for library catalogs and many users have never experienced the catalog as an organized system of headings. Default displays are short and show only a few essential fields, mainly author, title and date. This means that there may even be users who are unaware that there is a system of headings in the catalog.

Recent work in cataloging, from ISBD to FRBR to RDA and BIBFRAME, focuses on modifications to the bibliographic record but does nothing to model the catalog as a whole. With these efforts, the organized knowledge system that was the catalog is slipping further into the background. And yet we have no concerted effort taking place to remedy this.

What is most astonishing to me, though, is that catalogers continue to create headings, painstakingly, sincerely, in spite of the fact that they are not used as intended in library systems, and have not been used in that way since the first library systems were developed over 30 years ago. The headings are fodder for the keyword search, but no more so than a simple set of tags would be. The headings never perform the organizing function for which they were intended. 


Part IV will look at some attempts to create knowledge context from current catalog data, and will present some questions that need to be answered if we are to address the quality of the catalog as a knowledge system.

Catalog and Context, Part II / Karen Coyle

This entire series is available as a single file on my web site.

In the previous post, I talked about book and card catalogs, and how they existed as a heading layer over the bibliographic description representing library holdings. In this post, I will talk about what changed when that same data was stored in database management systems and delivered to users on a computer screen.

Taking a very simple example, in the card catalog a single library holding with author, title and one subject becomes three separate entries, one for each heading. These are filed alphabetically in their respective places in the catalog.

In this sense, the catalog is composed of cards for headings that have attached to them the related bibliographic description. Most items in the library are represented more than once in the library catalog. The catalog is a catalog of headings.

In most computer-based catalogs, the relationship between headings and bibliographic data is reversed: the record, with bibliographic and heading data, is stored once; access points, analogous to the headings of the card catalog, are extracted to indexes that all point to the single record.
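This inversion can be sketched as follows (the record, its fields, and the data structures are hypothetical, purely for illustration): the record is stored once, and each access point is extracted into an index that points back to it.

```python
# One record, stored once, with both bibliographic and heading data.
records = {
    1: {"author": "Darwin, Charles, 1809-1882",
        "title": "On the Origin of Species",
        "subject": "Evolution (Biology)"},
}

# Access points are extracted into per-field indexes, each entry
# pointing back at the single stored record.
indexes = {"author": {}, "title": {}, "subject": {}}
for rec_id, rec in records.items():
    for field, heading in rec.items():
        indexes[field].setdefault(heading, []).append(rec_id)

print(indexes["author"]["Darwin, Charles, 1809-1882"])  # [1]
```

The user sees only the record that comes back from a search; unlike the drawers of the card catalog, the indexes themselves stay hidden, which is the point made below.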

This in itself could be just a minor change in the mechanism of the catalog, but in fact it turns out to be more than that.

First, the indexes of the database system are not visible to the user. This is the opposite of the card catalog where the entry points were what the user saw and navigated through. Those entry points, at their best, served as a knowledge organization system that gave the user a context for the headings. Those headings suggest topics to users once the user finds a starting point in the catalog.

When this system works well for the user, she has some understanding of where she is in the virtual library that the catalog creates. This context could be a subject area or it could be a bibliographic context such as the editions of a work.

Most, if not all, online catalogs do not present the catalog as a linear, alphabetically ordered list of headings. Database management technology encourages the use of searching rather than linear browsing. Even if one searches in headings as a left-anchored string of characters, a search results in a retrieved set of matching entries, not a point in an alphabetical list. There is no way to navigate to nearby entries. The bibliographic data is therefore not provided in either the context or the order of the catalog. After a search on "cat breeds" the user sees a screen-full of bibliographic records, but these lack context because most default displays do not show the user the headings or text that caused the item to be retrieved.

Although each of these items has a subject heading containing the words "Cat breeds" the order of the entries is not the subject order. The subject headings in the first few records read, in order:

  1. Cat breed
  2. Cat breeds
  3. Cat breeds - History
  4. Cat breeds - Handbooks, manuals, etc.
  5. Cat breeds
  6. Cat breeds - Thailand
  7. Cat breeds

Even if the catalog uses a visible and logical order, like alphabetical by author and title, or most recent by date, there is no way from the displayed list for the user to get the sense of "where am I?" that was provided by the catalog of headings.

In the early 1980's, when I was working on the University of California's first online catalog, the catalogers immediately noted this as a problem. They would have wanted the retrieved set to be displayed as:

(Note how much this resembles the book catalog shown in Part I.) At the time, and perhaps still today, there were technical barriers to such a display, mainly because of limitations on the sorting of large retrieved sets. (Large, at that time, was anything over a few hundred items.) Another issue was that any bibliographic record could be retrieved more than once in a single retrieved set, and presenting the records more than once in the display, given the database design, would be tricky. I don't know whether, starting afresh today, some of these features would be easier to produce, but the pattern of search and display seems not to have progressed greatly from those first catalogs.

In addition, it is in any case questionable whether a set of bibliographic items retrieved from a database on some query would reproduce the presumably coherent context of the catalog. This is especially true because of the third major difference between the card catalog and the computer catalog: the ability to search on individual words in the bibliographic record rather than being limited to searching on full left-anchored headings. The move to keyword searching was both a boon and a bane, because it was a major factor in the loss of context in the library catalog.

Keyword searching will be the main topic of Part III of this series.

Catalog and Context, Part I / Karen Coyle

This multi-part post is based on a talk I gave in June, 2016 at ELAG in Copenhagen.
This entire series is available as a single file on my web site.

Imagine that you do a search in your GPS system and are given the exact point of the address, but nothing more.

Without some context showing where on the planet the point exists, having the exact location, while accurate, is not useful.

In essence, this is what we provide to users of our catalogs. They do a search and we reply with bibliographic items that meet the letter of that search, but with no context about where those items fit into any knowledge map.

Because we present the catalog as a retrieval tool for unrelated items, users have come to see the library catalog as nothing more than a tool for known item searching. They do not see it as a place to explore topics or to find related works. The catalog wasn't always just a known item finding tool, however. To understand how it came to be one, we need a short visit to Catalogs Past.

Catalogs Past

We can't really compare the library catalog of today to the early book catalogs, since the problem that they had to solve was quite different to what we have today. However, those catalogs can show us what a library catalog was originally meant to be.
book catalog entry

A book catalog was a compendium of entry points, mainly authors but in some cases also titles and subjects. The bibliographic data was kept quite brief as every character in the catalog was a cost in terms of type-setting and page real estate. The headings dominated the catalog, and it was only through headings that a user could approach the bibliographic holdings of the library. An alphabetical author list is not much "knowledge organization", but the headings provided an ordered layer over the library's holdings, and were also the only access mechanism to them.

Some of the early card catalogs had separate cards for headings and for bibliographic data. If entries in the catalog had to be hand-written (or later typed) onto cards, the easiest thing was to slot the cards into the catalog behind the appropriate heading without adding heading data to the card itself.

Often there was only one card with a full bibliographic description, and that was the "main entry" card. All other cards were references to a point in the catalog, for example the author's name, where more information could be found.

Again, all bibliographic data was subordinate to a layer of headings that made up the catalog. We can debate how intellectually accurate or useful that heading layer was, but there is no doubt that it was the only entry to the content of the library.

The Printed Card

In 1902 the Library of Congress began printing cards that could be purchased by libraries. The idea was genius. For each item cataloged by LC a card was printed in as many copies as needed. Libraries could buy the number of catalog card "blanks" they required to create all of the entries in their catalogs. The libraries would use as many as needed of the printed cards and type (or write) the desired headings onto the top of the card. Each of these would have the full bibliographic information - an advantage for users, who would no longer need to follow "see" references from headings to the one full entry card in the catalog.

These cards introduced something else that was new: the card would have at the bottom a tracing of the headings that LC was using in its own catalog. This was a savings for the libraries as they could copy LC's practice without incurring their own catalogers' time. This card, for the first time, combined both bibliographic information and heading tracings in a single "record", with the bibliographic information on the card being an entry point to the headings.

Machine-Readable Card Printing

The MAchine Readable Cataloging (MARC) project of the Library of Congress was a major upgrade to card printing technology. By including all of the information needed for card printing in a computer-processable record, LC could take advantage of new technology to streamline its card production process, and even move into a kind of "print on demand" model. The MARC record was designed to have all of the information needed to print the set of cards for a book: author, title, subjects, and added entries were all included in the record, as well as some additional information that could be used to generate reports such as "new acquisitions" lists.

Here again the bibliographic information and the heading information were together in a single unit, and it even followed the card printing convention of the order of the entries, with the bibliographic description at top, followed by headings. With the MARC record, it was possible not only to print sets of cards, but to actually print the headings on the cards, so that when libraries received a set the cards were ready to go into the catalog at their respective places.

Next, we'll look at the conversion from printed cards to catalogs using database technology.

-> Part II

Atmire Acquires Open Repository / DuraSpace News

By Bram Luyten, @mire

Atmire NV has entered into an agreement to acquire Open Repository, BioMed Central's repository service for academic institutions, charities, NGOs and research organisations.

Under the agreement, Atmire will take over management and support of all Open Repository customers effective from July 28th. The acquisition adds to Atmire's client base of institutions using DSpace (an open source repository software package typically used for creating open access repositories) and allows BioMed Central to focus on its core business concerns.

Catnip / LibUX

A gif demonstrating how Catnip is used.

I really like that one of the core set of library services from version-one-point-oh is to match-make me with something to read or do. Curation and personalization is where it’s always been at – pre pre <pre> internet – and I am super entertained by what things like readers advisory look like on the web.


Limit to full text in VuFind / Eric Lease Morgan

This posting outlines how a “limit to full text” functionality was implemented in the “Catholic Portal’s” version of VuFind.

While there are many dimensions of the Catholic Portal, one of its primary components is a sort of union catalog of rare and infrequently held materials of a Catholic nature. This union catalog comprises metadata from MARC records, EAD files, and OAI-PMH data repositories. Some of the MARC records include URLs in 856$u fields. These URLs point to PDF files that have been processed with OCR. The Portal's indexer has been configured to harvest the PDF documents when it comes across them. Once harvested, the OCR is extracted from the PDF file, and the resulting text is added to the underlying Solr index. The values of the URLs are saved to the Solr index as well. Almost by definition, all of the OAI-PMH content indexed by the Portal is full text; almost all of the OAI-PMH content includes pointers to images or PDF documents.

Consequently, if a reader wanted to find only full text content, then it would be nice to: 1) do a search, and 2) limit to full text. And this is exactly what was implemented. The first step was to edit Solr's definition of the url field. Specifically, its "indexed" attribute was changed from false to true. Trivial. Solr was then restarted.
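For readers unfamiliar with Solr, the edit looks something like the following in the schema. This is a hedged sketch only; the actual field type and attributes in the Portal's schema.xml may differ:

```xml
<!-- Before: the url field was stored but not searchable. -->
<field name="url" type="string" indexed="false" stored="true" multiValued="true"/>

<!-- After: flipping indexed to "true" makes queries like url:* possible. -->
<field name="url" type="string" indexed="true" stored="true" multiValued="true"/>
```

Note that changing a field's "indexed" attribute only affects documents indexed afterwards, which is why re-indexing is the next step.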

The second step was to re-index the MARC content. When this was complete, the reader was able to search the index for URL content — "url:*". In other words, find all records whose URL equals anything.

The third step was to understand that all of the local VuFind OAI-PMH identifiers have the same shape. Specifically, they all include the string “oai”. Consequently, the very astute reader could find all OAI-PMH content with the following query: “id:*oai*”.

The fourth step was to turn on a VuFind checkbox option found in facets.ini. Specifically, the "[CheckboxFacets]" section was augmented to include the following line:

id:*oai* OR url:* = “Limit to full text”

When this was done a new facet appeared in the VuFind interface.

Finally, the whole thing comes to fruition when a person does an initial search. The results are displayed, and the facets include a limit option. Upon selection, VuFind searches again, but limits the query by "id:*oai* OR url:*" — only items that have URLs or come from OAI-PMH repositories. Pretty cool.

Catholic Portal's version of VuFind

Kudos go to Demian Katz for outlining this process. Very nice. Thank you!

An open letter to Heather Bresch / Andromeda Yelton

Dear Heather Bresch,

You lived in Morgantown. I did, too: born and raised. My parents are retired from the university you attended. My elementary school took field trips to Mylan labs. They were shining, optimistic.

You’re from West Virginia. I am, too. This means we both know something of the coal industry that has both sustained and destroyed our home. You know, as I do, how many miners have been killed in explosions: trapped underground when a pocket of methane ignites. We both know that miners long carried safety lamps: carefully shielded but raw flames that would go out when the oxygen went too low, a warning to get away — if they had not first exploded, as open flames around methane do. Perhaps you know, as I only recently learned, that miners were once required to buy their own safety lamps: so when safer ones came out, ones that would only warn without killing you first, miners did not carry them. They couldn’t afford to. They set probability against their lives, went without the right equipment, and sometimes lost, and died.

I’m a mother. You are, too. I don’t know if your children carry medication for life-threatening illnesses; I hope you have not had to face that. I have. In our case it’s asthma, not allergies, and an inhaler, not an Epi-Pen. It’s a $20 copay with our insurance and lasts for dozens of doses. It doesn’t stop asthma attacks once they start — my daughter’s asthma is too severe for that — but sometimes it prevents them. And when it does not, it still helps: we spend two days in the hospital instead of five; we don’t go to the ICU. (Have you ever been with your child in a pediatric ICU? It is the most miraculous, and the worst, place on earth.)

Most families can find their way to twenty dollars. Many cannot find six hundred. They’ll go without, and set probability against their children’s lives. Rich children will live; poor children will sometimes lose, and die.

I ask you to reconsider.


Andromeda Yelton

Evergreen 2010 : Sine Qua Non / Equinox Software

This is the fifth in our series of posts leading up to Evergreen’s Tenth birthday.  

I often tell people I hire that when you start a new job the first month is the honeymoon period. At month three you are panicking and possibly wondering why you thought you could do this. At six months you realize you’ve actually got the answers and at twelve months it’s like you never worked anywhere else. For me, 2010 represented months six through eighteen of my employment with Equinox and it was one of the most difficult, rewarding, and transformative years of my career. Coincidentally, it was also an incredibly transforming year for Evergreen.

In early 2010, Evergreen 1.6 was planned and released on schedule thanks to contributing efforts from the usual suspects back at that time. Bug fixes and new development were being funded or contributed by PINES, Conifer, Mohawk College, Evergreen Indiana, Calvin College, SAGE, and many others in the community. Somewhere in the midst of the ferocious adoption rate and evolution of 2010, Evergreen quietly and without fanfare faced (and passed) its crucible. Instead of being thrown off stride, this amazingly determined community not only met the challenge, but deftly handled the inevitable friction that was bound to arise as the community grew.

In late August of 2010 KCLS went live on a beta version of Evergreen 2.0 after just over a year of intense and exhilarating development. It marked the beginning of another major growth spurt for Evergreen, including full support for Acquisitions and Serials, as well as the introduction of the template toolkit OPAC (or TPAC). I have nothing but positive things to say about the teams that worked to make that go-live a reality. KCLS and Equinox did amazing things together and, while not everything we did was as successful as we had envisioned, we were able to move Evergreen forward in a huge leap. More importantly, everyone involved learned a lot about ourselves and our organizations – including the community itself.

The community learned that we were moving from a small group of “insiders” and enthusiasts into a more robust and diverse community of users. This is, of course, natural and desirable for an open source project but the thing that sticks out in my mind is how quickly and easily the community adapted to rapid change. At the Evergreen Conference in 2010 a dedicated group met and began the process of creating an official governance structure for the Evergreen project. This meeting led to the eventual formation of the Evergreen Oversight Board and our current status as a member project of the Software Freedom Conservancy.

In the day-to-day of the Evergreen project I witnessed how the core principles of open source projects could shape a community of librarians. And I was proud to see how this community of librarians could contribute their core principles to strengthen the project and its broader community. We complement one another even as we share the most basic truths:
* The celebration of community
* The merit of the individual
* The empowerment of collaboration
* The belief that information should be free

Evergreen is special. More importantly, our community is special. And it’s special because behind each line of code there are dozens of people who contributed their time to create it. Each of those people brought with them their passion, their counter-argument, their insight, their thoughtfulness, and their sheer determination. And together, this community created something amazing. They made great things. They made mistakes. They learned. They adapted. They persevered. And those people behind those lines of code? They’re not abstractions. They are people I know and respect; people who have made indelible marks on our community. It’s Mike, Jason, Elizabeth, Galen, Kathy, Bill, Amy, Dan, Angela, Matt, Elaine, Ben, Tim, Sharon, Lise, Jane, Lebbeous, Rose, Karen, Lew, Joan, and too many others to name. They’re my community and when I think back on how much amazing transformation we’ve achieved in just one year, or ten years, I can’t wait to see what we do in the next ten.

– Grace Dunbar, Vice President

Open Knowledge Switzerland Summer 2016 Update / Open Knowledge Foundation

The first half of 2016 was a very busy one for the Open Knowledge Swiss chapter. Between April and June alone, the chapter held 3 hackathons, 15 talks, 3 meetups and 10 workshops. In this blog post we highlight some of these activities to update the Open Knowledge community about our chapter’s work.


Main projects

Our directors worked on relaunching the federal Open Government Data portal and its new online handbook. We gathered and published datasets and ran workshops in support of various hackdays – and we migrated and improved our web infrastructure with better support of the open Transport API (handling up to 1.7 million requests per day!).


Main events

We held our annual conference in June, ran energy-themed hackdays in April and ran an OpenGLAM hackathon in July. Additionally, we supported two smaller regional hackathons in the spring, and a meetup on occasion of Open Data Day.



Like other organisations in this space, our main challenge is redefining our manifesto and restructuring our operations to become a smoother running chapter that is more responsive to the needs of our members and community. This restructuring continues to be a challenge that we are learning from – and need to learn more about.



Our media presence and public identity continues to be stronger than ever. We are involved in a wide range of political and inter-organizational activities in support of diverse areas of openness, and in general we are finding that our collective voice is stronger and our messages better received everywhere we go.



We have had several retreats with the board to discuss changes in the governance and to welcome new directors: Catherine Pugin, Martin Grandjean and Alexandre Cotting.

We are primarily working on a better overall organizational structure to support our community and working groups: starting and igniting new initiatives will be the next step. Among them will be the launch of a business-oriented advocacy group called “Swiss Data Alliance”.



Looking ahead

We will soon announce a national program on food data, which includes hackdays and a funded follow-up/incubation phase for prototypes produced. And we are busy setting up a hackathon at the end of September with international scope and support called Hack for Ageing Well. Follow #H4AW for more info.

We are excited about upcoming cross-border events like #H4AW and Jugend Hackt, opening doors to development and research collaborations. Reach out through the Open Knowledge forums and we’ll do our best to connect you into the Swiss community!