Planet Code4Lib

The Oldest Internet Publication You’ve Never Heard Of / Roy Tennant

Twenty-five years ago I started a library current awareness service called Current Cites. The idea was to have a team of volunteers monitor library and information technology literature and cite only the best of it in a monthly publication (see the first page of the inaugural issue pictured). Here is the latest issue. TidBITS is, I think, the only Internet publication that is older, and they beat us by only a few months.

Originally, the one-paragraph description accompanying the bibliographic details was intended to summarize the contents. However, we soon allowed each reviewer latitude in using humor and personal insights to provide context and an individual voice.

Although we began publication in print only and for an intended audience of UC Berkeley Library staff, we quickly realized that the audience could be global and that the technologies were emerging to make it available for free to such a worldwide audience. If you’re curious, you can read more about how Current Cites came to be as well as its early history.

Ever since, we have published every month without fail. It has weathered my paternity leave (twins, with one now graduated from college and the other soon to be), the turnover of many reviewers, and a succession of sponsoring organizations. We have had only three editors in all that time: David F.W. Robison, Teri Rinne, and myself.

On our 20th anniversary I wrote some of my thoughts about longevity and what contributes to it, which still apply. But then I’ve always been hard to dump, as Library Journal can attest. I’ve been writing for them since 1997.

So please bear with me as I mark this milestone. With only about 3,300 subscribers to the mailing list distribution (we also have an RSS feed and I tweet a link to each issue), we are probably the longest-lived Internet publication you’ve never heard of. Until now.

Here for your edification is the current number of subscribers by country:

United States 2,476
Canada 210
Australia 134
United Kingdom 69
Netherlands 40
New Zealand 33
Spain 32
Germany 28
Italy 26
Taiwan 20
Sweden 18
Israel 17
Brazil 16
Norway 15
Japan 14
France 13
Belgium 11
Unknown 11
India 10
Ireland 10
South Africa 8
Finland 7
Denmark 6
Portugal 6
Hungary 5
Singapore 5
Switzerland 5
Mexico 4
Peru 4
Austria 3
Croatia 3
Greece 3
Lebanon 3
Republic of Korea 3
Saudi Arabia 3
United Arab Emirates 3
Argentina 2
Chile 2
China 2
Colombia 2
Federated States of Micronesia 2
Kazakhstan 2
Lithuania 2
Philippines 2
Poland 2
Slovakia 2
Trinidad and Tobago 2
Turkey 2
Botswana 1
Czech Republic 1
Estonia 1
Hong Kong 1
Iceland 1
Islamic Republic of Iran 1
Jamaica 1
Malaysia 1
Morocco 1
Namibia 1
Pakistan 1
Qatar 1
Uruguay 1

Link roundup July 31, 2015 / Harvard Library Innovation Lab

This is the good stuff.

The Factory of Ideas: Working at Bell Labs

The UK National Videogame Arcade is the inspirational mecca that gaming needs | Ars Technica

I Can Haz Memento

A Graphical Taxonomy of Roller Derby Skate Names

Current Cites – the amazing 25th anniversary / HangingTogether

I suspect that a large part of the audience for this blog also subscribes to Current Cites, the “annotated bibliography of selected articles, books, and digital documents on information technology”, as the masthead describes it. Those of us who subscribe would describe it as “essential”. Those of us who publish newsletters describe the fact that as of August 2015 it will have been published continuously for twenty-five years as “amazing”. Those of us who know the editor, our pal and colleague, Roy Tennant, describe the feat he has performed as “stunning” and him as “indefatigable”.

And if you are not a subscriber to this essential, amazing, and stunning newsletter you should be clicking right here. And then you should congratulate Roy in a comment below. Do that right now.


Image: Fireworks on the 75th Golden Gate anniversary, by Mireia Garcia Bermejo (own work), via Wikimedia Commons

About Jim Michalko

Jim coordinates the OCLC Research office in San Mateo, CA, and focuses on relationships with research libraries and work that renovates the library value proposition in the current information environment.

Launch of timber tracking dashboard for Global Witness / Open Knowledge Foundation

Open Knowledge has produced an interactive trade dashboard for anti-corruption NGO Global Witness to supplement their exposé on EU and US companies importing illegal timber from the Democratic Republic of Congo (DRC).


The DRC Timber Trade Tracker consumes open data from to visualise where in the world Congolese timber is going. The dashboard makes it easy to identify countries that are importing large volumes of potentially illegal timber, and to see where timber shipped by companies accused of systematic illegal logging and social and environmental abuses ends up.

Global Witness has long campaigned for greater oversight of the logging industry in the DRC, which is home to two thirds of the world’s second largest rainforest. The logging industry is mired in corruption, with two of the DRC’s biggest loggers allegedly complicit in the beating and raping of local populations. Alexandra Pardal, campaign leader at Global Witness, said:

We knew that DRC logging companies were breaking the law, but the extent of illegality is truly shocking. The EU and US are failing in their legal obligations to keep timber linked to illegal logging, violence and intimidation off our shop floors. Traders are cashing in on a multi-million dollar business that is pushing the world’s vanishing rainforests to extinction.

The dashboard is part of a long-term collaboration between Open Knowledge and Global Witness, through which they have jointly created a series of interactives and data-driven investigations around corruption and conflict in the extractive industries.

To read the full report and see the dashboard go here.

If you work for an organisation that wants to make its data come alive on the web, get in touch with our team through

“The User Experience” in Public Libraries Magazine / LibUX

Toby Greenwalt asked Amanda and I —um, Michael — to guest-write about the user experience for his The Wired Library column in Public Libraries Magazine. Our writeup was just published online after appearing in print a couple of months ago.


“The Wired Library” in Public Libraries Magazine, vol. 54, no. 3

We were pretty stoked to have an opportunity to jabber outside our usual #libux echo chamber to evangelize a little and rejigger woo-woo ideas about the user experience for real-world use — it’s catching on.

Such user experience is holistic, negatively or positively impacted at every interaction point your patron has with your library. The brand spanking new building loses its glamour when the bathrooms are filthy; the breadth of the collection loses its meaning when the item you drove to the library for isn’t on the shelf; an awesome digital collection just doesn’t matter if it’s hard to access; the library that literally pumps joy through its vents nets a negative user experience when the hump of the doorframe makes it hard to enter with a wheelchair.

The rest of the post has to do with simple suggestions for improving the website, but the big idea stuff is right up top. Knowing what we know about how folks read on the web, we still get to flashbake some neurons even if this is a topic readers don’t care about.

Read “The User Experience” over at Public Libraries Online.

I write the Web for Libraries each week — a newsletter chock-full of data-informed commentary about user experience design, including the bleeding-edge trends and web news I think user-oriented thinkers should know.

The post “The User Experience” in Public Libraries Magazine appeared first on LibUX.

Meet Your Developer: Will Panting / Islandora

A Meet Your Developer double feature this week, as we introduce another instructor for the upcoming Islandora Conference: Will Panting. A Programmer/Analyst at discoverygarden, Inc., Will is a key member of the Committers Group and one of the most stalwart defenders of best practices and backwards compatibility in Islandora. If you adopt a brand new module and it doesn't break anything, you may well have Will to thank.

Please tell us a little about yourself. What do you do when you’re not at work?

I went to UPEI and have a major in Comp Sci and a minor in Business. Before DGI, I had a short stint at the University. As well as all the normal things like friends and family, I spend my spare time developing some personal projects and brewing beer. I've been trying to get my brown recipe right for years now.

How long have you been working with Islandora? How did you get started?

I've been with DGI for more than four years. I had heard about the company through UPEI. I find working on Islandora very rewarding; I think this space is of some very real value.

Sum up your area of expertise in three words:

Complete Islandora Stack

What are you working on right now?

A complex migration from a custom application. It's a good one, using most of the techniques we've had to use in the past.

What contribution to Islandora are you most proud of?

I've been in just about every corner of the code base and written tons of peripheral modules and customizations. I think the thing that I'm most proud of isn't a thing, but a consistent push for sustainable practice.

What new feature or improvement would you most like to see?

I'm divided between a viewer framework, an XSLT management component, and the generic graph traversal hooks: all basic technology that would create greater consistency and speed up development.

What’s the one tool/software/resource you cannot live without?

Box provisioning; absolutely crucial to our rate of development.

If you could leave the community with one message from reading this interview, what would it be?

Commit. Dive deep in the code, let it cut you up then stitch the wounds and do it again. It's great to see new committers.

Seeking Balance in Copyright and Access / DPLA

The most important word in discussions around copyright in the United States is balance. Although there are many, often strong disagreements between copyright holders and those who wish to provide greater access to our cultural heritage, few dispute that the goal is to balance the interests of the public with those of writers, artists, and other creators.

Since the public is diffuse and understandably pays little attention to debates about seemingly abstract topics like copyright, it has been hard to balance their interests with those of rightsholders, especially corporations, who have much more concentrated attention and financial incentives to tilt the scale. (Also, lawyers.) Unsurprisingly, therefore, the history of copyright is one of a repeated lengthening of copyright terms and greater restrictions on public use.

The U.S. Copyright Office has spent the last few years looking at possible changes to the Copyright Act given that we are now a quarter-century into the age of the web, with its new forms of access to culture enabled by mass digitization. Most recently, the Office issued a report with recommendations about what to do about orphan works and the mass digitization of copyrighted works. The Office has requested feedback on its proposal, as well as on other specific questions regarding copyright and visual works and a proposed “making available” right (something that DPLA has already responded to). Each of these studies and proposals impacts the Digital Public Library of America and our 1,600 contributing institutions, as well as many other libraries, archives, and museums that seek to bring their extensive collections online.

We greatly appreciate that the Office is trying to tackle these complex issues, given how difficult it is to ascertain the copyright status of many works created in the last century. As the production of books, photographs, audio, and other types of culture exploded, often by orders of magnitude, and as rights no longer had to be registered, often changed hands in corporate deals, and passed to estates (since copyright terms now long outlast the creators), we inherited an enormous problem of unclear rights and “orphan works” where rightsholders cannot easily—or ever—be found. This problem will only worsen now that digital production has given the means to billions of people to become creators, and not just consumers, of culture.

Although we understand the complexity and many competing interests that the Office has tried to address in the report, we do not believe their recommendations achieve that critical principle of balance. In our view, the recommendations unfortunately put too many burdens on the library community, and thus too many restrictions on public access. The report seeks to establish a lengthy vetting process for scanned items that is simply unworkable and extraordinarily expensive for institutions that are funded by, and serve, the public.

Last week, with the help of DPLA’s Legal Advisory Committee co-chair Dave Hansen, we filed a response to one of the Office’s recent inquiries, focusing on how the copyright system can be improved for visual works like photographs. As our filing details, DPLA’s vast archive of photographs from our many partners reveals how difficult it would be for cultural heritage institutions to vet the rights status of millions of personal, home, and amateur photographs, as well as millions of similar items in the many local collections contained in DPLA.

These works can provide candid insights into our shared cultural history…[but] identifying owners and obtaining permissions is nearly impossible for many personal photographs and candid snapshots…Even if creators are identifiable by name, they are often not locatable. Many are dead, raising complicated questions about whether rights were transferred to heirs, or perhaps escheated to the state. Because creators of many of these works never thought about the rights that they acquired in their visual works, they never made formal plans for succession of ownership.

Thus, as the Office undertakes this review, we urge it to consider whether creators, cultural heritage institutions, and the public at large would be better served by a system of protection that explicitly seeks to address the needs, expectations, and motivations of the incredibly large number of creators of these personal, home and amateur visual works, while appropriately accommodating those creators for whom copyright incentives do matter and for whom licensing and monetization are important.

Rather than placing burdens on libraries and archives for clearing use of visual works, we recommend that the Copyright Office focus on the creation of better copyright status and ownership information by encouraging rightsholders, who are in the best position to provide that information, to step forward. You can read more about our position in the full filing.

When we launched in 2013, one of the most gratifying responses we received was an emotional email from an Australian who found a photograph of his grandmother, digitized by an archive in Utah and made discoverable through DPLA. It’s hard to put a price on such a discovery, but surely we must factor such moments into any discussion of copyright and access. We should value more greatly the public’s access to our digitized record, and find balanced ways for institutions to provide such access.

Mapping Libraries: Creating Real-time Maps of Global Information / Library of Congress: The Signal

The following is a guest post by Kalev Hannes Leetaru, a data scientist and Senior Fellow at George Washington University Center for Cyber & Homeland Security. In a previous post, he introduced us to the GDELT Project, a platform that monitors the news media, and presented how mass translation of the world’s information offers libraries enormous possibilities for broadening access. In this post, he writes about re-imagining information geographically.

Why might geography matter to the future of libraries?

Information occurs against a rich backdrop of geography: every document is created in a location, intended for an audience in the same or other locations, and may discuss yet other locations. The importance of geography in how humans understand and organize the world (PDF) is underscored by its prevalence in the news media: a location is mentioned every 200-300 words in the typical newspaper article of the last 60 years. Social media embraced location a decade ago through transparent geotagging, with Twitter proclaiming in 2009 that the rise of spatial search would fundamentally alter how we discovered information online. Yet the news media has steadfastly resisted this cartographic revolution, continuing to organize itself primarily through coarse editorially-assigned topical sections and eschewing the live maps that have redefined our ability to understand global reaction to major events. Using journalism as a case study, what does the future of mass-scale mapping of information look like and what might we learn of the future potential for libraries?

What would it look like to literally map the world’s information as it happens? What if we could reach across the world’s news media each day in real time and put a dot on a map for every mention, in every article and in every language, of any location on earth, along with the people, organizations, topics, and emotions associated with each place? For the past two years this has been the focus of the GDELT Project, and through a new collaboration with online mapping platform CartoDB, we are making it possible to create rich interactive real-time maps of the world’s journalistic output across 65 languages.

Leveraging more than a decade of work on mapping the geography of text, GDELT monitors local news media from throughout the globe, live translates it, and performs “full-text geocoding” in which it identifies, disambiguates, and converts textual descriptions of location into mappable geographic coordinates. The result is a real-time multilingual geographic index over the world’s news that reflects the actual locations being talked about in the news, not just the bylines of where articles were filed. Using this platform, this geographic index is transformed into interactive animated maps that support spatial interaction with the news.
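To make the three steps of full-text geocoding (identify, disambiguate, convert) concrete, here is a toy Python sketch. It is not GDELT’s implementation: the two-entry gazetteer, its rounded coordinates and populations, the `CONTEXT_HINTS` table, and the disambiguation rule (prefer a candidate whose country is hinted elsewhere in the text, otherwise the most populous one) are all illustrative assumptions, and only single-word place names are handled.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    country: str
    lat: float
    lon: float
    population: int  # rough figure, used only to break ties


# Hypothetical miniature gazetteer (real ones hold millions of names).
GAZETTEER = {
    "paris": [
        Place("Paris", "France", 48.8566, 2.3522, 2_100_000),
        Place("Paris", "United States", 33.6609, -95.5555, 25_000),
    ],
    "kathmandu": [Place("Kathmandu", "Nepal", 27.7172, 85.3240, 1_000_000)],
}

# Hypothetical context words that hint at a country elsewhere in the text.
CONTEXT_HINTS = {"france": "France", "french": "France", "texas": "United States", "nepal": "Nepal"}


def geocode(text: str) -> list[tuple[str, float, float]]:
    """Return (label, lat, lon) for every gazetteer place name found in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hinted = {CONTEXT_HINTS[t] for t in tokens if t in CONTEXT_HINTS}
    points = []
    for token in tokens:
        candidates = GAZETTEER.get(token)  # 1. identify: is this word a known place name?
        if not candidates:
            continue
        # 2. disambiguate: prefer candidates in a hinted country, then the most populous
        preferred = [p for p in candidates if p.country in hinted] or candidates
        best = max(preferred, key=lambda p: p.population)
        # 3. convert: emit mappable coordinates
        points.append((f"{best.name}, {best.country}", best.lat, best.lon))
    return points


print(geocode("Flooding hit Paris, Texas on Monday"))  # [('Paris, United States', 33.6609, -95.5555)]
print(geocode("Protests continued in Paris"))          # [('Paris, France', 48.8566, 2.3522)]
```

The point of the example is the disambiguation step: the same word maps to different coordinates depending on the rest of the text, which is why geocoding the places discussed in an article differs from mapping its byline.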

What becomes possible when the world’s news is arranged geographically? At the most basic level, it allows organizing search results on a map. The GDELT Geographic News Search allows a user to search by person, organization, theme, news outlet, or language (or any combination therein) and instantly view a map of every location discussed in context with that query, updated every hour. An animation layer shows how coverage has changed over the last 24 hours and a clickable layer displays a list of all matching coverage mentioning each location over the past hour.


Figure 1 – GDELT’s Geographic News Search showing geography of Portuguese-language news coverage during a given 24 hour period

Selecting a specific news outlet like or as the query yields an instant geographic search interface to that outlet’s coverage, which can be embedded on any website. Imagine if every news website included a map like this on its homepage that allowed readers to browse spatially and find its latest coverage of rural Brazil, for example. The ability to filter news at the sub-national level is especially important when triaging rapidly-developing international stories. A first responder assisting in Nepal is likely more interested in the first glimmers of information emerging from its remote rural areas than the latest on the Western tourists trapped on Mount Everest.

Coupling CartoDB with Google’s BigQuery database platform, it becomes possible to visualize large-scale geographic patterns in coverage. The map below visualizes all of the locations mentioned in news monitored by GDELT from February to May 2015 relating to wildlife crime. Using the metaphor of a map, this list of 30,000 articles in 65 languages becomes an intuitive clickable map.


Figure 2 – Global discussion of wildlife crime

Exploring how the news changes over time, it becomes possible to chart the cumulative geographic focus of a news outlet, or to compare two outlets. Alternatively, looking across global coverage holistically, it becomes possible to instantly identify the world’s happiest and saddest news, or to determine the primary language of news coverage focusing on a given location. By arraying emotion on a map it becomes possible to instantly spot sudden bursts of negativity that reflect breaking news of violence or unrest. Organizing by language, it becomes possible to identify the outlets and languages most relevant to a given location, helping a reader find relevant sources about events in that area. Even the connections among locations in terms of how they are mentioned together in the news yields insights into geographic contextualization. Finally, by breaking the world into a geographic grid and computing the topics trending in each location, it becomes possible to create new ways of visualizing the world’s narratives.
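The grid idea in the last sentence, along with the per-location tone and language maps, comes down to grouping geocoded mentions by cell and aggregating each group. A minimal Python sketch, assuming a hypothetical record layout of (lat, lon, topic, language, tone) and 1-degree cells; neither is GDELT’s actual schema or resolution, and "most frequent topic" is a simplified stand-in for "trending".

```python
import math
from collections import Counter, defaultdict
from statistics import mean


def cell_of(lat: float, lon: float, size_deg: float = 1.0) -> tuple[int, int]:
    """Index of the grid cell containing a point; floor() keeps negative coordinates correct."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def summarize(mentions, size_deg: float = 1.0) -> dict:
    """Per grid cell: most frequent topic, most frequent language, and average tone."""
    cells = defaultdict(list)
    for lat, lon, topic, language, tone in mentions:
        cells[cell_of(lat, lon, size_deg)].append((topic, language, tone))
    summary = {}
    for cell, rows in cells.items():
        summary[cell] = {
            "top_topic": Counter(topic for topic, _, _ in rows).most_common(1)[0][0],
            "primary_language": Counter(lang for _, lang, _ in rows).most_common(1)[0][0],
            "avg_tone": mean(tone for _, _, tone in rows),
        }
    return summary


mentions = [  # made-up mentions: (lat, lon, topic, language, tone)
    (27.70, 85.32, "earthquake", "Nepali", -6.0),
    (27.72, 85.30, "earthquake", "English", -4.0),
    (27.71, 85.33, "tourism", "Nepali", 2.0),
    (48.85, 2.35, "election", "French", 1.0),
]
for cell, stats in summarize(mentions).items():
    print(cell, stats)
```

Each resulting cell can then be drawn as one map feature, colored by tone or language, or labeled with its top topic.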


Figure 3 – All locations mentioned in the New York Times (green) and BBC (yellow/orange) during the month of March 2015

Figure 4 – Click to see a live animated map of the average “happy/sad” tone of worldwide news coverage over the last 24 hours mentioning each location


Figure 5 – Click to see a live animated map of the primary language of worldwide news coverage over the last 24 hours mentioning each location


Figure 6 – Interactive visualization of how countries are grouped together in the news media

Turning from global news to domestic television news, these same approaches can be applied to television closed captioning, making it possible to click on a location and view the portion of each news broadcast mentioning events at that location.


Figure 7 – Mapping the locations mentioned in American television news

Turning back to the question that opened this post – why might geography matter to the future of libraries? As news outlets increasingly cede control over the distribution of their content, they do so not only to reach a broader audience, but to leverage more advanced delivery platforms and interfaces. Libraries are increasingly facing identical pressures as patrons turn towards services (PDF) like Google Scholar, Google Books, and Google News instead of library search portals. If libraries embraced new forms of access to their content, such as the kinds of geographic search capabilities outlined in this post, users might find those interfaces more compelling than those of non-library platforms. The ability of ordinary citizens to create their own live-updating “geographic mashups” of library holdings opens the door to engaging with patrons in ways that demonstrate the value of libraries beyond serving as a museum of physical artifacts, and to connecting individuals across national or international lines. As more and more library holdings, from academic literature to the open web itself, are geographically indexed, libraries stand poised to lead the cartographic revolution, opening the geography of their vast collections to search and visualization, and making it possible for the first time to quite literally map our world’s libraries.

Sampling methods for heuristic faceting / State Library of Denmark

Initial experiments with heuristic faceting in Solr were encouraging: Using just a sample of the result set, it was possible to get correct facet results for large result sets, reducing processing time by an order of magnitude. Alas, further experimentation unearthed that the sampling method was vulnerable to clustering. While heuristic faceting worked extremely well for most of the queries, it failed equally hard for a few of the queries.

The problem

Abstractly, faceting on Strings is a function that turns a collection of documents into a list of top-X terms plus the number of occurrences of these terms. In Solr the collection of documents is represented with a bitmap: One bit per document; if the bit is set, the document is part of the result set. The result set of 13 hits for an index with 32 documents could look like this:

00001100 01111110 00000000 01010111

Normally the faceting code would iterate all the bits, get the terms for the ones that are set and update the counts for those terms. The iteration of the bits is quite fast (1 second for 100M bits), but getting the terms (technically the term ordinals) and updating the counters takes more time (100 seconds for 100M documents).
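In code, the full counting described above might look like the following Python sketch. The bitmap is a list of flags where position i stands for document i (the leftmost character of the printed bitmap is document 0), and `doc_ords` is a hypothetical stand-in for the per-document term ordinals Solr reads from its index; both are simplifications, not Solr’s actual data structures.

```python
from collections import Counter


def parse_bitmap(text: str) -> list[bool]:
    """Turn the printed notation ('00001100 01111110 ...') into one flag per document."""
    return [c == "1" for c in text if c in "01"]


def full_facet(bitmap: list[bool], doc_ords: list[list[int]], top_x: int) -> list[tuple[int, int]]:
    """Full facet counting: visit every bit and count every term ordinal of every hit."""
    counts = Counter()
    for doc_id, is_hit in enumerate(bitmap):  # cheap part: one check per document
        if is_hit:
            counts.update(doc_ords[doc_id])   # expensive part: ordinal lookup and counter updates
    return counts.most_common(top_x)          # top-X (ordinal, count) pairs


# The 32-document, 13-hit result set; each document gets one made-up ordinal.
bitmap = parse_bitmap("00001100 01111110 00000000 01010111")
doc_ords = [[doc_id % 3] for doc_id in range(len(bitmap))]
print(full_facet(bitmap, doc_ords, top_x=1))  # [(1, 5)]
```

The split between the cheap loop and the expensive `counts.update` call mirrors the 1-second versus 100-second figures above, and it is the expensive part that sampling tries to avoid.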

Initial attempt: Sample the full document bitmap

The initial sampling was done by dividing the result set into chunks and only visiting those chunks. If we wanted to sample 50% of our result set and wanted to use 4 chunks, the parts of the result set to visit could be the ones in brackets:

4 chunks: [0000]1100 [0111]1110 [0000]0000 [0101]0111

As can be counted, the sampling hit 5 documents out of 13. Had we used 2 chunks, the result could be

2 chunks: [00001100] 01111110 [00000000] 01010111

Only 2 hits out of 13 and not very representative. A high chunk count is needed: For 100M documents, 100K chunks worked fairly well. The law of large numbers helps a lot, but in case of document clusters (a group of very similar documents indexed at the same time) we still need both a lot of chunks and a high sampling percentage to have a high chance of hitting them. This sampling is prone to completely missing or over-representing clusters.
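A Python sketch of this chunk sampling, assuming equally spaced chunks where only the leading `rate` part of each equal-sized slice of the bitmap is visited. That layout is an assumption about how the chunks were placed, chosen because it reproduces the 5-of-13 and 2-of-13 counts above for the example bitmap.

```python
def parse_bitmap(text: str) -> list[bool]:
    """Turn the printed notation ('00001100 01111110 ...') into one flag per document."""
    return [c == "1" for c in text if c in "01"]


def chunk_sample(bitmap: list[bool], chunks: int, rate: float) -> list[int]:
    """Chunk sampling: visit only the leading `rate` part of each equal-sized slice."""
    slice_len = len(bitmap) // chunks
    visit_len = int(slice_len * rate)
    sampled = []
    for start in range(0, chunks * slice_len, slice_len):
        for doc_id in range(start, start + visit_len):  # the rest of the slice is never looked at
            if bitmap[doc_id]:
                sampled.append(doc_id)
    return sampled


bitmap = parse_bitmap("00001100 01111110 00000000 01010111")
print(len(chunk_sample(bitmap, chunks=4, rate=0.5)))  # 5
print(len(chunk_sample(bitmap, chunks=2, rate=0.5)))  # 2
```

With only 2 chunks, the cluster of six hits in the second byte is skipped entirely, which is the clustering weakness described above in miniature.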

Current solution: Sample the hits

Remember that iterating over the result bitmap itself is relatively fast. Instead of processing chunks of the bitmap and skipping between them, we iterate over all the hits and only update counts for some of them.

If the sampling rate is 50%, the bits marked with red would be used as sample:

50% sampling: 00001100 01111110 00000000 01010111

If the sampling rate is 33%, the bits for the sample documents would be

33% sampling: 00001100 01111110 00000000 01010111

This way of sampling is a bit slower than sampling on the full document bitmap as all bits must be visited, but it means that the distribution of the sampling points is as fine-grained as possible. It turns out that the better distribution gives better results, which means that the size of the sample can be lowered. Lower sample rate = higher speed.
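A Python sketch of sampling the hits. The post does not say exactly which hits are picked; taking every n-th hit, with n derived from the sampling rate, is an assumption made here because it spreads the sample points as evenly as possible.

```python
def parse_bitmap(text: str) -> list[bool]:
    """Turn the printed notation ('00001100 01111110 ...') into one flag per document."""
    return [c == "1" for c in text if c in "01"]


def hit_sample(bitmap: list[bool], rate: float) -> list[int]:
    """Hit sampling: scan every bit, but pass only every n-th hit on to term counting."""
    step = max(1, round(1 / rate))  # assumption: 50% -> every 2nd hit, 33% -> every 3rd hit
    sampled = []
    hits_seen = 0
    for doc_id, is_hit in enumerate(bitmap):  # every bit is visited, which is cheap
        if is_hit:
            if hits_seen % step == 0:
                sampled.append(doc_id)  # only these documents get the expensive term counting
            hits_seen += 1
    return sampled


bitmap = parse_bitmap("00001100 01111110 00000000 01010111")
print(hit_sample(bitmap, 0.5))   # [4, 9, 11, 13, 25, 29, 31]
print(hit_sample(bitmap, 0.33))  # [4, 10, 13, 27, 31]
```

Unlike chunk sampling, every cluster of hits contributes to the sample roughly in proportion to its size.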

Testing validity

A single shard from the Net Archive Search was used for testing. The shard was 900GB with 250M documents. Faceting was performed on the field links, which contains all outgoing links from indexed webpages. There are 600M unique values in that field and each document in the index contains an average of 25 links. For a full search on *:* that means 6 billion updates of the counter structure.

For this test, we look for the top-25 links. To get the baseline, a full facet count was issued for the top-50 links for a set of queries. A heuristic facet call was issued for the same queries, also for the top-50. The number of lines until the first discrepancy was counted for each pair. The ones with a count beneath 25 were considered faulty. The reason for the over provisioning was to raise the probability of correct results, which of course comes with a performance penalty.
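The validity criterion can be written as a small Python function. Treating each facet line as a (term, count) pair is an assumption, as are the function names and the toy data, which use a top-3 instead of a top-25 check.

```python
def agreeing_prefix(baseline: list[tuple[str, int]], heuristic: list[tuple[str, int]]) -> int:
    """Count the leading facet lines that match exactly before the first discrepancy."""
    agreed = 0
    for expected, actual in zip(baseline, heuristic):
        if expected != actual:
            break
        agreed += 1
    return agreed


def is_faulty(baseline, heuristic, wanted: int = 25) -> bool:
    """Faulty if the heuristic result diverges from the full count within the top `wanted` lines."""
    return agreeing_prefix(baseline, heuristic) < wanted


full = [("a.example", 900), ("b.example", 750), ("c.example", 400)]
sampled = [("a.example", 900), ("b.example", 750), ("d.example", 400)]
print(agreeing_prefix(full, sampled))      # 2
print(is_faulty(full, sampled, wanted=2))  # False
print(is_faulty(full, sampled, wanted=3))  # True
```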

The sampling size was set to 1/1000 the number of documents or roughly 200K hits. Only result set sizes above 1M are relevant for validity, as those below take roughly the same time to calculate with and without sampling.


Heuristic validity for top 25/50

While the result looks messy, the number of faulty results was only 6 out of 116 for result set sizes above 1M. For the other 110 searches, the top-25 terms were correct. Raising the over provisioning to top-100 imposes a larger performance hit, but reduces the number of faulty results to 0 for this test.


Heuristic validity for top 25/100

Testing performance

The response times for full count faceting and heuristic faceting on the links field with over provision of 50 are as follows:


Heuristic speed for top 25/50

Switching from linear to logarithmic plotting for the y-axis immediately:


Heuristic speed for top 25/50, logarithmic y-axis

It can be seen that full counting rises linearly with result size, while sampling time is near-constant. This makes sense as the sampling was done by updating counts for a fixed number of documents. Other strategies, such as making the sampling rate a fraction of the result size, should be explored further, but as the validity plot shows, the fixed strategy works quite well.
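The difference between the two strategies can be seen by computing how many hits are passed on to term counting. In this Python sketch, the 200K target comes from the post; the function names and the 1% rate for the fraction strategy are illustrative assumptions.

```python
import math


def stride_fixed_size(result_size: int, target_samples: int = 200_000) -> int:
    """Fixed sample size: choose the stride so that roughly `target_samples` hits are counted."""
    return max(1, math.ceil(result_size / target_samples))


def stride_fraction(rate: float) -> int:
    """Fixed fraction: the stride stays constant, so the work grows with the result size."""
    return max(1, round(1 / rate))


def sampled_hits(result_size: int, stride: int) -> int:
    """Number of hits passed on to term counting when every `stride`-th hit is sampled."""
    return math.ceil(result_size / stride)


for size in (1_000_000, 10_000_000, 100_000_000):
    fixed = sampled_hits(size, stride_fixed_size(size))
    fraction = sampled_hits(size, stride_fraction(0.01))
    print(f"{size:>11,} hits: fixed-size {fixed:,} sampled, 1% fraction {fraction:,} sampled")
```

The fixed-size strategy samples 200,000 hits at every result size listed, which matches the near-constant sampling time in the plots, while the fraction strategy grows linearly like full counting, only with a smaller constant.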

The performance chart for over provisioning of 100 looks very much like the one for 50, only with slightly higher response times for sampling. As the number of invalid results is markedly lower for an over provisioning of 100, this seems like the best speed/validity trade-off for our concrete setup.


Heuristic speed for top 25/100, logarithmic y-axis


Heuristic faceting with sampling on hits gives a high probability of correct results. The speed-up relative to full facet counting rises with result set size, as sampling has near-constant response times. Using over provisioning allows for fine-grained tweaking between performance and chance of correct results. Heuristic faceting is expected to be the default for interactive use with the links field. The viability of heuristic faceting for smaller fields is currently being investigated.

As always, there is full source code and a drop-in sparse faceting Solr 4.10 WAR at GitHub.

MarcEdit 6 Updates / Terry Reese

I hadn’t planned on putting together an update for the Windows version of MarcEdit this week, but I’ve been working with someone putting the Linked Data tools through their paces and came across instances where some of the linked data services were not sending back valid XML data – and I wasn’t validating it.  So, I took some time and added some validation.  However, because the users are processing over a million items through the linked data tool, I also wanted to provide a more user friendly option that doesn’t require opening the MarcEditor – so I’ve added the linked data tools to the command line version of MarcEdit as well. 
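MarcEdit is a .NET application, so the following is not its code. It is a Python sketch of the general pattern described above: check that each linked data service response is well-formed XML before using it, and record the bad ones instead of letting one response stop a million-item batch. Note that this checks well-formedness only, not validity against a schema; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_service_response(body: str) -> Optional[ET.Element]:
    """Return the parsed root element, or None when the service sent malformed XML."""
    try:
        return ET.fromstring(body)
    except ET.ParseError:
        return None


def collect_valid(responses: list[str]) -> tuple[list[ET.Element], list[int]]:
    """Keep the well-formed responses and note the positions of the malformed ones."""
    parsed, rejected = [], []
    for i, body in enumerate(responses):
        root = parse_service_response(body)
        if root is None:
            rejected.append(i)  # note the problem and keep going
        else:
            parsed.append(root)
    return parsed, rejected


parsed, rejected = collect_valid(["<r><id>1</id></r>", "<r><id>2</r>", "not xml"])
print(len(parsed), rejected)  # 1 [1, 2]
```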

Linked Data Command Line Options:

The command line tool is probably one of those under-used and unknown parts of MarcEdit. The tool is a shim over the code libraries – exposing functionality from the command line, and making it easy to integrate with scripts written for automation purposes. The tool has a wide range of options available to it – and for users unfamiliar with the command line tool – they can get information about the functionality offered by querying help. For those using the command line tool – you’ll likely want to create an environment variable pointing to the MarcEdit application directory so that you can call the program without needing to navigate to the directory. For example, on my computer, I have an environment variable called: %MARCEDIT_PATH% which points to the MarcEdit app directory. This means that if I wanted to run the help from my command line for the MarcEdit Command Line tool, I’d run the following and get the following results:

C:\Users\reese.2179>%MARCEDIT_PATH%\cmarcedit -help
* MarcEdit 6.1 Console Application
* By Terry Reese
* email:
* Modified: 2015/7/29
        -s:     Path to file to be processed.
                        If calling the join utility, source must be files
                        delimited by the ";" character
        -d:     Path to destination file.
                          If call the split utility, dest should specify a fold
                        where split files will be saved.
                        If this folder doesn't exist, one will be created.
        -rules: Rules file for the MARC Validator.
        -mxslt: Path to the MARCXML XSLT file.
        -xslt:  Path to the XML XSLT file.
        -batch: Specifies Batch Processing Mode
        -character:     Specifies character conversion mode.
        -break: Specifies MarcBreaker algorithm
        -make:  Specifies MarcMaker algorithm
        -marcxml:       Specifies MARCXML algorithm
        -xmlmarc:       Specifics the MARCXML to MARC algorithm
        -marctoxml:     Specifies MARC to XML algorithm
        -xmltomarc:     Specifies XML to MARC algorithm
        -xml:   Specifies the XML to XML algorithm
        -validate:      Specifies the MARCValidator algorithm
        -join:  Specifies join MARC File algorithm
        -split: Specifies split MARC File algorithm
        -records:       Specifies number of records per file [used with split c
        -raw:   [Optional] Turns of mnemonic processing (returns raw data)
        -utf8:  [Optional] Turns on UTF-8 processing
        -marc8: [Optional] Turns on MARC-8 processing
        -pd:    [Optional] When a Malformed record is encountered, it will modify the process from a stop process to one where an error is simply noted and a stub note is added to the result file.
        -buildlinks:    Specifies the Semantic Linking algorithm
This function needs to be paired with the -options parameter
        -options        Specifies linking options to use: example: lcid,viaf:lc,oclcworkid,autodetect
                lcid: utilizes to link 1xx/7xx data
                autodetect: autodetects subjects and links to know values
                oclcworkid: inserts link to oclc work id if present
                viaf: linking 1xx/7xx using viaf.  Specify index after colon. If no index is provided, lc is assumed.
                        VIAF Index Values:
                        all -- all of viaf
                        nla -- Australia's national index
                        vlacc -- Belgium's Flemish file
                        lac -- Canadian national file
                        bnc -- Catalunya
                        nsk -- Croatia
                        nkc -- Czech.
                        dbc -- Denmark (dbc)
                        egaxa -- Egypt
                        bnf -- France (BNF)
                        sudoc -- France (SUDOC)
                        dnb -- Germany
                        jpg -- Getty (ULAN)
                        bnc+bne -- Hispanica
                        nszl -- Hungary
                        isni -- ISNI
                        ndl -- Japan (NDL)
                        nli -- Israel
                        iccu -- Italy
                        LNB -- Latvia
                        LNL -- Lebannon
                        lc -- LC (NACO)
                        nta -- Netherlands
                        bibsys -- Norway
                        perseus -- Perseus
                        nlp -- Polish National Library
                        nukat -- Poland (Nukat)
                        ptbnp -- Portugal
                        nlb -- Singapore
                        bne -- Spain
                        selibr -- Sweden
                        swnl -- Swiss National Library
                        srp -- Syriac
                        rero -- Swiss RERO
                        rsl -- Russian
                        bav -- Vatican
                        wkp -- Wikipedia

        -help:  Returns usage information

The linked data option uses the following pattern: cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options [linkoptions]
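For scripted runs, the pattern above can be assembled programmatically. A minimal Python sketch, assuming the executable is named `cmarcedit.exe` as in the pattern and, optionally, that a `MARCEDIT_PATH` environment variable points at the MarcEdit application directory (mirroring the %MARCEDIT_PATH% setup described earlier). `build_link_command` is a hypothetical helper that only builds the argument list; it runs nothing.

```python
import os


def build_link_command(source, dest, link_options, marcedit_dir=None):
    """Return the argv list for a linked-data run of the MarcEdit console tool.

    Mirrors the documented pattern:
        cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options [linkoptions]
    """
    if marcedit_dir is None:
        # Assumption: MARCEDIT_PATH names the MarcEdit application directory.
        marcedit_dir = os.environ.get("MARCEDIT_PATH", "")
    exe = "cmarcedit.exe"
    if marcedit_dir:
        exe = os.path.join(marcedit_dir, exe)
    return [
        exe,
        "-s", source,
        "-d", dest,
        "-buildlinks",
        "-options", ",".join(link_options),  # comma-delimited option list
    ]
```

On a Windows machine with MarcEdit installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`; passing a list rather than one string lets Python handle quoting for paths that contain spaces.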

As noted above in the list, -options takes a comma-delimited list of the values that the linking tool should query. For example, a user looking to generate OCLC work IDs and URIs on the 1xx and 7xx fields would use:

cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options oclcworkid,lcid

Users interested in building all available linkages (using VIAF, autodetecting subjects, etc.) would use:

cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options oclcworkid,lcid,autodetect,viaf:lc

Notice the last option, viaf. This tells the tool to use VIAF as a linking source for the 1xx and 7xx fields; the data after the colon identifies the index to use when building links. The indexes are listed in the help output above.
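Because a typo in the option list only surfaces once the tool runs, a script might check it first. The Python sketch below is a hypothetical pre-check, not part of MarcEdit: it splits an -options value on commas, accepts only the four option names shown in the help output, and applies the documented rule that viaf without an index defaults to lc. It does not validate the VIAF index codes themselves.

```python
# Option names taken from the cmarcedit -help output above.
KNOWN_OPTIONS = {"lcid", "autodetect", "oclcworkid", "viaf"}


def parse_link_options(value):
    """Split an -options string into (option, index) pairs.

    Only viaf takes an index (the text after the colon); when none is given,
    lc is assumed, as the help text says. Other options get index None.
    """
    parsed = []
    for item in value.split(","):
        name, _, index = item.strip().partition(":")
        if name not in KNOWN_OPTIONS:
            raise ValueError(f"unknown linking option: {name!r}")
        if name == "viaf":
            parsed.append((name, index or "lc"))
        else:
            parsed.append((name, None))
    return parsed
```

For example, `parse_link_options("viaf:dnb")` yields `[("viaf", "dnb")]`, while a misspelled name raises a `ValueError` before any records are processed.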

Download information:

The update can be found on the downloads page or through the automated update tool within MarcEdit.

Mac Port Update:

Part of the reason I hadn’t planned on doing a Windows update of MarcEdit this week is that I’ve been heads down making changes to the Mac Port.  I’ve gotten good feedback from folks letting me know that so far, so good.  Over the past few weeks, I’ve been integrating missing features from the MarcEditor into the Port, as well as working on the Delimited Text Translation.  I’ll now have to go back and make a couple of changes to support some of the update work in the Linked Data tool – but I’m hoping that by Aug. 2nd, I’ll have a new Mac Port Preview that will be pretty close to completing (and expanding) the initial port sprint. 

Questions, let me know.


FASTR zooms out of Senate Committee / District Dispatch

Today, after many years of effort by our members and the open access community, the Fair Access to Science and Technology Research Act of 2015 (FASTR) was approved by unanimous voice vote of the Senate Committee on Homeland Security and Governmental Affairs. It now goes to the full Senate for consideration as early as this September. ALA thanks Committee Chair Ron Johnson (R-WI) and his staff for their hard work and wishes again to express its deep gratitude to Senator John Cornyn (R-TX) for his leadership and his staff’s tireless efforts toward ensuring that taxpayer-funded research be and remain accessible to the public.

Photo by Andreas Levers

As ALA’s press release states, “FASTR would require federal departments and agencies with an annual extramural research budget of $100 million to develop a policy to ensure that researchers submit an electronic copy of the final manuscript accepted for publication in a peer-reviewed journal. Additionally, the bill would also require that each taxpayer-funded manuscript be made available to the public online and without cost, no later than twelve months after the article has been published in a peer-reviewed journal.”

While this may seem a small step, it is a critical, momentum-generating advance and the most meaningful legislative movement on FASTR that has ever occurred. Please stay tuned as we continue to monitor this issue and to go into overdrive if ongoing efforts to accelerate a vote on S.779 by the Senate in this calendar year gain traction.

Congratulations to all of you who helped “move FASTR” today and thanks for being ready to join ALA again when it’s time to tell Congress to floor it!

The post FASTR zooms out of Senate Committee appeared first on District Dispatch.

Even though it is summer, CopyTalk webinars continue! / District Dispatch

From Lotus Head

Universities and their libraries provide copyright information to the members of their communities in different ways. Join us on CopyTalk this month to hear three universities describe the copyright services they offer to their faculty, staff, and students. Our presenters will include Sandra Enimil, Program Director, University Libraries Copyright Resources Center from the Ohio State University, Pia Hunter, Visiting Assistant Professor and Copyright and Reserve Librarian from the University of Illinois at Chicago, and Cindy Kristof, Head of Copyright and Document Services from Kent State University.

CopyTalk will take place on August 6th at 11am Pacific/2pm Eastern time. After a brief introduction of our presenters, our speakers will present for 45 minutes, and we will end with a Q&A session (questions will be collected during the presentations).

Please join us at the webinar URL. Enter as a guest, no password required.

We are limited on the number of concurrent viewers we can have, so we ask you to watch with others at your institution if at all possible. The presentations are recorded and will be available online soon after the presentation. Oh yeah – it’s free!

The post Even though it is summer, CopyTalk webinars continue! appeared first on District Dispatch.

Jobs in Information Technology: July 29, 2015 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week:

Digital Services Coordinator, Metropolitan New York Library Council, New York, NY

Senior Library Applications Developer, Brown University, Providence, RI

Associate University Librarian for Digital Technologies, Brown University, Providence, RI

Information Designer for Digital Scholarly Publications, Brown University, Providence, RI

Science and Engineering Librarian, Pennsylvania State University Libraries, University Park, PA

Information Sciences and Business Liaison Librarian, Pennsylvania State University Libraries, University Park, PA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

III report: “WE LOVE THE LIBRARY, BUT WE LIVE ON THE WEB.” / Jonathan Rochkind

ILS Vendor III has released a report based on a survey of patrons at 7 UK academic libraries:

“WE LOVE THE LIBRARY, BUT WE LIVE ON THE WEB.” Findings around how academic library users view online resources and services (You have to register to download)

Some of the summary of findings from the report:

  • “User behaviours are increasingly pervasive, cutting across age, experience, and subject areas”
  • “Online anywhere, on any device, is the default access setting”
  • “Almost without exception, users are selecting different discovery tools to meet different requirements, ranging from known item searches to broad investigation of a new topic. Perhaps with some credit due to recent ‘discovery layer’ developments, the specialist library search is very much of interest in this bag of tools, alongside global search engines and more particular entry points such as Google Scholar and Wikipedia.”
  • Library Search is under informed scrutiny. Given a user base that is increasingly aware of the possibilities for discovery and subsequent access, there are frustrations regarding a lack of unified coverage of the library content, the failure to deliver core purposes well (notably, known item searches and uninterrupted flow-through to access), and unfavourable comparisons with global search engines in general and Google Scholar in particular. We note:
    • Global Search Engines – Whilst specialised tools are valued, the global search engines (and especially Google) are the benchmark.
    • Unified Search – Local collection search needs to be unified, not only across print and electronic, but also across curatorial silos (archives, museums, special collections, repositories, and research data stores).
    • Search Confidence – As well as finding known items reliably and ordering results accordingly, library search needs to be flexible and intelligent, not obstructively fussy and inexplicably random.

I think this supports some of the directions we’ve been trying to take here. We’ve tried to make our system play well with Google Scholar (both directing users to Google Scholar as an option where appropriate, and using Umlaut to provide as good a landing page as possible when users come from Google Scholar and want access to licensed copies, physically held copies, or ILL services for items discovered). We’ve tried to move toward a unified search in our homegrown-from-open-source-components catalog.

And most especially we’ve tried to focus on “uninterrupted flow-through to access”, again with the Umlaut tool.

We definitely have a ways to go in all these areas; it’s an uphill struggle in many ways, as discussed in my previous comments on the Ithaka report on Streamlining Access to Scholarly Resources.

But I think we’ve at least been chasing the right goals.

Another thing noted in the report:

  • “Electronic course readings are crucial (Sections 8, 12) Clearly, the greatest single issue raised in qualitative feedback is the plea for mandated / recommended course readings— and, ideally, textbooks—to be universally available as digital downloads,”

We’ve done less work locally in this direction, on course reserves in general, and I think we probably ought to. This is one area where I’d especially wonder whether UK users are representative of U.S. users, but I still have no doubt that our undergraduate patrons spend enough time with course readings to justify more of our time than we’ve been spending on analyzing what they need in electronic systems and improving them.

The report makes a few recommendations:

  • “The local collection needs to be surfaced in the wider ecosystem.”
  • “Libraries should consider how to encompass non-text resources.”
  • “Electronic resources demand electronic workflows.”
  • “Libraries should empower users like any modern digital service. Increasing expectations exist across all user categories—likely derived from experiences with other services—that the library should provide ‘Apps’ geared to just-in-time support on the fly (ranging from paying a fine to finding a shelf) and should also support interactions for registered returning users with transaction histories, saved items, and profile-enabled automated recommendations.”
  • “Social is becoming the norm”

Other findings suggest that ‘known item searches’ are still the most popular use of the “general Library search”, although “carry out an initial subject search” is still present as well. They also suggest that when it comes to ebooks, “There is notably strong support to be able to download content to use on any device at any time.” (Something we are largely failing at, although we can blame our vendors.)

Filed under: General

Meet Your Developer: QA Dan / Islandora

With the Islandora Conference coming up, we thought it would be a good time to Meet some Developers, especially those who will be leading workshops. Kicking it off is Daniel Aitken, better known in the Islandora community as QA Dan, master of testing. Despite the name, Dan now works for discoverygarden, Inc. as a developer, although he maintains ceremonial duties as Lord Regent of the QA Department. He's known for thorough troubleshooting on the listserv and some very handy custom modules, and will be leading workshops on How to Tuque and Solution Packs (Experts) at the upcoming Islandora Conference. Here's QA Dan in his own words:

Please tell us a little about yourself. What do you do when you’re not at work?
Hmm … when I’m not working, and I decide to do something more interesting than sitting on the couch, I’m probably baking. Pies, biscuits, cookies … currently I’m working on making fishcakes from scratch. Batch one was less than stellar. I think I accidentally cooked the starch out of the potatoes.
How long have you been working with Islandora? How did you get started?
I’ve been with discoverygarden for about three years now. I kind of randomly fell into it! I didn’t really know what to expect, but the team here is fantastic, and I’ve gotten the opportunity to work on so many fascinating projects that it’s been a blast.
Sum up your area of expertise in three words:
Uh … code base security?
What are you working on right now?
Right this second? Looking at a fix to the basic Solr config that should prevent GSearch/Fedora from spinning its wheels in an unusual case where certain fields end in whitespace. Once I actually get some free time here in the QA department? Updating our Travis-CI .yaml scripts to use caching between builds so that hopefully we don’t have 45-minute-plus build processing times.
What contribution to Islandora are you most proud of?
The testing back-ends! Y’know, all this stuff. To be fair, a bare-bones version was there before I got my hands on it, but it’s been updated to the point where it’s almost indistinguishable from its first form. It separates testing utilities from the actual test base class so that it can be shoehorned into other frameworks, like the woefully-underused, or even in included frameworks like the basically-magical datastream validation stuff! That way, no matter what you’re doing, you can manipulate Fedora objects during tests! It’s also been made easy to extend, hint hint.
What new feature or improvement would you most like to see?
Does ‘consolidated documentation’ count? I feel like that’s what slips a lot of people up. I know we’ve been working on improving it - we have a whole interest group devoted to it - but it’s a multi-tendriled beast that needs to be tamed. We have appendices living in multiple places, API documentation that only lives in individual modules’ api.php files, and just … all kinds of other stuff. As a kind-of-but-not-really-an-end-user sort, I only really make improvements to technical documents like Working With Fedora Objects Programmatically via Tuque, and half the time these are ones I made myself because there was a desperate gap in the knowledgebase.
What’s the one tool/software/resource you cannot live without?
Ten Million with a Hat! I don’t know what I did before I had it. Actually, I do; I flushed all my time down the toilet manually creating all the objects I use in my regular testing. So I made a thing with the concept of ‘just batch ingest a bunch of random objects, and modify each one via hooks’. Then, I started working on the hooks - things to add OBJs and generate derivatives and randomly construct MODS datastreams and do DC crosswalking and add things to bookmarks and whatnot - whatever fits the case for whatever I’m testing. Now I’ve gone from ingesting objects I need manually like some kind of chump to having Islandora take care of it for me. The moral of the story is that if you think you couldn’t live without a tool, probably just make it? Code is magic. Tuque is also magic.
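The ‘batch ingest a bunch of random objects, and modify each one via hooks’ pattern Dan describes can be sketched in a few lines of Python. This illustrates the pattern only and is not Ten Million with a Hat's code: plain dictionaries stand in for Fedora objects, nothing is ingested, and `make_test_objects`, `add_random_label`, and `add_obj_datastream` are hypothetical names.

```python
import random


def make_test_objects(count, hooks, seed=0):
    """Generate `count` throwaway objects and run every hook on each one."""
    rng = random.Random(seed)
    objects = []
    for i in range(count):
        obj = {"pid": f"test:{i + 1}", "datastreams": {}}
        for hook in hooks:  # each hook customizes the object in place
            hook(obj, rng)
        objects.append(obj)
    return objects


def add_random_label(obj, rng):
    """Hypothetical hook: give the object a random label."""
    obj["label"] = f"Random object {rng.randint(1000, 9999)}"


def add_obj_datastream(obj, rng):
    """Hypothetical hook: attach 16 random bytes as the OBJ datastream."""
    obj["datastreams"]["OBJ"] = bytes(rng.getrandbits(8) for _ in range(16))
```

Seeding the random generator makes the fixture reproducible: the same seed and hooks yield the same objects, so a failing test can be rerun against identical data, and new test cases only need a new hook.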
If you could leave the community with one message from reading this interview, what would it be?
Write tests and Travis integration for your contributed modules! I know it’s a time investment, but I’ve put a lot of work into making it easier for you! There’s even a guideline here. It’ll tell you all about how to poke at things inside Islandora and make assertions about what comes back, like whether or not objects are objects and whether or not the datastreams exist and are well-formed (they tend to rely on the actual contents of the binary, and never on extensions or mime types). Well-written high-level tests can tell you if you’ve broken something that you didn’t expect to, and Travis can tell you all sorts of things about the quality of your code per-commit. A tiny weight is lifted off my shoulder every time I see a project I’ve never encountered before that has a ‘tests’ folder and a ‘.travis.yml’ and a big green ‘PASSING’ in the README on GitHub.
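The point about relying on the actual contents of the binary, rather than extensions or MIME types, can be illustrated with leading "magic bytes". A minimal Python sketch, assuming only a handful of well-known signatures (PNG, JPEG, PDF, TIFF); `sniff_type` and `looks_like` are hypothetical helpers, not Islandora's own validation code.

```python
# Leading byte signatures ("magic numbers") for a few common formats.
SIGNATURES = {
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/jpeg": (b"\xff\xd8\xff",),
    "application/pdf": (b"%PDF-",),
    "image/tiff": (b"II*\x00", b"MM\x00*"),  # little- and big-endian TIFF
}


def sniff_type(data):
    """Guess a MIME type from the first bytes of `data`, or return None."""
    for mime, prefixes in SIGNATURES.items():
        if any(data.startswith(prefix) for prefix in prefixes):
            return mime
    return None


def looks_like(data, expected_mime):
    """True only if the binary content itself matches `expected_mime`."""
    return sniff_type(data) == expected_mime
```

A datastream labelled `image/jpeg` (or named `photo.jpg`) whose bytes start with the PNG signature would fail `looks_like(data, "image/jpeg")`, which is exactly the kind of mismatch an extension-based check would miss.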
Happy Islandora-ing!

Why Diversity Matters: A Roundtable Discussion on Racial and Ethnic Diversity in Librarianship / In the Library, With the Lead Pipe

Image by Flickr User Webtreats (CC-BY-2.0)

In Brief: 

After presenting together at ACRL 2015 to share research we conducted on race, identity, and diversity in academic librarianship, we reconvene panelists Ione T. Damasco, Cataloger Librarian at the University of Dayton, Isabel Gonzalez-Smith, Undergraduate Experience Librarian at the University of Illinois, Chicago, Dracine Hodges, Head of Acquisitions at Ohio State University, Todd Honma, Assistant Professor of Asian American Studies at Pitzer College, Juleah Swanson, Head of Acquisition Services at the University of Colorado Boulder, and Azusa Tanaka, Japanese Studies Librarian at the University of Washington in a virtual roundtable discussion. Resuming the conversation that started at ACRL, we discuss why diversity really matters to academic libraries, librarians, and the profession, and where to go from here. We conclude this article with a series of questions for readers to consider, share, and discuss among colleagues to continue and advance the conversation on diversity in libraries.


Earlier this year, at the Association of College and Research Libraries (ACRL) 2015 conference, the authors of this article participated in a panel discussion entitled “From the Individual to the Institution: Exploring the Experiences of Academic Librarians of Color”1, which covered research the panelists had conducted on institutional racism, structures of privilege and power, and racial and ethnic identity theory in academic libraries and among academic librarians. The hour-long, standing-room-only session scratched the surface of conversations that are needed among academic librarians on issues of diversity, institutional racism, microaggressions, identity, and intersectionality. Our intent with the ACRL panel was to plant the seeds for these conversations and for critical thought in these areas to further germinate. We saw these conversations begin to take shape on Twitter during and after the panel discussion, and overheard them in the halls of the Oregon Convention Center. As Pho and Masland write in the final chapter of The Librarian Stereotype, “we are now at a point where discussions about the intersectionality of gender, sexuality, race, and ethnicity in librarianship are happening among a wider audience . . . These difficult conversations about diversity are the first steps toward a plan of action” (2015, p. 277). These conversations must continue to grow.

The discussion of racial and ethnic diversity in libraries is a subset of the larger discussion of race in the United States. For anyone participating in these discussions, the experience can be difficult and uncomfortable. Such discussions can be academic in nature, but very often they are personal and subjective. In the United States, our long history of avoiding difficult and meaningful conversations about race has made it challenging for some people to perceive or comprehend disparities in representation and privilege. Fear often plays a significant role as a barrier to engaging in these conversations. Fear of the unknown, fear of rejection, fear of change, and the perceived possibility of losing control can complicate these discussions. Participants in these conversations have to be willing to concede a certain amount of vulnerability in order to move the discussion forward, but vulnerability makes many people uncomfortable, which in turn makes it easy to just avoid the discussion altogether.

What follows is a virtual roundtable discussion in which we speak openly about why diversity really matters and what actions can be taken, and suggest questions for readers to consider, share, and discuss in honest and open conversations with colleagues. At times, authors reveal the very real struggle to articulate or grapple with the questions, just as one might encounter in a face-to-face conversation. Ultimately, though, by continuing this conversation we work to advance our profession’s understanding of the complexity of racial and ethnic diversity in librarianship, and to strive toward creating sustainable collaborations and lasting change in a profession that continues to face significant challenges in maintaining racial and ethnic diversity.

Before launching into the roundtable discussion, we acknowledge that an additional challenge when talking about race is the use of terminology and language that intellectualizes some of the real-world experiences and feelings we face. Terminology is useful due to its ability to create precision in meaning, but it also can alienate and turn away readers who use different language or terms to express similar experiences, feelings, or concepts. Yet in order to have a critical discussion of race and diversity, it is important that we engage in the use of particular terms that help us to identify, explain, and analyze issues and experiences that will help us to advance the conversation in deeper and more meaningful ways. In this article we do use terms that draw from a common critical lexicon, and we have made an effort to define and/or footnote many of these terms for readers who might be unfamiliar with these terms.

Why does diversity matter?

Juleah: Why does diversity matter? This question was posed to the audience at the end of our ACRL panel (Swanson, et al., 2015), as something to reflect upon. For our virtual roundtable, I’m re-asking this question because it warrants meaningful discussion. Let’s go around the “table” and start with Ione.

Ione: When the question was first posed to us, I struggled with articulating a response that was more than just an intuitive reaction. My first thought was that diversity matters because we don’t live and work in a vacuum of homogeneity. But I realize that’s both a naïve and an inaccurate answer: there are many places where people still live in racially segregated areas, and there are work environments that, for many reasons, tend to have a homogeneous pool of employees. It’s not enough to say that diversity matters because the world is diverse.

Isabel: Ione’s initial comment about wanting to respond beyond her intuition reminds me of Isabel Espinal’s piece “A New Vocabulary for Inclusive Librarianship: Applying Whiteness Theory to our profession,” in which she discusses Sensate Theory, an anthropological framework for discussing whiteness. I share Ione’s wish to articulate why racial and ethnic diversity is important, how painful prejudice and discrimination can feel, and the need to acknowledge the disparities that exist in different communities’ experiences and history due to race/ethnicity. Discovering Espinal’s exploration of sensate theory was thrilling for me because she says that the theory emphasizes gut reactions – emotion and the senses (Espinal, 145). Librarians of color may react with “a very angry or very tearful reaction or both…the experience of encountering whiteness in the library setting is one that is felt in the body; it is more than an intellectual abstraction.” (145) This really resonated with me because I consider myself an intelligent, composed person, but when my colleagues or I experience discrimination due to our race/ethnicity, I can’t help but initially feel overwhelmed. This is immediately followed by a process of checking my emotions to find ways to articulate myself intellectually as a means to be acknowledged and understood. For me, as a person of color, this is what discussing the relevance and meaning behind diversity means: a struggle between gut reaction and articulation.

Dracine: This question is a challenge. Nevertheless, most people who come into this profession want to be of service directly or indirectly to others. Libraries of every variety exist to serve their respective constituents through access to information and spaces for collaboration.
With that in mind, I think diversity matters in relation to the relevance of services being provided to meet practical and extraordinary needs. Needs that are diverse not only because of ethnicity and race, but also because of religion, gender, socioeconomic status, physical ability, etc.

With recent headlines related to racism and violence, it is easy to see the connectivity of libraries in the pursuit of social justice ideals. So much of the conversation we’ve been having pertains to administrative and cultural constructs that frustrate diversity. These are large and lofty issues in scope. I often think their enormity makes us dismissive of the tangible impacts of diversity in the commonplace work performed in libraries every day.

I’ve heard many anecdotal stories from colleagues, both of color and white, who were able to customize or enhance instruction for an individual or group because of personal insights and experiences related to issues like English as a Foreign Language and format accessibility. Perhaps mountains were not moved, but to the individuals who benefitted hills were climbed.

Isabel: Dracine’s example of instructors tailoring their sessions for a particular class of students based on factors like language is a great example of how librarians are tuning into the identity aspect of the communities they serve. Juleah, Azusa, and I have been using identity theory to think about diversity initiatives from an angle that takes into account the individual experience at a more fundamental level. Identity is dynamic and in constant flux; it is often constructed from an internal sense of self as well as at the external, social level. Consider the messages we internalize from what we see on TV and read in history books, from who holds roles of authority in our institutions, and from who sits at the reference desk. It makes sense that your colleagues customize their instruction, because we intuitively sense that people respond positively to another person who is like themselves. That’s why the library ethnic caucuses are important: they provide some individuals with a sense of community and belonging. Ethnic identity theory helps us understand this phenomenon.

Azusa: As Dracine says above, diversity matters because libraries must accommodate diverse user groups as well as a diverse librarian population. Ione mentioned during our panel how the field of Library and Information Science (LIS), and higher education in general, views diversity as a problem to be solved (Swanson, et al., 2015). Diversity in race, ethnicity, sexuality, age, social background, and more will bring power to libraries, where balanced views and all kinds of possibilities are indispensable for successful research and teaching. Diversity is not a problem, but an asset for the institution.

Juleah: When we talk about diversity and why it matters in academic libraries, I think what we’re really trying to get at are two different concepts: 1) diversity in relation to the library profession’s role in social justice (Morales, Knowles, Bourg, 2014) and 2) diversity in relation to organizational culture within libraries.

To be honest, I think our profession, librarians as a whole, but more specifically academic librarians, are in the midst of a professional culture crisis. I think this stems from the homogeneity within our professional ranks. What we get to do as academic librarians today is incredible, from pushing our campuses into open access models for research output to being active participants in conversations about managing massive amounts of data. But are we proud of the homogeneity and the stagnant racial and ethnic diversity within the profession? I don’t think we are.

I think diversity matters because, right now, it allows us the opportunity to reinvent our organizational and professional culture into something that is not reliant on homogeneity of people and ideas, but rather looks toward what we bring to the future of higher education.

Ione: The two concepts Juleah describes are actually intertwined, and are worth exploring at the same time. I think her first point about libraries and social justice poses difficult questions for us as a profession: how far do we take social responsibility as academic libraries? As academic librarians? How do we reconcile social responsibility with the missions of our institutions, and what do we do when they are out of alignment? Connecting these to her second point, internally, how far do we take a social justice concept of diversity in terms of our daily work as librarians? Can we even agree upon a definition of social justice in terms of diversity? I think Todd raised an important question during the panel (Swanson, et al., 2015) when he said, “The question is, is diversity a social justice? Is racial equity part of an institutional mission? If it isn’t, then we have to interrogate that.”

If we think of our libraries as microcosms of the world around us, I don’t think we can ignore the fact that oppressive structures of power which exist in our culture are reproduced within the structures that exist in higher education, in our universities and colleges, and in our academic libraries, often unknowingly and sometimes with the best of intentions. Numbers aren’t everything, but the lack of positive movement in terms of racial demographics in our field is a cause for concern. And just adding more people of “diverse” backgrounds does nothing to address structural problems with an institution. I think as we move as a society to undo oppression of marginalized identities, libraries, as places that serve larger communities, do bear a responsibility to undo their own oppressive structures and question why things have stayed the same over the years in our profession.

Isabel: You’re right, Ione. As I said at our panel at ACRL, you can’t just hire a person of color and call it diversity (Swanson, et al., 2015). If we’re going to pursue diversity initiatives at the student and professional level, we need to identify what long-term success looks like for our field and what resonates with individuals. What Juleah, Azusa, and I found in our research was that racial and ethnic identity theory helps us understand why librarians of color may respond well to ethnic causes or to serving as liaisons for student of color groups, and how they may feel a sense of loneliness in a predominantly white institution or perceive that their race/ethnicity is used to pigeonhole their professional responsibilities.

Diversity matters because we all play a part in the messages we disseminate, regardless of how we identify. Librarians contribute to the preservation and accessibility of information, to representations of authority in the intellectual sphere, and to advocacy against censorship. What message do our collections, library staff representation, research, or programming give to the communities we serve? And what are we doing to serve our patrons in ways that take into account their race and/or ethnicity?

Todd: To add to what Isabel said about the librarian’s role in the preservation and accessibility of information, I think at a profound, foundational level, libraries are involved in an epistemological project. In other words, as an institution that collects, preserves, and distributes information, libraries serve the function of helping to create and circulate knowledge in our society. How institutions construct and curate information, and how users access and synthesize that information, are not outside the realm of the political. Especially in the case of academic libraries, which encompass a scholarly mission of furthering intellectual growth and scholarly communication, thinking carefully and deeply about the types of knowledge that are both included and excluded is crucial to the mission of the library and its relation to broader society.

Isabel: NPR recently featured Michelle Obama’s commencement speech to the predominantly African-American class of Martin Luther King Jr. Preparatory High School on the South Side of Chicago, in which she mentions how the famous American author Richard Wright was not allowed to check out books at the public library because he was black (Obama, 2015). I instantly thought of Todd’s point when I heard it on the radio: the American library was once a place of exclusion, and it still remains political. The First Lady’s point was to inspire the graduating class to persevere beyond their struggles towards achieving greatness, a message intended to resonate with the students because it was coming from an accomplished, powerful, fellow South Sider of Chicago.

Todd: That example also reminds me of how E.J. Josey, writing in 1972, identified academic libraries as having a unique role to play in the black liberation movement. Even today, as higher education continues to be a site of privilege for some and exclusion for others, diversity and educational equity are things that we still need to work on. Thus, in relationship to libraries and higher education, diversity is important to consider in how we think about all aspects of the ‘life cycle of information,’ particularly when it comes to the ways in which historically underrepresented groups and historically underrepresented forms of knowledge and practices have not been included in (and at times have been systematically excluded from) collection building and user services.

Ione: Many of us who work in academic libraries have encountered “diversity training” at one point or another, and in the course of that training, we may have been presented with statistics from both business and higher education that demonstrate the value of diversity in specific ways. For example, many businesses highlight the importance of being able to work effectively in a global market, and higher education has followed that line of thinking by promoting diversity as a way of building student competence in intercultural interactions as a key component of a college education. Another reason diversity is often touted as a component of an effective workplace is that studies have shown that, more often than not, diverse work teams are highly productive. But I find these market-driven motivations for promoting diversity to be very superficial and highly problematic.

Todd: The approach to diversity that Ione describes is part of a growing concern regarding the “neoliberalization of the library” (Hill, 2010; Pateman, 2003), including increased privatization, a shrinking public sphere, and a market-driven approach to issues like diversity. Failing to consider how diverse communities have been, and continue to be, affected by such trends, and thereby perpetuating implicit race and class privileges, will only lead to the further homogenization and privatization of places, practices, and services.

When considering issues of race and racial representation in the library, I think it’s important that we move beyond an additive model and think about the epistemological. People of color (as well as other disenfranchised groups) are more than just laboring bodies, more than just token representatives of a diverse workforce under the conditions of capitalism; they also possess, practice, and embody different ways of understanding and inhabiting the world, which, as Juleah points out, can help to reinvent the culture of the library, and of higher education more generally. I think this possibility of transformation is why diversity matters.

Juleah: This has been a captivating discussion so far, addressing themes from homogeneity in the profession, organizational culture, race and identity, and issues of social justice, and ultimately critically examining our role as librarians in the communities we serve. We could spend more time on this question, but just as a real-world discussion has a time limit, we have a word count. So, let’s move on to the next question.

Where do we go from here?

Juleah: Oftentimes, after engaging in critical discourse, when the conversation ends, we are left wondering what to do next. Rather than leaving this for the reader to consider after finishing this article, let’s address it here. Now that we have touched on why diversity matters, where do we go from here?

Ione: Participating in the ACRL panel really challenged me to think about my own approaches to researching diversity, which had previously been focused on understanding the experiences of individuals of color. However, as Todd pointed out during the panel (Swanson, et al., 2015), I think we all need to be more versed in critical perspectives around identity (and intersectionality)[2] in order to have more effective conversations about how racism and other forms of oppression continue to be produced and reproduced in our organizations. Listening to the experiences of those who have been marginalized[3] may motivate us to move towards a more socially just world, but developing critical competencies and deepening our knowledge base in critical theory can give us the tools to actually dismantle the structures that marginalized them in the first place.

Dracine: During the panel, I made a comment regarding my own relief upon hearing my director say diversity was not my issue (Swanson, et al., 2015). For me this was important because even as a librarian of color my professional expertise is not diversity. However, if you want to talk about getting Arabic language books through U.S. customs, then sure, I might have some thoughts. I care about diversity for the very reasons that have been discussed and definitely want to leave the profession better than I found it. I think it’s important to acknowledge that how that happens may look different for each individual. The biggest takeaway for me was the obvious need for a reset or a refresh on the question of diversity in libraries. We’ve begun to have what feels like genuine conversations that will hopefully combat the diversity fatigue felt by both librarians of color and perhaps our white counterparts.

Ione: Arm yourself with knowledge, and then have the courage to use that knowledge to start dialogues with your colleagues, administrators, faculty, and staff, not just in your library but across your campuses to examine existing policies and practices that have left far too much room for discrimination (both implicit and explicit) to occur. And I mention courage because these are not easy conversations to have, or even to initiate. It’s easy for defensiveness to arise in these conversations, and for emotions to get rather heated, but I think it is possible to move through those communication barriers and get to a place of actual growth.

Juleah: When talking about diversity in academic libraries with colleagues of varying racial and ethnic backgrounds, acknowledging that institutional racism[4] does exist, regardless of intent or good intentions, can in fact be very freeing in a conversation, because institutional racism is not about us-versus-them, or you-versus-me; instead, it is a collective outcome to be analyzed and critiqued collectively by an organization. The question becomes not “What are we doing wrong?” but instead “How can we change our outcomes?”

Ione: Another thing I would recommend is seeking out other campus partners with expertise in mediating these types of conversations. For example, a few years ago, our campus hosted a series of “Dialogues on Diversity” that brought together small cohorts of faculty and staff from different units to attend a series of dialogue sessions mediated by trained facilitators to try to build a better sense of community across differences. It was a very small step, and it did not transform our campus culture overall, but I do think it helped create a network of people across the university who obviously cared about bridging differences in order to improve our overall campus climate. Through that program, I met people with whom I have since worked on initiatives and programs related to diversity.

Isabel: Great suggestions, Ione. My institution held diversity dialogues in collaboration with campus partners, and the sessions included the perspectives of people of different experiences and backgrounds. It’s a productive way to navigate the uncomfortable tension between the personal and the systemic contributions towards diversity. I would also suggest that librarians, regardless of race/ethnicity or hierarchy in their institutions, pay attention to recent discussions in our profession regarding microaggressions, which are often unintentional comments “that convey rudeness, insensitivity and demeans a person’s racial heritage or identity” (Sue et al., 2007). The LIS Microaggressions Tumblr project reminds us that we are all capable of demeaning someone despite our best intentions, but we also have the opportunity to truly listen when we are being called out, to be humbled by the experience, and to learn from it. At a personal level, this is one thing we can and must all do: listen.

Todd: One of the important points that was discussed at the panel and that we continue to discuss here is trying to come up with ways to transform both the profession and the various institutions that we work at. Crucial to such a consideration is identifying where power lies. Of course, we all exercise power in different ways. The key is to figure out how to exercise our power to make lasting, sustainable change at the structural level. And we can’t just be acting alone. We need to create movements and build alliances, and this often entails creative forms of coalition building. (Although I suppose all forms of coalition are creative.)

Ruth Wilson Gilmore (2007) makes a point of stressing that we need to identify both likely and unlikely allies. We need to be better about doing that in the LIS field. At the ACRL panel, one of the audience members noted that ALA is 98% white (Swanson, et al., 2015). Obviously, change in the percentage of people of color in ALA, or in the LIS field in general, is not going to happen overnight, so how do we work with that 98% to create coalitions with people who can be good allies?

A helpful way of thinking about institutional alliances is what Scott Frickel (2011) calls “shadow mobilizations,” which entails creating informal networks of activism among diverse stakeholders within the constraints of the institution. I think such a strategy can be effective in building alliances within and between different constituent groups in the LIS fields. One of the points that I raised in the ACRL panel was that we need to recognize the complexity of people’s identity, how our positionalities encompass intersectional identities and affiliations that are not always immediately visible and legible (Swanson, et al., 2015). So even though ALA or the profession is predominantly white, that whiteness is not monolithic. It is inflected through categories such as class, gender, sexuality, religion, ability, etc. By understanding diversity, including racial diversity, through a framework that is sensitive to how it is always already constituted through these other intersections, we can forge multiple coalitions in ways that are complex, nuanced, and durable. Ultimately, this would mean that we are constructing a movement based on a diversity politics that is founded on a quest for social justice and social transformation rather than token representation or inclusion.

Ione: In terms of higher education and academic libraries, I think we really need to question hiring practices, and tenure and promotion practices. As I mentioned during the ACRL panel (Swanson, et al., 2015) back in March, the idea of “organizational fit” is a problematic concept in search committee discussions. While it is never an official criterion for an applicant, I think search committees reinforce the status quo when they use such language to deny an applicant a position because of their perceived inability to fit the existing organizational culture. I think we also need to take a closer look at how we write our position descriptions and our mission statements; essentially, what do we convey about ourselves as organizations to potential applicants?

Todd: This requires all of us to take a critical, self-reflexive look at our complicity in maintaining the status quo and our roles in facilitating the goals of social change. For example, we can take some lessons from those working in other fields—like the STEM (science, technology, engineering, math) fields—that are also struggling to recruit and retain historically underrepresented groups. Attention is being given to how to make STEM more culturally relevant to people of color and other marginalized groups so that there are alternative pathways to pursue it in terms of scholarship and profession (Basu & Barton, 2007; Lee & Buxton, 2010; Lyon, Jafri, & St Louis, 2012). As we continue to build on efforts to diversify the LIS field, I think looking at other strategies, interrogating the current field and its practices, and asking questions such as how do we make LIS more culturally relevant and what alternative pathways can be developed to increase recruitment and retention of people of color and other marginalized groups are important facets for us to consider.

Azusa: The ACRL Diversity Committee’s Diversity Standards: Cultural Competency for Academic Libraries may be a good guide for some libraries to develop local approaches to diversifying populations and to recruiting and maintaining a diverse library workforce. The University of Washington Bothell and Cascadia Community College Campus Library Diversity Team was formed using the guidelines in the Diversity Standards, and it adapted some of the eleven standards to develop training sessions in cultural awareness and cross-cultural communication (Lazzaro, Mills, Garrard, Ferguson, Watson, & Ellenwood, 2014). The outcome was quite positive, and their assessments indicate that a structured opportunity to think and learn about diversity and cultural differences, by sharing and hearing personal experiences with colleagues (which might otherwise feel odd), was particularly helpful. If your institution has staff members from different cultures, developing cultural awareness by learning from each other is one good way to start.

Questions for our readers

Juleah: As we have emphasized throughout this article, a continued conversation on diversity, particularly racial and ethnic diversity in the profession, is needed. As we conclude this roundtable discussion, what questions do you offer to readers that will carry this conversation forward?

Todd: As many people have noted, there is a very noticeable racial disparity in the LIS profession, and this has been talked about for a while now (Espinal, 2001; Galvan, 2015; Honma, 2005; Peterson, 1996). I think a useful way of framing it, so that we move beyond the “deficit model” that targets individuals or communities, is to flip the question and ask:

  • Is there a particular deficit in the LIS profession itself that makes it unattractive for people of color to pursue?
  • Are there ways that the LIS field (and all of us who work in that field, whether as librarians, faculty, administrators, etc.) promotes, intentionally or unintentionally, structures and cultures that may be deemed exclusionary to those who have been historically marginalized and underrepresented?
  • How can we (as individuals, coalitions, institutions) create change?

Ione: We need to start asking some big questions in LIS education and higher education in general.
In terms of LIS education:

  • Do current curricular offerings at ALA-accredited library schools address critical theories of identity and how they intersect with theories of information and the practice of librarianship?
  • How do we encourage faculty teaching in LIS to develop coursework that addresses these issues?
  • For LIS students who plan to pursue academic librarianship as a career path, are tenure and promotion issues raised in their courses so that these new librarians come into their academic workplaces prepared to take on the challenges of earning tenure?

In terms of higher education:

  • If we truly value diversity in all its forms, are we doing everything we can to really show that?
  • Do we talk about valuing different leadership styles, different communication styles, or innovative ways of looking at existing practices?

Azusa: Other questions I would like to ask the readers are:

  • Why does diversity among LIS professionals matter particularly for academic libraries?
  • How is it related to many academic libraries’ vision and mission: supporting faculty and student teaching and learning?
  • Is it because diversity among librarians encourages users to approach us?
  • Is it because diversity encourages users to think outside the box, which is fundamental to researching, teaching, and learning?

Dracine: Ever practical, I would ask readers to contemplate the context of their environment and remember the difficulty we all have with engaging this topic. Discussions about diversity should be diverse. Diversity urgencies may be different from one institution to the next. With that in mind, I think it is important to consider:

  • What is the signal-to-noise ratio? A discussion about diversity could fill an ocean, and after a while it becomes white noise. However, a meaningful discussion should start by focusing on aspects that are critical and tangible to your specific community/organization.
  • Also, what are the rules of engagement? This seems like a mundane question, but it is a rather important one in terms of creating the space for real and penetrating dialogue.

Juleah: A great deal of what we’ve discussed consists of learned concepts, acquired either through reading and research or through lived experiences. Yet these concepts are complex and cannot simply be conveyed through a sound bite of information.

  • In what innovative ways can we educate and teach colleagues and students about complex issues like microaggressions, institutional racism, and privilege, drawing both on traditional means of teaching, such as lectures and readings, and on learned experiences?


  • Evaluate the culture at your organization/institution. To what degree is the issue of diversity upheld at your institution, and how does that differ from your library?
  • If your institution’s mission actively values diversity, what is the campus or community doing about it? Who are the key players and how can you partner with them?
  • From your personal experience, what are the biggest stumbling blocks in discussions pertaining to diversity? How do they affect your ability (or inability) to have a dialogue with someone whose experience differs from yours?
  • Change can occur at every level: personal, institutional, and professional. As a librarian, where do you feel most empowered to enact change? Where do you find the greatest obstacles?

Thank you to our external reviewer Frans Albarillo, internal reviewers Ellie Collier and Cecily Walker, and publishing editor Annie Pho. Your insights and guidance helped us shape and reshape, and reshape some more, our article.

Works Cited:

Basu, S. J., & Barton, A. C. (2007). Developing a sustained interest in science among urban minority youth. Journal of Research in Science Teaching, 44(3), 466–489.

Cohen, C. J. (1999). The boundaries of blackness: AIDS and the breakdown of Black politics. Chicago: University of Chicago Press.

Crenshaw, K. (1991). Mapping the margins: Intersectionality, identity politics, and violence against women of color. Stanford Law Review, 43(6), 1241-1299.

Espinal, I. (2001). A new vocabulary for inclusive librarianship: Applying whiteness theory to our profession. In L. Castillo-Speed (Ed.), The power of language/El poder de la palabra: Selected papers from the Second REFORMA National Conference (pp. 131–149). Englewood, CO: Libraries Unlimited.

Frickel, S. (2011). Who are the experts of environmental health justice? In G. Ottinger & B. R. Cohen (Eds.), Technoscience and environmental justice: expert cultures in a grassroots movement (pp. 21-40). Cambridge, Mass.: MIT Press.

Galvan, A. (2015). Soliciting performance, hiding bias: whiteness and librarianship. In the Library with the Lead Pipe. Retrieved from

Garibay, J. C. (2014). Diversity in the Classroom. Los Angeles, CA: UCLA Diversity & Faculty Development. Retrieved from

Gilmore, R. W. (2007). In the shadow of the shadow state. In Incite! Women of Color Against Violence (Ed.), The revolution will not be funded: Beyond the non-profit industrial complex (pp. 41-52). Cambridge, Mass.: South End Press.

Hill, D. (2010). Class, capital and education in this neoliberal and neoconservative period. In S. Macrine, P. McLaren, & D. Hill (Eds.), Revolutionizing pedagogy: Education for social justice within and beyond global neo-liberalism (pp. 119–144). New York: Palgrave Macmillan.

Honma, T. (2005). Trippin’ over the color line: The invisibility of race in library and information studies. InterActions: UCLA Journal of Education and Information Studies, 1(2), 1-26. Retrieved from

Institutional racism. (2014). In J. Scott (Ed.), A Dictionary of Sociology. Retrieved from

Josey, E. J. (1972). Libraries, reading, and the liberation of black people. The Library scene, 1(1), 4-7.

Lazzaro, A. E., Mills, S., Garrard, T., Ferguson, E., Watson, M., & Ellenwood, D. (2014). Cultural competency on campus: Applying ACRL’s Diversity Standards. College & Research Libraries News, 75(6), 332-335. Retrieved from

Lee, O., & Buxton, C. A. (2010). Diversity and equity in science education: Research, policy, and practice. New York: Teachers College Press.

Lyon, G. H., Jafri, J., & St. Louis, K. (2012). Beyond the pipeline: STEM pathways for youth development. Afterschool Matters, 16, 48–57.

Morales, M., Knowles, E. C., & Bourg, C. (2014). Diversity, Social Justice, and the Future of Libraries. portal: Libraries and the Academy, 14(3), 439-451. DOI: 10.1353/pla.2014.0017

Obama, M. (2015, June 9). Remarks by the First Lady at Martin Luther King Jr. Preparatory High School Commencement Address. Speech presented at Martin Luther King Jr. Preparatory High School Commencement, Chicago, IL. Retrieved from

Pateman, J. (2003). Libraries contribution to solidarity and social justice in a world of neo-liberal globalisation. Information for Social Change, 18. Retrieved from

Peterson, L. (1996). Alternative perspectives in library and information science: Issues of race. Journal of Education for Library and Information Science, 37(2), 163–174.

Pho, A., & Masland, T. (2014). The revolution will not be stereotyped: Changing perceptions through diversity. In N. Pagowsky & M. Rigby (Eds.), The librarian stereotype: Deconstructing perceptions & presentations of information work (pp. 257-282). Chicago: Association of College and Research Libraries. Retrieved from

Ridley, C., & Kelly, S. (2006). Institutional racism. In Y. Jackson (Ed.), Encyclopedia of multicultural psychology. (pp. 256-258). Thousand Oaks, CA: SAGE Publications, Inc. doi:

Solorzano, D., & Huber, L. (2012). Microaggressions, racial. In J. Banks (Ed.), Encyclopedia of diversity in education. (pp. 1489-1492). Thousand Oaks, CA: SAGE Publications, Inc. Retrieved from

Sue, D. W., Capodilupo, C. M., Torino, G. C., Bucceri, J. M., Holder, A. M. B., Nadal, K. L., & Esquilin, M. (2007). Racial microaggressions in everyday life: Implications for clinical practice. American Psychologist, 62(4), 271-286.

Swanson, J., Tanaka, A., Gonzalez-Smith, I., Damasco, I. T., Hodges, D., Honma, T., & Espinal, I. (2015, March 26). From the Individual to the Institution: Exploring the Experiences of Academic Librarians of Color [Audio recording]. Retrieved from

  1. A recorded slidecast presentation, including full audio of the ACRL 2015 panel discussion “From the Individual to the Institution: Exploring the Experiences of Academic Librarians of Color,” is freely available.
    Users who do not already have access will need to create an account in order to view this and other ACRL 2015 recorded slidecast presentations.
  2. Intersectionality is a concept developed by critical race scholar Kimberlé Crenshaw (1991) that seeks to examine the “multiple grounds of identity” that shape our social world. This theory recognizes that categories such as race, class, gender, sexuality, etc. are not mutually exclusive but, rather, are interconnected and co-constituted and therefore cannot be examined independently of each other. It also recognizes the interconnectedness of systems of oppression that shape the structural, political, and representational aspects of identity.
  3. Marginalization refers to the way in which the dominant group uses institutions, laws, ideologies, and cultural norms to disempower, control, and oppress minority groups (Cohen, 1999). Marginalization can occur in various realms, including but not limited to the political, economic, and social, and can include exclusion from decision-making processes and institutions, denial of access to resources, and segregation and stigmatization based on perceived identity.
  4. Institutional racism refers to policies, practices, or customs, sometimes intentional but more often unintentional, that prevent or exclude racial groups from equal participation in an institution (Ridley & Kelly, 2006; Dictionary of Sociology, 2014).

Walkthrough Of Geonames Recon Service / Christina Harlow

This came out of documentation I was writing up for staff here at UTK. I apologize if it is too UTK-workflow specific.

I’m currently working on migrating a lot of our non-MARC metadata collections from older platforms, which use a kind of simple Dublin Core, to MODS/XML (version 3.5; we’re currently looking at 3.6) that will be ingested into Islandora. That ‘kind of simple Dublin Core’ should be taken to mean: there were varying levels of metadata oversight over the years, and the folks creating the metadata had different interpretations of the Dublin Core schema, a well-documented and well-known issue when working with such a general, flexible schema. Yes, there are guidelines from DCMI, but for on-the-ground work, if there is no overarching metadata application profile to guide people, and nobody with some metadata expertise (or investment) to verify that descriptive (or any other type of) metadata fields are being used consistently institution-wide, it is no surprise that folks will interpret metadata fields in different ways with an eye to their own collection/context. The problem grows when metadata collections grow over time, are created with little to no documentation, and much of the metadata creation is handed off to content specialists, who might then hand it off to their student workers. If you are actually reading my thoughts right now, well, thanks, but you also probably know the situation I’m describing well.

Regardless, I’m not here to talk about why I think my job is important, but rather about a very particular but useful procedure and tool that are part of my general migration/remediation work, and which I also happen to be using and documenting right now for UTK cataloger reskilling purposes. I have been working with some of the traditional MARC catalogers on this migration process, and so far the workflow is something like this:

  1. I pull the original DC (or other) data, either from a csv file stored somewhere, or, preferably, from an existing OAI-PMH DC/XML feed for collections in (soon to be legacy) platforms. This data is stored in a GitHub repository [See note below] as the original data for both version control and “But we didn’t write this” verification purposes.
  2. A cleaned data directory is made in that GitHub repo, where I put a remediation files subdirectory. I will review the original data, see if an existing, documented mapping makes sense (unfortunately, each collection usually requires separate mapping/handling), and pull the project into OpenRefine. In OpenRefine, I’ll do a preliminary ‘mapping’ (rename columns, review the data to verify my mapping as best I can without looking at the digitized objects due to time constraints). At this point, I will also note what work needs to be done in particular for that dataset. I’ll export that OpenRefine project and put it into the GitHub repo remediation files subdirectory, and also create or update the existing wiki documentation page for that collection.
  3. At this point, I will hand off the OpenRefine project to one of the catalogers currently working on this metadata migration project. They are learning OpenRefine from scratch but doing a great job of getting the hang of both the tool and the mindset for batch metadata work. I will tell them some of the particular points they need to work on for that dataset, but also they are trained to check that the mapping holds according to the UTK master MODS data dictionary and MAP, as well as that controlled access points have appropriate terms taken from the selected vocabularies/ontologies/etc. that we use. With each collection they complete, I’m able to give them a bit more to handle with the remediation work, which has been great.
  4. Once the catalogers are done with their remediation work/data verification, I’ll take the OpenRefine project they worked on, bring it back into OpenRefine on my computer, and run some of the reconciliation services for pulling in URIs/other related information we are currently capturing in our MODS/XML. One of the catalogers is starting to run some of these recon services herself, but it is something I’m handing over slowly because there is a lot of nuance/massaging to some of these services, and the catalogers working on this project currently do so only about 1 day a week (so it takes longer to get a feel for this).
  5. I review, do some reconciliation work, and assemble the complex fields the transform needs. Then I export as simple XML, take that simple XML and use my UTK-standard OpenRefine XML to MODS/XML XSLT to generate MODS/XML, then run encoding/well-formed/MODS validation checks on that set of MODS/XML files.
  6. Then comes the re-ingest to Islandora part, but this is already beyond the scope of what I meant this post to be.
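Step 1's harvest from an OAI-PMH DC/XML feed can be sketched in JavaScript, the language used for the code elsewhere on this page. This is a minimal sketch under stated assumptions, not the script used in this workflow. It assumes Node 18+ (for the global fetch) and uses the standard OAI-PMH 2.0 ListRecords verb with the oai_dc metadata prefix. The endpoint and set names are whatever you pass in.

```javascript
// Minimal OAI-PMH 2.0 ListRecords harvester sketch (Node 18+ for global fetch).
// No real endpoint is assumed; callers supply their own base URL and set.

// Build a ListRecords request URL. resumptionToken is an "exclusive"
// argument, so metadataPrefix and set are left out when it is present.
function listRecordsUrl(baseUrl, { metadataPrefix = 'oai_dc', set, resumptionToken } = {}) {
  const url = new URL(baseUrl);
  url.searchParams.set('verb', 'ListRecords');
  if (resumptionToken) {
    url.searchParams.set('resumptionToken', resumptionToken);
  } else {
    url.searchParams.set('metadataPrefix', metadataPrefix);
    if (set) url.searchParams.set('set', set);
  }
  return url.toString();
}

// Return a non-empty resumptionToken, or null when the list is complete.
// A regex is enough for a sketch; a real harvester should use an XML parser.
function nextToken(xmlText) {
  const m = xmlText.match(/<resumptionToken[^>]*>([^<]+)<\/resumptionToken>/);
  return m ? m[1].trim() : null;
}

// Fetch every page for a set and keep the raw XML untouched,
// e.g. to commit to the repository as the "original data".
async function harvest(baseUrl, set) {
  const pages = [];
  let token = null;
  do {
    const options = token ? { resumptionToken: token } : { set };
    const res = await fetch(listRecordsUrl(baseUrl, options));
    if (!res.ok) throw new Error(`OAI-PMH request failed: ${res.status}`);
    const text = await res.text();
    pages.push(text);
    token = nextToken(text);
  } while (token);
  return pages;
}
```

For example, harvest('https://example.org/oai', 'my_collection') (both arguments hypothetical) resolves to an array of raw XML pages. These could be committed as-is before any cleanup.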

GitHub Note: I can hear someone now: ‘Git repositories and GitHub are not made for data storage!’ Yes, yes, I know, I know. It’s a cheat. But I’m putting these things under version control for my own verification purposes. I’m also using GitHub because it has a nice public interface I can point to whenever a question comes up about ‘What happened to this datapoint?’ (and those questions do come up). I’m not doing so at the moment, but I have also had really good luck using GitHub’s Issues feature for guiding/centralizing discussion about a dataset. Using GitHub has also had an unintended but helpful consequence: it shows the content specialists who create the metadata just why we need metadata version control. It also shows why metadata updates get frozen during the review, enhancement and ingest process (after which metadata edits can only happen in the platform). But, yes, GitHub was not made for this, I know. Maybe we need dataHub. Maybe there is something else I *should* be using. Holla if you know what that is.

Okay, so I’m in step 4 right now, with a dataset that was a particular pain to remediate/migrate because the folks who did the grouping/digitization pulled together a lot of different physical objects into one digital object. This is basically the digital equivalent of ‘bound-withs’. However, the cataloger who did some of the remediation did a great job with the geographic terms, among other datapoints. She found them, got them into subject_geographic, and normalized each one to an LCNAF/LCSH heading where possible. I’m about to take this and run my OpenRefine Geonames recon service against it to pull in coordinates for these geographic headings where possible. As folks seem to be interested in that recon service, I’m going to walk through that process here and now with this real-life dataset.


So here is that ready-for-step-4 dataset in LODRefine (Linked Open Data Refine, or OpenRefine with some Linked Data extensions baked in; I need to write more about that later):

Metadata in LODRefine

You can see from that portion a bit of what work is going on here. What I’m going to focus on right now is the subject_geographic column, which has multiple values per record. Records in this instance are made up of a number of rows. This helps centralize the reconciliation work, but the data will need to be changed to 1 record = 1 row before pulling it out for XML transformations. Here is the column, along with a text facet view to see the values we will be reconciling against Geonames:

LODRefine Geographic Text Facet
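The ‘records made up of a number of rows’ layout, and the later change to 1 record = 1 row, can be sketched as follows. This sketch assumes OpenRefine's records-mode convention: a non-blank value in the first (key) column starts a record, and the rows beneath it with a blank key belong to it. The column names in the example and the ' ; ' separator are illustrative choices. In OpenRefine itself, Edit cells > Join multi-valued cells handles this one column at a time, so the code only models the idea.

```javascript
// Sketch: collapse OpenRefine-style records into one row per record.
// Assumption: a non-blank key-column value starts a new record; following
// rows with a blank key belong to it. ' ; ' is an arbitrary separator.
function rowsToRecords(rows, keyColumn, separator = ' ; ') {
  const isBlank = v => v === null || v === undefined || v === '';
  const records = [];
  for (const row of rows) {
    // Start a new record at each non-blank key (or at the very first row).
    if (!isBlank(row[keyColumn]) || records.length === 0) records.push({});
    const current = records[records.length - 1];
    for (const [column, value] of Object.entries(row)) {
      if (isBlank(value)) continue; // blank cells add nothing
      (current[column] = current[column] || []).push(value);
    }
  }
  // Join each column's collected values into a single cell.
  return records.map(record =>
    Object.fromEntries(
      Object.entries(record).map(([column, values]) => [column, values.join(separator)])
    )
  );
}
```

For example, a record `rec1` spread over two rows with the headings "Knoxville (Tenn.)" and "Richmond (Va.)" comes out as one row whose subject_geographic cell reads "Knoxville (Tenn.) ; Richmond (Va.)".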

Look at those wonderfully consistent geographic terms, thanks to the cataloger’s work! But, some have LoC records and URIs, some don’t, some maybe have Geonames records (and so coordinates), some might not… so let’s go ahead and reconcile with Geonames first. To use the Geonames service, I already have a copy of the Geonames Recon Service on my computer, and I have updated my local machine’s code to have my own private Geonames API name. See more here:

I’m then going to a CLI (on my work computer, just plain old Mac Terminal),

Mac Terminal & Bash

change to the directory where I have my local Geonames recon service code stored,

Mac Terminal CD to Geonames directory

then type in the command ‘python --debug’. The Geonames endpoint should fire up on your computer now. You may get some warning notes like the ones I have below. They mean I need to update this recon service or my computer’s installed dependencies. I’m going to ignore them for the time being, since the recon service still works and, well, time is at a premium.

Terminal with Geonames flask app running

Note, during all of this, I already have LODRefine running in a separate terminal and the LODRefine GUI in my browser.

LODRefine running in terminal

Alright, with all that running, let’s hop back to the web browser window where the LODRefine GUI is running with my dataset up. I’ve already added Geonames as a reconciliation service. If you haven’t, you start the same way: go to the dropdown arrow for any column (I’m using the column I want to reconcile here, subject_geographic), then to Reconcile > Start Reconciling.

LODRefine drop-down menu to reconcile

A dialog box like this should pop up:

LODRefine reconcile dialog box

I’ve already got GeoNames Reconciliation Service added, but if you don’t, click on ‘Add Standard Service’ (in the bottom left corner), then add the localhost URL that the Geonames python flask app you started up in the Terminal before is running on (for me and most standard set ups, this will be

LODRefine add recon service dialog box with URL input

I will cancel out of that because I already have the service added. Then I click on the existing GeoNames Reconciliation Service in the ‘Reconcile column’ dialog box. If you just added the service, you should now see the same options I do:

LODRefine Geonames reconciliation options

There are a few type options to choose from:

  • geonames/name = search for the cells’ text just in the names field in a Geonames record
  • geonames/name_startWith = search for Geonames records where the label starts with the cells’ text
  • geonames/name_equals = search for an exact match between the Geonames records and the cells’ text
  • geonames/all = do a keyword search of Geonames records with our cells’ text.
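The four type options presumably translate into different parameters of the GeoNames search web service. A hypothetical mapping is sketched below. The parameter names (name, name_startsWith, name_equals, q, maxRows, username) belong to GeoNames' searchJSON endpoint, but it is an assumption that this recon service builds its queries exactly this way. Note that the dialog spells its type name_startWith, while the GeoNames parameter is name_startsWith.

```javascript
// Hypothetical mapping from recon type IDs to GeoNames searchJSON parameters.
// The real recon service may build its queries differently.
const TYPE_TO_PARAM = {
  'geonames/name': 'name',                      // place name only
  'geonames/name_startWith': 'name_startsWith', // place name starts with the text
  'geonames/name_equals': 'name_equals',        // exact place name
  'geonames/all': 'q',                          // search over all attributes
};

function geonamesSearchUrl(cellText, type, username, maxRows = 5) {
  const param = TYPE_TO_PARAM[type];
  if (!param) throw new Error(`Unknown reconciliation type: ${type}`);
  const url = new URL('http://api.geonames.org/searchJSON');
  url.searchParams.set(param, cellText);
  url.searchParams.set('maxRows', String(maxRows));
  url.searchParams.set('username', username); // your own GeoNames account name
  return url.toString();
}
```

GeoNames expects a registered username on every request. This is presumably the private ‘API name’ set up in the local copy of the recon service.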

Depending on the original data you are working with, the middle two options can return much more accurate results for your reconciliation work. However, because these are LoC-styled headings (with the mismatching of headings style with Geonames I’ve described recently in other posts as well as in the for this Geonames Recon code), I’m going to go with geonames/all. If you haven’t read those other thoughts, basically, the Geonames name for Richmond, Virginia is just ‘Richmond’, with Virginia, United States, etc. noted instead in the hierarchy portion of the record. This makes sense but makes for bad matching with LoC-styled headings. Additionally, the fact that a lot of these geographic headings refer to archaeological dig sites and not cities/towns/other geopolitical entities also means a keyword search will return better results (in that it will return results at all).
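To make the Richmond example concrete, a hypothetical pre-processing step could split an LoC-style heading into two parts: the bare name GeoNames stores in its name field, and the qualifier GeoNames instead keeps in its hierarchy. The function name and the single-parenthetical heading pattern are assumptions for illustration. Nothing like this is part of the recon service described here.

```javascript
// Hypothetical helper: split an LoC-style heading like "Richmond (Va.)"
// into a bare place name and its parenthetical qualifier.
function splitLocHeading(heading) {
  const trimmed = heading.trim();
  // Lazily capture the name, then a final "( ... )" at the end of the string.
  const m = trimmed.match(/^(.*?)\s*\(([^()]*)\)$/);
  if (!m || m[1] === '') return { name: trimmed, qualifier: null };
  return { name: m[1], qualifier: m[2] };
}
```

The bare name could then go to a name_equals search, and the qualifier could be used to check a candidate's hierarchy. Abbreviations such as ‘Va.’ would still need mapping to ‘Virginia’, though. Dig-site headings would likely still do better with keyword search.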

Sidenote: See that ‘Also use relevant details from other columns’ part? I’d love to use this for future enhancements to this recon service (maybe to refer to hierarchical elements there?). I’d also like to use it in a better names (either LCNAF or VIAF) recon service I want to work more on. Names, and personal names in particular, are a real reconciliation nightmare right now.

Alright, so I select ‘geonames/all’ and then click on the ‘Start reconciling’ button in the bottom right corner. A yellow notice should pop up saying that reconciliation is happening. Unfortunately, you can’t do more LODRefine work while that is occurring, and depending on your dataset size, it might take a while. However, one of the benefits of using this reconciliation service (versus a few other ways that exist for reconciliation against an API in LODRefine) is speed.

LODRefine note that reconciliation service is running

Once the reconciliation work is done, up should pop a few more facet boxes in LODRefine - the judgement and the best candidate’s score boxes, as well as the matches found as hyperlinked options below each cell in the column. Any cell with a value considered a high score match to a Geonames value will be associated with and hyperlinked to that Geonames value automatically.

Geonames reconciliation done in LODRefine
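Behind those facet boxes, each query gets back a list of candidates. Below is a simplified model of turning one batch response into judgments; it is not OpenRefine's actual code. The response shape follows the Reconciliation Service API: query keys map to { result: [...] }, and each candidate has id, name, score and a boolean match. The model also assumes, with some hedging, that OpenRefine auto-matches a cell only when auto-matching is enabled in the dialog and the service flags a candidate with match: true. The sample data is made up.

```javascript
// Simplified model: reconciliation batch response -> per-query judgment.
function judge(response) {
  const judgments = {};
  for (const [key, { result = [] }] of Object.entries(response)) {
    // Highest-scoring candidate first (copy so the original order is kept).
    const best = [...result].sort((a, b) => b.score - a.score)[0];
    judgments[key] = best && best.match
      ? { judgment: 'matched', match: best }
      : { judgment: 'none', candidates: result, bestScore: best ? best.score : null };
  }
  return judgments;
}

// Made-up sample: names carry " | lat, long" like this service's results;
// the IDs are placeholders, not real GeoNames URIs.
const sample = {
  q0: { result: [{ id: 'https://example.org/place/1', name: 'Placeville | 12.34, -56.78', score: 98, match: true }] },
  q1: { result: [
    { id: 'https://example.org/place/2', name: 'Dig Site A | 1.0, 2.0', score: 41, match: false },
    { id: 'https://example.org/place/3', name: 'Dig Site B | 3.0, 4.0', score: 57, match: false },
  ] },
  q2: { result: [] },
};
console.log(judge(sample));
```

Only q0 comes back ‘matched’. The other two land under ‘none’: q1 keeps its candidates for manual review, and q2 has none to offer (its best score is null).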

Before going through the matches and choosing the correct ones where needed, I recommend you change the LODRefine view from records to rows, as long as the column you are reconciling is not the first column. If you change from records to rows and then edit the first column, the record groupings may have changed by the time you go back to records view. They may then no longer be what you intended. For any other column, the groupings remain intact.

LODRefine records to rows buttons

Also, take a second to look at the Terminal again where Geonames is running. You should see a bunch of lines showing the API query URLs used for each call, as well as a 200 response when a match is found (I’m not going to show you this on my computer as each API call has my personal Geonames API name/key in it). Just cool to see this, I think.

Back to the work in LODRefine: I’m going to first facet the results on judgement: none and then unselect ‘error’ in the best candidate’s score facet box.

LODRefine geonames reconciliation facet boxes

If you’re looking at this and thinking ‘1 match? that is not really good’, well… 1. yes, there are definite further improvements needed to have Geonames and LoC-styled headings work better together, but… 2. library data has a much, much higher bar for this sort of batch work and the resultant quality/accuracy expected, so 1 auto-determined match in a set of geographic names focused on perhaps not well known archaeological sites is okay with me. Plus, the Geonames recon service is not done helping us yet.

Now you should have a list of cells with linked options below each value:

LODRefine geonames reconciliation choices within cells

What I do now is review the options, and choose the double check box for what is the correct Geonames record to reconcile against. The double check box means that what I choose for this cell value will also be applied to all other cells in LODRefine that have that same value.
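The double check box amounts to deciding once per distinct value and copying that decision to every cell with exactly the same text. Here is a minimal sketch with simplified, hypothetical cell objects (not OpenRefine's internal data model):

```javascript
// Sketch of "match this cell and all identical cells": one decision
// per distinct value, copied to every cell holding exactly that value.
function matchIdenticalCells(cells, value, candidate) {
  let changed = 0;
  for (const cell of cells) {
    if (cell.value === value) {
      cell.recon = { judgment: 'matched', match: candidate };
      changed++;
    }
  }
  return changed; // number of cells updated
}
```

Because the comparison is exact string equality, the cataloger's normalization pays off here. Without it, ‘Richmond (Va.)’ and ‘Richmond, Va.’ would be two separate decisions.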

If I’m uncertain, I can also click on any of the options, and the Geonames record for that option will show up for my review. Also, for each option you select as the correct one for that cell, those relevant cells should then disappear from the visible set due to our facet choices.

Using these functionalities, I can go through the possible matches fairly quickly: much more quickly, all other work included, than doing this matching entirely by hand. Library data is expected to meet a high quality bar. Given that, this sort of semi-automated, enhanced-manual reconciliation is really where a lot of this work will occur for many (but not all) institutions.

If, in reviewing the matches, there is no good match presented, you can choose ‘create new topic’ to pass the heading through as found, unreconciled with Geonames.

Now that I’m done with my review (which took about 5 minutes for this set), I can see that I have moved from 1 matched heading to 106 matched headings (I deselected the ‘None’ facet in the judgment box and closed the ‘best match facet’ box).

LODRefine Geonames reconciliation facet boxes after review
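These facet counts may count cells rather than distinct headings; that is an assumption here. If so, a heading used in several records is counted once per cell. That would explain how the 106 matched and 134 unmatched headings add up to more than the 98 unique subject_geographic headings in this dataset. The sketch below tallies both cells and distinct values per judgment. It uses simplified, hypothetical cell objects and treats an unreconciled cell as ‘none’.

```javascript
// Tally non-empty cells per judgment, counting both cells and distinct values.
function judgmentSummary(cells) {
  const summary = {};
  for (const cell of cells) {
    if (cell.value === null || cell.value === undefined || cell.value === '') continue;
    const judgment = cell.recon ? cell.recon.judgment : 'none';
    if (!summary[judgment]) summary[judgment] = { cells: 0, values: new Set() };
    summary[judgment].cells++;
    summary[judgment].values.add(cell.value);
  }
  // Replace each Set with its size for easy reading.
  return Object.fromEntries(
    Object.entries(summary).map(([judgment, s]) => [judgment, { cells: s.cells, distinctValues: s.values.size }])
  );
}
```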

However, there are still 134 headings that were matched to nothing in Geonames. I click on the ‘none’ facet in the judgment box and leave the subject_geographic column text facet box up. That way I can quickly peruse what didn’t find a match and check on headings that seem like they should have had a match in Geonames. For this dataset, though, I see a lot of these are archaeological dig sites, which probably aren’t in Geonames, so the service has worked fairly well so far. This review is also how you’ll find typos or other errors, as well as any historical changes in names that may have occurred. For the facet values that I do find in Geonames, I click to edit the facet value and add the coordinates, which is what I currently pull from Geonames (we opt to choose the LoC URI for these headings at present, but this is under debate).

Note: datasets with more standard geographic names (cities, states, etc) will have much better results doing the above described work. However, I want to show here a real life example of something I want to pull in coordinates from Geonames for, like archaeological or historical sites.

LODRefine post-Geonames Reconciliation Text Facet for unmatched values

I end up adding coordinates for 5 values that weren’t matched to Geonames, either because of typos or because the site is on the border of 2 states (a situation LoC and Geonames handle differently). I also fixed 7 typos in this review.

Now that I’m done reconciling, I want to capture the Geonames coordinates in my final value. First I close all the open facet boxes in LODRefine. Then, on the subject_geographic column, I click on the column header triangle/arrow and choose Edit Cells > Transform.

LODRefine Column Text Transform Selection

In the Custom text transform on column subject_geographic box that appears, in the Expression text area, I will put in the following:

if(isNonBlank(cell.recon.match.id), value + substring(cell.recon.match.name, indexOf(cell.recon.match.name, " | ")), value)

Let’s break this out a bit:

  • value = the cell’s original value that was then matched against Geonames.
  • cell.recon.match.name = the name (and coordinates, because we’re using the Geonames recon service I cobbled together) of the value we chose as a match in the reconciliation process.
  • cell.recon.match.id = the URI for that matched value from the reconciliation process.
  • Why isn’t there cell.recon.match.coords? Yes, I tried that, but it involves hacking the core OpenRefine recon service backend more than I’m willing to do right now.
  • if(test, do this, otherwise do that) = not all of the cells had a match in Geonames, and I don’t want to change those unmatched cells. The if statement says “if there is a reconciliation match, then pull in that custom bit, otherwise leave the cell value as is.”
  • substring(cell.recon.match.name, indexOf(cell.recon.match.name, " | ")) = I just want to pull the part of the match name from the pipe onward, namely the coordinates. I am leaving the name values as is because they are currently matched against LoC for our metadata.
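For readers more at home in JavaScript than GREL, here is a model of what the transform does to one cell. Unmatched cells stay as they are. For matched cells, the part of the match name from ‘ | ’ onward (the coordinates) is appended to the original value. The cell shape is a simplified stand-in for OpenRefine's cell.recon.match fields, and the sample coordinates are made up. The guard for a match name without ‘ | ’ is an addition of this sketch.

```javascript
// Model of the subject_geographic transform for one cell (not GREL itself).
function appendCoordinates(cell) {
  const match = cell.recon && cell.recon.match;
  // Unmatched cells (no recon data, or no chosen match) keep their value.
  if (!match || !match.id) return cell.value;
  const i = match.name.indexOf(' | ');
  // Guard added in this sketch: no " | " means no coordinates to append.
  if (i === -1) return cell.value;
  // Append everything from " | " onward, i.e. " | lat, long".
  return cell.value + match.name.slice(i);
}

const matched = {
  value: 'Placeville (Tenn.)',
  recon: { match: { id: 'https://example.org/place/1', name: 'Placeville | 12.34, -56.78' } },
};
console.log(appendCoordinates(matched)); // Placeville (Tenn.) | 12.34, -56.78
console.log(appendCoordinates({ value: 'Dig Site A', recon: { match: null } })); // Dig Site A
```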

Why do we need to run this transform? Because although we have done reconciliation work in LODRefine, if I was to pull this data out now (say export as CSV), the reconciliation data would not come with it. LODRefine is still storing the original cell values in the cells, with the reconciliation data laid over top of it. This transform will change the underlying cell values to the reconciled values I want, where applicable.

After running the transform, you can remove the reconciliation data to see exactly what the underlying values now look like. And remember there is always the Undo tab in LODRefine if you need to go back.

LODRefine clear reconciliation data menu option

What our cell values look like now:

LODRefine data post-reconciliation

And, ta-da! Hooray! At this point, I can shut down the Geonames reconciliation python flask app. I go to the Terminal window it is running in and press Ctrl + C. Back in LODRefine, remember to change back from rows view to records view (links to do this are in the top left corner).

Thoughts on this process

Some may think this process seems a bit extreme for pulling in just coordinates. However…

  1. Remember that this only seems extreme the first few times, or when you are writing up documentation explaining it (especially if you are as verbose as I am). In practice, this takes me maybe 20 minutes at most for a dataset of this size (73 complex MODS records with 98 unique subject_geographic headings). It gets faster and easier. It is definitely more efficient than completely manual updates, and it remains far more accurate than completely automated reconciliation options.
  2. If so moved, I could pull in Geonames URIs as part of this work, which would be even better. However, because of how we handle our MODS at present, we don’t. But the retrieval of URIs and other identifiers for such datapoints is a key benefit.
  3. For datasets larger than 100 quasi-complex records, this is really the only way to go at present and considering our workflows. I don’t want to give these datasets to the catalogers and ask them to add coordinates, URIs, or other reconciled information because they need to focus on the batch work on this process - checking the mappings, getting values in appropriate formats or encodings - and not manually searching each controlled access point in a record or row then copy and pasting that information from some authority source. But this is all a balancing act.
  4. This process also has the added benefit of making typos and other such mistakes very apparent. Unfortunately, I’m not as attentive to them in my quick blog ramblings.

Hope this is helpful for others.

Link roundup July 28, 2015 / Harvard Library Innovation Lab

I see a theme here — computers are entertainers, directors, performers.

A Sort of Joy

Editor by NYTLabs

GIFs of Japanese Life

The Next Wave

Islandora Community Stories / Islandora

During the Islandora Conference, we hope to collect stories from community members about how and why they got started with Islandora. Alex Kent from the Conference Planning Team will arrange casual in-person video interviews with those who are willing; each takes about 10-15 minutes of your time. We'll be using iPhones/iPads to record the interviews. To participate, seek out Alex at the conference or drop an email to to set up a time. 

After the conference we'll compile the interviews and make them available on the Islandora site as a way to highlight community members and Islandora's value.

If you do not wish to be interviewed in person, you can take the survey online here.

Those who participate at the conference will be entered in a drawing to win one of five Islandora Tuque Tuques.

A useful function for making querySelectorAll() more like jQuery / LibUX

Through LibUX I try to evangelize the importance of speed — or the perception of speed — to the net value of the user experience. People care.

Of the many tweaks we can make to improve web performance, we might try to wean our code from JavaScript libraries where they’re unnecessary. Doing so removes bloat in a couple of ways. First, it literally reduces the number of bytes required to render or add functionality to a site or app. Second, and more importantly, scripts just process faster in the browser if they have fewer methods to refer to.

As I write this I am weaning myself from jQuery. Newer utilities like querySelector (MDN) do the trick with jQuery-like selector syntax. Even so, they’re not quite the Coca-Cola mouth-watering sweetness of $( selector ).doSomething().

The difference between document.querySelectorAll( '.pie' ) (MDN) and $( '.pie' ) lies in the object the former returns. It is an array-like-but-not-an-array NodeList that doesn’t give you immediate access to manipulate each instance of that element in the document. With jQuery, to add cream to every slice of pie you might write

$( '.pie' ).addClass( 'cream' );

The no-jQuery way requires that you deal with the NodeList yourself. This example is only three additional lines — but it’s enough to make me whine a little.

var pie = document.querySelectorAll( '.pie' );

for ( var i = 0; i < pie.length; i++ ) {
  pie[i].classList.add( 'cream' );
}

A useful helper function

The following wrapper allows for use of a jQuery-like dollar-sign selector that lets you iterate through these elements as a simple array: $$( selector ).forEach( function( el ) { doSomething(); } ). I have adopted this from seeing its use in some of Lea Verou’s projects.

function $$(selector, context) {
    context = context || document;
    var elements = context.querySelectorAll(selector);
    return Array.prototype.slice.call(elements);
}

The array-like NodeList is turned into a regular array with Array.prototype.slice.call( elements ) (MDN). This can add convenience and otherwise mitigate some of the withdrawal we in Generation jQuery feel when iterating through the DOM.

$$( '.pie' ).forEach( function( pie ) {
  pie.classList.add( 'cream' );
});
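Array.prototype.slice.call is the classic way to copy a NodeList into an array. Engines with ES2015 support also offer Array.from, which does the same conversion more readably. The variant below is a sketch, not Lea Verou's original: it prefers Array.from and falls back to slice for older browsers.

```javascript
// Variant of the $$ helper: prefer Array.from (ES2015) and fall back to
// Array.prototype.slice.call where Array.from is missing.
function $$(selector, context) {
  context = context || document;
  var elements = context.querySelectorAll(selector);
  return Array.from ? Array.from(elements) : Array.prototype.slice.call(elements);
}
```

Usage is unchanged, e.g. $$( '.pie' ).forEach( ... ). You can also pass a context, meaning any element or object with querySelectorAll. That lets the helper run against a stub outside the browser, which is how it can be tested. Modern browsers also give NodeList its own forEach, though not map or filter.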

The post A useful function for making querySelectorAll() more like jQuery appeared first on LibUX.

Updated Westlake footnote / William Denton

I updated the list of fictional footnotes with more information on Don’t Ask (1993) by Donald E. Westlake, which I just read (it’s a Dortmunder):

Two chapter headings have footnotes that identify them as “Optional—historical aside—not for credit.” Chapter six mentions a street with “a whole block of taxpayers.” This is footnoted: “A temporary structure, commonly one story in height and containing shops of the most ephemeral sort. Constructed by owners of the land when a delay is anticipated, sometimes of several decades’ duration, between the razing of the previous unwanted edifice and the erection of the new blight on the landscape. Called a ‘taxpayer’ because that’s what it does.+” The second footnote, indented under the first, says, “Didn’t expect a footnote in a novel, did you? And a real informative one, too. Pays to keep on your toes.”

The t.p. verso of my 1994 Mysterious Press paperback edition has this:

Enjoy lively book discussion online with CompuServe. To become a member of CompuServe call 1-800-848-8199 and ask for the Time Warner Trade Publishing forum. (Current members: GO:TWEP.)

I called the number but got a fast busy.

How to Participate in the September 2015 NDSA New England Regional Meeting / Library of Congress: The Signal

The following is a guest post by Kevin Powell, digital preservation librarian at Brown University.

Credit: Zachary Painter

On September 25th, UMass Dartmouth will host the National Digital Stewardship Alliance New England Regional Meeting with Brown University. We enthusiastically encourage librarians, archivists, preservation specialists, knowledge managers, and anyone else with an interest in digital stewardship and preservation to join us. There are a number of ways to participate this year:

Give a presentation!

We are currently accepting proposals for 10-15 minute presentations. These can range from project reports and collaboration proposals to tool demonstrations and formal research. Share what you’re working on at your organization with your local colleagues! If you have something to share regarding digital stewardship, we’d like to hear about it. Proposals are due August 7th.

Submit a discussion topic!

When registration opens August 14th, registered attendees will have the opportunity to suggest a topic for the “unconference.” The key idea is that participants define the topics, and everyone who attends participates in the discussions. Attendees will vote on the suggested topics during our free, catered lunch on the day of the meeting. After the second session of presentations, we will announce a line-up of discussions based on everyone’s votes.

Give a lightning talk on an unconference discussion topic!

We set 45 minutes aside at the end of our meeting for 5-minute, informal lightning talks related to the unconference discussions. If your group made some insightful observations or generated helpful documentation, we invite one person from that group to briefly share with everybody.


New England was the very first region to host a regional NDSA meeting in 2013, and this year marks the third annual meeting. There is a strong, vibrant community of professionals interested in digital stewardship and working on a diverse set of projects. We want to collaborate and learn with you! Plus, there are a few beaches nearby, and we will finish with plenty of daylight to spare! Registration opens August 14th.

We hope to see you there!

New ORCID Integrations / Brown University Library Digital Technologies Projects

  •  MIT Libraries have created an ORCID integration that allows their faculty to link an existing ORCID iD to their MIT profile or create a new ORCID record, which then populates the ORCID record with information about their employment at MIT
  • University of Pittsburgh is generating ORCID records for their researchers and adding their University of Pittsburgh affiliation


Code4Lib Northern California: Stanford, CA / Code4Lib

Stanford University Libraries will host a Code4Lib Northern California regional meeting on Tuesday, August 4, 2015 in the Lathrop Library on Stanford's campus. We'll have a morning session with lightning talks about various topics and an afternoon of smaller breakout and working sessions.

You can get more details about the event and register on the Code4Lib NorCal wiki page.

What is actually happening out there in terms of institutional data repositories? / HangingTogether

There is an awful lot of talk about academic libraries providing data curation services for their researchers.  It turns out that in most cases that service amounts to training and advice, but not actual data management services.  However, institutions without data repositories are likely thinking about implementing one.  We thought it would be helpful to hear from those few who have implemented data repositories.  [If you are one of those pioneers and did not get a chance to fill out the survey, feel free to describe your repository program as a comment to this post.]

OCLC Research conducted an unscientific survey about data repositories from 5/19/2015 to 7/16/2015. Initially the survey was sent to twelve institutions that were believed to have a data repository. They were asked to identify other institutions with data repositories.   In total, 31 institutions were invited to take the survey. 22 filled out the survey and two of those indicated that they do not have a data repository. The following summarizes the twenty responses from institutions with data repositories.

TECHNICAL DETAILS. Eight of the institutions run a stand-alone data repository and twelve have a combination institutional repository and data repository. Six of the sites run DSpace, six run Hydra/Fedora systems, 4 have locally developed systems, and there are one each running Rosetta, Dataverse, SobekCM, and HUBzero.
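A quick arithmetic check shows that both breakdowns above account for all twenty responding institutions (22 responses minus the two without a data repository). The tally uses only numbers reported in this post:

```javascript
// Both breakdowns should cover all 20 institutions with data repositories.
const respondents = 22 - 2; // responses minus those without a data repository
const platforms = {
  DSpace: 6,
  'Hydra/Fedora': 6,
  'Locally developed': 4,
  Rosetta: 1,
  Dataverse: 1,
  SobekCM: 1,
  HUBzero: 1,
};
const setups = { standAlone: 8, combinedWithIR: 12 };

const sum = obj => Object.values(obj).reduce((total, n) => total + n, 0);
console.log(respondents, sum(platforms), sum(setups)); // 20 20 20
```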

PRESERVATION. All but one provide integrity checks. Seventeen keep offsite backup copies. Twelve provide format migration. Ten put master files in a dark archive. Two volunteered that they provide DOI generation.

SERVICES. Three institutions reported that they accept deposits from researchers not associated with their institution. One is part of a consortial arrangement and one is part of a network. Seven have their data or metadata harvested by other data repositories. When researchers deposit their data in an external repository, ten will include the datasets in their own repository and one includes just the metadata in their repository. All of them provide public access to data. Fourteen restrict or limit access when appropriate.

FUNDING. When asked about funding sources, eighteen reported that the library’s base budget covered at least some of the expenses. Seven said that was their only source of funding. Seven reported getting fees from researchers and four reported getting fees from departments. Five get institutional funding specifically for data management. Four get money from the IT budget. Only one institution reported getting direct funds from grant-funded projects and only one reported getting indirect funds from grant-funded projects. None reported getting fees from users, having an endowment, or having had  grant funding to develop the repository.

While technical, preservation, and service issues can be challenging, I suspect that for some time the funding issues will be the most inhibiting to provision of this important service in support of the university research mission.

[Many thanks to Amanda Rinehart, Data Management Services Librarian at The Ohio State University, for help with the creation of the survey]

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.

Creating campus-wide technology partnerships: Mission impossible? / LITA

Libraries have undergone significant changes in the last five years, shifting from repositories to learning spaces, from places to experiences. Much of this is due to our growing relationships with our IT, instructional technology, and research colleagues as the lines between technology and library-related work become continually more blurred.

But it’s not always easy to establish these types of partnerships, especially if there haven’t been any connections to build on. So how can you approach outreach to your IT campus departments and individuals?

There are typically two types of partnerships that you can initiate:

1. There is a program already established, and you would like the library to be involved where it wasn’t involved before

2. You are proposing something completely new

All you have to do is convince the coordinator or director of the project or department that having the library become a part of that initiative is a good thing, especially if they don’t think you have anything to offer. Easier said than done, right? But what happens if that person is not responding to your painstakingly crafted email? If the person is a director or chair, chances are they have an assistant who is much more willing to communicate with you and can often make headway where you can’t.

Ask if you can attend a departmental meeting or if they can help you set up a meeting with the person who can help things move forward. Picking up the phone doesn’t hurt either. If someone is in their office, they might, just might, be inclined to talk with you. Otherwise they may keep ignoring the email you sent them days ago, which is by now buried under an avalanche of other emails and will be duly ignored.

Always try to send an agenda ahead of time so they know what you’re thinking. That additional time might just be the thing they need to be able to consider your ideas instead of having to come up with something on the spot. Plus, if you’re nervous, the agenda will serve as your discussion blueprint and can prevent you from rambling or going off on tangents. Remember, the person in front of you has many other things to think about, and like it or not, you have to make good use of their time!

After the meeting, along with your thank-you, be sure to remind them of the action items that were discussed. That way, when you contact others within the department to move forward with your initiative, they are not wondering what’s going on and why you’re bugging them. Also, asking who might be the best person to help with whatever action items you identify will help you avoid pestering the director later. There’s nothing worse than getting the green light and then having to backtrack or delay because you forgot to ask who to work with! From there on out, your priority is creating a system for communicating regularly with all those involved in moving forward. Make sure everyone who needs to be at the table receives an invitation and understands why they are there. Clarify who is in charge and what the expectations of the work are. Assume that they know nothing and that the only thing their supervisor or colleague has said is that they will be working with the library on a project.

You might also have to think outside the proverbial IT box when it comes to building partnerships. For example, creating a new Makerspace might not start with IT, but rather with a department that is interested in incorporating it into its curriculum. Of course IT will become part of the equation at some point, but that unit might not be the best way to approach creating this type of space. An academic department might also be willing to help split the cost because its students are getting the benefits.

Finally, IT nowadays comes in many forms, and even if you once thought the campus supercomputing center had nothing to do with your work, finding out exactly what its mission is and what it does could come in handy. For example, you might discover that it can provide storage for large data sets and could use some help spreading the word to faculty about this. Bingo! You’ve just identified an opportunity for those in the library who are involved in this type of work to collaborate on a shared communication plan, in which you introduce what the library is doing to help faculty with their data management plans and the center offers to store that same data.

Bottom line, technology partnerships are vital if libraries are going to expand their reach and become even more integrated into the academic fabric of their institutions. But making those connections isn’t always easy, especially because some units might not see the immediate benefits of such collaborations. Getting to the table is often the hardest step in the process, but keeping these simple things in mind will (hopefully) smooth the way:

1. Look at all possible partners, not just the obvious IT connections

2. Be willing to try different modes of outreach if your preferred method isn’t having success

3. Be prepared to demonstrate what the library can bring to the table and follow through

Coming to Hydra Connect 2015? We need a poster from you! / Hydra Project

If you’re coming to Hydra Connect 2015, please read on carefully.  Planning to come but not booked yet?  Tickets are selling steadily and we have a limit of 200 places – don’t leave your booking too late!  Full details at Hydra Connect 2015 including lists of the workshops, plenary and parallel track sessions.  Taking account of people who have agreed to speak but have not yet registered, almost half the tickets have gone.

We are devoting more than two hours of Hydra Connect 2015 to a “poster show and tell” slot.  At previous Connects this proved to be a highly successful session and we were urged to repeat it.  The event is an “open room format”:  each contributor is given a poster display space and table around the outside of a large room.  Attendees are encouraged to circulate and to discuss the work represented in each poster with the author(s).  We encourage authors to time-share with a colleague where possible so that everyone gets the chance to circulate.

We are asking all institutions represented at Hydra Connect to create a poster (or more) about any significant Hydra-related work they are doing (or planning) and to arrive in Minneapolis prepared to talk about it in the poster show and tell. We know how awkward it can be to bring posters on a plane, so we are making arrangements for you to email them for printing locally.

Posters should be sent in either PowerPoint or PDF format. They should be at either half scale or full scale to prevent issues with images when scaling. The maximum size is 24″ x 36″ (tall or wide) or 36″ x 48″ (wide only). Anything smaller than that can also be printed. Costs for printing and foam mounting these sizes are approximately $90 and $135 respectively. We will send details explaining how to use this local service in due course; in the meantime, please get designing!

Peter Binkley, Matt Critchlow, Karen Estlund, Erin Fahy, Anna Headley (Program Committee)

Mark Bussey (Conference host)

Urgent: Help Rebury “Zombie” Cybersecurity Bill / District Dispatch


Help Rebury CISA Now! (Credit: 22860, Flickr)

It’s back to the “barricades” for librarians and our many civil liberties coalition allies. Just over a year ago, District Dispatch sounded the alarm about the return of privacy-hostile “cybersecurity” or “information sharing” legislation. Again dubbed a “zombie” for its ability to rise from the legislative dead, the current version of the bill (S. 754) goes by the innocuous name of the “Cybersecurity Information Sharing Act” . . . but “CISA” is anything but. As detailed below, not only won’t it be effective as advertised in thwarting cyber-attacks, but it de facto grants broad new mass data collection powers to many federal, as well as state and even local, government agencies!

CISA was approved in a secret session last March by the Senate Intelligence Committee. In April, ALA and more than 50 other organizations, leading cybersecurity experts and academics called on Congress to fix its many flaws in a detailed letter. Since then, S. 754 hasn’t had a single public hearing in this Congress. Nonetheless, Senate Majority Leader Mitch McConnell (R-KY) is pushing for a vote on S. 754 by the full Senate right now, before the Senate breaks for its summer recess in a matter of days. Sadly, unless we can stop it, this dangerously and heavily flawed bill looks to be headed for passage even if not amended at all.

CISA is touted by its supporters as a means of preventing future large-scale data breaches like the massive one just suffered by the federal government’s Office of Personnel Management, but leading security experts argue that it actually won’t do much, if anything, to prevent such incursions . . . and many worry that it could make things worse. As detailed by our compatriots at New America’s Open Technology Institute and the Center for Democracy and Technology, what it will do is create incentives for private companies and the government to widely share huge amounts of Americans’ personally identifiable information that will itself then be vulnerable to sophisticated hacking attacks. In the process, the bill also creates massive exemptions from liability for private companies under every major consumer privacy protection law now on the books.

Your collected personal information would be shared instantly under the bill among many federal agencies including the Office of the Director of National Intelligence, the Department of Defense, NSA and the Department of Justice. Worse yet, it also would be shared with garden variety law enforcement entities at every level of government. None of them would be required to adequately restrict how long they can retain that personal information, or limit what kinds of non-cyber offenses the information acquired could be used to prosecute. If enacted, that would be a sweeping “end run” on the Fourth Amendment and, in effect, make CISA a broad new surveillance bill.

CISA also allows both the government and private companies to take rapid unilateral “countermeasures” to retaliate against perceived threats, which may disable or disrupt computer networks believed to be the source of a cyber-attack, including, for example, a library system’s or a municipal government’s.

With all of its defects and dangers, it’s no wonder that CISA’s been labelled a “zombie!” Now, it’s time for librarians to rise again, too . . . to the challenge of once more stopping CISA in its tracks. This time around, in addition to just calling on the President to threaten to veto CISA as he has in the past, ALA has partnered with more than a dozen other national groups to do it in a way so old it’s novel again: sending Senate offices thousands . . . of faxes.

Courtesy of our friends at, you can join this retro campaign to protect the future of your privacy by delivering a brief, pre-written message online with just a single mouse click at now! (If you prefer, you’ll also have the option of writing your own message.)

Together we can stop CISA one more time, but votes could happen anytime now. Please act today!

 Additional Information and Resources


American Civil Liberties Union

Center for Democracy and Technology

New America’s Open Technology Institute

The post Urgent: Help Rebury “Zombie” Cybersecurity Bill appeared first on District Dispatch.

NOW AVAILABLE: Lower per Terabyte Cost for Additional ArchivesDirect Storage / DuraSpace News

Winchester, MA: There will always be more, not less, data. That fact makes it likely that you will need more archival storage space than you originally planned for. Rapid, on-the-fly collection development, unexpected gifts of digital materials, and rich media often require additional storage. ArchivesDirect has lowered the per-terabyte cost of additional storage, making the service more cost-effective for organizations and institutions that need to ensure their digital footprint is safe and accessible for future generations.

Telling VIVO Stories at Colorado University Boulder with Liz Tomich / DuraSpace News

“Telling VIVO Stories” is a community-led initiative aimed at introducing project leaders and their ideas to one another while providing VIVO implementation details for the VIVO community and beyond. The following interview includes personal observations that may not represent the opinions and views of Colorado University Boulder or the VIVO Project.

Julia Trimmer, Duke University, talked with Liz Tomich at Colorado University Boulder to learn about their VIVO story.

skos-history: New method for change tracking applied to STW Thesaurus for Economics / ZBW German National Library of Economics

“What’s new?” and “What has changed?” are questions users of Knowledge Organization Systems (KOS), such as thesauri or classifications, ask when a new version is published. All the more so when a thesaurus that has existed since the 1990s has been completely revised, subject area by subject area. After four interim versions published in as many consecutive years, ZBW's STW Thesaurus for Economics has recently been relaunched as version 9.0. In total, 777 descriptors have been added; 1,052 (of about 6,000) have been deprecated, the vast majority of them merged into others. More subtle changes include modified preferred labels, as well as merges and splits of existing concepts.

Since STW was first published on the web in 2009, we have gone to great lengths to make changes traceable: no concept and no web page has been deleted, and everything from prior versions is still available. Following a presentation at DC-2013 in Lisbon, I started the skos-history project, which aims to exploit the published SKOS files of different versions for change tracking. A first beta implementation of Linked-Data-based change reports went live with STW 8.14, making use of SPARQL "live queries" (as described in a prior post). With the publication of STW 9.0, full reports of the changes are available. How do they work?


The basic idea is to exploit the power of SPARQL on named graphs of different versions of the thesaurus. After loading these versions into a "version store", we can compute deltas (version differences) and save them as named graphs, too. A combination of the dataset versioning ontology (dsv:) by Johan De Smedt, the skos-history ontology (sh:), SPARQL service description (sd:) and VoiD (void:) provides the necessary plumbing in a separate version history graph:

 skos-history example graphs

With that in place, we can query the version store, for example for the concepts added between two versions, like this:

```sparql
# Identify concepts inserted with a certain version
# (prefix declarations for sh:, sd:, dc:, dct: and skos: omitted, as in the original)
SELECT DISTINCT ?concept ?prefLabel
WHERE {
  # query the version history graph to get a delta and via that the relevant graphs
  # (bound to a variable here, since the graph's IRI is not given in this post)
  GRAPH ?versionHistoryGraph {
    ?delta a sh:SchemeDelta ;
      sh:deltaFrom/dc:identifier "8.14" ;
      sh:deltaTo/dc:identifier "9.0" ;
      sh:deltaFrom/sh:usingNamedGraph/sd:name ?oldVersionGraph ;
      dct:hasPart ?insertions .
    ?insertions a sh:SchemeDeltaInsertions ;
      sh:usingNamedGraph/sd:name ?insertionsGraph .
  }
  # for each inserted concept, a newly inserted prefLabel must exist ...
  GRAPH ?insertionsGraph {
    ?concept skos:prefLabel ?prefLabel
  }
  # ... and the concept must not exist in the old version
  FILTER NOT EXISTS {
    GRAPH ?oldVersionGraph {
      ?concept ?p []
    }
  }
}
```
The resulting report, cached for better performance and availability, can be found in the change reports section of the STW site, together with reports on deprecation/replacement of concepts, changed preferred labels, hierarchy changes, and merges and splits of concepts (descriptors as well as the higher-level subject categories of STW). The queries used to create the reports are available on GitHub and linked from the report pages.
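For readers less used to SPARQL, the underlying idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the skos-history code: each version is modeled as a plain set of (subject, predicate, object) triples with hypothetical `ex:` identifiers, a delta is simply the pair of set differences, and the "inserted concepts" check mirrors the query's rule that a new prefLabel only counts if the concept does not occur in the old version at all.

```python
# Minimal sketch: each version is a set of triples. The ex: identifiers and
# labels are hypothetical; prefixed names are kept as plain strings.
Triple = tuple[str, str, str]

OLD_VERSION: set[Triple] = {
    ("ex:c1", "rdf:type", "skos:Concept"),
    ("ex:c1", "skos:prefLabel", "Labour market"),
}
NEW_VERSION: set[Triple] = {
    ("ex:c1", "rdf:type", "skos:Concept"),
    ("ex:c1", "skos:prefLabel", "Labor market"),    # changed label, same concept
    ("ex:c2", "rdf:type", "skos:Concept"),          # brand-new concept
    ("ex:c2", "skos:prefLabel", "Sharing economy"),
}


def compute_delta(old: set[Triple], new: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Return (insertions, deletions): triples only in `new`, triples only in `old`."""
    return new - old, old - new


def inserted_concepts(old: set[Triple], insertions: set[Triple]) -> set[tuple[str, str]]:
    """Concepts with a newly inserted prefLabel that never occur as a subject
    in `old` (the 'must not exist in the old version' condition)."""
    old_subjects = {s for s, _, _ in old}
    return {
        (s, o)
        for s, p, o in insertions
        if p == "skos:prefLabel" and s not in old_subjects
    }


insertions, deletions = compute_delta(OLD_VERSION, NEW_VERSION)
print(sorted(deletions))
print(sorted(inserted_concepts(OLD_VERSION, insertions)))
```

The relabeled `ex:c1` shows up among the insertions, but it is not reported as a new concept, which is why the query needs its second condition and not just the insertions graph.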

The methodology allows for aggregating changes over multiple versions and levels of the hierarchy of a concept scheme. That enabled us to gather information for the complete overhaul of STW, and to visualize it in change graphics:

STW relaunch: Business economics
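Aggregating over several versions and rolling changes up the hierarchy can likewise be sketched in Python. Again, this is a hypothetical illustration, not how ZBW computes its change graphics: each version is reduced to the set of its active (non-deprecated) concept IDs, `BROADER` is an assumed single-parent mapping from each concept to its parent, and only the net change between the first and the last version is counted.

```python
from collections import Counter

# Hypothetical single-parent hierarchy: concept -> broader concept or category.
BROADER = {
    "ex:c1": "ex:catA",
    "ex:c2": "ex:c1",
    "ex:c3": "ex:catB",
}

# Hypothetical versions, oldest first, each reduced to its active concept IDs.
VERSIONS = [
    {"ex:c1", "ex:c3"},
    {"ex:c1", "ex:c2", "ex:c3"},
    {"ex:c1", "ex:c2"},
]


def top_category(concept: str, broader: dict[str, str]) -> str:
    """Walk up the broader chain to the topmost ancestor (stops on cycles)."""
    seen = set()
    while concept in broader and concept not in seen:
        seen.add(concept)
        concept = broader[concept]
    return concept


def net_changes_by_category(versions: list[set[str]], broader: dict[str, str]) -> Counter:
    """Count concepts added and deprecated between the first and the last
    version, rolled up to their top-level category."""
    first, last = versions[0], versions[-1]
    counts: Counter = Counter()
    for concept in last - first:
        counts[(top_category(concept, broader), "added")] += 1
    for concept in first - last:
        counts[(top_category(concept, broader), "deprecated")] += 1
    return counts


print(net_changes_by_category(VERSIONS, BROADER))
```

Comparing only the endpoints keeps the sketch short, but a concept added and then deprecated within the window would not appear; counting each consecutive delta separately would be the alternative.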

The method applied here to STW is in no way specific to it. It does not rely on transaction logging of the internal thesaurus management system, nor on any other out-of-band knowledge, but solely on the published SKOS files. Thus, it can be applied to other knowledge organization systems, by their publishers as well as by interested users. Experiments with TheSoz, Agrovoc and the Finnish YSO have been conducted already; example endpoints with multiple versions of these vocabularies (and of STW, of course) are provided by ZBW Labs.

At the Finnish National Library, as well as at the FAO, efforts are under way to explore the applicability of skos-history to their thesauri and maintenance workflows. In the context of STW, the change reports are mostly optimized for human consumption. We hope to learn more about how people use them in automatic or semi-automatic processes: for example, to update changed preferred labels in systems working with prior versions of STW, to review indexed titles attached to split-up concepts, or to transfer changes to derived or mapped vocabularies. If you want to experiment, please fork the project on GitHub. Contributions in the issue queue as well as pull requests are highly welcome.

More detailed information can be found in a paper (Leveraging SKOS to trace the overhaul of the STW Thesaurus for Economics), which will be presented at DC-2015 in Sao Paulo.



Big list of ASINs (ASIN) / Open Library Data Additions

A list of ASINs (Amazon product identifiers) generated by extracting ASIN-shaped strings from the list of pages crawled by the Wayback Machine.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata
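The description does not say exactly what counts as "ASIN-shaped". A minimal sketch, assuming an ASIN is any run of exactly 10 uppercase letters and digits that follows `/dp/` or `/gp/product/` in an Amazon-style URL, could look like this; the pattern and the sample URLs are illustrative assumptions, not the method used to build this item.

```python
import re

# Assumed heuristic: exactly 10 uppercase letters/digits following /dp/ or
# /gp/product/, ending at a path, query, or fragment boundary (or end of URL).
ASIN_IN_URL = re.compile(r"/(?:dp|gp/product)/([0-9A-Z]{10})(?=[/?#]|$)")


def extract_asins(urls):
    """Return the sorted, de-duplicated ASIN-shaped strings found in `urls`."""
    found = set()
    for url in urls:
        for match in ASIN_IN_URL.finditer(url):
            found.add(match.group(1))
    return sorted(found)


sample_urls = [  # made-up example URLs
    "http://www.amazon.com/Some-Title/dp/0140449132/ref=sr_1_1",
    "http://www.amazon.com/gp/product/B00TEST123?tag=example",
    "http://www.amazon.com/dp/B00TEST123",
    "http://www.amazon.com/dp/tooshort",
]
print(extract_asins(sample_urls))  # ['0140449132', 'B00TEST123']
```

Anchoring on the URL path keeps false positives down; a looser pattern matching any 10-character token would also pick up unrelated strings in crawled URLs.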

Library Privacy and the Freedom Not To Read / Eric Hellman

One of the most difficult privacy conundrums facing libraries today is how to deal with the data that their patrons generate in the course of using digital services. Commercial information services typically track usage in detail, keep the data indefinitely, and regard the data as a valuable asset. Data is used to make many improvements, often to personalize the service to best meet the needs of the user. User data can also be monetized; as I've written here before, many companies make money by providing web services in exchange for the opportunity to track users and help advertisers target them.

A Maginot Line fortification. Photo from the US Army.
The downside to data collection is its impact on user privacy, something that libraries have a history of defending, even at the risk of imprisonment. Since the Patriot Act, many librarians have believed that the best way to defend user privacy against legally sanctioned intrusion is to avoid collecting any sensitive data. But as libraries move onto the web, that defense seems more and more like a Maginot Line, impregnable, but easy to get around. (I've written about an effort to shore up some weak points in library privacy defenses.)

At the same time, "big data" has clouded the picture of what constitutes sensitive data. The correlation of digital library use with web activity outside the library can impact privacy in ways that never would occur in a physical library. For example, I've found that many libraries unknowingly use Amazon cover images to enrich their online catalogs, so that even a user who is completely anonymous to the library ends up letting Amazon know what books they're searching for.

Recently, I've been serving on the Steering Committee of a NISO initiative to establish a set of principles that libraries, providers of services to libraries, and publishers can use to support patron privacy. We held an in-person meeting in San Francisco at the end of July. There was solid support from libraries, publishers and service companies for improving reader privacy, but some issues were harder than others. The issues around data collection and use attracted the widest divergence in opinion.

One approach that was discussed centered on classifying different types of data depending on the extent to which they impact user privacy. This is also the approach taken by most laws governing privacy of library records. They mostly apply only to "Personally Identifiable Information" (PII), which usually means a person's name, address, phone number, etc., but is sometimes defined to include the user's IP address. While it's important to protect this type of information, in practice this usually means that less personal information lacks any protection at all.

I find that the data classification approach is another Maginot privacy line. It encourages the assumption that collection of demographic data – age, gender, race, religion, education, profession, even sexual orientation – is fair game for libraries and participants in the library ecosystem. I raised some eyebrows when I suggested that demographic groups might deserve a level of privacy protection in libraries, just as individuals do.

OCLC's Andrew Pace gave an example that brought this home for us all. When he worked as a librarian at NC State, he tracked usage of the books and other materials in the collection. Every library needs to do this for many purposes. He noticed that materials placed on reserve for certain classes received little or no usage, and he thought that faculty shouldn't be putting so many things on reserve, effectively preventing students not taking the class from using these materials. And so he started providing usage reports to the faculty.

In retrospect, Andrew pointed out that, without thinking much about it, he might have violated the privacy of students by informing their teachers that they weren't reading the assigned materials. After all, if a library wants to protect a user's right to read, it also has to protect the right not to read. Nobody's personally identifiable information had been exposed, but the combination of library data – a list of books that hadn't circulated – with some non-library data – the list of students enrolled in a class and the list of assigned reading – had intersected in a way that exposed individual reading behavior.
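The mechanism in Andrew's example can be made concrete with a small, entirely hypothetical sketch. None of the three tables below contains a borrower's name, yet joining circulation counts for reserve items with a course's reading list and its enrollment roster is enough to say, for every enrolled student, which assigned items they never borrowed from reserve. It says nothing about copies read elsewhere, but that is exactly the kind of inference a privacy review would need to anticipate.

```python
# Entirely hypothetical data; no table contains a borrower's name.
RESERVE_CHECKOUTS = {"Item A": 0, "Item B": 0, "Item C": 12}                   # library data
ASSIGNED_READING = {"ECON 101": ["Item A", "Item B"], "HIST 210": ["Item C"]}  # course data
ENROLLMENT = {"ECON 101": ["student1", "student2"], "HIST 210": ["student3"]}  # registrar data


def never_borrowed_by_course(checkouts, assignments, roster):
    """For each course, pair the enrolled students with the assigned reserve
    items that never circulated, i.e. items none of those students borrowed."""
    exposed = {}
    for course, items in assignments.items():
        unused = [item for item in items if checkouts.get(item, 0) == 0]
        if unused:
            exposed[course] = {"students": roster.get(course, []), "never_borrowed": unused}
    return exposed


print(never_borrowed_by_course(RESERVE_CHECKOUTS, ASSIGNED_READING, ENROLLMENT))
```

Each input on its own looks harmless; the exposure only appears in the join, which is why classifying individual fields as PII or non-PII does not catch it.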

What this example illustrates is that libraries MUST collect at least SOME data that impinges on reader privacy. If reader privacy is to be protected, a "privacy impact assessment" must be made on almost all uses of that data.  In today's environment, users expect that their data signals will be listened to and their expressed needs will be accommodated. Given these expectations, building privacy in libraries is going to require a lot of work and a lot of thought.

Code4LibMW 2015 Write-up / Terry Reese

Whew – it’s been a wonderfully exhausting past few days here in Columbus, OH as the Libraries played host to Code4LibMW. This has been something that I’ve been looking forward to ever since making the move to The Ohio State University; the C4L community has always been one of my favorites, and while the annual conference continues to be one of the most important meetings on my calendar, it’s at these regional events that I’m always reminded why I enjoy being a part of this community.

I shared a story with the folks in Columbus this week. As one of the people who attended the original C4L meeting in Corvallis back in 2006 (BTW, there were 3 other original attendees in Columbus this week), I remember a lot of things about that event quite fondly: pizza at American Dream, my first experience doing a lightning talk, the joy of a conference where people were writing code as they stood on stage waiting their turn to present, Roy Tennant pulling up the IRC channel while he was on stage so he could keep an eye on what we were all saying about him. It was just a lot of fun, and part of what made it fun was that everyone got involved. During that first event, there were around 80 attendees, and nearly every person made it onto the stage to talk about something they were doing, something they were passionate about, or something they had been inspired to build during the course of the week. You still get this at times at the annual conference, but with its sheer size and weight, it’s become much harder to give everyone that opportunity to share the things that interest them, or to easily connect with other people who share those interests. And I think that’s the purpose these regional events can serve.

By and large, the C4L regional events feel much more like those early days of the C4L annual conference. They are small, usually free to attend, with a schedule that shifts and changes throughout the day. They are also the place where we come together, meet local colleagues and learn about all the fantastic work that is being done at institutions of all sizes and all types. And that’s what the C4LMW meeting was for me this year. As the host, I wanted to make sure that the event had enough structure to keep things moving, but had a place for everyone to participate. For me, that was going to be the measure of success: did we not just put on a good program, but did this event help to make connections within our local community? And I think that in this, the event was successful. I was doing a little bit of math, and over the course of the two days, I think that we had a participation rate close to 90%, and an opportunity for everyone who wanted to get up and just talk about something that they found interesting. And to be sure, there is a lot of great work being done out here by my Midwest colleagues (yes, even those up in Michigan 🙂).

Over the next few days, I’ll be collecting links and making the slides available via the C4LMW 2015 home page as well as wrapping up a few of the last responsibilities of hosting an event, but I wanted to take a moment and again thank everyone that attended.  These types of events have never been driven by the presentations, the hosts, or the presenters – but have always been about the people that attend and the connections that we make with the people in the room.  And it was a privilege this year to have the opportunity to host you all here in Columbus. 



The well of studiousness / Karen G. Schneider


Pride 2015

My relative quiet is because my life has been divided for a while between work and studying for exams. But I share this photo by former PUBLIB colleague and retired librarian Bill Paullin from the 2015 Pride March in San Francisco, where I marched with my colleagues in what suddenly became an off-the-hook celebration of what one parade marshal drily called, “Thank you, our newly-discovered civil rights.”

I remember the march, but I also remember the hours before our contingent started marching, chatting with dear colleagues about all the important things in life while around us nothing was happening. It was like ALA Council, except with sunscreen, disco music, and free coconut water.

Work is going very well. Team Library is made of professionals who enjoy what they do and commit to walking the walk. The People of the Library did great things this summer, including eight (yes eight) very successful “chat with a librarian” sessions for parent orientations, and a wonderful “Love Your Library” carnival for one student group. How did we get parents to these sessions? Schmoozing, coffee, and robots (as in, tours of our automated retrieval system). We had a competing event, but really — coffee and robots? It’s a no-brainer. Then I drive home to our pretty street in a cute part of a liveable city, and that is a no-brainer, too.

I work with such great people that clearly I did something right in a past life. Had some good budget news. Yes please! Every once in a while I think, I was somewhere else before I came here, and it was good; I reflect on our apartment in San Francisco, and my job at Holy Names. I can see myself on that drive to work, early in the morning, twisting down Upper Market as the sun lit up the Bay Bridge and the day beckoned, full of challenge and possibility. It was a good part of my life, and I record these moments in the intergalactic Book of Love.

And yet: “a ship in port is safe, but that’s not what ships are built for.” I think of so many good things I learned in my last job, not the least of which was the gift of radical hospitality. I take these things with me, and yet the lesson for me is that I was not done yet. It is interesting to me that in the last few months I learned that for my entire adult life I had misunderstood the word penultimate. It does not mean the final capper; it means the place you go, before you go to that place. I do not recall what made me finally look up this term, except when I did I felt I was receiving a message.

Studying is going very well, except my brain is unhappy about ingesting huge amounts of data into short-term memory to be regurgitated on a closed-book test. Cue lame library joke: what am I, an institutional repository? Every once in a while I want to share a bon mot from my readings with several thousand of my closest friends, then remember that people who may be designing the questions I’ll be grappling with are on the self-same networks. So you see pictures of our Sunday house meetings and perhaps a random post or share, but the things that make me go “HA HA HA! Oh, that expert in […….redacted……..] gets off a good one!” stay with me and Samson, our ginger cat, who is in charge of supervising my studies, something he frequently does with his eyes closed.

We have landed well, even after navigating without instruments through a storm. Life is good, and after this winter, I have a renewed appreciation for what it means for life to be good. That second hand moves a wee faster every year, but there are nonetheless moments captured in amber, which we roll from palm to palm, marveling in their still beauty.

Amazon owns the cloud / David Rosenthal

Back in May I posted about Amazon's Q1 results, the first in which they broke out AWS, their cloud services, as a separate item. The bottom line was impressive:
AWS is very profitable: $265 million in profit on $1.57 billion in sales last quarter alone, for an impressive (for Amazon!) 17% net margin.
Again via Barry Ritholtz, Re/Code reports on Q2:
Amazon Web Services, ... grew its revenue by 81 percent year on year in the second quarter. It grew faster and with higher profit margins than any other aspect of Amazon’s business.

AWS, which offers leased computing services to businesses, posted revenue of $1.82 billion, up from $1 billion a year ago, as part of its second-quarter results.

By comparison, retail sales in North America grew only 26 percent to $13.8 billion from $11 billion a year ago.

The cloud computing business also posted operating income of $391 million — up an astonishing 407 percent from $77 million at this time last year — for an operating margin of 21 percent, making it Amazon’s most profitable business unit by far. The North American retail unit turned in an operating margin of only 5.1 percent.
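The quoted percentages can be rechecked from the dollar figures. This small Python sketch uses only the numbers quoted above, in millions of US dollars; because those inputs are themselves rounded (for example "$1 billion a year ago"), the recomputed values differ slightly from the reported 81% revenue growth, 407% income growth and 26% retail growth.

```python
def growth(new: float, old: float) -> float:
    """Year-on-year change as a fraction of the old value."""
    return (new - old) / old


# Figures as quoted above, in millions of US dollars.
q1_margin = 265 / 1570               # Q1 AWS profit / sales
q2_margin = 391 / 1820               # Q2 AWS operating income / revenue
revenue_growth = growth(1820, 1000)  # AWS revenue, Q2 vs. a year earlier
income_growth = growth(391, 77)      # AWS operating income, Q2 vs. a year earlier
retail_growth = growth(13800, 11000) # North American retail sales

for label, value in [
    ("Q1 AWS margin", q1_margin),
    ("Q2 AWS operating margin", q2_margin),
    ("AWS revenue growth", revenue_growth),
    ("AWS operating income growth", income_growth),
    ("North American retail growth", retail_growth),
]:
    print(f"{label}: {value:.1%}")
# Q1 AWS margin: 16.9%
# Q2 AWS operating margin: 21.5%
# AWS revenue growth: 82.0%
# AWS operating income growth: 407.8%
# North American retail growth: 25.5%
```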
Revenue growing at 81% year-on-year at a 21% and growing margin, despite price competition from the likes of Google, Microsoft and IBM. Amazon clearly dominates the market; the competition is having no effect on their business. As I wrote nearly a year ago, based on Benedict Evans' analysis:
Amazon's strategy is not to generate and distribute profits, but to re-invest their cash flow into starting and developing businesses. Starting each business absorbs cash, but as they develop they turn around and start generating cash that can be used to start the next one.
Unfortunately, S3 is part of AWS for reporting purposes, so we can't see the margins for the storage business alone. But I've been predicting for years that if we could, we would find them to be very generous.

Link roundup July 24, 2015 / Harvard Library Innovation Lab

A block of links sourced from the team. We’ve got Annie, Adam, dano, and Matt!

A Light Sculpture Is Harvesting San Francisco’s Secrets

Toki Pona: A Language With a Hundred Words – The Atlantic

Swedish Puzzle Rooms Test Teams’ Wits and Strength | Mental Floss

Outernet: A Digital Library in the Sky / LITA



To me, libraries have always represented a concentration of knowledge. Growing up I dreamt about how smart I’d be if I read all of the books in my hometown’s tiny local branch library.  I didn’t yet understand the subtle differences between libraries, archives and repositories, but I knew that the promise of the internet and digital content meant that, someday, I’d be able to access all of that knowledge as if I had a library inside my computer. The idea of aggregating all of humanity’s knowledge in a way that makes it freely accessible to everyone is what led me to library school, programming, and working with digital libraries/repositories, so whenever I find a project working towards that goal I get tingly. Outernet makes me feel very tingly.

In a nutshell, Outernet is a startup that got sponsored by a big nonprofit, and aims to use satellites to broadcast data down to Earth. By using satellites, they can avoid issues of internet connectivity, infrastructure, political censorship and local poverty. The data they plan to provide would be openly licensed content geared specifically towards underprivileged populations: local news, crop prices, emergency communications, open source applications, literature, textbooks and courseware, open access academic articles, and even the entirety of Wikipedia. Currently the only way to receive Outernet’s broadcasts is with a homemade receiver, but a low-cost (~$100) solar-powered, weather-proof receiver with built-in storage is in the works, which could be mass produced and distributed to impoverished or disaster-stricken areas.

Outernet chooses the content for its core archive with a piece of software called Whiteboard, which acts as a kind of Reddit for broadcast content: volunteers submit URLs pointing to content they believe Outernet should broadcast, and the community upvotes or downvotes each submission, with the top-ranking content making it into the core archive, democratizing the process. A separate piece of software called Librarian acts as the interface to locally received content; current receivers act as a Wi-Fi hotspot that users can connect to, using Librarian to explore, copy or delete content and to configure which data Librarian harvests. Public access points are being planned for places like schools, hospitals and public libraries where internet connectivity isn’t feasible, with a single person administering the receiver and its content but allowing read-only access to anyone.
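The post doesn’t describe how Whiteboard actually ranks submissions. As a purely hypothetical sketch of the "Reddit for broadcast content" idea, assume each submission’s score is upvotes minus downvotes and that only a fixed number of slots (standing in for limited satellite bandwidth) is available; the top positive-scoring URLs are selected, with ties broken by URL so the result is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """A hypothetical content submission with community votes."""
    url: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self) -> int:
        return self.upvotes - self.downvotes


def select_for_archive(submissions: list[Submission], slots: int) -> list[str]:
    """Return up to `slots` URLs with the highest positive net score."""
    ranked = sorted(submissions, key=lambda s: (-s.score, s.url))
    return [s.url for s in ranked[:slots] if s.score > 0]


SUBMISSIONS = [  # made-up example data
    Submission("https://example.org/a", upvotes=5, downvotes=1),
    Submission("https://example.org/b", upvotes=2, downvotes=3),
    Submission("https://example.org/c", upvotes=4, downvotes=0),
]
print(select_for_archive(SUBMISSIONS, slots=2))
```

A plain net score is the simplest choice; a real system might weight votes by recency or voter reputation, but nothing in the post says Outernet does either.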

While the core work is being done by Outernet Inc., much of the project relies on community members volunteering time to discuss ideas and test the system. You can find more about the community at, but the primary way to participate is to build a receiver yourself and report feedback, or to submit and vote on content using Whiteboard. While Outernet is still a long way from achieving its goals, it’s still one of the most exciting and fun ideas I’ve heard about in a while and definitely something to keep an eye on.


On the road again / Coral Sheldon-Hess

This post is late. I wanted to write it back at the beginning of this whole moving debacle experience, but there wasn’t time.

Short version: I’m moving to Pittsburgh again! Yay! Dale (my Mr.) and I will be there by August 10th. He’s got a cool job, and I’m not looking hard for a job, because I like consulting. Also, our friends are THE BEST!

And now for the longer version.


As you may know, I lived in Alaska for five years. I made some awesome friends and had a great job, and all was well, except that six months of winter—with ice on all the sidewalks and parking lots, because people don’t shovel or plow down to pavement—is too much for me, especially post-arthritis. Nothing I injured ever seemed to heal, so I couldn’t risk being out on the ice. And I don’t make a great shut-in. (And, honestly, even before that, the cycle of too much darkness/too much light was hard on both Dale and me. We thought we’d get better at it over time, but we did not.)

Last August, I moved (back) to Charlottesville, VA, for a job that, uh, didn’t work out (which deserves its own post, but … not yet, I’m still too angry). Because I needed a car and because our pet birds aren’t suited for jet planes, I drove. Side note: you’d be amazed how much stuff I can fit into a Subaru and still have full visibility out the side windows and the rearview mirror!

Fast forward nine miserable months, and Dale joined me in VA. But at that point we were both unemployed and both fed up with Charlottesville (where his job offers kept evaporating at the last minute, mysteriously) and with a pretty bad rental situation. We aimed for Pittsburgh, although there were a few digressions where we talked to people in other places. And I’m pleased to say that he’s accepted a position with a really cool company in Pittsburgh, so that’s where we’re headed!

Like Charlottesville was for me, Pittsburgh is an “again” for both of us. We met there. He went to undergrad there, and I went to grad school there (both times). We loved the place and had a great set of friends in the area. Honestly, we might not ever have left, except that new library school graduates have a lot of competition for jobs, there, where they don’t in Anchorage. 😉 So we’re both really happy to be going back.

Job stuff

Although I’m not looking hard for an employer, I am willing to talk to anybody who’s already got a diverse team and who is looking for a blend of library, leadership, and technical skills (and responsibilities), preferably in the 20-30 hour/week range. For a really perfect job, I could go back to 40 hours/week, but that would require giving up a recurring contract that I like a lot, as well as missing out on a few other things that are important to me.

Honestly, I’m too excited about getting my consulting business off the ground to actively look for an employer; I’m a great boss, so working for myself appeals to me. 😉 I plan to make websites for people (artists, small businesses) and do social media planning and training with them. I think teaching artists how to make Etsy sites would be fun—I already have two potential customers, for that, so it seems like there’s a market. 😃 (Yes, I know it’s easy. It’s also intimidating to people who don’t do tech all the time.)

And I owe a certain library association a proposal for a class on web fundamentals, so I’ll put that together and (assuming they like it) run that under this business’s umbrella, as well. I like teaching, so I may find more ways that can factor into my business.

I may eventually grow to the point where I want to do contracting with libraries (not just library associations), because I know I can do some good, there; but I want to start simple and get my contracting workflow in order, first.

And I guess I should say, my recurring contract isn’t web work; it’s library work. But it’s interesting and often challenging, and it pays better than my initial web rate. Eventually, my web work will pay better, but that’s a while off, yet.

Logistics of the move

This whole thing has so many moving parts.

Anchorage and the Honda

We own a house in Anchorage. Getting that under contract has been stressful and weird, but it is under contract, with people who want to make it into a day care (💜) and who are willing to rent it until they can close on it. There was a lot of negotiating around getting things fixed, but we came to an amicable agreement. So that’s good, because our mortgage is covered, and we will eventually no longer own the house, and it will be well-loved.

That house was full of stuff. Dale spent months (and I flew back to help him for two weeks, earlier this year) dealing with all of that—a lot of yard sale, Craigslist, and giving-away-to-friends, plus no small amount of putting-into-boxes. I’ve promised never to move somewhere and leave him behind, ever again. All of that stuff was in a storage unit (until this morning), waiting to go on a truck, probably into a boat, maybe onto a train or another truck, on its way to be dropped off in Pittsburgh. Now the shipping company has it. We’ll see it again in 4-6 weeks. (That all sounds simpler than the way it went down. We called for a quote early last week, and they said they had a spot for us early this week. They were slow to get back to us, so our stuff didn’t get loaded up until today, when we’d hoped to be driving out of Anchorage by yesterday.)

Since the people renting our house have control of our house, we’re staying with friends in Anchorage. Our friends are THE BEST! They’ve been super nice and super patient about the fact that our date for leaving town keeps being moved back. More on that next paragraph.

We have a car in Anchorage, too. Its total worth is less than the cost to ship a car, but it does probably have another 100k miles left in its life, so we decided to drive it across the continent (see image above and map link below). It turns out that the Honda dealer in Anchorage completely hosed us on some service stuff over the last year, so we’re waiting with bated breath to find out whether or not Midas has the right parts to fix our brakes so that it’s safe to drive. (Honda didn’t.) It was an 11-day drive (doing 600+ miles the last day, so really 12) the first time and a 14-day drive the second time for me, alone; I had the birds, so I couldn’t do fun tourist stuff, but I was also the only driver, so I had limited hours on the road per day. I figure that all balances out, so, to get there by August 5, we need to leave by Saturday.

If they don’t have the part, we will sell the car to a used car place and fly back. It would be a bummer for Dale not to get to do the drive, and I really like that car (especially the fancy radio he put into it for me as a birthday present); but it would be a much bigger bummer to die from brake failure on some mountain or highway somewhere along the way.

Charlottesville and the Subaru

Our lease in Charlottesville ends at the end of August, and since the landlord has both of our keys and still has our security deposit and has already thanked us for the good condition we left the place in, I have concluded that we are clear, there. I closed our utility accounts and washed my hands of it. Maybe we’ll get part of the deposit back if he can rent it to someone else before the end of August.

We had lots of stuff in Charlottesville, too, including a bed and a huge couch. (It’s eight feet long. It didn’t look that big in the showroom. Super comfy, though!) All of that got packed into a U-Haul and driven to Pittsburgh. We were supposed to caravan up to Pittsburgh, with Dale driving the truck (with a chinchilla in the passenger seat) and me driving our Subaru (with birds in the passenger seat), but I couldn’t sleep the night before we left and wasn’t safe to drive. So the Subaru is in our friend’s driveway in Charlottesville (our friends are THE BEST!), full of stuff; and our drive from Alaska looks a little different, ending with the Subaru and Honda caravanning up to Pittsburgh from Charlottesville, assuming the Honda is OK.

If we can’t drive the Honda across the continent, we’ll fly to National (using airline miles, which is something we can’t do into or out of Pittsburgh, grr, Alaska Airlines), train to Charlottesville, repack the car to make space, and just drive the Subaru up to Pittsburgh. It’ll be fiiiine.

Pittsburgh and the pets

We had looked at apartments in Pittsburgh while Dale was interviewing and found one that’s pretty great, with a really nice landlady (who likes birds 😃). It’s in Pittsburgh, so of course it has architectural quirks. 😁 But it has a washer and dryer of its own and a dishwasher (the latter being necessary to keep Dale’s and my marriage intact) and enough space for us, our pets, and a home office. It’s in a super walkable/fairly cool neighborhood, on a couple of good bus lines, which is nice. And it has parking! So that’s fantastic.

It isn’t available until at least August 2, maybe as late as the 5th. So the stuff from Charlottesville is in a storage unit in Pittsburgh. Our friends are THE BEST, so we had lots of help unloading the truck.

Also because our friends are THE BEST, our chinchilla is staying with one set of friends, and our birds are staying with another. (That’s why we’re still trying to get back by the 5th, even though Dale doesn’t start until the 10th. We feel bad leaving our pets with our friends any longer than necessary. … And also we have some settling to do before starting a whole new life, y’know?)

The drive, mapped

We’ve had to cut some corners (like our trip to Omaha 😦), but here’s our current planned trip. If you’re along the way and want us to stop and say hi, let us know! 😃

Yet Another Metadata Zoo / Roy Tennant

I was talking with my old friend John Kunze a little while back and he described a project that he is involved with called “Yet Another Metadata Zoo”, or YAMZ. In a world of more ontologies than you can shake a stick at, it aims to provide a simple, easy-to-use mechanism for defining and maintaining individual metadata terms and their definitions.

The project explains itself like this:

The YAMZ Metadictionary (metadata dictionary) prototype…is a proof-of-concept web-based software service acting as an open registry of metadata terms from all domains and from all parts of “metadata speech”. With no login required, anyone can search for and link to registry term definitions. Anyone can register to be able to login and create terms.

We aim for the metadictionary to become a high-quality cross-domain metadata vocabulary that is directly connected to evolving user needs. Change will be rapid and affordable, with no need for panels of experts to convene and arbitrate to improve it. We expect dramatic simplification compared to the situation today, in which there is an overwhelming number of vocabularies (ontologies) to choose from.

Our hope is that users will be able to find most of the terms they need in one place (one vocabulary namespace), namely, the Metadictionary. This should minimize the need for maintaining expensive crosswalks with other vocabularies and cluttering up expressed metadata with lots of namespace qualifiers. Although it is not our central goal, the vocabulary is shovel-ready for those wishing to create linked data applications.

If you have a Google ID, signing in is dead simple and you can begin creating and editing terms. You can also vote terms up or down, which can eventually take a term from “vernacular” status (the default for new terms) to “canonical” — terms that are considered stable and unchanging. A third status is “deprecated”.
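To make the term lifecycle concrete, here is a minimal Python sketch of the three statuses and the "identifier forever" rule. It is not YAMZ’s code: the post doesn’t say how many votes it takes to reach “canonical”, so `PROMOTE_AT` is an invented threshold, and `Term`, `vote` and `deprecate` are hypothetical names.

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    VERNACULAR = "vernacular"  # default for new terms
    CANONICAL = "canonical"    # considered stable and unchanging
    DEPRECATED = "deprecated"  # retired, but keeps its identifier


_next_id = itertools.count(1)

# Hypothetical threshold: the post does not say what promotes a term.
PROMOTE_AT = 5


@dataclass
class Term:
    name: str
    definition: str
    ident: int = field(default_factory=lambda: next(_next_id))
    status: Status = Status.VERNACULAR
    score: int = 0


def vote(term: Term, up: bool) -> None:
    """Record one vote; promote a vernacular term once its score reaches PROMOTE_AT."""
    term.score += 1 if up else -1
    if term.status is Status.VERNACULAR and term.score >= PROMOTE_AT:
        term.status = Status.CANONICAL


def deprecate(term: Term) -> None:
    """Change only the status; the identifier is never reassigned."""
    term.status = Status.DEPRECATED
```

The design point worth noticing is that status and identity are separate fields, so no status change can ever break a link to a term.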


You can browse terms to see what is there already.

I really like this project for several reasons:

  • It’s dead simple.
  • It’s fast and easy to gain value from it. 
  • Every term has an identifier, forever and always (deprecated terms keep their identifier).
  • Voting and commenting are a key part of the infrastructure, and provide easy mechanisms for it to get ever better over time.

What it needs now is more people involved, so it can gain the kind of input and participation that is necessary to make it a truly authoritative source of metadata element names and descriptions. I’ve already contributed to it, how about you?


Libraries: Apply now for 2016 IMLS National Medals / District Dispatch


The application period is now open for the 2016 National Medal for Museum and Library Service, the nation’s highest honor. Each year, the Institute of Museum and Library Services (IMLS) recognizes libraries and museums that make significant and exceptional contributions in service to their communities. Nomination forms are due October 1, 2015.

Read more from IMLS:

All types of nonprofit libraries and library organizations, including academic, school, and special libraries, archives, library associations, and library consortia, are eligible to receive this honor. Public or private nonprofit museums of any discipline (including general, art, history, science and technology, children’s, and natural history and anthropology), as well as historic houses and sites, arboretums, nature centers, aquariums, zoos, botanical gardens, and planetariums are eligible.

Winners are honored at a ceremony in Washington, DC, host a two-day visit from StoryCorps to record community member stories, and receive positive media attention. Approximately thirty finalists are selected as part of the process and are featured by IMLS during a six-week social media and press campaign.

Anyone may nominate a museum or library for this honor, and institutions may self-nominate. For more information, reach out to one of the following contacts.

Program Contact for Museums:
Mark Feitl, Museum Program Specialist

Program Contact for Libraries:
Katie Murray, Staff Assistant

The Institute of Museum and Library Services is the primary source of federal support for the nation’s 123,000 libraries and 35,000 museums.

The post Libraries: Apply now for 2016 IMLS National Medals appeared first on District Dispatch.

Virtual Shelf Browse / Jonathan Rochkind

We know that some patrons like walking the physical stacks to find books on a topic of interest by browsing adjacently shelved items.

I like wandering stacks full of books too, and hope we can all continue to do so.

But in an effort to see if we can provide an online experience that fulfills some of the utility of this kind of browsing, we’ve introduced a Virtual Shelf Browse that lets you page through books online, in the order of their call numbers.

An online shelf browse can do a number of things you can’t do physically walking around the stacks:

  • You can do it from home, or anywhere you have a computer (or mobile device!)
  • It brings together books from various separate physical locations in one virtual stack, including multiple libraries, locations within libraries, and our off-site storage.
  • It includes even checked-out books, and in some cases even ebooks (if we have a call number on record for them).
  • It can place one item at multiple locations in a Virtual Shelf, if we have more than one call number on record for it. There’s always more than one way you could classify or characterize a work; a physical item can only be in one place at a time, but not so in a virtual display.

The UI is based on the open source stackview code released by the Harvard Library Innovation Lab. Thanks to Harvard for sharing their code, and to @anniejocaine for helping me understand the code, and accepting my pull requests with some bug fixes and tweaks.

This is to some extent an experiment, but we hope it opens up new avenues for browsing and serendipitous discovery for our patrons.

You can drop into one example place in the virtual shelf browse here, or drop into our catalog to do your own searches — the Virtual Shelf Browse is accessed by navigating to an individual item detail page, and then clicking the Virtual Shelf Browse button in the right sidebar.  It seemed like the best way to enter the Virtual Shelf was from an item of interest to you, to see what other items are shelved nearby.


Our Shelf Browse is based on ordering by Library of Congress Call Numbers. Not all of our items have LC call numbers, so not every item appears in the virtual shelf, or has a “Virtual Shelf Browse” button to provide an entry point to it. Some of our local collections are shelved locally with LC call numbers, and these are entirely present. For other collections —  which might be shelved under other systems or in closed stacks and not assigned local shelving call numbers — we can still place them in the virtual shelf if we can find a cataloger-suggested call number in the MARC bib 050 or similar fields. So for those collections, some items might appear in the Virtual Shelf, others not.
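A rough Python sketch of that selection logic follows. The record is a made-up dictionary shape, not the API of any MARC library: prefer locally assigned LC call numbers, and otherwise fall back to the cataloger-suggested call numbers in MARC 050, joining subfield $a (classification number) and $b (item number). The “or similar fields” the post mentions are left out, and `shelf_call_numbers` is a hypothetical name.

```python
def shelf_call_numbers(record: dict) -> list[str]:
    """Pick LC call numbers that place a record in the virtual shelf.

    Hypothetical record shape:
      record["holdings"]: list of {"scheme": "lc" or other, "call_number": str}
      record["050"]:      list of {"a": class number, "b": item number (optional)}
    """
    # Locally assigned LC call numbers win; other schemes are skipped.
    local = [h["call_number"] for h in record.get("holdings", [])
             if h.get("scheme") == "lc"]
    if local:
        return local
    # Fallback: cataloger-suggested call numbers from MARC 050 $a + $b.
    suggested = []
    for fld in record.get("050", []):
        if fld.get("a"):
            suggested.append(" ".join(p for p in (fld["a"], fld.get("b")) if p))
    return suggested  # may be empty: the item then has no shelf position
```

Because the function returns a list, an item with several call numbers naturally gets several shelf positions.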

On Call Numbers, and Sorting

Library call number systems — from LC, to Dewey, to Sudocs, or even UDC — are a rather ingenious 19th century technology for organizing books in a constantly growing collection such that similar items are shelved nearby. Rather ingenious for the 19th century anyway.

It was fun to try to bring this technology (and the many hours of cataloger work that have gone into constructing call numbers) into the 21st century to continue providing value in an online display.

It was also challenging in some ways. It turns out that the ordering of Library of Congress call numbers in particular is difficult to implement in software: there are a bunch of odd cases where the proper ordering might be clear to a human (at least to a properly trained human? and different libraries might even order differently!), but it is difficult to encode all the cases into software.

The newly released Lcsort ruby gem does a pretty marvelous job of allowing sorting of LC call numbers that properly sorts a lot of them — I won’t say it gets every valid call number, let alone local practice variation, right, but it gets a lot of stuff right including such crowd-pleasing oddities as:

  • `KF 4558 15th .G6` sorts after `KF 4558 2nd .I6`
  • `Q11 .P6 vol. 12 no. 1` sorts after `Q11 .P6 vol. 4 no. 4`
  • Suffixes after cutters, as in popular local practice (and NLM call numbers), e.g. `R 179 .C79ab`
  • Variations in spacing or punctuation that should not matter for sorting, e.g. `R 169.B59.C39` vs. `R169 B59C39 1990` vs. `R169 .B59 .C39 1990`
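To show why these cases are tricky, here is a toy Python sort key that handles just the examples above. It is not how Lcsort works and misses many cases Lcsort covers; it only illustrates the general idea, under the assumption that input is uppercase LC style. The class number’s integer part compares numerically; cutters compare as decimal fractions (so `.C79` sorts before `.C8`); other numbers such as ordinals, volumes and years compare as integers; spaces and periods are ignored. `lc_sort_key` is a hypothetical name.

```python
import re

# Class letters and class number, e.g. "QA76.73" or "KF 4558".
HEAD = re.compile(r"\s*([A-Z]{1,3})\s*(\d+)(?:\.(\d+))?")
# After the class number: a cutter (capital letter + digits + optional
# lowercase suffix), a bare number, or a word. Spaces and periods match
# none of these alternatives, so findall simply skips over them.
TOKEN = re.compile(r"([A-Z])(\d+)([a-z]*)|(\d+)|([A-Za-z]+)")


def lc_sort_key(call_number: str):
    """Toy sort key for LC call numbers; returns None if the class part won't parse."""
    head = HEAD.match(call_number)
    if head is None:
        return None
    letters, whole, frac = head.groups()
    # Decimal part kept as a digit string: string order of digit strings
    # matches decimal-fraction order (".73" before ".8").
    key = [letters, int(whole), frac or ""]
    for c_letter, c_digits, c_suffix, number, word in TOKEN.findall(call_number, head.end()):
        if c_letter:
            # Cutter digits are a decimal fraction, compared as a string.
            key.append((1, c_letter, c_digits, c_suffix))
        elif number:
            # Ordinals, volumes, years: numeric, so "15th" follows "2nd".
            key.append((0, int(number)))
        else:
            key.append((2, word.lower()))
    return tuple(key)
```

For example, `sorted(calls, key=lc_sort_key)` orders a list of parseable call numbers; anything returning `None` would need to be filtered out first.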

Lcsort is based on the cumulative knowledge of years of library programmers’ attempts to sort LC call numbers, including an original implementation based on much trial and error by Bill Dueber of the University of Michigan, a port to Ruby by Nikitas Tampakis of Princeton University Library, advice and test cases based on much trial and error from Naomi Dushay of Stanford, and a bunch more code wrangling by me.

I do encourage you to check out Lcsort for any LC call number ordering needs, if you can do it in Ruby, or even to port it to another language if you can’t. I think it works as well as or better than anything our community of library technologists has done yet in the open.

Check out my code — rails_stackview

This project was possible only because of the work of so many that had gone before, and been willing to share their work, from Harvard’s stackview to all the work that went into figuring out how to sort LC call numbers.

So it only makes sense to try to share what I’ve done too, to integrate a stackview call number shelf browse in a Blacklight Rails app. I have shared some components in a Rails engine at rails_stackview.

In this case, I did not do what I’d have done in the past, and try to make a rock-solid, general-purpose, highly flexible and configurable tool that integrated as brainlessly as possible out of the box with a Blacklight app. I’ve had mixed success trying to do that before, and came to think it might have been over-engineering and YAGNI to try. Additionally, there are just too many ways to try to do this integration — and too many versions of Blacklight changes to keep track of — I just wasn’t really sure what was best and didn’t have the capacity for it.

So these are just the components I had to write for the way I chose to do it in the end, and for my use cases. I did try to make them well-designed for reasonable flexibility, or at least for future extension to more flexibility.

But it’s still just pieces that you’d have to assemble yourself into a solution, and integrate into your Rails app (no real Blacklight expectations, they’re just tools for a Rails app) with quite a bit of your own code.  The hardest part might be indexing your call numbers for retrieval suitable to this UI.

I’m curious to see whether this approach of sharing my pieces, instead of a fully designed flexible solution, might still end up being useful to anyone, and perhaps encourage some more virtual shelf browse implementations.

On Indexing

Being a Blacklight app, all of our data was already in Solr. It would have been nice to use the existing Solr index as the back-end for the virtual shelf browse, especially if it allowed us to do things like a virtual shelf browse limited by existing Solr facets. But I did not end up doing so.

To support this kind of call-number-ordered virtual shelf browse, you need your data in a store of some kind that supports some basic retrieval operations: Give me N items in order by some field, starting at value X, either ascending or descending.
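The retrieval operation itself is small enough to sketch. A minimal Python version over an in-memory sorted list, assuming each call number has already been normalized into a string whose plain string order is shelf order (producing those strings is a separate job, not done here); `shelf_page` is a hypothetical name.

```python
from bisect import bisect_left


def shelf_page(sorted_keys: list[str], start: str, n: int,
               descending: bool = False) -> list[str]:
    """Return up to n keys from a sorted list, positioned at `start`.

    Ascending: the first n keys >= start.
    Descending: the n keys immediately before start, nearest first.
    """
    i = bisect_left(sorted_keys, start)  # index of the first key >= start
    if descending:
        return sorted_keys[max(0, i - n):i][::-1]
    return sorted_keys[i:i + n]
```

Paging left and right on the virtual shelf is then just two calls from the current position, one in each direction.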

This seems simple enough; but the fact that we want a given single item in our existing index to be able to have multiple call numbers makes it a bit tricky. In fact, a Solr index isn’t really easily capable of doing what’s needed. There are various ways to work around it and get what you need from Solr: Naomi Dushay at Stanford has engaged in some truly heroic hacks to do it, involving creating a duplicate mirror indexing field where all the call numbers are reversed to sort backwards. And Naomi’s solution still doesn’t really allow you to limit by existing Solr facets or anything.

That’s not the solution I ended up using. Instead, I just de-normalize to another ‘index’ in a table in our existing application RDBMS, with one row per call number instead of one row per item. After talking to the Princeton folks at a library meet-up in New Haven, and hearing this was their back-end store plan for supporting ‘browse’ functions, I realized: sure, why not, that’ll work.
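Here is a minimal sketch of that de-normalized table, using Python’s built-in `sqlite3` as a stand-in for whatever RDBMS an app actually uses. The table and column names, the `browse` helper, and the sort-key strings are all made up for illustration (the keys are hand-written so that plain string order is shelf order). Note that `item2` gets two rows because it has two call numbers, which is exactly what a one-document-per-item Solr index makes awkward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE call_numbers (
        sort_key    TEXT NOT NULL,  -- normalized, sortable form
        call_number TEXT NOT NULL,  -- display form
        item_id     TEXT NOT NULL   -- id of the record in the main index
    )""")
conn.execute("CREATE INDEX call_numbers_sort ON call_numbers (sort_key)")

# One row per call number, not per item: item2 is shelved in two places.
conn.executemany("INSERT INTO call_numbers VALUES (?, ?, ?)", [
    ("QA0076.73 P98", "QA76.73 .P98", "item1"),
    ("QA0076.73 R83", "QA76.73 .R83", "item2"),
    ("QA0076.9 D3", "QA76.9 .D3", "item3"),
    ("Z0699.5 R83", "Z699.5 .R83", "item2"),
])
conn.commit()


def browse(conn, start_key, n, descending=False):
    """N rows in sort_key order starting at start_key, in either direction."""
    if descending:
        sql = ("SELECT call_number, item_id FROM call_numbers "
               "WHERE sort_key < ? ORDER BY sort_key DESC LIMIT ?")
    else:
        sql = ("SELECT call_number, item_id FROM call_numbers "
               "WHERE sort_key >= ? ORDER BY sort_key ASC LIMIT ?")
    return conn.execute(sql, (start_key, n)).fetchall()
```

The index on `sort_key` is what keeps each page a cheap range scan rather than a full sort.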

So how do I get them indexed in an RDBMS table? We use traject for indexing to Solr here, for Blacklight. Traject is pretty flexible, and it wasn’t too hard to modify our indexing configuration so that as the indexer goes through each input record, creating a Solr document for each one, it also, in the same stream, creates zero to many rows in an RDBMS table, one for each call number encountered.

We don’t do any “incremental” indexing to Solr in the first place; we just do a bulk/mass index every night, recreating everything from the current state of the canonical catalog. So the same strategy applies to building the call numbers table: it’s just recreated from scratch nightly. After racking my brain to figure out how to do this without disturbing performance or data integrity in the RDBMS table, I realized, hey, no problem: just index to a temporary table first, then when done swap it into place and delete the former one.
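The build-then-swap idea can be sketched with Python’s `sqlite3`, again only as a stand-in for the real application database; `rebuild_call_numbers`, the table names and the schema are hypothetical. SQLite DDL is transactional, so the drop and rename below commit together; other databases have their own idioms for this (MySQL, for instance, can rename several tables in one `RENAME TABLE` statement).

```python
import sqlite3

SCHEMA = "(sort_key TEXT NOT NULL, call_number TEXT NOT NULL, item_id TEXT NOT NULL)"


def rebuild_call_numbers(conn: sqlite3.Connection, rows) -> None:
    """Rebuild the call-number table from scratch, then swap it into place."""
    # 1. Fill a fresh table while readers keep using the old one.
    conn.execute("DROP TABLE IF EXISTS call_numbers_new")
    conn.execute(f"CREATE TABLE call_numbers_new {SCHEMA}")
    conn.executemany("INSERT INTO call_numbers_new VALUES (?, ?, ?)", rows)
    conn.commit()

    # 2. Swap in a single transaction: drop the old table, rename the new one.
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS call_numbers")
        conn.execute("ALTER TABLE call_numbers_new RENAME TO call_numbers")
        conn.commit()
    except Exception:
        conn.rollback()  # the old table survives if the swap fails
        raise
```

Any index on the live table would need to be recreated after the swap (or built on the new table under a non-conflicting name); the sketch leaves indexes out to keep the swap itself visible.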

I included a snapshotted, completely unsupported example of how we do our indexing with traject in the rails_stackview documentation. It ends up a bit hacky, and makes me wish traject let me re-use some of its code a little more concisely for this kind of bifurcated indexing operation, but it still worked out pretty well, and leaves me pretty satisfied with traject as our indexing solution compared to past tools we had used.

I had hoped that adding the call number indexing to our existing traject mass index process would not slow down the indexing at all. I think this hope was based on some poorly conceived thought process like “Traject is parallel multi-core already, so, you know, magic!” It didn’t quite work out that way: the additional call number indexing adds about a 10% penalty to our indexing time, taking our slow mass indexing from a ~10-hour to an ~11-hour process. We run our indexing on a fairly slow VM with 3 cores assigned to it. It’s difficult to profile a parallel multi-threaded pipeline process like traject, and I can’t completely wrap my head around it, but I think that on a faster machine you might have bottlenecks in different parts of the pipeline, and get less of a penalty.

On call numbers designed for local adjustment, used universally instead

Another notable feature of the 19th-century technology of call numbers that I didn’t truly appreciate until this project: call number systems often, and LC certainly, are designed to require a certain amount of manual hand-fitting to a particular local collection. The end of the call number has ‘cutter numbers’ that are typically based on the author’s name, but which are meant to be hand-fitted by local catalogers to put the book in just the right spot in the context of what’s already been shelved in a particular local collection.

That ends up requiring a lot more hours of cataloger labor than if a book simply had one true call number, but it’s kind of how the system was designed. I wonder whether it’s tenable in the modern era to put that much work into call number assignment, though, especially as print (unfortunately) gets less attention.

However, this project sort of serves as an experiment of what happens if you don’t do that local easing. To begin with, we’re combining call numbers that were originally assigned in entirely different local collections (different physical library locations), some of which were assigned before these different libraries even shared the same catalog, and were not assigned with regard to each other as context.  On top of that, we take ‘generic’ call numbers without local adjustment from MARC 050 for books that don’t have locally assigned call numbers (including ebooks where available), so these also haven’t been hand-fit into any local collection.

It does result in occasional oddities, such as different authors with similar last names writing on a subject being interfiled together. That offends my sensibilities, since I know the system doesn’t do that when used as designed. But… I think most people probably won’t notice; it works out pretty well after all.

Filed under: General