Planet Code4Lib

MarcEdit 6.3 Updates (all versions) / Terry Reese

I spent some time this week working on a few updates for MarcEdit 6.3.  Full change log below (for all versions).


* Bug Fix: MarcEditor: When processing data with right to left characters, the embedded markers were getting flagged by the validator.
* Bug Fix: MarcEditor: When processing data with right to left characters, I’ve heard that there have been some occasions when the markers are making it into the binary files (they shouldn’t).  I can’t recreate it, but I’ve strengthened the filters to make sure that these markers are removed when the mnemonic file format is saved.
* Bug Fix: Linked data tool:  When creating VIAF entries in the $0, the subfield code could be dropped.  This was missed because VIAF should no longer be added to the $0, so I assumed this was no longer a valid use case.  However, local practice in some places is overriding best practice.  This has been fixed.

A note on the MarcEditor changes.  The processing of right to left characters is something I was aware of in regards to the validator – but in all my testing and unit tests, the data was always filtered prior to compiling.  These markers that are inserted are for display, as noted here:  However, on the pymarc list, there was apparently an instance where these markers slipped through.  The conversation can be found here:!topic/pymarc/5zxuOh0fVuc.  I posted a long response on the list, but I think it’s being held in moderation (I’m a new member of the list).  Generally, here’s what I found: I can’t recreate the problem, but I have updated the code to ensure that it shouldn’t happen.  Once a mnemonic file is saved (and that happens prior to compiling), these markers are removed from the file.  If you find this isn’t the case, let me know.  I could add the filter down at the MARCEngine level, but I’d rather not, as there are cases where these values may legally be present…this is why the filtering happens in the Editor, where it can assess their use and, if the markers are already present, determine whether they are used correctly.
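The kind of filtering described above can be sketched in a few lines. This is a hypothetical illustration, not MarcEdit's actual implementation (which is C#): it strips the Unicode bidirectional control characters that are typically inserted for right-to-left display before data is compiled to binary MARC.

```javascript
// Hypothetical sketch of the marker filtering described above -- not
// MarcEdit's actual code. Strips Unicode bidi control characters:
//   U+200E/U+200F (LRM/RLM), U+202A-U+202E (embeddings/overrides),
//   U+2066-U+2069 (isolates).
function stripBidiMarkers(text) {
  return text.replace(/[\u200E\u200F\u202A-\u202E\u2066-\u2069]/g, '');
}

// Applied when the mnemonic file is saved, prior to compiling.
var field = '=245  10$a\u200F\u05D4\u05DE\u05E7\u05E8\u05D0\u200E /$cExample.';
var clean = stripBidiMarkers(field);
```

The display markers vanish while the Hebrew text itself (legitimate non-ASCII content) is left untouched.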

Downloads can be picked up through the automated update tool, or via


Neutrality is anything but / Karen G. Schneider

“We watch people dragged away and sucker-punched at rallies as they clumsily try to be an early-warning system for what they fear lies ahead.” — Unwittingly prophetic me, March, 2016.

Sheet cake photo by Flickr user Glane23. CC by 2.0

Sometime after last November, I realized something very strange was happening with my clothes. My slacks had suddenly shrunk, even if I hadn’t washed them. After months of struggling to keep myself buttoned into my clothes, I gave up and purchased slacks and jeans one size larger. I call them my T***p Pants.

This post is about two things. It is about the lessons librarians are learning in this frightening era about the nuances and qualifications shadowing our deepest core values–an era so scary that quite a few of us, as Tina Fey observed, have acquired T***p Pants. And it’s also some advice, take it or leave it, on how to “be” in this era.

I suspect many librarians have had the same thoughts I have been sharing with a close circle of colleagues. Most librarians take pride in our commitment to free speech. We see ourselves as open to all viewpoints. But in today’s new normal, we have seen that even we have limits.

This week, the ACRL Board of Directors put out a statement condemning the violence in Charlottesville. That was the easy part. The Board then stated, “ACRL is unwavering in its long-standing commitment to free exchange of different viewpoints, but what happened in Charlottesville was not that; instead, it was terrorism masquerading as free expression.”

You can look at what happened in Charlottesville and say there was violence “on all sides,” some of it committed by “very fine people” who just happen to be Nazis surrounded by their own private militia of heavily-armed white nationalists. Or you can look at Charlottesville and see terrorism masquerading as free expression, where triumphant hordes descended upon a small university town under the guise of protecting some lame-ass statue of an American traitor, erected sixty years after the end of the Civil War, not coincidentally during a very busy era for the Klan. Decent people know the real reason the Nazis were in Charlottesville: to tell us they are empowered and emboldened by our highest elected leader.

There is no middle ground. You can’t look at Charlottesville and see everyday people innocently exercising First Amendment rights.

As I and many others have argued for some time now, libraries are not neutral.  Barbara Fister argues, “we stand for both intellectual freedom and against bigotry and hate, which means some freedoms are not countenanced.” She goes on to observe, “we don’t have all the answers, but some answers are wrong.”

It goes to say that if some answers are wrong, so are some actions. In these extraordinary times, I found myself for the first time ever thinking the ACLU had gone too far; that there is a difference between an unpopular stand, and a stand that is morally unjustifiable. So I was relieved when the national ACLU concurred with its three Northern California chapters that “if white supremacists march into our towns armed to the teeth and with the intent to harm people, they are not engaging in activity protected by the United States Constitution. The First Amendment should never be used as a shield or sword to justify violence.”

But I was also sad, because once again, our innocence has been punctured and our values qualified. Every asterisk we put after “free speech” is painful. It may be necessary and important pain, but it is painful all the same. Many librarians are big-hearted people who like to think that our doors are open to everyone and that all viewpoints are welcome, and that enough good ideas, applied frequently, will change people. And that is actually very true, in many cases, and if I didn’t think it was true I would conclude I was in the wrong profession.

But we can’t change people who don’t want to be changed. Listen to this edition of The Daily, a podcast from the New York Times, where American fascists plan their activities. These are not people who are open to reason. As David Lankes wrote, “there are times when a community must face the fact that parts of that community are simply antithetical to the ultimate mission of a library.”

We urgently need to be as one voice as a profession around these issues. I was around for–was part of–the “filtering wars” of the 1990s, when libraries grappled with the implications of the Internet bringing all kinds of content into libraries, which also challenged our core values. When you’re hand-selecting the materials you share with your users, you can pretend you’re open to all points of view. The Internet challenged that pretense, and we struggled and fought, and were sometimes divided by opportunistic outsiders.

We are fortunate to have strong ALA leadership this year. The ALA Board and President came up swinging on Tuesday with an excellent presser that stated unequivocally that “the vile and racist actions and messages of the white supremacist and neo-Nazi groups in Charlottesville are in stark opposition to the ALA’s core values,” a statement that (in the tradition of ensuring chapters speak first) followed a strong statement from our Virginia state association.  ARL also chimed in with a stemwinder of a statement.  I’m sure we’ll see more.

But ALA’s statement also describes the mammoth horns of the library dilemma. As I wrote colleagues, “My problem is I want to say I believe in free speech and yet every cell in my body resists the idea that we publicly support white supremacy by giving it space in our meeting rooms.” If you are in a library institution that has very little likelihood of exposure to this or similar crises, the answers can seem easy, and our work appears done. But for more vulnerable libraries, it is crucial that we are ready to speak with one voice, and that we be there for those libraries when they need us. How we get there is the big question.

I opened this post with an anecdote about my T***p pants, and I’ll wrap it up with a concern. It is so easy on social media to leap in to condemn, criticize, and pick apart ideas. Take this white guy, in an Internet rag, the week after the election, chastising people for not doing enough.  You know what’s not enough? Sitting on Twitter bitching about other people not doing enough. This week, Siva Vaidhyanathan posted a spirited defense of a Tina Fey skit where she addressed the stress and anxiety of these political times.  Siva is in the center of the storm, which gives him the authority to state an opinion about a sketch about Charlottesville. I thought Fey’s skit was insightful on many fronts. It addressed the humming anxiety women have felt since last November (if not earlier). It was–repeatedly–slyly critical of inaction: “love is love, Colin.” It even had a Ru Paul joke. A lot of people thought it was funny, but then the usual critics came out to call it naive, racist, un-funny, un-woke, advocating passivity, whatever.

We are in volatile times, and there are provocateurs from outside, but also from inside. Think. Breathe. Walk away from the keyboard. Take a walk. Get to know the mute button in Twitter and the unfollow feature in Facebook. Pull yourself together and think about what you’re reading, and what you’re planning to say. Interrogate your thinking, your motives, your reactions.

I’ve read posts by librarians deriding their peers for creating subject guides on Charlottesville, saying instead we should be punching Nazis. Get a grip. First off, in real life, that scenario is unlikely to transpire. You, buried in that back cubicle in that library department, behind three layers of doors, are not encountering a Nazi any time soon, and if you did, I recommend fleeing, because that wackdoodle is likely accompanied by a trigger-happy militiaman carrying a loaded gun. (There is an entire discussion to be had about whether violence to violence is the politically astute response, but that’s for another day.) Second, most librarians understand that their everyday responses to what is going on in the world are not in and of themselves going to defeat the rise of fascism in America. But we are information specialists and it’s totally wonderful and cool to respond to our modern crisis with information, and we need to be supportive and not go immediately into how we are all failing the world. Give people a positive framework for more action, not scoldings for not doing enough.

In any volatile situation, we need to slow the eff down and ask how we’re being manipulated and to what end; that is a lesson the ACLU just learned the hard way. My colleague Michael Stephens is known for saying, “speak with a human voice.” I love his advice, and I would add, make it the best human voice you have. We need one another, more than we know.


Freebo@ND and library catalogues / Eric Lease Morgan

Freebo@ND is a collection of early English book-like materials as well as a set of services provided against them. In order to use & understand items in the collection, some sort of finding tool — such as a catalogue — is all but required. Freebo@ND supports the more modern full text index which has become the current best practice finding tool, but Freebo@ND also offers a set of much more traditional library tools. This blog posting describes how & why the use of these more traditional tools can be beneficial to the reader/researcher. In the end, we will learn that “What is old is new again.”

An abbreviated history

A long time ago, in a galaxy far far away, library catalogues were simply accession lists. As new items were brought into the collection, new entries were appended to the list. Each item would then be given an identifier, and the item would be put into storage. It could then be very easily located. Search/browse the list, identify item(s) of interest, note identifier(s), retrieve item(s), and done.

As collections grew, the simple accession list proved to be not scalable because it was increasingly difficult to browse the growing list. Thus indexes were periodically created. These indexes were essentially lists of authors, titles, or topics/subjects, and each item on the list was associated with a title and/or a location code. The use of the index was then very similar to the use of the accession list. Search/browse the index, identify item(s) of interest, note location code(s), retrieve item(s), and done. While these indexes were extremely functional, they were difficult to maintain. As new items became a part of the collection it was impossible to insert them into the middle of the printed index(es). Consequently, the printed indexes were rarely up-to-date.

To overcome the limitations of the printed index(es), someone decided not to manifest them as books, but rather as cabinets (drawers) of cards — the venerable card catalogue. Using this technology, it was trivial to add new items to the index. Type up cards describing items, and insert cards into the appropriate drawer(s). Readers could then search/browse the drawers, identify item(s) of interest, note location code(s), retrieve item(s), and done.

It should be noted that these cards were formally… formatted. Specifically, they included “cross-references” enabling the reader to literally “hyperlink” around the card catalogue to find & identify additional items of interest. On the downside, these cross-references (and therefore the hyperlinks) were limited by design to three to five in number. If more than three to five cross-references were included, then the massive numbers of cards generated would quickly outpace the space allotted to the cabinets. After all, these cabinets came to dominate (and stereotype) libraries and librarianship. They occupied hundreds, if not thousands, of square feet, and whole departments of people (cataloguers) were employed to keep them maintained.

With the advent of computers, the catalogue cards became digitally manifested. Initially the digital manifestations were used to transmit bibliographic data from the Library of Congress to libraries who would print cards from the data. Eventually, the digital manifestations were used to create digital indexes, which eventually became the online catalogues of today. Thus, the discovery process continues. Search/browse the online catalogue. Identify items of interest. Note location code(s). Retrieve item(s). Done. But, for the most part, these catalogues do not meet reader expectations because the content of the indexes is merely bibliographic metadata (authors, titles, subjects, etc.) when advances in full text indexing have proven to be more effective. Alas, libraries simply do not have the full text of the books in their collections, and consequently libraries are not able to provide full text indexing services. †

What is old is new again

The catalogues representing the content of Freebo@ND are perfect examples of the history of catalogues as outlined above.

For all intents & purposes, Freebo@ND is YAVotSTC (“Yet Another Version of the Short-Title Catalogue”). In 1926 Pollard & Redgrave compiled an index of early English books entitled A Short-title Catalogue of books printed in England, Scotland, & Ireland and of English books printed abroad, 1475-1640. This catalogue became known as the “English short-title catalogue” or ESTC. [1] The catalog’s purpose is succinctly stated on page xi:

The aim of this catalogue is to give abridged entries of all ‘English’ books, printed before the close of the year 1640, copies of which exist at the British Museum, the Bodleian, the Cambridge University Library, and the Henry E. Huntington Library, California, supplemented by additions from nearly one hundred and fifty other collections.

The 600-page book is essentially an author index beginning with the likes of George Abbot and ending with Ulrich Zwingli. Shakespeare begins on page 517, goes on for four pages, and includes STC (accession) numbers 22273 through 22366. And the catalogue functions very much like the catalogues of old. Articulate an author of interest. Look up the author in the index. Browse the listings found there. Note the abbreviation of libraries holding an item of interest. Visit library, and ultimately, look at the book.

The STC has a history and relatives, some of which are documented in a book entitled The English Short-Title Catalogue: Past, present, future and dating from 1998. [2] I was interested in two of the newer relatives of the Catalogue:

  1. English short title catalogue on CD-ROM 1473-1800 – This is an IBM DOS-based package supposedly enabling the researcher/scholar to search & browse the Catalogue’s bibliographic data, but I was unable to give the package a test drive since I did not have ready access to a DOS-based computer. [3] From the bibliographic description’s notes: “This catalogue on CD-ROM contains more than 25,000 of the total 36,000 records of titles in English in the British Library for the period 1473-1640. It also includes 105,000 records for the period 1641-1700, together with the most recent version of the ESTC file, approximately 312,000 records.”
  2. English Short Title Catalogue [as a website] – After collecting & indexing the “digital manifestations” describing items in the Catalogue, a Web-accessible version of the catalogue is available from the British Library. [4] From the about page: “The ‘English Short Title Catalogue’ (ESTC) began as the ‘Eighteenth Century Short Title Catalogue’, with a conference jointly sponsored by the American Society for Eighteenth-Century Studies and the British Library, held in London in June 1976. The aim of the original project was to create a machine-readable union catalogue of books, pamphlets and other ephemeral material printed in English-speaking countries from 1701 to 1800.” [5]

As outlined above, Freebo@ND is a collection of early English book-like materials as well as a set of services provided against them. The source data originates from the Text Creation Partnership, and it is manifested as a set of TEI/XML files with full/rich metadata as well as the mark up of every single word in every single document. To date, there are only 15,000 items in Freebo@ND, but when the project is complete, Freebo@ND ought to contain close to 60,000 items dating from 1460 to 1699. Given this data, Freebo@ND sports an online, full text index of the works collected to date. This online interface is field searchable and free-text searchable, and it provides a faceted browse interface. [6]

But wait! There’s more!! (And this is the point.) Because the complete bibliographic data is available from the original data, it has been possible to create printed catalogs/indexes akin to the catalogs/indexes of old. These catalogs/indexes are available for downloading, and they include:

  • main catalog – a list of everything ordered by “accession” number; use this file in conjunction with your software’s find function to search & browse the collection [7]
  • author index – a list of all the authors in the collection & pointers to their locations in the repository; use this to learn who wrote what & how often [8]
  • title index – a list of all the works in the collection ordered by title; this file is good for “known item searches” [9]
  • date index – like the author index, this file lists all the years of item publication and pointers to where those items can be found; use this to see what was published when [10]
  • subject index – a list of all the Library Of Congress subject headings used in the collection, and their associated items; use this file to learn the “aboutness” of the collection as a whole [11]

These catalogs/indexes are very useful. It is really easy to load them into your favorite text editor and to peruse them for items of interest. They are even more useful if they are printed! Using these catalogues/indexes it is very quick & easy to see how prolific any author was, how many items were published in a given year, and what the published items were about. The library profession’s current tools do not really support such functions. Moreover, and unlike the cool (“kewl”) online interfaces alluded to above, these printed catalogs are easily updated, duplicated, shareable, and if bound can stand the test of time. Let’s see any online catalog last more than a decade and be so inexpensively produced.

“What is old is new again.”


† Actually, even if libraries were to have the full text of their collection readily available, the venerable library catalogues would probably not be able to use the extra content. This is because the digital manifestations of the bibliographic data cannot be more than 100,000 characters long, and the existing online systems are not designed for full text indexing. To say the least, the inclusion of full text indexing in library catalogues would be revolutionary in scope, and it would also be the beginning of the end of traditional library cataloguing as we know it.

[1] Short-title catalogue or ESTC –
[2] Past, present, future –
[3] STC on CD-ROM –
[4] ESTC as website –
[5] ESTC about page –
[6] Freebo@ND search interface –
[7] main catalog –
[8] author index –
[9] title index –
[10] date index –
[11] subject index –

Victory near in 20-year fight to provide public with CRS reports / District Dispatch

After nearly 20 years of advocacy by ALA, Congress has recently taken significant steps toward permanently assuring free public access to reports by the Congressional Research Service (CRS). Taxpayers fund these reports but generally have not been able to read them. ALA welcomes these moves to ensure the public can use these valuable aids to understanding public policy issues.

What are CRS Reports?
CRS is an agency, housed within the Library of Congress, that prepares public policy research for members of Congress. All members of Congress and their staffs have immediate access to these reports on topics ranging from avocado growing to zinc mining.

Political insiders know that these reports, produced by the nonpartisan expert staff at CRS, are excellent sources of information about nearly every conceivable public policy topic. But CRS reports have not been routinely published, and so they have only been accessible to those with a connection on Capitol Hill or through an unofficial third-party source.

ALA’s Calls for Public Access
ALA has long called for public access to CRS reports. ALA’s Council adopted a resolution on the topic in 1998, shortly before Sen. John McCain (R-Ariz.) and then-Rep. Chris Shays (R-Conn.) introduced the first legislation to post CRS reports online for public access. We have continued to advocate on the issue over the years, most recently by supporting the latest iteration of that legislation, the Equal Access to Congressional Research Service Reports Act.

What’s New
Both House and Senate appropriators have recently approved language to provide public access to CRS reports. Because appropriations are needed to fund the government, these are considered must-pass bills.

In the Senate, S. 1648 includes the language of the Equal Access to CRS Reports Act. In the House, similar provisions were included in H. Rept. 115-199: the report accompanying H.R. 3162 (which in turn was compiled into H.R. 3219).

What’s Next
Four key steps remain before we and our allies can declare victory in our nearly 20-year effort to provide public access to CRS reports:

  1. The House and Senate have to reconcile the (relatively minor) differences between their language on this issue;
  2. The provision has to survive any attempts to weaken or remove the language on the floor of the House or Senate when a reconciled bill or report is considered;
  3. Both houses of Congress have to pass an identical bill; and
  4. The President has to sign it.

These are significant “ifs.” But, because these appropriations bills are necessary to keep the government open, there’s a real chance it will get done. Until then, ALA will continue to speak up for the public’s right to access this useful information.

The post Victory near in 20-year fight to provide public with CRS reports appeared first on District Dispatch.

Spark OAI Harvester / FOSS4Lib Updated Packages

Last updated August 18, 2017. Created by Peter Murray on August 18, 2017.

The DPLA is launching an open-source tool for fast, large-scale data harvests from OAI repositories. The tool uses a Spark distributed processing engine to speed up and scale up the harvesting operation, and to perform complex analysis of the harvested data. It is helping us improve our internal workflows and provide better service to our hubs.  The Spark OAI Harvester is freely available and we hope that others working with interoperable cultural heritage or science data will find uses for it in their own projects.


Open Data Conference in Switzerland / Open Knowledge Foundation

This year’s Open Data Conference in Switzerland was all about Open Smart Cities, Open Tourism & Transport Data, Open Science & Open Food Data. We learnt how Open Data can be a catalyst of digital transformation and a crucial factor for advancing data quality. We got insights into the role of open data in the daily work of journalists and learnt how open data portals make an important contribution to enable Switzerland to remain a leader and innovator in the digital world.

Over 200 people attended the conference: its program was composed of 8 keynotes and parallel afternoon tracks with a total of 18 workshops. A highlight of the conference was the visit of Pavel Richter, CEO of Open Knowledge International. Pavel emphasized that the purpose of Open Knowledge International lies in empowering civil society organisations to use open data to improve people’s lives, for instance by collaborating with human rights institutions. Among his key arguments for open data: “I can take it and put it somewhere else, in a safer place […] it works as a concept, because the data is not lost, it can be secured and re-used”. The entire Q&A with Pavel Richter and Barnaby Skinner in English is available here:

Another highlight was the closing keynote, given by the president of the École Polytechnique Fédérale de Lausanne, who spoke about “The role of ‘open’ in digital Switzerland” and emphasized that public access to scientific data should be the norm so that the rest of the world can also profit from it. His entire talk is available in English here:

Furthermore we curated the following material for you:

Silencing @AusGLAMBlogs / Hugh Rundle

Silencing @AusGLAMBlogs

A little while ago, GLAM blogger Danielle asked if it would be possible to allow bloggers to stop AusGLAMblogs from indexing and tweeting a particular post. Many bloggers mix professional and very personal posts, so Danielle's request seemed pretty reasonable - bloggers don't want some personal posts to be pushed into their professional sphere, and, frankly, most GLAM professionals probably don't want to read them anyway.

I've finally gotten around to creating a simple workaround. My first solution was pretty straightforward:

var notGlam = _.contains(item.categories, 'notGLAM') || _.contains(item.categories, 'notglam');
if (!notGlam) {
  // all the rest of the RSS ingest logic goes here
} else {
  // don't do anything
}
The AusGLAMBlogs app and Twitter bot are entirely driven by RSS, that wonderful web technology of which pundits keep declaring the death, at precisely the time it is single-handedly powering the spread of the new hotness in web content (aka podcasts). The app looks for new posts every ten minutes, and if it finds any, adds them to the listing and queues a tweet. The obvious way to stop a post from being added and tweeted is to include a bit of metadata in the RSS feed for that post, and filter it out. The code above simply says "if the post has a tag of 'notGLAM' or 'notglam', don't do anything".[1]

The thing that makes this a lot simpler is that Meteor, the JavaScript/nodejs framework I used to write the app, includes underscorejs by default (because it's needed within the Meteor code itself). Underscorejs is a really useful library of 'functional programming helpers' - it allows you to do things that are possible with plain vanilla JavaScript, but much simpler if you can just use the underscore function. In the example here, I use the npm feedparser package to grab the RSS feed and spit it out in nice, normalised JSON. Each post in an RSS feed will have an array called "categories", which includes anything a CMS or blogging platform calls a category or a tag. So _.contains does what you probably expect it would: returns true or false depending on whether the array does indeed 'contain' the value you're looking for.

I was feeling quite pleased that I'd found a simple solution to this problem, until I remembered that I've been pushing all the AusGLAMBlogs code to GitHub in the hope that it might be useful to (and used by) other people. That means that it really needs to be relatively easy to customise. The other problem with the initial solution is that if I want to add new filter tags I have to keep adding more 'or' statements. What we really want here is a list of filter tags, and then to check whether anything in that list is included in the tags from each post. Luckily, underscorejs saved the day again, with _.find. This function allows us to check each value in an array against a function, and return true or false (i.e. 'found') if the function returns true for any of the values. So we can combine both of these underscore functions to take each tag in a post, and run _.find against a function that asks if the filter list _.contains the tag:

var filterList = ["notGLAM", "notglam", "Notglam", "#notglam", "#notGLAM"];

var hiddenPost = _.find(item.categories, function(tag){
  return _.contains(filterList, tag);
});

if (!hiddenPost) {
  // all the rest of the RSS ingest logic goes here
} else {
  // don't do anything
}
Now if someone wants to make their own app for, say, people who blog about cricket, they can change the filterList to use the tag notCricket and it will work the same way.
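For anyone adapting the code outside Meteor (and therefore without underscore bundled), the same filter can be written in plain JavaScript with `Array.prototype.some` and `Array.prototype.includes`. This is a hypothetical, framework-free sketch (`makeTagFilter` is my invented name, not part of the AusGLAMBlogs code):

```javascript
// Plain-JavaScript equivalent of the underscore-based filter above.
// makeTagFilter is a hypothetical helper, not part of the AusGLAMBlogs code.
function makeTagFilter(filterList) {
  return function (post) {
    // true when any of the post's tags appears in the filter list
    return post.categories.some(function (tag) {
      return filterList.includes(tag);
    });
  };
}

var isHidden = makeTagFilter(['notGLAM', 'notglam', 'Notglam', '#notglam', '#notGLAM']);

var post = { categories: ['libraries', 'notGLAM'] };
if (!isHidden(post)) {
  // all the rest of the RSS ingest logic would go here
}
```

A cricket-blog aggregator would just pass `['notCricket']` to `makeTagFilter` instead.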

The upshot is - if you're writing a post for your usually-GLAM-themed blog and you don't want it to be ingested into AusGLAMBlogs and tweeted, simply include notGLAM as a tag. Simple!

Oh, and thanks to Danielle for the suggestion!

  1. Technically what it actually says is "If the post isn't not about GLAM, do stuff", because "not about GLAM" is an exception to the normal behaviour of the app, and I prefer to put exceptions second in an if / else statement. ↩︎

Delete Forensics / Ed Summers

TL;DR Deleted tweets in a #unitetheright dataset seem to largely be the result of Twitter proactively suspending accounts. Surprisingly, a number of previously identified deletes now appear to be available, which suggests users are temporarily taking their accounts private. Image and video URLs from protected, suspended and deleted accounts/tweets appear to still be available. The same appears to be true of Facebook.

Note: Data Artist Erin Gallagher provided lots of feedback and ideas for what follows in this post. Follow her on Medium to see more of her work, and details about this dataset shortly.

In my last post I jotted down some notes about how to identify deleted Twitter data using the technique of hydration. But, as I said near the end, calling these tweets deletes obscures what actually happened to the tweet. A delete implies that a user has decided to delete their tweet. Certainly this can happen, but the reality is a bit more complicated. Here are the scenarios I can think of (please get in touch if you can think of more):

  1. The user could have decided to protect their account, or take it private. This will result in all their tweets becoming unavailable except to those users who are approved followers of the account.
  2. The user could have decided to delete their account, which has the effect of deleting all of their tweets.
  3. The user account could have been suspended by Twitter because it was identified as a source of spam or abuse of some kind.
  4. If the tweet is not itself a retweet the user could have simply decided to delete the individual tweet.
  5. If the tweet is a retweet then 1, 2, 3 or 4 may have happened to the original tweet.
  6. If the tweet is a retweet and none of 1-4 hold then the user deleted their retweet. The original tweet still exists, but it is no longer marked as retweeted by the given user.

I know, this is like an IRS form from hell right? So how could we check these things programmatically? Let’s take a look at them one by one.

  1. If an account has been protected you can go to the user’s Twitter profile on the web and look for the text “This account’s Tweets are protected.” in the HTML.
  2. If the account has been completely deleted you can go to the user’s Twitter profile on the web and you will get a HTTP 404 Not Found error.
  3. If the account has been suspended, attempting to fetch the user’s Twitter profile on the web will result in a HTTP 302 Found response that redirects to
  4. If the tweet is not a retweet and fetching the tweet on the web results in a HTTP 404 Not Found then the individual tweet has been deleted.
  5. If the tweet is a retweet and one of 1, 2, 3 or 4 happened to the original tweet then that’s why it is no longer available.
  6. If the tweet is a retweet and the original tweet is still available on the web then the user has decided to delete their retweet, or unretweet (I really hope that doesn’t become a word).

With this byzantine logic in hand it’s possible to write a program to do automated lookups on the live web, with some caching to prevent looking up the same information more than once. It is a bit slow because I added a sleep so as not to hit too hard. The script also identifies itself with a link to the program on GitHub in the User-Agent string. I added this program to the utility scripts in the twarc repository.

So I ran it on the #unitetheright deletes I identified previously, and here’s what it found:

Result Count Percent
TWEET_OK 980 5.9%

I think it’s interesting to see that, at least with this dataset, the majority of the deletes were a result of Twitter proactively suspending users because of a tweet that had been retweeted a lot. Perhaps this is the result of Twitter monitoring other users flagging the user’s tweets as abusive or harmful, or blocking the user entirely. I think it speaks well of Twitter’s attempts to try to make their platform a more healthy environment. But of course we don’t know how many tweets ought to have been suspended, so we’re only seeing part of the story–the content that Twitter actually made efforts to address. But they appear to be trying, which is good to see.

Another stat that struck me as odd was the number of tweets that were actually available on the web (TWEET_OK). These are tweets that appeared to be unavailable three days ago when I hydrated my dataset. So in the past three days 980 tweets that appeared to be unavailable have reappeared. Since there’s no trash can on Twitter (you can’t undelete a tweet) that means that the creators of these tweets must have protected their account, and then flipped it back to public. I guess it’s also possible that Twitter suspended them, and then reversed their action. I’ve heard from other people who will protect their account when a tweet goes viral to protect themselves from abuse and unwanted attention, and then turn it back to public again when the storm passes. I think this could be evidence of that happening.

One unexpected thing that I noticed in the process of digging through the results is that even after an account has been suspended it appears that media URLs associated with their tweets still resolve. For example the polNewsForever account was suspended but their profile image still resolves. In fact videos and images that polNewsForever have posted also still seem to resolve. The same is true of actual deletes. I’m not going to reference the images and videos here because they were suspended for a reason. So you will need to take my word for it…or run an experiment yourself…

FWIW, a quick test on Facebook shows that it works the same way. I created a public post with an image, copied the URL for the image, deleted the post, and the image URL still worked. Maybe the content expires in their CDNs at some point? It would be weird if it just lived there forever like a dead neuron. I guess this highlights why it’s important to limit the distribution of the JSON data that contain these URLs.

Since the avatar URLs are still available it’s possible to go through the suspended accounts and look at their avatar images. Here’s what I found:

suspended avatar images

Notice the pattern? They aren’t eggheads, but pretty close. Another interesting thing to note is that 52% of the suspended accounts were created August 11, 2017 or after (the date of the march). So a significant amount of the suspensions look like Twitter trying to curb traffic created by bots.

Open Data Handbook now available in the Nepali Language / Open Knowledge Foundation

On 7 August 2017 Open Knowledge Nepal launched the first version of Nepali Open Data Handbook – An introductory guidebook used by governments and civil society organizations around the world as an introduction and blueprint for open data projects. The book was launched by Mr. Krishna Hari Baskota, Chief Information Commissioner of National Information Commission, Dr. Nama Raj Budhathoki, Executive Director of Kathmandu Living Labs and Mr. Nikesh Balami, CEO of Open Knowledge Nepal at the event organized at Moods Lounge, Bluebird Mall, Kathmandu. Around 50 people working in the different sectors of open data attended the launch program.

The Open Data Handbook has been translated into more than 18 languages including Chinese, French, German, Indonesian, Italian, Japanese, Portuguese, Russian and Spanish, and is now also available in Nepali. At the event a hard copy version of the Open Data Handbook was launched, which included the content from the global Open Data Handbook, licensing terms from the Open Definition, some notable featured open data projects of Nepal, and the process of requesting information from the Nepal government using the Right To Information Act.

Open Knowledge Nepal believes the Nepali version of the Open Data Handbook will work as a perfect resource for government and civil society organizations (CSOs) to expand their understanding of open data and, ultimately, reap its benefits. Speaking at the event,

Mr. Nikesh Balami, CEO of Open Knowledge Nepal said “I believe that this Nepali version of the Open Data Handbook will help government policymakers, leaders, and citizens understand open data in their native language. It will also be a useful resource for CSOs to use for their own open data awareness programs, as well as data journalists who rely on data openness to report on local stories.” He thanked the volunteers who contributed to the translation, feedback, and review of the Handbook.

Mr. Krishna Hari Baskota, Chief Information Commissioner stressed the need for people in government to understand the value of open data. He also remarked that while the Nepal government is already a treasure trove of data, there is a need for more data to be created and made open. He highlighted the journey traveled by the Nepal Government in the path of open data and motivated youths to join the momentum.

Dr. Nama Raj Budhathoki, Executive Director of Kathmandu Living Labs said, “There should be an equal balance between supply and demand side of data and it’s a perfect time for Nepal to shift from Creation to Use”. Dr. Budhathoki shared his experiences of creating open data with OpenStreetMap and household surveys, and acknowledged the need for use of open data.

Open Knowledge Nepal envisions the impact of the Open Data Handbook to be mainly around the four different themes of open data: improving government, empowering citizens, creating opportunity, and solving public problems. To achieve impact within these different themes, solely having a good supply of data is not enough. We also need to ensure that the demand side is strong by increasing innovation, engagement, and reusability of published data. This Handbook will make it easier for government officials and the citizens of Nepal to learn more about open data in their native language. In doing so, it will help create a balanced environment between the supply and demand side of data, which in the long run will help promote and institutionalize transparency, accountability and citizen engagement in Nepal.

Fusion and JavaScript: Shared Scripts, Utility Functions and Unit Tests / Lucidworks


Lucidworks Fusion uses a data pipeline paradigm for both data ingestion (Index Pipelines) and for search (Query Pipelines).  A Pipeline consists of one or more ordered Pipeline Stages.  Each Stage takes input from the previous Stage and provides input to the following Stage. In the Index Pipeline case, the input is a document to be transformed prior to indexing in Apache Solr.

In the Query Pipelines case, the first stages manipulate a Query Request. A middle stage submits the request to Solr and the following stages can be used to manipulate the Query Response.

The out-of-the-box stages included in Lucidworks Fusion let the user perform many common tasks such as field mapping for an Index Pipeline or specialized Facet queries for the Query Pipeline.  However, as described in a previous article, many projects have specialized needs in which the flexibility of the JavaScript stage is needed.

The code snippets in this article have been simplified and shortened for convenience.  The full examples can be downloaded from my GitHub repo.

Taking JavaScript to the Next Level with Shared Scripts, Utility Functions and Unit Tests

Throwing a few scripts into a pipeline to perform some customized lookups or parsing logic is all well and good, but sophisticated ingestion strategies could benefit from some more advanced techniques.

  • Reduce maintenance problems by reusing oft-needed utilities and functions.  Some of the advanced features of the Nashorn JavaScript engine largely eliminate the need to copy/paste code into multiple Pipelines.  Keeping a single copy reduces code maintenance problems.
  • Use a modern IDE for editing.  The code editor in Fusion is functional, but it provides little help with code completion, syntax highlighting, identifying typos, illuminating global variables, or generally speeding development.
  • Use Unit Tests to help reduce bugs and ensure the health of a deployment.

Reusing Scripts

Lucidworks Fusion uses the standard Nashorn JavaScript engine which ships with Java 8.  The load() command, combined with an Immediately Invoked Function Expression (IIFE) allows a small pipeline script to load another script.  This allows common functionality to be shared across pipelines.  Here’s an example:

var loadLibrary = function(url){
    var lib = null;
    try{
        logger.info('\n\n*********\n* Try to load library from: ' + url);
        lib = load(url);// jshint ignore:line
        logger.info('\n\n**********\n* The library loaded from: ' + url);
    } catch(e){
        logger.error('\n\n******\n* The script at ' + url + ' is missing or invalid\n' + e.message);
    }
    return lib;
};

Get Help From an IDE

Any sort of JavaScript functions or objects could be contained in the utilLib.js loaded as shown above.  Below is a simple example of a library containing two handy functions.
Explanatory notes:

  • The wrapping structure i.e. (function(){…}).call(this); makes up the IIFE structure used to encapsulate the  util object.  While this is not strictly necessary, it provides a syntax easily understood by the IntelliJ IDE.
  • The globals comment at the top, as well as the jshint comment at the bottom, are hints to the JSHint code validation engine used in the IDE.  These suppress error conditions resulting from the Nashorn load() functionality and global variables set by the Java environment which invokes the JavaScript Pipeline Stage.
  • The IDE will have underlined potentially illegal code in red. The result is an opportunity to fix typos without having to repeatedly test-load the script and hunt through a log file only to find a cryptic error message from the Nashorn engine.  Also, note the use of the “use strict” directive.  This tells JSHint to also look for things like the inadvertent declaration of global variables.
/* globals  Java,arguments*/
(function(){
    "use strict";
    var util = {};

    util.isJavaType = function(obj){
        return (obj &&
            typeof obj.getClass === 'function' &&
            typeof obj.notify === 'function' &&
            typeof obj.hashCode === 'function');
    };

    /**
     * For Java objects, return the short name,
     * e.g. 'String' for a java.lang.String
     * JavaScript objects, usually use lower case.
     * e.g. 'string' for a JavaScript String
     */
    util.getTypeOf = function getTypeOf(obj){
        'use strict';
        var typ = 'unknown';
        //test for java objects
        if( util.isJavaType(obj)){
            typ = obj.getClass().getSimpleName();
        }else if (obj === null){
            typ = 'null';
        }else if (typeof(obj) === typeof(undefined)){
            typ = 'undefined';
        }else if (typeof(obj) === typeof(String())){
            typ = 'string';
        }else if (Array.isArray(obj)) {
            typ = 'array';
        }else if ( === '[object Date]'){
            typ = 'date';
        }else {
            typ = obj ? typeof(obj) : typ;
        }
        return typ;
    };

    //return util to make it publicly accessible
    return util;
}).call(this); // jshint ignore: line

Overview of Utility Functions

Here is a summary description of some of the utility functions included in utilLib.js

index.concatMV(doc, fieldName, delim) Return a delimited String containing the distinct values for a given field. If the names field contains values for ‘James’, ‘Jim’, ‘Jamie’, and ‘Jim’, calling index.concatMV(doc, ‘names’, ‘, ‘) would return “James, Jim, Jamie”

index.getFieldNames(doc, pattern) Return an array of field names in doc which match the pattern regular expression.

index.trimField(doc, fieldName) Remove all whitespace from all values of the field specified.  Leading and trailing whitespace is truncated and redundant whitespace within values is replaced with a single space.

util.concat(varargs) Here varargs can be one or more arguments of String or String[].  They will all be concatenated into a single String and returned.

util.dateToISOString(date) Convert a Java Date or JavaScript Date into an ISO 8601 formatted String.

util.dedup(arr) Remove redundant elements in an array.

util.decrypt(toDecrypt) Decrypt an AES encrypted String.

util.encrypt(toEncrypt) Encrypt a string with AES encryption.

util.getFusionConfigProperties() Read in the default Fusion config/ file and return it as a Java Properties object.

util.isoStringToDate(dateString) Convert an ISO 8601 formatted String into a Java Date.

util.queryHttp2Json(url) Perform an HTTP GET on a URL and parse the response into JSON.

util.stripTags(markupString) Remove markup tags from an HTML or XML string.

util.truncateString(text, len, useWordBoundary) Truncate text to a length of len.  If useWordBoundary is true break on the word boundary just before len.
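As an illustration, two of the simpler utilities above could be implemented along these lines. This is a sketch of plausible implementations, not the exact code shipped in utilLib.js:

```javascript
// Illustrative implementations; the shipped utilLib.js versions may differ.
var util = {};

// util.dedup(arr): remove redundant elements, preserving first-seen order.
util.dedup = function (arr) {
  var seen = {};
  return arr.filter(function (el) {
    var key = String(el);
    if (seen[key]) { return false; }
    seen[key] = true;
    return true;
  });
};

// util.truncateString(text, len, useWordBoundary): cut text to len
// characters, optionally backing up to the last word boundary before
// the cut so no word is split in half.
util.truncateString = function (text, len, useWordBoundary) {
  if (text.length <= len) { return text; }
  var cut = text.substring(0, len);
  if (useWordBoundary) {
    var lastSpace = cut.lastIndexOf(' ');
    if (lastSpace > 0) { cut = cut.substring(0, lastSpace); }
  }
  return cut;
};
```

Both are plain ES5, so they run unchanged under the Nashorn engine that executes Fusion's JavaScript stages.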

Testing the Code

Automated unit testing of Fusion stages can be complicated.  Unit testing shared utility functions intended for use in Fusion stages is even more difficult.  A full test harness is beyond the scope of this blog, but the essentials can be accomplished with the command-line curl utility or a REST client like Postman.

  • Start with a well-known state in the form of a pre-made PipelineDocument. To see an example of the needed JSON, look at what is produced by the Logging Stage which comes with Fusion.
  • POST the PipelineDocument to Fusion using the Index Pipelines API.  You will need to pass an ID and collection name as parameters, as well as the trailing “/index” path, in order to invoke the pipeline.
  • The POST operation should return the document as modified by the pipeline.  Inspect it and signal Pass or Fail events as needed.

Unit tests can also be performed manually by running the Pipeline within Fusion.  This could be part of a Workbench simulation or an actual Ingestion/Query operation.  The utilLib.js contains a rudimentary test harness for executing tests and comparing the results to an expected String value.  The results of tests are written both to the connections.log or api.log as well as being pushed into the Stage’s context map in the _runtime_test_results element as shown below.  The first test runs util.dedup(‘a’, ‘b’, ‘c’, ‘a’, ‘b’) and shows that the results do not contain the duplicates.  Other common tests are also performed.  For complete details see the index.runTests() function in utilLib.js.


This article demonstrates how to load shareable JavaScript into Fusion’s Pipeline Stages so that common functions can be shared across pipelines.  It also contains several handy utility functions which can be used as-is or as building blocks in more complex data manipulations.  Additionally, ways to avoid common pitfalls such as JavaScript syntax typos and unintended global variables were shown.  Finally, a Pipeline Simulation was run and the sample unit-test results were shown.


Special thanks to Carlos Valcarcel and Robert Lucarini of Lucidworks as well as Patrick Hoeffel and Matt Kuiper at Polaris Alpha for their help and sample scripts.

The post Fusion and JavaScript: Shared Scripts, Utility Functions and Unit Tests appeared first on Lucidworks.

PubMed Lets Google Track User Searches / Eric Hellman

CT scan of a Mesothelioma patient.
CC BY-SA by Frank Gaillard
If you search on Google for "Best Mesothelioma Lawyer" and then click on one of the ads, Google can earn as much as a thousand dollars for your click. In general, Google can make a lot of money if it knows you're the type of user who's interested in rare types of cancer. So you might be surprised that Google gets to know everything you search for when you use PubMed, the search engine offered by the National Center for Biotechnology Information (NCBI), a service of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). Our tax dollars work really hard and return a lot of value at NCBI, but I was surprised to discover Google's advertising business is getting first crack at that value!

You may find this hard to believe, but you shouldn’t take my word for it. Go and read the NLM Privacy Policy, in particular the section on “Demographic and Interest Data”:
On some portions of our website we have enabled Google Analytics and other third-party software (listed below), to provide aggregate demographic and interest data of our visitors. This information cannot be used to identify you as an individual. While these tools are used by some websites to serve advertisements, NLM only uses them to measure demographic data. NLM has no control over advertisements served on other websites.
DoubleClick: NLM uses DoubleClick to understand the characteristics and demographics of the people who visit NLM sites. Only NLM staff conducts analyses on the aggregated data from DoubleClick. No personally identifiable information is collected by DoubleClick from NLM websites. The DoubleClick Privacy Policy is available at
You can opt-out of receiving DoubleClick advertising at
I will try to explain what this means and correct some of the misinformation it contains.

DoubleClick is Google's display advertising business. DoubleClick tracks users across websites using "cookies" to collect "demographic and interest information" about users. DoubleClick uses this information to improve its ad targeting. So for example, if a user's web browsing behavior suggests an interest in rare types of cancer, DoubleClick might show the user an ad about mesothelioma. All of this activity is fully disclosed in the DoubleClick Privacy Policy, which approximately 0% of PubMed's users have actually read. Despite what the NLM Privacy Policy says, you can't opt-out of receiving DoubleClick Advertising, you can only opt out of DoubleClick Ad Targeting. So instead of Mesothelioma ads, you'd probably be offered deals at

It's interesting to note that before February 21 of this year, there was no mention of DoubleClick in the privacy policy (see the previous policy ). Despite the date, there's no reason to think that the new privacy policy is related to the change in administrations, as NIH Director Francis Collins was retained in his position by President Trump. More likely it's related to new leadership at NLM. In August of 2016, Dr. Patricia Flatley Brennan became NLM director. Dr. Brennan, a registered nurse and an engineer, has emphasized the role of data to the Library's mission. In an interview with the Washington Post, Brennan noted:
In the 21st century we’re moving into data as the basis. Instead of an experiment simply answering a question, it also generates a data set. We don’t have to repeat experiments to get more out of the data. This idea of moving from experiments to data has a lot of implications for the library of the future. Which is why I am not a librarian.
The "demographic and interest data" used by NLM is based on individual click data collected by Google Analytics. As I've previously written, Google Analytics  only tracks users across websites if the site-per-site tracker IDs can be connected to a global tracker ID like the ones used by DoubleClick. What NLM is allowing Google to do is to connect the Google Analytics user data to the DoubleClick user data. So Google's advertising business gets to use all the Google Analytics data, and the Analytics data provided to NLM can include all the DoubleClick "demographic and interest" data.

What information does Google receive when you do a search on Pubmed?
For every click or search, Google's servers receive:
  • your search term and result page URL
  • your DoubleClick user tracking ID
  • your referring page URL
  • your IP address
  • your browser software and operating system
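The items in that list map directly onto the parameters of the hit that the Analytics tracker sends back to Google. The parameter names below follow Google's published Measurement Protocol (`v`, `tid`, `cid`, `t`, `dl`, `dr`); the tracking ID and the helper itself are placeholders for illustration, not what PubMed's pages actually emit:

```javascript
// Sketch of a GA Measurement Protocol pageview hit for a PubMed search.
// The tracking ID is a placeholder; the search term rides along inside
// the page URL (dl), and cid is the per-browser tracking identifier.
function buildAnalyticsHit(searchTerm, clientId) {
  var params = {
    v: '1',                                    // protocol version
    tid: 'UA-00000000-1',                      // placeholder tracking ID
    cid: clientId,                             // user tracking ID (cookie)
    t: 'pageview',                             // hit type
    dl: '' +
        encodeURIComponent(searchTerm),        // page URL with search term
    dr: ''           // referring page
  };
  return '' +
    Object.keys(params).map(function (k) {
      return k + '=' + encodeURIComponent(params[k]);
    }).join('&');
}
```

The browser also attaches the IP address and User-Agent implicitly with the request itself, which is how the remaining items on the list reach Google's servers.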
While "only NLM staff conducts analyses on the aggregated data from DoubleClick", the DoubleClick tracking platform analyzes the unaggregated data from PubMed. And while it's true that "the demographic and interest data" of PubMed visitors cannot be used to identify them as  individuals, the data collected by the Google trackers can trivially be used to identify as individuals any PubMed users who have Google accounts. Last year, Google changed its privacy policy to allow it to associate users' personal information with activity on sites like PubMed.
"Depending on your account settings, your activity on other sites and apps may be associated with your personal information in order to improve Google’s services and the ads delivered by Google.
So the bottom line is that Google's stated policies allow Google to associate a user's activity on PubMed with their personal information. We don't know if Google makes use of PubMed activity or if the data is saved at all, but NLM's privacy policy is misleading at best on this fact.

Does this matter? I have written that commercial medical journals deploy intense advertising trackers on their websites, far in excess of what NLM is doing. "Everybody" does it. And  we know that agencies of the US government spend billions of dollars sifting through web browsing data looking for terrorists, so why should NLM be any different? So what if Google gets a peek at PubMed user activity - they see such a huge amount of user data that PubMed is probably not even noticeable.

Google has done some interesting things with search data. For example, the “Google Flu Trends” and “Google Dengue Trends” projects studied patterns of searches for illness-related terms. Google could use PubMed searches for similar investigations into health provider searches.

The puzzling thing about NLM's data surrender is the paltry benefit it returns. While Google gets un-aggregated, personally identifiable data, all NLM gets is some demographic and interest data about their users. Does NLM really want to better know the age, gender, and education level of PubMed users??? Turning on the privacy features of Google Analytics (i.e. NOT turning on DoubleClick) has a minimal impact on the usefulness of the usage data it provides.

Lines need to be drawn somewhere. If Google gets to use PubMed click data for its advertising, what comes next? Will researchers be examined as terror suspects if they read about nerve toxins or anthrax? Or perhaps inquiries into abortifacients or gender-related hormone therapies will become politically suspect. Perhaps someone will want a list of people looking for literature on genetically modified crops, or gun deaths, or vaccines? Libraries should not be going there.

So let's draw the line at advertising trackers in PubMed. PubMed is not something owned by a publishing company,  PubMed belongs to all of us. PubMed has been a technology leader worthy of emulation by libraries around the world. They should be setting an example. If you agree with me that NLM should stop letting Google track PubMed Users, let Dr. Brennan know (politely, of course.)

  1. You may wonder if the US government has a policy about using third party services like Google Analytics and DoubleClick. Yes, there is a policy, and NLM appears to be pretty much in compliance with that policy.
  2. You might also wonder if Google has a special agreement for use of its services on US government websites. It does, but that agreement doesn't amend privacy policies. And yes, the person signing that policy for Google subsequently became the third CTO of the United States.
  3.  I recently presented a webinar which covered the basics of advertising in digital libraries in the National Network of Libraries of Medicine [NNLM] "Kernal of Knowledge" series.
  4. (8/16) Yes, this blog is served by Google. So if you start getting ads for privacy plug-ins...
  5. (8/16) is a tool you can use to see what goes on under the cover when you search on PubMed. Tip from Gary Price.

Jobs in Information Technology: August 16, 2017 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

University of North Florida-Thomas G. Carpenter Library, Online Learning Librarian, Jacksonville, FL

Miami University Libraries, Web Services Librarian, Oxford, OH

John M. Pfau Library, CSU, San Bernardino, Information Technology Librarian, San Bernardino, CA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Age of Asymmetries / Dan Cohen

Cory Doctorow’s 2008 novel Little Brother traces the fight between hacker teens and an overactive surveillance state emboldened by a terrorist attack in San Francisco. The novel details in great depth the digital tools of the hackers, especially the asymmetry of contemporary cryptography. Simply put, today’s encryption is based on mathematical functions that are really easy in one direction—multiplying two prime numbers to get a large number—and really hard in the opposite direction—figuring out the two prime numbers that were multiplied together to get that large number.
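The asymmetry is easy to feel even at toy scale: multiplying two primes is a single step, while recovering them by trial division takes time that grows with the size of the smaller factor. This is a toy sketch of that gap (real cryptography uses primes hundreds of digits long, far beyond trial division):

```javascript
// Toy illustration of the easy/hard asymmetry in the post.
function multiply(p, q) {
  return p * q; // the easy direction: one multiplication
}

function factor(n) {
  // the hard direction: naive trial division, up to sqrt(n) steps
  for (var d = 2; d * d <= n; d++) {
    if (n % d === 0) { return [d, n / d]; }
  }
  return [n, 1]; // n is prime
}
```

Even for two five-digit primes, `factor` must grind through on the order of a hundred thousand candidate divisors to undo what `multiply` did in one operation, and the gap widens rapidly as the primes grow.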

Doctorow’s speculative future also contains asymmetries that are more familiar to us. Terrorist attacks are, alas, all too easy to perpetrate and hard to prevent. On the internet, it is easy to be loud and to troll and to disseminate hate, and hard to counteract those forces and to more quietly forge bonds.

The mathematics of cryptography are immutable. There will always be an asymmetry between that which is easy and that which is hard. It is how we address the addressable asymmetries of our age, how we rebalance the unbalanced, that will determine what our future actually looks like.

PlayStation and Lucene: Indexing 1M Docs per Second on 18 Servers / Lucidworks

As we countdown to the annual Lucene/Solr Revolution conference in Las Vegas next month, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Sony Interactive Entertainment’s Alexander Filipchik’s talk, “PlayStation and Lucene: Indexing 1M Docs per Second on 18 Servers”.

PlayStation 4 is not just a gaming console. The PlayStation Network is a system that handles more than 70 million active users and, in order to create an awesome gaming experience, has to support personalized search at scale. The systems that provide this personalized experience index up to 1M documents per second using Lucene on only 18 mid-sized Amazon instances.  This talk covers how the PlayStation team personalizes search for their users at scale with Lucene.


Join us at Lucene/Solr Revolution 2017, the biggest open source conference dedicated to Apache Lucene/Solr on September 12-15, 2017 in Las Vegas, Nevada. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post PlayStation and Lucene: Indexing 1M Docs per Second on 18 Servers appeared first on Lucidworks.

Message from Reshma Saujani, founder of Girls Who Code / District Dispatch

Reshma Saujani, the founder and CEO of the national non-profit organization Girls Who Code, has taught computing skills to and inspired more than 10,000 girls across America. At the opening general session of the 2017 ALA Annual Conference this past June, Reshma spoke about Girls Who Code, how they are working to teach 100,000 girls to code by the end of 2018, and the organization’s many intersections with libraries.

Reshma is motivated to make sure that libraries – especially those who are interested in developing coding resources and programs – know about her free resources. As you will read in her message below, she invites ALA members and advocates to join the Girls Who Code movement.

To request a free Girls Who Code Starter Kit, including tips for leaders, giveaways and more, email:

I’m Reshma Saujani, the CEO & Founder of Girls Who Code, the national nonprofit working to close the gender gap in tech.

Computing skills are the most sought-after in the US job market, but girls across the US are being left behind. Today, less than a quarter of computing jobs are held by women, and that number is declining.

First off, I am not a coder. My background is as a lawyer and politician. In 2010, I was the first South Asian-American woman to run for Congress. When I was running for office, I spent a lot of time visiting schools. That’s when I noticed something. In every computer lab, I saw dozens of boys learning to code and training to be tech innovators. But there were barely any girls! This didn’t seem right to me. I did some research and learned that by 2020, there will be 1.4 million open jobs in computing, but fewer than 1 in 5 computer science graduates are women. With women making up almost half of our work force, it’s imperative for our economy that we’re preparing our girls for the future of work.

I decided I was going to teach girls to code and close the gender gap in tech. What started as an experiment with 20 girls in a New York City classroom has grown to a movement of 40,000 middle and high school girls across the states.

In 2017, we’re expanding our movement with the launch of a 13-book series as an invitation for girls everywhere to learn to code and change the world. These books include explanations of computer science concepts using real life examples; relatable characters and profiles of women in tech. It’s one of the first times that the story of computer science has been told through so many girls’ voices. We’re doing this because literary representation matters; one of the best ways to spark girls’ interest is to share stories of girls who look like them. When you teach girls to code, they become change agents and can build apps, programs, and movements to help tackle our country’s toughest problems.

With these books and our Clubs Program, Girls Who Code is seeking to teach 100,000 girls to code by the end of 2018. Clubs are free after-school programs for girls to use computer science to impact their community and join our sisterhood of supportive peers and role models. Clubs are led by Facilitators, who can be librarians, teachers, computer scientists, parents or volunteers from any background or field. Many Facilitators have no technical experience and learn to code alongside their Club members.

We hope you’ll join our movement by bringing these books and a Club to your library.

The post Message from Reshma Saujani, founder of Girls Who Code appeared first on District Dispatch.

ES6 export/import / Alf Eaton, Alf

Without default export

utils.js (exporting)

export const foo = () => 'ooo'
export const bar = () => 'xxx'

or, equivalently:

const foo = () => 'ooo'
const bar = () => 'xxx'

export { foo, bar }

app.js (importing)

import { foo, bar } from './utils'

or import everything as a namespace object:

import * as utils from './utils'

index.js (re-exporting)

export * from './utils'

With default export

utils.js (exporting)

export const bar = () => 'xxx'
export default () => 'ooo'

app.js (importing)

import foo, { bar } from './utils'

index.js (re-exporting)

export { default as foo, bar } from './utils'

or, to re-export only the default export:

export { default } from './utils' // `export default from './utils'` also works, but only via a Babel proposal plugin

OpenSpending platform update / Open Knowledge Foundation


OpenSpending is a free, open and global platform to search, visualise, and analyse fiscal data in the public sphere. This week, we soft launched an updated technical platform, with a newly designed landing page. Until now dubbed “OpenSpending Next”, this is a completely new iteration on the previous version of OpenSpending, which has been in use since 2011.

Landing page at


At the core of the updated platform is Fiscal Data Package. This is an open specification for describing and modelling fiscal data, and has been developed in collaboration with GIFT. Fiscal Data Package affords a flexible approach to standardising fiscal data, minimising constraints on publishers and source data via a modelling concept, and enabling progressive enhancement of data description over time. We discuss this in more detail below.

From today:

  • Publishers can get started publishing fiscal data with the interactive Packager, and explore the possibilities of the platform’s rich API, advanced visualisations, and options for integration.
  • Hackers can work on a modern stack designed to liberate fiscal data for good! Start with the docs, chat with us, or just start hacking.
  • Civil society can access a powerful suite of visualisation and analysis tools, running on top of a huge database of open fiscal data. Discover facts, generate insights, and develop stories. Talk with us to get started.

All the work that went into this new version of OpenSpending was only made possible by our funders along the way. We want to thank Hewlett, Adessium, GIFT, and the consortium for helping fund this work.

As this is now completely public, replacing the old OpenSpending platform, we do expect some bugs and issues. If you see anything, please help us by opening a ticket on our issue tracker.


The updated platform has been designed primarily around the concept of centralised data, decentralised views: we aim to create a large, and comprehensive, database of fiscal data, and provide various ways to access that data for others to build localised, context-specific applications on top. The major features of relevance to this approach are described below.

Fiscal Data Package

As mentioned above, Fiscal Data Package affords a flexible approach to standardising fiscal data. Fiscal Data Package is not a prescriptive standard, and imposes no strict requirements on source data files.

Instead, users “map” source data columns to “fiscal concepts”, such as amount, date, functional classification, and so on, so that systems that implement Fiscal Data Package can process a wide variety of sources without requiring change to the source data formats directly.

A minimal Fiscal Data Package only requires mapping an amount and a date concept. There are a range of additional concepts that make fiscal data usable and useful, and we encourage the mapping of these, but do not require them for a valid package.

Based on this general approach to specifying fiscal data with Fiscal Data Package, the updated OpenSpending likewise imposes no strict requirements on naming of columns, or the presence of columns, in the source data. Instead, users (of the graphical user interface, and also of the application programming interfaces) can provide any source data, and iteratively create a model on top of that data that declares the fiscal measures and dimensions.
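As a concrete sketch of this idea, a minimal model mapping just an amount and a date concept to two source columns might be declared as below. The property names and nesting here are illustrative assumptions only; the normative schema is defined by the Fiscal Data Package specification.

```python
import json

# Illustrative sketch: a source "expenditure" column mapped to the
# required amount concept, and a "fiscal_year" column mapped to the
# date concept. Property names are assumptions, not the exact spec.
descriptor = {
    "name": "city-budget-2017",
    "model": {
        "measures": {
            "amount": {"source": "expenditure", "currency": "USD"},
        },
        "dimensions": {
            "date": {"attributes": {"year": {"source": "fiscal_year"}}},
        },
    },
}

print(json.dumps(descriptor, indent=2))
```

Because only the mapping lives in the descriptor, the source CSV itself never needs to be renamed or restructured.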



The Packager is the user-facing app that is used to model source data into Fiscal Data Packages. Using the Packager, users first get structural and schematic validation of the source files, ensuring that data to enter the platform is validly formed, and then they can model the fiscal concepts in the file, in order to publish the data. After initial modelling of data, users can also remodel their data sources for a progressive enhancement approach to improving data added to the platform.


The Explorer is the user-facing app for exploration and discovery of data available on the platform.


The Viewer is the user-facing app for building visualisations around a dataset, with a range of options, for presentation, and embedding views into 3rd party websites.


The DataMine is a custom query interface powered by Re:dash for deep investigative work over the database. We’ve included the DataMine as part of the suite of applications as it has proved incredibly useful when working in conjunction with data journalists and domain experts, and also for doing quick prototype views on the data, without the limits of API access, as one can use SQL directly.



The Datastore is a flat file datastore with source data stored in Fiscal Data Packages, providing direct access to the raw data. All other databases are built from this raw data storage, providing us with a clear mechanism for progressively enhancing the database as a whole, as well as building on this to provide such features directly to users.

Analytics and Search

The Analytics API provides a rich query interface for datasets, and the search API provides exploration and discovery capabilities across the entire database. At present, search only goes over metadata, but we have plans to iterate towards full search over all fiscal data lines.

Data Importers

Data Importers are based on a generic data pipelining framework developed at Open Knowledge International called Data Package Pipelines. Data Importers enable us to do automated ETL to get new data into OpenSpending, including the ability to update data from the source at specified intervals.

We see Data Importers as key functionality of the updated platform, allowing OpenSpending to grow well beyond the one thousand plus datasets that have been uploaded manually over the last five or so years, towards tens of thousands of datasets. A great example of how we’ve put Data Importers to use is in the EU Structural Funds data that is part of the Subsidy Stories project.


It is slightly misleading to announce the launch today, when we’ve in fact been using and iterating on OpenSpending Next for almost two years. Some highlights from that process that led to the platform we have today are as follows.

With Adessium

Adessium provided Open Knowledge International with funding towards fiscal transparency in Europe, which enabled us to build out significant parts of the technical platform, commission work with J++ on agricultural subsidies, and engage in a productive collaboration with Open Knowledge Germany on what became, which even led to another initiative from Open Knowledge Germany called The Story Hunt.

This work directly contributed to the technical platform by providing an excellent use case for the processing of a large, messy amount of source data into a normalised database for analysis, and doing so while maintaining data provenance and the reproducibility of the process. There is much to do in streamlining this workflow, but the benefits, in terms of new use cases for the data, are extensive.

We are particularly excited by this work, and the potential to continue in this direction, by building out a deep, open database as a potential tool for investigation and telling stories with data.

Via Horizon 2020

As part of the consortium, we were able to both build out parts of the technical platform, and have a live use case for the modularity of the general architecture we followed. A number of components from the core OpenSpending platform have been deployed into the platform with little to no modification, and the analytical API from OpenSpending was directly ported to run on top of a triple store implementation of the data model.

An excellent outcome of this project has been the close and fruitful work with both Open Knowledge Germany and Open Knowledge Greece on technical, community, and journalistic opportunities around OpenSpending, and we plan for continuing such collaborations in the future.

Work on Fiscal Data Package with GIFT

Over three phases of work since 2015 (the third phase is currently running), we’ve been developing Fiscal Data Package as a specification to publish fiscal data against. Over this time, we’ve done extensive testing of the specification against a wide variety of data in the wild, and we are iterating towards a v1 release of the specification later this year.

We’ve also been piloting the specification, and OpenSpending, with national governments. This has enabled extensive testing of both the manual modeling of data to the specification using the OpenSpending Packager, and automated ETL of data into the platform using the Data Package Pipelines framework.

This work has provided the opportunity for direct use by governments of a platform we initially designed with civil society and civic tech actors in mind. We’ve identified difficulties and opportunities in this arena at both the implementation and the specification level, and we look forward to continuing this work and solving use cases for users inside government.


Many people have been involved in building the updated technical platform. Work started back in 2014 with an initial architectural vision articulated by our peers Tryggvi Björgvinsson and Rufus Pollock. The initial vision was adapted and iterated on by Adam Kariv (Technical Lead) and Sam Smith (UI/X), with Levko Kravets, Vitor Baptista, and Paul Walsh. We reused and enhanced code from Friedrich Lindenberg. Lazaros Ioannidis and Steve Bennett made important contributions to the code and the specification respectively. Diana Krebs, Cecile Le Guen, Vitoria Vlad and Anna Alberts have all contributed with project management, and feature and design input.

What’s next?

There is always more work to do. In terms of technical work, we have a long list of enhancements.
However, while the work we’ve done in recent years has been very collaborative with our specific partners, and always towards identified use cases and user stories in the partnerships we’ve been engaged in, it has not, in general, been community facing. In fact, a noted lack of community engagement goes back to before we started on the new platform we are launching today. This has to change, and it will be an important focus moving forward. Please drop by our forum with any feedback, questions, and comments.

Using the Global Open Data Index to strengthen open data policies: Best practices from Mexico / Open Knowledge Foundation

This is a blog post coauthored with Enrique Zapata, of the Mexican National Digital Strategy.

As part of the last Global Open Data Index (GODI), Open Knowledge International (OKI) decided to add a dialogue phase, where we invited individuals, CSOs, and national governments to exchange points of view and knowledge about the data, and to understand data publication in a more useful way.

In this process, we had a number of valuable exchanges that we tried to capture in our report about the state of open government data in 2017, as well as the records in the forum. Additionally, we decided to highlight the dialogue process between the government and civil society in Mexico and their results towards improving data publication in the executive authority, as well as funding to expand this work to other authorities and improve the GODI process. Here is what we learned from the Mexican dialogue:

The submission process

During this stage, GODI tries to directly evaluate how easy it is to find the data, and its quality in general. To achieve this, civil society and government actors discussed how best to submit, and agreed to submit jointly, based on the actual availability of the data.

Besides creating an open space to discuss open data in Mexico and agreeing on a joint submission process, this exercise showed some room for improvement in the characteristics that GODI measured in 2016:

  • Open licenses: In Mexico and many other countries, the licenses are linked to datasets through open data platforms. This showed some discrepancies with the sources referenced by the reviewers, since the data could be found on different sites where the application of the license was not clear.
  • Data findability: Most of the requested datasets assessed in GODI are the responsibility of the federal government and are available in Nevertheless, the titles used to identify the datasets are based on technical regulation needs, which makes it difficult for data users to easily reach the data.
  • Differences of government levels and authorities: GODI assesses national governments but some of these datasets – such as land rights or national laws – are in the hands of other authorities or local governments. This meant that some datasets can’t be published by the federal government since it’s not in their jurisdiction and they can’t make publication of these data mandatory.

Open dialogue and the review process

During the review stage, and taking this feedback into account, the Open Data Office of the National Digital Strategy worked on some of these points. It then convened a new session with civil society, including representatives from the Open Data Charter and OKI, in order to:

  • Agree on the state of the data in Mexico according to GODI characteristics;
  • Show the updates and publication of data requested by GODI;
  • Discuss paths to publish data that is not the responsibility of the federal government;
  • Converse about how they could continue to strengthen the Mexican Open Data Policy.


The results

As a result of this dialogue, we agreed on six actions that could be implemented internationally beyond the Mexican context, both by governments with centralised open data repositories and by those that don’t centralise their data, as well as a way to improve the GODI methodology:

  1. Open dialogue during the GODI process: Mexico was the first country to develop a structured dialogue to agree with open data experts from civil society about submissions to GODI. The Mexican government will seek to replicate this process in future evaluations and include new groups to promote open data use in the country. OKI will take this experience into account to improve the GODI processes in the future.
  2. Open licenses by default: The Mexican government is reviewing and modifying their regulations to implement the terms of Libre Uso MX for every website, platform and online tool of the national government. This is an example of good practice which OKI have highlighted in our ongoing Open Licensing research.
  3. “GODI” data group in CKAN: Most data repositories allow users to create thematic groups. In the case of GODI, the Mexican government created the “Global Open Data Index” group in This will allow users to access these datasets based on their specific needs.
  4. Create a link between government-built visualisation tools and The visualisations and reference tools tend to be the first point of contact for citizens. For this reason, the Mexican government will have new regulations in their upcoming Open Data Policy so that any new development includes visible links to the open data they use.
  5. Multiple access points for data: In August 2018, the Mexican government will launch a new section on to provide non-technical users easy access to valuable data. This data, called ‘Infraestructura de Datos Abiertos MX’, will be divided into five easy-to-explore and understand categories.
  6. Common language for data sets: Government naming conventions aren’t the easiest to understand and can make it difficult to access data. The Mexican government has agreed to change the names to use more colloquial language, which can help data findability and promote use. Where this is not possible for a dataset, the government will go for an option similar to the one established in point 5.

We hope these changes will be useful for data users as well as other governments who are looking to improve their publication policies. Got any other ideas? Share them with us on Twitter by messaging @OKFN or send us an email to


#NoFilter: Creating Compelling Visual Content for Social Media / LITA

The #NoFilter series has as its focus the numerous challenges that surround social media and its use in the library. In previous posts, I discussed sources for content inspiration as well as tips for content planning. This entry will concentrate on creating compelling visual content for your library’s social media.

A strong visual component for a social media post is imperative for capturing the attention of users, bringing them into dialogue with the library, and forming the relationships that are key to institutional social media success. Social media is not a one-way self-promotional tool for a library, but rather an interactive space allowing a library to engage meaningfully with users, cultivate their support and kindle their enthusiasm for the library’s work. Quality visual content in a social media post has the potential to spur conversations with and among users, who in turn share the library’s content with ever-wider audiences.

Below are three tips for generating compelling visual content for your library’s social media posts:

1915 honey bread recipe

Recipe card created using Canva for a Honey Bread Recipe from 1915. The card was shared on the Othmer Library’s Pinterest and Tumblr where it elicited numerous user responses.

  1. Craft an aesthetically pleasing design using Canva, the user-friendly web-based graphic design service. Utilizing one of Canva’s social media templates makes the creation process all that more efficient. If graphic design is not your forte, you can complete Canva’s Design Essentials tutorials which cover everything from fonts and colors to images and backgrounds.
  2. Assemble an infographic to display information in a captivating way. Canva makes this easy with its infographic template. PowerPoint can also be used for this purpose. The Penn Libraries provide excellent instructions for this process on its Infographics Guide.
  3. Bring a static photo or illustration to life with an animated GIF (short for Graphics Interchange Format). At the Othmer Library of Chemical History, we employ Photoshop Elements to create GIFs of images in our rare books and archives. Does using Photoshop seem intimidating to you? The Smithsonian Libraries offer some useful tips and tricks in their 2014 blog post: Library Hacks: Creating Animated GIFs. My colleague also created a handy step-by-step guide for GIF-making: Animated GIF Basics.


What types of visual content do you share on your library’s social media? Do you have any tips for creating compelling visuals? Share them in the comments below!

How to do text mining in 69 words / Eric Lease Morgan

Doing just about any type of text mining is a matter of: 0) articulating a research question, 1) acquiring a corpus, 2) cleaning the corpus, 3) coercing the corpus into a data structure one’s software can understand, 4) counting & tabulating characteristics of the corpus, and 5) evaluating the results of Step #4. Everybody wants to do Steps #4 & #5, but the initial steps usually take more time than desired.
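Steps #2 through #4 can be sketched in a few lines of Python; the toy two-document corpus below stands in for a real one:

```python
import re
from collections import Counter

# A miniature corpus: in practice this would be acquired in Step #1.
corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]

def tokenize(text):
    """Step #2/#3: lowercase and split into word tokens (crude cleaning)."""
    return re.findall(r"[a-z']+", text.lower())

# Coerce the corpus into a structure the software understands (Step #3)
# and count & tabulate term frequencies (Step #4).
counts = Counter()
for document in corpus:
    counts.update(tokenize(document))

for word, n in counts.most_common(3):
    print(word, n)
```

Step #5, evaluating the tabulation against the research question, is the part no library can do for you.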


FCC extends Net Neutrality public comment period to August 30 / District Dispatch

FCC building in Washington, D.C.

On Friday, the FCC announced it would extend the public comment period on its proposal to roll back a 2015 order protecting net neutrality for an additional two weeks, to August 30. This phase of the process is supposed to allow for “replies” to arguments raised by other commenters.

With close to 20 million comments in the public record so far, any additional time is useful. It’s worth noting, however, that many advocates have called for the FCC to release the consumer complaints received since the 2015 Open Internet Order went into effect and all documents related to the ombudsperson’s interactions with internet users. The comment extension, while welcome, does not address the fact the FCC has yet to make public more than 40,000 net neutrality complaints that could provide direct and relevant evidence in response to numerous questions that the FCC poses in this proceeding.

The extra time means more opportunities for the library community to engage. Even if you have already submitted comments, you can do so again “on reply.” Here are a few easy strategies:

  • Submit a comment amplifying the library and higher education principles for an open internet.
  • You can cite to specific examples or arguments in the initial comments submitted by ALA and allies earlier in the proceeding.
  • Thousands of librarians and library staff from across the country have filed comments on their own or via the ALA’s action alert. Members of the library community called on the FCC to keep the current net neutrality rules and shared their worries that the internet with “slow lanes” would hurt libraries and the communities they serve. The comments below offer a few examples and may help with your comments:
    • The New Jersey Library Association submits: “Abandoning net neutrality in favor of an unregulated environment where some content is prioritized over other content removes opportunities for entrepreneurs, students and citizens to learn, grow and participate in their government. It will further enhance the digital divide and severely inhibit the ability of our nation’s libraries to serve those on both sides of that divide.”
    • “If net neutrality is to be abolished, then our critical online services could be restricted to ‘slow lanes’ unless we pay a premium,” wrote John, a public library employee in Georgia. “These include our job and career gateway, language learning software, grant finding, medical information, ebooks, and test preparation guides, such as for the GED and ASVAB. Ending net neutrality would hurt the people who need equal access the most. These people use our career gateway to find jobs, our grant finder to support their businesses and nonprofits, and use our test aids to earn their GED or get into the military. If we were forced to pay a premium to access these resources, it will limit our ability to fund our other programs and services.”
    • Catherine, a reference librarian at a major university in Oregon writes, “I [have] learned that imaginative online searching is an invaluable research tool for personal, professional, and scholarly interests. Yes, going online can be fun, but the internet must not be considered a plaything. Access must not be restricted or limited by corporate packaging.”
    • Hampton, a chief executive officer of a public library system in Maryland, wrote about all the functions and services of the modern library dependent on reliable, unfettered internet access: “In our library, we offer downloadable eBooks, eMagazines, and eAudiobooks as well as numerous databases providing courses through, language learning through Rosetta Stone, 365-days-a-year tutoring for kindergarten through adult with BrainFuse, and many more resources online. We have public computers with internet access as well as free WiFi in our fifteen libraries extending Internet access to thousands of customers who bring their tablets and smartphones to the library. We work with customers to help them in the health care marketplace, with applications for Social Security and jobs, and every conceivable use of the internet. Obviously, being relegated to lower priority internet access would leave our customers in a very difficult position.”
    • Others wrote with concerns about the need for access to information for democracy to thrive, like Carrie, an information professional from Michigan: “The internet is not merely a tool for media consumption, but is also a means of free expression, a resource for education, and most importantly, an implement of democracy. I will not mince words: Allowing corporations to manipulate the flow of information on the internet is not the way forward. An end to net neutrality would hurt businesses large and small, inhibit the free flow of speech online, and allow telecommunications corporations to unjustly interfere with market forces.”

Stay tuned via the District Dispatch and American Libraries blog posts.

The post FCC extends Net Neutrality public comment period to August 30 appeared first on District Dispatch.

Data-cards – a design pattern / Open Knowledge Foundation

Cross-posted on

It can be useful to recognise patterns in the challenges we face, and in our responses to those challenges. In doing this, we can build a library of solutions, a useful resource when similar challenges arise in the future. When working on innovative projects, as is often the case at Open Knowledge International, creating brand new challenges is inevitable. With little or no historical reference material on how best to tackle these challenges, paying attention to your own repeatable solutions becomes even more valuable.

From a user interface design point of view, these solutions come in the form of design patterns – reusable solutions to commonly occurring problems. Identifying and using design patterns can help create familiar processes for users; and by not reinventing the wheel, you can save time in production too.

In our work on Data Packages, we are introducing a new task into the world – creating those data packages. This task can be quite simple, and it will ultimately be time saving for people working with data. That said, there is no escaping the fact that this is a task that has never before been asked of people, one that will need to be done repeatedly, and potentially, from within any number of interfaces.

It has been my task of late to design some of these interfaces; I’d like to highlight one pattern that is starting to emerge – the process of describing, or adding metadata to, the columns of a data table. I was first faced with this challenge when working on OS Packager. The objective was to present a recognisable representation of the columns, and facilitate the addition of metadata for each of those columns. The adding of data would be relatively straightforward: a few form fields.

The challenge lay in helping the user to recognise those columns from the tables they originated in. As anyone who works with spreadsheets on a regular basis will know, they aren’t often predictably or uniformly structured, meaning it is not always obvious what you’re looking at. Take them out of the familiar context of the application they were created in, and this problem gets worse. For this reason, just pulling a table header is probably not sufficient to identify a column. We wanted to provide a preview of the data, to give the best chance of it being recognisable.

In addition to this, I felt it important to keep the layout as close as possible to that of, say, Excel. The simplest solution would be to take the first few rows of the table, and put a form under each column, for the user to add their metadata.



This is a good start, about as recognisable and familiar as you’re going to get. There is one obvious problem though: this could extend well beyond the edge of the user’s screen, leading to an awkward navigating experience. For an app aimed at desktop users, horizontal scrolling, in any of its forms, would be problematic.

So, in the spirit of the good ol’ webpage, let’s make this thing wrap. That is to say, when an element cannot fit on the screen, it moves to a new “line”. When doing this we’ll need some vertical spacing where this new line occurs, to make it clear that one column is separate from the one above it. We then need horizontal spacing to prevent the false impression of grouping created by the rows.



The data-card was born. At the time of writing it is utilised in OS Packager, pretty closely resembling the above sketch.



Data Packagist is another application that creates data packages, and it faces the same challenges as described above. When I got involved in this project there was already a working prototype, and in this prototype I saw data-cards beginning to emerge.

It struck me that if these elements followed the same data card pattern created for OS Packager, they could benefit in two significant ways. The layout and data preview would again allow the user to more easily recognise the columns from their spreadsheet; plus the grid layout would lend itself well to drag and drop, which would mean avoiding multiple clicks (of the arrows in the screenshot above) when reordering. I incorporated this pattern into the design.



Before building this new front-end, I extracted what I believe to be the essence of the data-card from the OS Packager code, to reuse in Data Packagist, and potentially future projects. While doing so I thought about the current and potential future uses, and the other functions useful to perform at the same time as adding metadata. Many of these will be unique to each app, but there are a couple that I believe are likely to recur:

  • Reorder the columns
  • Remove / ignore a column

These features combine with those of the previous iteration to create this stand-alone data-card project:

Time will tell how useful this code will be for future work, but as I was able to use it wholesale (changing little more than a colour variable) in the implementation of the Data Packagist front-end, it came at virtually no additional cost. More important than the code however, is having this design pattern as a template, to solve this problem when it arises again in the future.

Styling and theming React Components / Alf Eaton, Alf

When styling applications with global CSS stylesheets, generic class names become specialised over time (e.g. section_heading__active), and eventually cause problems when a style change in one part of the app accidentally affects usage of the same class name in another part of the app.

Importantly, when authoring app components, you don’t know (or want to know) what CSS classes are being used in the rest of the app.

When authoring components in JSX it’s tempting to just write inline styles, but these can be slow and make the element code more difficult to read.

There are some standardised solutions to this - scoped styles in Shadow DOM - but lack of browser support means that JS polyfills are required.

Another alternative is CSS Modules: when a CSS file is imported into a JS file, the class names in the CSS file are replaced with globally unique names and are provided as an object*.

A solution: CSS in JS

Define the styles inside the component and inject them with a Higher Order Component, generating unique class names at the same time.

In order for components to be themeable, there needs to be a way for theme variables to be available when writing styles*. A theme provider HOC makes the theme object available via React’s context, so the variables can be used in styles.

There are a few leading contenders:

  • styled-components: write styles in CSS, interpolate theme variables, injects a className string containing a single class name.
  • react-jss: write styles in JSS (objects), use theme variables as values, injects a classes object containing multiple class names.
  • styled-jsx: write CSS in <style> tags, scoped to the parent component.
  • material-ui: uses react-jss and adds a name to each style object, allowing any class to be targeted for overriding by the theme.

Further reading

  • CSS Modules aren’t currently supported by react-scripts. SASS is one way to define and use global variables in CSS, but is also not currently supported by react-scripts.

UTR / Ed Summers

I’ve always intended to use this blog as more of a place for rough working notes as well as somewhat more fully formed writing. So in that spirit here are some rough notes for some digging into a collection of tweets that used the #unitetheright hashtag. Specifically I’ll describe a way of determining what tweets have been deleted.

Note: be careful with deleted Twitter data. Specifically be careful about how you publish it on the web. Users delete content for lots of reasons. Republishing deleted data on the web could be seen as a form of Doxing. I’m documenting this procedure for identifying deleted tweets because it can provide insight into how particularly toxic information is traveling on the web. Please use discretion in how you put this data to use.

So I started by building a dataset of #unitetheright data using twarc:

twarc search '#unitetheright' > tweets.json

I waited two days and then was able to gather some information about the tweets that were deleted. I was also interested in what content and websites people were linking to in their tweets because of the implications this has for web archives. Here are some basic stats about the dataset:

number of tweets: 200,113

collected at: 2017-08-13 11:46:05 EDT

date range: 2017-08-04 11:44:12 - 2017-08-13 15:45:39 UTC

tweets deleted: 16,492 (8.2%)

Top 10 Domains in Tweeted URLs

Top 25 Tweeted URLs (after unshortening)


So how do you get a sense of what has been deleted from your data? While it might make sense to write a program to do this eventually, I find it can be useful to work in a more exploratory way on the command line first, and then once I’ve got a good workflow I can put that into a program. I guess if I were a real data scientist I would be doing this in R or a Jupyter notebook at least. But I still enjoy working at the command line, so here are the steps I took to identify tweets that had been deleted from the original dataset:

First I extracted and sorted the tweet identifiers into a separate file using jq:

jq -r '.id_str' tweets.json | sort -n > ids.csv

Then I hydrated those ids with twarc. If the tweet has been deleted since it was first collected it cannot be hydrated:

twarc hydrate ids.csv > hydrated.json

I extracted these hydrated ids:

jq -r .id_str hydrated.json | sort -n > ids-hydrated.csv

Then I used diff to compare the pre- and post-hydration ids, and used a little bit of Perl to strip off the diff formatting, which results in a file of tweet ids that have been deleted.

diff ids.csv ids-hydrated.csv | perl -ne 'if (/< (\d+)/) {print "$1\n";}' > ids-deleted.csv

Since we have the data that was deleted we can now build a file of just deleted tweets. Maybe there’s a fancy way to do this on the command line but I found it easiest to write a little bit of Python to do it:
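The script itself isn’t embedded in this copy of the post, but a minimal sketch of what such a script might look like (assuming the filenames used above, with one JSON tweet per line in tweets.json) is:

```python
import json

def extract_deleted(tweets_path, deleted_ids_path, out_path):
    """Write the tweets whose ids appear in the deleted-ids file to out_path."""
    # Load the set of deleted tweet ids (one id per line).
    with open(deleted_ids_path) as f:
        deleted_ids = {line.strip() for line in f if line.strip()}
    # Keep only the original tweet lines whose id is in the deleted set.
    with open(tweets_path) as tweets, open(out_path, "w") as out:
        for line in tweets:
            if json.loads(line)["id_str"] in deleted_ids:
                out.write(line)
```

Calling `extract_deleted("tweets.json", "ids-deleted.csv", "delete.json")` then produces the delete.json file described below.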

After you run it you should have a file delete.json. You might want to convert it to CSV with something like twarc’s utility to inspect in a spreadsheet program.

Calling these tweets deleted is a bit of a misnomer. A user could have deleted their tweet, deleted their account, protected their account, or Twitter could have decided to suspend the user’s account. Also, the user could have done none of these things and simply retweeted a user who had done one of these things. Untangling what happened is left for another blog post. To be continued…

ALA celebrates 10 years of Google Policy Fellows / District Dispatch

ALA celebrates the 10th anniversary of the Google Policy Fellows Program.

Last Friday, we said goodbye to our 2017 Google Policy Fellow Alisa Holahan. The week before her departure, she and OITP hosted a lunch and discussion for this year’s cohort of Google Policy Fellows.

Similar to the six Policy Fellows lunches we have hosted in the past (in 2016, 2015, 2014, 2013, 2012 and 2011), the gathering was an opportunity for the Fellows to explore the intersection of information technology policy and libraries. Fellows from various policy organizations including the Center for Democracy and Technology, the National Hispanic Media Coalition, and Public Knowledge attended to learn more about ALA’s role in shaping technology policy and addressing library needs.

Alan Inouye, Marijke Visser and Carrie Russell shared a brief overview of their roles and the focus of OITP, and I represented the intersection between OITP and the OGR. After introductions, the conversation turned to a series of questions: How does the Ready to Code initiative support workforce innovation? How does the Washington Office set priorities? How do we decide our portfolios of work? The informal question-and-answer format generated an interesting exchange around libraries’ roles and interests in technology and innovation.

Most notably, this summer’s lunch marked the 10th anniversary of the Google Policy Fellow Program, of which ALA is a founding host organization. Since 2008, we have encouraged master’s and doctoral students in library and information studies or related areas with an interest in national public policy to apply and have now amassed a decade of alumni, including:

As the expanding role of libraries of all types evolves, the need for information professionals with Washington experience and savvy will continue to grow. The Washington Office is privileged to have hosted ten early-career professionals and to provide the means for them to obtain direct experience in national policy making.

The post ALA celebrates 10 years of Google Policy Fellows appeared first on District Dispatch.

Islandora 7.x Committers Calls are on a new platform / Islandora

Ever wanted to come to the bi-weekly Islandora Committers Call but don't like using Skype? Good news! We have moved the call to the FreeConferenceCallHD line used by the weekly Islandora CLAW Calls and many of our Interest Groups. You can join by phone or using the web interface in your browser, and join us in the #islandora IRC channel on Freenode for sharing links and text comments (there's a web version for that as well, if you don't want to run an IRC client).

How to join:

Using A Query Classifier To Dynamically Boost Solr Ranking / Lucidworks

As we countdown to the annual Lucene/Solr Revolution conference in Las Vegas next month, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Target’s Howard Wan’s talk, “Using A Query Classifier To Dynamically Boost Solr Ranking”.

About 40% of our queries are ambiguous, which can result in products from many categories. For example, the query “red apple” can match the following products: a red apple iPod (electronics category), a red apple fruit (fresh produce), and a red apple iPhone case (accessories). It is desirable to have a classifier to instruct Solr to boost items from the desired category. In addition, for a search engine with a small index, a good percentage of the queries may have little or no results. Is it possible to use the classifier to solve both problems? This talk discusses a classifier built from behavior data which can dynamically re-classify the query to solve both problems.
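The general idea (classify the query, then boost the predicted category rather than filter on it, so items from other categories still match) might be sketched like this; the classifier, the "category" field name, and the boost factor are all assumptions for illustration, not Target’s implementation:

```python
def build_solr_params(query, classify, threshold=0.5):
    """Build Solr query parameters, adding a boost query (bq) for the
    category the classifier predicts for an ambiguous query.

    classify(query) is assumed to return a (category, confidence) pair.
    """
    params = {"q": query, "defType": "edismax"}
    category, confidence = classify(query)
    if confidence >= threshold:
        # A boost query promotes matching items without excluding others.
        params["bq"] = "category:{0}^10".format(category)
    return params
```

A low-confidence classification simply leaves the query un-boosted, which is one way of handling the small-index / sparse-results case the abstract mentions.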


Join us at Lucene/Solr Revolution 2017, the biggest open source conference dedicated to Apache Lucene/Solr, on September 12-15, 2017 in Las Vegas, Nevada. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Using A Query Classifier To Dynamically Boost Solr Ranking appeared first on Lucidworks.

Evergreen 3.0 development update #14: results of the second Feedback Fest / Evergreen ILS

Display at the circulation desk of the Lucius Beebe Memorial Library in Wakefield, MA. Photo courtesy Jeff Klapes.

Since the previous update, another 61 patches have made their way into Evergreen’s master branch. Last week was the second Feedback Fest for the 3.0 release cycle, so let’s talk about some numbers.

A total of 52 bugs either had a pull request when the Feedback Fest started or got one during the course of the week. Of those, 23 resulted in patches getting merged. Some merges of particular note include:

  • A new database view for more efficiently retrieving lists of current and aged loans (bug 1695007)
  • Teaching Evergreen about the time zone that a library belongs to, to help format due date and time displays (bug 1705524)
  • Teaching the web staff client how to print spine and item labels (bug 1704873)
  • Adding two more popularity metrics (bug 1688096)
  • Several improvements to Evergreen’s SIP server
  • Numerous fixes to the internationalization infrastructure; the net effect is that more strings are available to translators

The fix for bug 1480432 warrants a particular shout-out, as it clarifies a subtle detail of the staff permissions system. It’s possible for a staff member to receive a given permission at more than one depth. For example, a staff member could have VIEW_USER permission at the branch level because of their primary profile, but could also have it at system level because of a secondary permission group. Prior to the bugfix, the actual depth that was applied would depend on hidden details of exactly when and how the permission was granted. As a result of the bugfix, there’s now a clear rule: if a staff account is granted a given permission multiple times, the depth applied is the one that grants the permission as broadly as possible. Many thanks to Michele Morgan for the bugfix and to Cesar Velez for writing unit tests.
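As an illustration of the new rule (a sketch only, not Evergreen’s actual data model or schema), suppose each grant of a permission carries a numeric depth, where a lower number means a broader scope:

```python
# Hypothetical depth values, lower = broader scope (illustrative only).
CONSORTIUM, SYSTEM, BRANCH = 0, 1, 2

def effective_depth(grant_depths):
    """If a staff account is granted a given permission multiple times,
    the depth applied is the one that grants it as broadly as possible."""
    if not grant_depths:
        return None  # permission not granted at all
    return min(grant_depths)
```

So VIEW_USER granted at BRANCH via a primary profile and at SYSTEM via a secondary group now deterministically applies at SYSTEM, instead of depending on when and how each grant happened.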

Back to the numbers: three of the feedback fest bugs were tested and signed off, but not yet merged. Four of them got a pull request during the fest, while nine received comments of various sorts. In total, 75% of the feedback fest bugs received substantive feedback, which matches the results of the first Feedback Fest.

I would like to thank the following people who participated in Feedback Fest #2:

  • Galen Charlton
  • Jeff Davis
  • Bill Erickson
  • Jason Etheridge
  • Rogan Hamby
  • Blake Henderson
  • Kyle Huckins
  • Kathy Lussier
  • Terran McCanna
  • Michele Morgan
  • Andrea Neiman
  • Dan Pearl
  • Mike Rylander
  • Dan Scott
  • Chris Sharp
  • Ben Shum
  • Cesar Velez

It should be noted that several people also reported or commented on bugs that didn’t have a pull request.

As a reminder, the deadline for feature slush is this Friday, 18 August. Feature slush means that new features meant for inclusion in 3.0 should have a pull request on their bugs by end of business on the 18th (although as release manager and as somebody who hopes to be viewing the solar eclipse on the 21st, there is some wiggle room with that deadline). There will not be similar wiggle room for the feature freeze deadline of 1 September.

There are some large pull requests awaiting review, including the web staff serials module and the copy alert and suppression matrix, so please keep testing, y’all!

Duck trivia

Ducks can help streamline work at the circulation desk. The image in today’s post is courtesy of Jeff Klapes of the Lucius Beebe Memorial Library in Wakefield, MA, who says, “We have a monthly duck display at our circulation desk to divert the attention of the little ones while their parents check things out. It’s been wildly successful.”


Updates on the progress to Evergreen 3.0 will be published every Friday until general release of 3.0.0. If you have material to contribute to the updates, please get them to Galen Charlton by Thursday morning.

Make a nomination for the 2018 National Medal for Museum and Library Service / District Dispatch

IMLS LogoThe Institute of Museum and Library Services is now accepting nominations for the 2018 National Medal for Museum and Library Service awards. Anyone — an employee, a board member, a member of the public, or an elected official — can nominate an institution. To be considered, the institution must complete and return a nomination form by October 2, 2017.

In 2017, libraries from Iowa, California, South Carolina, Minnesota and Maine were selected to receive this high honor. In 2018, IMLS is particularly interested in libraries with programs that build community cohesion and serve as catalysts for positive community change, including programs that provide services for veterans and military families, at-risk children and families, the un- and under-employed and youth confronting barriers to STEM-related employment.

The ten winning institutions will be honored at a ceremony in Washington, D.C. and are invited to host a two-day visit from StoryCorps to record community member stories. You can hear some of these moving impact stories, dating back to 2009, here.

Institutions interested in being considered should read the nomination form carefully and contact the designated program contacts with questions. The library-specific program contact for the National Medal for Museum and Library Service is Laura McKenzie, who can be reached at (202) 653-4644 or by email at

As we noted with the announcement of the National Leadership Grants for Libraries and the Laura Bush 21st Century Librarian programs (deadline September 1!), an increase in nominations for the National Medal would send a signal to our Members of Congress that libraries are vital institutions in communities across the country. So, don’t delay — write your nomination today and celebrate the library workers who make our country a better place to live.

The post Make a nomination for the 2018 National Medal for Museum and Library Service appeared first on District Dispatch.

Enriching catalogue pages in Evergreen with Wikidata / Dan Scott

I'm part of the Music in Canada @ 150 Wikimedia project, organizing wiki edit-a-thons across Canada to help improve the presence of Canadian music and musicians in projects like Wikipedia, Wikidata, and Wikimedia Commons. It's going to be awesome, and it's why I invested time in developing and delivering the Wikidata for Librarians presentation at the CAML preconference.

Right now I'm at the Wikimania 2017 conference, because it is being held in Montréal--just down the road from me when you consider it is an international affair. The first two days were almost entirely devoted to a massive hackathon consisting of hundreds of participants with a very welcoming, friendly ambiance. It was inspiring, and I participated in several activities:

  • installing Wikibase--the technical foundation for Wikidata--from scratch
  • an ad-hoc data modelling session with Jan and Stacy Allison-Cassin that resulted in enhancing the periodicals structure on Wikidata

But I also had the itch to revisit and enhance the JavaScript widget that runs in our Evergreen catalogue which delivers on-demand cards of additional metadata about contributors to recorded works. I had originally developed the widget as a proof-of-concept for the potential value to cultural institutions of contributing data to Wikidata--bearing in mind a challenge put to the room at an Evergreen 2017 conference session that asked what tangible value linked open data offers--but it was quite limited:

  • it would only show a card for the first listed contributor to the work
  • it was hastily coded, and thus duplicated code, used shortcuts, and had no comments
  • the user interface was poorly designed
  • it was not explicitly licensed for reuse

So I spent some of my hackathon time (and some extra time stolen from various sessions) fixing those problems--so now, when you look at the catalogue record for a musical recording by the most excellent Canadian band Rush, you will find that each of the contributors to the album has a musical note (♩) which, when clicked, displays a card based on the data returned from Wikidata using a SPARQL query matching the contributor's name (limited in scope to bands and musicians to avoid too many ambiguous results).

I'm not done yet: the design is still very basic, but I'm happier about the code quality and it now supports queries for all of the contributors to a given album. It is also licensed for reuse under the GPL version 2 or later license, so as long as you can load the script in your catalogue and tweak a few CSS query selector statements to identify where the script should find contributor names and where it should place the cards, it should theoretically be usable in any catalogue of musical recordings. And with the clear "Edit on Wikidata" link, I hope that it encourages users to jump in and contribute if they find one of their favourite performers lacks (or shows incorrect!) information.

You can find the code on the Evergreen contributor git repository.

“Small functions considered harmful” / Jonathan Rochkind

From a blog post by Cindy Sridharan.

Remind you of any library/cultural-heritage sector codebases you’ve worked with lately?

Some people seem so enamored with small functions that the idea of abstracting any and every piece of logic that might seem even nominally complex into a separate function is something that is enthusiastically cheered on.

I’ve worked on codebases inherited from folks who’d internalized this idea to such an unholy extent that the end result was pretty hellish and entirely antithetical to all the good intentions the road to it was paved with. In this post, I hope to explain why some of the oft-touted benefits don’t always pan out the way one hopes and the times when some of the ideas can actually prove to be counterproductive.

I think blindly following Rubocop’s dictatorial micro-advice without a human thinking about the macro-level and “does this make the code more readable/maintainable” (and “what are the use-cases for flexibility, what dimensions do we expect to change or be changed? And how do we provide for that?”) can contribute to this.

My main problem with DRY is that it forces one into abstractions — nested and premature ones at that. Inasmuch as it’s impossible to abstract perfectly, the best we can do is abstract well enough insofar as we can. Defining “well enough” is hard and is contingent on a large number of factors, some of them being:

— the nature of the assumptions underpinning the abstraction and how likely (and for how long) they are likely to hold water
— the extent to which the layers of abstractions underlying the abstraction in question are prone to remain consistent (and correct) in their implementation and design
— the flexibility and extensibility of both the underlying abstractions as well as any abstraction built on top of the abstraction in question currently
— the requirements and expectations of any future abstractions that might be built on top of the abstraction in question

…DRYing up code to the fullest extent possible right now would mean depriving our future selves of the flexibility to accommodate any changes that might be required. It’s akin to trying to find the perfect fit, when what we really should be optimizing for is to allow ourselves enough leeway to make the inevitable changes that will be required sooner or later.

Or as Sandi Metz has said, “duplication is far cheaper than the wrong abstraction”.

As a result, the cognitive overhead of processing the verbose function (and variable) names, mapping them into the mental model I’ve been building so far, deciding which functions to dig deeper into and which to skim, and piecing together the puzzle to uncover the “big picture” becomes rather difficult.

…This has already been stated before but it bears reiterating — an explosion of small functions, especially one line functions, makes the codebase inordinately harder to read. This especially hurts those for whom the code should’ve been optimized for in the first place — newcomers….

…Simple code isn’t necessarily the easiest code to write, and rarely is it ever the DRYest code. It takes an enormous amount of careful thought, attention to detail and care to arrive at the simplest solution that is both correct and easy to reason about. What is most striking about such hard-won simplicity is that it lends itself to being easily understood by both old and new programmers, for all possible definitions of “old” and “new”.

Actually one of the best essays on code architecture matching my own experiences that I’ve seen. I recommend reading the whole thing.

Filed under: General

Voting now open! Bring ALA to SxSW / District Dispatch

SXSW logoFor a third year, ALA is planning for Austin’s annual South by Southwest (SXSW) festival. As in years past, we need your help to bring our programs to the SXSW stage. Public voting counts for 30 percent of SXSW’s decision to pick a panel, so please join us in voting for these two ALA programs.

YALSA Past President Linda Braun and OITP Fellow Mega Subramaniam have partnered with IMLS and Google to put on a panel called “Ready to Code: Libraries Supporting CS Education.” Here’s the description:

In the last decade, libraries have transformed, from the traditional book provider to become the community anchor where the next generation technology innovations take place. Drawing from initiatives such as the Libraries Ready to Code project and IMLS grants, this session provides perspectives from thought leaders in industry, government, universities, and libraries on the role libraries play in our national CS education ecosystem and work together with communities to support youth success. You can view the video here.

The Office for Diversity, Literacy and Outreach Services and the Office for Intellectual Freedom are partnering to offer a workshop entitled “Free Speech or Hate Speech?” Here is the quick summary:

The Supreme Court agrees with the rock group, The Slants, that their name is protected under the first amendment. An increase in uses of hate speech in the United States has sparked a new fire in the debate: Is hate speech free speech? Is it a hate crime? The lines can be blurry. We will explore the history of intellectual freedom challenges and how to respond to traumatic interactions involving hate speech that are not seen as “crimes.” See the video here.

As you might remember, in 2016, ALA and Benetech collaborated on a session about leveraging 3D printers to create new learning opportunities for students with disabilities. And, in 2015, OITP partnered with D.C. Public Library and MapStory to present an interactive panel about the ways that libraries foster entrepreneurship and creativity.

Become a registered voter in the Panel Picker process by signing up for an account and get your votes in before Friday, August 25. (Also, be sure to keyword search “library” in the Panelpicker – there are over 30 related programs!)

You will have the opportunity to “Vote Up” or “Vote Down” on all session ideas (votes will be kept private) and add comments to each page. We encourage you to use this commenting feature to show support and even engage with the voting community.

The post Voting now open! Bring ALA to SxSW appeared first on District Dispatch.

Git physical / Harvard Library Innovation Lab

This is a guest blog post by our summer fellow Miglena Minkova.

Last week at LIL, I had the pleasure of running a pilot of git physical, the first part of a series of workshops aimed at introducing git to artists and designers through creative challenges. In this workshop I focused on covering the basics: three-tree architecture, simple git workflow, and commands (add, commit, push). These lessons were fairly standard but contained a twist: The whole thing was completely analogue!

The participants, a diverse group of fellows and interns, engaged in a simplified version control exercise. Each participant was tasked with designing a postcard about their summer at LIL. Following basic git workflow, they took their designs from the working directory, through the staging index, to the version database, and to the remote repository where they displayed them. In the process they “pushed” five versions of their postcard design, each accompanied by a commit note. Working in this way allowed them to experience the workflow in a familiar setting and learn the basics in an interactive and social environment. By the end of the workshop everyone had ideas on how to implement git in their work and was eager to learn more.

Timelapse gif by Doyung Lee.

Not to mention some top-notch artwork was created.

The workshop was followed by a short debriefing session and Q&A.

Check GitHub for more info.

Alongside this overview, I want to share some of the thinking that went behind the scenes.

Starting with some background. Artists and designers perform version control in their work but in a much different way than developers do with git. They often use error-prone strategies to track document changes such as saving files in multiple places using obscure file naming conventions, working in large master files, or relying on in-built software features. At best these strategies result in inconsistencies, duplication, and bloated disk storage, and at worst, irreversible mistakes, loss of work, and multiple conflicting documents. Despite experiencing some of the same problems as developers, artists and designers are largely unfamiliar with git (exceptions exist).

The impetus for teaching artists and designers git was my personal experience with it. I had not been formally introduced to the concept of version control or git through my studies, nor my work. I discovered git during the final year of my MLIS degree when I worked with an artist to create a modular open source digital edition of an artist’s book. This project helped me see git as a ubiquitous tool with versatile application across multiple contexts and practices, the common denominator of which is making, editing, and sharing digital documents.

I realized that I was faced with a challenge: How do I get artists and designers excited about learning git?

I used my experience as a design educated digital librarian to create relatable content and tailor delivery to the specific characteristics of the audience: highly visual, creative, and non-technical.

Why create another git workshop? There are, after all, plenty of good quality learning resources out there and I have no intention of reinventing the wheel or competing with existing learning resources. However, I have noticed some gaps that I wanted to address through my workshop.

First of all, I wanted to focus on accessibility and have everyone start on equal ground with no prior knowledge or technical skills required. Even the simplest beginner level tutorials and training materials rely heavily on technology and the CLI (Command Line Interface) as a way of introducing new concepts. Notoriously intimidating for non-technical folk, the CLI seems inevitable given the fact that git is a command line tool. The inherent expectation of using technology to teach git means that people need to learn the architecture, terminology, workflow, commands, and the CLI all at the same time. This seems ambitious and a tad unrealistic for an audience of artists and designers.

I decided to put the technology on hold and combine several pedagogies to leverage learning: active learning, learning through doing, and project-based learning. To contextualize the topic, I embedded elements of the practice of artists and designers by including an open ended creative challenge to serve as a trigger and an end goal. I toyed with different creative challenges using deconstruction, generative design, and surrealist techniques. However this seemed to steer away from the main goal of the workshop. It also made it challenging to narrow down the scope, especially as I realized that no single workflow can embrace the diversity of creative practices. At the end, I chose to focus on versioning a combination of image and text in a single document. This helped to define the learning objectives, and cover only one functionality: the basic git workflow.

I considered it important to introduce concepts gradually in a familiar setting using analogue means to visualize black-box concepts and processes. I wanted to employ abstraction to present the git workflow in a tangible, easily digestible, and memorable way. To achieve this the physical environment and set up was crucial for the delivery of the learning objectives.

In terms of designing the workspace, I assigned and labelled different areas of the space to represent the components of git’s architecture. I made use of directional arrows to illustrate the workflow sequence alongside the commands that needed to be executed, and used a “remote” as a way of displaying each version on a timeline. Low-tech or no-tech solutions such as carbon paper were used to make multiple copies. It took several experiments to get the sketchpad layering right, especially as I did not want to introduce manual redundancies that do little justice to git.

Thinking over the audience interaction, I had considered role play and collaboration. However these modes did not enable each participant to go through the whole workflow and fell short of addressing the learning objectives. Instead I provided each participant with initial instructions to guide them through the basic git workflow and repeat it over and over again using their own design work. The workshop was followed with debriefing which articulated the specific benefits for artists and designers, outlined use cases depending on the type of work they produce, and featured some existing examples of artwork done using git. This was to emphasize that the workshop did not offer a one-size fits all solution, but rather a tool that artists and designers can experiment with and adopt in many different ways in their work.

I want to thank Becky and Casey for their editing work.

Going forward, I am planning to develop a series of workshops introducing other git functionality such as basic merging and branching, diff-ing, and more, and tag a lab exercise to each of them. By providing multiple ways of processing the same information I am hoping that participants will successfully connect the workshop experience and git practice.

MarcEdit 7 Z39.50/SRU Client Wireframes / Terry Reese

One of the appalling discoveries when taking a closer look at the MarcEdit 6 codebase was the presence of 3(!) Z39.50 clients (all using slightly different codebases). This happened because of the ILS integration, the direct Z39.50 database editing, and the actual Z39.50 client. In the Mac version, these clients are all the same thing – so I wanted to emulate that approach in the Windows/Linux version. And as a plus, maybe I would stop (or reduce) my utter disdain at having to support Z39.50 generally, within any library program that I work with.

* Sidebar – I really, really, really can’t stand working with Z39.50. SRU is a fine replacement for the protocol, and yet, over the 10-15 years that it’s been available, SRU remains a fringe protocol. That tells me two things:

  1. Library vendors generally have rejected this as a protocol, and there are some good reasons for this…most vendors that support it (and I’m thinking specifically about ExLibris) use a custom profile. This is a pain in the ass because the custom profile requires code to handle foreign namespaces. This wouldn’t be a problem if it only happened occasionally, but it happens all the time. Every SRU implementation works best if you use its own custom profile. I think what made Z39.50 work is the well-defined set of Bib-1 attributes. The flexibility in SRU is a good thing, but I also think it’s why very few people support it, and fewer understand how it actually works.
  2. That SRU is a poor solution to begin with.  Hey, just like OAI-PMH, we created library standards to work on the web.  If we had it to do over again, we’d do it differently.  We should probably do it differently at this point…because supporting SRU in software is basically just checking a box.  People have heard about it, they ask for it, but pretty much no one uses it.

By consolidating the Z39.50 client code, I’m able to clean out a lot of old code and, better yet, actually focus on a few improvements (which has been hard because I make improvements in the main client, but forget to port them everywhere else).  The main improvements that I’ll be applying have to do with searching multiple databases.  Single search has always allowed users to select up to 5 databases to query.  I may remove that limit; it’s kind of an arbitrary one.  However, I’ll also be adding this functionality to the batch search.  When doing multiple database searches in batch, users will have an option to take all records, the first record found, or potentially (I haven’t worked this one out), records based on order of database preference.


Main Window:


Z39.50 Database Settings:


SRU Settings:


There will be a preferences panel as well (haven’t created it yet), but this is where you will set proxy information and notes related to batch preferences.  You will no longer need to set title field or limits, as the limits are moving to the search screen (this has always needed to be variable) and the title field data is being pulled from preferences already set in the program preferences.

One of the benefits of making these changes is that they fold the Z39.50/SRU client into the main MarcEdit application (rather than a program that was shelled to), which allows me to leverage the same accessibility platform that has been developed for the rest of the application.  It also highlights one of the other changes happening in MarcEdit 7.  MarcEdit 6 is a collection of about 7 or 8 individual executables.  This makes sense in some cases, less sense in others.  I’m evaluating all the stand-alone programs: if I replicate their functionality in the main program, then even though having them as separate programs might initially have been a good thing, the structure of the application has changed, and the code (both external and internal) needs to be re-evaluated and put in one spot.  In practice, this has meant that in some cases, like the Z39.50 client, the code will move into MarcEdit proper (rather than being a separate program called mebatch.exe), and for SQL interactions, it will mean that I’ll create a single shared library (rather than replicating code between three different component parts…the SQL explorer, the ILS integration, and the local database query tooling).

Questions? Let me know.


An approach to building open databases / Open Knowledge Foundation

This post has been co-authored by Adam Kariv, Vitor Baptista, and Paul Walsh.

Open Knowledge International (OKI) recently coordinated a two-day work sprint as a way to touch base with partners in the Open Data for Tax Justice project. Our initial writeup of the sprint can be found here.

Phase I of the project ended in February 2017 with the publication of What Do They Pay?, a white paper that outlines the need for a public database on the tax contributions and economic activities of multinational companies.

The overarching goal of the sprint was to start work towards such a database by replicating data collection processes we’ve used in other projects, and to provide a space for domain expert partners to potentially use this data for some exploratory investigative work. Despite limited time and a limited budget, we are pleased with the discussions and ideas that came out of the sprint.

One attendee, Tim Davies, criticised the approach we took in the technical stream of the sprint. The problem with this criticism is that it extrapolates from one stream of activity during a two-day event to an entire approach to a project. We think exploration and prototyping should be part of any healthy project, and that is exactly what we did with our technical work in the two-day sprint.

Reflecting on the discussion presents a good opportunity here to look more generally at how we, as an organisation, bring technical capacity to projects such as Open Data for Tax Justice. Of course, we often bring much more than technical capacity to a project, and Open Data for Tax Justice is no different in that regard, being mostly a research project to date.

In particular, we’ll take a look at the technical approach we used for the two-day sprint. While this is not the only approach towards technical projects we employ at OKI, it has proven useful on projects driven by the creation of new databases.

An approach

Almost all projects that OKI either leads on, or participates in, have multiple partners. OKI generally participates in one of three capacities (sometimes, all three):

  • Technical design and implementation of open data platforms and apps.
  • Research and thought leadership on openness and data.
  • Dissemination and facilitating participation, often by bringing the “open data community” to interact with domain specific actors.

Only the first capacity is strictly technical, but each capacity does, more often than not, touch on technical issues around open data.

Some projects have an important component around the creation of new databases targeting a particular domain. Open Data for Tax Justice is one such project, as are OpenTrials, and the Subsidy Stories project, which itself is a part of OpenSpending.

While most projects have partners, usually domain experts, it does not mean that collaboration is consistent or equally distributed over the project life cycle. There are many reasons for this to be the case, such as the strengths and weaknesses of our team and those of our partners, priorities identified in the field, and, of course, project scope and funding.

With this as the backdrop for projects we engage in generally, we’ll focus for the rest of this post on aspects when we bring technical capacity to a project. As a team (the Product Team at OKI), we are currently iterating on an approach in such projects, based on the following concepts:

  • Replication and reuse
  • Data provenance and reproducibility
  • Centralise data, decentralise views
  • Data wrangling before data standards

While not applicable to all projects, we’ve found this approach useful when contributing to projects that involve building a database to, ultimately, unlock the potential to use data towards social change.

Replication and reuse

We highly value the replication of processes and the reuse of tooling across projects. Replication and reuse enables us to reduce technical costs, focus more on the domain at hand, and share knowledge on common patterns across open data projects. In terms of technical capacity, the Product Team is becoming quite effective at this, with a strong body of processes and tooling ready for use.

This also means that each project enables us to iterate on such processes and tooling, integrating new learnings. Many of these learnings come from interactions with partners and users, and others come from working with data.

In the recent Open Data for Tax Justice sprint, we invited various partners to share experiences working in this field and try a prototype we built to extract data from country-by-country reports to a central database. It was developed in about a week, thanks to the reuse of processes and tools from other projects and contexts.

When our partners started looking into this database, they had questions that could only be answered by looking back at the original reports. They needed to check the footnotes and other context around the data, which weren’t available in the database yet. We’ve encountered similar use cases in other projects, including OpenTrials, so we can build upon those experiences to iterate towards a reusable solution for the Open Data for Tax Justice project.

By doing this enough times in different contexts, we’re able to solve common issues quickly, freeing more time to focus on the unique challenges each project brings.

Data provenance and reproducibility

We think that data provenance, and reproducibility of views on data, is absolutely essential to building databases with a long and useful future.

What exactly is data provenance? A useful definition from Wikipedia is “… (d)ata provenance documents the inputs, entities, systems, and processes that influence data of interest, in effect providing a historical record of the data and its origins”. Depending on the way provenance is implemented in a project, it can also be a powerful tool for reproducibility of the data.

Most work around open data at present does not consider data provenance and reproducibility as an essential aspect of working with open data. We think this is to the detriment of the ecosystem’s broader goals of seeing open data drive social change: the credible use of data from projects with no provenance or reproducibility built in to the creation of databases is significantly diminished in our “post truth” era.

Our current approach builds data provenance and reproducibility right into the heart of building a database. There is a clear, documented record of every action performed on data, from the extraction of source data, through to normalisation processes, and right to the creation of records in a database. The connection between source data and processed data is not lost, and, importantly, the entire data pipeline can be reproduced by others.
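The idea of a documented record of every action, with the full pipeline reproducible by others, can be sketched in a few lines. This is an illustrative toy, not OKI’s actual tooling: the step names, the row shapes, and the `run_pipeline` helper are all invented for the example.

```python
import hashlib
import json

def checksum(data):
    """Fingerprint a value so each step's input and output are verifiable."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()

def run_pipeline(source, steps):
    """Run each named step in order, recording one provenance entry per step."""
    provenance, data = [], source
    for name, step in steps:
        before = checksum(data)
        data = step(data)
        provenance.append({"step": name, "input": before, "output": checksum(data)})
    return data, provenance

# Hypothetical steps standing in for "extraction" and "normalisation".
steps = [
    ("extract", lambda rows: [r for r in rows if r.get("amount") is not None]),
    ("normalise", lambda rows: [{**r, "amount": float(r["amount"])} for r in rows]),
]

source = [{"entity": "ACME", "amount": "10"}, {"entity": "Umbrella", "amount": None}]
result, log = run_pipeline(source, steps)
```

Because every entry links an input fingerprint to an output fingerprint, anyone re-running the pipeline on the same source data can verify they produced the identical provenance log, which is the reproducibility property described above.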

We acknowledge that a clear constraint of this approach, in its current form, is that it is necessarily more technical than, say, ad hoc extraction and manipulation with spreadsheets and other consumer tools used in manual data extraction processes. However, as such approaches make data provenance and reproducibility harder, because there is no history of the changes made, or where the data comes from, we are willing to accept this more technical approach and iterate on ways to reduce technical barriers.

We hope to see more actors in the open data ecosystem integrating provenance and reproducibility right into their data work. Without doing so, we greatly reduce the ability for open data to be used in an investigative capacity, and likewise, we diminish the possibility of using the outputs of open data projects in the wider establishment of facts about the world. Recent work on beneficial ownership data takes a step in this direction, leveraging the PROV-DM standard to declare data provenance facts.

Centralise data, decentralise views

In OpenSpending, OpenTrials, and our initial exploratory work on Open Data for Tax Justice, there is an overarching theme to how we have approached data work, user stories and use cases, and co-design with domain experts: “centralise data, decentralise views”.

Building a central database for open data in a given domain affords ways of interacting with such data that are extremely difficult, or impossible, by actively choosing to decentralise such data. Centralised databases make investigative work that uses the data easier, and allows for the discovery, for example, of patterns across entities and time that can be very hard to discover if data is decentralised.

Additionally, by having in place a strong approach to data provenance and reproducibility, the complete replication of a centralised database is relatively easily done, and very much encouraged. This somewhat mitigates a major concern with centralised databases: that they imply some type of “vendor lock-in”.

Views on data are better when decentralised. By “views on data” we refer to visualisations, apps, websites – any user-facing presentation of data. While having data centralised potentially enables richer views, data almost always needs to be presented with additional context, localised, framed in a particular narrative, or otherwise presented in unique ways that will never be best served from a central point.

Further, decentralised usage of data provides a feedback mechanism for iteration on the central database. For example, providing commonly used contextual data, establishing clear use cases for enrichment and reconciliation of measures and dimensions in the data, and so on.

Data wrangling before data standards

As a team, we are interested in, engage with, and also author, open data standards. However, we are very wary of efforts to establish a data standard before working with large amounts of data that such a standard is supposed to represent.

Data standards that are developed too early are bound to make untested assumptions about the world they seek to formalise (the data itself). There is a dilemma here of describing the world “as it is”, or, “as we would like it to be”. No doubt, a “standards first” approach is valid in some situations. Often, it seems, in the realm of policy. We do not consider such an approach flawed, but rather, one with its own pros and cons.

We prefer to work with data, right from extraction and processing, through to user interaction, before working towards public standards, specifications, or any other type of formalisation of the data for a given domain.

Our process generally follows this pattern:

  • Get to know available data and establish (with domain experts) initial use cases.
  • Attempt to map what we do not know (e.g.: data that is not yet publicly accessible), as this clearly impacts both usage of the data, and formalisation of a standard.
  • Start data work by prescribing the absolute minimum data specification to use the data (i.e.: meet some or all of the identified use cases).
  • Implement data infrastructure that makes it simple to ingest large amounts of data, and also to keep the data specification reactive to change.
  • Integrate data from a wide variety of sources, and, with partners and users, work on ways to improve participation / contribution of data.
  • Repeat the above steps towards a fairly stable specification for the data.
  • Consider extracting this specification into a data standard.

Throughout this entire process, there is a constant feedback loop with domain expert partners, as well as a range of users interested in the data.


We want to be very clear that we do not think that the above approach is the only way to work towards a database in a data-driven project.

Design (project design, technical design, interactive design, and so on) emerges from context. Design is also a sequence of choices, and each choice has an opportunity cost based on various constraints that are present in any activity.

In projects we engage in around open databases, technology is a means to other, social ends. Collaboration around data is generally facilitated by technology, but we do not think the technological basis for this collaboration should be limited to existing consumer-facing tools, especially if such tools have hidden costs on the path to other important goals, like data provenance and reproducibility. Better tools and processes for collaboration will only emerge over time if we allow exploration and experimentation.

We think it is important to understand general approaches to working with open data, and how they may manifest within a single project, or across a range of projects. Project work is not static, and definitely not reducible to snapshots of activity within a wider project life cycle.

Certain approaches emphasise different ends. We’ve tried above to highlight some pros and cons of our approach, especially around data provenance and reproducibility, and data standards.

In closing, we’d like to invite others interested in approaches to building open databases to engage in a broader discussion around these themes, as well as a discussion around short term and long term goals of such projects. From our perspective, we think there could be a great deal of value for the ecosystem around open data generally – CSOs, NGOs, governments, domain experts, funders – via a proactive discussion or series of posts with a multitude of voices. Join the discussion here if this is of interest to you.

Problems with Authority / Library Tech Talk (U of Michigan)

A graph of organization nodes and edges depicting the United States Federal bureaucracy.

MARC Authority records can be used to create a map of the Federal Government that will help with collection development and analysis. Unfortunately, MARC is not designed for this purpose, so we have to find ways to work around the MARC format's limitations.

The Copyright Office belongs in the Library of Congress / District Dispatch

In “Lessons From History: The Copyright Office Belongs in the Library of Congress,” a new report from the American Library Association (ALA), Google Policy Fellow Alisa Holahan compellingly documents that Congress repeatedly has considered the best locus for the U.S. Copyright Office (CO) and consistently reaffirmed that the Library of Congress (Library) is its most effective and efficient home.

the James Madison Building

The U.S. Copyright Office is located in the James Madison Memorial Building of the Library of Congress in Washington, D.C. Photo credit: The Architect of the Capitol

Prompted by persistent legislative and other proposals to remove the CO from the Library in both the current and most recent Congresses, Holahan’s analysis comprehensively reviews the history of the locus of copyright activities from 1870 to the present day. In addition to providing a longer historical perspective, the Report finds that Congress has examined this issue at roughly 20-year intervals, declining to separate the CO and Library each time.

Notable developments occurred, for example, in the deliberations leading to the Copyright Act of 1976. In particular, an argument was made that the CO performs executive branch functions, and thus its placement in the legislative branch is unconstitutional. The 1976 Act left the U.S. Copyright Office in the Library. Moreover, in 1978, the U.S. Court of Appeals for the Fourth Circuit in Eltra Corp. v. Ringer directly addressed this constitutionality question. It found no constitutional problem with the CO’s placement in the Library because the Copyright Office operates under the direction of the Librarian of Congress, an appointee of the president.

Holahan also notes another challenge via the Omnibus Patent Act of 1996, which proposed that copyright, patent and trademark activities be consolidated under a single government corporation. This Act was opposed by then-Register of Copyrights Marybeth Peters and then-Librarian of Congress James Billington, as well as an array of stakeholders that included the American Society of Composers, Authors and Publishers (ASCAP); American Society of Journalists and Authors; as well as the library, book publishing and scholarly communities. This legislation was not enacted, thereby leaving the placement of the Copyright Office unchanged.

The neutral question that launched this research was to identify anything of relevance in the historical record regarding the placement of the Copyright Office. ALA recommends Holahan’s research (refer to her full report for additional historical milestones and further details) to anyone contemplating whether the Register of Copyrights should be appointed by the President or whether the Copyright Office should be relocated from the Library.

In a nutshell, these questions have been asked and answered the same way many times already: “it ain’t broke, so don’t fix it.” Holahan’s research and report will inform ALA’s continuing lobbying and policy advocacy on these questions as we work to protect and enhance copyright’s role in promoting the creation and dissemination of knowledge for all.

The post The Copyright Office belongs in the Library of Congress appeared first on District Dispatch.

Jobs in Information Technology: August 9, 2017 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Oregon State University Libraries and Press, Library Technician 3, Corvallis, OR

New York University Division of Libraries, Supervisor, Metadata Production & Management, New York, NY

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Customizing Ranking Models in Solr to Improve Relevance for Enterprise Search / Lucidworks

As we countdown to the annual Lucene/Solr Revolution conference in Las Vegas next month, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Salesforce’s Ammar Haris & Joe Zeimen’s talk, “Customizing Ranking Models in Solr to Improve Relevance for Enterprise Search”.

Solr provides a suite of built-in capabilities that offer a wide variety of relevance related parameter tuning. Index and/or query time boosts along with function queries can provide a great way to tweak various relevance related parameters to help improve the search results ranking. In the enterprise space however, given the diversity of customers and documents, there is a much greater need to be able to have more control over the ranking models and be able to run multiple custom ranking models.
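The built-in tuning knobs mentioned here (query-time boosts and function queries) are typically supplied as request parameters to Solr’s eDisMax parser. A rough sketch of such a request, assembled in Python; the field names (`title`, `body`, `updated_at`) are hypothetical, and the exact recency function is just one common pattern, not anything from the talk:

```python
from urllib.parse import urlencode

# Illustrative query-time relevance tuning for Solr's eDisMax parser.
params = {
    "q": "quarterly report",
    "defType": "edismax",
    # Query-time boosts: a match in title counts 4x a match in body.
    "qf": "title^4 body^1",
    # Function query: nudge fresher documents up the ranking.
    "bf": "recip(ms(NOW,updated_at),3.16e-11,1,1)",
}
query_string = urlencode(params)
# query_string would be appended to .../solr/<core>/select?
```

Salesforce’s point is that in an enterprise setting these static knobs are not enough, hence the custom second-stage (L2) re-ranker discussed in the talk.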

This talk discusses the motivation behind creating an L2 ranker and the use of Solr Search Component for running different types of ranking models at Salesforce.

Join us at Lucene/Solr Revolution 2017, the biggest open source conference dedicated to Apache Lucene/Solr, on September 12-15, 2017 in Las Vegas, Nevada. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Customizing Ranking Models in Solr to Improve Relevance for Enterprise Search appeared first on Lucidworks.

The Transformation of Academic Library Collecting / HangingTogether

The Transformation of Academic Library Collecting

In October 2016, I was privileged to attend a seminal event, The Transformation of Academic Library Collecting: A Symposium Inspired by Dan C. Hazen, along with colleagues Lorcan Dempsey and Constance Malpas who were speaking. This occasion brought together a group of eminent library leaders, research collections specialists and scholars at Norton’s Woods Conference Center in Cambridge, MA, to commemorate the career of Dan Hazen (1947–2015) and reflect upon the transformation of academic library collections. Hazen was a towering figure in the world of research collections management and was personally known to many attendees; his impact on the profession of academic librarianship and the shape of research collections is widely recognized and continues to shape practice and policy in major research libraries.

Sarah Thomas (Vice President for the Harvard Library and University Librarian & Roy E. Larsen Librarian for the Faculty of Arts and Sciences) and other colleagues had done a remarkable job not only selecting speakers but designing an event that allowed for discussion and reflection. We felt that the event needed to be documented in some way, and were pleased that Sarah endorsed this idea. The resulting publication, The Transformation of Academic Library Collecting: A Synthesis of the Harvard Library’s Hazen Memorial Symposium, is now freely available from our website.

Drawing from presentations and audience discussions at the symposium, this publication examines some central themes important to a broader conversation about the future of academic library collections, in particular collective collections and the reimagination of what have traditionally been called “special” and archival collections (now referred to as unique and distinctive collections). The publication also includes a foreword about Dan Hazen and his work by Sarah Thomas.

The Transformation of Academic Library Collecting: A Synthesis of the Harvard Library’s Hazen Memorial Symposium is not only a tribute to Hazen’s impact on the academic library community, but also a primer on where academic library collections could be headed in the future. We hope you will read, share, and use this as a basis for continuing an important conversation.

IMLS Leadership grants & Laura Bush grants available / District Dispatch

The Institute of Museum and Library Services (IMLS) recently announced the availability of two grant opportunities for libraries through the National Leadership Grants for Libraries (NLG) and the Laura Bush 21st Century Librarian (LB21) programs. The deadline to submit grant proposals is September 1, 2017, and awards will be announced in January 2018. NLG and LB21 programs are funded through the Library Services and Technology Act (LSTA) administered by IMLS.

Libraries are encouraged to apply for these funding opportunities. An increase in applications for these programs would send a signal to Congressional appropriators, and the Administration, that these grants are needed in communities across the country. Earlier this year, the President proposed eliminating both grant programs for FY2018, cutting $13.4 million for NLG and $10.0 million for LB21. The House Appropriations Committee rejected the President’s request and in July provided funding for both programs at their FY2017 levels. The full House is expected to vote on the funding bill that includes these programs in September, as is the key Senate Subcommittee and Committee with jurisdiction over both.

The NLG program invests in projects that address challenges and opportunities faced by libraries. Work funded often produces creative and valuable new tools, research findings and models that can be widely used and have national impact. The LB21 program supports “human capital projects” for libraries and librarians. It is intended to help produce a diverse workforce of librarians to better meet the changing learning and information needs of the American public.

IMLS has announced that the next round of NLG and LB21 grants will support three kinds of projects:

  • Community Anchors – projects that advance the role of libraries (and library professionals) as community anchors that foster community partnerships to encourage civic and cultural engagement, community dialogue, and lifelong learning, promote digital inclusion, and support local economies;
  • National Digital Platform – projects or professionals that create, develop, and expand digital content and services in communities; and
  • Curating Collections – projects or professionals that further preservation and the management of digital library collections.

For more information about the grant guidelines, as well as examples of previously awarded grants, visit IMLS’ NLG or the LB21 pages. IMLS also has posted informational webinars to answer potential applicants’ questions.

Grant requests will be peer-reviewed and must be submitted online by September 1, 2017, with all required documents. In FY2017, approximately 25% of grant requests were funded. The next grant cycle for NLG and LB21 will be announced in December.

The post IMLS Leadership grants & Laura Bush grants available appeared first on District Dispatch.

Stories: Interesting projects I worked on this past year / Eric Lease Morgan

This is short list of “stories” outlining some of the more interesting projects I worked on this past year:

  • Ask Putin – A faculty member from the College of Arts & Letters acquired the 950-page Cyrillic transcript of a television show called “Ask Putin”. The faculty member had marked up the transcription by hand in order to analyze the themes conveyed therein. She then visited the Center for Digital Scholarship, and we implemented a database version of the corpus. By counting & tabulating the roots of each of the words for each of the sixteen years of the show, we were able to quickly & easily confirm many of the observations she had generated by hand. Moreover, the faculty member was able to explore additional themes which she had not previously coded.
  • Who’s related to whom – A visiting scholar from the Kroc Center asked the Center for Digital Scholarship to extract all of the “named entities” (names, places, & things) from a set of Spanish language newspaper articles. Based on the strength of the relationships between the entities, the scholar wanted a visualization created illustrating who was related to whom in the corpus. When we asked more about the articles and their content, we learned we had been asked to map the Colombian drug cartel. While incomplete, the framework of this effort will possibly be used by a South American government.
  • Counting 250,000,000 words – Working with Northwestern University and Washington University in St. Louis, the Center for Digital Scholarship is improving access & services against the set of literature called “Early English Books”. This corpus spans 1460 to 1699 and is very representative of English literature of that time. We have been creating more accurate transcriptions of the texts, digitizing original items, and implementing ways to do “scalable reading” against the whole. After all, it is difficult to read 60,000 books. Through this process each & every word from the transcriptions has been saved in a database for future analysis. To date the database includes a quarter of a billion (250,000,000) rows. See:
  • Convocate – In conjunction with the Center for Civil and Human Rights, the Hesburgh Libraries created an online tool for comparing & contrasting human rights policy written by the Vatican and various non-governmental agencies. As a part of this project, the Center for Digital Scholarship wrote an application that read each & every paragraph from the thousands of pages of text. The application then classified each & every paragraph with one or more keyword terms for the purposes of more accurate & thorough discovery across the corpus. The results of this application enable the researcher to find items of similar interest even if the items employ dispersed sets of terminology. For more detail, see:

We Used Problem-Based Learning in Library Instruction and Came to Question Its Treatment of Students / In the Library, With the Lead Pipe

In Brief:

Two instruction librarians at a medium-sized liberal-arts college on the East Coast of the United States replaced their lecture-style teaching with Problem-Based Learning (PBL). They collaborated with two English instructors to bring PBL to a two-session sequence of library instruction. However, the more they used PBL, and the more they read about how other instruction librarians had employed it, the more they came to see how problematic it can be—especially in its failure to see students as teachers. This article asks whether Problem-Based Learning needs a refresh from critical pedagogy.

In the studies that followed Pedagogy of the Oppressed, I sought for more clarity as I attempted to analyze student-teacher relations. I have insisted on making it clear that teachers and students are different, but if the teacher has opted for democracy, he or she cannot allow this difference to become antagonistic. This means that he or she must not allow his or her authority to become authoritarian. 1

How we found problem-based learning

We were two librarians at a medium-sized liberal-arts school on the East Coast of the United States when we began to replace our lecture-style information literacy instruction with Problem-Based Learning (PBL). At our college, there were about 4,000 undergraduate students, and at our library, we worked with thirty-one other librarians and staff, with ten of the librarians doing around 180-200 sessions of instruction each academic year.

Before we experimented with PBL, nearly all our instruction was one-shot and lecture-style. The ten of us who taught had a checklist of things we should cover in a fifty- or seventy-five-minute session. This checklist started with library basics like how to contact librarians, find hours of operation, and check a library account. It then moved on to simple catalog and database searching before getting to specialized, subject-specific LibGuides we had made.2

We taught in a computer lab in the library, where we would stand at the front of the room, project the computer screen, and lead students through our list. We requested that students log onto computers and follow along as we went through our steps, but whenever any other librarians sat in the back of the room, perhaps to observe our teaching, they couldn’t help but notice that not all of the students took our lead. Instead, some students would go to Google or perhaps use library resources (like specialized databases) that weren’t being demonstrated at the time. Other students wouldn’t log on at all and would be using their phones. If we got through our program in thirty-five or forty minutes, we’d let the students know they could use the rest of the time—ten or fifteen minutes—to search on their own. Sometimes we would rove around the room at that point, and other times we wouldn’t.

This method of teaching worked for us because it was reproducible and predictable. By having one checklist that performed well enough for all classes, we didn’t have to come up with new or drastically different lesson plans; and by providing only one-shot instruction, we didn’t have to worry about covering twice as many (or maybe even three times as many) sessions. We were, after all, part of a small group of instruction librarians, a fact that constrained just how much we could experiment with additional teaching while shouldering other responsibilities. Besides, in the post-instruction surveys we sent to faculty and instructors, they either praised us or had no gripes. They were satisfied with what we were doing, and the instruction requests, on the order of 190 a year, kept rolling in.

Although we knew lecture-style, one-shot instruction worked well enough from our perspective (the librarian’s view), and though we knew faculty and instructors had no qualms with it, we had little idea about how the students felt. We were not sending post-instruction surveys to them, and in the last ten or fifteen minutes of a session, when we interacted with them directly, they weren’t commenting on our teaching methods, and we weren’t asking them for advice or critique.

It would have been easy to continue what we were doing, but the lack of student participation and feedback bothered us. We were familiar with ideas from critical pedagogy, and in many ways we realized our teaching epitomized Paulo Freire’s “banking concept” of education. To define that term, Freire writes:

In the banking concept of education, knowledge is a gift bestowed by those who consider themselves knowledgeable upon those whom they consider to know nothing. Projecting an absolute ignorance onto others, a characteristic ideology of oppression, negates education and knowledge as a process of inquiry.3

The perniciousness of the banking concept isn’t just that teachers view students as empty or in deficit; it’s also that teachers fail to make space for students to learn in ways that call upon their unique lived experiences. Ira Shor, another practitioner of critical pedagogy, insists that “Students are creative, intelligent beings, not plants or blank slates or pegboards for teacherly hammering.”4 And bell hooks, perhaps the most influential critical pedagogy theorist of them all, goes further by writing about pushback from teachers whose students seek to upend the banked class:

During my twenty years of teaching, I have witnessed a grave sense of dis-ease among professors (irrespective of their politics) when students want us to see them as whole human beings with complex lives and experiences rather than simply as seekers after compartmentalized bits of knowledge.5

Passages like the ones above, with their clear, direct language, helped us define our vague but persistent worries about not involving students in the learning process. We saw that our methods of instruction were shutting down opportunities for mutual teaching and learning and that, in effect, as students walked into our computer lab, we were asking them to check themselves at the door and leave their histories, their identities, and their knowledge behind. Although critical pedagogy began to transform our thinking and served as our catalyst, strangely enough the instruction we designed next wasn’t based on the ideas of Freire, Shor, or hooks. Instead, because Problem-Based Learning was popular in library journals at the time, and perhaps because it superficially reminded us of critical pedagogy, we turned to it. Only later, when reflecting on the effectiveness of Problem-Based Learning, would we return to critical pedagogy.

Problem-Based Learning—From Medical School to Libraries

Problem-Based Learning started in Ontario, Canada, at McMaster University in 1969, a year before Paulo Freire published Pedagogy of the Oppressed. It was developed by faculty in McMaster’s Health Sciences and was used in the medical school. Teachers observed that “students were disenchanted and bored with their medical education because they were saturated by the vast amounts of information they had to absorb, much of which was perceived to have little relevance to medical practice.”6 To address their concerns, they created PBL, a drastically different alternative based on these principles:

  • Learning Is Student-Centered
  • Learning Occurs in Small Student Groups
  • Teachers Are Facilitators or Guides
  • Problems Form the Organizing Focus and Stimulus for Learning
  • Problems Are a Vehicle for the Development of Clinical Problem-Solving Skills
  • New Information Is Acquired Through Self-Directed Learning.7

Because Problem-Based Learning and critical pedagogy started almost simultaneously and because they share certain features—like an emphasis on problem-solving and an aim of upending hierarchies—we’ve wondered if their theorists and practitioners were at all in conversation with one another. “Was there any exchange?” we’ve asked ourselves. Were there professional networks that bridged North America and South America? We haven’t been able to find any evidence of collaboration, so perhaps these ways of teaching and learning simply happened coincidentally and in parallel. Nevertheless, as we first read about PBL’s origin and history, we couldn’t help but recall passages of Freire’s like this:

Those truly committed to liberation must reject the banking concept in its entirety, adopting instead a concept of women and men as conscious beings, and consciousness as consciousness intent upon the world. They must abandon the educational goal of deposit-making and replace it with the posing of the problems of human beings in relation with the world.8

But even as we noticed similarities between critical pedagogy and PBL, we also began to pick up on stark differences, most of which concern the very definition of what it means to be a student or a teacher. In PBL, for instance, roles are clearly defined in that the learning environment should be focused on people identified as “students”; that is, education should be “student-centered.” Furthermore, anyone who had defined themselves as a “teacher” should ideally switch their role and behavior to that of a “facilitator” or “guide.”

A practitioner of critical pedagogy like Freire, in contrast, is wary of saddling anyone with roles that not only simplify but also dichotomize. He writes:

Through dialogue, the teacher-of-the-students and the students-of-the-teacher cease to exist and a new term emerges: teacher-student with students-teachers. The teacher is no longer merely the-one-who-teaches, but one who is himself taught in dialogue with the students, who in turn while being taught also teach.9

These divergent ways of understanding what it means to be a “teacher” or a “student” became more pronounced and concerning to us as, much later, we got deeper into our own use of Problem-Based Learning and delved further into the writings of instruction librarians who had tried it. But early on, when we were gathering information about PBL in libraries, we weren’t yet critical of it. Instead, we were simply learning about something new—for instance, that it was in the late 1990s and early 2000s that librarians began to employ PBL and tout its efficacy. In journal articles, we read about librarians who raise the importance of problems and case studies used in instruction10, 11, 12 and who, like the faculty at McMaster University, choose to redefine themselves as “tutors,” “guides,” “facilitators,” and “coaches.”13, 14, 15 In addition, “student-centered” learning becomes the fashionable term16, 17, 18 and small-group learning is what’s preferable.19, 20, 21 Oftentimes, these groups have to “report to” or “debrief” the rest of the class.

Although these methods are similar to the original McMaster program and to one another, too, there are also important differences amongst them. For example, some librarians claim PBL can be accomplished with a one-shot session22, 23, 24 while others argue that far more time is required.25, 26, 27 Some show librarians working closely with faculty and instructors28, 29, 30 while others show very little interaction or none at all.31, 32 Beyond all these similarities and differences about PBL in library journals, there is also a pronounced pattern in which some librarians and faculty members—the newly self-labeled “facilitators” and “guides”—characterize students in broadly negative ways, especially before intervening with PBL. Regarding students, they write lines like:

When [students] do not get what they think they are looking for immediately, they are apt to say, “This database sucks,” discard it, and try another source—just as uncritically as they might decide just to try another Web site or another store at the mall when they do not immediately find something for which they are shopping.33

[PBL] is also a great option for teaching information literacy to uninterested students who believe they already know how to find the information they need.34

We also began to better understand that students who are not given any support or structure to begin the research process only use what they already know (the Internet), because the unknown is too unknown or too invisible to their untrained eye.35

But we must take care to not move too quickly and must be sure to connect important concepts previously learned with new concepts to be learned. It is soon apparent that students who have not used and identified appropriate information sources will always be unable to identify believable solutions to real-life problems and to move beyond what they already know.36

We don’t recall if we took issue with these lines when we first read them. At the time, we were combing the articles for ideas, hoping they would help us put together instruction that would break us out of our banking-concept-based checklist. Later, though, as we began to work with PBL, and as we went back to these articles, we couldn’t help but feel that statements like the ones above betrayed a contradiction in how librarians use PBL in instruction and write about it after the fact. Simply put, what’s the point of “student-centered” learning that dismisses where students are coming from? And what are we to do if, after our own forays into PBL, we can’t agree with such characterizations of students?

A Quick Sketch of What Our Problem-Based Learning Looked Like:

In putting together our version of Problem-Based Learning instruction, we elected to borrow elements from what several other librarians had tried rather than inventing something wholly new or recreating any single method exactly. This is what we put together:

  1. Similar to what Barbara Kenney37 recommends, before the PBL instruction began, we met with the instructor of the class to go over their assignment, settle on library resources to include in instruction, and come up with three problems. For our problems, instead of using case studies, which are what Kenney employed, we chose to go with questions.
  2. For the first of two fifty-minute instruction sessions, we started by asking the students about their assignment. After a brief discussion, we then presented them with the first question we brainstormed with their instructor. Next, with the students’ input, we discussed how the question could be broken up into keywords before we demonstrated how it could be plugged into two library resources. (Some librarians who have used PBL in instruction provide orientation about research materials38, 39, 40 while others, to great effect, choose to have students figure them out on their own.41, 42 ) After that, the students did their own searches with that same question while we roved around the room with their instructor, encouraging students to speak with us and with one another. We then demonstrated two more library resources before, again, the students did their own searches. Although this session did not include group work and reporting out, which are often components of PBL, we believe what we did here still fits the PBL description because problems were, in fact, the driver of the session. Furthermore, in some library articles about PBL, group work is not featured at all.43, 44
  3. In the second of two fifty-minute sessions, we did a quick review of what we did in the first session before we broke up the students into four groups. Two of the groups got the second question of the three that we had developed with their instructor, and two of the groups got the third question. For about twenty minutes, the groups explored their questions while we and their instructor circled the room, speaking with them and listening to them talk to one another. Then, for the final twenty minutes, the groups took the role of the teacher and taught us what they had learned. (For reasons we’ll get to later, we prefer to characterize what the students did at the end as teaching, not “reporting” or “debriefing.”)

A Closer Look at Our Use of Problem-Based Learning and What Came Out of It

Meeting with Instructors before Instruction:

In order to come up with problems to pose in two sessions of library instruction, we opted to team up with two instructors who were teaching English composition classes. Both of them were introducing assignments in which their students would have to identify a problem—it could be a local, national, or global issue—and propose a feasible solution by way of argumentation and research. Speaking with these instructors, we agreed to come up with three questions for the sequence of instruction. We decided to come up with questions, as opposed to problems or case studies, because both instructors stressed inquiry in their classes, and they believed that in order to come up with a good argument or solution, you must first be able to raise questions you genuinely want to know the answer to. Having decided on questions as our focus in Problem-Based Learning, we collaborated to come up with some that we not only thought worked with the assignment but also provoked us into feeling we’d want to write problem-solving papers of our own. These are the questions we came up with:

  • How does poverty affect the learning of college-age students in the US?
  • What are colleges doing to be more energy efficient? What, exactly, could our school do to be more energy efficient?
  • What do colleges do to provide resources for students who live off campus or who commute to campus? What does our school do? Should our school do something more?
  • At colleges, what are the attitudes toward terms like “safe space,” “microaggression,” and “trigger warning”?
  • Are there colleges that provide rape kits—and the professionals trained to administer them, Sexual Assault Nurse Examiners (SANEs)—for their students? Should our college provide these?45
  • What are colleges doing to address the cost of textbooks?
  • What actions are being taken to stem the rise of human trafficking in Europe and the US?
  • How are funding and security protection at UNESCO World Heritage sites coordinated and handled?

Looking at these questions now, and having read more carefully about PBL, we see that some of them aren’t truly the open-ended or “ill-structured problems” that are preferred.46 To make these questions more open-ended, we should have taken better care to ensure their answers aren’t just bits of information found on websites or in databases. For example, our question “What are colleges doing to address the cost of textbooks?” simply requires students to discover something—like the initiative the University of Massachusetts, Amherst, has put forward with Open Educational Resources. However, if we had described how costly textbooks can be on our own campus and asked students what can be done about that problem, then that would have been a much more open-ended question or an “ill-structured” problem.

Despite the fact that we could have improved our questions, we did find it was worthwhile to spend time with instructors, review assignments together, think about lesson planning, and come up with questions that were important to us. Oftentimes, theorists in critical pedagogy write about the power differential between teachers and students, but they don’t mention the oppressive dynamics that can exist between educators working together, like librarians and instructors or faculty. In these planning meetings we had, the instructor was not dictating a lesson plan to us, and we weren’t imposing banking-concept methods on them by assuming they didn’t know anything about worthwhile library instruction. Thinking about Paulo Freire, bell hooks writes, “There was this one sentence of Freire’s that became a revolutionary mantra for me: ‘We cannot enter the struggle as objects in order later to become subjects.’”47 By working closely with each other from the beginning, we were ensuring we’d both be actors—not the acted upon—in the classroom, and we hoped that students would also feel like agents once the instruction began.

The First Session of Instruction:

To describe our teaching method in this first session, which took place in our library’s computer lab and was only fifty minutes long, we came to call it a “verse-chorus-verse” approach: we would start a discussion or demonstrate a resource for no more than five minutes at a time, and then everyone would talk together or work on their own. So, after having introduced ourselves and given the students a few minutes to fill out a quick pre-instruction survey, we encouraged people to speak by asking them what they knew about their problem-solving assignment.

In one class, the students said they weren’t sure what the assignment was because they had only just received it. We next looked at the assignment together, and one student identified that a peer-reviewed article was a requirement.

“What does ‘peer-reviewed’ mean?” we asked everyone.

One student ventured it’s when people in your class read a draft of your work. The instructor then stepped in and said that, yes, that’s one way of thinking about peer review but that another is a particular genre feature of scholarly writing. She went further by describing the process of the academic peer review.

In another class, when we were again speaking about components of the assignment, one student said they could cite newspaper articles in their writing. “When citing news sources, is there anything you have to be careful of?” we asked everyone. One person said newspapers can be biased, and as we continued to ask questions about bias, students mentioned that sources like The New York Times could be considered more liberal, while The Wall Street Journal was more conservative.

Next, we presented one of the questions we had concocted with the instructor. We explained that we thought the question we had come up with went well with the problem-solving assignment and that it was something we were truly curious about and wanted to investigate with their help. Breaking up our question into keywords, we next used that language to demonstrate how a couple databases (Academic Search Complete and JSTOR) worked before encouraging the students to use that same question—as well as those keywords and other language they generated—to conduct searches of their own. After that, we showed how ProQuest Newsstand (or Literati by Credo) operates and also shared some site-specific Googling tips before, again, opening up the time for students to do their own searches, keeping our original question in mind.

When the students were doing their own searches and experimenting with databases and the open internet, we and the instructors roved around the room, checking in with people individually and listening if students were speaking to one another. In a session in which students were using the question “How does poverty affect the learning of college-age students in the US?” we noticed one student had Googled a thesaurus in order to come up with synonyms for “poverty.” Another student, comparing Academic Search Complete and JSTOR, told us and other students around her that she believed JSTOR’s Boolean operators did not work as well as Academic Search Complete’s. She wanted to know why JSTOR broke up her terms if she had stipulated she wanted articles that must include both “college” AND “poverty.” In that moment, we said we didn’t know the answer but that, perhaps, we could look it up and encouraged her to do the same.
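The student’s question was a good occasion to spell out what Boolean operators promise: an AND search should return only records containing every term, while an OR search returns records containing any of them. As a minimal sketch (the toy records and the naive substring matching below are our own invented simplification, not how Academic Search Complete, JSTOR, or any real database indexes articles):

```python
# Toy illustration of Boolean AND vs OR over a handful of records.
# These records and the substring matching are hypothetical simplifications,
# not the behavior of any real library database.
records = [
    "poverty and college retention rates",
    "college energy efficiency programs",
    "childhood poverty and early learning",
]

def boolean_search(terms, operator="AND"):
    """Return records containing all terms (AND) or at least one term (OR)."""
    if operator == "AND":
        return [r for r in records if all(t in r for t in terms)]
    return [r for r in records if any(t in r for t in terms)]

# AND narrows: only records mentioning both terms survive.
print(boolean_search(["college", "poverty"], "AND"))
# OR broadens: any record mentioning either term is returned.
print(boolean_search(["college", "poverty"], "OR"))
```

On this toy data, the AND search returns only the first record, while the OR search returns all three—which is why a database that appears to “break up” an AND query, as the student observed, seems to be violating the operator’s contract.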

In a first session with another class, when we were exploring the question “What are colleges doing to be more energy efficient? What, exactly, could our school do to be more energy efficient?” a student found that “going green” was a useful term to use in ProQuest Newsstand.

Another student conducting a site-specific Google search with “” found a university webpage from 2007. “Is this source too old?” he said. We asked him what he thought, and he came to the conclusion that it depends on what you’re writing about.

Another student was finding material in Project MUSE, a resource we didn’t demonstrate but that they found serendipitously on one of our library’s webpages.

Yet another student discovered she could start with a search in Literati—she was looking for reference material about “green energy”—only to get offered links that would take her to academic articles in JSTOR. She pointed all this out to us when we visited.

Although these are admittedly anecdotes (and clearly not generalizable to a larger population), for us they confirmed well enough that demonstrating resources for fifteen or twenty minutes and having students do their own searches for thirty or thirty-five minutes is a far better, far more active use of time than the other way around. In the past, when we had lectured to students and worked through our checklist, we could go through most of the session without hearing a single student voice. With Problem-Based Learning—even our modified version of it that, in this first session, didn’t include group work—the focus can be on a question instead of a lecturer. What this means is that there was much more time for everyone to participate, and people were speaking with us, their instructor, and one another from minute one. Deborah Cheney48 and Barbara Kenney49 have both written about how PBL brings about more interaction between the students, librarians, and instructors, and this is undoubtedly what we experienced as well.

Furthermore, moments like the ones above illustrated that first-year college students do have much to share and contribute when it comes to library research. As Ira Shor insisted, they aren’t plants, they aren’t pegboards; like anyone they are intelligent, curious humans who bring rich, varied experiences with them to the classroom.

As Eric Hines and Samantha Hines plainly state, “It is a commonly held opinion among teaching faculty that the average college student lacks sufficient skill and training in critical thinking and information literacy”50 but that wasn’t our belief or that of the instructors we worked with. Even if we had held that grim view, how could we have failed to notice what the students brought to this first session of instruction? Certainly, they showed they had something to learn when it came to the specific resources we were working with and the conventions of academic writing and research, but they also repeatedly proved they arrived at the session not as blank slates but as diaries full of lived experiences, many of which already equipped them with a framework for analyzing information. Thinking about these examples, we also can’t help but return to Cheney’s words:

It is soon apparent that students who have not used and identified appropriate information sources will always be unable to identify believable solutions to real-life problems and to move beyond what they already know.51 [Emphasis ours]

Again, this just wasn’t what we observed, and even for the students who weren’t immediately able to find “appropriate information sources” related to the question we provided, it was obvious that what they already knew (that is, their lives, their memories, their experiences) wasn’t something to dismiss or discount. Having read Cheney’s article, we have only respect for how dynamically she teams up with faculty members, and her instructional design and teaching are clearly powerhouse, but in passages like the one above, she betrays something about PBL that we came across more than a few times in library journals; namely that while PBL might appear progressive and “student-centered” on the surface, it can still harbor banking-concept beliefs. To come back to Freire again, this happens in particular when students are seen only as objects to control—as “students-of-the-teacher”—as opposed to “students-teachers.” Really, for PBL in libraries to be truly effective, we have come to believe its tenets must be revised and paired with concepts from critical pedagogy.

The Second Session of Instruction:

After the first session of library instruction driven by Problem-Based Learning, no more than a week later, we met for a second fifty-minute session of PBL. To start this second session, which was in the students’ classroom and not the library’s computer lab, we spent about five to ten minutes reviewing resources from the last time—things like Academic Search Complete, JSTOR, ProQuest Newsstand, Literati by Credo, and site-specific Googling—before we broke the students up into four groups. We gave two of the groups a question we had prepared with their instructor during our initial planning meeting and the other two groups a different question. The students had twenty minutes to investigate their questions (they had brought laptops), and for the last twenty minutes of the class, each group stood up, embodied the role of the teacher, and taught us about what they had found, using the classroom’s desktop computer, big screen, and projector.

Similar to what other instruction librarians had done, we chose to have more than one session of Problem-Based Learning because we felt it was impossible for students to explore questions meaningfully in a one-shot sliver of time. Barbara Kenney, who argues that PBL can be done in a one-shot, insists that instruction librarians can make good use of a fifty-minute session “By creatively designing an instruction plan that relies on defined goals and objectives based on a problem that captures student interest.”52 However, she doesn’t show how this can be done in less than an hour because, when she gives an example of her own, it’s in an eighty-minute block of time.53

In another article that seeks to show PBL can be done in one shot, Katelyn Angell and Katherine Boss describe a program that’s more of a scavenger hunt (one branded “The Amazing Library Race”) where students search for answers to prompts like “Look in the library catalog for any books written by Jay-Z. Write down the call number of the book.”54 Such prompts are even more closed-ended and less “ill-structured” than the ones we had come up with and don’t push people into thinking about problems beyond how a catalog works, where books are shelved, and whether or not they’ll place in a competition. Furthermore, in “The Amazing Library Race,” although the students did work in groups, it was more to compete against one another than to teach one another. The groups never had a chance to “report” to others or “debrief” what they discovered because they were pitted against one another. Really, though this event had elements of PBL in it (that is, problems and group work), it had more the appearance of an active-learning activity.

The more we conducted our own sequenced sessions of Problem-Based Learning, the more we saw that even 100 minutes weren’t enough; we would have preferred to have at least one more fifty-minute session. And the more we employed PBL, the more we valued its emphasis on group work, although we eventually came to see it much differently from other instruction librarians, not to mention the original architects of PBL at McMaster University. In the paragraph above, we chose to put quotes around words like “report” and “debrief” not just because we’re citing language that other instruction librarians prefer but because we want to distance ourselves from those verbs. In the final twenty minutes in which the students shared their findings, we could not say they were “debriefing” us, which is language that’s absurdly corporatized, even militarized. No, they were teaching us and each other. When Alexius Smith Macklin writes, “In PBL, there is no teacher, per se,”55 we believe they are nullifying students who are also teachers. Responding to Smith Macklin, we would say, “In PBL, everyone has the chance to be a teacher or a student. Roles are not fixed but fluid.”

Again, this is not to discount how skilled instruction librarians clearly are in the PBL articles we read or to imply they don’t care about students. In Kate Wenger’s “Problem-Based Learning and Information Literacy,” she describes a beautifully designed series of five seventy-five-minute sessions of PBL.56 While we aspire to teach as well as Wenger does in that sequence, we also find it strange that she uses no form of the verb “teach” when describing the students’ actions. (Instead, the students “recorded,” “addressed,” “spent time,” “discussed,” “felt,” “gave,” “responded,” “started to develop,” and, for a second time, “gave.”57 ) What’s more, on the part of Wenger or the faculty member she worked with, there is no admission of having learned from the students. The students are never seen as teachers.

In great contrast to that perspective, as we and the instructor roved around the room in the first half of this second session of instruction, we recognized that students were actively working with and teaching one another, not to mention us. For example, as one group of four students embarked on researching the question “What actions are being taken to stem the rise of human trafficking in Europe and the US?”, one student suggested they would start looking up terms in Literati and asked if someone else would like to give Academic Search Complete a try. When we checked back with this group a few minutes later, we noticed that the student who was using Literati had found its “Mind Map,” which is a feature that visually links words in the shape of a web. At the center of this student’s web was “human trafficking,” and they were sharing other language the program had generated with the rest of the group.

The student who had been using Academic Search Complete now wanted to know how you can figure out if an article is peer-reviewed or not. We said that, to start, in its results screen, you can check a box that says “Scholarly/Peer-Reviewed.” Another student in the group noted that JSTOR doesn’t have that checkbox.

“So how do you find out if something is peer-reviewed in JSTOR?” we asked the group.

Joking, the student using Academic Search Complete said, “Oh, I just won’t use JSTOR then.”

We laughed before suggesting they could always Google the name of a journal to see if they could figure out whether it’s peer-reviewed. That, or they could use a specialized database like Ulrichsweb to confirm a journal’s designation.

“And when you research,” we said, “You’ll often have to consult more than one database. Academic Search Complete gives you access to one pile of information, but JSTOR gives you access to another pile. When you research, you might have lots of luck in one place and little success in another.”

A student in another group in another class—one looking into the question “At colleges, what are the attitudes toward terms like ‘safe space,’ ‘microaggression,’ and ‘trigger warning’?”—wanted to know if site-specific Googling worked only in general for “.com,” “.org,” “.gov,” and “.edu” or if they could use it for combing through a specific website.

“Can I use it to search YouTube?” she wanted to know.

We asked her to try it, and she plugged in “” for a search about safe spaces on college campuses and got some results.

Someone else in her group had found an article on JSTOR and wanted to know if they had to read the whole thing to understand it.

“Does it have that thing at the beginning? The summary?” someone else in the group asked before we could respond. They explained that academic articles usually have summaries at the beginning, and we took the opportunity to say that, in the genre of scholarly writing, those summaries are often called “abstracts.”

Again, although these scenes are anecdotes, we do feel they capture the spirit of the seven follow-up sessions of Problem-Based Learning we participated in. In moments like these, it’s apparent that students benefited from working together in groups and that they were acting not just as students but also as teachers.

The students’ embodiment of what it means to teach was clearest in the final twenty minutes of this second session, when they stood up at the front of the classroom and, using the room’s computer and projector, taught us and one another about what they had found. One group that had looked into the question “At colleges, what are the attitudes toward terms like ‘safe space,’ ‘microaggression,’ and ‘trigger warning’?” said they had to use many different search terms in order to find anything in an academic database like Academic Search Complete. They showed us how, in the “Advanced Search” page of that database, they had even gone so far as to use the “NOT” Boolean operator in order to weed “high school” out of their search results.

“At first, we weren’t sure what the ‘NOT’ was for,” one of them said, “but it did help us find this article.” It was an article called “How White Faculty Perceive and React to Difficult Dialogues on Race,” and another student in the group mentioned that it would be especially good for people at our own school to read because, currently on our campus, a program called “Difficult Dialogues” had been started to address the campus climate, particularly with regard to race.

Someone else in their group showed how they had done a site-specific Google search of our college’s website, using terms like “safe space” and “microaggression” and were surprised that very few webpages came up. “It only comes up on a Women’s Studies course description,” this student said.

Another group that had explored the questions “Are there colleges that provide rape kits—and the professionals (SANEs) [Sexual Assault Nurse Examiners] trained to administer them—for their students? Should our college provide these?” found that our college did not provide access to sexual assault kits or staff sexual-assault nurse examiners in our health center. The students pointed out that using the search term “sane” didn’t always get them the information they were looking for and that it sometimes resulted in resources about mental health. Nevertheless, they had used site-specific Googling to find universities, like the University of Iowa and Oregon State University, that did provide sexual-assault kits and counseling for their students. They also used Google Maps to show us just how far away the nearest hospital was from our own institution.

“This is how far a victim would have to drive to be examined,” one of the students said.

Their instructor praised them not only for finding salient information but also for making such a compelling argument. “These are the kinds of problems and solutions you could write about for your own papers,” she said.

As we cited earlier in this paper, one of the core tenets of Problem-Based Learning is “Learning Is Student-Centered.” In response to that declaration, however, we can’t help but ask, “But what happens when students are, in fact, teachers?” In such cases, does it mean the center shouldn’t be on them? From our experiences using Problem-Based Learning in library instruction, we have found it’s impossible to say that students aren’t also teachers, that they don’t teach themselves as well as their librarians and instructors. What’s more, to say they are “reporting” or “debriefing” when they are truly teaching is to misrepresent the agency they’re taking in the classroom. In Pedagogy of the Oppressed, Paulo Freire writes, “Education must begin with the solution of the teacher-student contradiction, by reconciling the poles of contradiction so that both are simultaneously teachers and students.”58 Although we have enjoyed experimenting with Problem-Based Learning, and though it’s a significant improvement on the banking-concept education we had been relying on previously, for us it’s apparent that the instruction librarians who have used it are still trying to solve its contradictions. In fact, lately, we have been turning to people who work with Critical Librarianship because, through #critlib, we see that, years before us, librarians have been employing critical pedagogy and the ideas of theorists like Paulo Freire to question practices in information literacy instruction.59, 60 Practitioners of Critical Librarianship seem to find, as we did, that if students aren’t seen for who they are, and if their experiences aren’t respected and heard, then the education that results isn’t democratic but oppressive.

Results From Our Survey:61

In the articles we read about Problem-Based Learning in library instruction, no one had surveyed students about what their experiences with learning in libraries had been like in the past. Most of the students we worked with were first-year college students—and a good number of them were first-semester college students—so their experiences with library instruction often went back to their time in high school. In one of our pre-instruction questions, we asked students how they had been taught about library resources in the past. With a lecture? An activity? In groups? Other?

In response, eighty of the 110 students (73%) said they had been taught about library resources through lecturing. Four students (4%) said they had learned in groups. For us, this confirms that, when students come to our library instruction, they already have a history of learning only via banking-concept methods. Percentages like these convince us to continue with teaching methods that trouble the boundaries between what it means to be a “teacher” or a “student.”

Later, in a post-instruction survey, we asked students about the group work we had incorporated into our Problem-Based Learning. Ninety-six out of around 126 students took this survey, and seventy-six of them (79%) said that working together in groups was helpful. Furthermore, twenty-four of those students (25% of the total) responded positively to our prompt “Usually, I don’t enjoy working in groups, but this was still helpful.” We were surprised to see so many of the students approved of working in groups because, anecdotally, we had sometimes heard from them that working in groups can be problematic; that is, it can be awkward for students to assign themselves equitable roles, and for some students, group work is associated with busy work. But, again, these results encourage us to continue finding ways for students to work together in groups, particularly in ones that lead students, librarians, and instructors to act as both students and teachers. The point is not simply to be “student-centered,” because it’s just as important for the center to be on “students-teachers” and the problems they’re studying.

At the end of our post-instruction survey, we also left open-ended space for students to write about their favorite and least favorite parts of the instruction. Here is a sampling of their responses:

What was your favorite part of this instruction?

  • “It was good examples that were being used. I like that we were able to do a test to see what sources we can find as if we were actually writing the proposal paper.”
  • “Doing the research myself, instead of just listening the whole time. (Hands on feeling ).”
  • “Meeting as a group, and being given examples of how to properly research these topics”
  • “My favorite part was investigating the different terms my group had to research because I had not known what some of the words meant.”
  • “Seeing how other groups found their articles.”
  • “I think actually looking up real examples of research questions was very helpful to actually get used to using the online sources.”
  • “the group part because we could talk with others about what they were doing.”

What was your least favorite part of this instruction and why?

  • “The lengthy instruction of how to look through research information we already know how to.”
  • “My least favorite part was the group presentation because it was difficult for me to find good material.”
  • “lecturing how to use the data bases because i already knew how.”
  • “Learning how to search the databases because this has already been taught to me.”
  • “I have learned how to use databases many times. My least favorite part was having to sit throughout the entire presentation again. It was a nice refresher, but very repetitive.”
  • “the library instruction part because it wasn’t very interactive.”
  • “the repetitive instruction about some of it.”

There is richness in these responses, and they give us plenty to think about as we work to revise our instructional design. For one thing, although we see that some students recognized the benefit of the research questions we and their instructor provided, we think it would be far better if they were to look into open-ended, “ill-structured” questions, problems, and case studies of their own design. Questions of their own could replace the ones we provided, or perhaps students could draft and investigate them in a third session of instruction.

We also acknowledge that many students perceived our instruction as repetitive and not interactive enough, and that means we still have much to learn when it comes to offering library instruction in which they feel seen, engaged, and alive. At the beginning of this paper, we placed an epigraph that spotlights text from a letter of Paulo Freire’s. In it, decades after the publication of Pedagogy of the Oppressed, he admits he still thinks about the relationships of and differences between students and teachers.62 He raises the crucial point that anyone who identifies as a teacher must take pains not to let their authority veer into the authoritarian. We see that, in the future, as we work to bring more critical pedagogy to Problem-Based Learning—and to be in conversation with people practicing #critlib—the best way to guard against authoritarian practices is to share the dissonances and delights of teaching with others.

A sincere thanks to Amanda Hornby, Sofia Leung, Annie Pho, and Lead Pipe editors for your direct, thoughtful feedback about this final paper and earlier drafts. Through your generous comments, we came to see new perspectives and found connections we had missed.


Angell, Katelyn, and Katherine Boss. “Adapting the Amazing Library Race: Using problem-based learning in library orientations.” College & Undergraduate Libraries 23, no. 1 (2016): 44-55.

Barrows, Howard S. “Problem‐based learning in medicine and beyond: A brief overview.” New Directions for Teaching and Learning 1996, no. 68 (1996): 3-12.

Beilin, Ian. “Beyond the threshold: Conformity, resistance, and the ACRL information literacy framework for higher education.” In the Library with the Lead Pipe (2015).

Cheney, Debora. “Problem-based learning: Librarians as collaborators.” portal: Libraries and the Academy 4, no. 4 (2004): 495-508.

Diekema, Anne R., Wendy Holliday, and Heather Leary. “Re-framing information literacy: Problem-based learning as informed learning.” Library & Information Science Research 33, no. 4 (2011): 261-268.

Freire, Paulo, and Donaldo P. Macedo. Letters to Cristina: Reflections on My Life and Work. New York: Routledge, 1996.

Freire, Paulo, Myra Bergman Ramos, and Donaldo P. Macedo. Pedagogy of the Oppressed. 30th Anniversary ed. New York: Bloomsbury Academic, 2012.

Hines, Samantha, and Eric H. Hines. “Faculty and librarian collaboration on problem-based learning.” Journal of Library Innovation 3, no. 2 (2012): 18-32.

hooks, bell. Teaching to Transgress: Education as the Practice of Freedom. New York: Routledge, 1994.

Kenney, Barbara. “Revitalizing the One-Shot Instruction Session Using Problem-Based Learning.” Reference & User Services Quarterly 47, no. 4 (2008): 386-91.

Pelikan, Michael. “Problem-Based Learning in the Library: Evolving a Realistic Approach.” portal: Libraries and the Academy 4, no. 4 (2004): 509-20.

Riedler, Martina, and Mustafa Yunus Eryaman. “Transformative Library Pedagogy and Community Based Libraries: A Freirean Perspective.” Critical Theory For Library and Information Science: Exploring the Social From Across Disciplines (2010): 89-99.

Shor, Ira. When Students Have Power: Negotiating Authority in a Critical Pedagogy. Chicago: University of Chicago Press, 1996.

Smith Macklin, Alexius. “Integrating Information Literacy Using Problem-based Learning.” Reference Services Review 29, no. 4 (2001): 306-14.

Snavely, Loanne. “Making Problem-Based Learning Work: Institutional Changes.” portal: Libraries and the Academy 4, no. 4 (2004): 521-31.

Spence, Larry. “The Usual Doesn’t Work: Why We Need Problem-Based Learning.” portal: Libraries and the Academy 4, no. 4 (2004): 485-93.

Wenger, Kate. “Problem-Based Learning and Information Literacy: A Natural Partnership.” Pennsylvania Libraries: Research & Practice 2, no. 2 (2014): 142-54.

Pre- and Post-Instruction Survey Questions:

Pre-instruction Questions:

What kinds of research materials have you used for assignments? Check all that apply.

  • Databases (like JSTOR)
  • Academic journals
  • Books
  • E-Books
  • Encyclopedias
  • Wikipedia
  • Google
  • Google Scholar
  • Other (please specify)

Generally, how confident are you using research materials for assignments?

  • Very confident
  • Confident
  • Somewhat confident
  • Not very confident

What do you look for to assess the quality of research materials? Check all that apply.

  • Author
  • Date
  • Publisher
  • Peer-review
  • Format (print material or online)
  • Bias
  • Popularity
  • Type of web address (.com, .org, .edu, .gov)

How confident are you assessing the quality of research materials?

  • Very confident
  • Confident
  • Somewhat confident
  • Not very confident

Have you been taught how to use research materials?

  • Yes. In one class
  • Yes. In more than one class
  • No
  • I’m not sure / I don’t remember

Who taught you?

  • A teacher
  • A librarian
  • The teacher and librarian were team teachers
  • I taught myself
  • Other (please specify)

How were you taught? Check all that apply

  • With a lecture
  • With an activity
  • In groups
  • Other (please specify)

If you’re working to finish an assignment, how likely are you to try to use a research material that you don’t have experience with?

  • Very likely
  • Somewhat likely
  • Likely
  • Not very likely

Rank from 1-5 (1 being the most credible, 5 being the least credible) the credibility of these materials.

  • An article from a national newspaper
  • A blog
  • A scholarly, peer-reviewed journal
  • An advocacy organization’s website
  • A government website

Post-Instruction Questions:

What kinds of research materials have you used for assignments? Check all that apply.

  • Databases (like JSTOR)
  • Academic journals
  • Books
  • E-Books
  • Encyclopedias
  • Wikipedia
  • Google
  • Google Scholar
  • Other (please specify)

Generally, how confident are you using research materials for assignments?

  • Very confident
  • Confident
  • Somewhat confident
  • Not very confident

What do you look for to assess the quality of research materials? Check all that apply.

  • Author
  • Date
  • Publisher
  • Peer-review
  • Format (print material or online)
  • Bias
  • Popularity
  • Web address (.com, .org, .edu, .gov)

How confident are you assessing the quality of research materials?

  • Very confident
  • Confident
  • Somewhat confident
  • Not very confident

If you’re working to finish an assignment, how likely are you to try to use a research material that you don’t have experience with?

  • Very likely
  • Somewhat likely
  • Likely
  • Not very likely

Rank from 1-5 (1 being the most credible, 5 being the least credible) the credibility of these materials.

  • An article from a national newspaper
  • A blog
  • A scholarly, peer-reviewed journal
  • An advocacy organization’s website
  • A government website

How effective for your learning was using problems that your teacher provided?

  • Not at all effective
  • Somewhat effective
  • Effective
  • Very effective

How helpful was it to research these problems in groups?

  • Usually, I don’t enjoy working in groups, but this was still helpful
  • Usually, I don’t enjoy working in groups, and this was not helpful
  • Usually, I like working in groups, and this was helpful
  • Usually, I like working in groups, and this was not helpful

What was your favorite part of this instruction?

What was your least favorite? Why?

  1. Paulo Freire, Letters to Cristina: Reflections on My Life and Work (New York: Routledge, 1996), 162.
  2. LibGuides are a Springshare product that enable librarians to make simple websites full of images, links, and widgets. At our institution, we used them to make resource guides for majors like “Biology,” “Black Studies,” or “Women’s Studies.”
  3. Paulo Freire, Pedagogy of the Oppressed (New York: Bloomsbury Academic, 2012), 72
  4. Ira Shor, When Students Have Power: Negotiating Authority in a Critical Pedagogy (Chicago: University of Chicago Press, 1996), 12
  5. bell hooks, Teaching to Transgress (New York: Routledge, 1994), 15
  6. Howard Barrows, “Problem-Based Learning in Medicine and Beyond: A Brief Overview.” New Directions for Teaching and Learning, no. 68 (1996): 4
  7. Barrows, “Problem-Based Learning in Medicine and Beyond: A Brief Overview,” 5-6
  8. Freire, Pedagogy of the Oppressed, 60
  9. Freire, Pedagogy of the Oppressed, 61
  10. Debora Cheney, “Problem-Based Learning: Librarians as Collaborators.” portal: Libraries and the Academy 4, no. 4 (2004): 495-508.
  11. Eric Hines and Samantha Hines, “Faculty and Librarian Collaboration on Problem-Based Learning.” Journal of Library Innovation 3, no. 2 (2012): 18-32.
  12. Barbara Kenney, “Revitalizing the One-Shot Instruction Session Using Problem-Based Learning.” Reference & User Services Quarterly 47, no. 4 (2008): 386-91.
  13. Katelyn Angell and Katherine Boss, “Adapting the Amazing Library Race: Using Problem-based Learning in Library Orientations.” College & Undergraduate Libraries 23, no. 1 (2016): 44-55.
  14. Anne Diekema, Wendy Holliday, and Heather Leary, “Re-framing Information Literacy: Problem-based Learning as Informed Learning.” Library and Information Science Research 33, no. 4 (2011): 261-68.
  15. Kate Wenger. “Problem-Based Learning and Information Literacy: A Natural Partnership.” Pennsylvania Libraries: Research & Practice 2, no. 2 (2014): 142-54.
  16. Angell and Boss, “Adapting the Amazing Library Race: Using Problem-based Learning in Library Orientations,” 44-55.
  17. Kenney, “Revitalizing the One-Shot Instruction Session Using Problem-Based Learning,” 386-91.
  18. Wenger, “Problem-Based Learning and Information Literacy: A Natural Partnership,” 142-54.
  19. Alexius Smith Macklin, “Integrating Information Literacy Using Problem-based Learning.” Reference Services Review 29, no. 4 (2001): 306-14.
  20. Smith Macklin, “Integrating Information Literacy Using Problem-based Learning,” 306-14.
  21. Wenger, “Problem-Based Learning and Information Literacy: A Natural Partnership,” 142-54.
  22. Angell and Boss, “Adapting the Amazing Library Race: Using Problem-based Learning in Library Orientations,” 44-55.
  23. Kenney, “Revitalizing the One-Shot Instruction Session Using Problem-Based Learning,” 386-91.
  24. Smith Macklin, “Integrating Information Literacy Using Problem-based Learning,” 306-14.
  25. Cheney, “Problem-Based Learning: Librarians as Collaborators,” 495-508.
  26. Hines and Hines, “Faculty and Librarian Collaboration on Problem-Based Learning,” 18-32.
  27. Michael Pelikan, “Problem-Based Learning in the Library: Evolving a Realistic Approach.” portal: Libraries and the Academy 4, no. 4 (2004): 509-20.
  28. Hines and Hines, “Faculty and Librarian Collaboration on Problem-Based Learning,” 18-32.
  29. Pelikan, “Problem-Based Learning in the Library: Evolving a Realistic Approach,” 509-20.
  30. Smith Macklin, “Integrating Information Literacy Using Problem-based Learning,” 306-14.
  31. Angell and Boss, “Adapting the Amazing Library Race: Using Problem-based Learning in Library Orientations,” 44-55.
  32. Diekema, Holliday, and Leary. “Re-framing Information Literacy: Problem-based Learning as Informed Learning,” 261-68.
  33. Diekema, Holliday, and Leary. “Re-framing Information Literacy: Problem-based Learning as Informed Learning,” 261-68.
  34. Wenger, “Problem-Based Learning and Information Literacy: A Natural Partnership,” 147.
  35. Cheney, “Problem-Based Learning: Librarians as Collaborators,” 497.
  36. Ibid., 506.
  37. Kenney, “Revitalizing the One-Shot Instruction Session Using Problem-Based Learning,” 388.
  38. Kenney, “Revitalizing the One-Shot Instruction Session Using Problem-Based Learning,” 388.
  39. Kenney, “Revitalizing the One-Shot Instruction Session Using Problem-Based Learning,” 389.
  40. Smith Macklin, “Integrating Information Literacy Using Problem-based Learning,” 308.
  41. Cheney, “Problem-Based Learning: Librarians as Collaborators,” 498.
  42. Pelikan, “Problem-Based Learning in the Library: Evolving a Realistic Approach,” 515.
  43. Diekema, Holliday, and Leary. “Re-framing Information Literacy: Problem-based Learning as Informed Learning,” 263.
  44. Hines and Hines, “Faculty and Librarian Collaboration on Problem-Based Learning,” 22.
  45. When we came up with this pair of questions with the instructor, at the time, there were many articles in the national news about the prevalence of sexual assault on college campuses and how problematic their investigations were. We thought these questions could go well with a problem-solving assignment in which a student could write about how they might right injustice on college campuses, but looking at these questions now, we see they obviously could, and potentially did, raise past trauma. This is not to say that we shouldn’t encourage students to investigate topics like sexual assault, but it is to acknowledge that we were rash to offer up these questions without developing trust over time with students and giving them the clear option to opt out or select an alternative topic. Paul Baepler and J.D. Walker’s article “Active Learning Classrooms and Educational Alliances: Changing Relationships to Improve Learning” in particular reveals how trust can be forged between teachers and students, not to mention where we went wrong.
  46. Kenney, “Revitalizing the One-Shot Instruction Session Using Problem-Based Learning,” 390.
  47. hooks, Teaching to Transgress, 46
  48. Cheney, “Problem-Based Learning: Librarians as Collaborators,” 497.
  49. Kenney, “Revitalizing the One-Shot Instruction Session Using Problem-Based Learning,” 390.
  50. Hines and Hines, “Faculty and Librarian Collaboration on Problem-Based Learning,” 19.
  51. Cheney, “Problem-Based Learning: Librarians as Collaborators,” 506.
  52. Kenney, “Revitalizing the One-Shot Instruction Session Using Problem-Based Learning,” 387.
  53. Ibid., 389.
  54. Angell and Boss, “Adapting the Amazing Library Race: Using Problem-based Learning in Library Orientations,” 47.
  55. Smith Macklin, “Integrating Information Literacy Using Problem-based Learning,” 309.
  56. Wenger, “Problem-Based Learning and Information Literacy: A Natural Partnership,” 149-50.
  57. Ibid.
  58. Freire, Pedagogy of the Oppressed, 72.
  59. Martina Riedler and Mustafa Yunus Eryaman. “Transformative Library Pedagogy and Community Based Libraries: A Freirean Perspective.” Critical Theory For Library and Information Science: Exploring the Social From Across Disciplines (2010): 89-99
  60. Ian Beilin. “Beyond the threshold: Conformity, resistance, and the ACRL information literacy framework for higher education.” In the Library with the Lead Pipe (2015).
  61. We went through the IRB at our school, and our application did not end up “under review” or “exempt.” Instead, they determined that our project was not technically research and, as a result, was excluded from the IRB process altogether. (They said what we were doing was more of an assessment of a practice.) Though we were excluded from the IRB process, at the beginning of our first session of instruction, before we gave the students surveys, we still gave them consent forms, letting them know about our project and how we intended to present and write about it. The form also let them know that participation (or lack of participation) in the surveys or sessions of instruction would not affect their grades.
  62. Freire, Letters to Cristina: Reflections on My Life and Work, 162

Announcing VIVO Camp / DuraSpace News

VIVO Camp is a multi-day training event designed specifically for new and prospective users. Camp will be held November 9-11, 2017 on the campus of Duke University in Durham, NC. Over two and a half days, VIVO Camp will begin with an introduction to VIVO and build to a comprehensive overview, exploring these topics:

  • VIVO features

  • Examples and demos of VIVO including customizations

  • Representing scholarship

  • Loading, displaying and using VIVO data

  • Introduction to the ontologies

On reading Library Journal, September, 1877 / Karen Coyle

Of the many advantages to retirement is the particular one of idle time. And I will say that as a librarian one could do no better than to spend some of that time communing with the history of the profession. The difficulty is that it is so rich, so familiar in many ways that it is hard to move through it quickly. Here is just a fraction of the potential value to be found in the September issue of volume two of Library Journal.* Admittedly this is a particularly interesting number because it reports on the second meeting of the American Library Association.

For any student of library history it is especially interesting to encounter certain names as living, working members of the profession.

Other names reflect works that continued on, some until today, such as Poole and Bowker, both names associated with long-running periodical indexes.

What is particularly striking, though, is how many of the topics of today were already being discussed then, although obviously in a different context. The association was formed, at least in part, to help librarianship achieve the status of a profession. Discussed were the educating of the public on the role of libraries and librarians as well as providing education so that there could be a group of professionals to take the jobs that needed that professional knowledge. There was work to be done to convince state legislatures to support state and local libraries.

One of the first acts of the American Library Association when it was founded in 1876 (as reported in the first issue of Library Journal) was to create a Committee on Cooperation. This is the seed for today's cooperative cataloging efforts as well as other forms of sharing among libraries. In 1877, undoubtedly encouraged by the participation of some members of the publishing community in ALA, there was hope that libraries and publishers would work together to create catalog entries for in-print works.
This is one hope of the early participants that we are still working on, especially the desire that such catalog copy would be "uniform." Note that there were also discussions about having librarians contribute to the periodical indexes of R. R. Bowker and Poole, so the cooperation would flow in both directions.

The physical organization of libraries also was of interest, and a detailed plan for a round (actually octagonal) library design was presented. His conclusion, however, shows a difference in our concepts of user privacy.

Especially interesting to me are the discussions of library technology. I was unaware of some of the emerging technologies for reproduction, such as the papyrograph and the electric pen. In 1877, the big question, though, was whether to employ the new (but as yet unperfected) technology of the typewriter in library practice.

There was some pooh-poohing of this new technology, but some members felt it might be reaching a state of usefulness.

"The President" in this case is Justin Winsor, Superintendent of the Boston Library, then president of the American Library Association. Substituting more modern technologies, I suspect we have all taken part in this discussion during our careers.

Reading through the Journal evokes a strong sense of “plus ça change…” but I admit that I find it all rather reassuring. The historical beginnings give me a sense of why we are who we are today, and what factors are behind some of our embedded thinking on topics.

* Many of the early volumes are available from HathiTrust, if you have access. Although the texts themselves are public domain, these are Google-digitized books and are not available without a login. (Don’t get me started!) If you do not have access to those, most of the volumes are available through the Internet Archive. Select “text” and search on “library journal”. As someone without HathiTrust institutional access I have found most numbers in the range 1-39, but am missing (hint, hint): 5/1880; 8-9/1887-88; 17/1892; 19/1894; 28-30/1903-1905; 34-37/1909-1912. If I can complete the run I think it would be good to create a compressed archive of the whole and make that available via the Internet Archive to save others the time of acquiring them one at a time. If I can find the remainder that are pre-1923 I will add those in.