Planet Code4Lib

Libraries’ tech pipeline problem / Coral Sheldon-Hess

“We’ve got a pipeline problem, so let’s build a better pipeline.” –Bess Sadler, Code4Lib 2014 Conference (the link goes to the video)

I’ve been thinking hard (for two years, judging by the draft date on this post) about how to grow as a programmer, when one is also a librarian. I’m talking not so much about teaching/learning the basics of coding, which is something a lot of people are working really hard on, but more about getting from “OK, I finished yet another Python/Rails/JavaScript/whatever workshop” or “OK, I’ve been through all of Code Academy/edX/whatever”—or from where I am, “OK, I can Do Interesting Things™ with code, but there are huge gaps in my tech knowledge and vocabulary”—to the point where one could get a full-time librarian-coder position.

I should add, right here: I’m no longer trying to get a librarian-coder position*. This post isn’t about me, although it is, of course, from my perspective and informed by my experiences. This post is about a field I love, which is currently shooting itself in the foot, which frustrates me.

Bess is right: libraries need 1) more developers and 2) more diversity among them. Libraries are hamstrung by expensive, insufficient vendor “solutions.” (I’m not hating on the vendors, here; libraries’ problems are complex, and fragmentation and a number of other issues make it difficult for vendors to provide really good solutions.) Libraries and librarians could be so much more effective if we had good software, with interoperable APIs, designed specifically to fill modern libraries’ needs.

Please, don’t get me wrong: I know some libraries are working on this. But they’re too few, and their developers’ demographics do not represent the demographics of libraries at large, let alone our patron bases. I argue that the dearth and the demographic skew will continue and probably worsen, unless we make a radical change to our hiring practices and training options for technical talent.

Building technical skills among librarians

The biggest issue I see is that we offer a fair number of very basic learn-to-code workshops, but we don’t offer a realistic path from there to writing code as a job. To put a finer point on it, we do not offer “junior developer” positions in libraries; we write job ads asking for unicorns, with expert- or near-expert-level skills in at least two areas (I’ve seen ones that wanted strong skills in development, user experience, and devops, for instance).

This is unfortunate, because developing real fluency with any skill, including coding, requires practicing it regularly. In the case of software development, there are things you can really only learn on the job, working with other developers (ask me about Git, sometime); only, nobody seems willing to hire for that. And, yes, I understand that there are lots of single-person teams in libraries—far more than there should be—but many open source software projects can fill in a lot of that group learning and mentoring experience, if a lone developer is allowed to participate in them on work time. (OSS is how I am planning to fill in those skills, myself.)

From what I can tell, if you’re a librarian who wants to learn to code, you generally have two really bad options: 1) learn in your spare time, somehow; or 2) quit libraries and work somewhere else until your skills are built up. I’ve been down both of those roads, and as a result I no longer have “be a [paid] librarian-developer” on my goals list.

Option one: Learn in your spare time

This option is clown shoes. It isn’t sustainable for anybody, really, but it’s especially not sustainable for people in caretaker roles (e.g. single parents), people with certain disabilities (who have less energy and free time to start with), people who need to work more than one job, etc.—that is, people from marginalized groups. Frankly, it’s oppressive, and it’s absolutely a contributing factor to libtech’s largely male, white, middle to upper-middle class, able-bodied demographics—in contrast to the demographics of the field at large (which is also most of those things, but certainly not predominantly male).

“I’ve never bought this ‘do it in your spare time’ stuff. And it turns out that doing it in your spare time is terribly discriminatory, because … a prominent aspect of oppression is that you have more to do in less spare time.” – Valerie Aurora, during her keynote interview for Code4Lib 2014 (the link goes to the video)

“It’s become the norm in many technology shops to expect that people will take care of skills upgrading on their own time. But that’s just not a sustainable model. Even people who adore late night, just-for-fun hacking sessions during the legendary ‘larval phase’ of discovering software development can come to feel differently in a later part of their lives.” – Bess Sadler, same talk as above

I tried to make it work, in my last library job, by taking one day off every other week** to work on my development skills. I did make some headway—a lot, arguably—but one day every two weeks is not enough to build real fluency, just as fiddling around alone did not help me build the skills that a project with a team would have. Not only do most people not have the privilege of dropping to 90% of their work time, but even if you do, that’s not an effective route to learning enough!

And, here, you might think of the coding bootcamps (at more than $10k per) or the (free, but you have to live in NYC) Recurse Center (which sits on my bucket list, unvisited), but, again: most people can’t afford to take three months away from work, like that. And the Recurse Center isn’t so much a school (hence the name change away from “Hacker School”) as it is a place to get away from the pressures of daily life and just code; realistically, you have to be at a certain level to get in. My point, though, is that the people for whom these are realistic options tend to be among the least marginalized in other ways. So, I argue that they are not solutions and not something we should expect people to do.

Option two: go work in tech

If you can’t get the training you need within libraries or in your spare time, it kind of makes sense to go find a job with some tech company, work there for a few years, build up your skills, and then come back. I thought so, anyway. It turns out, this plan was clown shoes, too.

Every woman I’ve talked to who has taken this approach has had a terrible experience. (I also know of a few women who’ve tried this approach and haven’t reported back, at least to me. So my data is incomplete, here. Still, tech’s horror stories are numerous, so go with me here.) I have a theory that library vendors are a safer bet and may be more open to hiring newer developers than libraries currently are, but I don’t have enough data (or anecdata) to back it up, so I’m going to talk about tech-tech.

Frankly, if we expect members of any marginalized group to go work in tech in order to build up the skills necessary for a librarian-developer job, we are throwing them to the wolves. In tech, even able-bodied straight cisgender middle class white women are a badly marginalized group, and heaven help you if you’re on any other axis of oppression.

And, sure, yeah. Not all tech. I’ll agree that there are non-terrible jobs for people from marginalized groups in tech, but you have to be skilled enough to get to be that choosy, which people in the scenario we’re discussing are not. I think my story is a pretty good illustration of how even a promising-looking tech job can still turn out horrible. (TLDR: I found a company that could talk about basic inclusivity and diversity in a knowledgeable way and seemed to want to build a healthy culture. It did not have a healthy culture.)

We just can’t outsource that skill-building period to non-library tech. It isn’t right. We stand to lose good people that way.

We need to develop our own techies—I’m talking code, here, because it’s what I know, but most of my argument expands to all of libtech and possibly even to library leadership—or continue offering our patrons sub-par software built within vendor silos and patched together by a small, privileged subset of our field. I don’t have to tell you what that looks like; we live with it, already.

What to do?

I’m going to focus on what you, as an individual organization, or leader within an organization, can do to help; I acknowledge that there are some systemic issues at play, beyond what my relatively small suggestions can reach, and I hope this post gets people talking and thinking about them (and not just to wave their hands and sigh and complain that “there isn’t enough money,” because doomsaying is boring and not helpful).

First of all, when you’re looking at adding to the tech talent in your organization, look within your organization. Is there a cataloger who knows some scripting and might want to learn more? (Ask around! Find out!) What about your web content manager, UX person, etc.? (Offer!) You’ll probably be tempted to look at men, first, because society has programmed us all in evil ways (seriously), so acknowledge that impulse and look harder. The same goes for race and disability and having the MLIS, which is too often a stand-in for socioeconomic class; actively resist those biases (and we all have those biases).

If you need tech talent and can’t grow it from within your organization, sit down and figure out what you really need, on day one, versus what might be nice to have, but could realistically wait. Don’t put a single nice-to-have on your requirements list, and don’t you dare lose sight of what is and isn’t necessary when evaluating candidates.

Recruit in diverse and non-traditional spaces for tech folks — dashing off an email to Code4Lib is not good enough (although, sure, do that too; they’re nice folks). LibTechWomen is an obvious choice, as are the Spectrum Scholars, but you might also look at the cataloging listservs or the UX listservs, just to name two options. Maybe see who tweets about #libtechgender and #critlib (and possibly #lismicroaggressions?), and invite those folks to apply and to share your linted job opening with their networks.

Don’t use whiteboard interviews! They are useless and unnecessarily intimidating! They screen for “confidence,” not technical ability. Pair-programming exercises, with actual taking turns and pairing, are a good alternative. Talking through scenarios is also a good alternative.

Don’t give candidates technology vocabulary tests. Not only is it nearly useless as an evaluation tool (and a little insulting); it actively discriminates against people without formal CS education (or, cough, people with CS minors from more than a decade ago). You want to know that they can approach a problem in an organized manner, not that they can define a term that’s easily Googled.

Do some reading about impostor syndrome, stereotype threat, and responsible tech hiring. Model View Culture’s a good place to start; here is their hiring issue.

(I have a whole slew of comments about hiring, and I’ll make those—and probably repeat the list above—in another post.)

Once you have someone in a position, or (better) you’re growing someone into a position, be sure to set reasonable expectations and deadlines. There will be some training time for any tech person; you want this, because something built with enough forethought and research will be better than something hurriedly duct-taped (figuratively, you hope) together.

Give people access to mentorship, in whatever form you can. If you can’t give them access to a team within your organization, give them dedicated time to contribute to relevant OSS projects. Send them to—just to name two really inclusive and helpful conferences/communities—Code4Lib (which has regional meetings, too) and/or Open Source Bridge.


So… that’s what I’ve got. What have I missed? What else should we be doing to help fix this gap?


* In truth, as excited as I am about starting my own business, I wouldn’t turn down an interview for a librarian-coder position local to Pittsburgh, but 1) it doesn’t feel like the wind is blowing that way, here, and 2) I’m in the midst of a whole slew of posts that may make me unemployable, anyway ;)

** To be fair, I did get to do some development on the clock, there. Unfortunately, because I wore so many hats, and other hats grew more quickly, it was not a large part of my work. Still, I got most of my PHP experience there, and I’m glad I had the opportunity.


How Twitter Uses Apache Lucene for Real-Time Search / SearchHub

As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Michael Busch’s session on how Twitter executes real-time search with Apache Lucene. Twitter’s search engine serves billions of queries per day from different Lucene indexes, while appending hundreds of millions of tweets per day in real time. This session will give an overview of Twitter’s search architecture and recent changes and improvements that have been made. It will focus on the usage of Lucene and the modifications that have been made to it to support Twitter’s unique performance requirements. Michael Busch is an architect in Twitter’s Search & Content organization. He designed and implemented Twitter’s current search index, which is based on Apache Lucene and optimized for real-time search. Prior to Twitter, Michael worked at IBM on search and eDiscovery applications. Michael has been a Lucene committer and Apache member for many years.
Search at Twitter: Presented by Michael Busch, Twitter from Lucidworks
Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr, on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post How Twitter Uses Apache Lucene for Real-Time Search appeared first on Lucidworks.

025: WordPress for Libraries with Chad Haefele / LibUX

Chad Haefele sat with Amanda and me (Michael) to talk about his new book about WordPress for Libraries. If you’ve been paying attention then you know WordPress is our jam, so we were chomping at the bit.

You only have until tomorrow at the time of this posting, but if you jump on it you can enter to win a free copy.

Also, Chad has a lot to say about usability testing, especially using Optimal Workshop tools, about organizations allocating a user experience design budget, and about the inglorious end of Google Wave.


The post 025: WordPress for Libraries with Chad Haefele appeared first on LibUX.

New Report: “Open Budget Data: Mapping the Landscape” / Open Knowledge Foundation

We’re pleased to announce a new report, “Open Budget Data: Mapping the Landscape” undertaken as a collaboration between Open Knowledge, the Global Initiative for Financial Transparency and the Digital Methods Initiative at the University of Amsterdam.

The report offers an unprecedented empirical mapping and analysis of the emerging issue of open budget data, which has appeared as ideals from the open data movement have begun to gain traction amongst advocates and practitioners of financial transparency.

In the report we chart the definitions, best practices, actors, issues and initiatives associated with the emerging issue of open budget data in different forms of digital media.

In doing so, our objective is to enable practitioners – in particular civil society organisations, intergovernmental organisations, governments, multilaterals and funders – to navigate this developing field and to identify trends, gaps and opportunities for supporting it.

How public money is collected and distributed is one of the most pressing political questions of our time, influencing the health, well-being and prospects of billions of people. Decisions about fiscal policy affect everyone, determining everything from the resourcing of essential public services, to the capacity of public institutions to take action on global challenges such as poverty, inequality or climate change.

Digital technologies have the potential to transform the way that information about public money is organised, circulated and utilised in society, which in turn could shape the character of public debate, democratic engagement, governmental accountability and public participation in decision-making about public funds. Data could play a vital role in tackling the democratic deficit in fiscal policy and in supporting better outcomes for citizens.

The report includes the following recommendations:

  1. CSOs, IGOs, multilaterals and governments should undertake further work to identify, engage with and map the interests of a broader range of civil society actors whose work might benefit from open fiscal data, in order to inform data release priorities and data standards work. Stronger feedback loops should be established between the contexts of data production and its various contexts of usage in civil society – particularly in journalism and in advocacy.

  2. Governments, IGOs and funders should support pilot projects undertaken by CSOs and/or media organisations in order to further explore the role of data in the democratisation of fiscal policy – especially in relation to areas which appear to have been comparatively under-explored in this field, such as tax distribution and tax base erosion, or tracking money through from revenues to results.

  3. Governments should work to make data “citizen readable” as well as “machine readable”, and should take steps to ensure that information about flows of public money and the institutional processes around them are accessible to non-specialist audiences – including through documentation, media, events and guidance materials. This is a critical step towards the greater democratisation and accountability of fiscal policy.

  4. Further research should be undertaken to explore the potential implications and impacts of opening up information about public finance which is currently not routinely disclosed, such as more detailed data about tax revenues – as well as measures needed to protect the personal privacy of individuals.

  5. CSOs, IGOs, multilaterals and governments should work together to promote and adopt consistent definitions of open budget data, open spending data and open fiscal data in order to establish the legal and technical openness of public information about public money as a global norm in financial transparency.

Viewshare Supports Critical Thinking in the Classroom / Library of Congress: The Signal

This year I had the pleasure of meeting Dr. Peggy Spitzer Christoff, lecturer in Asian and Asian American Studies at Stony Brook University. She shared with me how she’s using the Library of Congress’ Viewshare tool to engage her students in an introduction to Asia Studies course. Peg talked about using digital platforms as a way to improve writing, visual and information literacy skills in her students. In this interview, she talks about why and how Viewshare is useful in connecting the students’ time “surfing the web” to creating presentations that require reflection and analysis.

Abbey: How did you first hear about Viewshare and what inspired you to use it in your classes?

Peg Christoff, Lecturer at Stony Brook University

Peg: I heard about it through the monthly Library of Congress Women’s History Discussion Group, about three years ago. At the time, Trevor Owens [former Library of Congress staff member] was doing presentations throughout the Library and he presented Viewshare to that group. It sounded like a neat way to organize information. Around the same time, I was developing the Department of Asian and Asian American Studies’ introductory (gateway) course for first and second year students at Stony Brook University. Faculty in our department were concerned that students couldn’t find Asian countries on a map and had very little understanding of basic information about Asia. I thought that developing a student project using Viewshare would enable each student to identify, describe and visually represent aspects of Asia of their choosing — as a launching pad for further exploration. Plus, I liked the idea of students writing paragraphs to describe each of the items they selected because it could help them become better writers. Finally, I wanted students to learn how to use an Excel spreadsheet in the context of a digital platform.

Abbey: So it sounds like the digital platforms project is allowing your students to explore a specific topic they may not be familiar with (i.e., Asian Studies) with a resource they are probably more familiar with (i.e., the web) while at the same time exposing them to basic data curation principles. Would you agree?

Peg: Yes. Combining these into one project has been so popular because we’ve broadened student interest in how collections are developed and organized.

Abbey: Why do you think Viewshare works well in the classroom?

Peg: Because students have the freedom to develop their own collections of Asian artifacts and, at the end of the semester, share their collections with each other. Students approach the assignment differently and it’s surprising to them (and me) to see how their interests in “Asia” change throughout the semester, as they develop their collections.

Abbey: Please walk us through how you approach teaching your students to use Viewshare in their assignments.

Peg: I introduce the Viewshare platform to engage students in critical thinking. The project requires students to select, classify, and describe the significance of Asian artifacts relating to subjects of common concern — education, health, religion and values, consumer issues, family and home, mobility, children, careers and work, entertainment and leisure, etc. Also, I want students to think about cultured spaces in India, Southeast Asia, China, Korea, Japan and Asian communities in the United States. I encourage students to consider the emotional appeal of the items, which could include anything from a photograph of the Demilitarized Zone (DMZ) in Korea, to ornamental jade pieces from China, to ancient religious texts from India, to anime from Japan. Food has a particularly emotional appeal, especially for college students.

Undergrad TAs have developed PowerPoint slides as “tutorials” on how to use Viewshare, which I post on Blackboard. We explore the website in class and everyone signs up for an account at the very beginning of the semester. The TA helps with troubleshooting. Four times throughout the semester, the students add several artifacts, I grade their written descriptions and the TA reviews their Excel spreadsheet to correct format problems. Then, around the last few weeks of the semester, the students upload their Excel spreadsheet into the Viewshare platform and generate maps, timelines, pie charts, etc. Here’s an example of a typical final project.

Example Final Project

Abbey: How have your students reacted to using Viewshare?

Peg: Sometimes they are frustrated when they can’t get the platform to load correctly. Almost always they enjoy seeing the final result and would like to work more on it — if we only had more time during the semester.

Abbey: Do you see any possibilities for making more use of Viewshare?

Peg: I’d like to keep track of the Asian artifacts the students select and how they describe them over long periods of time — to interpret changes in student interests. (We have a large Asian population on campus and over 50% of my students are either Asian or Asian American.)

Also, my department would like to use the Viewshare platform to illustrate a collection of Asian connections to Long Island.

Abbey: Anything else to add?

Peg: I think Viewshare is really ideal for student projects. And I have used Viewshare in academic writing to organize data and illustrate patterns. I just cited a Viewshare view in a footnote.

The Case for Open Tools in Pedagogy / LITA

Academic libraries support certain software by virtue of what they have available on their public computers, what their librarians are trained to use, and what instruction sessions they offer. Sometimes libraries don’t have a choice in the software they are tasked with supporting, but often they do. If the goal of the software support is to simply help students achieve success in the short term, then any software that the library already has a license for is fair game. If the goal is to teach them a tool they can rely on anywhere, then libraries must consider the impact of choosing open tools over commercial ones.

Suppose we have a student, we’ll call them “Student A”, who wants to learn about citation management. They see a workshop on EndNote, a popular piece of citation management software, and they decide to attend. Student A becomes enamored with EndNote and continues to grow their skills with it throughout their undergraduate career. Upon graduating, Student A gets hired and is expected to keep up with the latest research in their field, but suddenly they no longer have access to EndNote through their university’s subscription. They can either pay for an individual license, or choose a new piece of citation management software (losing all of their hard earned EndNote-specific skills in the process).

Now let’s imagine Student B who also wants to learn about citation management software but ends up going to a workshop promoting Zotero, an open source alternative to EndNote. Similar to Student A, Student B continues to use Zotero throughout their undergraduate career, slowly mastering it. Since Zotero requires no license to use, Student B continues to use Zotero after graduating, allowing the skills that served them as a student to continue to do so as a professional.

Which one of these scenarios do you think is more helpful to the student in the long run? By teaching our students to use tools that they will lose access to once outside of the university system, we are essentially handing them a ticking time bomb that will explode as they transition from student to professional, which happens to be one of the most vulnerable and stressful periods in one’s life. Any academic library that cares about the continuing success of their students once they graduate should definitely take a look at their list of current supported software and ask themselves, “Am I teaching a tool or a time bomb?”

Telling VIVO Stories at Duke University with Julia Trimmer / DuraSpace News

“Telling VIVO Stories” is a community-led initiative aimed at introducing project leaders and their ideas to one another while providing details about VIVO implementations for the community and beyond. The following interview includes personal observations that may not represent the opinions and views of Duke University or the VIVO Project. Carol Minton Morris from DuraSpace interviewed Julia Trimmer from Duke University to learn about Scholars@Duke.

Better Search with Fusion Signals / SearchHub

Signals in Lucidworks Fusion leverage information about external activity, e.g., information collected from logfiles and transaction databases, to improve the quality of search results. This post follows on my previous post, Basics of Storing Signals in Solr with Fusion for Data Engineers, which showed how to index and aggregate signal data. In this post, I show how to write and debug query pipelines using this aggregated signal information.

User clicks provide a link between what people ask for and what they choose to view, given a set of search results, usually with product images. In the aggregate, if users have winnowed the set of search results for a given kind of thing, down to a set of products that are exactly that kind of thing, e.g., if the logfile entries link queries for “Netgear”, or “router”, or “netgear router” to clicks for products that really are routers, then this information can be used to improve new searches over the product catalog.

The Story So Far

To show how signals can be used to improve search in an e-commerce application, I created a set of Fusion collections:

  • A collection called “bb_catalog”, which contains Best Buy product data, a dataset comprised of over 1.2M items, mainly consumer electronics such as household appliances, TVs, computers, and entertainment media such as games, music, and movies. This is the primary collection.
  • An auxiliary collection called “bb_catalog_signals”, created from a synthetic dataset over Best Buy query logs from 2011. This is the raw signals data, meaning that each logfile entry is stored as an individual document.
  • An auxiliary collection called “bb_catalog_signals_aggr” derived from the data in “bb_catalog_signals” by aggregating all raw signal records based on the combination of search query, field “query_s”, item clicked on, field “doc_id_s”, and search categories, field “filters_ss”.

All documents in collection “bb_catalog” have a unique product ID stored in field “id”. All items belong to one or more categories, which are stored in the field “categories_ss”.

The following screenshot shows the Fusion UI search panel over collection “bb_catalog”, after using the Search UI Configuration tool to limit the document fields displayed. The gear icon next to the search box toggles this control open and closed. The “Documents” settings are set so that the primary field displayed is “name_t”, the secondary field is “id”, and additional fields are “name_t”, “id”, and “category_ss”. The document in the yellow rectangle is a Netgear router with product id “1208844”.


For collection “bb_catalog_signals”, the search query string is stored in field “query_s”, the timestamp is stored in field “tz_timestamp_txt”, the id of the document clicked on is stored in field “doc_id_s”, and the set of category filters is stored in fields “filters_ss” as well as “filters_orig_ss”.

The following screenshot shows the results of a search for raw signals where the id of the product clicked on was “1208844”.


The collection “bb_catalog_signals_aggr” contains aggregated signals. In addition to the fields “doc_id_s”, “query_s”, and “filters_ss”, aggregated click signals contain fields:

  • “count_i” – the number of raw signals found for this query, doc, filter combo.
  • “weight_d” – a real-number used as a multiplier to boost the score of these documents.
  • “tz_timestamp_txt” – all timestamps of raw signals, stored as a list of strings.
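
To make the aggregation step concrete, here is a minimal Python sketch of how raw click signals could be rolled up into records of this shape. It is not Fusion's aggregation job: the function name, the sample records, the category label, the timestamps, and especially the weighting formula are assumptions made for illustration. Only the field names and the product id come from the collections described above.

```python
import math
from collections import defaultdict


def aggregate_signals(raw_signals):
    """Group raw click signals by (query, clicked doc, category filters)."""
    timestamps_by_key = defaultdict(list)
    for signal in raw_signals:
        key = (
            signal["query_s"],
            signal["doc_id_s"],
            tuple(sorted(signal["filters_ss"])),
        )
        timestamps_by_key[key].append(signal["tz_timestamp_txt"])

    aggregated = []
    for (query, doc_id, filters), timestamps in timestamps_by_key.items():
        count = len(timestamps)
        aggregated.append({
            "query_s": query,
            "doc_id_s": doc_id,
            "filters_ss": list(filters),
            "count_i": count,
            # Hypothetical weighting for illustration; not Fusion's formula.
            "weight_d": 1.0 + math.log(count),
            "tz_timestamp_txt": sorted(timestamps),
        })
    return aggregated


# Made-up raw signals: three clicks on product "1208844" for "netgear",
# plus one click on the same product for a different query.
example_raw_signals = [
    {"query_s": "netgear", "doc_id_s": "1208844",
     "filters_ss": ["example_category"], "tz_timestamp_txt": "2011-09-01T10:00:00Z"},
    {"query_s": "netgear", "doc_id_s": "1208844",
     "filters_ss": ["example_category"], "tz_timestamp_txt": "2011-09-02T11:30:00Z"},
    {"query_s": "netgear", "doc_id_s": "1208844",
     "filters_ss": ["example_category"], "tz_timestamp_txt": "2011-09-03T09:15:00Z"},
    {"query_s": "router", "doc_id_s": "1208844",
     "filters_ss": ["example_category"], "tz_timestamp_txt": "2011-09-04T14:45:00Z"},
]

example_aggregated = aggregate_signals(example_raw_signals)
```

Each distinct combination of query, clicked document, and filter set becomes one aggregated record, so the three "netgear" clicks collapse into a single record with a count of 3 while the "router" click stays separate.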

The following screenshot shows aggregated signals for searches for “netgear”. There were 3 raw signals where the search query “netgear” and some set of category choices resulted in a click on the item with id “1208844”:


Using Click Signals in a Fusion Query Pipeline

Fusion’s Query Pipelines take as input a set of search terms and process them into a Solr query request. The Fusion UI Search panel has a control which allows you to choose the processing pipeline. In the following screenshot of the collection “bb_catalog”, the query pipeline control is just below the search input box. Here the pipeline chosen is “bb_catalog-default” (circled in yellow):


The pre-configured default query pipelines consist of 3 stages:

  • A Search Fields query stage, used to define common Solr query parameters. The initial configuration specifies that the 10 best-scoring documents should be returned.
  • A Facet query stage which defines the facets to be returned as part of the Solr search results. No facet field names are specified in the initial defaults.
  • A Solr query stage which transforms a query request object into a Solr query and submits the request to Solr. The default configuration specifies the HTTP method as a POST request.

In order to get text-based search over the collection “bb_catalog” to work as expected, the Search Fields query stage must be configured to specify the set of fields that contain relevant text. For the majority of the 1.2M products in the product catalog, the item name, found in field “name_t”, is the only field amenable to free text search. The following screenshot shows how to add this field to the Search Fields stage by editing the query pipeline via the Fusion 2 UI:

add search field, search term: ipad

The search panel on the right displays the results of a search for “ipad”. There were 1,359 hits for this query, which far exceeds the number of items that are an Apple iPad. The best scoring items contain “iPad” in the title, sometimes twice, but these are all iPad accessories, not the device itself.

Recommendation Boosting query stage

A Recommendation Boosting stage uses aggregated signals to selectively boost items in the set of search results. The following screenshot shows the results of the same search after adding a Recommendations Boosting stage to the query pipeline:

recommendations boost, search term: ipad

The edit pipeline panel on the left shows the updated query pipeline “bb_catalog-default” after adding a “Recommendations Boosting” stage. All parameter settings for this stage have been left at their default values. In particular, the recommendation boosts are applied to field “id”. The search panel on the right shows the updated results for the search query “ipad”. Now the three most relevant items are for Apple iPads. They are iPad 2 models because the click dataset used here is based on logfile data from 2011, and at that time, the iPad 2 was the most recent iPad on the market. There were more clicks on the 16GB iPads over the more expensive 32GB model, and for the color black over the color white.

Peeking Under the Hood

Of course, under the hood, Fusion is leveraging the awesome power of Solr. To see how this works, I show both the Fusion query and the JSON of the Solr response. To display the Fusion query, I go into the Search UI Configuration, open the “General” settings, and check the “Show Query URL” option. To see the Solr response in JSON format, I change the display control from “Results” to “JSON”.

The following screenshot shows the Fusion UI search display for “ipad”:

recommendations boost, under the hood

The query “ipad” entered via the Fusion UI search box is transformed into the following request sent to the Fusion REST-API:


This request to the Query Pipelines API sends a query through the query pipeline “bb_catalog-default” for the collection “bb_catalog” using the Solr “select” request handler, where the search query parameter “q” has value “ipad”. Because the parameter “debug” has value “true”, the Solr response contains debug information, outlined by the yellow rectangle. The “bb_catalog-default” query pipeline transforms the query “ipad” into the following Solr query:

"parsedquery": "(+DisjunctionMaxQuery((name_t:ipad)) 
id:1945531^4.0904393 id:2339322^1.5108471 id:1945595^1.0636971
id:1945674^0.4065684 id:2842056^0.3342921 id:2408224^0.4388061
id:2339386^0.39254773 id:2319133^0.32736558 id:9924603^0.1956079

The outer part of this expression, “( … )/no_coord” is a reporting detail, indicating Solr’s “coord scoring” feature wasn’t used.

The enclosed expression consists of:

  • The search: “+DisjunctionMaxQuery(name_t:ipad)”.
  • A set of selective boosts to be applied to the search results

The field name “name_t” is supplied by the set of search fields specified by the Search Fields query stage. (Note: if no search fields are specified, the default search field name “text” is used. Since the documents in collection “bb_catalog” don’t contain a field named “text”, this stage must be configured with the appropriate set of search fields.)

The Recommendations Boosting stage was configured with the default parameters:

  • Number of Recommendations: 10
  • Number of Signals: 100

There are 10 documents boosted, with ids ( 1945531, 2339322, 1945595, 1945674, 2842056, 2408224, 2339386, 2319133, 9924603, 1432551 ). This set of 10 documents represents documents which had at least 100 clicks where “ipad” occurred in the user search query. The boost factor is a number derived from the aggregated signals by the Recommendation Boosting stage. If those documents contain the term “name_t:ipad”, then they will be boosted. If those documents don’t contain the term, then they won’t be returned by the Solr query.
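
To make the shape of that query concrete, here is a small Python sketch that assembles a Lucene-style query string from one required term clause plus optional per-document boost clauses. The function name is hypothetical and this is not how Fusion builds its query internally; the ids and boost values are the first three from the parsed query shown above.

```python
def build_boosted_query(field, term, boosts):
    """Combine a required term clause with optional boosted id clauses.

    The leading "+" marks the term clause as required, so a boosted id
    only affects ranking for documents that already match the term.
    """
    clauses = [f"+{field}:{term}"]
    clauses += [f"id:{doc_id}^{weight}" for doc_id, weight in boosts]
    return " ".join(clauses)


# First three boosts taken from the parsed query in this post.
example_boosts = [
    ("1945531", 4.0904393),
    ("2339322", 1.5108471),
    ("1945595", 1.0636971),
]
example_query = build_boosted_query("name_t", "ipad", example_boosts)
```

Because only the term clause is required, a boosted id that lacks “ipad” in “name_t” never enters the result set, which matches the behaviour described above.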

To summarize: adding in the Recommendations Boosting stage results in a Solr query where selective boosts will be applied to 10 documents, based on clickstream information from an undifferentiated set of previous searches. The improvement in the quality of the search results is dramatic.

Even Better Search

Adding more processing to the query pipeline allows for user-specific and search-specific refinements. Like the Recommendations Boosting stage, these more complex query pipelines leverage Solr’s expressive query language, flexible scoring, and lightning fast search and indexing. Fusion query pipelines plus aggregated signals give you the tools you need to rapidly improve the user search experience.

The post Better Search with Fusion Signals appeared first on Lucidworks.

Koha - 3.20.3, 3.18.10, 3.16.14 / FOSS4Lib Recent Releases

Release Date: Monday, August 31, 2015

Last updated September 1, 2015. Created by David Nind on September 1, 2015.

Monthly maintenance releases for Koha.

See the release announcements for the details:

New Exhibitions from the Public Library Partnerships Project / DPLA

We are pleased to announce the publication of 10 new exhibitions created by DPLA Hubs and public librarian participants in our Public Library Partnerships Project (PLPP), funded by the Bill and Melinda Gates Foundation. Over the course of the last six months, curators from Digital Commonwealth, Digital Library of Georgia, Minnesota Digital Library, the Montana Memory Project, and Mountain West Digital Library researched and built these exhibitions to showcase content digitized through PLPP. Through this final phase of the project, public librarians had the opportunity to share their new content, learn exhibition curation skills, explore Omeka for future projects, and contribute to an open peer review process for exhibition drafts.

  • A History of US Public Libraries:
  • Patriotic Labor: America During World War I
  • Best Foot Forward:
  • Quack Cures and Self-Remedies: Patent Medicine
  • Boom and Bust: The Industries That Settled Montana
  • Recreational Tourism in the Mountain West
  • Children in Progressive-Era America
  • Roosevelt's Tree Army: The Civilian Conservation Corps
  • Georgia's Home Front: World War II
  • Urban Parks in the United States

Congratulations to all of our curators and, in particular, our exhibition organizers: Greta Bahnemann, Jennifer Birnel, Hillary Brady, Anna Fahey-Flynn, Greer Martin, Mandy Mastrovita, Anna Neatrour, Carla Urban, Della Yeager, and Franky Abbott.

Thanks to the following reviewers who participated in our open peer review process: Dale Alger, Cody Allen, Greta Bahnemann, Alexandra Beswick, Jennifer Birnel, Hillary Brady, Wanda Brown, Anne Dalton, Carly Delsigne, Liz Dube, Ted Hathaway, Sarah Hawkins, Jenny Herring, Tammi Jalowiec, Stef Johnson, Greer Martin, Sheila McAlister, Lisa Mecklenberg-Jackson, Tina Monaco, Mary Moore, Anna Neatrour, Michele Poor, Amy Rudersdorf, Beth Safford, Angela Stanley, Kathy Turton, and Carla Urban.

For more information about the Public Library Partnerships Project, please contact PLPP project manager, Franky Abbott:

Momentum, we have it! / District Dispatch

The word Momentum displayed as a Newtons Cradle

Source: Real Momentum

As you may have read here, school libraries are well represented in S. 1177, the Every Child Achieves Act.  In fact, we were more successful with this bill than we have been in recent history and this is largely due to your efforts in contacting Congress.

Currently, the House Committee on Education and Workforce (H.R. 5, the Student Success Act) and the Senate Committee on Health, Education, Labor and Pensions are preparing to go to “conference” in an attempt to work out differences between the two versions of the legislation and reach agreement on reauthorization of ESEA. ALA is encouraged that provisions included under S. 1177 would support effective school library programs. In particular, ALA is pleased that effective school library program provisions were adopted unanimously during HELP Committee consideration of an amendment offered by Senator Whitehouse (D-RI) and on the Senate floor with an amendment offered by Senators Reed (D-RI) and Cochran (R-MS).

ALA is asking (with your help!) that any conference agreement to reauthorize ESEA maintain the following provisions that were overwhelmingly adopted by the HELP Committee and the full Senate under S. 1177, the Every Child Achieves Act:

  1. Title V, Part H – Literacy and Arts Education – Authorizes activities to promote literacy programs that support the development of literacy skills in low-income communities (similar to the Innovative Approaches to Literacy program that has been funded through appropriations) as well as activities to promote arts education for disadvantaged students.
  2. Title I – Improving Basic Programs Operated by State and Local Educational Agencies – Under Title I of ESEA, State Educational Agencies (SEAs) and local educational agencies (LEAs) must develop plans on how they will implement activities funded under the Act.
  3. Title V, Part G – Innovative Technology Expands Children’s Horizons (I-TECH) – Authorizes activities to ensure all students have access to personalized, rigorous learning experiences that are supported through technology and to ensure that educators have the knowledge and skills to use technology to personalize learning.

Now is the time to keep the momentum going! Contact your Senators and Representative to let them know that you support the effective school library provisions found in the Senate bill and they should too!

A complete list of school library provisions found in S.1177 can be found here.

The post Momentum, we have it! appeared first on District Dispatch.

Call for Convenors / Access Conference

Do you want to be part of the magic of AccessYYZ? Well, aren’t you lucky? Turns out we’re  looking for some convenors!

Convening isn’t much work (not that we think you’re a slacker or anything)–all you have to do is introduce the name of the session, read the bio of the speaker(s), and thank any sponsors. Oh, and facilitate any question and answer segments. Which doesn’t actually mean you’re on the hook to come up with questions (that’d be rather unpleasant of us) so much as you’ll repeat questions from the crowd into the microphone. Yup, that’s it. We’ll give you a script and everything!

In return, you’ll get eternal gratitude from the AccessYYZ Organizing Committee. And also a high five! If you’re into that sort of thing. Even if you’re not, you’ll get to enjoy the bright lights and the glory that comes with standing up in front of some of libraryland’s most talented humans for 60 seconds. Sound good? We thought so.

You can dibs a session by filling out the Doodle poll.

Supporting ProseMirror inline HTML editor / Peter Sefton

The world needs a good, sane in-browser editing component, one that edits document structure (headings, lists, quotes etc) rather than format (font, size etc). I’ve been thinking for a while that an editing component based around Markdown (or Commonmark) would be just the thing. Markdown/Commonmark is effectively a spec for the minimal sensible markup set for documents; it’s more than adequate for articles, theses, reports etc. And it can be extended with document semantics.

Anyway, there’s a crowdfunding campaign going on for an editor called ProseMirror that does just that, and promises collaborative editing as well. It’s beta quality but looks promising, so I chipped in 50 Euros to try to get it over the line to be released as open source.

The author says:

Who I am

This campaign is being run by Marijn Haverbeke, author of CodeMirror, a widely used in-browser code editor, Eloquent JavaScript, a freely available JavaScript book, and Tern, which is an editor-assistance engine for JavaScript coding that I also crowd-funded here. I have a long history of releasing and maintaining solid open software. My work on CodeMirror (which you might know as the editor in Chrome and Firefox’s dev tools) has given me lots of experience with writing a fast, solid, extendable editor. Many of the techniques that went into ProseMirror have already proven themselves in CodeMirror.

There’s a lot to like with this editor - it has a nice floating toolbar that pops up at the right of the paragraph, with a couple of not-quite-standard behaviours that just might catch on. It mostly works, but has some really obvious usability issues, like when I try to make a nested list it makes commonmark like this:

* List item
* List item
* * List item

And it even renders the two bullets side by side in the HTML view. Even though that is apparently supported by commonmark, for a prose editor it’s just wrong. Nobody means two bullets unless they’re up to no good, typographically speaking.

The editor should do the thing you almost certainly mean. Something like:

* List item
* List item
  * List item
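
To illustrate the transformation I have in mind, here is a minimal Python sketch that rewrites a line with side-by-side bullet markers into an indented nested item. It has nothing to do with ProseMirror's actual code; the function name and the two-space indent are my own assumptions.

```python
import re


def nest_bullets(line, indent="  "):
    """Rewrite a line such as '* * item' as an indented nested item '  * item'."""
    match = re.match(r"^((?:\* )+)(.*)$", line)
    if not match:
        # Not a bullet line: leave it untouched.
        return line
    depth = match.group(1).count("*")
    return indent * (depth - 1) + "* " + match.group(2)


example_lines = ["* List item", "* List item", "* * List item"]
example_fixed = [nest_bullets(line) for line in example_lines]
```

Run over the three lines of the first example, this leaves the top-level items alone and indents the third one under them.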

But, if that stuff gets cleaned up then this will be perfect for producing Scholarly Markdown, and Scholarly HTML. The $84 AUD means I’ll get priority on reporting a bug, assuming it reaches its funding goal.

Apache Solr for Multi-language Content Discovery Through Entity Driven Search / SearchHub

As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Alessandro Benedetti’s session on using entity driven search for multi-language content discovery and search. This talk describes the implementation of a semantic search engine based on Solr. Meaningfully structuring content is critical, and Natural Language Processing and Semantic Enrichment are becoming increasingly important for improving the quality of Solr search results. Our solution is based on three advanced features:
  1. Entity-oriented search – Searching not by keyword, but by entities (concepts in a certain domain)
  2. Knowledge graphs – Leveraging relationships amongst entities: Linked Data datasets (Freebase, DbPedia, Custom …)
  3. Search assistance – Autocomplete and Spellchecking are now common features, but using semantic data makes it possible to offer smarter features, driving the users to build queries in a natural way.
The approach includes unstructured data processing mechanisms integrated with Solr to automatically index semantic and multi-language information. Smart Autocomplete will complete users’ queries with entity names and properties from the domain knowledge graph. As the user types, the system will propose a set of named entities and/or a set of entity types across different languages. As the user accepts a suggestion, the system will dynamically adapt subsequent suggestions and return relevant documents. Semantic More Like This will find documents similar to a seed document, based on the underlying knowledge in the documents, instead of tokens.

Alessandro Benedetti is a search expert and semantic technology enthusiast, working in the R&D division of Zaizi. His favorite work is in R&D on information retrieval, NLP and machine learning with a big emphasis on data structures, algorithms and probability theory. Alessandro earned his Masters in Computer Science with full marks in 2009, then spent 6 months with Universita’ degli Studi di Roma working on his masters thesis around a new approach to improve semantic web search. Alessandro spent 3 years with Sourcesense as a Search and Open Source consultant and developer.

Multi-language Content Discovery Through Entity Driven Search: Presented by Alessandro Benedetti, Zaizi from Lucidworks
Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Apache Solr for Multi-language Content Discovery Through Entity Driven Search appeared first on Lucidworks.

Islandora 150 / Islandora

Last fall I issued a challenge to the community to help us get 100 dots on our Installations Map by the end of 2014. It was a tight goal, but we got there, and our map got a little more cluttered:

This year I'm asking the community to stretch a little further. We know there are far more sites out there than show on our list. I want to find at least 48 of them and put them on the map by the end of 2015, taking us to 150. For helping us get there, I'm offering up a chance to win one of five Islandora Tuque Tuques or one of three ridiculously adorable Islandoracon lobster finger puppets.

How to help (and enter the draw):

  • Tell me about your Islandora repository and send a link
  • Tell me about your Islandora repository in development
  • Give me a new link to your repository if you are already on the list but not linked or at an outdated link.
  • Nominate a public-facing Islandora repository that is not already on the list.
  • Give some new information about your link, so we can tag it as:
    • Institutional Repository
    • Research Data
    • Digital Humanities
    • Consortium/Multisite

Email with your new or updated Installation Map dots. If we hit 150 by January 1st, 2016, I will draw for the hats and lobsters and send out some New Year's prizes to the lucky winners. Thanks for your help!

Packaging Video DVDs for the Repository / Mark E. Phillips

For a while I’ve had two large boxes of DVDs that a partner institution dropped off with the hopes of having them added to The Portal to Texas History.  These DVDs were from oral histories conducted by the local historical commission from 1998-2002 and were converted from VHS to DVD sometime in the late 2000s.  They were interested in adding these to the Portal so that they could be viewed by a wider audience and also be preserved in the UNT Libraries’ digital repository.

So these DVDs sat on my desk for a while because I couldn’t figure out what I wanted to do with them.  I wanted to figure out a workflow that I could use for all Video DVD based projects in the future and it hurt my head whenever I started to work on the project.  So they sat.

When the partner politely emailed about the disks and asked about the delay in getting them loaded I figured it was finally time to get a workflow figured out so that I could get the originals back to the partner.  I’m sharing the workflow that I came up with here because I didn’t see much prior information on this sort of thing when I was researching the process.


I had two primary goals for the conversion workflow.  First, I wanted to retain an exact copy of the disk that we were working with.  All of these videos were VHS to DVD conversions, most likely completed with a stand-alone recorder.  They had very simple title screens and lacked other features, but I figured that Video DVDs in future projects might have more features that I didn’t want to lose by just extracting the video.  The second goal was to pull the video off the DVD without introducing additional compression during the process.  When these files get ingested into the repository and the final access system, they will be converted into an mp4 container using the h.264 codec, so they will get another round of compression later.

With these two goals in mind here is what I ended up with.

For the conversion I used my MacBook Pro and SuperDrive.  I first created an iso image of the disc using the hdiutil command.

hdiutil makehybrid -iso -joliet -o image.iso /Volumes/DVD_VR/

Once this image was created I mounted the image by double clicking on the image.iso file in the Finder.

I then loaded makeMKV and created an MKV file from the video and audio on the disk that I was interested in.  This resulting mkv file would contain the primary video content that users will interact with in the future.  I saved this file as title00.mkv

makeMKV screenshot


Once this step was completed I used ffmpeg to convert the mkv container to an mpeg container to add to the repository.  I could have kept the container as an mkv but decided to move it over to mpeg because we already have a number of those files in the repository and no mkv files to date.  The ffmpeg command is as follows.

ffmpeg -i title00.mkv -vcodec copy -acodec copy -f vob -copyts -y video.mpg

Because the makeMKV and ffmpeg steps are just muxing the video and audio and not compressing, they tend to process very quickly, in just a few seconds.  The most time consuming part of the process is getting the iso created in the first step.
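
If I wanted to script the two command-line steps, a minimal Python sketch might look like the following. The function names and the dry-run switch are my own invention; the arguments are exactly the ones from the hdiutil and ffmpeg commands above, and the makeMKV step in between is still done by hand in its GUI.

```python
import subprocess


def iso_command(volume, iso_path):
    """Build the hdiutil command that images a mounted Video DVD."""
    return ["hdiutil", "makehybrid", "-iso", "-joliet", "-o", iso_path, volume]


def remux_command(mkv_path, mpg_path):
    """Build the ffmpeg command that copies the streams into an mpeg container."""
    return ["ffmpeg", "-i", mkv_path, "-vcodec", "copy", "-acodec", "copy",
            "-f", "vob", "-copyts", "-y", mpg_path]


def run(command, dry_run=True):
    """Print the command when dry_run is set, otherwise execute it."""
    if dry_run:
        print(" ".join(command))
        return None
    return subprocess.run(command, check=True)


# Dry run only: prints the commands without touching any disk or file.
run(iso_command("/Volumes/DVD_VR/", "image.iso"))
run(remux_command("title00.mkv", "video.mpg"))
```

Keeping the commands as argument lists avoids shell quoting problems if a volume name ever contains spaces.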

With all of these files now created I packaged them up for loading into the repository.  Here is what a pre-submission package looks like for a Video DVD using this workflow.

├── 01_mpg/
│   └── DI028_dodo_parker_1998-07-15.mpg
├── 02_iso/
│   └── DI028_dodo_parker_1998-07-15.iso
└── metadata.xml

You can see that we place the mpg and iso files in separate folders, 01_mpg for the mpg and 02_iso for the iso file.  When we create the SIP for these files we will notate that the 02_iso format should not be pushed to the dissemination package (what we locally call an Access Content Package or ACP) so the iso file and folder will just live with the archival package.
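
The layout above could be assembled with a short Python sketch like this one. It is only an illustration under my own assumptions: the function name is hypothetical, and it expects the mpg, iso and metadata.xml files to exist already.

```python
import shutil
from pathlib import Path


def build_package(package_dir, basename, mpg_src, iso_src, metadata_src):
    """Copy the files into the 01_mpg / 02_iso pre-submission layout."""
    package = Path(package_dir)
    mpg_dir = package / "01_mpg"
    iso_dir = package / "02_iso"
    mpg_dir.mkdir(parents=True, exist_ok=True)
    iso_dir.mkdir(parents=True, exist_ok=True)

    # Both files share one base name and differ only in extension.
    shutil.copy2(mpg_src, mpg_dir / f"{basename}.mpg")
    shutil.copy2(iso_src, iso_dir / f"{basename}.iso")
    shutil.copy2(metadata_src, package / "metadata.xml")
    return package
```

Calling it with the base name DI028_dodo_parker_1998-07-15 would reproduce the tree shown above.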

This seemed to work for me to get these Video DVDs converted over and placed in the repository.  The workflow satisfied my two goals of retaining a full copy of the original disk as an iso and also getting a copy of the video from the disk in a format that didn’t introduce an extra compression step.  I think that there is probably a way of getting from the iso straight to the mpg version, probably with the handy ffmpeg (or possibly mplayer?) but I haven’t taken the time to look into that.

There is a downside to this way of handling Video DVDs, which is that it will most likely take up twice the amount of storage as the original disk, so for a 4 GB Video DVD, we will be storing 8 GB of data in the repository.  This would probably add up for a very large project, but that’s a worry for another day (and a worry that honestly gets smaller year after year).

I hope that this explanation of how I processed Video DVDs for inclusion into our repository was useful to someone else.

Let me know what you think via Twitter if you have questions or comments.

Mining Events for Recommendations / SearchHub

Summary: The “EventMiner” feature in Lucidworks Fusion can be used to mine event logs to power recommendations. We describe how the system uses graph navigation to generate diverse and high-quality recommendations.

User Events

The log files that most web services generate are a rich source of data for learning about user behavior and modifying system behavior based on this. For example, most search engines will automatically log details on user queries and the resulting clicked documents (URLs). We can define a (user, query, click, time) record which records a unique “event” that occurred at a specific time in the system. Other examples of event data include e-commerce transactions (e.g. “add to cart”, “purchase”), call data records, financial transactions etc. By analyzing a large volume of these events we can “surface” implicit structures in the data (e.g. relationships between users, queries and documents), and use this information to make recommendations, improve search result quality and power analytics for business owners. In this article we describe the steps we take to support this functionality.

1. Grouping Events into Sessions

Event logs can be considered as a form of “time series” data, where the logged events are in temporal order. We can then make use of the observation that events close together in time will be more closely related than events further apart. To do this we need to group the event data into sessions.
A session is a time window for all events generated by a given source (like a unique user ID). If two or more queries (e.g. “climate change” and “sea level rise”) frequently occur together in a search session then we may decide that those two queries are related. The same would apply for documents that are frequently clicked on together. A “session reconstruction” operation identifies users’ sessions by processing raw event logs and grouping them based on user IDs, using the time-intervals between each and every event. If two events triggered by the same user occur too far apart in time, they will be treated as coming from two different sessions. For this to be possible we need some kind of unique ID in the raw event data that allows us to tell that two or more events are related because they were initiated by the same user within a given time period. However, from a privacy point of view, we do not need an ID which identifies an actual real person with all their associated personal information. All we need is an (opaque) unique ID which allows us to track an “actor” in the system.
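
A minimal sketch of session reconstruction in Python is shown below, assuming events arrive as (user ID, timestamp in seconds, payload) tuples. The function name and the 30-minute gap threshold are illustrative assumptions, not Fusion's actual settings.

```python
from collections import defaultdict


def reconstruct_sessions(events, max_gap_seconds=1800):
    """Group events per user, starting a new session after a long time gap."""
    by_user = defaultdict(list)
    for user_id, timestamp, payload in events:
        by_user[user_id].append((timestamp, payload))

    sessions = []
    for user_id, user_events in by_user.items():
        user_events.sort(key=lambda event: event[0])
        current = []
        last_time = None
        for timestamp, payload in user_events:
            # Too long since this user's previous event: close the session.
            if last_time is not None and timestamp - last_time > max_gap_seconds:
                sessions.append((user_id, current))
                current = []
            current.append(payload)
            last_time = timestamp
        if current:
            sessions.append((user_id, current))
    return sessions


# Hypothetical events keyed by opaque user IDs.
example_events = [
    ("u1", 0, "climate change"),
    ("u1", 60, "sea level rise"),
    ("u1", 10000, "solr"),
    ("u2", 30, "ipad"),
]
example_sessions = reconstruct_sessions(example_events)
```

Here the first two queries from “u1” fall in one session, while the third, issued much later, starts a new one.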

2. Generating a Co-Occurrence Matrix from the Session Data

We are interested in entities that frequently co-occur, as we might then infer some kind of interdependence between those entities. For example, a click event can be described using a click(user, query, document) tuple, and we associate each of those entities with each other and with other similar events within a session. A key point here is that we generate the co-occurrence relations not just between the same field types e.g. (query, query) pairs, but also “cross-field” relations e.g. (query, document), (document, user) pairs etc. This will give us an N x N co-occurrence matrix, where N = all unique instances of the field types that we want to calculate co-occurrence relations for. Figure 1 below shows a co-occurrence matrix that encodes how many times different characters co-occur (appear together in the text) in the novel “Les Miserables”. Each colored cell represents two characters that appeared in the same chapter; darker cells indicate characters that co-occurred more frequently. The diagonal line going from the top left to the bottom right shows that each character co-occurs with itself. You can also see that the character named “Valjean”, the protagonist of the novel, appears with nearly every other character in the book.


Figure 1. “Les Miserables” Co-occurrence Matrix by Mike Bostock.

In Fusion we generate a similar type of matrix, where each of the items is one of the types specified when configuring the system. The value in each cell will then be the frequency of co-occurrence for any two given items e.g. a (query, document) pair, a (query, query) pair, a (user, query) pair etc.

For example, if the query “Les Mis” and a click on the web page for the musical appear together in the same user session then they will be treated as having co-occurred. The frequency of co-occurrence is then the number of times this has happened in the raw event logs being processed.
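
The counting step can be sketched in Python as follows. This is a simplified illustration that stores the matrix sparsely as pair counts and counts each pair at most once per session, which is one possible reading of “frequency”; the function name and sample sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sessions):
    """Count how many sessions each pair of (field_type, value) items shares."""
    counts = Counter()
    for items in sessions:
        # Deduplicate and sort so each unordered pair gets one canonical key.
        unique_items = sorted(set(items))
        for first, second in combinations(unique_items, 2):
            counts[(first, second)] += 1
    return counts


# Hypothetical sessions mixing field types, so cross-field pairs are counted too.
example_session_items = [
    [("query", "les mis"), ("doc", "musical")],
    [("query", "les mis"), ("doc", "musical"), ("user", "u1")],
]
example_counts = cooccurrence_counts(example_session_items)
```

In the sample, the query “les mis” and the document “musical” co-occur in both sessions, so their count is 2.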

3. Generating a Graph from the Matrix

The co-occurrence matrix from the previous step can also be treated as an “adjacency matrix”, which encodes whether two vertices (nodes) in a graph are “adjacent” to each other i.e. have a link or “co-occur”. This matrix can then be used to generate a graph, as shown in Figure 2:


Figure 2. Generating a Graph from a Matrix.

Here the values in the matrix are the frequency of co-occurrence for those two vertices. We can see that in the graph representation these are stored as “weights” on the edge (link) between the nodes e.g. nodes V2 and V3 co-occurred 5 times together.
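
As a small illustration of this conversion, the Python sketch below turns a co-occurrence matrix into a dictionary of weighted edges. The function name and the matrix values are made up for the example, apart from the weight of 5 between V2 and V3 mentioned above.

```python
def matrix_to_graph(labels, matrix):
    """Convert an adjacency matrix into a {node: {neighbor: weight}} mapping."""
    graph = {label: {} for label in labels}
    for i, source in enumerate(labels):
        for j, target in enumerate(labels):
            # A non-zero off-diagonal cell becomes a weighted edge.
            if i != j and matrix[i][j] > 0:
                graph[source][target] = matrix[i][j]
    return graph


example_labels = ["V1", "V2", "V3"]
example_matrix = [
    [0, 2, 0],
    [2, 0, 5],
    [0, 5, 0],
]
example_graph = matrix_to_graph(example_labels, example_matrix)
```

The diagonal is skipped, so an item co-occurring with itself does not produce a self-loop.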

We encode the graph structure in a collection in Solr using a simple JSON record for each node. Each record contains fields that list the IDs of other nodes that point “in” at this record, or which this node points “out” to.

Fusion provides an abstraction layer which hides the details of constructing queries to Solr to navigate the graph. Because we know the IDs of the records we are interested in we can generate a single boolean query where the individual IDs we are looking for are separated by OR operators e.g. (id:3677 OR id:9762 OR id:1459). This means we only make a single request to Solr to get the details we need.
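
Building such a query string is straightforward; a minimal Python sketch follows. The function name is hypothetical and this only illustrates the string format, not Fusion's abstraction layer.

```python
def ids_to_or_query(ids, field="id"):
    """Join record IDs into one boolean OR query string."""
    if not ids:
        raise ValueError("at least one ID is required")
    return "(" + " OR ".join(f"{field}:{doc_id}" for doc_id in ids) + ")"


example_or_query = ids_to_or_query([3677, 9762, 1459])
```

For the three IDs in the example above this yields the same string, (id:3677 OR id:9762 OR id:1459).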

In addition, the fact that we are only interested in the neighborhood graph around a start point means the system does not have to store the entire graph (which is potentially very large) in memory.

4. Powering Recommendations from the Graph

At query/recommendation time we can use the graph to make suggestions on which other items in that graph are most related to the input item, using the following approach:

  1. Navigate the co-occurrence graph out from the seed item to harvest additional entities (documents, users, queries).
  2. Merge the list of entities harvested from different nodes in the graph so that the more lists an entity appears in the more weight it receives and the higher it rises in the final output list.
  3. Weights are based on the reciprocal rank of the overall rank of the entity. The overall rank is calculated as the sum of the rank of the result the entity came from and the rank of the entity within its own list.
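
The merging and weighting steps can be sketched in Python as below. This is a simplified reading of the approach: each entity's overall rank is the rank of the list it came from plus its rank within that list, and its weight is the sum of the reciprocals over every list it appears in. The summation and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def merge_ranked_lists(harvested):
    """Merge ranked entity lists using reciprocal-rank weights.

    `harvested` is a list of entity lists, ordered by the rank of the
    graph node each list was harvested from.
    """
    scores = defaultdict(float)
    for source_rank, entities in enumerate(harvested, start=1):
        for entity_rank, entity in enumerate(entities, start=1):
            overall_rank = source_rank + entity_rank
            scores[entity] += 1.0 / overall_rank
    # Highest weight first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


example_merged = merge_ranked_lists([["a", "b"], ["b", "c"]])
```

In the sample, “b” appears in both lists and so rises above “a”, even though “a” is ranked first in the first list.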

The following image shows the graph surrounding the document “Midnight Club: Los Angeles” from a sample data set:


Figure 3. An Example Neighborhood Graph.

Here the relative size of the nodes shows how frequently they occurred in the raw event data, and the size of the arrows is a visual indicator of the weight or frequency of co-occurrence between two elements.

For example, we can see that the query “midnight club” (blue node on bottom RHS) most frequently resulted in a click on the “Midnight Club: Los Angeles Complete Edition Platinum Hits” product (as opposed to the original version above it). This is the type of information that would be useful to a business analyst trying to understand user behavior on a site.

Diversity in Recommendations

For a given item, we may only have a small number of items that co-occur with it (based on the co-occurrence matrix). By adding in the data from navigating the graph (which comes from the matrix), we increase the diversity of suggestions. Items that appear in multiple source lists then rise to the top. We believe this helps improve the quality of the recommendations & reduce bias. For example, in Figure 4 we show some sample recommendations for the query “Call of Duty”, where the recommendations are coming from a “popularity-based” recommender i.e. it gives a large weight to items with the most clicks. We can see that the suggestions are all from the “Call of Duty” video game franchise:


Figure 4. Recommendations from a “popularity-based” recommender system.

In contrast, in Figure 5 we show the recommendations from EventMiner for the same query:


Figure 5. Recommendations from navigating the graph.

Here we can see that the suggestions are now more diverse, with the first two being games from the same genre (“First Person Shooter” games) as the original query.

In the case of an e-commerce site, diversity in recommendations can be an important factor in suggesting items to a user that are related to their original query, but which they may not be aware of. This in turn can help increase the overall CTR (Click-Through Rate) and conversion rate on the site, which would have a direct positive impact on revenue and customer retention.

Evaluating Recommendation Quality

To evaluate the quality of the recommendations produced by this approach we used CrowdFlower to get user judgements on the relevance of the suggestions produced by EventMiner. Figure 6 shows an example of how a sample recommendation was presented to a human judge:


Figure 6. Example relevance judgment screen (CrowdFlower).

Here the original user query (“resident evil”) is shown, along with an example recommendation (another video game called “Dead Island”). We can see that the judge is asked to select one of four options, which is used to give the item a numeric relevance score:

  1. Off Topic
  2. Acceptable
  3. Good
  4. Excellent

In this example the user might judge the relevance for this suggestion as “good”, as the game being recommended is in the same genre (“survival horror”) as the original query. Note that the product title contains no terms in common with the query, i.e. the recommendations are based purely on the graph navigation and do not rely on an overlap between the query and the document being suggested. In Table 1 we summarize the results of this evaluation:

Items    Judgements    Users    Avg. Relevance (1 – 4)
1000     2319          30       3.27

Table 1. Summary of the relevance judgements.

Here we can see that the average relevance score across all judgements was 3.27 i.e. “good” to “excellent”.
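
As a rough illustration of how such an average is obtained, the four judgement labels can be mapped to the scores 1 to 4 and averaged. The sample judgements below are made up for the example and are not the CrowdFlower data; the names `LABEL_SCORES` and `average_relevance` are hypothetical.

```python
# Label-to-score mapping taken from the four options shown to judges.
LABEL_SCORES = {"Off Topic": 1, "Acceptable": 2, "Good": 3, "Excellent": 4}


def average_relevance(judgements):
    """Return the mean numeric relevance for a list of judgement labels."""
    if not judgements:
        raise ValueError("at least one judgement is required")
    return sum(LABEL_SCORES[label] for label in judgements) / len(judgements)


# Hypothetical judgements, for illustration only.
sample = ["Good", "Excellent", "Good", "Acceptable"]
print(average_relevance(sample))
```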


If you want an “out-of-the-box” recommender system that generates high-quality recommendations from your data please consider downloading and trying out Lucidworks Fusion.

The post Mining Events for Recommendations appeared first on Lucidworks.

Michigan becomes the latest Hydra Partner / Hydra Project

We are delighted to announce that the University of Michigan has become the latest formal Hydra Partner.  Maurice York, their Associate University Librarian for Library Information Technology, writes:

“The strength, vibrancy and richness of the Hydra community is compelling to us.  We are motivated by partnership and collaboration with this community, more than simply use of the technology and tools. The interest in and commitment to the community is organization-wide; last fall we sent over twenty participants to Hydra Connect from across five technology and service divisions; our showing this year will be equally strong, our enthusiasm tempered only by the registration limits.”

Welcome Michigan!  We look forward to a long collaboration with you.

Update on the Library Privacy Pledge / Eric Hellman

The Library Privacy Pledge of 2015, which I wrote about previously, has been finalized. We got a lot of good feedback, and the big changes have focused on the schedule.

Now, any library, organization or company that signs the pledge will have 6 months to implement HTTPS from the effective date of their signature. This should give everyone plenty of margin to do a good job on the implementation.

We pushed back our launch date to the first week of November. That's when we'll announce the list of "charter signatories". If you want your library, company or organization to be included in the charter signatory list, please send an e-mail to

The Let's Encrypt project will be launching soon. They are just one certificate authority that can help with HTTPS implementation.

I think this is a very important step for the library information community to take, together. Let's make it happen.

Here's the finalized pledge:

The Library Freedom Project is inviting the library community - libraries, vendors that serve libraries, and membership organizations - to sign the "Library Digital Privacy Pledge of 2015". For this first pledge, we're focusing on the use of HTTPS to deliver library services and the information resources offered by libraries. It’s just a first step: HTTPS is a privacy prerequisite, not a privacy solution. Building a culture of library digital privacy will not end with this 2015 pledge, but committing to this first modest step together will begin a process that won't turn back.  We aim to gather momentum and raise awareness with this pledge; and will develop similar pledges in the future as appropriate to advance digital privacy practices for library patrons.

We focus on HTTPS as a first step because of its timeliness. The Let's Encrypt initiative of the Electronic Frontier Foundation will soon launch a new certificate infrastructure that will remove much of the cost and technical difficulty involved in the implementation of HTTPS, with general availability scheduled for September. Due to a heightened concern about digital surveillance, many prominent internet companies, such as Google, Twitter, and Facebook, have moved their services exclusively to HTTPS rather than relying on unencrypted HTTP connections. The White House has issued a directive that all government websites must move their services to HTTPS by the end of 2016. We believe that libraries must also make this change, lest they be viewed as technology and privacy laggards, and dishonor their proud history of protecting reader privacy.

The 3rd article of the American Library Association Code of Ethics sets a broad objective:

We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.
It's not always clear how to interpret this broad mandate, especially when everything is done on the internet. However, one principle of implementation should be clear and uncontroversial:
Library services and resources should be delivered, whenever practical, over channels that are immune to eavesdropping.

The current best practice dictated by this principle is as follows:
Libraries and vendors that serve libraries and library patrons, should require HTTPS for all services and resources delivered via the web.

The Pledge for Libraries:

1. We will make every effort to ensure that web services and information resources under direct control of our library will use HTTPS within six months. [ dated______ ]

2. Starting in 2016, our library will assure that any new or renewed contracts for web services or information resources will require support for HTTPS by the end of 2016.

The Pledge for Service Providers (Publishers and Vendors):

1. We will make every effort to ensure that all web services that we (the signatories) offer to libraries will enable HTTPS within six months. [ dated______ ]

2. All web services that we (the signatories) offer to libraries will default to HTTPS by the end of 2016.

The Pledge for Membership Organizations:

1. We will make every effort to ensure that all web services that our organization directly controls will use HTTPS within six months. [ dated______ ]

2. We encourage our members to support and sign the appropriate version of the pledge.

There's a FAQ available, too. All this will soon be posted on the Library Freedom Project website.
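
For a library taking stock before signing, a first inventory step might be to flag service URLs that still use plain HTTP. The sketch below is only an illustration and not part of the pledge: the function name `find_insecure_urls` and the example.org addresses are hypothetical, and it checks only the URL scheme, not certificates or redirects.

```python
from urllib.parse import urlsplit


def find_insecure_urls(urls):
    """Return the URLs whose scheme is anything other than https."""
    return [url for url in urls if urlsplit(url).scheme.lower() != "https"]


# Hypothetical service inventory, for illustration only.
services = [
    "https://catalog.example.org/",
    "http://ill.example.org/request",
]
print(find_insecure_urls(services))
```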

Link roundup August 30, 2015 / Harvard Library Innovation Lab

This is the good stuff.

Rethinking Work

Putting Elon Musk and Steve Jobs on a Pedestal Misrepresents How Innovation Happens

Lamp Shows | HAIKU SALUT

Lawn Order | 99% Invisible

Cineca DSpace Service Provider Update / DuraSpace News

From Andrea Bollini, Cineca

It has been a hot and productive summer here at Cineca: we have carried out several DSpace activities, together with the go-live of the National ORCID Hub to support the adoption of ORCID in Italy [1][2].

iSchool / Ed Summers

As you can see, I’ve recently changed things around here at Yeah, it’s looking quite spartan at the moment, although I’m hoping that will change in the coming year. I really wanted to optimize this space for writing in my favorite editor, and making it easy to publish and preserve the content. Wordpress has served me well over the last 10 years and up till now I’ve resisted the urge to switch over to a static site. But yesterday I converted the 394 posts, archived the Wordpress site and database, and am now using Jekyll. I haven’t been using Ruby as much in the past few years, but the tooling around Jekyll feels very solid, especially given GitHub’s investment in it.

Honestly, there was something that pushed me over the edge to do the switch. Next week I’m starting in the University of Maryland iSchool, where I will be pursuing a doctoral degree. I’m specifically hoping to examine some of the ideas I dredged up while preparing for my talk at NDF in New Zealand a couple years ago. I was given almost a year to think about what I wanted to talk about – so it was a great opportunity for me to reflect on my professional career so far, and examine where I wanted to go.

After I got back I happened across a paper by Steven Jackson called Rethinking Repair, which introduced me to what felt like a very new and exciting approach to information technology design and innovation that he calls Broken World Thinking. In hindsight I can see that both of these things conspired to make returning to school at 46 years of age look like a logical thing to do. If all goes as planned I’m going to be doing this part-time while also working at the Maryland Institute for Technology in the Humanities, so it’s going to take a while. But I’m in a good spot, and am not in any rush … so it’s all good as far as I’m concerned.

I’m planning to use this space for notes about what I’m reading, papers, reflections etc. I thought about putting my citations, notes into Evernote, Zotero, Mendeley etc, and I may still do that. But I’m going to try to keep it relatively simple and use this space as best I can to start. My blog has always had a navel gazy kind of feel to it, so I doubt it’s going to matter much.

To get things started I thought I’d share the personal statement I wrote for admission to the iSchool. I’m already feeling more focus than when I wrote it almost a year ago, so it will be interesting to return to it periodically. The thing that has become clearer to me in the intervening year is that I’m increasingly interested in examining the role that broken world thinking has played in both the design and evolution of the Web.

So here’s the personal statement. Hopefully it’s not too personal :-)

For close to twenty years I have been working as a software developer in the field of libraries and archives. As I was completing my Masters degree in the mid-1990s, the Web was going through a period of rapid growth and evolution. The computer labs at Rutgers University provided me with what felt like a front row seat to the development of this new medium of the World Wide Web. My classes on hypermedia and information seeking behavior gave me a critical foundation for engaging with the emerging Web. When I graduated I was well positioned to build a career around the development of software applications for making library and archival material available on the Web. Now, after working in the field, I would like to pursue a PhD in the UMD iSchool to better understand the role that the Web plays as an information platform in our society, with a particular focus on how archival theory and practice can inform it. I am specifically interested in archives of born digital Web content, but also in what it means to create a website that gets called an archive. As the use of the Web continues to accelerate and proliferate it is more and more important to have a better understanding of its archival properties.

My interest in how computing (specifically the World Wide Web) can be informed by archival theory developed while working in the Repository Development Center under Babak Hamidzadeh at the Library of Congress. During my eight years at LC I designed and built both internally focused digital curation tools as well as access systems intended for researchers and the public. For example, I designed a Web based quality assurance tool that was used by curators to approve millions of images that were delivered as part of our various digital conversion projects. I also designed the National Digital Newspaper Program’s delivery application, Chronicling America, that provides thousands of researchers access to over 8 million pages of historic American newspapers every day. In addition, I implemented the data management application that transfers and inventories 500 million tweets a day to the Library of Congress. I prototyped the Library of Congress Linked Data Service which makes millions of authority records available using Linked Data technologies.

These projects gave me hands-on, practical experience using the Web to manage and deliver Library of Congress data assets. Since I like to use agile methodologies to develop software, this work necessarily brought me into direct contact with the people who needed the tools built, namely archivists. It was through these interactions over the years that I began to recognize that my Masters work at Rutgers University was in fact quite biased towards libraries, and lacked depth when it came to the theory and praxis of archives. I remedied this by spending about two years of personal study focused on reading about archival theory and practice with a focus on appraisal, provenance, ethics, preservation and access. I also became a participating member of the Society of American Archivists.

During this period of study I became particularly interested in the More Product Less Process (MPLP) approach to archival work. I found that MPLP had a positive impact on the design of archival processing software since it oriented the work around making content available, rather than on often time consuming preservation activities. The importance of access to digital material is particularly evident since copies are easy to make, but rendering can often prove challenging. In this regard I observed that requirements for digital preservation metadata and file formats can paradoxically hamper preservation efforts. I found that making content available sooner rather than later can serve as an excellent test of whether digital preservation processing has been sufficient. While working with Trevor Owens on the processing of the Carl Sagan collection we developed an experimental system for processing born digital content using lightweight preservation standards such as BagIt in combination with automated topic model driven description tools that could be used by archivists. This work also leveraged the Web and the browser for access by automatically converting formats such as WordPerfect to HTML, so they could be viewable and indexable, while keeping the original file for preservation.

Another strand of archival theory that captured my interest was the work of Terry Cook, Verne Harris, Frank Upward and Sue McKemmish on post-custodial thinking and the archival enterprise. It was specifically my work with the Web archiving team at the Library of Congress that highlighted how important it is for record management practices to be pushed outwards onto the Web. I gained experience in seeing what makes a particular web page or website easier to harvest, and how impractical it is to collect the entire Web. I gained an appreciation for how innovation in the area of Web archiving was driven by real problems such as dynamic content and social media. For example I worked with the Internet Archive to archive Web content related to the killing of Michael Brown in Ferguson, Missouri by creating an archive of 13 million tweets, which I used as an appraisal tool, to help the Internet Archive identify Web content that needed archiving. In general I also saw how traditional, monolithic approaches to system building needed to be replaced with distributed processing architectures and the application of cloud computing technologies to easily and efficiently build up and tear down such systems on demand.

Around this time I also began to see parallels between the work of Matthew Kirschenbaum on the forensic and formal materiality of disk based media and my interests in the Web as a medium. Archivists usually think of the Web content as volatile and unstable, where turning off a web server can result in links breaking, and content disappearing forever. However it is also the case that Web content is easily copied, and the Internet itself was designed to route around damage. I began to notice how technologies such as distributed revision control systems, Web caches, and peer-to-peer distribution technologies like BitTorrent can make Web content extremely resilient. It was this emerging interest in the materiality of the Web that drew me to a position in the Maryland Institute for Technology in the Humanities where Kirschenbaum is the Assistant Director.

There are several iSchool faculty that I would potentially like to work with in developing my research. I am interested in the ethical dimensions to Web archiving and how technical architectures embody social values, which is one of Katie Shilton’s areas of research. Brian Butler’s work studying online community development and open data is also highly relevant to the study of collaborative and cooperative models for Web archiving. Ricky Punzalan’s work on virtual reunification in Web archives is also of interest because of its parallels with post-custodial archival theory, and the role of access in preservation. And Richard Marciano’s work on digital curation, in particular his recent work with the NSF on Brown Dog, would be an opportunity for me to further my experience building tools for digital preservation.

If admitted to the program I would focus my research on how Web archives are constructed and made accessible. This would include a historical analysis of the development of Web archiving technologies and organizations. I plan to look specifically at the evolution and deployment of Web standards and their relationship to notions of impermanence, and change over time. I will systematically examine current technical architectures for harvesting and providing access to Web archives. Based on user behavior studies I would also like to reimagine what some of the tools for building and providing access to Web archives might look like. I expect that I would spend a portion of my time prototyping and using my skills as a software developer to build, test and evaluate these ideas. Of course, I would expect to adapt much of this plan based on the things I learn during my course of study in the iSchool, and the opportunities presented by working with faculty.

Upon completion of the PhD program I plan to continue working on digital humanities and preservation projects at MITH. I think the PhD program could also qualify me to help build the iSchool’s new Digital Curation Lab at UMD, or similar centers at other institutions. My hope is that my academic work will not only theoretically ground my work at MITH, but will also be a source of fruitful collaboration with the iSchool, the Library and larger community at the University of Maryland. I look forward to helping educate a new generation of archivists in the theory and practice of Web archiving.

Learn About Islandora at the Amigos Online Conference / Cherry Hill Company

On September 17, 2015, I'll be giving the presentation "Bring Your Local, Unique Content to the Web Using Islandora" at the Amigos Open Source Software and Tools for the Library and Archive online conference. Amigos is bringing together practitioners from around the library field who have used open source in projects at their library. My talk will be about the Islandora digital asset management system, the fundamental building block of the Cherry Hill LibraryDAMS service.

Every library has content that is unique to itself and its community. Islandora is open source software that enables libraries to store, present, and preserve that unique content to their communities and to the world. Built atop the popular Drupal content management system and the Fedora digital object repository, Islandora powers many digital projects on the...

How Shutterstock Searches 35 Million Images by Color Using Apache Solr / SearchHub

As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Shutterstock engineer Chris Becker’s session on how they use Apache Solr to search 35 million images by color.

This talk covers some of the methods they’ve used for building color search applications at Shutterstock using Solr to search 40 million images. A couple of these applications can be found in Shutterstock Labs, notably Spectrum and Palette. We’ll go over the steps for extracting color data from images and indexing them into Solr, as well as looking at some ways to query color data in your Solr index. We’ll cover some issues such as what relevance means when you’re searching for colors rather than text, and how you can achieve various effects by ranking on different visual attributes.

At the time of this presentation, Chris was the Principal Engineer of Search at Shutterstock, a stock photography marketplace selling over 35 million images, where he’s worked on image search since 2008. In that time he’s worked on all the pieces of Shutterstock’s search technology ecosystem, from the core platform to relevance algorithms, search analytics, image processing, similarity search, internationalization, and user experience. He started using Solr in 2011 and has used it for building various image search and analytics applications.
Searching Images by Color: Presented by Chris Becker, Shutterstock from Lucidworks
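
To give a feel for what "extracting color data and indexing it" can mean, here is a minimal sketch that is not the method from the talk: it assumes pixels are already available as (r, g, b) tuples, quantizes them into coarse buckets, and builds a dictionary that could be sent to a search index. The function names, the bucket naming, and the `color_*_f` field names are all hypothetical; the `_f` suffix assumes a schema with a float dynamic field of that pattern.

```python
from collections import Counter


def color_bucket(rgb, levels=4):
    """Quantize an (r, g, b) tuple (0-255 per channel) to a coarse bucket name."""
    step = 256 // levels
    r, g, b = (channel // step for channel in rgb)
    return f"r{r}g{g}b{b}"


def color_document(image_id, pixels, levels=4):
    """Build an index document holding the fraction of pixels per color bucket."""
    counts = Counter(color_bucket(pixel, levels) for pixel in pixels)
    total = sum(counts.values())
    doc = {"id": image_id}
    for bucket, count in counts.items():
        # Hypothetical field naming; adjust to the index schema in use.
        doc[f"color_{bucket}_f"] = count / total
    return doc


# Three red pixels and one blue pixel, for illustration only.
pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
print(color_document("img-1", pixels))
```

With per-bucket fractions stored as numeric fields, a query could then filter or rank on how much of an image falls into a given bucket, which is one simple notion of color relevance.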
Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post How Shutterstock Searches 35 Million Images by Color Using Apache Solr appeared first on Lucidworks.

DPLA Welcomes Four New Service Hubs to Our Growing Network / DPLA

The Digital Public Library of America is pleased to announce the addition of four Service Hubs that will be joining our Hub network. The Hubs represent Illinois, Michigan, Pennsylvania and Wisconsin.  The addition of these Hubs continues our efforts to help build local community and capacity, and further efforts to build an on-ramp to DPLA participation for every cultural heritage institution in the United States and its territories.

These Hubs were selected from the second round of our application process for new DPLA Hubs.  Each Hub has a strong commitment to bring together the cultural heritage content in their state to be a part of DPLA, and to build community and data quality among the participants.

In Illinois, the Service Hub responsibilities will be shared by the Illinois State Library, the Chicago Public Library, the Consortium of Academic and Research Libraries of Illinois (CARLI), and the University of Illinois at Urbana Champaign. More information about the Illinois planning process can be found here. Illinois plans to make available collections documenting coal mining in the state, World War II photographs taken by an Illinois veteran and photographer, and collections documenting rural healthcare in the state.

In Michigan, the Service Hub responsibilities will be shared by the University of Michigan, Michigan State University, Wayne State University, Western Michigan University, the Midwest Collaborative for Library Services and the Library of Michigan.  Collections to be shared with the DPLA cover topics including the history of the Motor City, historically significant American cookbooks, and Civil War diaries from the Midwest.

In Pennsylvania, the Service Hub will be led by Temple University, Penn State University, University of Pennsylvania and Free Library of Philadelphia in partnership with the Philadelphia Consortium of Special Collections Libraries (PACSCL) and the Pennsylvania Academic Library Consortium (PALCI), among other key institutions throughout the state.  More information about the Service Hub planning process in Pennsylvania can be found here.  Collections to be shared with DPLA cover topics including the Civil Rights Movement in Pennsylvania, Early American History, and the Pittsburgh Iron and Steel Industry.

The final Service Hub, representing Wisconsin will be led by Wisconsin Library Services (WiLS) in partnership with the University of Wisconsin-Madison, Milwaukee Public Library, University of Wisconsin-Milwaukee, Wisconsin Department of Public Instruction and Wisconsin Historical Society.  The Wisconsin Service Hub will build off of the Recollection Wisconsin statewide initiative.  Materials to be made available document the American Civil Rights Movement’s Freedom Summer and the diversity of Wisconsin, including collections documenting the lives of Native Americans in the state.

“We are excited to welcome these four new Service Hubs to the DPLA Network,” said Emily Gore, DPLA Director for Content. “These four states have each led robust, collaborative planning efforts and will undoubtedly be strong contributors to the DPLA Hubs Network.  We look forward to making their materials available in the coming months.”

The March on Washington: Hear the Call / DPLA

Fifty-two years ago this week, more than 200,000 Americans came together in the nation’s capital to rally in support of the ongoing Civil Rights movement. It was at that march that Martin Luther King Jr.’s iconic “I Have A Dream” speech was delivered. And it was at that march that the course of American history was forever changed, in an event that resonates with protests, marches, and movements for change around the country decades later.

Get a new perspective on the historic March on Washington with this incredible collection from WGBH via Digital Commonwealth. This collection of audio pieces, 15 hours in total, offers uninterrupted coverage of the March on Washington, recorded by WGBH and the Educational Radio Network (a small radio distribution network that later became part of National Public Radio). This type of coverage was unprecedented in 1963, and offers a wholly unique view on one of the nation’s most crucial historic moments.

In this audio series, you can hear Martin Luther King Jr.’s historic speech, along with the words of many other prominent civil rights leaders–John Lewis, Bayard Rustin, Jackie Robinson, Roy Wilkins,  Rosa Parks, and Fred Shuttlesworth. There are interviews with Hollywood elite like Marlon Brando and Arthur Miller, alongside the complex views of the “everyman” Washington resident. There’s also the folk music of the movement, recorded live here, of Joan Baez, Bob Dylan, and Peter, Paul, and Mary. There are the stories of some of the thousands of Americans who came to Washington D.C. that August–teachers, social workers, activists, and even a man who roller-skated to the march all the way from Chicago.

Hear speeches made about the global nonviolence movement, the labor movement, and powerful words from Holocaust survivor Joachim Prinz. Another notable moment in the collection is an announcement of the death of W.E.B. Du Bois, one of the founders of the NAACP and an early voice for civil rights issues.

These historic speeches are just part of the coverage, however. There are fascinating, if more mundane, announcements, too, about the amount of traffic in Washington and issues with both marchers’ and commuters’ travel (though they reported that “north of K Street appears just as it would on a Sunday in Washington”). Another big, though less notable, issue of the day, according to WGBH reports, was food poisoning from the chicken in boxed lunches served to participants at the march. There is also information about the preparation for the press, which a member of the march’s press committee says included more than 300 “out-of-town correspondents.” This was in addition to the core Washington reporters, radio stations like WGBH, TV networks, and international stations from Canada, Japan, France, Germany and the United Kingdom. These types of minute details and logistics offer a new window into a complex historic event, bringing together thousands of Americans at the nation’s capital (though, as WGBH reported, not without its transportation hurdles!).

At the end of the demonstration, you can hear for yourself a powerful pledge, recited from the crowd, to further the mission of the march. It ends poignantly: “I pledge my heart and my mind and my body unequivocally and without regard to personal sacrifice, to the achievement of social peace through social justice.”

Hear the pledge, alongside the rest of the march as it was broadcast live, in this inspiring and insightful collection, courtesy of WGBH via Digital Commonwealth.

Banner image courtesy of the National Archives and Records Administration.

A view of the March on Washington, showing the Reflecting Pool and the Washington Monument. Courtesy of the National Archives and Records Administration.

Am I a “librarian”? / Jonathan Rochkind

I got an MLIS degree, received a bit over 9 years ago, because I wanted to be a librarian, although I wasn’t sure what kind. I love libraries for their 100+ year tradition of investigation and application of information organization and retrieval (a fascinating domain, increasingly central to our social organization); I love libraries for being one of the few information organizations in our increasingly information-centric society that (often) aren’t trying to make a profit off our users and so can align organizational interests with user interests and act with no motive but our users’ benefit; and I love libraries for their mountains of books too (I love books).

Originally I didn’t plan on continuing as a software engineer; I wanted to be ‘a librarian’.  But through becoming familiar with the library environment, including but not limited to job prospects, I eventually realized that IT systems are integral to nearly every task staff and users perform at or with a library — and I could have a job using less-than-great tech knowing that I could make it better but having no opportunity to do so — or I could have a job making it better.  The rest is history.

I still consider myself a librarian. I think what I do — design, build, and maintain internal and purchased systems by which our patrons interact with the library and our services over the web —  is part of being a librarian in the 21st century.

I’m not sure if all my colleagues consider me a ‘real librarian’ (and my position does not require an MLIS degree).  I’m also never sure, when strangers or acquaintances ask me what I do for work, whether to say ‘librarian’, since they assume a librarian does something different than what I spend my time doing.

But David Lee King in a blog post What’s the Most Visited Part of your Library? (thanks Bill Dueber for the pointer), reminds us, I think from a public library perspective:

Do you adequately staff the busiest parts of your library? For example, if you have a busy reference desk, you probably make sure there are staff to meet demand….

Here’s what I mean. Take a peek at some annual stats from my library:

  • Door count: 797,478 people
  • Meeting room use: 137,882 people
  • Library program attendance: 76,043 attendees
  • Art Gallery visitors: 25,231 visitors
  • Reference questions: 271,315 questions asked

How about website visits? We had 1,113,146 total visits to the website in 2014. The only larger number is our circulation count (2,300,865 items)….

…So I’ll ask my question again: Do you adequately staff the busiest parts of your library?

I don’t have numbers in front of me from our academic library, but I’m confident that our ‘website’ — by which I mean to include our catalog, ILL system, link resolver, etc, all of the places users get library services over the web, the things me and my colleagues work on — is one of the most, if not the most, used ‘service points’ at our library.

I’m confident that the online services I work on reach more patrons, and are cumulatively used for more patron-hours, than our reference or circulation desks.

I’m confident the same is true at your library, and almost every library.

What would it mean for an organization to take account of this? “Adequate staffing”, as King says, absolutely. Where are staff positions allocated? But also in general, how are non-staff resources allocated? How is respect allocated? Who is considered a ‘real librarian’? (And I don’t really think it’s about the MLIS degree either, even though I led with that). Are IT professionals (and their departments and managers) considered technicians to maintain ‘infrastructure’ as precisely specified by ‘real librarians’, or are they considered important professional partners collaborating in serving our users? Who is consulted for important decisions? Is online service downtime taken as seriously as (or more seriously than) an unexpected closure of the physical building, and are resources allocated correspondingly? Is User Experience (UX) research done in an actual serious way into how your online services are meeting user needs, and are resources (including but not limited to staff positions) provided for it?

What would it look like for a library to take seriously that its online services are, by far, the most used service point in a library? Does your library look like that?

In the 21st century, libraries are Information Technology organizations. Do those running them realize that? Are they run as if they were? What would it look like for them to be?

It would be nice to start with just some respect.

Although I realize that in many of our libraries respect may not be correlated with MLIS-holders or who’s considered a “real librarian” either.  There may be some perception that ‘real librarians’ are outdated. It’s time to update our notion of what librarians are in the 21st century, and to start running our libraries recognizing how central our IT systems, and the development of such in professional ways, are to our ability to serve users as they deserve.

Filed under: General

Indexing Arabic Content in Apache Solr / SearchHub

As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Ramzi Alqrainy‘s session on using Solr to index and search documents and files in Arabic. The Arabic language poses several challenges for Natural Language Processing (NLP), largely because Arabic, unlike European languages, has a very rich and sophisticated morphological system. This talk will cover some of the challenges and how to solve them with Solr, and will also present the challenges that were handled by OpenSooq as a real case in the Middle East. Ramzi Alqrainy is one of the most recognized experts within the Artificial Intelligence and Information Retrieval fields in the Middle East. He is an active researcher and technology blogger, with a focus on information retrieval.
Arabic Content with Apache Solr: Presented by Ramzi Alqrainy, OpenSooq from Lucidworks
Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Indexing Arabic Content in Apache Solr appeared first on Lucidworks.

August Library Tech Roundup / LITA

Image courtesy of Flickr user cdevers (CC BY NC ND)

Each month, the LITA bloggers will share selected library tech links, resources, and ideas that resonated with us. Enjoy – and don’t hesitate to tell us what piqued your interest recently in the comments section!

Brianna M.

Here are some of the things that caught my eye this month, mostly related to digital scholarship.

John K.

Jacob S.

  • I’m thankful for Shawn Averkamp’s Python library for interacting with ContentDM (CDM), including a Python class for editing CDM metadata via their Catcher, which makes batch editing CDM metadata records much less of a pain.
  • I recently watched an ALA webinar where Allison Jai O’Dell presented on TemaTres, a platform for publishing linked data controlled vocabularies.

Nimisha B.

There have been a lot of great publications and discussions in the realm of Critlib lately concerning cataloging and library discovery. Here are some, and a few other things of note:

Michael R.

  • Adobe Flash’s days seem numbered as Google Chrome will stop displaying Flash adverts by default, following Firefox’s lead. With any luck, Java will soon follow Flash into the dustbin of history.
  • NPR picked up the story of DIY tractor repairs running afoul of the DMCA. The U.S. Copyright Office is considering a DMCA exemption for vehicle repair; a decision is scheduled for October.
  • Media autoplay violates user control and choice. Video of a fatal, tragic Virginia shooting has been playing automatically in people’s feeds. Ads on autoplay are annoying, but this…!

Cinthya I.

These are a bit all over the map, but interesting nonetheless!

Bill D.

I’m all about using data in libraries, and a few things really caught my eye this month.

David K.

Whitni W.

Marlon H.

  • Ever since I read an ACRL piece about library adventures with Raspberry Pi, I’ve wanted to build my own as a terminal for catalog searches and as a self-checkout machine. Adafruit user Ruizbrothers‘ example of how to Build an All-In-One Desktop using the latest version of Raspberry Pi might be just what I need to finally get that project rolling.
  • With summer session over (and with it my MSIS, yay!) I am finally getting around to planning my upgrade from Windows 8.1 to 10. Lifehacker’s Alan Henry provides quite a few good reasons to opt for a Clean Install over the standard upgrade option. With more and more of my programs conveniently located just a quick download away and a wide array of cloud solutions safeguarding my data, I think I found my weekend project.

Share the most interesting library tech resource you found this August in the comments!

MIME type / William Denton

Screenshot of an email notification I received from Bell, as viewed in Alpine:

Your e-bill is ready

Inside: Content-Type: text/plain; charset="ISO-8859-1"

In the future, email billing will be mandatory and email bills will be unreadable.

The impact of branding and communication on university reputation / HangingTogether

This is the fourth and final in a series of posts about the OCLC Research Library Partnership Rep, Rank and Role meeting that took place 3-4 June 2014 in San Francisco. Our earlier posts focused on advancing university goals around reputation, assessment and recording of research (Jim’s University reputation and ranking – another way that Europe is not like US) and how libraries can be instrumental in all of this (Ricky’s University reputation and ranking — getting the researchers on board), as well as the build vs. buy debate around research system solutions (Roy’s To buy or not to buy and the importance of identifiers).

Helping to advance your university’s goals around reputation assessment takes more than just getting your researchers on board—it’s important to tie it all together with branding and communications for both internal and external audiences.

University branding. Branding your university is a process of strategically communicating what you think and say about it. Your university’s brand is about its relevancy and differentiation with respect to its customers—it is its face to the world, the sum of all of the characteristics that make it unique. It’s also a promise: a fundamental set of principles that are understood by, and make an emotional connection with, anyone who comes into contact with your university. It defines how your university is perceived by its internal and external constituents.

In order to create your university’s brand you need to understand your university’s strengths, what it stands for and how it is unique. How does your university want to be perceived? What is your institution’s mission and how does your faculty’s research contribute to it? Start to figure this out by connecting your university’s high-level goals with the major elements that constitute your university’s reputation.

Draft a clear brand story that explains your university’s essence to your internal and external stakeholders. For example, “Our university wants to make a contribution to society—here are three ways in which we’re uniquely situated to do so.” To make a difference, you need a great narrative that is authentic, understandable and believable.

Branding’s influence on reputation. Branding and reputation are tightly interconnected. Reputation is what others think, feel and say about your university—how your efforts regarding branding and what you have done or delivered are seen by your various constituencies. Reputation is about your university’s legitimacy, credibility and respect among all of its constituencies—it’s an evaluation of its brand from its constituents’ perspective. Your university will build its reputation by doing all the things it said it was going to do while building its brand.

In order to achieve excellence, your university needs a solid reputation, a strong brand and consistent communication. As Peter Schiffer said in his “Managing Reputation, Managing Risk” presentation, reputation will only come with excellence and its communication. Reputation first comes from excellence, but then you have to communicate it. This can be difficult because it’s not in the academic culture. Researchers are often eager to communicate about their own research but may be more reluctant about or less focused on promoting their university.

In order to overcome this, it’s important that the university make its faculty, staff and students its brand ambassadors. This can be done by explaining to them what is going on, why it is important, what it means to them and how they can help. Equip them with the knowledge and understanding to enable them to collaborate with and inspire others. Make it clear that it is everybody’s job to communicate. It has to be part of the culture.

The library’s role. The library can help to support the university’s branding efforts by reinforcing the importance of communication and incorporating it into the culture. You can do this by:

  • creating internal communications groups and encouraging staff to participate;
  • consistently repeating the brand story and messages in a variety of venues including digital communications such as campus-wide calendars, email lists and social media;
  • asking for and encouraging feedback; and
  • building collaborative efforts and allowing external stakeholders (citizens, corporations, legislators, etc.) to join your community, consult your expertise, and support your efforts.

It’s important to keep in mind that communication will continue to be important as your university begins to achieve the rankings it wants. Communicating your university’s progress in the rankings can lead to even better rankings, so it is in the university’s interest to make an effort to do this.

In her “What league are you in? Rankings, ratings, and the quest to be the best” presentation, Virginia Steel explained how UCLA is hugely focused on the three R’s: rankings, ratings, and reputation. So much so that the UCLA website has an entire section devoted to rankings that includes a 7-page pdf about how UCLA stacks up. She also spoke about how the UCLA library helped to set up an infrastructure to maximize faculty research, improve teaching and help the campus to measure how well it is doing.

In conclusion, branding and communication can have a significant impact on your university’s reputation. It’s important that your university’s faculty and staff understand your university’s story, what it means to them, and how they can help to tell it. Your library can strengthen these efforts via its own communications activities.

Influencing rank and reputation is a long-term effort that is instrumental in attracting and retaining the best faculty, staff, students and funding. For more information about these and other presentations from the Rep, Rank and Role meeting, view the video playlist or download the slides from the program agenda.

About Melissa Renspie

Melissa Renspie is a Senior Communications Officer in OCLC Research. She shares information about OCLC Research activities and accomplishments with OCLC Research Library Partners and the wider OCLC Research community through a variety of communications vehicles and channels.

Exploring Solr Anti-Patterns with Sematext’s Rafał Kuć / SearchHub

As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Sematext’s Rafał Kuć’s session about Solr Anti-Patterns. Be sure to catch his talk at this year’s conference: Large Scale Log Analytics with Solr. Through his work as a consultant and software engineer, Rafał has seen multiple patterns in how Solr is used and how it should be used. We usually say what should be done, but we don’t talk about or point out what shouldn’t be done. This talk will point out common mistakes and roads that should be avoided at all costs. This session will not only show the bad patterns, but also show the differences before and after. The talk is divided into three major sections:
  1. General configuration pitfalls that people are used to making. We will discuss different use cases showing the proper path that one should take
  2. We will focus on data modeling and what to avoid when making your data indexable. Again, we will see real life use cases followed by the guidance around how to handle them properly
  3. Finally, we will talk about queries and all the juicy mistakes made when it comes to searching for indexed data
Each shown use case will be illustrated by the before and after analysis – we will see the metrics changes, so the talk will not only bring pure facts, but hopefully know-how worth remembering. Full deck available on SlideShare:
Solr Anti-Patterns: Presented by Rafał Kuć, Sematext from Lucidworks
Be sure to catch Rafał’s talk – Large Scale Log Analytics with Solr – at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Exploring Solr Anti-Patterns with Sematext’s Rafał Kuć appeared first on Lucidworks.

Tech industry association releases copyright report / District Dispatch

The Computer and Communications Industry Association (CCIA) released a white paper “Copyright Reform for a Digital Economy” yesterday that includes many ideas also supported by libraries. The American Library Association shares the same philosophy that the purpose of the copyright law is to advance learning and benefit the public. We both believe that U.S. copyright law is a utilitarian system that rewards authors through a statutorily created but limited monopoly in order to serve the public. Any revision of the copyright law needs to reflect that viewpoint and recognize that today, copyright impacts everyone, not just media companies. The white paper does get a little wonky in parts, but check out at least the executive summary (or watch the webinar) to learn why we argue for fair use, licensing transparency, statutory damages reform and a respect for innovation and new creativity.

The post Tech industry association releases copyright report appeared first on District Dispatch.

SAA Awards NDSA for Outstanding Published Work / Library of Congress: The Signal

The National Digital Stewardship Alliance has been awarded a special commendation in the Preservation Publication category for the 2015 National Agenda for Digital Stewardship from the Society of American Archivists. The award recognizes outstanding published work related to archives preservation, and was presented as part of the 2015 SAA annual conference in Cleveland, Ohio.

“The amount of digital information being produced by science, government and society is now so great, and the risks to it so diverse, that no organization can effectively ensure durable access to all the information it needs. The 2015 National Agenda, produced through a collaborative effort of the stewardship community,  provides a roadmap for collaboration to steward today’s knowledge and culture for future generations,” said Dr. Micah Altman, director of Research, MIT Libraries, who is currently serving as NDSA Coordinating Committee chair.

The National Agenda is an ongoing program of the NDSA Coordinating Committee, which seeks to highlight emerging challenges and offer possible collaborative solutions. It is based on the collective experiences of NDSA members, priorities expressed in NDSA Working Groups and overarching issues that are identified in NDSA member surveys. By integrating the perspectives of dozens of experts and hundreds of institutions the National Agenda provides funders and decision‐makers insight into emerging technological trends, gaps in digital stewardship capacity, and key areas for funding, research and development. The goal of the National Agenda is to help ensure that today’s valuable digital content remains accessible and comprehensible in the future, supporting a thriving economy, a robust democracy and a rich cultural heritage. These goals align with SAA Core Values.

In addition to providing high-level insight for executive decision-makers, the Agenda is meant to offer advice and guidance for practitioners. Recurring themes in the 2015 Agenda are issues around building digital collections, advocating for resources, continuing to support technical development and broadening the evidence base for digital preservation.

Building Digital Collections

Much of the investment and effort in the field of digital preservation has been focused on developing technical infrastructure, networks of partnerships, education and training and establishing standards and practices. Little has been invested in understanding how the stewardship community will coordinate the acquisition and management of born-digital materials in a systematic and public way. A gap is starting to emerge between the types of materials that are being created and used in our society and the types of materials that make their way into libraries and archives. The NDSA’s core recommendations in the 2015 National Agenda are:

  • Build the evidence base for evaluating at-risk, large-scale digital content for acquisition. Develop contextual knowledge about born-digital content areas that characterizes the risks and efforts to ensure durable access to them.
  • Understand the technical implications of acquiring large-scale digital content. Extend systematic surveys and environmental scans of organizational capacity and preservation storage practices to help guide selection decisions.
  • Share information about what content is being collected and what level of access is provided. Communicate and coordinate collection priority statements at national, regional and institutional levels.
  • Support partnerships, donations and agreements with creators and owners of digital content and stewards. Connect with communities across commercial, nonprofit, private and public sectors that create digital content to leverage their incentives to preserve.

Advocating for Resources

Despite continued preservation mandates and over ten years of work and progress in building digital preservation programs, the community still struggles with advocating for resources, adequate staffing and articulating the shared responsibility for stewardship. The NDSA’s core recommendations in the 2015 National Agenda are:

  • Advocate for resources. Share strategies and develop unified messages to advocate for funding and resources; share cost information and models; and develop tools and strategies that inform the evaluation and management of digital collection value and usage.
  • Enhance staffing and training. Explore and expand models of support that provide interdisciplinary and practical experiences for emerging professionals and apply those models to programs for established professionals. Evaluate and articulate both the broad mix of roles and specialized set of skills in which digital stewardship professionals are involved.
  • Foster multi-institutional collaboration. Foster collaboration through open source software development; information sharing on staffing and resources; coordination on content selection and engagement with development of standards and practices; and identify, understand and connect with stakeholders outside of the cultural heritage sector.

Technical Development

Broadly speaking, the infrastructure that enables digital preservation involves the staff, workflows, resources, equipment, and policies that ensure long-term access to digital information. The NDSA’s core recommendations in the 2015 National Agenda are:

  • Coordinate and sustain an ecosystem of shared services. Better identify and implement processes to maintain key software platforms, tools and services; identify technologies which integrate well to form a sustainable digital workflow; and identify better models to support long-term sustainability for common goods.
  • Foster best practice development. Give priority to the development of standards and best practices, especially in the areas of format migrations and long-term data integrity.

Broadening the Evidence Base

Research is critical to the advancement of both basic understanding and the effective practice of digital preservation. Research in digital preservation is under-resourced, in part because the payoff from long-term access occurs primarily in the medium to long term and tends to benefit broad and diverse communities. Investments in core research will yield large impacts. The NDSA’s core recommendations in the 2015 National Agenda are:

  • Build the evidence base for digital preservation. Give priority to programs that systematically contribute to the overall cumulative evidence base for digital preservation practice and resulting outcomes–including supporting test beds for systematic comparison of preservation practices.
  • Better integrate research and practices. Give priority to programs that rigorously integrate research and practice or that increase the scalability of digital stewardship.

Access the full recommendations from the NDSA 2015 National Agenda (pdf) at the NDSA web site.

Virtual Shelf Browse is a hit? / Jonathan Rochkind

With the semester starting back up here, we’re getting lots of positive feedback about the new Virtual Shelf Browse feature.

I don’t have usage statistics or anything at the moment, but it seems to be a hit, allowing people to do something like a physical browse of the shelves, from their device screen.

Positive feedback has come from underclassmen as well as professors. I am still assuming it is discipline-specific (some disciplines/departments simply don’t use monographs much), but appreciation and use do seem to cut across academic status/level.

Here’s an example of our Virtual Shelf Browse.

Here’s a blog post from last month where I discuss the feature in more detail.

Filed under: General

Jobs in Information Technology: August 26, 2015 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week:

Librarian for Health Sciences, Old Dominion University Perry Library, Norfolk, VA

Head of Systems Development, Old Dominion University Perry Library, Norfolk, VA

Huck Chair, and Head of Special Collections, Pennsylvania State University Libraries, Special Collections, University Park, PA

Senior Library Manager – Olympia, Timberland Regional Library, Olympia, WA

Technology and Content Strategy Manager (Librarian III), Suffolk Public Library, Suffolk, VA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Registration Now Open for a Fall Forum on the Future of Library Discovery / Peter Murray

Helping patrons find the information they need is an important part of the library profession, and in the past decade the profession has seen the rise of dedicated “discovery systems” to address that need. The National Information Standards Organization (NISO) is active at the intersection of libraries, content suppliers, and service providers in smoothing out the wrinkles between these parties:

Next in this effort is a two-day meeting where these three groups will hear about the latest activities and plan activities to advance the standards landscape. Registration for this meeting has just opened, and included below is the announcement. I’ll be in Baltimore in early October to participate and offer the closing keynote, and I hope you will be able to attend in person or participate in the live stream.

NISO will host a two–day meeting to take place in Baltimore, Maryland on October 5 & 6, 2015 on The Future of Library Discovery. In February 2015, NISO published a white paper commissioned from library consultant Marshall Breeding by NISO’s Discovery to Delivery Topic Committee. The in-person meeting will be an extension of the white paper with a series of presenters and panels offering an overview of the current resource discovery environment. Attendees will then participate in several conversations that will examine possibilities regarding how these technologies, methodologies, and products might be able to adapt to changes in the evolving information landscape in scholarly communications and to take advantage of new technologies, metadata models, or linking environments to better accomplish the needs of libraries to provide access to resources.

For the full agenda, please visit:

Confirmed speakers include:

  • Opening Keynote: Marshall Breeding, Independent Library Consultant,
  • Scott Bernier, Senior Vice President, Marketing, EBSCO
  • Michael Levine-Clark, Professor / Associate Dean for Scholarly Communication and Collections Services, University of Denver Libraries
  • Gregg Gordon, President & CEO, Social Sciences Research Network (SSRN)
  • Neil Grindley, Head of Resource Discovery, Jisc
  • Steve Guttman, Senior Product Manager, ProQuest
  • Karen Resch McKeown, Director, Product Discovery, Usage and Analytics, Gale | Cengage Learning
  • Jason S. Price, Ph.D., Director of Licensing Operations, SCELC Library Consortium
  • Mike Showalter, Executive Director, End-User Services, OCLC
  • Christine Stohn, Product Manager, ExLibris Group
  • Julie Zhu, Manager, Discovery Service Relations, Marketing, Sales & Design, IEEE
  • Closing Keynote: Peter Murray, Library Technologist and blogger at the Disruptive Library Technology Jester

This event is generously sponsored by: EBSCO, Sage Publications, ExLibris Group, and Elsevier. Thank you!

Early Bird rates until September! The cost to attend the two-day seminar in person for NISO Members (Voting or LSA) is only $250.00; Nonmember: $300.00; and for Students: $150.00. To register, click here.

Please visit the event page for the most up-to-date information on the agenda, speakers and registration information.

For any questions regarding your in-person or virtual attendance at this NISO event, contact Juliana Wood, Educational Programs Manager, via email or phone 301.654.2512.

We hope to see you in Baltimore in the Fall!

Archives Alive!: librarian-faculty collaboration and an alternative to the five-page paper / In the Library, With the Lead Pipe

Image courtesy of Florian Klauer, (CC0 1.0)

In brief: The short research paper is ubiquitous in undergraduate liberal arts education. But is this assignment type an effective way to assess student learning or writing skills? We argue that it rarely is, and instead serves as an artifact maintained out of instructor familiarity with and unnecessary allegiance to timeworn conceptions of “academia.” As an alternative, we detail the Archives Alive! assignment developed by librarians and faculty at the University of Iowa and designed to bring Rhetoric students into contact with archival collections and digital skills. We also discuss how librarians can collaborate with instructors on new assignment models that build meaningful skills for students, highlight library collections, and foster connections on campus and with the broader community.


Anyone who has spent much time working at an academic library reference desk has encountered students scrambling to find sources for research papers they have already written. These students just need to add a few quotes (preferably from 3–5 different sources) in support of their preformed arguments. And quickly, because the paper is due tomorrow…or perhaps in a couple of hours.

Anyone who has taught undergraduate rhetoric, composition, or English has likely slogged through grading those same papers: formulaic phrasing or overwrought syntax; patchwriting if not outright plagiarism; vagaries grounded in hyperbole such as, “Since the dawn of time…”

And of course, anyone who has ever written a five-page paper knows the score. On sites like Yahoo Answers, you can find helpful instructions for how long it takes to write such a paper, how many sources to include, and how to tweak the margins/fonts to require as little actual writing as possible. Whether the page limit is five or fifteen, as an assignment format, the short research paper is pervasive, largely unloved by its participants, and deeply flawed.

We — an instruction librarian and an English Ph.D./rhetorician now a library department head — believe that another world is possible, where assignments can be deeply engaging for both students and instructors. We also believe that librarians can help make this change happen. We recently created the Archives Alive! assignment now used by many sections of a core required course at the University of Iowa (UI). The story of the assignment’s development is a story of risk and collaboration. In particular, it highlights the benefits of librarians approaching instructors to create assignments where students produce work for public audiences, and where student work can contribute to projects beyond their classrooms.

The prevalence and weakness of short research paper assignments

The five-page (or n-page) paper lurks everywhere in academia. Its form privileges quantity, outmuscling quality and utterly preoccupying students with concerns about numbers. Although professors may intend these assignments to facilitate exploratory learning, students often focus on meeting the expectations of their professors.1 Because assignments may be vague on the more qualitative aspects, students often fixate on the concrete word count or required number of references.2 It’s the mass of text that preoccupies these students. Their arguments and the audiences for them are secondary considerations.

This quantification of thought sends the absolute wrong message to students. Good arguments are not necessarily quantifiable. We’re not suggesting that there’s categorically no difference between a one-page paper and a five-pager. Tom has had enterprising or lazy students ask if it would be possible to write a successful one-page paper. It would, but that would be a hell of a paper. And that’s the thing: the impressive, succinct, artful one-page paper is not what we tend to teach students. We tell them that in order to tease out an argument, in order to excavate the multiple facets of the topic being addressed, we need a particular number of paragraphs and pages. In prescribing these numbers, we remain silent on what other possible forms might better serve their arguments. And in doing so, are we adequately modeling our own enthusiasm for the subject being taught? If we love rhetoric or composition, what is it that we love about it? Are we, through our assignments, conveying that love? In a world where the relevant searches for “five-page paper” are expeditious rather than enlightening, we doubt it.

In the run-up to the 2013 annual meeting of the Modern Language Association, Tom and our colleague Matt Gilchrist, a lecturer in the UI Rhetoric Department and Director of Iowa Digital Engagement and Learning (IDEAL), ran a call for papers for a panel titled “Beyond the Essay.” They were interested in what other kinds of assignments instructors were asking their students to undertake in service to their learning. We received a bunch of marvelous proposals that, to use the parlance of educational theory, described hybrid learning models. The panel drew interest from graduate students, non-tenure track lecturers, and the whole gamut of tenure track faculty. People read gardens as texts, mapped local narratives, created marketing campaigns for local non-profits. Tom also received an email from an incredulous think tank member who asked, “Do you really want to give this generation of college students relief from writing college essays?”

Years later, looking back on our work, we think: “yes, yes we do.” And “relief” is a fitting word. Those essays are needlessly stressful, arid work, for both writer and reader. There’s no texture, no hook; nothing animates them. They serve as stock exercises in a form that, as one colleague of ours has noted, is not replicated anywhere outside of academia. And that is where we are sending many (if not most) of these students: beyond the bounds of academia. So what are we preparing them for? And how are we preparing them for it? Short essay assignments can still play a role, but over-reliance on the form does not serve students or faculty best.3

Why is this a librarian problem?

Although most librarians may not assign short research papers, we are often brought in to provide instruction or reference to assist students. As advocates for information literacy, we have a stake in whether these types of assignments help students build the skills we wish them to have. That is: does a short research paper help students learn to do research?

In short, not necessarily. Among the findings of the Citation Project, sophomore students often cite the first page of a work, and rarely cite any source more than once in a paper, suggesting cursory engagement with texts.4 The work of Project Information Literacy around employer satisfaction with recent college graduates further suggests that students don’t always get the skills required in the workplace. As one of their employer-participants explained, “They do well as long as the what, when, why, and how is clear in advance.”5 Although we are wary of focusing on the interests of employers, this statement also raises questions about how students will handle other research and critical thinking outside the classroom. Whether making personal medical decisions, researching local ballot initiatives for an election, or flirting with a potential partner on an online dating site, an inquiry isn’t over simply because you found your three references or reached a word limit.

Unfortunately, librarians often find themselves simply reacting to assignments, rather than advocating for projects that will purposefully build student skills. The chapter titles of the popular One-Shot Library Instruction Survival Guide allude to common problems of faculty non-collaboration: “they never told me this in library school,” “the teaching faculty won’t/don’t…” “but how will I cover everything?”6 Despite lamentations of the one-shot, many librarians cling to it as the only scrap of contact they can get with students in the classroom. Even embedded librarians well-integrated into a course may have very little role in assignment design.7

There may be a gap in perception between faculty and librarians. In an ethnographic study of faculty who were heavy users of library instruction, Manuel et al. found that library advocates sometimes had opposing motivations to librarians, for example showing little interest in lifelong learning or critical thinking as goals for library instruction.8 As one of their informants explicitly stated, they bring students to the library because the research paper is “the basic goal of the course.”9 Faculty may also assume that students will learn research skills simply by doing research, and leave out clear information literacy or research outcomes from assignments.10

The most successful librarian-faculty relationships occur when there are shared goals.11 However, as in the case below, the common ground may not be immediately visible. Nalani Meulemans and Carr describe a program targeting new faculty members, with clear aims to shape their expectations for library instruction.12 Creative thinking on the part of the librarian can help unearth potential for greater collaboration, but it also requires willingness to be flexible and make active suggestions. Combining clearly articulated learning needs with new and interesting library services can lead to fruitful adventures.

What is the point? Developing successful assignment types

In our view, the most successful assignments meet two criteria. First, the assignment type fits the learning outcomes and skills being developed. Although this may seem obvious, we believe that archaic assignment types are often selected by rote. A new graduate instructor or junior faculty member is handed a stock syllabus for an introductory level course and is encouraged to maintain the status quo with respect to assignments because the clock is ticking on time to degree or tenure. The pressure to reach professional objectives outside of teaching becomes a reason to cling to the “tried and true” assignments of the 20th century. Research paper assignments fit some learning outcomes and skills (for example, writing in an academic voice or learning a particular citation style), but certainly not all.

Second, successful assignments are placed within a context broader than the course. Students, like anyone else, shape their work to fit their audience. Although not every quickwrite or draft must be shared broadly, when students understand that their work contributes to a larger project or could be seen by a wider audience, they tend to take it more seriously. At worst, it’s a vanity concern in which students don’t want to look bad; at best, it imbues them with a sense of relevance that extends beyond the bounds of the classroom.

Of course, there are research paper assignments that meet these two criteria. A research paper can be a part of a broader scholarly conversation on a topic, or at least a stepping stone to a student’s contribution to such scholarship. But few of our students will go on to become academics, and so the question arises: what do they get out of this “academic” practice? We found that the answer was little that cannot be replicated in assignment formats more relevant to students’ future professional lives.

Case study: Archives Alive!

So that’s where we were in 2013: tired, bored with our assignments, suspicious that we were not fully delivering the course objectives, and worried that we were merely reinscribing old methods onto our students who were poised to be citizens of the 21st century and needed, badly, to be able to move nimbly amidst its various forms of communication.

Tom: I was a non-tenure track lecturer in the Rhetoric Department, and co-directing a Provost-funded student success initiative with my friend and colleague Matt Gilchrist. Called Iowa Digital Engagement and Learning (IDEAL), the program was designed, in part, to help instructors rethink existing assignments and make them more digitally and publicly-inclined. Our thinking was that students needed to be honing digital composition and public engagement skills. Part of the departmental mission was to train students in writing, public speaking, and research.

Kelly: The library’s crowdsourcing transcription project, DIY History, had huge success with the broader public, thanks in part to a viral post on Reddit. People from all over the world were transcribing digitized archival collections, but the materials weren’t necessarily getting used on campus, let alone in the classroom. My colleague Jen Wolfe approached me to see if I had ideas about departments that might be open to developing something new using the pioneer letters in DIY History. Rhetoric seemed like a natural fit because their assignments often involved analysis of a text, and because I had strong relationships with several of their lead instructors, including both Tom and Matt. It helped that the two of them were known to be open to quirky suggestions, so we asked if they wanted to pilot…something.

Tom: And of course, Matt and I said yes.

Kelly: The first few planning meetings had an open-endedness that was both refreshing and intimidating. For the usual one-shot, it is very rare to have any say in what an assignment looks like, since the syllabus is generally set long before the librarian is asked to come in. I think library instruction is often brought in as the clean-up crew when an early assignment goes wrong — yikes, my students don’t know how to research, please help!

Tom: In my teaching at the time, research remained the last of the skills I introduced to my students. We grappled with reading, writing, public speaking, analysis….and research. And this after-the-fact approach irked me. Research became a sort of window dressing for students rather than the foundation of their work. They were seeking sources to hang on their arguments, rather than building those arguments on the sources they had read and analyzed. For a long time, I had been thinking about ways to better thread these skills together. The letters in DIY History presented a different way to engage students in research. The letters themselves were not necessarily making pre-formed arguments, and I chose not to introduce them with much more context than: let’s look at these intimate writings from other people in history. The approach relieved (or robbed) students of the impulse to tie their arguments to ready-made contexts. Much of the curriculum at the time encouraged instructors to use controversies as a means of getting students to understand the complexity of making an argument, to recognize the myth of argumentative dichotomies, the need to evaluate sources, etc. I was prepared to set that approach aside in favor of simply letting students dig into primary sources that they might find engaging. I asked my students to do the following things:

  1. Transcribe the letter.
  2. Rhetorically analyze its content (why did the letter writers choose the words they did?) in a 400-word blog post.
  3. Historically contextualize the letter (what historical content is present in the letter, or barring that: what was going on globally at the time) in the same blog post for another 400 words. The intent here was to locate this letter in a real moment and possibly juxtapose the local with the global.
  4. Create a two minute or so “Ken Burns” style video that walked the viewer through any aspect of the letter that the student found interesting.
  5. Live present their findings to the class using any visual medium they found appealing (PowerPoint, prezi, etc.) but not simply show their videos.

The intent was to get them conversant in rhetorical analysis, writing, research, public speaking, and digital composition – all in the same assignment (while also helping create a searchable index of these texts for scholars).

Kelly: During the very first pilot, my own assumptions about one-shots limited what we did. Students had already looked at a few of the letters, and seemed really excited about the project. I had prepared a research guide for the assignment, and we used that to navigate to the finding aid for the archival collections the letters came from. It was a total buzzkill! Students were confused by the format, and suddenly felt intimidated by the formality. We moved on to explore historical newspaper collections, and asked students to try to find an article from the date of the letter they were looking at, and their joy started to come back.

Tom: We should point out here that one of the reasons for the return of their joy was reading old newspaper advertisements. Students were intrigued by the fact that people a hundred years earlier advertised and purchased things like hats. Hats became a simple hook to the past.

Kelly: But, it was a good challenge to my assumption: did they really need to understand how to use the finding aid to complete the assignment? No, as much as my archival studies profs would hate to hear it, they really didn’t. The purpose of the assignment was to do a rhetorical  analysis of the letter, with very minor historical context. Some students would come back and use the contextual information later, but it was secondary. Overwhelming students with the arcane form of the finding aid did not serve them well. These weren’t history students, and our goal wasn’t to make them into historians — or even to make them feel like historians.

Tom: Right. We wanted to use the primary source material to foreground the work of rhetorical analysis against the backdrop of historical research. After all, I was expected to be teaching them rhetoric — the art of persuasion. In many cases, analyzing the rhetoric of the letter also required researching contemporary idioms and terminology. I should also point out that the letters fostered remarkable collaboration. Cursive, for example, brought out the cooperative spirit in them. We worked on transcription in class and when students had difficulty reading a word, we would put it to the class to essentially crowdsource an answer about what was written there. Was this scribble an “s” or an “f”? I was impressed by the problem-solving groupthink that possessed them.

Kelly: By the second term we ran the assignment, we had expanded to three sections. That term, students in all sections looked at letters from a single scrapbook collection. This approach had a serendipitous peer-evaluation factor where several students in each section read letters from the same group of half a dozen American men serving in WWII. The students’ curiosity about filling in the gaps in these narratives, or the gaps between the references and words they understood and those they didn’t, led them to connect their work with that of their peers.

Tom: I will admit to being deeply suspicious about using such a small set of letter writers. I thought I’d be hearing the same names and the same stories and views over and over again. I was sure we were running the risk of replicating an assignment along the lines of asking students to weigh in with their views on the drinking age or the legalization of recreational marijuana use. I couldn’t have been more wrong. In class and during their final presentations students questioned one another about their shared letter writers. They asked things like: “When was that letter written?” or “Had he already said this to Evelyn?” as they pieced together a larger narrative.

Clarence Clark letter, May 3, 1941, page 1.


Kelly: Evelyn Birkby, the woman who had donated the scrapbooks, ended up agreeing to do a phone call with one section, which I sat in on. It was truly an experience in rhetoric as these students carefully tried to ask this 94-year-old woman about the nature of her relationships with all these men 70-plus years ago. She later expressed to the curator of the Iowa Women’s Archive that she was delighted to know that her materials were being used, not just sitting in storage somewhere.

Tom: These letters also introduce some content that is more immediately graspable for our students. The soldiers mention films, music, and plays. The students can relate to those things — but they often don’t know the works being referenced. So, boom: there’s a research question. And they love it. Pop culture references, military lingo, idioms all become portals for analysis and with it: research. Tellingly their blog posts (a form that I think produces a more compelling and earnest voice than formal papers which often encourage stilted language and overwrought syntax) improve. They care about what they are writing and about the people writing the letters. As one student commented, “This project taught me that when something interests you, it never really feels like research as much as it feels like learning more about an old friend or uncovering hidden, exciting secrets.” Another student talked about wanting to read the letters of their deceased grandfather as “good bonding experience for us.” And while our students have chafed against the videos, they do admit to enjoying the sense of accomplishment upon seeing their arguments in documentary style. Their presentations are also a delight to watch. They interrupt one another, they go over time with questions, they carry on conversations after class about the letters and Evelyn’s connection to these men. They consider themselves (mild) experts on their letters. And they feel they have contributed to the scholarly enterprise. At the very least, they have transcribed letters for other scholars, making those handwritten texts searchable. I’ll note here that one question I often get when discussing this assignment is: “Isn’t this just student labor?” To which I often reply that nothing is more laborious for students and instructors than the rote five-pager. 
And why adhere to an assignment model that pretends to include students in the experience (that Dewey objective) of scholarly work, when we can use one that actually does?

Kelly: It definitely requires ongoing maintenance. Once a collection is fully transcribed, it can’t be reused for the assignment. It has taken conversations each term with the library staff who really know the collections to identify good fits for the assignment, and then the assignment gets tweaked to fit as well.

Other examples

Lately, we in the UI Libraries have been working on calling attention to little used or little seen collections. We’ve commenced a Collections to Courses initiative that tries to bring the holdings of the Libraries into broader circulation in the classroom. For instance, like our colleagues at Notre Dame University and The University of Pennsylvania, we are identifying and promoting public domain holdings that can be openly remixed by students. And, in turn, we are encouraging students to archive their remixes with the library for future remixing. We’re interested in creating intellectual feedback loops where students create knowledge that will be stored by the Libraries and those works can in turn be used by other scholars (students and faculty alike). We’ve also begun archiving student works produced by the Iowa Narratives Project in our institutional repository, Iowa Research Online. That project asks students to work in groups and create eight-minute podcasts out of an interview with a local citizen. Students must make audio recordings, edit them in the style of, say, StoryCorps or This American Life or RadioLab, take photographs, and write a brief paragraph of context for the interview. In our experience, students often compose essays in one take. It’s four in the morning, they’ve just tumbled a bunch of text onto the page and…damn, it’s perfect (in their exhausted eyes). By contrast, no students edit like the students asked to make an audio recording of themselves. We find that students do not readily edit their own writing in the same way they do their multimedia. Students making audio recordings of their own voices, for example, will do multiple takes without any prompting — they know what sounds good. So what if we used assignments that highlight editing of multimedia as a gateway for helping them understand why and how to edit writing?

The recipe (we think) for librarians to propose this kind of change

For librarians interested in pursuing this kind of pedagogical change with instructors, we have some suggestions for successful collaborations.

  1. Strategize. Consider your target. Are there faculty/instructors who are known to be willing to experiment? Folks who are big advocates for the library? A course whose instructors are particularly grateful for help from instruction librarians? Or perhaps there are courses whose regular assignments produce groans every term. At the University of Iowa, all students are required to take a course titled “Rhetoric,” which is meant to introduce students to the art of persuasion. In that course, many instructors, students, and librarians alike lamented the long-standing paper-about-a-controversy. Those lamentations were an invitation for new ideas. By targeting shared frustrations and overlapping objectives, instructors and librarians were able to jointly remake the assignment in a way that better achieved their goals. If you can think of projects that both advance the library’s goals and instructor and student need, you’re likely to have a better chance of lasting success. Archives Alive! helped promote our digital collections in the classroom while hitting multiple course objectives tied to Rhetoric.
  2. Advocate. Consider the possible motivations of the people you approach. Will they see this as the solution to a perennial problem? an innovative feather in their teaching cap? a hassle this late in the term? an opportunity to give back to the library? However you package your suggestion, be clear about your intended role in the project. Meulemans and Carr recommend practicing answers to hard questions from faculty, so you are prepared to stand up for yourself in the moment.13 If you’re afraid of a tough interaction, roleplay with colleagues who might have helpful feedback.
  3. Work backwards from your objective. If you’re going to rethink an assignment, think first about what it is supposed to do. Not along the lines of “it’s supposed to generate a paper,” but rather along the lines of “what do you want your students to be able to do?” If you want your students to become better researchers, think about what that means to you. What is a better researcher able to do? Once you have a sense of what it is you’d like as an end product (in terms of skills), work backwards towards the assignment prompt.14 Ask yourself what steps the student will need to take to wind up at the desired end point. In the case of Archives Alive! we wanted to arrive at a live presentation on a topic of interest to the student that had been reasonably researched. And of course, “of interest” and “reasonably researched” don’t make that endpoint particularly easy to attain. Working backwards also better allows you to anticipate the time needed to work through each step in the assignment process. Unlike simply assigning a paper with draft and final due dates, our assignment included due dates for component parts of the assignment. This approach helped students lay the foundation for their eventual live presentation by completing one part of the assignment at a time.
  4. Be honest. When you ask students to undertake new assignment models, be honest with them. Tell them this hasn’t been done before. Acknowledge that there will be bumps in the road. And tell them that they will be your troubleshooters. As they walk through the assignment, the problems they encounter will help the next semester’s students. This goes for interacting with faculty, too. There are costs associated with implementing new assignment forms; they take up time both in and out of class. So remain flexible when navigating a faculty member’s approach to the project, and find ways to be generous with your own time and resources.
  5. Promote. Once your students have crafted these engaging, enlightening, and entertaining works, share them. Get them out of the classroom to present in a more public setting. At Iowa, we have had tremendous success getting classes to share their work in our Learning Commons within the Main Library, an open space that gets a lot of foot traffic. And to the extent that the works are digital, circulate them on the internet. Call attention to your hard work and that of your students, by inviting faculty and administrators to come listen to your students’ presentations. Celebrate their effort by trusting that it is something the public will find interesting.
  6. Take risks. Let go of your assumptions of what library instruction means. For Archives Alive!, we went into it without really knowing what the assignment would look like. It took a lot of conversation to clarify the goals of the instructors and of the librarians, and to brainstorm about how to get all those goals met. For students to interact fully with the documents, we had to let them focus on deciphering the cursive, and let the finding aid wait for another time.
  7. Reflect and repeat. Examine how things went, make adjustments, and try it again. Whether or not you can reuse the assignment as developed, it has certainly taught you something, and hopefully broadened your network of connections on campus. Both of us have developed a reputation for willingness to experiment, which draws otherwise unexpected opportunities.


Ironically, the Archives Alive! assignment helped us bury the myth that the Rhetoric course was where University of Iowa students learn all their research skills. By intentionally designing an assignment where students engaged with primary source materials, we uncovered necessary scaffolding that was otherwise being left out. We also got students to better understand research as an engaging and ongoing endeavor rather than a set number of citations. This experience has given Kelly more confidence to set limitations with faculty who expect a whirlwind one-shot to solve all research woes.

It has also opened up collaborations within the library, as the folks who work with digital collections, special collections and archives have to communicate and brainstorm. And this partnership isn’t dependent on personal relationships: a host of collaborations have continued although both Kelly and Jen Wolfe, the other librarian involved at the start of the project, have left the University of Iowa. This work also led Tom into the library, where he now heads the Digital Scholarship & Publishing Studio.

Each of us has also had misfires in suggesting new projects: assignment designs that bombed, instructors who balked at making changes. However, the process of proposing and brainstorming remains a necessary one. At its root, education is about curiosity and the experience of seeking out answers to our questions. For us, asking questions of our assignments and looking for new, innovative ways to shape the curriculum has been incredibly rewarding — and brought with it some much needed relief from the five-page paper and its host of dated, restrictive, and staid trappings. We encourage you to usher in a similar sense of curiosity and relief as you and your students explore what new forms the 21st century has to offer.

Many thanks to In the Library with the Lead Pipe for inviting us to publish with them and for their wonderful guidance and support. We would like to particularly thank our publishing editor Ellie Collier, our internal reviewer Annie Pho, and our external reviewer Kate Rubick. Your feedback and suggestions were indispensable. 

Works Cited

Bowers, Cecilia V. McInnis, Byron Chew, Michael R. Bowers, Charlotte E. Ford, Caroline Smith, and Christopher Herrington. “Interdisciplinary Synergy: A Partnership Between Business and Library Faculty and Its Effects on Students’ Information Literacy.” Journal of Business & Finance Librarianship 14, no. 2 (June 2009): 110–27.

Buchanan, Heidi E., and Beth A. McDonough. The One-Shot Library Instruction Survival Guide. (2014).

Head, Alison J., Michele Van Hoeck, Jordan Eschler, and Sean Fullerton. “What Information Competencies Matter in Today’s Workplace?” Library and Information Research 37, no. 114 (2013): 74–104.

Head, Alison J, and Michael B Eisenberg. “Assigning Inquiry: How Handouts for Research Assignments Guide Today’s College Students.” Available at SSRN 2281494, 2010.

Jamieson, Sandra. “Reading and Engaging Sources: What Students’ Use of Sources Reveals About Advanced Reading Skills.” Across the Disciplines 10, no. 4 (2013).

Manuel, Kate, Susan E Beck, and Molly Molloy. “An Ethnographic Study of Attitudes Influencing Faculty Collaboration in Library Instruction.” The Reference Librarian 43, no. 89–90 (2005): 139–61.

McGuinness, Claire. “What Faculty Think–Exploring the Barriers to Information Literacy Development in Undergraduate Education.” The Journal of Academic Librarianship 32, no. 6 (November 2006): 573–82. doi:10.1016/j.acalib.2006.06.002.

Nalani Meulemans, Yvonne, and Allison Carr. “Not at Your Service: Building Genuine Faculty-Librarian Partnerships.” Reference Services Review 41, no. 1 (2013): 80–90.

Wiggins, Grant P. & McTighe, J. (1998). Chapter 1: What is backward design? Understanding by Design. Alexandria, Virginia: Association for Supervision and Curriculum Development. Retrieved from

Valentine, Barbara. “The Legitimate Effort in Research Papers: Student Commitment versus Faculty Expectations.” The Journal of Academic Librarianship 27, no. 2 (2001): 107–15.

  1. Valentine, “The Legitimate Effort in Research Papers: Student Commitment versus Faculty Expectations.”
  2. Project Information Literacy found, for example, that ⅔ of the assignment handouts in their sample required some particular type of structure, and over half had a required number of citations. Head and Eisenberg, “Assigning Inquiry: How Handouts for Research Assignments Guide Today’s College Students,” 8.
  3. A 2010 Project Information Literacy study on research assignment handouts found that 83% of the undergraduate research assignments in their study pool were plain old research papers.
  4. Jamieson, “Reading and Engaging Sources: What Students’ Use of Sources Reveals About Advanced Reading Skills.”
  5. Head et al., “What Information Competencies Matter in Today’s Workplace?” 86.
  6. Buchanan and McDonough, The One-Shot Library Instruction Survival Guide.
  7. See for example Gaspar and Wetzel, “A Case Study in Collaboration: Assessing Academic Librarian/faculty Partnerships,” 586.
  8. Manuel, Beck, and Molloy, “An Ethnographic Study of Attitudes Influencing Faculty Collaboration in Library Instruction,” 47.
  9. Ibid, 45.
  10. McGuinness, “What Faculty Think — Exploring the Barriers to Information Literacy Development in Undergraduate Education.”
  11. Bowers et al., “Interdisciplinary Synergy: A Partnership Between Business and Library Faculty and Its Effects on Students’ Information Literacy,” 113.
  12. Nalani Meulemans and Carr, “Not at Your Service: Building Genuine Faculty-Librarian Partnerships.”
  13. Ibid, 88.
  14. Wiggins, Grant P. & McTighe, J. (1998). Chapter 1: What is backward design? Understanding by Design. Alexandria, Virginia: Association for Supervision and Curriculum Development. Retrieved from

DuraSpace Selects Gunter Media Group, Inc. as a Registered Service Provider for VIVO / DuraSpace News

Winchester, MA: Gunter Media Group, Inc., an executive management consulting firm that helps libraries, publishers and companies leverage key operational, technical, business and human assets, has become a DuraSpace Registered Service Provider (RSP) for the VIVO Project. Gunter Media Group, Inc. will provide VIVO-related services such as strategic evaluation, project management, installation, search engine optimization and integration for institutions looking to join the VIVO network.

CyanogenMod / William Denton

I installed CyanogenMod on my three-year-old Samsung Galaxy S III phone. It was easier than I’d expected, and it’s like having a new phone. Here are a few notes.

These S3s were very popular, of course, and with good reason. They’re good. This one has served me well for three years. I bought it after staying up all night for In Fear We Trust at the 2012 Nuit Blanche—I remember grabbing some zeds at 8 am, waking up at 10, looking at the newspaper and seeing a big ad from Rogers, my phone company, saying I could upgrade for a penny if I committed to a three-year contract (which they can no longer do). I needed a new phone—I’d cracked the screen on the one I had, and slivers were starting to come off in my fingers—and I thought, “Heck, I’ve already stayed up all night, I might as well get a new phone. This weekend, anything can happen!” I might have been a little dazed, but it was still a good decision.

“Mind the gap”

There are an astounding number of videos on YouTube of people showing how to install CyanogenMod on phones, and they all mention links in the notes or on their web site where everything is all explained. I watched a few to see what the general process was like, but the specifics didn’t help. (Furthermore, I can’t make head or tail out of, and downloading binaries made by some anonymous user on a message board is weird and unsettling.) For the specifics, all I really needed were these two pages: the official instructions and one with extra details about how to do it with TWRP (a recovery boot manager thing).

Before all that, I’d done a few things:

  • backed up what I could
  • made a list of all the apps installed
  • got all my photos and such off
  • made sure developer mode was enabled
  • installed adb and heimdall (sudo apt-get install android-tools-adb heimdall-flash on my Ubuntu machine)
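Before starting the flash itself, it can be worth confirming that both tools from that last step are actually on the PATH. What follows is only a minimal sketch, not part of the official instructions: the helper name `missing_tools` is made up for illustration, and it assumes the packages install executables named `adb` and `heimdall`.

```python
import shutil


def missing_tools(tools=("adb", "heimdall")):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper: the default names assume the executables
    installed by android-tools-adb and heimdall-flash.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("adb and heimdall are both available.")
```

An empty list means both commands were found; anything else names what still needs installing.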

Then following the instructions linked above worked very nicely (except I had to run heimdall as root). I used the d2can TWRP image, because that’s what I saw in the prompt when I ran adb shell to log into my phone when it was connected by USB. I got the latest versions of CyanogenMod and the Google apps package and they went in a treat.

The only problem I had was rebooting into the TWRP recovery mode after installing it. I was pressing Volume Up + Home + Power, as required, but either not long enough, or too long, and it took a few tries to realize when to let go of the buttons.


After the install I rebooted (the first time was slow) and all was well. I hadn’t bricked it and I could make a phone call. Two-factor authentication made reconnecting my accounts take a little while, but that’s no problem. Now I’m reinstalling all my favourite apps (like Tasker) and configuring things the way I like. I think I liked a few things about the S3 a little more, like the way text messages looked, but generally everything is an improvement, and I’m very happy to be rid of the useless apps Rogers and Samsung forced on me.

The home screen can handle more icons and widgets, but I had to get rid of the Google search box as soon as possible.

Overall the phone seems to be running faster, but it may also be using more battery, or that may just be the new operating system getting used to my aged old worn-down system. I’ll see. Either way, it’s like having a new phone. Rogers only had me on Android 4.1.1, but now I’ve got Android 4.4.1, and when there’s a CyanogenMod version of Android 5 for my phone, I’ll install it.

My #tableflip story / Coral Sheldon-Hess

This is my tableflip story, at long last. It’s relevant to another post I will make later this week, but there’s too much here for it all to fit cleanly into that one. And I think I did promise to write one.

Maybe techbros will look at it and say “no big loss; they’re a junior developer, anyway.” That’s fine. I look at it and say “no big loss; I’ll be my own boss and be happy doing work that makes people’s lives better.” Because, while I admit that there are things I want to learn that are going to be harder to pick up outside of a Real Tech Job™, my health and mental well-being have to come first. It’s clear they won’t, in tech.

It all started last summer

When I went looking for a tech job outside of libraries, I realized I was at a disadvantage, due to my nontraditional background (a CS minor over a decade ago is hardly enough to get me through the standard algorithm-, whiteboarding-, and vocab-heavy tech interview), so I wouldn’t get to be super picky. Still, I knew enough to ask lots of questions about the company’s culture; I read enough Model View Culture that I went in with eyes open.

When I got an offer from That Tech Job I Had, I thought I’d lucked out, finding a place that said so many of the right things about gender diversity; plus, they had lots of female interns and hosted the local women-in-tech nights (both are Official Good Signs™). More importantly, it was super collaborative, and the other developers were really open and cool about teaching/learning/asking questions/answering questions. Being a newer developer, I knew I had a lot to learn, so that last bit was key, for me. Finally, when I was visiting, it was clear that people were leaving at 5-6pm, and there was a lot of life in their work-life balance; as someone with a chronic illness, for whom even 40 hours/week can be pretty taxing, this was attractive to me.

There were some red flags, admittedly. I noticed that someone in a wheelchair would be unable to access the office (the key reader was too high), that management was all male (except for the office manager who orders food and does HR work and, no joke, takes notes at meetings), that only one developer was older than me (the nicest guy! I almost didn’t leave due to his helpfulness alone, no joke), that the final stage of the interview required 2-3 days of unpaid labor (seriously), that it employed an open office setup (tldr: high illness rate, bad for attention), that hiring-related HR functions were mostly handled by an intern (seriously), and that most of the women in the office were interns or on teams other than development. It was also a “nonprofit startup,” which is two axes for potentially justifying mistreating people, right there. So, again: eyes open. I knew they had work to do to be properly inclusive, but they seemed invested in doing the work; it was in their strategic plan!

Sadly, as I pointed out in my exit email (yes, I tried to help fix the culture even as I left), they failed badly at living up to their ideals of openness and inclusivity, at that point in time.*

Ironically, this organization that had such high ideals ended up being super oppressive because of one of its co-founders. He was unaware of his own biases (which is extra sad-funny because his partner writes peer-reviewed papers about bias), and he constantly gaslighted his employees, telling them that they were misremembering conversations, when it was he who had changed his mind. Multiple people warned me about his “bad memory,” and one person confessed they’d taken to recording conversations with this guy, just to play back for themselves, to prove they weren’t crazy; they never played it back for him, because he’d have reacted very, very badly.

All that was frustrating, yes, and I shouldn’t downplay how much heartache and self-questioning it caused me. Simultaneously, there was a move away from the expectation of reasonable hours, which was also not great. And there was also this whole debacle with that co-founder shifting me off a project right as the crunch time ended and the opportunity for learning began; also, curiously, right as it was a success. But…

The thing that ultimately led to my quitting was this co-founder’s policy that some new developers (disproportionately those who were not white or Asian men in their 20s) were to be isolated from the other developers and given tasks that they could only ask him for help on; they were not allowed to talk to the rest of the team about work. Further, he would not ever answer questions, only give hints. He claimed it was for our own good, that the only way to “really learn” was alone, that he was helping us, and no amount of protesting would convince him he was wrong. (<snark>This was true of most things, really.</snark>)

Speaking in general, this tactic was incredibly harmful, not just because it was applied unevenly; it also sent the clear message that those of us undergoing this treatment were unworthy of proper training and undeserving of the other developers’ time (a message some of my peers tried hard to dispel, by offering to help on the sly). This guy could not have activated our impostor syndrome and stereotype threat harder if he’d tried.

For my part, since I’d taken the job in part for the collaborative environment, with no idea that this weird rite of passage was coming, I was appalled; since I know more than nothing about the pedagogy of learning to code, I was horrified; and since it didn’t happen until I’d been working there for two months, completed a hugely successful project, and chosen my own tickets to solve for several weeks (and solved them successfully!), I was taken completely aback. The main task he assigned was also completely unreasonable, both beyond my skill level and with an absurdly short deadline. I did successfully complete the task (plus a second, more reasonable one, assigned at the same time), but it took me five times longer than he’d allowed for, and he made a big show of shaming me about it in front of two new coworkers on a Friday afternoon at 4:30 and making the office manager take notes so it could “go in [my] file.” To be clear, this task was not part of the critical path; in fact, that code was going to be ripped out and replaced within the month.

It was pretty obvious that this was the start of a paper trail leading toward them firing me. (It isn’t relevant, but he added some untrue and unverifiable things to those notes, to make them look extra damning.) I can’t know for sure why, but my best guess: I had complained when the work week got beyond about 60 hours for 3-4 weeks in a row, leading up to the completion of that project I mentioned above; I never specifically said that I couldn’t handle it because I had a chronic illness—ironically, because I was afraid he’d want to fire me.

So I left, because the constant gaslighting, isolation, and apparent efforts to discredit me were tanking my confidence, because I didn’t see a way to circumvent the co-founder if he wanted me gone, and because I wasn’t willing to stay and let this man destroy me.

Within four months of starting, I was gone, their second female developer to quit in less than two years of existence.

(╯°□°)╯︵ ┻━┻

To recap: I found an organization that was so promising and passed most of the tests I knew to give it and showed a pretty good understanding of at least basic diversity issues; and it still pretty much destroyed my confidence as a developer. I was useless for months after that job ended. In talking to a few other people who left, I don’t appear to have been the only person whose confidence was damaged. We’re all bouncing back, though.

And I guess, strictly speaking, I’m not leaving tech. I’m leaving other people’s tech organizations. I’ll do tech, alone and with teams I choose to work with (including any of my peer-level coworkers from That Place, if you’re reading this!). For now, that looks like building websites, mostly in WordPress. As I build skills, I hope to grow into more development-heavy contracting. Maybe it’ll take me longer to get there, but I will be amazing. I’m not letting anybody else’s egotistic elitism drown me, ever again.

* Reportedly, they’re getting better now. For instance, one thing I pointed out seems to be improving: they had previously only kept one female intern as a full employee, ever, when dude-interns got jobs almost as a matter of course. This last batch was the reverse. Also, they’re building in a layer of middle management (all white men, unfortunately), which may serve to protect the developers from the toxic co-founder, whose work takes him out of the office more and more often. My friends who stayed seem hopeful that it is improving, and I sincerely hope they’re right. My peers and the office manager were/are all delightful people, and I wish them the best.

iPads in the Library / LITA

Charging cart filled with iPads

Getting Started/Setting Things Up

Several years ago we added twenty iPad 2s to use in our children’s and teen programming. They have a variety of apps on them ranging from early literacy and math apps to GarageBand and iMovie to Minecraft and Clash of Clans*. Ten of the iPads are geared towards younger kids and ten are slanted towards teen interests.

Not surprisingly, the iPads were very popular when we first acquired them. We treated app selection as an extension of our collection development policy. Both the Children’s and Adult Services departments have a staff iPad they can use to try out apps before adding them to the programming iPads.

We bought a cart from Spectrum Industries (a WI-based company; we also have several laptop carts from them) so that we had a place to house and charge the devices. The cart has space for forty iPads/tablets total. We use an Apple MacBook and the Configurator app to handle updating the iPads and adding content to them. We created a Volume Purchase Program account in order to buy multiple copies of apps and then get reimbursed for taxes after the fact. The VPP does not allow for tax exempt status but the process of receiving refunds is pretty seamless.

The back of our iPad cart showing power plugs and USB ports.

The only ‘bothersome’ part of updating the iPads is switching the cable from the power plug to the USB ports (see above) and then making sure that all the iPads have their power cables plugged firmly into them to make a solid connection. Once I’d done it a few times it became less awkward. The MacBook needs to be plugged into the wall or it won’t have enough power for the iPads. It also works best running on an ethernet connection versus WiFi for downloading content.

It takes a little effort to set up the Configurator** but once you have it done, all you need to do is plug the USB into the MacBook, launch the Configurator, and the iPads get updated in about ten to fifteen minutes even if there’s an iOS update.

Maintaining the Service/Adjusting to Our Changing Environment

Everything was great. Patrons loved the iPads. They were easy to maintain. They were getting used.

Then the school district got a grant and gave every student, K-12, their own iPad.

They rolled them out starting with the high school students and eventually down through the Kindergartners. The iPads are the students’ responsibility. They use them for homework and note-taking. Starting in third grade they get to take them home over the summer.

Suddenly our iPads weren’t so interesting any more. Not only that, but our computer usage plummeted. Now that our students had their own Internet-capable device they didn’t need our computers any more. They do need our WiFi and not surprisingly those numbers went up.

There are restrictions for the students. For example, younger students can’t put games on their iPads. And while older students have fewer restrictions, they don’t tend to put pay apps on their iPads. That means we have things on our iPads that the students couldn’t or didn’t have.

I started meeting with the person at the school district in charge of the program a couple times a year. We talk about technology we’re implementing at our respective workplaces and figure out what we can do to supplement and help each other. I’ll unpack this in a future post and talk about creating local technology partnerships.

Recently I formed a technology committee consisting of staff from every department in the library. One of the things we’ll be addressing is the iPads. We want to make sure that they’re being used. Also, it won’t be long before they are out of date, and we’ll have to decide whether to replace them and whether to recycle the old devices or repurpose them (as OPACs, potentially?).

We don’t circulate iPads but I’d certainly be open to that idea. How many of you have iPads/tablets in your library? What hurdles have you faced?

* This is a list of what apps are on the iPads as of August 2015. Pay apps are marked with a $:

  • Children’s iPads (10): ABC Alphabet Phonics, Air Hockey Gold, Bub – Wider, Bunny Fun $, Cliffed: Norm’s World XL, Dizzypad HD, Don’t Let the Pigeon Run This App! $, Easy-Bake Treats, eliasMATCH $, Escape – Norm’s World XL, Fairway Solitaire HD, Fashion Math, Go Away, Big Green Monster! $, Hickory Dickory Dock, Jetpack Joyride, Make It Pop $, Mango Languages, Minecraft – Pocket Edition $, Moo, Baa, La La La! $, My Little Pony: Twilight Sparkle, Teacher for a Day $, NFL Kicker 13, Offroad Legends Sahara, OverDrive, PewPew, PITFALL!, PopOut! The Tale of Peter Rabbit! $, Punch Quest, Skee-Ball HD Free, Sound Shaker $, Spot the Dot $, The Cat in the Hat – Dr. Seuss $, Waterslide Express
  • Teen iPads (10): Air Hockey Gold, Bad Flapping Dragon, Bub – Wider, Can You Escape, Clash of Clans, Cliffed: Norm’s World XL, Codea $, Cut the Rope Free, Despicable Me: Minion Rush, Dizzypad HD, Easy-Bake Treats, Escape – Norm’s World XL, Fairway Solitaire HD, Fashion Math, Fruit Ninja Free, GarageBand $, iMovie $, Jetpack Joyride, Mango Languages, Minecraft – Pocket Edition $, NFL Kicker 13, Ninja Saga, Offroad Legends Sahara, OverDrive, PewPew, PITFALL!, Punch Quest, Restaurant Town, Skee-Ball HD Free, Stupid Zombies Free, Temple Run, Waterslide Express, Zombies vs. Ninja

** It’s complicated but worth spelling out so I’m working on a follow-up post to explain the process of creating a VPP account and getting the Configurator set up the way you want it.

Schema.org – Extending Benefits / Richard Wallis

I find myself in New York for the day on my way back from the excellent Smart Data 2015 Conference in San Jose. It’s a long story about red-eye flights and significant weekend savings which I won’t bore you with, but it did result in some great chill-out time in Central Park to reflect on the week.

In its long auspicious history the SemTech, Semantic Tech & Business, and now Smart Data Conference has always attracted a good cross section of the best and brightest in Semantic Web, Linked Data, Web, and associated worlds. This year was no different for me in my new role as an independent working with OCLC and at Google.

I was there on behalf of OCLC to review significant developments with Schema.org in general – now with 640 Types (Classes) & 988 properties – used on over 10 million web sites. Plus the pioneering efforts OCLC are engaged with, publishing data in volume from and via APIs in their products. Check out my slides:

By mining the 300+ million records in WorldCat to identify, describe, and publish approx. 200 million Work entity descriptions, and [soon to be shared] 90+ million Person entity descriptions, this pioneering continues.

These are not only significant steps forward for the bibliographic sector, but a great example of a pattern to be followed by most sectors:

  • Identify the entities in your data
  • Describe them well using Schema.org
  • Publish embedded in HTML
  • Work with, do not try to replace, the domain specific vocabularies – Bibframe in the library world
  • Work with the community to extend and enhance Schema.org to enable better representation of your resources
  • If Schema.org is still not broad enough for you, build an extension to it that solves your problems whilst still maintaining the significant benefits of sharing using Schema.org – in the library world’s case this was
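As a rough illustration of the first three steps in that pattern (identify an entity, describe it, publish it embedded in HTML), here is a minimal Python sketch. It assumes the shared vocabulary is Schema.org and picks JSON-LD as the embedding syntax, which is only one of several options. The function names `book_jsonld` and `embed_in_html` and the sample values are hypothetical, and this is not how OCLC generates its data.

```python
import json


def book_jsonld(title, author_name, isbn=None):
    """Describe a book and its author as linked entities.

    Assumes the Schema.org types Book and Person and the
    properties name, author and isbn.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author_name},
    }
    if isbn:
        data["isbn"] = isbn
    return data


def embed_in_html(data):
    """Wrap the description in a script element for an HTML page."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


if __name__ == "__main__":
    # Placeholder values, not real catalogue data.
    print(embed_in_html(book_jsonld("Example Title", "Example Author")))
```

The later steps in the list (working alongside domain vocabularies and building extensions) are community processes rather than code, so they are not sketched here.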

This has not been an overnight operation for OCLC. If you would like to read more about it, I can recommend the recently published Library Linked Data in the Cloud – Godby, Wang, Mixter.

Through OCLC and now Google I have been working with and around Schema.org since 2012. The presentation at Smart Data arrived at an opportune time to introduce and share some major developments with the vocabulary and the communities that surround it.

A couple of months back, Version 2.0 of Schema.org introduced the potential for extensions to the vocabulary. With Version 2.1, released the week before the conference, this potential became a reality with the introduction of and

On a personal note the launch of these extensions, in particular, is the culmination of a bit of a journey that started a couple of years ago with the forming of the Schema Bib Extend W3C Community Group (SchemaBibEx) which had great success in proposing additions and changes to the core vocabulary.

A journey that then took in the formation of the extension vocabulary which demonstrated both how to build a domain focused vocabulary on top of Schema.org as well as how the open source software, that powers the site, could be forked for such an effort. These two laid the groundwork for defining how hosted and external extensions will operate, and for SchemaBibEx to be one of the first groups to propose a hosted extension.

Finally this last month working at Google with Dan Brickley on Schema.org has been a bit of a blur as I brushed up my Python skills to turn the potential in version 2.0 into the reality of fully integrated and operational extensions in version 2.1. And to get it all done in time to talk about at Smart Data was the icing on the cake.

Of course things are not stopping there. On the not too distant horizon are:

  • The final acceptance of & – currently they are in final review.
  • SchemaBibEx can now follow up this initial version of with items from its backlog.
  • New extension proposals are already in the works such as:,,
  • More work on the software to improve the navigation and helpfulness of the site for those looking to understand and adopt Schema.org and/or the extensions.
  • The checking of the capability for the software to host external extensions without too much effort.
  • And of course the continuing list of proposals and fixes for the core vocabulary and the site itself.

I believe we are on the cusp of a significant step forward for Schema.org as it becomes ubiquitous across the web; more organisations, encouraged by extensions, prepare to publish their data; and the SEO community recognise proof of it actually working – but more of that in the next post.

Global Open Data Index 2015 is open for submissions / Open Knowledge Foundation

The Global Open Data Index measures and benchmarks the openness of government data around the world, and then presents this information in a way that is easy to understand and easy to use. Each year the open data community and Open Knowledge produce an annual ranking of countries, peer reviewed by our network of local open data experts. The Index was launched in 2012 as a tool to track the state of open data around the world. More and more governments were beginning to set up open data portals and make commitments to release open government data, and we wanted to know whether those commitments were really translating into release of actual data.

The Index focuses on 15 key datasets that are essential for transparency and accountability (such as election results and government spending data), and those vital for providing critical services to citizens (such as maps and water quality). Today, we are pleased to announce that we are collecting submissions for the 2015 Index!

The Global Open Data Index tracks whether this data is actually released in a way that is accessible to citizens, media and civil society, and is unique in that it crowdsources its survey results from the global open data community. Crowdsourcing this data provides a tool for communities around the world to learn more about the open data available in their respective countries, and ensures that the results reflect the experience of civil society in finding open information, rather than accepting government claims of openness. Furthermore, the Global Open Data Index is not only a benchmarking tool, it also plays a foundational role in sustaining the open government data community around the world. If, for example, the government of a country does publish a dataset, but this is not clear to the public and it cannot be found through a simple search, then the data can easily be overlooked. Governments and open data practitioners can review the Index results to locate the data, see how accessible the data appears to citizens, and, in the case that improvements are necessary, advocate for making the data truly open.



Methodology and Dataset Updates

After four years of leading this global civil society assessment of the state of open data around the world, we have learned a few things and have updated both the datasets we are evaluating and the methodology of the Index itself to reflect these learnings! One of the major changes has been to run a massive consultation of the open data community to determine the datasets that we should be tracking. As a result of this consultation, we have added five datasets to the 2015 Index. This year, in addition to the ten datasets we evaluated last year, we will also be evaluating the release of water quality data, procurement data, health performance data, weather data and land ownership data. If you are interested in learning more about the consultation and its results, you can read more on our blog!

How can I contribute?

2015 Index contributions open today! We have done our best to make contributing to the Index as easy as possible. Check out the contribution tutorial in English and Spanish, ask questions in the discussion forum, reach out on Twitter (#GODI15) or speak to one of our 10 regional community leads! There are countless ways to get help so please do not hesitate to ask! We would love for you to be involved. Follow #GODI15 on Twitter for more updates.

Important Dates

The Index team is hitting the road! We will be talking to people about the Index at the African Open Data Conference in Tanzania next week and will also be running Index sessions at both AbreLATAM and ConDatos in two weeks! Mor and Katelyn will be on the ground so please feel free to reach out!

Contributions will be open from August 25th, 2015 through September 20th, 2015. After the 20th of September we will begin the arduous peer review process! If you are interested in getting involved in the review, please do not hesitate to contact us. Finally, we will be launching the final version of the 2015 Global Open Data Index Ranking at the OGP Summit in Mexico in late October! This will be your opportunity to talk to us about the results and what that means in terms of the national action plans and commitments that governments are making! We are looking forward to a lively discussion!

Only 15 tickets left for Hydra Connect 2015 / Hydra Project

Four weeks to go! Yes, Hydra Connect 2015 is just four weeks away. The Connect 2015 wiki page has full details of the program and other aspects of the event. As I write this there are only 15 tickets left so, if you haven’t booked already, you really ought to do so very soon! All our discounted hotel rooms are sold out, but apparently the discount travel sites can still find you a good deal.