Planet Code4Lib

Dine-Arounds / Access Conference

Thursday night, hang out with friends and try some of the best dining (and brewing!) Fredericton has to offer with Dine-Arounds. Sign-up sheets will be available at the registration desk on Wednesday and Thursday (Oct. 5 and 6), but you can get a sneak peek at the dining selection now. In addition to our dine-around reservations, you’ll find recommendations and links to menus for a great selection of restaurants, pubs, and bistros to sample from during your visit to Fredericton!

A scholar’s pool of tears, Part 1 / Karen G. Schneider

This is Part 1 of the origin story of the following scholarly article. In this blog post I review how this article was produced and accepted for publication, and why I chose a non-OA journal.

Schneider, K.G. (in press). To Be Real: Antecedents and Consequences of Sexual Identity Disclosure by Academic Library Directors, The Journal of Academic Librarianship, Available online 13 August 2016, ISSN 0099-1333,

Chapter 1: Somehow, I write this thing

To be Real is a heavily remastered version of the qualifying paper I wrote for the doctoral program I’m in. This article was a hurdle I had to pass on the way to becoming a doctoral candidate in a program, now ended, for organizational leadership in libraries (officially called Managerial Leadership in the Information Professions). This program has produced PhDs at the leadership level now working in executive roles in dozens of library organizations, and I look forward to donning the tam that will invest me in their ranks.

To be Real was just one hurdle before I could begin working on my dissertation (now in progress). Some of you are now itching to tell me that “the best dissertation is a done dissertation,” or that “the most important page is the one with the signatures on it.” O.k., consider it said. Also insert any joke you like about the quality of dissertations; I’ve heard it. In the end, I still need to produce a redoubtable piece of original scholarship that meets the expectations of my program and my committee.  Now let’s move on.

There were other milestones in the program. I needed to stump through two years of classes, including six residential intensives in Boston or other East Coast locations; a heavy reading schedule; coursework, also known as endless hours flailing at a keyboard; a $500 moving violation incurred when I was headed to SFO to fly to Boston for my second semester and wearily zombied through a right turn without stopping; about 30 red Sharpie Ultra Fine Point markers (aka RPOD, or Red Pens of Death); and my “comps,” which were two four-hour closed-book exams requiring copious quantities of memorization, a feat at any age, no comment on what that meant for me.

What has kept me going is a mixture of pride, stubbornness, encouragement from others, good executive skills, and a keen interest in the topic. I have also benefited from the advantage of what is known in life course theory as the principle of time and place. (Note how I can no longer just say “I had lucky timing.” Hopefully, with a good intervention team, I can be deprogrammed post-dissertation.)

To be Real, known as the “680” (for its course number), was not the first or the second, but my third attempt at producing scholarly research on the path to my doctorate. The first two efforts were technically solid, featuring all the structural elements of a good research paper. But the more I learned, the more I felt they were topically dubious, and I issued cease-and-desists after they made it through the IRB process.

Yes, I picked the topics, then watched myself outgrow them, which was a good process in itself. It was hard to wave goodbye to the earlier projects, but the value of earning an MFA in writing is that I don’t feel bad about discarding weak work. “Reduce, reuse, recycle” is my battle cry.

Once my committee accepted To be Real, I began developing my doctoral topic, which builds on the work in this paper but goes into bold new areas–or so I comfort myself when I am spending lovely early-autumn weekend days analyzing 900 minutes of interviews and drafting chapters. I defended my topic to my advisors, then squired my dissertation proposal through institutional review, and kaboom! I was finally ABD.

At several key points in my proposal, I cite To be Real, which was gathering metaphorical dust in a metaphorical drawer in my real-world office. Rather than have my dissertation lean at key points on an unpublished paper, my ever-patient dissertation advisor suggested that I actually try publishing To be Real. Frankly, as I trudged through milestones toward the doctorate while balancing huge day jobs and Life Issues, I had entirely forgotten this was something I should do.

Chapter 2: In which I seek publication

Publish my research–what a positively brill idea! I asked someone whose insights I deeply respect where I should send it, and was given a list of six LIS journals  to consider for the first round. Yes, that’s how I made the first cut, which is similar to how I have determined where to send literary essays: by referrals from people I trust.

From that list of peer-reviewed LIS journals, the key factors I considered were:

  1. Prestige of the publication
  2. How much work I had to do to have my paper considered for publication
  3. How likely it was my article would be published before I finished my dissertation
  4. Open access was a plus, but not a requirement.

You might be surprised to learn how much #2 and #3 drove my decision-making. At least for the first round of submissions, I rejected journals that require authors to reformat citations from APA to another citation schema simply to submit a paper for consideration. No principle was at stake other than “I do not have the time for this.” Nevertheless, learning that some journals do indeed require this investment of personal effort on a highly speculative venture made me greatly sympathetic to the thousands of tenure-track librarians jumping through crazy hoops to try to get at least an in-press citation in time to show scholarly production in their annual review.

Also, time was of the essence, since I wanted the article to at least be accepted before my dissertation was finished, and I’m a writing banshee these days, trying to get ‘er done. We all at least nominally subscribe to the myth of scrupulously avoiding simultaneous submissions to multiple journals. Indeed, I was faithful to this practice simply because I didn’t have the bandwidth to submit to more than one journal at a time. But that ruled out journals that might take a couple of years to reject my article, let alone accept it.

I was open to paying subvention fees (the cost to make an article Gold OA), noting that they ranged from $1100 to $2500 for the journals I was considering–something that would be prohibitive on a junior faculty member’s salary. In the same vein, I would have paid an author’s fee to publish in an OA journal that used that funding model. But not everyone has that kind of scratch.

In any event, the paper went out to the first journal on the list, and very quickly I heard back from the editor with feedback from two reviewers. The paper was accepted, provided I made some changes. I hadn’t planned on being accepted by the first journal I submitted to, but to paraphrase Yogi Berra, I saw a fork in the road, and I took it.

Chapter 3: In which I encounter the peer review process

Yet another advantage of having gone through an MFA program is understanding that Anne Lamott’s writing about “shitty first drafts” is an artful understatement; for most of my writing, I can only tell if a piece is a keeper by the fifth or so draft, if that.

I had faith in my research, and my paper had all the right components, well-executed, but I questioned my writing. It felt turgid, dense, and remote–characteristics belying its subject matter or the very interesting interviews that were its primary data. I know good writers feel that way pretty much all the time, but I had a persistent sense of unease about my paper, without quite being able to determine what to do about it. It did not help that when I showed it to peers their response was… silence. Above all, I wanted my research not simply to be published, but to be read.

I have written in the past how much I love a good editor. It’s like working with a great hair stylist. You are you, and yet, so much better. With that in mind, we’ll scoot quickly past the feedback from Reviewer 1, a living parody of the peer review process.

You know those jokes about reviewers who blithely object to the research direction on which the paper is based? Yes, Reviewer 1 was that kind of reviewer.  “The authors really only present the viewpoints of those who are ‘out.'” I don’t even know how to respond to that, other than to say that’s my area of research. Reviewer 1 also ruminated aloud–painfully, considering this person lives and breathes among us in higher education–that he or she did not understand the term “antecedent.” (The “antecedent and consequences” framework is classic and well-understood in qualitative research; and in any event, the word “antecedent” is hardly obscure.) And so on.

If Reviewer 2 had been like Reviewer 1, I would have pushed on to another journal. There is a difference between knowing that my work needs improvement and radically redesigning a valid and important research project from the ground up based on reviewers’ whims. Nor was there a middle ground where I could have simultaneously satisfied Reviewer 1 and Reviewer 2. As much as I wanted to publish To be Real in a timely manner, my career wasn’t hanging in the balance if I didn’t.

But Reviewer 2 not only respected my research direction but also provided some of the best writing feedback I have received since, indeed, the MFA program–advice that I fully believe not only improved this paper tenfold, but is helping my dissertation. In close to 1,000 words, Reviewer 2 commented on the value and quality of my research, but gently advised me to use pseudonyms or labels for the research participants; extend quotations more fully; and do a better job of summing up paragraphs and linking the findings to the literature review (“divergence and convergence”). Reviewer 2 ever so delicately observed that the conclusion had “too much context” and that all that blurbage (my term, not the reviewer’s) blurred the main points. There was more, all of it worthwhile.

I summarized Reviewer 2’s advice, taped it to the wall over my desk, and got to work. Indeed, once I labeled participants (Leader A, Leader B, etc.) and extended their quotations, I felt vastly better about my article. Doing this moved my writing from being an over-long jumble of “data analysis” to a paper about real people and their lived experiences. Following the other recommendations from Reviewer 2–expand, chop, link, add, tighten, clarify; Reduce! Reuse! Recycle!–also improved the paper to the point where I no longer felt apologetic about inflicting it on the scholarly canon.

Several more editorial go-rounds quickly followed, largely related to citations and formatting. The editors were fast, good, and clear, and when we had moments of confusion, we quickly came to agreement. In the last go-round, with a burst of adrenaline I looked up every single citation in my article and found that five had the wrong pagination; each one of these errors, for the record, was mine alone. Correcting these errors felt like a victory lap.

I then tried to follow the guidance for green OA. The reason this blog post doesn’t link to the author’s final corrected proof, and indeed the reason I broke this post in two, is that three weeks and two days after the first of three help desk inquiries with very pleasant people, I’m still not entirely sure which document version of To be Real that represents.

Part 2 of A Scholar’s Pool of Tears will have a link to the author’s final corrected proof of To be Real and will discuss the intricacies of navigating the liminal world of OA that is not born OA; the OA advocacy happening in my world; and the implications of the publishing environment scholars now work in.

Closing in on Client-side IIIF Content Search / Jason Ronallo

It sounds like client-side search inside may at some point be feasible for a IIIF-compatible viewer, so I wanted to test the idea a bit further. This time I’m not going to try to paint a bounding box over an image like in my last post, but just use client-side search results to create IIIF Content Search API JSON that could be passed to a more capable viewer.

This page is a test for that. Some of what I need in a Presentation manifest I’ve only deployed to staging. From there this example uses an issue from the Nubian Message. First, you can look at how I created the lunr index using this gist. I did not have to use the manifest to do this, but it seemed like a nice little reuse of the API since I’ve begun to include seeAlso links to hOCR for each canvas. The manifest2lunr tool isn’t very flexible right now, but it does successfully download the manifest and hOCR, parse the hOCR, and create a data file with everything we need.

The data file includes the pre-created lunr.js index and the documents, including the OCR text. What was extracted into documents and indexed is the text of each paragraph. This could be changed to segment by lines or some other unit depending on the type of content and use case. The id/ref/key for each paragraph combines the identifier for the canvas (shortened to keep the index size small) and the x, y, w, h that can be used to highlight that paragraph. We can just parse the ref that is returned from lunr to get the coordinates we need. We can’t get back from lunr.js which words actually match our query, so we have to fake it some. This limitation also means that, for now, there is no reason to go back to the original text, not even for hit highlighting. The documents with the original text are still in the data file should the client-side implementation evolve in the future.
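Parsing the ref might look something like the sketch below. The post doesn't spell out how the canvas identifier and coordinates are joined, so this assumes a hypothetical layout in which they are separated by hyphens; because a canvas identifier can itself contain hyphens, it treats the last four segments as x, y, w and h. The function name `parseRef` and the sample ref are illustrative only.

```javascript
// Hypothetical ref layout: "<canvas-id>-<x>-<y>-<w>-<h>".
// The real separator used by manifest2lunr is an assumption here.
function parseRef(ref) {
  const parts = ref.split('-');
  if (parts.length < 5) {
    throw new Error(`Unexpected ref format: ${ref}`);
  }
  // The coordinates are always the last four segments, so a canvas
  // identifier that itself contains hyphens is kept intact.
  const [x, y, w, h] = parts.slice(-4).map(Number);
  if ([x, y, w, h].some(Number.isNaN)) {
    throw new Error(`Non-numeric coordinates in ref: ${ref}`);
  }
  const canvasId = parts.slice(0, -4).join('-');
  return { canvasId, x, y, w, h };
}

const hit = parseRef('nubian-message-0001-120-340-900-210');
console.log(hit);
// hit.canvasId is "nubian-message-0001"; x, y, w, h are 120, 340, 900, 210
```

Taking coordinates from the end rather than the start is what lets the canvas part stay free-form.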

Also included with the data file is the URL for the original manifest the data was created from and the base URLs for creating canvas and image URLs. These base URLs could have a better, generic implementation with URL templates but it works well enough in this case because of the URL structure I’m using for canvases and images.

Now we can search and see the results in the textareas below.

Raw results that lunr.js gives us are in the following textarea. The ref includes everything we need to create a canvas URI with a xywh fragment hash.
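Going from those raw results to Content Search JSON might be sketched as follows. The output follows the shape of a IIIF Content Search API 1.0 response: an `sc:AnnotationList` whose `oa:Annotation` resources point (`on`) at a canvas URI with an `#xywh=` fragment. The ref layout (canvas identifier then x, y, w, h, joined by hyphens), the example base canvas URL, the search service URL and the annotation `@id` pattern are all assumptions for illustration. Since lunr.js doesn't report which words matched, the query itself stands in for the matched text, which is one way of faking it.

```javascript
// Sketch: build a IIIF Content Search API 1.0 style AnnotationList from
// lunr.js-like results ({ ref, score }). Assumed (hypothetical) ref layout:
// "<canvas-id>-<x>-<y>-<w>-<h>".
function toContentSearch(results, query, baseCanvasUrl, serviceUrl) {
  const resources = results.map((result, i) => {
    const parts = result.ref.split('-');
    const xywh = parts.slice(-4).join(',');          // "x,y,w,h"
    const canvasId = parts.slice(0, -4).join('-');   // everything before
    return {
      '@id': `${serviceUrl}/annotation/${i}`, // hypothetical annotation id
      '@type': 'oa:Annotation',
      motivation: 'sc:painting',
      // lunr.js does not report the matched words, so the query stands in
      resource: { '@type': 'cnt:ContentAsText', chars: query },
      on: `${baseCanvasUrl}${canvasId}#xywh=${xywh}`,
    };
  });
  return {
    '@context': 'http://iiif.io/api/presentation/2/context.json',
    '@id': `${serviceUrl}?q=${encodeURIComponent(query)}`,
    '@type': 'sc:AnnotationList',
    resources,
  };
}

const list = toContentSearch(
  [{ ref: 'nubian-0001-120-340-900-210', score: 0.8 }],
  'basketball',
  'https://example.org/canvas/',
  'https://example.org/search'
);
console.log(list.resources[0].on);
// → https://example.org/canvas/nubian-0001#xywh=120,340,900,210
```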

The resulting IIIF Content Search API JSON-LD appears in a second textarea.

Since I use the same identifier part for canvases and images in my implementation, I can even show matching images without going back to the presentation manifest. This isn’t necessary in a fuller viewer implementation since the content search JSON already links back to the canvas in the presentation manifest, and each canvas already contains information about where to find images.
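Requesting just the matching region of an image is a matter of putting the coordinates into the region segment of an Image API URL. This sketch assumes IIIF Image API 2.x syntax (`{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`) with `default.jpg`; the base URL and identifier are made-up examples, and an Image API 3.0 server would expect `max` rather than `full` as the size.

```javascript
// Build a IIIF Image API 2.x URL for one region of an image.
// baseImageUrl and the identifier values are hypothetical examples.
function regionImageUrl(baseImageUrl, identifier, { x, y, w, h }, size = 'full') {
  const base = baseImageUrl.replace(/\/+$/, ''); // avoid a double slash
  const region = [x, y, w, h].join(',');
  return `${base}/${encodeURIComponent(identifier)}/${region}/${size}/0/default.jpg`;
}

const url = regionImageUrl('https://example.org/iiif/', 'nubian-0001',
  { x: 120, y: 340, w: 900, h: 210 });
console.log(url);
// → https://example.org/iiif/nubian-0001/120,340,900,210/full/0/default.jpg
```

Passing a size such as `'!400,400'` would ask the server to scale the region to fit a 400-pixel box, which keeps result thumbnails small.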

I’ve not tested if this content search JSON would actually work in a viewer, but it seems close enough to begin fiddling with until it does. I think in order for this to be feasible in a IIIF-compatible viewer the following would still need to happen:

  • Some way to advertise this client-side service and data/index file via a Presentation manifest.
  • A way to turn on the search box for a viewer and listen to events from it.
  • A way to push the resulting Content Search JSON to the viewer for display.

What else would need to be done? How might we accomplish this? I think it’d be great to have something like this as part of a viable option for search inside for static sites while still using the rest of the IIIF ecosystem and powerful viewers like UniversalViewer.

Nicolini (6) / Ed Summers

This chapter focuses on Ethno-Methodology (EM), which Nicolini characterizes as practice-oriented, much like the earlier praxeology of Bourdieu, but more interested in description than in theory building, and particularly in the correctness of its descriptions. Garfinkel (1967) is cited as codifying EM around the idea of accountability, or making activities legible to others. It’s interesting that Garfinkel originally went to school to study accounting, at least according to Wikipedia. There are several characteristics of accountability:

  • social activities have an order
  • the order is public (observable)
  • the order is mundane, banal, witnessed by anyone
  • orders are oriented to each other
  • the order makes sense to the people performing
  • experts in the order can describe it, they have language for it

This attention to rules is borrowed to some extent from Husserl and Schutz, but comes very close to Wittgenstein’s notion of rules and rule following. Garfinkel’s idea of reflexivity is different from that of Bourdieu and Giddens in that reflexivity is connected with accountability: people make their practices accountable by making them reflexive. Similarly, Garfinkel uses the idea of indexicality to talk about the way meanings are embedded in actions, and much of EM’s work can be found in the study of how people work with this indexicality when it pushes up against the way things work in the world: How do people do it?

EM is also concerned with how people perform their competency in an activity, and their membership in a group of other competent people. EM inspired two lines of research: work studies and conversation analysis. It’s interesting and curious that Nicolini says that these ideas of accountability, indexicality, membership and reflexivity are used just to open the space for research, and are abandoned as concepts when it comes to doing the work of EM.

It is important for the descriptions to embody just-thisness or haecceity (a new word for me) – they are told from the perspective of the people involved in the action, using their distinctive words and motivations. To do this the researcher must immerse themself in the domain under study. They must become a legitimate participant to understand the language, rules and activities. This idea is known as unique adequacy. It can require the researcher to dedicate their life to becoming a proficient member of a community. Mehan & Wood (1975) go so far as to claim that EM isn’t so much a method or a theory but a form of life (recalling Wittgenstein again). This strong version of EM can lead to researchers giving up their research as their membership in the community under study takes over. It feels like there must be some pretty strong parallels between this approach and Bourdieu’s habitus.

EM was apparently the subject of fierce debate in the sociology community, and EM practitioners found it difficult to get academic jobs. In the 1990s EM practices got new life in the work of Suchman, Dourish (who are on my reading list for later in the semester) and others who conducted workplace studies, examining the effects of technology in order to inform design.

EM-orientated workplace studies are not limited, in fact, to claiming in principle—as other theories do—that actors are bricoleurs, that they improvise and construct their world, that there is power and hierarchy. Rather, such studies go to great length to present living instances of bricolaging, improvisation, and power and hierarchy making and unmaking. They are not limited to claiming that organizations and society are socially constructed, that decisions and conducts are context-dependent, and that knowledge is practical and situated. Instead, they set out to provide evidentiary empirical substantiation to these claims, describing in detail how ordered and organized scenes of action are accomplished, how members build on contextual clues for accomplishing their activities, how knowing is visibly exhibited in what members actually do or refrain from doing. (p. 148)

EM feels like the most humanistic of the practice theories reviewed so far. It doesn’t attempt to make a theory, but instead embodies a sensibility, or a set of questions, and a way of approaching study, rather than a formula for conducting the research. EM is data driven and at the same time it is a literary genre.


Garfinkel, H. (1967). Studies in ethnomethodology. Prentice Hall.

Mehan, H., & Wood, H. (1975). The reality of ethnomethodology. John Wiley & Sons Inc.

Sravanthi Adusumilli – New Library Technology Development Graduate Assistant / Villanova Library Technology Blog

Sravanthi (Sravs) Adusumilli, a graduate of Acharya Nagarjuna University, Guntur, India, joined the Library Technology Development team in August. She reports to Demian Kratz, team leader. She is currently working on redesigning “Finding Augustine.” “Finding Augustine” is “[a] rich and readily accessible biographical collection concerning Augustine of Hippo and his legacy;” it is sponsored by the Augustinian Institute at Villanova University.

Adusumilli has a bachelor’s degree in computer science engineering and is now enrolled in the Master of Science in Computer Engineering program with an anticipated graduation in May 2018. She plans to work as a data scientist.

Her hometown is Machilipatnam, India, a city on the southeast coast. Adusumilli’s hobbies are cooking and gardening.


Client-side Search Inside for Images with Bounding Boxes / Jason Ronallo

It is possible to create a Level 0 IIIF Image API implementation with just static images and an info.json. And some institutions are probably pre-creating Presentation API manifests or even hand-crafting them. All that’s required then is to put those files up behind any web server with no other application code running and you can provide the user with a great viewing experience.

The one piece that currently requires a server-side application component is the IIIF Content Search API. This usually involves a search index like Solr as well as application code in front of it to convert the results to JSON-LD. I’ve implemented search inside with the Content Search API via Ocracoke. With decent client-side search from libraries like lunr.js, it ought to be possible to create a search inside experience even for a completely static site.

Here’s a simple example:

This works first of all because the page has been OCR’d with Tesseract which outputs hOCR. (I developed Ocracoke in part to help with automating an OCR workflow.) The hOCR output is basically HTML that also includes the bounding boxes of sections of the page based on the size of the digitized image. We can then use this information to draw boxes over top of the corresponding portion of the image. So how do we use search to find the section of the page to highlight?
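In hOCR the box is carried in each element's `title` attribute as a `bbox x0 y0 x1 y1` property, giving the upper-left and lower-right corners in pixels of the original image. A minimal sketch of pulling that out and converting it to the x, y, width, height form that is handy for drawing; the title string is a made-up example.

```javascript
// Parse the "bbox x0 y0 x1 y1" property from an hOCR title attribute,
// e.g. title="bbox 120 340 1020 550; x_wconf 91", and return x/y/w/h.
function parseHocrBbox(title) {
  const match = /\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/.exec(title);
  if (!match) return null; // element has no bbox property
  const [x0, y0, x1, y1] = match.slice(1).map(Number);
  return { x: x0, y: y0, w: x1 - x0, h: y1 - y0 };
}

console.log(parseHocrBbox('bbox 120 340 1020 550; x_wconf 91'));
// → { x: 120, y: 340, w: 900, h: 210 }
```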

The first step in this case, for simplicity’s sake, was to use an image of known size. It is also possible to do hit highlighting in a tiling pan/zoom viewer like OpenSeadragon, as UniversalViewer demonstrates. The page image at 20% of the original size fits within the width of this site:

I then used some code from Ocracoke to rescale the original hOCR to create bounding box coordinates that would match on the resized page image. I parsed that resized hOCR file to find all the paragraphs and recorded their position and text in a JSON file.
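The rescaling step is simple multiplication. Here is a minimal sketch, assuming the paragraphs have already been parsed into full-size pixel boxes and using a scale factor of 0.2 for the 20% image. It is not Ocracoke's code, and the document shape (`id`, `x`, `y`, `w`, `h`, `text`) and the sample text are assumptions.

```javascript
// Scale paragraph boxes from full-size image pixels to the resized image
// and emit documents ready for a client-side index. Rounding keeps the
// JSON small; field names are assumptions, not Ocracoke's format.
function scaleParagraphs(paragraphs, scale) {
  return paragraphs.map((para, i) => ({
    id: `p${i}`,
    x: Math.round(para.x * scale),
    y: Math.round(para.y * scale),
    w: Math.round(para.w * scale),
    h: Math.round(para.h * scale),
    text: para.text,
  }));
}

const docs = scaleParagraphs(
  [{ x: 120, y: 340, w: 900, h: 210, text: 'Student government elections' }],
  0.2
);
console.log(JSON.stringify(docs));
// → [{"id":"p0","x":24,"y":68,"w":180,"h":42,"text":"Student government elections"}]
```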

At this point I could have created the lunr.js index file ahead of time to save the client some work. In this example the client requests the JSON file and adds each document to the index. The Fabric.js library is used to create a HTML canvas, add the page image as a background, and draw and remove rectangles for matches over top of the relevant section. Take a look at the JavaScript to see how this all works. Pretty simple to put all these pieces together to get a decent search inside experience.
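The search-then-draw loop can be sketched without the libraries. To keep it runnable on its own, this swaps in a deliberately naive matcher for lunr.js (every query term must appear as a whole word, ignoring case, with no stemming or ranking), and it returns plain rectangles whose `left`, `top`, `width` and `height` names match the options a Fabric.js `Rect` accepts. The sample documents are invented.

```javascript
// Naive stand-in for the lunr.js search step: return the boxes of
// paragraphs containing every query term (case-insensitive, whole words).
// lunr.js adds stemming and relevance ranking, which this does not.
function findBoxes(docs, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  return docs
    .filter((doc) => {
      const words = new Set(doc.text.toLowerCase().match(/[a-z0-9']+/g) || []);
      return terms.every((term) => words.has(term));
    })
    .map(({ x, y, w, h }) => ({ left: x, top: y, width: w, height: h }));
}

const pageDocs = [
  { id: 'p0', x: 24, y: 68, w: 180, h: 42, text: 'Student government elections' },
  { id: 'p1', x: 24, y: 120, w: 180, h: 60, text: 'Basketball season opens' },
];
console.log(findBoxes(pageDocs, 'basketball'));
```

Here the query matches only `p1`, so a single rectangle comes back; in the page, each such rectangle would be drawn over the background image and cleared on the next keystroke.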

If you gave this a try, you’ll notice that this implementation does not highlight words but sections of the page. It might be possible to make this work for individual words, but it would increase the total size of the documents, as the bounding boxes for each word would need to be retained. Indexing each word separately would also disrupt the ability to do phrase searching. There’s some discussion in the lunr.js issues about adding the ability to get term start positions within a text, which may make this possible in the future without these drawbacks. I had originally considered only trying to get the user to the correct page, but I think targeting a segment of the page is a reasonable compromise.

I don’t use the IIIF Content Search API in this demonstration, but it ought to be enough of a proof of concept to show the way towards a viewer that can support a completely static site, including search inside. Does anyone have ideas or thoughts on how a static version of content search could be identified in a IIIF Presentation manifest? Without a URL service point, what might this look like?

Pew: A generation gap for digital readiness / District Dispatch

“Digital Readiness Gaps,” a new Pew Research Center report, explores a spectrum of digital readiness, from digitally ready to unprepared. Researcher John Horrigan finds that more than half (52%) of U.S. adults may be considered “relatively hesitant” and are the least likely to use digital tools for personal learning.
The research explores three dimensions of digital readiness: (1) the digital skills required to use the internet; (2) trust, namely, people’s ability to assess the trustworthiness of information found online and to protect their personal information; and (3) use, that is, the extent to which people use digital tools to complete online activities (e.g., personal learning or online courses).

The analysis identifies five distinct groups on the spectrum:

Relatively more prepared

  • Digitally Ready (17%): Have technology resources and are confident in their digital skills and capacity to determine the trustworthiness of online information. The Digitally Ready enjoy high income and education levels, and are likely to be in their 30s or 40s.
  • Cautious Clickers (31%): Have strong levels of tech ownership, are confident in their digital skills, and are relatively knowledgeable about new online learning concepts. Unlike the Digitally Ready, they are less likely to use the internet for personal learning. The Cautious Clickers have above average educational and income levels, and are usually in their 30s or 40s.

Relatively hesitant

  • The Reluctant (33%): Have below average confidence in their digital skills, little concern about their ability to trust information online, and very low awareness of online learning concepts. The Reluctant are middle-aged and have relatively lower levels of income and education.
  • Traditional Learners (5%): Are active learners and have technology, but are unlikely to use the internet for learning purposes, tend to need help with using digital devices, and express above average concern about the trustworthiness of information online. This group is more likely to be middle-aged, ethnically diverse, and lower- to lower-middle income.
  • The Unprepared (14%): Have relatively lower levels of tech adoption, very low confidence in their digital skills, and a high degree of difficulty determining whether online information is trustworthy. The Unprepared are older, with relatively low income and educational levels.

By examining digital readiness, rather than the “digital divide,” Pew’s research highlights the fact that people’s lack of digital skills and trust in technology may, in turn, impact their use of digital resources. In other words, digital literacy and trust may boost meaningful internet use.

As the report observes, libraries understand that digital readiness involves digital skills combined with the digital literacy tools to enable people to assess the trustworthiness of online information. The report also notes that library users and the highly wired are more likely to use the internet for personal learning (55% and 60%, respectively, compared with 52% of all personal learners) and more likely to have taken an online course.

Horrigan notes that the research focuses on online learning, and may not project to people’s capacity (or lack of capacity) to perform health-related web searches or use mobile apps for civic activities, for instance. There is also some fluidity among the groups identified, and the findings represent a snapshot in time that may change in coming years as e-learning evolves.

Unsurprisingly, libraries have long been at the forefront of digital literacy efforts in their communities, as ALA documented in 2013. As the recent Digital Inclusion Survey indicated, all public libraries provide free public access to the internet, and most offer diverse digital content and services, as well as formal and informal technology training.

What’s more, the public trusts libraries to teach digital literacy skills. In a prior report, Pew found that 47 percent of American adults agree that libraries contribute “a lot” to providing a trusted place for people to learn about new technologies. Another Pew report revealed that 80 percent of adults believe that libraries should “definitely” offer programs to teach people how to use digital tools.

This newest report is an important addition to the body of research conducted by the Pew Research Center (including a previous public library engagement typology) and fodder for planning related to digital inclusion efforts, including work underway at the Federal Communications Commission.

Note: OITP Deputy Director Larra Clark will interview Pew researcher John Horrigan for a Public Libraries Online podcast, which will be posted in coming weeks.

The post Pew: A generation gap for digital readiness appeared first on District Dispatch.

Client-side Video Search Inside / Jason Ronallo

Below the video use the input to search within the captions. This is done completely client-side. Read below for how it was done.

As part of thinking more about how to develop static websites without losing functionality, I wanted to be able to search inside a video.

To create the WebVTT captions file, I placed 4 randomly chosen words as a caption every 5 seconds throughout this 12+ minute video. I used an American English word list, randomly sorted it, and took the top 100 words. Many of them ended with “’s”, so I just removed all those for now. You can see the full word list, look at the WebVTT file, or just play the video to see the captions.

sort -R /usr/share/dict/american-english | head -n 100 > random-words.txt
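The word-list clean-up and the per-cue word picks can be sketched in JavaScript too. This is a hypothetical helper pair, not part of the original workflow (which removed the possessives by hand and used Ruby's `Array#sample`): it drops words ending in “'s” and picks distinct words with a partial Fisher–Yates shuffle.

```javascript
// Drop possessives such as "Aaron's" (straight or curly apostrophe)
// and any empty lines left over from the word list file.
function cleanWords(words) {
  return words.filter((word) => word.length > 0 && !/['’]s$/.test(word));
}

// Pick `count` distinct words using a partial Fisher–Yates shuffle
// on a copy of the list, so the input array is not modified.
function sampleWords(words, count) {
  const pool = words.slice();
  const n = Math.min(count, pool.length);
  for (let i = 0; i < n; i++) {
    // choose a random index in [i, pool.length) and swap it into place i
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

const cleaned = cleanWords(['apple', "Aaron's", 'river', 'lantern', "dog's", 'quartz', 'meadow']);
console.log(sampleWords(cleaned, 4).join(' ')); // four distinct random words
```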

Here’s the script I used to create the WebVTT file using our random words.

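#!/usr/bin/env ruby
random_words_path = File.expand_path '../random-words.txt', __FILE__
webvtt_file_path = File.expand_path '../search-webvtt.vtt', __FILE__

# Format a number of seconds as a WebVTT timestamp (HH:MM:SS.mmm)
def timestamp(total_seconds)
  seconds = total_seconds % 60
  minutes = (total_seconds / 60) % 60
  hours = total_seconds / (60 * 60)
  format("%02d:%02d:%02d.000", hours, minutes, seconds)
end

# One word per line, as written by the sort/head command above
words = File.readlines(random_words_path, chomp: true).reject(&:empty?)
cue_start = 0
cue_end = 0
File.open(webvtt_file_path, 'w') do |fh|
  fh.puts "WEBVTT\n\nNOTE This file was automatically generated by\n\n"
  # 144 cues x 5 seconds = 720 seconds, i.e. the first 12 minutes
  144.times do |i|
    cue_words = words.sample(4)
    cue_start = i * 5
    cue_end = cue_start + 5
    fh.puts "#{timestamp(cue_start)} --> #{timestamp(cue_end)}"
    fh.puts cue_words.join(' ')
    fh.puts # a blank line must separate WebVTT cues
  end
end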

The markup including the caption track looks like:

<video preload="auto" autoplay poster="" controls>
  <source src="" type="video/mp4">
  <source src="" type="video/webm">
  <track id="search-webvtt" kind="captions" label="captions" srclang="en" src="/video/search-webvtt/search-webvtt.vtt" default>
</video>

<p><input type="text" id="search" placeholder="Search the captions..." style="width: 100%" autocomplete="off"></p>
<div id="result-count"></div>
<div class="list-group searchresults"></div>
<script type="text/javascript" src="/javascripts/search-webvtt.js"></script>

In the browser we can get the WebVTT cues and index each of the cues into lunr.js:

var index = null;
// store the cues with a key of start time and value the text
// this will be used later to retrieve the text as lunr.js does not
// keep it around.
var cue_docs = {};
var video_elem = document.getElementsByTagName('video')[0];
video_elem.addEventListener("loadedmetadata", function () {
  var track_elem = document.getElementById("search-webvtt");
  var cues = track_elem.track.cues;
  // Pre-2.0 lunr.js API: configure the fields here, add documents below
  index = lunr(function () {
    this.field('text');
    this.ref('id');
  });

  for (var i = 0; i < cues.length; i++) {
    var cue = cues[i];
    cue_docs[cue.startTime] = cue.text;
    index.add({
      id: cue.startTime,
      text: cue.text
    });
  }
});
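To show roughly what indexing each cue under its start time buys us, here is a toy inverted index over `{ startTime, text }` cues. It is a sketch, not lunr.js: there is no stemming, no stop-word list, and no relevance scoring, and a query matches only cues containing all of its terms.

```javascript
// Map each lowercase term to the set of cue start times containing it.
function buildCueIndex(cues) {
  const postings = new Map();
  for (const cue of cues) {
    for (const term of cue.text.toLowerCase().match(/[a-z0-9']+/g) || []) {
      if (!postings.has(term)) postings.set(term, new Set());
      postings.get(term).add(cue.startTime);
    }
  }
  return postings;
}

// Return start times of cues containing every query term, in time order.
function searchCues(postings, query) {
  const terms = query.toLowerCase().match(/[a-z0-9']+/g) || [];
  if (terms.length === 0) return [];
  let hits = null;
  for (const term of terms) {
    const times = postings.get(term) || new Set();
    hits = hits === null
      ? new Set(times)
      : new Set([...hits].filter((t) => times.has(t)));
  }
  return [...hits].sort((a, b) => a - b);
}
```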

We can set things up so that when a result is clicked, we’ll get the data-seconds attribute and make the video jump to that point in time:

$(document).on('click', '.result', function(){
  video_elem.currentTime = parseFloat(this.getAttribute('data-seconds'));
});
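The result links also carry the time in their `href` as `#t=<seconds>`. As a possible extension (not part of the original post), the same fragment could be read on page load so that shared links start at the right moment. This hypothetical helper handles only the plain-seconds `t=start[,end]` subset of the W3C Media Fragments syntax.

```javascript
// Return the start time in seconds from a hash like "#t=12.5" or
// "#t=npt:30,40", or null if there is no usable t= parameter.
function parseTimeFragment(hash) {
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  const t = params.get("t");
  if (t === null) return null;
  const [start] = t.replace(/^npt:/, "").split(",");
  const seconds = Number(start); // an empty start ("t=,20") means 0
  return Number.isFinite(seconds) && seconds >= 0 ? seconds : null;
}
```

In the page, you might call it in the `loadedmetadata` handler with `window.location.hash` and assign a non-null result to `video_elem.currentTime`.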

We create a search box and display the results. Note that the searching itself just becomes one line:

$('input#search').on('keyup', function () {
  // The index is only built once the video's metadata has loaded
  if (!index) { return; }
  // Get query
  var query = $(this).val();
  // Search for it
  var result = index.search(query);
  var searchresults = $('.searchresults');
  var resultcount = $('#result-count');
  searchresults.empty();
  if (result.length === 0) {
    resultcount.html('');
  } else {
    resultcount.html('results: ' + result.length);
    // Makes more sense in this case to sort by time than relevance
    // The ref is the seconds; compare numerically since refs may be strings
    var sorted_results = result.sort(function(a, b){
      return parseFloat(a.ref) - parseFloat(b.ref);
    });

    // Display each of the results
    for (var i = 0; i < sorted_results.length; i++) {
      var start_seconds = sorted_results[i].ref;
      var text = cue_docs[start_seconds];
      var seconds_text = start_seconds.toString().split('.')[0];
      var searchitem = '<a class="list-group-item result" data-seconds="'+ start_seconds +'" href="#t='+ start_seconds + '">' + text + ' <span class="badge">' + seconds_text + 's</span></a>';
      searchresults.append(searchitem);
    }
  }
});
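The display loop can also be pulled out into a pure function, which makes the time-ordering easy to test outside the browser. This is a sketch with a hypothetical name. It assumes hits look like lunr's `{ ref, score }` objects whose `ref` is the cue start time, possibly as a string. Comparing those strings directly would put "100" before "5", so they are converted to numbers first.

```javascript
// Turn search hits into display items sorted by start time.
// cueDocs maps start time -> cue text, like cue_docs above.
function buildResultItems(hits, cueDocs) {
  return hits
    .map((hit) => Number(hit.ref))
    .sort((a, b) => a - b)
    .map((seconds) => ({
      seconds,
      text: cueDocs[seconds],
      label: Math.floor(seconds) + "s", // whole seconds for the badge
    }));
}
```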

And that’s all it takes to add search-inside to a video on your static website.

Video from Boiling Process with Sine Inputs–All Boiling Methods.

Site Search with Middleman and lunr.js / Jason Ronallo

One of the tasks I have for this year is to review all the applications I’ve developed and consider how to lower their maintenance costs. Even applications that aren’t actively being fed new content need to be updated for security vulnerabilities in their frameworks and libraries. One easy way to lower those costs is to consider shutting them down, and I wish more of the applications I have developed were candidates for sunsetting.

We have some older applications that are still useful and can’t be shut down. They are largely static but occasionally get an update. We’ve thought about how to “pickle” certain applications by taking a snapshot of them and just making that static representation live on without the application code running behind it, but we’ve never pursued that approach, as making changes that need to be applied across the site can be annoying.

For a couple of these applications I’m considering migrating them to a static site generator. That would allow us to make changes, not worry about updating dependencies, and remove concerns about security. One feature though that seemed difficult to replace without a server-side component is search. So I’m newly interested in the problem of site search for static sites. Here’s how I added site search to this blog as a way to test out site search without a server-side component.

Before making this change I was just pointing to a Google site search, which isn’t the kind of thing I could do for one of our sites at work. What I’m doing now is certainly more complex than a simple search box like that, but the middleman-search gem made it rather simple to implement. A few things took me a little time to figure out, so I’m sharing snippets here to maybe save someone else some time.

First, if you’re using this with Middleman 4, using the master version from GitHub might help:

gem 'middleman-search', github: 'manastech/middleman-search'

Then the code to activate the plugin in config.rb was updated for the structure of my blog. The pages for tagging polluted the index so I added a very rudimentary way to skip over some paths from getting indexed. I also added a way to store the section of the site (as “group”) in order to be able to display that along with any search result.

activate :search do |search|
  search.resources = ['about/', 'blog/', 'bots/', 'bots-blog/', 'demos/',
    'experience/', 'presentations/', 'projects/', '/writing']
  search.fields = {
    title:   {boost: 100, store: true, required: true},
    content: {boost: 50},
    url:     {index: false, store: true}
  }

  search_skip = ['Articles Tagged', 'Posts by Tag']

  search.before_index = Proc.new do |to_index, to_store, resource|
    # Skip tag listing pages (here compared against the page title)
    throw(:skip) if search_skip.any?{|ss| ss == resource.data.title}
    # Store the top-level section of the site as "group"
    to_store[:group] = resource.path.split('/').first
  end
end
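The decision made in `before_index` is small enough to express as a pure function, which makes the skip-and-group rule easy to check. This is a JavaScript sketch of that logic only, not of middleman-search itself. The `{ title, path, url }` resource shape and the function name are assumptions, and it compares titles exactly, as the Ruby snippet does; titles like "Articles Tagged ruby" would need a prefix match instead.

```javascript
const SEARCH_SKIP = ["Articles Tagged", "Posts by Tag"];

// Return the document to store for a resource, or null to skip it.
function toSearchDocument(resource) {
  if (SEARCH_SKIP.includes(resource.title)) return null; // tag listing pages
  return {
    title: resource.title,
    url: resource.url,
    group: resource.path.split("/")[0], // top-level section, e.g. "blog"
  };
}
```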

When the site is built, it creates a search.json file at the root (unless you tell it to put it somewhere else). To encourage the client to cache it, we’ll set our ajax request to allow caching. As the site gets updated we’ll want to bust the cache, so we need to add “.json” to the list of extensions that Middleman will create a digest hash for and properly link to. The approach given in all of the documentation did not work for me. This did, but it required spelling out each of the extensions to create a hash for rather than just appending “.json” to asset_hash.exts.

activate :asset_hash do |asset_hash|
  asset_hash.ignore = [/demos/]
  asset_hash.exts = %w[ .css .js .png .jpg .eot .svg .ttf .woff .json ]
end
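The cache-busting idea behind asset hashing is that the filename is derived from a digest of the contents. A changed search.json gets a new URL, and an unchanged one can stay cached indefinitely. Here is a minimal Node sketch of that idea. Middleman’s actual digest algorithm and length may differ; SHA-256 truncated to 8 hex characters is an assumption here.

```javascript
const crypto = require("crypto");
const path = require("path");

// Build a name like "search-1a2b3c4d.json" from a file's contents.
function hashedAssetName(fileName, contents, length = 8) {
  const digest = crypto
    .createHash("sha256")
    .update(contents)
    .digest("hex")
    .slice(0, length);
  const ext = path.extname(fileName);
  return `${path.basename(fileName, ext)}-${digest}${ext}`;
}
```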

Next I created a simple erb file (with frontmatter) for the search page. I’ve added a form to fall back to a Duck Duck Go site search.

---
title: Search
---

<%= javascript_include_tag 'search' %>


  <input type="text" id="search" placeholder="Search..." width="100%">

<div id="result-count"></div>

<div class="list-group searchresults">
</div>

<div id="duckduckgo-fallback-search">
  <p>If you can't find what you're looking for, try searching this site via Duck Duck Go:</p>
  <form action="" method="get" role="search">
    <div class="form-group">
      <input class="search form-control" type="text" name="q" value=" " autocomplete="off">
    </div>
  </form>
</div>

And here’s the JavaScript, the beginnings of it borrowed from the middleman-search readme and this blog post. Unfortunately the helper search_index_path provided by middleman-search did not work–the method was simply never found. One magic thing that took me a long time to figure out was that using this helper was completely unnecessary. It is totally fine to just include the URL as /search.json and Middleman will convert it to the asset hash name when it builds the site.

The other piece that I needed to open the console for was to find out why the search results only gave me back documents with a ref and score like this: { ref: 6, score: 0.5273936305006518 }. The data packaged into search.json includes both the index and the documents. Once we get the reference to the document, we can retrieve the document to give us the url, title, and section for the page.

Updated 2016-09-23 to use Duck Duck Go as the fallback search service.

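var lunrIndex = null;
var lunrData  = null;
// Download index data
$.ajax({
  url: "/search.json",
  cache: true,
  method: 'GET',
  success: function(data) {
    // search.json holds both the index and the stored documents
    lunrData = data;
    lunrIndex = lunr.Index.load(lunrData.index);
  }
});

$(document).ready(function () {
  var duckduckgosearch = $('#duckduckgo-fallback-search');
  duckduckgosearch.hide();

  $('input#search').on('keyup', function () {
    // The index may still be downloading
    if (!lunrIndex) { return; }
    // Get query
    var query = $(this).val();
    // Search for it
    var result = lunrIndex.search(query);
    // Output it
    var searchresults = $('.searchresults');
    var resultcount = $('#result-count');
    searchresults.empty();
    if (result.length === 0) {
      // Hide results
      searchresults.hide();
      if (query.length == 0) {
        resultcount.html('');
        duckduckgosearch.hide();
      } else {
        resultcount.html('results: 0');
        duckduckgosearch.show();
      }
    } else {
      // Show results
      resultcount.html('results: ' + result.length);
      for (var i = 0; i < result.length; i++) {
        // A result only gives us a reference to a document
        var ref = result[i].ref;
        // Using the reference get the document
        var doc = lunrData.docs[ref];
        // Get the section of the site
        var group = " <span class='badge'>" + doc.group + '</span>';
        var searchitem = '<a class="list-group-item" href="' + doc.url + '">' + doc.title + group + '</a>';
        searchresults.append(searchitem);
      }
      searchresults.show();
      // Keep the fallback available in case the results miss the mark
      duckduckgosearch.show();
    }
  });
});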
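Because lunr.js hands back only `{ ref, score }` pairs, each hit has to be resolved against the stored documents before anything can be displayed. This sketch isolates that step as a testable function. It assumes the `{ index, docs }` shape of search.json used in the snippet above, and the helper name is hypothetical. Hits whose ref has no stored document are dropped rather than throwing.

```javascript
// Resolve lunr hits to displayable documents using the stored docs map.
function resolveHits(hits, docs) {
  return hits
    .map((hit) => docs[hit.ref])
    .filter((doc) => doc !== undefined)
    .map((doc) => ({ title: doc.title, url: doc.url, group: doc.group }));
}
```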

That’s it. Solr-like search for a completely static site. Try it.

Simple Interoperability Wins with IIIF / Jason Ronallo

At the NCSU Libraries we recently migrated from Djatoka as our image server and from a bespoke user interface for paginated reading and search inside to a IIIF-compatible image server and viewers. While we gained a lot from the switch, we pretty quickly saw the interoperability wins gained from adopting the IIIF standards.

For a long while we had been using a book reader that I developed that wasn’t great, but that was better than the paginated readers that were out there at the time in a number of ways. Since then it hasn’t aged well compared to the most recent viewers. When we were moving towards IIIF it gave us the opportunity to evaluate our options again. I really like a lot of the features that UniversalViewer (UV) provides including a thumbnail navigation pane and support for the IIIF Content Search API with hit highlighting.

UniversalViewer embedded on the NCSU Libraries Rare and Unique Digital Collection site

While UV is somewhat responsive, it does not respond well to very narrow windows or small mobile device screens. We had spent a lot of effort with our past reader to provide a decent experience on mobile, so it was disappointing to take away functionality that we had provided to users for years.

While UV has plans to make progress soon on updating the interface to work better on mobile, we didn’t want to wait to push out the improved desktop experience to users. So, like much of the progress on the web, we relied on a fallback. Desktop and other large display users would get the more powerful UV interface, while narrow window and mobile users would get–something else. At first we just included the first image for a resource as a static image, along with a PDF when available, so we didn’t provide any pan/zoom interface or any way to see the other images for the resource. That at least let us move on to other things.

And here is where the story gets interesting and IIIF really comes into its own. It is worth understanding just a bit of how IIIF works to understand the next part. All we needed to create in order to implement UV was a IIIF Presentation manifest, which is a JSON-LD document with information on how to display (or present) the resource to a user, including a list of all the images to display in order. You can see an example of a manifest here. While many probably still associate IIIF only with the Image API and the image servers that have developed around that specification, the real wow factor is with the Presentation API. The same manifest can be used in multiple viewers that know how to parse these manifests. Furthermore, the other viewer could be on our own site or within a completely different interface developed and hosted by someone else.
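To make “a list of all the images to display in order” concrete, here is a sketch that pulls the image service URIs out of a manifest, the way any viewer must before it can show pages. It assumes the Presentation API 2.x structure (`sequences` → `canvases` → `images` → `resource` → `service`); version 3.0 restructures this, so the helper does not handle it. The thumbnail helper uses the Image API 2.x URL pattern `{region}/{size}/{rotation}/{quality}.{format}`. Both function names are hypothetical.

```javascript
// Image service base URIs, in reading order, from a Presentation 2.x manifest.
function imageServiceIds(manifest) {
  const sequence = (manifest.sequences || [])[0];
  if (!sequence) return [];
  return (sequence.canvases || []).flatMap((canvas) =>
    (canvas.images || [])
      .map((image) => image.resource && image.resource.service && image.resource.service["@id"])
      .filter(Boolean)
  );
}

// Full image scaled to fit within 200x200 (Image API 2.x "!w,h" size).
function thumbnailUrl(serviceId) {
  return `${serviceId}/full/!200,200/0/default.jpg`;
}
```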

We looked at other viewers that have implemented the Presentation API and chose to use Leaflet-IIIF. Leaflet is a simple, mobile-friendly tiled map viewer, and Jack Reed developed a plugin to add support for IIIF Presentation manifests to Leaflet. This was almost all we needed, except that the default way to change images in Leaflet-IIIF is a layers control, which works well enough for a few choices of map layers but does not scale to a 20+ page book. I needed another way to move through the pages. It had been a long while since I had done anything with Leaflet, but once I found Leaflet.EasyButton it was simple to create two buttons for next and previous pages, and the EasyButton plugin also helped with changing the states of the buttons when reaching the first or last page. Here’s what our current solution looks like in action:
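The page-turning logic behind those two buttons can be kept separate from Leaflet entirely. This sketch does not use Leaflet.EasyButton’s API. It only computes the new page index and whether each button should be disabled at the first or last page, assuming the book has at least one page. The function names are hypothetical.

```javascript
// Clamp the index to the book and derive the button states.
function pagerState(index, pageCount) {
  const clamped = Math.min(Math.max(index, 0), pageCount - 1);
  return {
    index: clamped,
    prevDisabled: clamped === 0,
    nextDisabled: clamped === pageCount - 1,
  };
}

// delta is +1 for "next" and -1 for "previous".
function turnPage(state, delta, pageCount) {
  return pagerState(state.index + delta, pageCount);
}
```

A button’s click handler would call `turnPage`, swap in the image for the new index, and enable or disable the buttons from the returned flags.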

The last bit was to show/hide the two different viewers based on media queries. While we could have tried to use JavaScript to choose the correct viewer and then switch between the two, initializing these types of viewers at different times and different sizes can sometimes be tricky. Based on my previous experience even hiding and showing these types of viewers could lead to many hours of frustration, so I was happy to see that both UV and Leaflet had no deal breaker issues with being repeatedly shown and hidden again as a browser window gets narrower and wider.

We hope this is a temporary solution, but I think it clearly shows how shared standards can be a big interoperability win. With the short deadline we had for pushing out this new functionality, having multiple viewers available meant that we could forge ahead while still providing a good experience for most users. Being able to draw on viewers that cover different niches saved us countless hours of work.

You can see more of how we’ve implemented these viewers here:

Thanks to Simeon Warner for pointing out how this was a IIIF interoperability win. And here he created a short screencast of this in action as well:

Workflows and Tools / Brown University Library Digital Technologies Projects

Digital preservation is simultaneously a new and old topic. So many libraries and archives are only now dipping their toes into these complicated waters, even though the long-term preservation of our born-digital and digitized holdings has been a concern for a while now. I think it is often forgotten that trustworthy standard-bearers, like the Digital Preservation Management Workshop and The Open Archival Information System (OAIS) Model, have been around for over a decade. The OAIS Reference Model in particular is a great resource, but it can be intimidating. Full implementation requires a specific set of resources, which not all institutions have. In this way, comparing one’s own program to another which is further along in an attempt to emulate their progress is often a frustrating endeavor.

I’ve witnessed this disparity most notably at conferences. Conferences, unconferences, and colloquia can be really helpful in that people are (thankfully) very open with their workflows and documentation. It’s one of my favorite things about working in a library; there aren’t trade secrets, and there isn’t an attitude of competition. We celebrate each other’s successes and want to help one another. With that said, some of the conversations at these events are often diluted with tool comparison and institution-specific jargon. The disparity of resources can make these conversations frustrating. How can I compare our fledgling web archiving initiative with other institutions who have entire web archiving teams? Brown has a robust and well-supported Fedora repository, but what about institutions who are in the early stages of implementing a system like that? How do we share and develop ideas about what our tools should be doing if our conversations center around the tools themselves?

For our digital accession workflow, I’ve taken a different approach than what came naturally at first. I initially planned our workflow around the implementation of Artefactual’s Archivematica, but I could never get a test instance installed adequately. This, of course, did not stop the flow of digitized and born-digital material in need of processing. I realized I was trying to plan around the tool when I wasn’t even sure what I needed the tool to do. Technology will inevitably change, and unless we have a basis for why a tool was implemented, it will be very difficult to navigate that change.


For this reason, I’ve been working on a high-level born-digital accessioning workflow where I can insert or take out tools as needed (see above). This workflow outlines the basic procedures of stabilizing, documenting, and packaging content for long-term storage. It has also been a good point of discussion among both internal and external colleagues. For example, after sharing this diagram on Twitter, someone suggested creating an inventory before running a virus scan. When I talked about this in our daily stand-up meeting, one of the Library’s developers mentioned that compressed folders may in fact strengthen their argument. Unless both the inventory and the virus scan account for items within a compressed folder, there is actually a risk that the scan might miss something. This is one example of the type of conversations I’d like to be having. It’s great to know which tools are available, but focusing strictly on tool implementation keeps us from asking some hard questions.
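As one illustration of the inventory step discussed above, here is a minimal Node sketch that records each file’s relative path, size, and SHA-256 checksum. It also flags archive files, since a scan that doesn’t look inside compressed folders may miss their contents. The function name and the list of archive extensions are hypothetical, and this is not a tool Brown uses. It reads whole files into memory, so very large files would call for streaming instead.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// Containers whose contents a scan or inventory may not see.
const ARCHIVE_EXTENSIONS = new Set([".zip", ".tar", ".gz", ".tgz", ".7z", ".rar"]);

// Walk root recursively and return one entry per regular file.
function inventory(root) {
  const entries = [];
  const walk = (dir) => {
    for (const dirent of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, dirent.name);
      if (dirent.isDirectory()) {
        walk(full);
      } else if (dirent.isFile()) {
        const data = fs.readFileSync(full);
        entries.push({
          path: path.relative(root, full),
          bytes: data.length,
          sha256: crypto.createHash("sha256").update(data).digest("hex"),
          // flag archives so they can be expanded and inventoried too
          isArchive: ARCHIVE_EXTENSIONS.has(path.extname(full).toLowerCase()),
        });
      }
    }
  };
  walk(root);
  return entries.sort((a, b) => a.path.localeCompare(b.path));
}
```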

Crowne Plaza Shuttle / Access Conference

For Access and Hackfest attendees who are staying at the Crowne Plaza Fredericton Lord Beaverbrook, daily shuttle runs between the Crowne Plaza and the Wu Conference Centre have been arranged.

Tuesday, Oct. 4

  • Crowne to Wu: 7:30am & 7:50am
  • Wu to Crowne: 4:15pm & 4:35pm

Wednesday, Oct. 5

  • Crowne to Wu: 7:30am & 7:50am
  • Wu to Crowne: 4:30pm & 4:50pm

Thursday, Oct. 6

  • Crowne to Wu: 7:30am & 7:50am
  • Wu to Crowne: 5:10pm & 5:30pm

Friday, Oct. 7

  • Crowne to Wu: 7:30am & 7:50am
  • Wu to Crowne: 4:45pm

See? We love you that much! Thanks to the Crowne Plaza and UNB Conference Services for helping make this service available!

The Collective Approach: Reinventing Affordable, Useful, and Fun Professional Development / In the Library, With the Lead Pipe

In Brief: In 2014, a small group of librarians at the University of Tennessee set out to redefine the library conference landscape. Frustrated by the high cost and lack of tangible skills and takeaways at professional organization gatherings, they conceived of a low-cost, high-value symposium where academic librarians might learn, create, and collaborate together. The resulting event, The Collective, first took place in February 2015 and is now an annual opportunity for librarians from around the US and the globe to redefine scholarly communication and professional development in a fun and creative platform. The Collective model offers a practical and repeatable blueprint for other librarians or academics looking to further reinvent and revolutionize their continuing education and convocation.

by Ashley Maynor and Corey Halaychik



Current Professional Development Landscape

There are a number of professional organizations that serve library professionals, many of which offer annual conferences where librarians come together to share knowledge and skills and to learn about new products. These gatherings, however, tend to be costly for participants, rely heavily on trade industry sponsorships (which may influence programming decisions), and tend towards “show and tell” or “sage on a stage” presentations with little time dedicated to audience interaction. Few, if any, offer transparency in their review process (e.g., sharing reviewer names or qualifications, disclosing the procedure or nature of the review process, sharing feedback with submitters). There is also often a large span of time between the proposal process and the conference itself; as just one example, the American Library Association solicits proposals fifteen months before its annual conference.

At their worst, these gatherings deliver stale programming, bait-and-switch session descriptions, and undue corporate influence on panels and program content, with a low return on the registration fee and cost of attendance for conference goers. Discounts are often tiered or offered only to select individuals. It is common for conferences to offer “early bird” registrations and special rates for presenters, organizers, or other privileged individuals. Furthermore, many conferences highlight differences among attendee types using paraphernalia, such as ribbons, to designate organizers, presenters, sponsors, committee members, and the like as “special” attendees. The gatherings are often large (400+ attendees). Their size, combined with the typical presentation format, often makes for an intimidating environment for connecting with new people.

Figure 1: 2015 Registration and lodging costs and conferred benefits by conference. Data taken from official conference websites.

The Collective Mission & Values

The Collective is designed as an alternative to these large, expensive, traditional professional gatherings that compose the professional development landscape. Figure 1 above shows how The Collective measures up in terms of its costs and conferred benefits compared to some of the most well-known conferences for academic librarians. Its mission is to support learning, networking, and kick-starting new ideas among next-generation librarians and library stakeholders, where the participants determine the content.

The Collective seeks to achieve the following goals:

  • To dissolve the traditional conference divide between “presenters” and “attendees” by making everyone active participants.
  • To host a low-cost event where all participant costs are subsidized and everyone, even conference organizers, pays an equally low registration fee. We believe participants should receive more value than the registration fee paid, as opposed to the traditional profit-generating conference.
  • To eliminate vendor expo areas and create an event climate where vendors are treated as colleagues who can learn and collaborate with us to better serve our users. We believe developing relationships is far more effective than hard sales and we think sessions should contain content, not sales pitches.
  • To have a practitioner-oriented gathering—aimed at librarians on the front lines rather than highlighting administrators or those in positions of top-level leadership.
  • To offer interactive sessions—no “sage on a stage,” with an emphasis on tangible takeaways, networking, conversation, hands-on activities, and collaboration.
  • To offer transparency and fairness in the proposal review process. Session content is solicited through a public forum (see Figure 2) with a public voting process combined with a blind peer review and committee review. We offer feedback on all submissions, including all non-accepted proposals. To help keep our content relevant, we shorten the lag between proposals and the event; ours is less than six months.

To help librarians “learn, create and collaborate” as our tagline professes, we have carefully designed our programming process to support these goals.


The quality of a conference and its utility to participants often correlate with the quality of its programming, so we sought to program The Collective in a new way to ensure high-quality content. The overall style of The Collective draws on the best of conferences and unconferences alike, including THATCamp and The Conversation (a unique gathering in the film industry held in 2008 and 2010). We hope to achieve a balance of the flexibility, surprise, and active participation of an unconference combined with the organization, programming rigor, and geographic diversity of a national or international conference.

For the main program, conference organizers solicit session topics, ideas, and feedback through a transparent, inclusive three-step process. Rather than create a niche conference to serve a limited set of librarians, we solicit ideas each year around a broad theme that encourages cross-pollination among attendee types and librarian roles. Past themes include Libraries as Curators & Creators (2015), Adopt, Adapt, Evolve: Reinvigorating & Rejuvenating Our Libraries (2016), and Make It Beautiful, Make It Usable (2017).

First, ideas for conference sessions are solicited through a public “Session Picker,” an online, public idea generation, commenting, and voting platform inspired by the SXSW Interactive Conference PanelPicker (Figure 2). First-round submissions are quick and easy: all that’s required is a title, a three-sentence description, and an indication of session format. The formats encouraged include but are not limited to lightning talks, pecha kucha, dork shorts, interactive plenaries, interactive panels, roundtable discussions, hands-on workshops, Q&A sessions, small group breakouts, skill-building workshops, and make, hack, and play sessions.

Figure 2: A screenshot from the 2017 Session Picker.

While some conferences include workshops for additional fees pre- or post-conference, we aim to make every single session a worthwhile opportunity for hands-on learning, discussion, skill-building, and/or networking with no special fees or paid events. Proposals are solicited the summer/fall before the gathering through dozens of listservs, The Collective website, social media, and strategic partnerships. At this early proposal stage, not every presenter in a session needs to be known; in fact, we encourage prospective attendees to use the platform to network outside their known circle to find additional presenters to collaborate with. Collective organizers will also assist with finding session collaborators via a call for participation. Lastly, unlike at many conferences, individuals are free to suggest sessions they would find interesting even if they won’t be directly involved in organizing them.

When the picker closes, a programming committee of academic librarians reviews the proposals for feasibility, diversity, interest (based in part on public votes), and quality. At this stage, some submitters might be encouraged to combine or collaborate on proposals if theirs are similar. Most proposals are invited to round two – peer review. Those that do not make it to the second round are rejected due to content overlap, lack of interactivity, or otherwise failing to meet the spirit of The Collective motto (i.e. “learn, create, collaborate”).

In the second phase, invitations are sent to round one applicants, who are asked to submit more detailed proposals, roughly 200-500 words, with special attention to the format and interactivity of their session. Submitters are encouraged to outline any handouts/tip sheets, tangible takeaways or skills, or anticipated outcomes of their session. Each of these proposals is then scored on a rubric (Figure 3) and commented on by at least two (and usually three) outside reviewers. The review panel is composed of a rotating combination of academic librarians, library product designers/vendors, and independent information professionals. While reviewers do not know the identity of the submitters, the reviewer names and qualifications are posted on the conference website. We also screen all submissions for any obvious conflicts of interest and assign reviews according to the topic of the session (as designated by the submitter) vis-à-vis a particular reviewer’s qualifications (Figure 4).

Reviewers are asked to evaluate submissions on relevance to the upcoming Collective’s theme, whether or not the topic is timely and current, the interest or novelty of the session’s approach, whether or not the presentation is conducive to participation, and evidence that the idea can and will be further developed before the actual event (Figure 3). Sessions are then ranked according to their public vote and peer review score before the final phase, program committee review.

Figure 3: A scoring rubric from the 2015 Collective peer review.

Figure 4: Topic areas for proposals for the 2016 Collective and for reviewer expertise.

The programming committee carefully examines each proposal, its ranking, and the balance among session types or focus for making final selections. Regardless of whether or not a session is selected for inclusion, each submitter receives their rubric scores, vote tally, and anonymized comments from the peer review. Often, non-accepted proposal submitters are invited to collaborate with accepted sessions or may reply to open calls for participation for accepted sessions. Because of the emphasis on interactive sessions, the traditional hierarchy of presenter and non-presenter is subverted; even those not presenting will have much to gain from attending The Collective.

Finally, the organizers of all accepted sessions are required to participate in a planning phone call with a member of The Collective’s programming team. This phone call is used to assist in further development of interactive components, to discuss technical needs, decide on space requirements, and to troubleshoot any issues that session organizers are having. This personal, one-on-one contact helps ensure a smooth event, creates a personal connection before the occasion, and ensures that what is proposed on paper can be translated in person.

We strive to treat programming as ongoing relationships with professionals in our field rather than a simple “submit and forget it” process. The programming committee endeavors to develop relationships of support that begin with the proposal submission, continue with the planning call, and extend beyond the gathering as a long-term network of peers.

Importantly, The Collective’s programming is not limited to sessions from the open call. We also include two non-keynote plenary sessions. In the past, these have included an interactive discussion of library leadership and a board-game networking mashup. Day one of The Collective traditionally closes with the “Failure Confessions” – a series of lightning talks, followed by an open mic, where librarians share stories about spectacular failures we can all learn from.

To encourage informal networking and collaboration, our meeting space also features various pop-up “unconference” spaces, which include the IdeaLibrary, TinkerLab, and Shhh Room. The IdeaLibrary (Figure 5) is a space for impromptu brainstorming, networking, and discussion. We provide inspirational books on creativity, notepads, and other supplies to help participants kick-start conversations. The TinkerLab (Figure 6) is a mobile makerspace with some simple tools and kits for hands-on exploration and demonstrations of gaming, conductivity and DIY electronics, music-making, 3-D printing, and prototyping. Past available equipment included Creopop pens, Ozobot robots, 3Doodlers, Makey-Makeys, and LittleBits electronic kits. The Shhh Room is a space free of cell phones and computers and dedicated to quiet reflection. The space is also equipped with yoga mats, seat cushions, and meditative coloring books.


Figure 5: Books and a postcard mailing station in the 2016 IdeaLibrary.


Figure 6: Photos from the 2016 TinkerLab.

Because of the highly interactive, hands-on nature of the sessions, we do not stream or record sessions; instead, we emphasize real-time, face-to-face interaction. To help takeaways travel home with the attendees, we aim to document and share information from each year through community notetaking. Each session has a designated volunteer notetaker who takes notes in a Google document that is open for editing and additions from all participants. Documents such as slides, handouts, and takeaways are shared post-conference through both Sched, our online schedule manager, and Google Docs.

The conference closes with a door prize raffle, open to everyone who completes our conference feedback survey. Immediately following the raffle, we host an open mic for sharing the best lessons learned at the conference and feedback. The unedited, anonymous survey results are made public each year and are used to continually rethink and improve future events.

Networking & Local Connections

A major focus of The Collective is providing multiple opportunities for attendees to network with their peers. This is an important aspect of The Collective which builds a sense of community among attendees and, more importantly, presents opportunities for future collaboration outside of the annual gathering. We believe that the best networking events are those that build camaraderie through informal, shared experiences. We also use our networking events as a way to introduce attendees to the city of Knoxville as a great place to live and work by highlighting the best of Knoxville’s local dining and entertainment scenes. Professional gatherings often seem to take place in a vacuum, confined to the grounds of sterile conference centers. At The Collective, however, we’ve made concerted efforts to emphasize the place of our gathering—Knoxville, Tennessee, and just minutes from the University of Tennessee—and conduct the business of the meeting so as to benefit the community we call home.

Rather than host lunch or dinner at a hotel or conference center, we make it easy for participants to explore the city by locating our gathering a few minutes’ walk from downtown, providing custom maps of the city with recommended dining spots, and hosting our social events outside of the meeting space. Our first social offering is optional dine-arounds the evening before the conference: Dutch-treat dinners at downtown restaurants arranged by local hosts. These small-group dinners provide an easy way to dine well in Knoxville and get to know other attendees ahead of the main event. Each group of diners is led by a local host who not only leads the group’s walk to and from dinner but also answers questions about Knoxville attractions, what to expect at The Collective, and so on.

We also partner with our local tourism board, VisitKnoxville, to use uncommon spaces for our complimentary reception and dinner for all attendees. In 2015, we hosted a Blues and BBQ night atop the iconic Sunsphere, built for the 1982 World’s Fair. In 2016, we organized a Southern Speakeasy at The Standard event venue in downtown Knoxville, where the fifteen-piece Streamliners Jazz Orchestra performed while participants had their photos taken in our photo booth and enjoyed cocktails, a catered dinner from locally owned Rickard Ridge Barbeque, and cakes from nationally renowned Mag-Pies.

Other community efforts include working with local artists to design artwork for our conference tote, hiring local musicians for our reception, sourcing branded swag from local suppliers, and using locally owned restaurants and bakers to cater our receptions. We’ve also scheduled our event during a typically slow tourist season for our city. This timing benefits not only our community but also our participants: hotel room rates are lower at this time of year, which helps attendees stretch their travel budgets.

Finally, each closing night, we invite attendees to join in a “Literary Libations” pub crawl, organized in partnership with VisitKnoxville, our local tourism bureau. Local establishments offer appetizer specials and literary-themed cocktails to entice Collective attendees to come out and socialize; in 2016, the closing-night outing wrapped up with an exclusive, attendee-only party at the Oliver Royale restaurant and bar. This self-directed socializing showcases our city and also helps boost our local economy in the off-season, which makes it an attractive partnership opportunity for venues.

Incentives & Partnerships

In the same way we’re breaking the mold of how to host a gathering, we also seek to redefine how an organization works with event sponsors. First, we don’t exchange money for programming, directly or implicitly. Instead, we gladly work with vendors who wish to submit session proposals that move beyond the “show and tell” and sales demonstrations so common in the conference landscape. We require vendor sessions to meet the same standards of participation as other sessions, and they must not skew towards a sales pitch.

Rather than looking for one or two sponsors to foot the bill of the conference, we ask for small amounts of money from many different organizations. We keep it simple by offering sponsorship opportunities set at two funding levels. Each level provides a package of benefits including registrations, sponsorship acknowledgment, and, in the case of the higher funding tier, the opportunity to include marketing materials in the conference tote. We aim to foster interactions between vendor representatives and librarians who normally wouldn’t interact with one another, which helps to redefine vendors not as “other” than librarians but as thoughtful partners who share the same commitment to getting the right information to the right user at the right time.

In this spirit, we do not provide a vendor expo area or marquee or sponsored programming sessions. This approach helps ensure that we don’t privilege one product or service over another and also means smaller businesses and start-ups have a chance to participate in our unique event, which we believe fosters meaningful and long-term relationships with librarians instead of elevator pitches as folks walk the conference floor.

We keep our conference bag design free of any advertising, and our commemorative drink bottle displays only The Collective logo. Furthermore, we strongly encourage sponsoring organizations to send technical experts, product development managers, and user experience employees in place of sales staff as a way to connect product users directly with those designing the products.

Additionally, we create opportunities for all sponsor representatives to participate as equals during the gathering. All attendees (regardless of status) are invited and encouraged to participate in any session they like. By removing the sales-focused channels common at conferences (expos, product demos, etc.), we believe we also remove the tension between vendors and librarians. Because vendors aren’t tied up focusing on sales, they are free to participate in the sessions. Both of these factors help create an environment of idea exchange in which everyone can help redefine the relationship between vendor and librarian from adversarial to collaborative partnership.

Finally, The Collective was started in large part to relieve discontent with the state of many contemporary conferences. While we are excited about what the Collective provides, we believe the movement to change the landscape is more important than any one particular conference or “brand.” We therefore welcome organizers from other conferences to participate in The Collective and willingly share our documentation and provide advice to anyone interested in starting their own non-profit, regional gatherings.

Logistics & Funding

The size of a conference, its scheduling, and event overlap can greatly color the participant’s experience. So, while we encourage and receive registrations from more than 36 states and 140 organizations, we intentionally limit the size of the gathering (170 in 2015, 225 in 2016, 300 in 2017) and don’t plan to grow beyond an annual attendance of 350, so that attendees can easily meet new people and not feel intimidated by the environment. We use scheduling software (Sched) to make our session schedule easy to navigate, and we run no more than four concurrent sessions at any one time. The programming committee takes great pains to distribute session content so that like-themed sessions do not compete in the same time slots and there’s something for everyone during each session period. We also keep the pop-up spaces open for the duration of the gathering, so that there is always an informal unconference alternative to any planned session.
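The article doesn’t say how the programming committee actually spreads sessions across slots. As a hypothetical illustration of the two constraints described above (no more than four concurrent sessions, and no two like-themed sessions in the same slot), a simple first-fit assignment might look like this in Ruby. The `Session` struct, the `assign_slots` method, and the sample sessions and themes are all invented for this sketch.

```ruby
# Hypothetical sketch: first-fit assignment of sessions to time slots.
# Constraints taken from the text: at most four concurrent sessions per
# slot, and no two sessions with the same theme in the same slot.
# Session titles and themes below are invented for illustration.
Session = Struct.new(:title, :theme)

MAX_CONCURRENT = 4

def assign_slots(sessions, max_concurrent: MAX_CONCURRENT)
  slots = [] # each slot is an Array of Session
  sessions.each do |session|
    # Find the first slot that has room and no session on the same theme.
    slot = slots.find do |s|
      s.size < max_concurrent && s.none? { |other| other.theme == session.theme }
    end
    if slot
      slot << session
    else
      slots << [session] # no compatible slot yet, so open a new one
    end
  end
  slots
end

sessions = [
  Session.new("Makerspace on a budget", "making"),
  Session.new("Prototyping with LittleBits", "making"),
  Session.new("Game-based instruction", "teaching"),
  Session.new("UX card sorting", "ux"),
  Session.new("Active learning in one-shots", "teaching"),
  Session.new("Open data workshop", "data")
]

assign_slots(sessions).each_with_index do |slot, i|
  puts "Slot #{i + 1}: #{slot.map(&:title).join('; ')}"
end
```

First-fit is the simplest possible choice: it never puts two same-theme sessions side by side, but it doesn’t try to minimize the number of slots or balance topics across the day, which a real committee would also weigh.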

The cost of any professional development can be significant, and how funds are used shapes the conference goer’s experience just as much. Because we offer smaller sponsorship levels, we need to contact more organizations about funding, so we dedicate a significant amount of time to fundraising, with the campaign typically starting around the time of ALA Annual (June/July preceding The Collective) and continuing until the first day of the event. Further complicating this funding challenge is our pledge to be an affordable alternative to large mega-conferences. The cost of attendance is heavily subsidized by sponsorship funds, with registration fees covering roughly half of the actual per-person cost of the event, so we must raise more funds than the typical event since costs are not passed on to attendees.

While the amount and source of funding received matters to us, equally important is how funds are used. The Collective employs a number of strategies for being good stewards of sponsorship dollars so that we can do more with less. First, we use a mixed approach to procuring goods and services; when possible, we borrow material from other institutions, such as the University of Tennessee, the University of Dayton, or North Carolina State University libraries, especially for our TinkerLab. Borrowing materials such as laptop carts, additional microphones, etc., significantly cuts down on the expense of buying or renting overpriced AV technology.

We also think carefully about our food and beverage costs at the meeting venue. Rather than pay exorbitant amounts for soft drinks (a canned cola at one venue was $4.00 per can consumed) or bottled water, we provide endless coffee and supply each attendee with a beautiful commemorative water bottle, which has the added benefit of being eco-friendly. We also minimize the size of our printed program (to save paper and printing costs) and instead encourage attendees to use the free Sched mobile app for our schedule.

These simple savings free up funds to purchase supplies that directly support presenters, inspire creativity, and help offset total costs. We also strive to purchase materials that can be used beyond the event as a way to demonstrate value to our institutional sponsor and promote The Collective year-round. For example, many of the demo tools and equipment we purchase for the TinkerLab can be used by the University of Tennessee Libraries, our primary institutional sponsor, throughout the year for their studio and makerspaces. While it’s a hassle to store reception décor items, we’ve found that the cost of purchasing linens (and paying to have them laundered) and reusable items such as LED candles is significantly lower than the cost of renting these supplies.

Our choice of host city is also key in keeping our costs down. We are able to leverage existing community relationships of The Collective co-directors, resulting in discounts, in-kind donations of goods and services, and unique experiences for attendees. As mentioned earlier, making The Collective a “Knoxville event” allowed us to partner with VisitKnoxville, the local tourism office. VisitKnoxville has helped negotiate conference and reception venues, resulting in lower room rates, discounted reception expenses, and free meeting space with a reasonable minimum food and beverage spend.

We also strive to keep our marketing efforts affordable by using a combination of low- or no-cost marketing channels. To keep costs low, we use printed marketing material sparingly and make sure it is not tied to a specific year so that it can be reused. We also take advantage of free modes of communication, including social media and listservs, to advertise calls for proposals and share registration updates.

We rely on an energetic volunteer staff of individuals as passionate as we are about revolutionizing the professional development landscape. Our collective, unpaid work means we can program, plan logistics, and maintain our website and marketing at no cost to the conference participant.

Building A Community

Building a community requires an online presence and identity, social media and sharing, and year-round activities. When we created The Collective, we spent many hours working with a group to develop an inspiring logo and website that represents our mission: to learn, create, and collaborate. As our logo’s Venn diagram illustrates, we look for connections, overlap, and opportunities among seemingly disparate groups and ideas.

During the event, we highly encourage the use of Twitter, Instagram, and Facebook posts to share information and takeaways. Additionally, our collaborative note-taking documents live on via our website for those unable to attend in person or those wishing to review the content of past gatherings.

We also design our conference swag to provide year-round brand presence and awareness. Our unique conference totes offer fun designs that are free of any marketing and sure to spark conversation. Our complimentary drink bottles are high-quality and branded only with The Collective logo to help attendees remember their great experience throughout the year.

While The Collective’s unique approach to professional development offers plenty of opportunities for networking and collaboration during the gathering, we believe that the good work born from it shouldn’t end when the event does. We have therefore focused on building a year-round community among our attendees: we use social media to keep alumni and future participants informed, to give them a way to connect after the annual event has concluded, and to promote the work and celebrate the successes of past attendees. Social media is also used to advertise meetups between Collective attendees at other conferences. Finally, this article itself is an attempt to create documentation and share it with like-minded individuals interested in hosting a similar event.

The focus we have given to community building has paid off in terms of an increasing number of and increased involvement from attendees year after year. In our first year, we were surprised to have attendance from over 170 individuals from 31 states. (We would have been pleased with a regional gathering of 70!) In 2016, we moved to a larger venue and attendance between 2015 and 2016 increased by 40%, with participants hailing from over 140 institutions. This diversity has been especially helpful in gaining wider points of view with regard to programming preferences, and we are especially excited to see growing geographic diversity, with our first international participants from Canada and Sweden in 2016.


With two successful gatherings completed and a third in planning, we believe we have found a recipe for organizing and executing a successful and useful professional development event. Those wishing to revitalize or start their own event can employ the following tips to ensure their own success:

  1. There’s no substitute for excellent content. Make programming the main focus of the event; aim to attract and offer sessions that allow for learning, creating, and collaboration. Keep activities fresh, ensure participants walk away with tangible skills, and open the door to the sharing of ideas and future collaboration. We strongly suggest eschewing traditional “sage on the stage” style conference presentations in favor of hands-on or highly interactive sessions that make the participants the focus instead of the presenter. This interactivity brings more people into the conversation, opens the door to better discovery and more interaction, and builds a stronger sense of community.
  2. Make the event about everyday practitioners. Administrators can certainly bring a wealth of knowledge, expertise, and unique points of view to sessions, but we believe that the ratio of administrators to practitioners at a conference should reflect the real-world landscape. This ensures that those individuals who are in the field have an opportunity both to share their experiences and to learn from colleagues who face the same daily challenges. Furthermore, all sessions should offer an opportunity for the free exchange of ideas. No one should be harassed or put down for their ideas; while dissent is an important part of discussion, it should be expressed respectfully and in an atmosphere of openness.
  3. Because librarians don’t work in a vacuum, we believe professional development shouldn’t either. Conferences planned around a broad theme rather than a job specialization facilitate cross-pollination between various groups and stakeholders, which can lead to a better understanding of personal roles and creative solutions to common challenges. It also opens the door for broader collaboration among the multitude of specializations that exist in today’s universities.

Finally, work hard to keep costs down. Professional development shouldn’t feel like tithing, and participants will be more energized, and more likely to return, if they feel the value-to-cost ratio is high. Keeping registration rates affordable also lowers the barrier to entry for librarians with small or non-existent travel budgets. This creates a broader range of experiences, opinions, and points of view during sessions, which improves the overall quality of the idea exchange taking place.


Many thanks to the peer reviewers for this article, Bethany Messersmith and Kathy Hart, and publishing editor Sofia Leung for their contributions. Thanks also to the many volunteers and attendees of The Collective who have made our dream of better professional development a reality.

Works Cited & Further Reading

THATCamp – The Humanities and Technology Camp – is “an open, inexpensive meeting where humanists and technologists of all skill levels learn and build together in sessions proposed on the spot.” Read more about this inexpensive unconference format here:

The Conversation – This grass-roots gathering was “developed by a group of filmmakers, investors, entrepreneurs, journalists, and consultants interested in the new creative and business opportunities — and new ways of connecting with audiences.” It took place at a pivotal moment for film distribution in 2008 and 2010. See for more information.

SXSW Festival PanelPicker – South by Southwest uses a three-part process to select content for its annual Interactive Conference that combines public votes, an advisory board, and staff. This format helped inspire our three-part programming process. Read about it here:

The Collective –

New Theses & Dissertations site / Brown University Library Digital Technologies Projects

Last week, we went live with a new site for Electronic Theses and Dissertations.


My part in the planning and coding of the site started back in January, and it was nice to see the site go into production (although we do have more work to do with the new site and shutting down the old one).

Old Site


The old site was written in PHP and only allowed PhD dissertations to be uploaded. Ingesting the dissertations into the BDR was a multi-step process: use a PHP script to grab the information from the database and turn it into MODS, split and massage the MODS data as needed, map the MODS data files to the corresponding PDFs, and run a script to ingest each dissertation into the BDR. The process worked, but it could be improved.

New Site

The new site is written in Python and Django. It now allows master’s theses as well as PhD dissertations to be uploaded. Ingesting theses and dissertations into the BDR will be a simple process: select the theses/dissertations in the Django admin when they are ready to ingest, and run the ingest admin action; the site will know how to ingest them into the BDR in the correct format.

Nicolini (5) / Ed Summers

In Chapter 5 Nicolini takes a look at how practice theories have been informed by activity theory. Activity theory was pioneered by the psychologist Lev Vygotsky in the 1920s and 1930s. Since Vygotsky, activity theory has grown and evolved in a variety of directions that are all characterized by attention to the role of objects and to the role of conflict or dialectic in human activity. Nicolini spends most of the chapter looking specifically at cultural and historical activity theory.

He starts off by recalling the previous discussion of Marx, particularly his description of work in Das Kapital, where work is broken up into a set of interdependent components:

  1. the worker
  2. the material upon which the worker works
  3. the instruments used to carry out the work
  4. the actions of the worker
  5. the goal towards which the worker works
  6. the product of the work

The identity of the worker is a net effect of this process. Vygotsky and other activity theorists took these rough categories and refined them. Vygotsky in particular focused attention on mediation, or how we as humans typically interact with our environments using cultural artifacts (things designed by people), and argued that language itself is an example of such an artifact. These artifacts transform both the person using them and the environment: workers are transformed by their tools.

Instead of focusing on individual behavior, activity theorists often examine how actions are materially situated at various levels: actions, activities and operations, which are a function of thinking about the collective effort involved. This idea was introduced by Leont’ev (1978). Kuutti (1996) is cited a few times, which is interesting because Kuutti & Bannon (2014) is how I found out about Nicolini in the first place (small world). To illustrate the various levels, Leont’ev gives the example of shifting gears in a car with a manual transmission: a person starts out performing the individual actions of shifting gears as they learn, but eventually these become automatic operations that are performed without much thinking during other activities such as speeding up, stopping, going up hills, etc. The operations can also be dismantled, reassembled, and recomposed to create new actions. I’m reminded of push-starting my parent’s VW Bug when the battery was dead. The example of manual transmission is particularly poignant because of the prevalence of automatic cars today, where those shifting actions have been subsumed or embodied in the automatic transmission. The actions can no longer be decomposed, at least not by most of us non-mechanics. It makes me wonder briefly about what power dynamics are embodied in that change.

It wasn’t until Y. Engeström (1987) that the focus came explicitly to bear on the social. Yrjö Engeström (who is referenced and linked in Wikipedia but there is not an article for him yet) is credited for starting the influential Scandinavian activity theory strand of work, and helping bring it to the West. The connection to Scandinavia makes me think about participatory design which came from that region, and what connections there are between it and activity theory. Also action research seems similarly inflected, but perhaps it’s more of a western rebranding? At any rate Engeström got people thinking about an activity system which Nicolini describes as a “collective, systemic, object-oriented formation”, which is summarized with this diagram:

This makes me wonder if there might be something in this conceptual diagram from Engeström for me to use in analyzing my interviews with web archivists. It’s kind of strange to run across this idea of object-oriented again outside of the computer science context. I can’t help but wonder how much cross-talk there was between psychology/sociology and computer science. The phrase is also being deployed in humanistic circles with the focus on object oriented ontology, which is a philosophy of decentering the human. It’s kind of ironic given how object-oriented programming has fallen out of favor a bit in software development, with a resurgence of interest in functional programming. But functions can be objects, but then there is the post-functionalism move, so…but I digress, completely.

This is where Cultural and Historical Activity Theory (CHAT) comes in, which is concerned with the ways in which objects are both socially and objectively constructed. It seems like an attempt at a middle path between the social sciences and the physical sciences. This focus on the material and where it intersects with the social is something I really like about this line of thought coming from Marx. It’s interesting that Nicolini uses the phrase “bite back” here to talk about how objects can affect us as well. I seem to remember running across this idea in some of Ian Bogost’s work but I can’t place it right now. It’s an interesting phrase that might be fun to follow in the literature. Anyway, CHAT is a relatively recent formulation (relative to activity theory) and credited to Miettinen & Virkkunen (2005). It seems like a useful thing to follow up on in the context of my Web archiving study because I do need a way to talk about the material of the archive, the archival system, the Web and the people working with it (archivists, researchers, etc.), and fitting the analysis into existing work will be helpful.


  • objects and agents emerge together and define each other (Miettinen & Virkkunen, 2005)
  • the object is inherently fragmented (never completely visible, seen in multiple ways)
  • objects evolve: they are always contestable and (often) contested

CHAT folks will often look at at least two activity systems, to show how interactions between activities embed social practices: Knotworking. The resulting networks (???) looks like an important paper to read to follow up on this idea.

Activity systems are, in fact, by definition internally fragmented and inconsistent. The tensions and conflicts emerging from such contradictions constitute the origin and the source of energy for the continuous change and expansion of activity systems and their components. (p. 114)

I’m wondering if there is a connection between the ideas of brokenness and fragmentation here and broken world thinking & repair. The idea of Knotworking (Y. Engeström, Engeström, & Vähäaho, 1999) specifically recalls Jackson, Gillespie, & Payette (2014). I like the idea of zooming in on sites of conflict or contradiction as a way of locating activities and practices, and seeing them as integral features and locations for dialectical processes and resolution (Marx). CHAT also stresses that these sites are useful as spaces for intervention and redesign. It is suggested that it might be necessary to engage at this level to truly understand the activity. Antonio Gramsci is cited here for his idea of organic intellectuals. Y. Engeström (2000) and Y. Engeström (2001) both look like good things to read to follow up on this idea about interventions, particularly for their focus on ethnographic methods and the necessity of generating thick description. There is also a connection back to American Pragmatism that seems important, at least for me (Miettinen, 2006).

It’s a bit early to say, but after reading this chapter about CHAT I feel like I’ve found the conceptual tool I was missing for analyzing my interview transcripts. It also situates my work on DocNow by positioning that work as an intervention for understanding, which is extremely helpful. Nicolini’s critique of a strong version of CHAT, one that turns the activity system into a thing in itself, seems very apt here. Also, some Marxists have criticized CHAT for its conservative use of Marx: fixing small local problems without looking at the larger picture.


Engeström, Y. (2000). Activity theory as a framework for analyzing and redesigning work. Ergonomics, 43(7), 960–974.

Engeström, Y. (1987). Learning by expanding: An activity-theoretical approach to developmental research. Orienta-Konsultit.

Engeström, Y. (2001). Expansive learning at work: Toward an activity theoretical reconceptualization. Journal of Education and Work, 14(1), 133–156.

Engeström, Y., Engeström, R., & Vähäaho, T. (1999). In S. Chaiklin, M. Hedegaard, & U. J. Jensen (Eds.), Activity theory and social practice: Cultural-historical approaches. Aarhus, Denmark: Aarhus University Press.

Jackson, S. J., Gillespie, T., & Payette, S. (2014). The policy knot: Re-integrating policy, practice and design in CSCW studies of social computing. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 588–602). Association for Computing Machinery.

Kuutti, K. (1996). Activity theory as a potential framework for human-computer interaction research. Context and Consciousness: Activity Theory and Human-Computer Interaction, 17–44.

Kuutti, K., & Bannon, L. J. (2014). The turn to practice in HCI: Towards a research agenda. In Proceedings of the 32nd annual ACM Conference on Human Factors in Computing Systems (pp. 3543–3552). Association for Computing Machinery. Retrieved from

Leont’ev, A. N. (1978). Activity, consciousness, personality. Prentice Hall.

Miettinen, R. (2006). Epistemology of transformative material activity: John Dewey’s pragmatism and cultural-historical activity theory. Journal for the Theory of Social Behaviour, 36(4), 389–408.

Miettinen, R., & Virkkunen, J. (2005). Epistemic objects, artefacts and organizational change. Organization, 12(3), 437–456.

Academic year / William Denton

I work at a university library, and when I analyse data I like to arrange things by academic year (September to August) so I often need to find the academic year for a given date. Here are Ruby and R functions I made to do that. Both are pretty simple—they could be better, I’m sure, but they’re good enough for now. They use the same method: subtract eight months and then find the year you’re in.

The Ruby is the shortest, and uses the Date class. First, subtract eight months, with <<.

d << n: Returns a date object pointing n months before self. The n should be a numeric value.

Rather cryptic. Then we find the year with .year, which is pretty clear. This is the function:

require 'date'

def academic_year(date)
  (Date.parse(date) << 8).year
end


> academic_year("2016-09-22")
=> 2016

The function is very short because Ruby nicely handles leap years and months of varying lengths. What is 30 October 2015 - eight months?

> Date.parse("2015-10-30") << 8
=> #<Date: 2015-02-28 ((2457082j,0s,0n),+0s,2299161j)>

2016 is a leap year—what is 30 October 2016 - eight months?

> Date.parse("2016-10-30") << 8
=> #<Date: 2016-02-29 ((2457448j,0s,0n),+0s,2299161j)>

Sensible. And the function returns a number (a Fixnum), not a string, which is what I want.
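As a cross-check on the `<<` approach, the same answer can be reached without any month arithmetic: a date from September through December belongs to the academic year starting that calendar year, and a date from January through August belongs to the previous one. In this sketch, `academic_year_by_month` is a hypothetical name, not part of the post; the block repeats the post’s `academic_year` so it runs on its own, then compares the two over every day of 2016 (a leap year) and 2017.

```ruby
require 'date'

# The post's method: shift back eight months with Date#<<, then take the year.
def academic_year(date)
  (Date.parse(date) << 8).year
end

# Hypothetical alternative for comparison: September-December belong to the
# academic year starting that calendar year; January-August to the previous one.
def academic_year_by_month(date)
  d = Date.parse(date)
  d.month >= 9 ? d.year : d.year - 1
end

# Check that the two methods agree on every day of 2016 and 2017.
(Date.new(2016, 1, 1)..Date.new(2017, 12, 31)).each do |d|
  s = d.iso8601
  raise "mismatch on #{s}" unless academic_year(s) == academic_year_by_month(s)
end
puts "both methods agree"
```

The `<<` version reads closer to the "subtract eight months" description; the month comparison may be easier to port to a language without a convenient month-shift operator.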

In R things are more complicated. The Stack Overflow question “How to subtract months from a date in R?” gives a few answers, but none are pretty. Using lubridate makes things much easier (and besides, I use lubridate in pretty much everything anyway).


library(lubridate)

academic_year <- function(date) {
  as.integer(format(floor_date(floor_date(as.Date(date), "month") - months(8), "year"), "%Y"))
}


> academic_year("2016-09-22")
[1] 2016

The floor_date function gets called twice, the first time to drop back to the start of the month, which avoids R’s problems dealing with short months and leap years (30 October minus eight months would be 30 February, which doesn’t exist):

> as.Date("2016-10-30") - months(8)
[1] NA

But you can always subtract 8 months from the first of a month. Then the function goes to 01 January of that year, pulls out just the year (“%Y”) and returns it as an integer. I’m sure it could be faster.

And once the academic year is identified, when making charts it’s nice to have September–August on the x axis. I often do something like this, with a data frame called data that has a date column:

library(dplyr) # I always use it
library(lubridate) # for month()

data <- data %>% mutate(month_name = month(date, label = TRUE))
data$month_name <- factor(data$month_name, levels = c("Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"))
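The factor trick above fixes the order of the months on the x axis. In JavaScript, for example when sorting rows before handing them to a charting library, the equivalent is a sort key that puts September first. A sketch, assuming each row carries an ISO date string; ACADEMIC_MONTHS and academicMonthIndex are hypothetical names.

```javascript
// Month labels in academic-year order, matching the factor levels above.
const ACADEMIC_MONTHS = ["Sep", "Oct", "Nov", "Dec", "Jan", "Feb",
                         "Mar", "Apr", "May", "Jun", "Jul", "Aug"];

// Position of a date's month within the academic year: Sep -> 0 ... Aug -> 11.
function academicMonthIndex(isoDate) {
  const month = Number(isoDate.slice(5, 7)); // 1-12
  return (month + 3) % 12;
}

const rows = [{ date: "2016-02-10" }, { date: "2016-09-01" }, { date: "2016-12-24" }];
rows.sort((a, b) => academicMonthIndex(a.date) - academicMonthIndex(b.date));
console.log(rows.map(r => ACADEMIC_MONTHS[academicMonthIndex(r.date)])); // [ 'Sep', 'Dec', 'Feb' ]
```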

Finding the academic year of a date could be a code golf thing, but Stack Overflow has too many rules.

Where Did All Those Bits Go? / David Rosenthal

Lay people reading the press about storage, and even some "experts" writing in the press about storage, believe two things:
  • per byte, storage media are getting cheaper very rapidly (Kryder's Law), and
  • the demand for storage greatly exceeds the supply.
These two things cannot both be true. Follow me below the fold for an analysis of a typical example, Lauro Rizatti's article in EE Times entitled Digital Data Storage is Undergoing Mind-Boggling Growth.

Why can't these statements both be true? If the demand for storage greatly exceeded the supply, the price would rise until supply and demand were in balance.

In 2011 we actually conducted an experiment to show that this is what happens. We nearly halved the supply of disk drives by flooding large parts of Thailand including the parts where disks were manufactured. This flooding didn't change the demand for disks, because these parts of Thailand were not large consumers of disks. What happened? As shown in this 2013 graph from Preeti Gupta, the price of disks immediately nearly doubled, choking off demand to match the available supply, and then fell slowly as supply recovered.

So we have two statements. The first is "per byte, storage media are getting cheaper very rapidly". We can argue about exactly how rapidly, but there are decades of factual data recording the drop in cost per byte of disk and other storage media (see Preeti's graph). So it is reasonable to believe the first statement. Anyone who has been buying computers for a few years can testify to it.

The second is "the demand for storage greatly exceeds the supply". The first statement is true, so this has to be false. Why do people believe it? The evidence for the excess of demand over supply in Rizatti's article is this graph. Where does the graph come from? What exactly do the two bars on the graph show?

The orange bars are labeled "output", which I believe represents the total number of bytes of storage media manufactured each year. This number should be fairly accurate, but it overstates the amount of newly created information stored each year for many reasons:
  • Newly manufactured media does not instantly get filled. There are delays in the distribution pipeline - for example I have nearly half a terabyte of unwritten DVD-R media sitting on a shelf. This is likely to be a fairly small percentage.
  • Some media that gets filled turns out to be faulty and gets returned under warranty. This is likely to be a fairly small percentage.
  • Some of the newly manufactured media replaces obsolete media, so isn't available to store newly created information.
  • Because of overhead from file systems and so on, newly created information occupies more bytes of storage than its raw size. This is typically a small percentage.
  • If newly created information does actually get written to a storage medium, several copies of it normally get written. This is likely to be a factor of about two.
  • Some newly created information exists in vast numbers of copies. For example, my iPhone 6 claims to have 64GB of storage. That corresponds to the amount of newly manufactured storage medium (flash) it consumes. But about 8.5GB of that is consumed by a copy of iOS, the same information that consumes 8.5GB in every iPhone 6. Between October 2014 and October 2015 Apple sold 222M iPhones, so those 8.5GB of information are replicated 222M times, consuming about 1.9EB of the storage manufactured in that year.
The mismatch between the blue and orange bars is much greater than it appears.
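The iPhone figure is easy to check. A quick sketch of the arithmetic, using decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes) as storage vendors do:

```javascript
// One copy of iOS per phone, replicated across a year's iPhone sales.
const iosBytesPerPhone = 8.5e9; // 8.5 GB
const iphonesSold = 222e6;      // October 2014 to October 2015
const exabytes = (iosBytesPerPhone * iphonesSold) / 1e18;
console.log(exabytes.toFixed(2)); // 1.89
```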

What do the blue bars represent? They are labeled "demand" but, as we have seen, the demand for storage depends on the price. There's no price specified for these bars. The caption of the graph says "Source: Recode", which I believe refers to this 2014 article by Rocky Pimentel entitled Stuffed: Why Data Storage Is Hot Again. (Really!). Based on the IDC/EMC Digital Universe report, Pimentel writes:
The total amount of digital data generated in 2013 will come to 3.5 zettabytes (a zettabyte is 1 with 21 zeros after it, and is equivalent to about the storage of one trillion USB keys). The 3.5 zettabytes generated this year will triple the amount of data created in 2010. By 2020, the world will generate 40 zettabytes of data annually, or more than 5,200 gigabytes of data for every person on the planet.
The operative words are "data generated". Not "data stored permanently", nor "bytes of storage consumed". The numbers projected by IDC for "data generated" have always greatly exceeded the numbers actually reported for storage media manufactured in a given year, which in turn as discussed above exaggerate the capacity added to the world's storage infrastructure. So where were the extra projected bytes stored?

The assumption behind "demand exceeds supply" is that every byte of "data generated" in the IDC report is a byte of demand for permanent storage capacity. In a world where storage was free there would still be much data generated that was never intended to be stored for any length of time, and would thus not represent demand for storage media.

In the real world, where Storage Will Be Much Less Free Than It Used To Be, there are at least two answers to the question Why Not Store It All? Data costs money to store and, as Maciej Cegłowski's Haunted By Data points out, it costs a whole lot more when it leaks.

WD results
There's another way of looking at the idea that Digital Data Storage is Undergoing Mind-Boggling Growth.  What does it mean for an industry to have Mind-Boggling Growth? It means that the companies in the industry have rapidly increasing revenues and, normally, rapidly increasing profits.

Seagate results
The graphs show the results for the two companies that manufacture the bulk of the storage bytes each year. Revenues are flat or decreasing, profits are decreasing for both companies. These do not look like companies faced by insatiable demand for their products; they look like mature companies facing increasing difficulty in scaling their technology.

For a long time, discussions of storage have been bedevilled by the confusion between IDC's projections for "data generated" and the actual demand for storage media. Don't get fooled. Any articles using the IDC/EMC numbers as storage demand can be ignored.

The Lenovo X240 Keyboard and the End/Insert Key With FnLk On as a Software Developer on Linux / Jason Ronallo

As a software developer I’m using keys like F5 a lot. When I’m doing any writing, I use F6 a lot to turn off and on spell correction underlining. On the Lenovo X240 the function keys are overlaid on the same keys as volume and brightness control. This causes some problems for me. Luckily there’s a solution that works for me under Linux.

To access the function keys you have to also press the Fn key. If most of what you’re doing is reloading a browser and not using the volume control, then this is a problem, so they’ve created a function lock, which is enabled by pressing Fn together with the Esc/FnLk key. The Fn key lights up and you can press F5 without using the Fn modifier key.

That’s all well and good until you get to another quirk of this keyboard: the Home, End, and Delete keys sit in the same row as the function keys, and the End key also functions as the Insert key. When function lock is on, the End key becomes an Insert key. I don’t ever use the Insert key on a keyboard, so I understand why they combined the End/Insert key. But in this combination it doesn’t work for me as a software developer. I’m continually switching between something that needs to be reloaded with F5 and an editor where I need to quickly go to the end of a line in a program.

Luckily there’s a pretty simple answer to this if you don’t ever need to use the Insert key. I found the answer on askubuntu.

All I needed to do was run the following:

xmodmap -e "keycode 118 = End"

And now even when the function keys are locked the End/Insert key always behaves as End. To make this permanent so that the mapping gets loaded when X11 starts, add xmodmap -e "keycode 118 = End" to your ~/.xinitrc.

Styling HTML5 Video with CSS / Jason Ronallo

If you add an image to an HTML document you can style it with CSS. You can add borders, change its opacity, use CSS animations, and lots more. HTML5 video is just as easy to add to your pages and you can style video too. Lots of tutorials will show you how to style video controls, but I haven’t seen anything that will show you how to style the video itself. Read on for an extreme example of styling video just to show what’s possible.

Here’s a simple example of a video with a single source wrapped in a div:

<div id="styled_video_container">
    <video src="/video/wind.mp4" type="video/mp4" controls poster="/video/wind.png" id="styled_video" muted preload="metadata" loop></video>
</div>

Add some buttons under the video to style and play the video and then to stop the madness.

<button type="button" id="style_it">Style It!</button>
<button type="button" id="stop_style_it">Stop It!</button>

We’ll use this JavaScript just to add a class to the containing element of the video and play/pause the video.

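jQuery(document).ready(function($) {
  // Add the styling class and play; remove it and pause to stop the madness.
  $('#style_it').on('click', function(){ $('#styled_video_container').addClass('style_it'); $('#styled_video')[0].play(); });
  $('#stop_style_it').on('click', function(){ $('#styled_video_container').removeClass('style_it'); $('#styled_video')[0].pause(); });
});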

Using the class that gets added, we can then style and animate the video element with CSS. This is a simplified version without vendor prefixes.

#styled_video_container.style_it  {
    background: linear-gradient(to bottom, #ff670f 0%,#e20d0d 100%); 
}
#styled_video_container.style_it video {
    border: 10px solid green !important;
    opacity: 0.6;
    transition: all 8s ease-in-out;
    transform: rotate(300deg);
    box-shadow:         12px 9px 13px rgba(255, 0, 255, 0.75);
}

Stupid Video Styling Tricks


OK, maybe there aren’t a lot of practical uses for styling video with CSS, but it is still fun to know that we can. Do you have a practical use for styling video with CSS that you can share?

HTML5 Video Caption Cue Settings in WebVTT / Jason Ronallo

TL;DR Check out my tool to better understand how cue settings position captions for HTML5 video.

Having video be a part of the Web with HTML5 <video> opens up a lot of new opportunities for creating rich video experiences. Being able to style video with CSS and control it with the JavaScript API makes it possible to do fun stuff and to create accessible players and a consistent experience across browsers. With better support in browsers for timed text tracks in the <track> element, I hope to see more captioned video.

An important consideration in creating really professional-looking closed captions is placing them correctly. I don’t rely on captions, but I do increasingly turn them on to improve my viewing experience. I’ve come to appreciate some attributes of really well-done captions. Accuracy is certainly important. The captions should match the words spoken. As someone who can hear, I see inaccurate captions all too often. Thoroughness is another factor. Are all the sounds important for the action represented in captions? Captions will often include a “music” caption, but other sounds, especially those off screen, are often omitted. But accuracy and thoroughness aren’t the only factors to consider when evaluating caption quality.

Placement of captions can be equally important. The captions should not block other important content. They should not run off the edge of the screen. If two speakers are on screen you want the appropriate captions to be placed near each speaker. If a sound or voice is coming from off screen, the caption is best placed as close to the source as possible. These extra clues can help with understanding the content and action. These are the basics. There are other style guidelines for producing good captions, and producing them well is something of an art form. A caption more than two rows long is usually too much, and rows ought to be split at phrase breaks. Periods should be used to end sentences and usually mark the end of a single cue. Pleasing phrasing takes judgment.

While there are tools for doing this proper placement for television and burned-in captions, I haven’t found one for Web video. I don’t have such a tool yet, but in the following I’ll show you how to:

  • Use the JavaScript API to dynamically change cue text and settings.
  • Control placement of captions for your HTML5 video using cue settings.
  • Play around with different cue settings to better understand how they work.
  • Style captions with CSS.

Track and Cue JavaScript API

The <video> element has an API which allows you to get a list of all tracks for that video.

Let’s say we have the following video markup which is the only video on the page. This video is embedded far below, so you should be able to run these in the console of your developer tools right now.

<video poster="soybean-talk-clip.png" controls autoplay loop>
  <source src="soybean-talk-clip.mp4" type="video/mp4">
  <track label="Captions" kind="captions" srclang="en" src="soybean-talk-clip.vtt" id="soybean-talk-clip-captions" default>
</video>

Here we get the first video on the page:

var video = document.getElementsByTagName('video')[0];

You can then get all the tracks (in this case just one) with the following:

var tracks = video.textTracks; // returns a TextTrackList
var track = tracks[0]; // returns TextTrack

Alternately, if your track element has an id you can get it more directly:

var track = document.getElementById('soybean-talk-clip-captions').track;

Once you have the track you can see the kind, label, and language:

track.kind; // "captions"
track.label; // "Captions"
track.language; // "en"

You can also get all the cues as a TextTrackCueList:

var cues = track.cues; // TextTrackCueList

In our example we have just two cues. We can also get just the active cues (in this case only one so far):

var active_cues = track.activeCues; // TextTrackCueList

Now we can see the text of the current cue:

var text = active_cues[0].text; 

Now the really interesting part is that we can change the text of the caption dynamically and it will immediately change:

track.activeCues[0].text = "This is a completely different caption text!!!!1";

Cue Settings

We can also then change the position of the cue using cue settings. The following will move the first active cue to near the top of the video (line 0 is the topmost line).

track.activeCues[0].line = 1;

The cue can also be aligned to the start of the line position:

track.activeCues[0].align = "start";

Now for one last trick we’ll add another cue with the arguments of start time and end time in seconds and the cue text:

var new_cue = new VTTCue(1, 30, "This is the text of the new cue.");

We’ll set a position for our new cue before we place it in the track:

new_cue.line = 5;

Then we can add the cue to the track:

track.addCue(new_cue);
And now you should see your new cue for most of the duration of the video.

Playing with Cue Settings

The other settings you can play with include position and size. Position is the text position as a percentage of the width of the video. The size is the width of the cue as a percentage of the width of the video.

While I could go through all of the different cue settings, I found it easier to understand them after I built a demonstration of dynamically changing all the cue settings. There you can play around with all the settings together to see how they actually interact with each other.

At least as of the time of this writing, there is some variability in how different browsers apply these settings.

Test WebVTT Cue Settings and Styling

Cue Settings in WebVTT

I’m honestly still a bit confused about all of the optional ways in which cue settings can be defined in WebVTT. The demonstration outputs the simplest and most straightforward representation of cue settings. You’d have to read the spec for optional ways to apply some cue settings in WebVTT.
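In its simplest form, a cue's settings are just space-separated name:value pairs appended to the cue's timing line. A minimal sketch of producing that form from a settings object, assuming only the settings discussed above (line, position, size, align); formatTimestamp and cueTimingLine are hypothetical helpers, not part of the TextTrack API. The example mirrors the VTTCue created earlier (1 to 30 seconds, line 5).

```javascript
// Format seconds as a WebVTT timestamp, e.g. 90.5 -> "00:01:30.500".
function formatTimestamp(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor(ms / 60000) % 60;
  const s = Math.floor(ms / 1000) % 60;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms % 1000, 3)}`;
}

// Build a timing line with settings. position and size are percentages;
// a bare number for line is a line index rather than a percentage.
function cueTimingLine(start, end, settings = {}) {
  const parts = [];
  if (settings.line !== undefined) parts.push(`line:${settings.line}`);
  if (settings.position !== undefined) parts.push(`position:${settings.position}%`);
  if (settings.size !== undefined) parts.push(`size:${settings.size}%`);
  if (settings.align !== undefined) parts.push(`align:${settings.align}`);
  return [`${formatTimestamp(start)} --> ${formatTimestamp(end)}`, ...parts].join(" ");
}

console.log(cueTimingLine(1, 30, { line: 5, align: "start" }));
// 00:00:01.000 --> 00:00:30.000 line:5 align:start
```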

Styling Cues

In browsers that support styling of cues (Chrome, Opera, Safari), the demonstration also allows you to apply styling to cues in a few different ways. This CSS code is included in the demo to show some simple examples of styling.

::cue(.red){ color: red; }
::cue(.blue){ color: blue; }
::cue(.green){ color: green; }
::cue(.yellow){ color: yellow; }
::cue(.background-red){ background-color: red; }
::cue(.background-blue){ background-color: blue; }
::cue(.background-green){ background-color: green; }
::cue(.background-yellow){ background-color: yellow; }

Then the following cue text can be added to show red text with a yellow background:

<c.red.background-yellow>This cue has red text with a yellow background.</c>
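If cue text is generated rather than typed by hand, a small helper can wrap it in a class span of this kind. A sketch, assuming class names like those in the CSS above; escapeCueText and wrapInCueClasses are hypothetical names. Because cue text is parsed as markup, the helper escapes &, < and > first.

```javascript
// Escape the characters WebVTT treats specially in cue text.
function escapeCueText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap text in a WebVTT class span: <c.class1.class2>text</c>.
function wrapInCueClasses(text, classes) {
  return `<c.${classes.join(".")}>${escapeCueText(text)}</c>`;
}

console.log(wrapInCueClasses("This cue has red text with a yellow background.", ["red", "background-yellow"]));
// <c.red.background-yellow>This cue has red text with a yellow background.</c>
```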

In the demo you can see which text styles are supported by which browsers for styling the ::cue pseudo-element. There’s a text box at the bottom that allows you to enter any arbitrary styles and see what effect they have.

Example Video

Test WebVTT Cue Settings and Styling

HTML Slide Decks With Synchronized and Interactive Audience Notes Using WebSockets / Jason Ronallo

One question I got asked after giving my Code4Lib presentation on WebSockets was how I created my slides. I’ve written about how I create HTML slides before, but this time I added some new features like an audience interface that synchronizes automatically with the slides and allows for audience participation.

TL;DR I’ve open sourced starterdeck-node for creating synchronized and interactive HTML slide decks.

I’m not always able to use the technologies I’m talking about within the presentation itself, so I like to do it when I can. I write my slide decks as Markdown and convert them with Pandoc to HTML slides which use DZSlides for slide sizing and animations. I use a browser to present the slides. Working this way with HTML has allowed me to do things like embed HTML5 video into a presentation on HTML5 video and show examples of the JavaScript API and how videos can be styled with CSS.

For a presentation on WebSockets I gave at Code4Lib 2014, I wanted to provide another example from within the presentation itself of what you can do with WebSockets. If you have the slides and the audience notes handout page open at the same time, you will see how they are synchronized. (Beware slowness as it is a large self-contained HTML download using data URIs.) When you change to certain slides in the presenter view, new content is revealed in the audience view. Because the slides are just an HTML page, it is possible to make the slides more interactive. WebSockets are used to allow the slides to send messages to each audience member’s browser and reveal notes. I am never able to say everything that I would want to in one short 20-minute talk, so this provided me a way to give the audience some supplementary material.

Within the slides I even included a simplistic chat application that allowed the audience to send messages directly to the presenter slides. (Every talk on WebSockets needs a gratuitous chat application.) At the end of the talk I also accepted questions from the audience via an input field. The questions were then delivered to the slides via WebSockets and displayed right within a slide using a little JavaScript. What I like most about this is that even someone who did not feel confident enough to step up to a microphone would have the opportunity to ask an anonymous question. And I even got a few legitimate questions amongst the requests for me to dance.

Another nice side benefit of getting the audience to the notes page before the presentation starts is that you can include your contact information and Twitter handle on it.

I have wrapped up all this functionality for creating interactive slide decks into a project called starterdeck-node. It includes the WebSocket server and a simple starting point for creating your own slides. It strings together a bunch of different tools to make creating and deploying slide decks like this simpler, so you’ll need to look at the requirements. This is still definitely just a tool for hackers, but having this scaffolding in place ought to make the next slide deck easier to create.

Here’s a video where I show starterdeck-node at work. Slides on the left; audience notes on the right.

Other Features

While the exciting new feature in this version of the project is synchronization between presenter slides and audience notes, there are also lots of other great features if you want to create HTML slide decks. Even if you aren’t going to use the synchronization feature, there are still lots of reasons why you might want to create your HTML slides with starterdeck-node.

Self-contained HTML. Pandoc uses data URIs so that the HTML version of your slides has no external dependencies. Everything, including images, video, JavaScript, CSS, and fonts, is embedded within a single HTML document. That means that even if there’s no internet connection from the podium you’ll still be able to deliver your presentation.

Onstage view. Part of what gets built is a DZSlides onstage view where the presenter can see the current slide, next slide, speaker notes, and current time.

Single page view. This view is a self-contained, single-page layout version of the slides and speaker notes. This is a much nicer way to read a presentation than just flipping through the slides on various slide sharing sites. If you put a lot of work into your talk and are writing speaker notes, this is a great way to reuse them.

PDF backup. A script is included to create a PDF backup of your presentation. Sometimes you have to use the computer at the podium and it has an old version of IE on it. PDF backup to the rescue. While you won’t get all the features of the HTML presentation you’re still in business. The included Node.js app provides a server so that a headless browser can take screenshots of each slide. These screenshots are then compiled into the PDF.


I’d love to hear from anyone who tries to use it. I’ll list any examples I hear about below.

Here are some examples of slide decks that have used starterdeck-node or starterdeck.

A Plugin For Mediaelement.js For Preview Thumbnails on Hover Over the Time Rail Using WebVTT / Jason Ronallo

The time rail or progress bar on video players gives the viewer some indication of how much of the video they’ve watched, what portion of the video remains to be viewed, and how much of the video is buffered. The time rail can also be clicked on to jump to a particular time within the video. But figuring out where in the video you want to go can feel kind of random. You can usually hover over the time rail and move from side to side and see the time that you’d jump to if you clicked, but who knows what you might see when you get there.

Some video players have begun to use the time rail to show video thumbnails on hover in a tooltip. For most videos these thumbnails give a much better idea of what you’ll see when you click to jump to that time. I’ll show you how you can create your own thumbnail previews using HTML5 video.

TL;DR Use the time rail thumbnails plugin for Mediaelement.js.

Archival Use Case

We usually follow agile practices in our archival processing. This style of processing was popularized by the article More Product, Less Process: Revamping Traditional Archival Processing by Mark A. Greene and Dennis Meissner. For instance, we don’t read every page of every folder in every box of every collection in order to describe it well enough for us to make the collection accessible to researchers. Over time we may decide to make the materials for a particular collection or parts of a collection more discoverable by doing the work to look closer and add more metadata to our description of the contents. But we try not to let the perfect be the enemy of the good enough. Our goal is to make the materials accessible to researchers and not hidden in some box no one knows about.

Some of our video collections, like video oral histories, are highly curated. We’ve created transcripts for whole videos. We extract the most interesting or on-topic clips. For each of these video clips we create a WebVTT caption file and an interface to navigate within the video from the transcript.

At NCSU Libraries we have begun digitizing more archival videos. And for these videos we’re much more likely to treat them like other archival materials. We’re never going to watch every minute of every video about cucumbers or agricultural machinery in order to fully describe the contents. Digitization gives us some opportunities to automate the summarization that would be manually done with physical materials. Many of these videos don’t even have dialogue, so even when automated video transcription is more accurate and cheaper we’ll still be left with only the images. In any case, the visual component is a good place to start.

Video Thumbnail Previews

When you hover over the time rail on some video viewers, you see a thumbnail image from the video at that time. YouTube does this for many of its videos. I first saw that this would be possible with HTML5 video when I saw the JW Player page on Adding Preview Thumbnails. From there I took the idea to use an image sprite and a WebVTT file to structure which media fragments from the sprite to use in the thumbnail preview. I’ve implemented this as a plugin for Mediaelement.js. You can see detailed instructions there on how to use the plugin, but I’ll give the summary here.

1. Create an Image Sprite from the Video

This uses ffmpeg to take a snapshot every 5 seconds in the video and then uses montage (from ImageMagick) to stitch them together into a sprite. This means that only one file needs to be downloaded before you can show the preview thumbnail.

ffmpeg -i "video-name.mp4" -f image2 -vf fps=fps=1/5 video-name-%05d.jpg
montage video-name*jpg -tile 5x -geometry 150x video-name-sprite.jpg

2. Create a WebVTT metadata file

This is just a standard WebVTT file except the cue text is metadata instead of captions. The URL is to an image and uses a spatial Media Fragment for what part of the sprite to display in the tooltip.


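WEBVTT

00:00:00.000 --> 00:00:05.000
video-name-sprite.jpg#xywh=0,0,150,100

00:00:05.000 --> 00:00:10.000
video-name-sprite.jpg#xywh=150,0,150,100

00:00:10.000 --> 00:00:15.000
video-name-sprite.jpg#xywh=300,0,150,100

00:00:15.000 --> 00:00:20.000
video-name-sprite.jpg#xywh=450,0,150,100

00:00:20.000 --> 00:00:25.000
video-name-sprite.jpg#xywh=600,0,150,100

00:00:25.000 --> 00:00:30.000
video-name-sprite.jpg#xywh=0,100,150,100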
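Writing these cues by hand gets tedious for a long video. Since the sprite is a grid (five tiles per row from -tile 5x, each 150×100 here), the cue for thumbnail n sits at column n mod 5 and row floor(n / 5). A minimal sketch of generating the file, assuming that grid layout, a fixed interval, and a sprite URL you supply; vttTime and thumbnailVtt are hypothetical names, and "video-name-sprite.jpg" is a placeholder URL.

```javascript
// Format seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function vttTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, w) => String(n).padStart(w, "0");
  return `${pad(Math.floor(ms / 3600000), 2)}:${pad(Math.floor(ms / 60000) % 60, 2)}:` +
         `${pad(Math.floor(ms / 1000) % 60, 2)}.${pad(ms % 1000, 3)}`;
}

// One cue per thumbnail; the cue text points at that tile of the sprite
// with a spatial media fragment (#xywh=x,y,width,height).
function thumbnailVtt(spriteUrl, count, { interval = 5, columns = 5, width = 150, height = 100 } = {}) {
  const lines = ["WEBVTT", ""];
  for (let n = 0; n < count; n++) {
    const x = (n % columns) * width;
    const y = Math.floor(n / columns) * height;
    lines.push(`${vttTime(n * interval)} --> ${vttTime((n + 1) * interval)}`);
    lines.push(`${spriteUrl}#xywh=${x},${y},${width},${height}`);
    lines.push("");
  }
  return lines.join("\n");
}

console.log(thumbnailVtt("video-name-sprite.jpg", 6));
```

Here count would be the number of JPEGs ffmpeg wrote; with a different -tile or -geometry you would pass matching columns, width and height.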

3. Add the Video Thumbnail Preview Track

Put the following within the <video> element.

<track kind="metadata" class="time-rail-thumbnails" src=""></track>

4. Initialize the Plugin

The following assumes that you’re already using Mediaelement.js, jQuery, and have included the vtt.js library.

$('video').mediaelementplayer({ // selector is only an example; target your own <video>
   features: ['playpause','progress','current','duration','tracks','volume', 'timerailthumbnails'],
    timeRailThumbnailsSeconds: 5
});

The Result

See Bug Sprays and Pets with sound.


The plugin can either be installed using the Rails gem or the Bower package.


One of the DOM API features I hadn’t used before is MutationObserver. One thing the thumbnail preview plugin needs to do is know what time is being hovered over on the time rail. I could have calculated this myself, but I wanted to rely on MediaElement.js to provide the information. Maybe there’s a callback in MediaElement.js for when this is updated, but I couldn’t find it. Instead I use a MutationObserver to watch for when MediaElement.js changes the DOM for the default display of a timestamp on hover. Looking at the time code there then allows the plugin to pick the correct cue text to use for the media fragment. MutationObserver is more performant than the now-deprecated MutationEvents. I’ve experienced very little latency using a MutationObserver, which allows it to trigger lots of events quickly.

The plugin currently only works in the browsers that support MutationObserver, which is most current browsers. In browsers that do not support MutationObserver the plugin will do nothing at all and just show the default timestamp on hover. I’d be interested in other ideas on how to solve this kind of problem, though it is nice to know that plugins that rely on another library have tools like MutationObserver around.

Other Caveats

This plugin is brand new and works for me, but there are some caveats. All the images in the sprite must have the same dimensions. The durations for each thumbnail must be consistent. The timestamps currently aren’t really used to determine which thumbnail to display; instead the choice is faked by relying on the consistent durations. The plugin just does some simple addition and plucks out the correct thumbnail from the array of cues. Hopefully in future versions I can address some of these issues.
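Picking a thumbnail from consistent durations rather than from timestamps amounts to dividing the hovered time by the fixed interval. A sketch of that lookup, plus pulling the xywh coordinates back out of the cue text, assuming cues shaped like the WebVTT file above; thumbnailIndex and parseXywh are hypothetical names, not the plugin's own functions.

```javascript
// With every thumbnail covering the same number of seconds, the cue for a
// hovered time is found by division rather than by searching cue timestamps.
function thumbnailIndex(hoverSeconds, intervalSeconds, cueCount) {
  const index = Math.floor(hoverSeconds / intervalSeconds);
  return Math.min(Math.max(index, 0), cueCount - 1); // stay inside the cue array
}

// Extract x, y, width, height from a spatial media fragment such as
// "video-name-sprite.jpg#xywh=150,0,150,100"; returns null if there is none.
function parseXywh(cueText) {
  const match = /#xywh=(?:pixel:)?(\d+),(\d+),(\d+),(\d+)/.exec(cueText);
  if (!match) return null;
  const [x, y, w, h] = match.slice(1).map(Number);
  return { x, y, w, h };
}

console.log(thumbnailIndex(27.3, 5, 6)); // 5
console.log(parseXywh("video-name-sprite.jpg#xywh=0,100,150,100")); // { x: 0, y: 100, w: 150, h: 100 }
```

The x and y values could then be applied as a negative background-position on a tooltip element sized to w by h.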


With this feature available for our digitized video, we’ve already found things in our collection that we wouldn’t have seen before. You can see how a “Profession with a Future” evidently involves shortening your life by smoking (at about 9:05). I found a spinning spherical display of Soy-O and synthetic meat (at about 2:12). Some videos switch between black & white and color, which you wouldn’t know just from the poster image. And there are some videos, like talking heads, that appear from the thumbnails to have no surprises at all. But maybe you like watching boiling water for almost 13 minutes.

OK, this isn’t really a discovery in itself, but it is fun to watch a head banging JFK as you go back and forth over the time rail. He really likes milk. And Eisenhower had a different speaking style.

You can see this in action for all of our videos on the NCSU Libraries’ Rare & Unique Digital Collections site and make your own discoveries. Let me know if you find anything interesting.

Preview Thumbnail Sprite Reuse

Since we already had the sprite images for the time rail hover preview, I created another interface to allow a user to jump through a video. Under the video player is a control button that shows a modal with the thumbnail sprite. The sprite alone provides a nice overview of the video that allows you to see very quickly what might be of interest. I used an image map so that the rather large sprite images would only have to be in memory once. (Yes, image maps are still valid in HTML5 and have their legitimate uses.) jQuery RWD Image Maps allows the map area coordinates to scale up and down across devices. Hovering over a single thumb will show the timestamp for that frame. Clicking a thumbnail will set the current time for the video to be the start time of that section of the video. One advantage of this feature is that it doesn’t require the kind of fine motor skill necessary to hover over the video player time rail and move back and forth to show each of the thumbnails.

This feature was added and deployed to production just this week, so I’m looking for feedback on whether folks find it useful, how to improve it, and any bugs they encounter.

Summarization Services

I expect that automated summarization services will become increasingly important for researchers as archives do more large-scale digitization of physical collections and collect more born digital resources in bulk. We’re already seeing projects like fondz which autogenerates archival description by extracting the contents of born digital resources. At NCSU Libraries we’re working on other ways to summarize the metadata we create as we ingest born digital collections. As we learn more about what summarization services and interfaces are useful for researchers, I hope to see more work done in this area. And this is just the beginning of what we can do with summarizing archival video.

Social Media For My Institution – a new LITA web course / LITA

Social Media For My Institution: from “mine” to “ours”

Instructor: Dr. Plamen Miltenoff
Wednesdays, 10/19/2016 – 11/9/2016
Blended format web course

Register Online, page arranged by session date (login required)

This course has been re-scheduled from a previous date.

A course for librarians who want to explore the institutional application of social media, based on an established academic course at St. Cloud State University, “Social Media in Global Context.” This course will critically examine the institutional need for social media (SM) and juxtapose it with private use. It will discuss the mechanics of choosing recent and future SM tools, present a theoretical introduction to the subculture of social media, and show how to align library SM policies with the goals and mission of the institution. There will be hands-on exercises on creating and disseminating textual and multimedia content and on patron engagement. The course will also include brainstorming on strategies suited to the institution regarding resources (human and technological), workload sharing, storytelling, branding, and related issues such as privacy and security.

This is a blended format web course:

The course will be delivered as 4 separate live webinar lectures, one per week on Wednesdays, October 19, 26, November 2, and 9 at 2pm Central. You do not have to attend the live lectures in order to participate. The webinars will be recorded and distributed through the web course platform, Moodle, for asynchronous participation. The web course space will also contain the exercises and discussions for the course.

Details here and Registration here


By the end of this class, participants will be able to:

  • Move from personal use of social media (SM) to an institutional approach
  • Gain hands-on experience finding and selecting multimedia resources and applying them to the branding of the institution
  • Acquire the foundational structure of the elements that constitute meaningful institutional social media

Dr. Plamen Miltenoff is an information specialist and Professor at St. Cloud State University. His education includes several graduate degrees in history, Library and Information Science, and education. His professional interests encompass social Web development and design, gaming and gamification environments. For more information see

And don’t miss other upcoming LITA fall continuing education offerings:

Beyond Usage Statistics: How to use Google Analytics to Improve your Repository
Presenter: Hui Zhang
Tuesday, October 11, 2016
11:00 am – 12:30 pm Central Time
Register Online, page arranged by session date (login required)

Online Productivity Tools: Smart Shortcuts and Clever Tricks
Presenter: Jaclyn McKewan
Tuesday, November 8, 2016
11:00 am – 12:30 pm Central Time
Register Online, page arranged by session date (login required)

Questions or Comments?

For questions or comments, contact LITA at (312) 280-4268 or Mark Beatty,

Spotlight on tax evasion: Connecting with citizens and activists working on tax justice campaigns across Africa / Open Knowledge Foundation

Open Knowledge International is coordinating the Open Data for Tax Justice project in partnership with the Tax Justice Network to create a global network of people and organisations using open data to inform local and global efforts around tax justice.

Tax evasion, corruption and illicit financial flows rob countries around the world of billions in revenue which could be spent on improving life for citizens.

That much can be agreed. But how many billions are lost, who is responsible and which countries are worst affected? Those are difficult questions to answer given the lack of transparency and public disclosure in many tax jurisdictions.

The consensus is that it is the economies of the world’s poorest countries which are proportionally most affected by this revenue loss, with African governments estimated to be losing between $30 billion and $60 billion a year to tax evasion or illicit financial flows, according to a 2015 report commissioned by the African Union and United Nations.

International bodies have been slow to produce solutions that ensure the equitable sharing of tax revenues, and lobbying has led to a retrenchment of proposed transparency measures and scuppered efforts to create a global tax body under the auspices of the UN.


More transparency and public information are needed to understand the true extent of these issues. To that end, Open Knowledge International is coordinating the Open Data for Tax Justice project with the Tax Justice Network to create a global network of people and organisations using open data to improve advocacy, journalism and public policy around tax justice.

And last week, I joined the third iteration of the International Tax Justice Academy, organised by the Tax Justice Network – Africa, to connect with advocates working to shed light on these issues across Africa.

The picture they painted over three days was bleak: Dr Dereje Alemayehu laid out how the views of African countries had been marginalised or ignored in international tax negotiations due in part to a lack of strong regional power blocs; Jane Nalunga of SEATINI-Uganda bemoaned politicians who continue to “talk left, walk right” when it comes to taking action on cracking down on corrupt or illicit practices; and Professor Patrick Bond of South Africa’s Witwatersrand School of Governance foresaw a rise in violent economic protests across Africa as people become more and more aware of how their natural capital is being eroded.

Several speakers said that an absence of data, low public awareness, lack of political will and poor national or regional coordination all hampered efforts to generate action on illicit financial flows in countries across Africa. Everyone agreed that these debates are not helped by the opacity of key tax terms like transfer pricing, country-by-country reporting and beneficial ownership.

The governments of South Africa, Nigeria, Kenya and Tanzania may have all publicly pledged measures like creating beneficial ownership registers to stop individuals hiding their wealth or activities behind anonymous company structures. But at the same time, a key concern of those attending the academy was the closing of civic space in many countries across the continent, which makes it harder for them to carry out their work and investigate such activities.

Michael Otieno of the Tax Justice Network – Africa told delegates that they should set the advocacy agenda around tax to ensure that human rights and development issues could be understood by the public in the context of how taxes are collected, allocated and spent. He encouraged all those present to combine forces by adding their voices to the Stop the Bleeding campaign to end illicit financial flows from Africa.

Through our Open Data for Tax Justice project, Open Knowledge International will be looking to incorporate the views of more civil society groups, advocates and public policy makers like those at the tax justice academy into our work. If you would like to join us or learn more about the project, please email

Evergreen 2.9.8 and 2.10.7 released / Evergreen ILS

We are pleased to announce the release of Evergreen 2.9.8 and 2.10.7, both bugfix releases.

Evergreen 2.9.8 fixes the following issues:

  • When a price is added to the Acquisitions Brief Record price field, it will now propagate to the lineitem estimated price field.
  • Declares UTF-8 encoding when printing from the catalog to resolve issues where non-ASCII characters printed incorrectly in some browsers.
  • Fixes an issue where the circ module sometimes skipped over booking logic even when booking was running on a system.

Evergreen 2.10.7 fixes the same issues fixed in 2.9.8, and also fixes the following:

  • Fixes an issue where the workstation parameter was not passed through the login function, causing problems with opt-in settings and transit behaviors.

Please visit the downloads page to retrieve the server software and staff clients.

AVAILABLE: Fedora Camel Component 4.4.4 / DuraSpace News

From Aaron Coburn, systems administrator and programmer, Academic Technology Services, Amherst College

Amherst, MA: I would like to announce the 4.4.4 release of the Fedora Camel component.

This is a patch release that deprecates two endpoint options: transform and tombstone. Those options are still available in this release, but they will log a warning; they will be completely removed in the 4.5.0 release.

On the User Experience of Ebooks / LibUX

When it comes to ebooks I am in the minority: I prefer them to the real thing. The aesthetic or what’s-it about the musty trappings of paper and ink or looming space-sapping towers of shelving just doesn’t capture my fancy. But these are precisely the go-to attributes people wax poetic about, and you can’t deny there’s something to it.

Books - Lots of them

In fact, beyond convenience, ebooks don’t have much to offer in terms of user experience. And they are certainly not as convenient as they could be.

All the storytelling power of the web is lost on such a stubbornly static industry where print – where it should be most advantageous – drags its feet. Write in the gloss on, but not in an ebook; embellish a narrative with animation at the New York Times (a newspaper), but not in an ebook; share, borrow, copy, paste, link-to anything but an ebook.

Note what is lacking when it comes to ebook’s advantages: the user experience. True, some people certainly prefer an e-reader (or their phone or tablet), but a physical book has its advantages as well: relative indestructibility, and little regret if it is destroyed or lost; tangibility, both in regards to feel and in the ability to notate; the ability to share or borrow; and, of course, the fact a book is an escape from the screens we look at nearly constantly. At the very best the user experience comparison (excluding the convenience factor) is a push; I’d argue it tilts towards physical books.

Ben Thompson
Disconfirming Ebooks

All things being equal, what ebooks lack could be made up for by the near-zero cost of their distribution, but ebooks are rarely discounted and are often more expensive than print – which is especially costly considering that readers neither own nor can legally migrate their ebook-as-licensed-software to a device, medium, or format where the user experience could be improved.

This aligns with data demonstrating that while ebook access is increasing, thanks both to the proliferation of internet-connected devices and to the growing number of ebook lending programs in libraries, the number of people reading ebooks isn’t meaningfully pulling away from the number reading print – as we all imagined it might when this stuff was science fiction.

Reading Habits

Grim reader

Similar reports last year that seemed to signal the death of the ebook (oh, and hey: you might like my podcast on the ebookalypse) were a misreading that totally ignored sales of ebooks without ISBNs (you know, the self-publishers!). Those sales proved not that the ebook was a lost cause but that Amazon dominates because of the ubiquity of the Kindle and its superior bookstore.

There, big-publisher books are held to a fixed price, while authors can add and easily publish good content on the cheap through an Amazon-controlled interface. Again we see how investing in even a slightly better user experience than everyone else is at the crux of creating a monopoly.

Ebook reading tends to be objectively better on a Kindle, and so the entire ebook market largely funnels through Amazon.

  • the price of ebooks is competitively low – or even free
  • ebooks, through Kindles or the Kindle App, are easy to download
  • ebooks are still largely encumbered by DRM, but readers already have a Kindle, so they don’t need inconvenient additional software or, worse, to read on a computer
  • since ebook reading kind of sucks on other platforms, there’s not really that much incentive in the present to port Kindle books anyway
  • features like WhisperSync enhance the reading experience in a way that isn’t available in print

This is sort of what I was lamenting when I wrote, “all the storytelling power of the web is lost on such a stubbornly static industry where print – where it should be most advantageous – drags its feet.”

Other vendors, particularly those available to libraries, have so far managed to provide only a middling user experience that does little to make them desirable to either libraries or patrons. So, print wins out.

Jobs in Information Technology: September 21, 2016 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Accellion, Marketing Database Manager, Palo Alto, CA

Nassau Library System, Assistant Director – Technology Operations, Uniondale, NY

Northampton Community College, Assistant Director, Library Services, Bethlehem, PA

California State University, San Bernardino, Information Technology and Web Services Librarian, San Bernardino, CA

UMass Dartmouth, Assistant/Associate Librarian – Arts and Humanities, Dartmouth, MA

Drexel University Libraries, Developer, Data Infrastructure, Philadelphia, PA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Now Everybody Knows Their Names… / SearchHub

As previously mentioned: On October 13th, Lucene/Solr Revolution 2016 will once again be hosting “Stump The Chump” in which I (The Chump) will be answering tough Solr questions — submitted by users like you — live, on stage, sight unseen.

Today, I’m happy to announce the Panel of experts that will be challenging me with those questions, and deciding which questions were able to Stump The Chump!

In addition to taunting me with the questions, and ridiculing all my attempts to stall while I rack my brain for answers, the Panel members will be responsible for deciding which questions did the best job of “Stumping” me and awarding prizes to the folks who submitted them.

Information on how to submit questions can be found on the session agenda page, and I’ll be posting more details with the Chump tag as we get closer to the conference.

(And don’t forget to register for the conference ASAP if you plan on attending!)

10 Ways to Use the Primary Source Sets in Your Classroom / DPLA

Now that the school year is well underway, we are already hearing great things about how educators and students across the country are putting DPLA and its education resources to use. The Primary Source Sets, in particular, were designed to be versatile and adaptable for a broad variety of classroom environments, lessons, assignments and grade levels, so we wanted to share a few different ideas that demonstrate that versatility in action!

Part of an 1830 pamphlet printed by the Cherokee nation discussing Indian Removal. Courtesy of Hargrett Library via Digital Library of Georgia.

Document-Based Questions, or DBQs, ask students to critically engage with primary sources and use evidence to support an argument or position. DBQs have traditionally been a hallmark of the AP History class to prep for the exam, but we see room for broad applications of DBQs across a variety of courses all year long.  Pull sources from the sets to devise a DBQ for your students or assign a question from one of the teaching guides.

Example: Question 4 from Jacksonian Democracy?: “Using Jackson’s message to Congress concerning Indian Removal and the 1830 pamphlet by the Cherokee nation, determine whether Indian Removal was a democratic action taken by the federal government or an invasion of Cherokee sovereignty.”

Ask students to analyze, interpret, or respond to a specific primary source from the sets to kick off your class session or lesson unit. Alternately, let students pick one source from a set and respond.

Example: Ask students to examine this photograph from the Immigration and Americanization, 1880-1930 set to begin your class session on late nineteenth century immigration. What does it reveal about the experience of immigrating to the US? What questions does it raise?

Have students pick a set and use the sources in their next research project on that topic. For a more focused selection, try a thematic subset like Science and Technology or Women.

Each Primary Source Set Teaching Guide has at least one suggested classroom activity.  Try a new way of bringing primary sources to life in your classroom:

A photo of a student protester carrying a sign depicting a burned draft card, 1969. Courtesy of Suffolk University, Moakley Archive & Institute via Digital Commonwealth.

Be Creative – Explore where history meets social media in this activity from the set on The Things They Carried.
Take a Stand – Students create a vintage radio or TV advertisement in small groups to raise awareness about polio prevention in this activity from “There is no cure for Polio.”
Engage – Have students explore the Civil Rights Movement in stations using primary sources after reading The Watsons Go To Birmingham – 1963 in this activity.
Debate – Student teams stake a claim and debate each other in this activity from the Texas Revolution set.

Use the primary source sets to help students make connections between past and present and add historical perspective to the headlines and news stories we see every day.

Ida B. Wells and Anti-Lynching Activism may offer an important historical counterpart to the #BlackLivesMatter Movement.

Sets on the Fifteenth Amendment and Fannie Lou Hamer and the Civil Rights Movement in Mississippi could help contextualize voting rights activism today.

Sets on immigration provide a historical lens for contemporary news stories about immigration of Latinos, Muslims, and Syrian refugees.

Photograph of a Charleston dance contest in St. Louis on November 13, 1925 from The Great Gatsby set. Courtesy of the Missouri History Museum via Missouri Hub.

Use the primary source sets to add historical and cultural context to works of literature.

Teacher Testimonials:
“I use the literature primary source sets after we read each novel. It’s especially helpful for students to see the connection between what we read as fiction and in the real world.”

“I just started my first semester…where I learned that one of the student learning outcomes for literature courses is something like ‘students will be able to situate literary texts within their cultural contexts.’  This learning outcome is being assessed right now, and there is some room for improvement. Primary source sets to the rescue!”

Starting a unit on American Colonization, the Revolutionary War, or the Civil War and Reconstruction? Use the time period filters to see all the sets from that era and mix and match sources from sets to complement your lesson and help students make connections between topics and ideas.

Teacher Testimonial:
“I will use materials from several sets, including the Underground Railroad, to teach the novel Kindred this semester.”

An American poster discouraging food waste to assist the European Allies. Courtesy of North Carolina Department of Cultural Resources via North Carolina Digital Heritage Center.

Select five examples of a type of media featured throughout the sets and analyze how they communicate a message to their audiences. For example, analyzing five posters featured in the sets can introduce students to visual thinking and build interpretation skills.

Example: Consider starting with the World War I: America Heads to War set, which includes a great selection of posters.

After analyzing the primary sources in a set, ask students to write their own discussion questions to add to the list provided in the teaching guide. Use student-generated questions to drive class discussion and analysis of the topic.

Example: Check out the discussion questions in the teaching guide for Attacks on American Soil: Pearl Harbor and September 11 as a starting point and then add your own.

Using the DPLA sets as inspiration, have students create their own primary source sets. Student sets could be as simple as a list of links in a document or more elaborate using images on a website. Students can identify items in DPLA and write an overview about the chosen topic.

Teacher Testimonial: “Before reading Code Name Verity, which is a Young Adult historical fiction novel, students had to locate 5 different primary sources about WWII on the DPLA website and then analyze them before sharing them with the class. Students were able to easily navigate the website.”

DPLA is an ever-growing resource, and we’ll be working to create exhibitions and primary source sets and develop new educational opportunities all year, so let’s keep in touch!

  • Stay in the loop and get all the updates from the education department by joining our email list for education.
  • And let us know about your experience using the primary source sets or DPLA in your class by emailing  Your feedback will impact our future work!

Volunteer for LITA! / LITA

Do you want to…

  • learn and apply valuable skills?
  • meet colleagues from all over the US (and maybe even beyond)?
  • help your colleagues learn, grow, and have great experiences with LITA?

Then please volunteer for a LITA committee!

As the LITA Vice President, I’m responsible (along with the Appointments Committee) for making committee appointments happen. What am I looking for?

People who get things done. If you’re a worker bee, a visionary, an artist, a coder, a problem-solver, a community builder, an initiative-taker, or anyone else ready to pitch in, I want you on our committees. (Conversely, I’m not looking for anyone who’s just here for a line on their CV.)

A diverse range of people. Our committees should reflect not just librarianship today, but the fully inclusive librarianship I’d like to see tomorrow — and that starts with making sure our leaders and our voices embrace a wide range. I want to appoint people from a variety of backgrounds, including perspectives from traditionally underrepresented groups.

If you’re inclined toward accomplishment (not just participation), and/or you bring a voice we don’t hear enough of around LITA, please say so on the committee volunteer form so that we know to flag you.

Wondering what the process looks like after you’ve submitted your volunteer form? Well, assuming I’ve got the code on my appointments app right, and assuming you put a working email on your volunteer form (please do this!), you should get an email with the details within a week after submitting your form.

I’m looking forward to hearing from you!

Carla Hayden: Harnessing the Power of Technology with the Resources at the Library of Congress / Library of Congress: The Signal

This is an excerpt from the inaugural speech by Carla Hayden, the Librarian of Congress.

The 14th Librarian of Congress, Carla Hayden. Photo by Shawn Miller.

Today, through the power of technology, thousands around the country are able to watch this ceremony live. This is the opportunity to build on the contributions of the Librarians who have come before, to realize a vision of a national library that reaches outside the limits of Washington.

When I contemplate the potential of harnessing that power of technology with the unparalleled resources at the Library of Congress, I am overwhelmed with the possibilities…This Library holds some of the world’s largest collections from maps to comic books; founding documents like Thomas Jefferson’s handwritten draft of the Declaration of Independence; the full papers of 23 presidents, and the works of eminent Americans such as Samuel Morse, Frederick Douglass, Clara Barton, Leonard Bernstein, Bob Hope and Thurgood Marshall.

What is the possibility for those treasures? How are they relevant today? I am reminded of a moment during the unrest in the City of Baltimore in April 2015. The Pennsylvania Avenue Branch library was located in the center of those events. But I made the decision to keep the library open, to provide a safe place for our citizens to gather. I was there, hand in hand with the staff, as we opened the doors every morning. Cars were still smoldering in the streets. Closed signs were hanging in storefronts for blocks. But people lined up outside the doors of the library. I remember in particular a young girl coming up to me and asking, “What’s the matter? What is everyone so upset about?” She came to the library for sanctuary and understanding.

Librarian of Congress Carla Hayden reads to children from Brent Elementary school in the Young Readers Center, September 16, 2016. Photo by Shawn Miller.

I recently had the opportunity to view one of the latest Library of Congress acquisitions – the Rosa Parks Collection – which includes her family bible, the bible she carried in her purse, and her handwritten letters. In one such letter she reflects on her December 1, 1955 arrest, writing, “I had been pushed around all my life and felt at this moment that I couldn’t take it anymore.” That letter – and all of her papers – are now digitized and available online.

So anyone anywhere can read her words in her own handwriting. Read them in the classrooms of Racine, Wisconsin, in a small library on a reservation in New Mexico, and even in the library of a young girl in Baltimore, looking around as her city is in turmoil. That is a real public service. And a natural next step for this nation’s library, a place where you can touch history and imagine your future. This Library of Congress, a historic reference source for Congress, an established place for scholars, can also be a place where we grow scholars, where we inspire young authors, where we connect with those individuals outside the limits of Washington and help them make history themselves.

How do we accomplish this? By building on a legacy that depends so much on the people in this room. Not only the elected officials, who have quite a bit to say about the direction of this institution, but also the staff of the Library of Congress, my new colleagues, here on the mezzanine, watching in the Madison Hall, the Adams Café and the Montpelier Room; watching in Culpeper at the Packard Campus for audio/visual conservation; and watching at the National Library Services for the Blind and Physically Handicapped.

Public service has been such a motivating factor for me, in my life and my career. When I received the call from the White House about this opportunity, and was asked, “Will you serve?” Without hesitation I said “yes.” Throughout my career I have known the staff of the Library of Congress to be a dedicated and enthusiastic group of public servants. I look forward to working with you for years to come. But we cannot do it alone. I am calling on you, both who are here in person and those watching virtually, that to have a truly national library, an institution of opportunity for all: it is the responsibility of all.

That means collaborating with other institutions. That means private sector support and patriotic philanthropy for necessary projects like digitization. That means starting a new dialogue about connectivity to classrooms and other libraries. I cannot wait to work with all of you to seize this moment in our history. Let’s make history at the Library of Congress together.

New Open Knowledge Network chapters launched in Japan and Sweden / Open Knowledge Foundation

This month sees the launch of two new Chapters in the Open Knowledge Network: one for Japan and one for Sweden. Chapters are the most developed form of Open Knowledge Network group; they are legally independent from the organisation and affiliated with it by a Memorandum of Understanding. For a full list of our current chapters, see here, and to learn more about their structure, visit the network guidelines.

Open Knowledge Japan is one of our oldest groups. Started in 2012, the group has done a lot of work promoting open data use in government. The group is also leading the open data day effort in Japan, with more than 60 local events around the country. This is our first chapter in East Asia.

Open Knowledge Sweden, the chapter in the land that implemented the first Freedom of Information legislation in 1766, is still active in promoting FOI through their platform Fragastaten, and is very active in hacks for heritage realms. They are currently part of the EU-funded project Clarity – Open EGovernment Services. They have just launched OKawards, which will be the first award in the region to recognise Open Knowledge contributors from the public and private sector. They are our second chapter in the Nordic countries, joining their neighbours in Finland.

The Open Knowledge International global network now includes groups in over 40 countries, from Scotland to Cameroon and China to the Czech Republic. Eleven of these groups have now affiliated as chapters. This network of practice, made up of dedicated civic activists, openness specialists, and data diggers, is at the heart of the Open Knowledge International mission and at the forefront of the movement for Open.

“The launch of these new chapters emphasizes the importance of openness in East Asia and the Nordic countries,” said Pavel Richter, Open Knowledge CEO. “These chapters are a manifestation of continuous engagement by volunteers around the world to work towards more open and accountable societies. We are looking forward to following their work and supporting their efforts in the future.”


One of the many events in Japan during open data day. Credit:

The Representative Director of Open Knowledge Japan, Masahiko Shoji, added, “Open Knowledge Japan has been leading open data utilization and the open knowledge movement in Japan in cooperation with 21 experts and ten companies. We are delighted to become the official Chapter of Open Knowledge International and share this joy with the active open data communities in Japan. We would like to move forward with other Asian Open Knowledge communities and the fellows around the world.”


Members of OK SE in open data day. Credit:

Similarly, the Chairman of Open Knowledge Sweden, Serdar Temiz, said, “We are happy to be a closer part of changemakers network in Open Knowledge. To be a chapter at Open Knowledge Network is a great pleasure and a privilege. We are happy to be a part of an organization that is at the forefront of the Open Knowledge movement. It is very motivating for us that within 2 years of our initial period, OKI also recognizes our efforts in the OK community and we could become one of the few official Chapters.”

How to build an evil library catalog / Galen Charlton

Consider a catalog for a small public library that features a way to sort search results by popularity. There are several ways to measure “popularity” of a book: circulations, hold requests, click-throughs in the catalog, downloads, patron-supplied ratings, place on bestseller lists, and so forth.

But let’s do a little thought experiment: let’s use a random number generator to calculate popularity.

However, the results will need to be plausible. It won’t do to have the catalog assert that the latest J.D. Robb book is gathering dust in the stacks. Conversely, the copy of the 1959 edition of The geology and paleontology of the Elk Mountain and Tabernacle Butte area, Wyoming that was given to the library right after the last weeding is never going to be a doorbuster.

So let’s be clever and ensure that the 500 most circulated titles in the collection retain their expected popularity rating. Let’s also leave books that have never circulated alone in their dark corners, as well as those that have no cover images available. The rest, we leave to the tender mercies of the RNG.
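Here is how little code the thought experiment takes. This is a minimal Python sketch for a simulation on made-up records, not something to point at a real catalog; the record fields (`id`, `circs`, `has_cover`) and the function name are hypothetical. Capping the random scores at the highest real circulation count among the randomized titles is this sketch's own assumption about what keeps the results "plausible": no randomized title can score above one of the protected top titles.

```python
import random


def evil_popularity(titles, protected_top_n=500, seed=None):
    """Thought-experiment popularity scores; for simulation only.

    titles: list of dicts with hypothetical keys 'id', 'circs'
    (lifetime circulation count) and 'has_cover' (bool).
    Returns a dict mapping id -> popularity score.
    """
    rng = random.Random(seed)
    # The most-circulated titles keep their honest scores.
    by_circ = sorted(titles, key=lambda t: t["circs"], reverse=True)
    protected = {t["id"] for t in by_circ[:protected_top_n]}

    scores = {}
    candidates = []
    for t in titles:
        if t["id"] in protected or t["circs"] == 0 or not t["has_cover"]:
            # Top titles, never-circulated titles and coverless titles are left alone.
            scores[t["id"]] = t["circs"]
        else:
            candidates.append(t)

    if candidates:
        # Cap random scores at the highest real score among the candidates,
        # so no randomized title scores above a protected one.
        ceiling = max(t["circs"] for t in candidates)
        for t in candidates:
            scores[t["id"]] = rng.randint(1, ceiling)
    return scores
```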

What will happen? If patrons use the catalog’s popularity rankings, if they trust them — or at least are more likely to look at whatever shows up near the top of search results — we might expect that the titles with an artificial bump from the random number generator will circulate just a bit more often.

Of course, testing that hypothesis by letting an RNG skew search results in a real library catalog would be unethical.

But if one were clever enough to be subtle in one’s use of the RNG, the patrons would have a hard time figuring out that something was amiss.  From the user’s point of view, a sufficiently advanced search engine is indistinguishable from a black box.

This suggests some interesting possibilities for the Evil Librarian of Evil:

  • Some manual tweaks: after all, everybody really ought to read $BESTBOOK. (We won’t mention that it was written by the ELE’s nephew.)
  • Automatic personalization of search results. Does geolocation show that the patron’s IP address is on the wrong side of the tracks? Titles with a lower reading level just got more popular!
  • Has the patron logged in to the catalog? Personalization just got better! Let’s check the patron’s gender and tune accordingly!

Don’t be the ELE.

But as you work to improve library catalogs… take care not to become the ELE by accident.

Bookshare: 10 million accessible book downloads / District Dispatch

Hats off to Bookshare, a global literacy initiative that provides free accessible books to people unable to read standard print, as it reaches a record 10 million ebook downloads by print-disabled readers. Because 90% of books published in the United States are unavailable in accessible formats, people who are dyslexic, blind or have low vision have extremely limited access to books. Bookshare helps to bridge that gap by obtaining accessible files (when available) from over 820 publishers. Bookshare also scans titles when print is the only available format. As a result, K-12 students with a qualifying disability have free access to more than 460,000 books.

Bookshare recently reached a record 10 million ebook downloads by print-disabled readers.

Bookshare is a service provided by Benetech, a non-profit technology organization in Silicon Valley that also works on human rights and environmental issues worldwide. Jim Fruchterman, CEO and founder of Benetech, wanted to use technology to dramatically reduce the costs of creating and delivering ebooks. With grant funds from the U.S. Department of Education, Bookshare initially focused on service to the K-12 population, but last year expanded service to public libraries in Georgia, Pennsylvania and at the New York Public Library. With over 425,000 members, Bookshare joins ALA in the pursuit of providing equitable access to all people regardless of circumstance.

ALA’s Office for Information Technology Policy (OITP) has had the pleasure to work with Benetech for a number of years on advocating for national and international copyright exceptions for people with print disabilities to increase access to content and to share accessible content across borders. We have also worked with Benetech on 3D printing, exploring ways that people with disabilities can use and benefit from the latest technologies. We congratulate Benetech on this milestone and look forward to future collaborations.

The post Bookshare: 10 million accessible book downloads appeared first on District Dispatch.

Department of Labor recommends Workforce Boards work with libraries / District Dispatch

Monday afternoon I attended the Department of Labor’s Customer Centered Design Challenge, which is all about how groups are implementing new workforce programs. I started out sitting next to Teresa Hitchcock from Bakersfield, California. She told me that she had reached out to Nancy Kerr, her local library director, and together they have created an exciting partnership.

Nancy said that she had some empty space, created when she downsized her microfiche collection, but didn’t have any money for staff. They created a vibrant Teen Space with Maker technology, and the Kern Youth Partnership staffs the space. Now the space is filled with teens looking for work and entrepreneurial opportunities.

When others heard our conversation, they talked about looking for public space for their programs too. They would like to integrate their services with libraries. A woman from the San Diego Career Centers suggested that local librarians reach out to their local workforce board and ask to meet. The workforce people don’t know we’re here and are looking to partner.

The Department of Labor released a “Training and Employment Notice (TEN)” in May 2016 recommending to Workforce Boards that they work with their local library: Department of Labor Training and Employment Notice 35-15

The post Department of Labor recommends Workforce Boards work with libraries appeared first on District Dispatch.

Brief Talk at the Storage Architecture Meeting / David Rosenthal

I was asked to give a brief summary of the discussions at the "Future of Storage" workshop to the Library of Congress' Storage Architecture meeting. Below the fold, the text of the talk with links to the sources.

As always, I'm grateful to the Library of Congress for inviting me. I was asked to give a brief report of what happened at the DARPA workshop on "The Future of Storage" that took place at Columbia last May. There has yet to be a public report on the proceedings, so I can't be specific about who (other than me) said what.

Three broad areas were discussed. First, I and others looked at the prospects for bulk storage over the medium term. How long is the medium term? Hard disk has been shipping for 60 years. Flash as a storage medium is nearly 30 years old (Eli Harari filed the key enabling patent in 1988), and it has yet to make an impact on bulk storage. It is pretty safe to say that these two media will dominate the bulk storage market for the next 10-15 years.

WD unit shipments
The debate is about how quickly flash will displace hard disk in this space. Flash is rapidly displacing hard disk in every market except bulk storage. High performance, low power, high density and robustness overwhelm the higher price per byte of flash.

WD revenues
In unit volume terms, we have hit peak disk. Since disk manufacture is a volume business, these reduced unit volumes are causing financial difficulties for both major manufacturers, resulting in layoffs and manufacturing capacity reductions at both.

Seagate revenues
These financial difficulties make the investments needed to further increase densities, specifically HAMR, more difficult. The schedule of this much-delayed technology keeps slipping further into the future in real time, which is reducing the rate at which $/GB decreases, and thus making hard disk less competitive with flash. It is worth noting, though, that shingled drives are now available. We're starting to use Seagate's very affordable 8TB archive drives.
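The mechanism here is compounding: a slower annual decline in hard disk $/GB brings forward the year in which flash becomes cheaper per byte. A minimal Python sketch, using purely hypothetical prices and rates (none of these numbers come from the talk):

```python
def years_until_flash_cheaper(hdd_price, flash_price, hdd_decline,
                              flash_decline, max_years=100):
    """Return the first year in which flash $/GB <= hard disk $/GB.

    Prices are $/GB today; declines are fractional annual drops
    (0.10 means 10% cheaper each year). Returns None if there is
    no crossover within max_years.
    """
    for year in range(max_years + 1):
        if flash_price <= hdd_price:
            return year
        hdd_price *= 1 - hdd_decline
        flash_price *= 1 - flash_decline
    return None


# Hypothetical inputs for illustration only.
print(years_until_flash_cheaper(1.0, 10.0, 0.10, 0.30))  # 10
print(years_until_flash_cheaper(1.0, 10.0, 0.05, 0.30))  # 8
```

With flash starting at ten times the hard disk $/GB and falling 30% a year, halving the hard disk decline from 10% to 5% a year pulls the crossover in from year 10 to year 8.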

Exabytes shipped
Despite these difficulties, hard disk completely dominates the bytes of storage shipped. What would it take for flash to displace hard disk in the bulk storage market?

The world's capacity to make bytes of flash would have to increase dramatically. There are two possible (synergistic) ways to do this; it could be the result of either or both of:
  • Flash vs HDD Capex
    Building a lot of new flash fabs. This is extremely expensive, but flash advocates point to current low interest rates and strategic investment by Far East governments as a basis for optimism.

    But even if the money is available, bringing new fabs into production takes time. In the medium term it is likely that the fabs will come on-line, and accelerate the displacement of hard disk, but this won't happen quickly.
  • Increasing the bytes of storage on each wafer from existing fabs. Two technologies can do this: 3D flash is in volume production and quad-level cell (QLC, 4 bits/cell) is in development. Although both are expensive to manufacture, the investment in doing so is a lot less than a whole new fab, and the impact is quicker.

    Write endurance
    As the table shows, the smaller the cell and the more bits it holds, the lower the write endurance (and the lower the reliability). But QLC at a larger cell size is competitive with TLC at a smaller size. QLC isn't likely to be used for write-intensive workloads, but archival uses fit its characteristics well. Whether enough of the bulk storage market has low enough write loads to use QLC economically is an open question.
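The density gains and endurance costs above follow from simple arithmetic: a cell holding n bits must reliably distinguish 2^n charge levels, so a quad-level cell (4 bits) needs 16 levels squeezed into the same voltage window. The minimal Python sketch below shows that, plus the raw capacity gain of moving a fab's output from TLC to QLC; it assumes the same number of cells per wafer and ignores overheads such as error correction and spare area.

```python
# Bits stored per cell for common NAND flash cell types.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def charge_levels(bits):
    """Distinct charge (voltage) levels a cell must distinguish."""
    return 2 ** bits


def raw_capacity_gain(old, new):
    """Capacity ratio for the same number of cells, ignoring overheads."""
    return BITS_PER_CELL[new] / BITS_PER_CELL[old]


for name, bits in BITS_PER_CELL.items():
    print(f"{name}: {bits} bit(s)/cell, {charge_levels(bits)} levels")
print(f"TLC -> QLC: {raw_capacity_gain('TLC', 'QLC'):.2f}x raw capacity")
```

Each extra bit doubles the number of levels, narrowing the margins between them, while the capacity gain per step keeps shrinking (2x, then 1.5x, then about 1.33x). That trade-off is why each denser cell type gives up write endurance.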
Second, there was discussion of potential alternate storage media, including DNA. Nature has just amplified the hype about DNA storage with “How DNA could store all the world’s data”, based on recent research from Microsoft involving 151KB of data. I believe that DNA will be an important archival medium decades from now, but getting there will require solving huge problems:
  • Writing data to DNA needs to get 6-8 orders of magnitude cheaper. The goal of the recently announced HGP-W project is to reduce the cost by only 3 orders of magnitude in a decade. It has been getting cheaper more slowly than hard disk or flash.
  • Reading the data may be cheap but is always going to be very slow, so the idea that "DNA can store all the world's data" is misleading. At best it could store all the world's backups; there needs to be another copy on some faster medium.
  • The use of long-lived media whose writing cost is vastly greater than their reading cost is extremely difficult to justify. It is essentially a huge bet against technological progress.
  • As we see with HAMR, there is a very long way between lab demos of working storage media and market penetration. We are many years from working DNA storage media.
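To put the write-cost numbers in perspective: a goal of 3 orders of magnitude in a decade implies costs falling by a factor of about 2 each year. A minimal sketch, assuming (optimistically) that this pace could be sustained beyond the project's decade, shows that closing a 6-8 order gap would take roughly 20-27 years, which is consistent with "decades from now".

```python
def annual_factor(orders, years):
    """Yearly cost-reduction factor implied by `orders` powers of ten in `years`."""
    return 10 ** (orders / years)


def years_to_close(gap_orders, orders, years):
    """Years to close a gap of `gap_orders` powers of ten at the same compound rate."""
    return gap_orders * years / orders


# HGP-W goal as described above: 3 orders of magnitude in 10 years.
print(f"{annual_factor(3, 10):.2f}x cheaper per year")
for gap in (6, 8):
    print(f"{gap} orders: {years_to_close(gap, 3, 10):.1f} years")
```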
Third, there was discussion of aggressive compression techniques. DARPA's customers have vast amounts of surveillance video on which they do face recognition and other feature extractions, resulting in much less data to store. But for forensic purposes, for example after an attack, they would like to be able to reconstruct the small fraction of the total that was relevant. This is becoming possible. By storing a small amount of additional data with the extracted features, and devoting an immense amount of computation to the task, the original video can be recovered. This provides an amazing level of compression, but it probably isn't suitable for most archival content.

Thanks are due to Brian Berg and Tom Coughlin for input to this talk, which drew on the reporting of Chris Mellor at The Register, but these opinions are mine alone.

Welcome, Terri! / Equinox Software

Equinox is pleased to announce that we have hired a new Office Manager.  Her name is Terri Harry and we couldn’t be more thrilled to have her on board!  Terri is local to the metro Atlanta area and started work in August.

Terri completed her Associate’s degree in Liberal Arts in 1985 from Polk Community College in Florida. She pursued a degree in Industrial Engineering before family obligations put her education on hold. Terri worked at Walt Disney World for ten years before moving north to Georgia. Upon moving to Georgia, Terri was a stay-at-home mom to her two kids and menagerie of pets. For the past 16 years, she has been heavily involved in local sci-fi conventions, giving her just the skill set she needed to take over office duties at Equinox.

Equinox Vice President Grace Dunbar said this about our newest employee, “We’re so pleased to have Terri join the team here at Equinox.  I know she’ll be great at handling the ‘non-linear, non-subjective, wibbly-wobbly, timey-wimey stuff’.”

When she’s not herding cats at local conventions (Dragon Con is her favorite), she enjoys spending time with her husband of 28 years and their two kids. We’re happy to have her here at Equinox, herding all of our cats, er, employees.

Evaluating Digital Library Accessibility / LibUX

In the context of digital libraries or digital repositories, the word “accessibility” sometimes refers to access as in open access literature, which is freely available via the internet to anyone anywhere (Crawford, 2011). While digital library systems fall within the context of this paper, accessibility is not used here to refer to open access. Rather, the Web Accessibility Initiative (WAI) definition of accessibility is used: enabling people with disabilities to use the web (W3C, 2005a). Kleynhans & Fourie (2014) note the lack of definition surrounding accessibility and indicate the importance of defining it. Overall, accessibility means that users with all disability types (visual, hearing, motor, and cognitive) are able to use the website (WebAIM, 2013).

According to 2013 American Community Survey data, an estimated 12.6% of the population of the United States has a disability (U.S. Census Bureau, 2013). Web accessibility is important so that people with disabilities have equal access to the information and resources available on the web. Just as in the physical world, accessibility benefits users without disabilities too: accessible websites also serve users on slower internet connections or antiquated mobile devices (W3C, 2005a). Further, attention to website accessibility improves the usability (ease of use) and the search engine optimization (SEO) of websites (Kleynhans & Fourie, 2014; Moreno & Martinez, 2013; Nielsen, 2012; Rømen & Svanæs, 2012). Inaccessible websites widen the digital divide, because they restrict access to information on the basis of ability (Adam & Kreps, 2009).

Literature Review

Bertot, Snead, Jaeger, and McClure (2006) conducted a study to develop evaluations for assessing digital libraries. The assessment framework developed by Bertot et al. (2006) includes functionality, usability, and accessibility as determining factors for the success of digital libraries. The accessibility criteria include the provision of alternate forms of content, not using color to convey information, using clear navigation structures, and structuring tables to transform gracefully when enlarged.

Southwell and Slater (2012) evaluated academic library digital collection websites, using a variety of screen reader software. Rather than evaluate the accessibility of the website overall, the focus was placed on whether the digital item selected was screen-readable. The primary focus was to determine if digitized texts, as opposed to born-digital documents, were accessible. Thirty-eight of the libraries evaluated by Southwell and Slater used homegrown systems, and 31 used content management systems. An overwhelming majority of the libraries using content management systems for digital library content, 25 (81%), used CONTENTdm. Results of the study indicated that 42% of the items evaluated were accessible to screen readers. Typically, the absence of a transcript for image-based information was the cause of accessibility failure.

Cervone (2013) provides an overview of accessibility considerations and evaluation tools. Further, Cervone notes that many visually impaired people do not use screen readers, instead opting to use browser and computer settings to compensate for their impairments. Using responsive design to gracefully accommodate increases in text size is suggested. However, many organizations, educational institutions, and libraries are still working to integrate responsive design on their websites (Rumsey, 2014). Organizations without responsive design should be mindful of how tables reflow (Bertot et al., 2006).

Fox (2008) suggests five principles to be mindful of when developing or redesigning a digital library website: simplicity, usability, navigability and findability, standards compliance, and accessibility. Adhering to any one of these principles supports adherence to the others. For example, standards compliance sets the stage for accessibility, and accessible websites support findability of information (Moreno & Martinez, 2013).

Evaluating Accessibility

The task of evaluating accessibility is very complex. To begin with, there are a variety of standards against which to measure accessibility compliance. The most recent standard is the Web Content Accessibility Guidelines (WCAG) 2.0, which were finalized by the W3C in 2008. WCAG 2.0 was preceded by WCAG 1.0, which became a W3C Recommendation in May 1999 (Mireia et al., 2009; W3C, 2008). Section 508, § 1194.22 can also be used to evaluate the accessibility of websites; eleven of its 16 checkpoints are based on the WCAG 1.0 specification. Recent studies of accessibility typically use the WCAG 2.0 guidelines (Billingham, 2014; Ringlaben, Bray & Packard, 2014; Rømen & Svanæs, 2012). A variety of tools for automatically assessing the accessibility compliance of websites are available. Using an automated validation tool is an excellent place to start when evaluating website accessibility. However, it is essential to follow automated accessibility checks with other processes to evaluate accessibility (W3C, 2005b).
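To see both why automated checking is a good first step and why it cannot be the last one, consider a minimal Python sketch built on the standard library's html.parser. It flags two of the known-problem types reported in Tables 2 and 3: images without an alt attribute and a document with no declared language. It is not how AChecker or any other listed tool works, and it only tests whether attributes are present: it cannot tell whether alt text is meaningful, and it accepts alt="" (valid for decorative images).

```python
from html.parser import HTMLParser


class MiniChecker(HTMLParser):
    """Flags two machine-checkable WCAG 2.0 failures (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.problems = []  # (success criterion, description, (line, column))

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if tag == "img" and "alt" not in names:
            self.problems.append(("1.1.1", "Image missing alt text", self.getpos()))
        if tag == "html" and "lang" not in names:
            self.problems.append(("3.1.1", "Document language not identified", self.getpos()))


def check(html):
    checker = MiniChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


page = '<html><body><img src="header.jpg"><img src="spacer.gif" alt=""></body></html>'
for criterion, description, (line, col) in check(page):
    print(f"{criterion} {description} (line {line}, col {col})")
```

Everything this sketch cannot judge (whether the alt text is accurate, whether the language declared is the right one) is exactly the kind of question that needs the human and assistive-technology review described above.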

Beyond the variety of standards for evaluating website accessibility, the number and variety of accessibility assessment tools further complicates the assessment process. The W3C provides a list of web accessibility evaluation tools. At the time of this writing, the list, which can be filtered by guideline, language, type of tool, type of output, automaticity, and license, contained 48 accessibility evaluation tools (W3C, 2014).


Digital library websites using the CONTENTdm platform were identified using the “CONTENTdm in action” website (CONTENTdm in action, n.d.). In some cases, links to collections pointed directly to content residing on the CONTENTdm platform, while in other cases, the landing page for the collection was stand-alone, with links to the content in the CONTENTdm system for further exploration.

The differences in how digital library content is displayed provided an additional opportunity for analysis: evaluating the collection landing page and the CONTENTdm-driven page. Analyzing the two different page types provides an opportunity to identify and differentiate between accessibility issues on the collection landing pages and on the collection browse pages. Two academic library digital collections with a landing page separate from the “Browse Collection” interface were identified for analysis: the Carver-VCU Partnership Oral History Collection at the Virginia Commonwealth University (VCU) Libraries, and the Civil War in the American South Collection at the University of Central Florida (UCF). While some digital library collections’ landing pages were standalone, outside of the CONTENTdm system, both of the collection landing pages evaluated in this research project were generated within CONTENTdm.

Carver-VCU Partnership Oral History Collection
Landing page for the UCF Civil War Collection

Preliminary accessibility evaluations were conducted with several of the automated tools listed by the W3C, in order to select the most appropriate tool for formal analysis. AChecker Web Accessibility Checker was selected after evaluating the results and output format generated by the following tools: Functional Accessibility Evaluator (FAE) 2.0, HTML_CodeSniffer, WAVE Web Accessibility Evaluation Tool, Accessibility Management Platform (AMP), and Cynthia Says from HiSoftware. Each evaluation tool has strengths and weaknesses, which are outside of the scope of this paper. AChecker Web Accessibility Evaluation Tool was selected for use based on its base functionality, readability of reports and data, and data export options.

Automated evaluation was conducted using AChecker Web Accessibility Checker. Pages were evaluated at the WCAG 2.0 AA level. WCAG 2.0 level A includes specifications websites must conform to, and level AA includes the specifications websites should conform to for accessibility. The URLs listed in Appendix A were entered into the “address” field, with the option to check against “WCAG 2.0 (Level AA)” and the view-by-guideline report format. The full accessibility review was then exported to PDF using the export option.

The WCAG 2.0 guidelines were selected for evaluation because WCAG 2.0 encompasses more disability types and usability principles than WCAG 1.0 and Section 508 (Mireia et al., 2009). To be clear, it is possible for a website to meet WCAG 2.0 standards while not being functionally accessible (Clark, 2006). However, a website is certainly not accessible if it does not meet the WCAG 2.0 guidelines. Further, the automated accessibility check does not test the accessibility of individual items in the collection, as the Southwell and Slater (2012) research did.


The results of the accessibility evaluation are presented in the following three tables. Table 1 displays the highest-level overview of the results: the total number of problems identified for each page. AChecker sorts the results into three categories: known problems, likely problems, and potential problems. Issues AChecker can identify with certainty are categorized as known problems; more ambiguous barriers that “could go either way” and need human intervention to determine whether an issue exists are listed as likely problems; and issues that need human review for evaluation are listed as potential problems (Gay & Li, 2010).

Table 1: AChecker Accessibility Evaluation Results by Type of Problem

| Page | Known Problems (n) | Likely Problems (n) | Potential Problems (n) |
|---|---|---|---|
| VCU: Oral History Landing | 4 | 0 | 160 |
| VCU: Oral History Browse | 231 | 0 | 1500 |
| UCF: Civil War Landing | 3 | 0 | 180 |
| UCF: Civil War Browse | 58 | 1 | 945 |
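Because AChecker's likely and potential problems require a person to judge them, Table 1 also indicates how much of the flagged material is left for human review. A short Python sketch using only the Table 1 counts computes that share per page; treating every likely and potential problem as needing review follows the category descriptions above and is an assumption of the sketch.

```python
# (known, likely, potential) counts from Table 1.
TABLE_1 = {
    "VCU: Oral History Landing": (4, 0, 160),
    "VCU: Oral History Browse": (231, 0, 1500),
    "UCF: Civil War Landing": (3, 0, 180),
    "UCF: Civil War Browse": (58, 1, 945),
}


def review_share(known, likely, potential):
    """Fraction of all flags that need a human decision (likely + potential)."""
    total = known + likely + potential
    return (likely + potential) / total if total else 0.0


for page, counts in TABLE_1.items():
    print(f"{page}: {sum(counts)} flags, {review_share(*counts):.1%} need human review")
```

Every page comes out above 85%, even the browse pages where known problems are most numerous.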

Table 2 and Table 3 display the specific guidelines where accessibility issues were identified by AChecker, for VCU and UCF content, respectively.

Table 2: Accessibility Evaluation Known Problem Flags by Guideline – VCU

| WCAG 2.0 Guideline (Level) | Problem Detail | Landing (n) | Collection Browse (n) |
|---|---|---|---|
| 1.1 Text Alternatives (A) | Image Missing Alt Text | 1 | 1 |
| 1.3 Adaptable (A) | Missing Form Labels | 0 | 148 |
| 1.4 Distinguishable (AA) | Bold Element Used | 1 | 3 |
| 2.4 Navigable (AA) | Improper Header Nesting | 0 | 1 |
| 3.1 Readable (A) | Document Language Not Identified | 2 | 2 |
| 3.3 Input Assistance (A) | Element with More Than One Label | 0 | 1 |
| 3.3 Input Assistance (A) | Empty Label Text | 0 | 74 |

Table 3: Accessibility Evaluation Known Problem Flags by Guideline – UCF

| WCAG 2.0 Guideline (Level) | Problem Detail | Landing (n) | Collection Browse (n) |
|---|---|---|---|
| 1.1 Text Alternatives (A) | Image Missing Alt Text | 0 | 0 |
| 1.3 Adaptable (A) | Missing Form Labels | 0 | 34 |
| 1.4 Distinguishable (AA) | Bold Element Used | 1 | 3 |
| 2.4 Navigable (AA) | Improper Header Nesting | 0 | 1 |
| 3.1 Readable (A) | Document Language Not Identified | 2 | 2 |
| 3.3 Input Assistance (A) | Empty Label Text | 0 | 17 |
| 4.1 Compatible (A) | Non-unique ID Attribute | 0 | 1 |

Data Interpretation

At the outset, the primary weakness of the interpretation of the accessibility evaluation results lies in not having direct access to a CONTENTdm system as a contributor or administrator. Therefore, interpretation relies on assumptions, which are supported by the similarity of the results of the two separate digital library collections on the CONTENTdm system.

The two digital library collections evaluated presented nearly identical known accessibility issues. Two errors were identified with the VCU collection that were not identified in the UCF collection: missing image alt text (on the landing and browse pages) and an element with more than one label (collection browse page). The image missing an alt attribute is the header image for the template. Since no missing alt attribute was identified on the UCF collection, presumably the alt attribute of the header image can be modified by local administrators of the CONTENTdm system. The number of errors related to missing labels appears to be related to the number of collections available in the system. For example, 148 missing form label errors were identified on the VCU collection browse page, while only 34 were identified on the UCF collection browse page; the VCU system had 37 separate collections and the UCF system had 17. The missing form labels are related to the faceted navigation option to “add or remove other collections to the search.” Although the collections may be reached directly from a specified landing page, the absence of form labels could make it impossible for visitors using screen reader technology to navigate to other collections in the system, or to view other collections along with the current selection.
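The per-collection pattern can be checked with the counts already reported in Tables 2 and 3 and the collection totals given above. In the short Python sketch below, every one of those counts divides evenly by the number of collections on its system, which is consistent with (though not proof of) errors emitted once per collection by a shared template. The per-collection multiple differs between the two systems (4 versus 2 missing form labels), so the counts alone do not show exactly what the template emits. The dictionary layout is just a convenience for this example.

```python
# Counts from Tables 2 and 3 (collection browse pages) and the number of
# collections each CONTENTdm system exposed, as reported in the text.
SITES = {
    "VCU": {"collections": 37, "missing_form_labels": 148, "empty_label_text": 74},
    "UCF": {"collections": 17, "missing_form_labels": 34, "empty_label_text": 17},
}


def per_collection(count, collections):
    """Average number of flagged problems per collection."""
    return count / collections


for site, data in SITES.items():
    labels = per_collection(data["missing_form_labels"], data["collections"])
    empty = per_collection(data["empty_label_text"], data["collections"])
    print(f"{site}: {labels:g} missing form labels and "
          f"{empty:g} empty labels per collection")
```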


Based on the number of known problems identified on the collection browse pages in the accessibility evaluation, it is important to determine whether labels can be added by a local CONTENTdm system administrator. If a local administrator has the ability to add labels, then meaningful labels should be added for each element where labels were required. Because both collection browse pages presented the errors in the same structural location, it is likely that the missing labels are a function of how the system outputs the collection information onto the page. Where the system structure itself generates inaccessible content, advocacy for the importance and necessity of accessibility is invaluable. Clients of OCLC should strongly urge the vendor to make accessibility corrections a priority for future updates and releases of the system. When customers consider features a priority, vendors should follow suit, especially in the competitive tech marketplace that currently exists. The value of accessibility advocacy to create positive change cannot be overstated.
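Whether the fix comes from local configuration or from the vendor, a local administrator could re-test the browse page after each template change or upgrade. The minimal Python sketch below, built on the standard library's html.parser, reports form controls with no associated label. It is a hypothetical helper, not part of CONTENTdm or AChecker, and it assumes only the two standard association mechanisms (a label whose for attribute matches the control's id, or a control nested inside a label) plus aria-label; it ignores aria-labelledby and cannot judge whether label text is meaningful.

```python
from html.parser import HTMLParser


class LabelChecker(HTMLParser):
    """Collects form controls and the ids that <label for="..."> points to."""

    CONTROLS = {"input", "select", "textarea"}
    # Input types that are not rendered or get their name from value/alt.
    EXEMPT_TYPES = {"hidden", "submit", "button", "reset", "image"}

    def __init__(self):
        super().__init__()
        self.controls = []      # (id or None, (line, column)) for controls needing a label
        self.label_for = set()  # ids referenced by <label for="...">
        self.label_depth = 0    # > 0 while inside a <label> element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self.label_depth += 1
            if a.get("for"):
                self.label_for.add(a["for"])
        elif tag in self.CONTROLS:
            kind = (a.get("type") or "text").lower()
            wrapped = self.label_depth > 0  # implicit label
            if kind not in self.EXEMPT_TYPES and not wrapped and not a.get("aria-label"):
                self.controls.append((a.get("id"), self.getpos()))

    def handle_endtag(self, tag):
        if tag == "label" and self.label_depth > 0:
            self.label_depth -= 1


def unlabeled_controls(html):
    checker = LabelChecker()
    checker.feed(html)
    checker.close()
    # <label for> may come before or after its control, so match ids at the end.
    return [(cid, pos) for cid, pos in checker.controls
            if cid is None or cid not in checker.label_for]


# Made-up sample markup, not CONTENTdm's actual output.
facets = ('<input type="checkbox" id="c1"><label for="c1">Maps</label>'
          '<label>Photos <input type="checkbox"></label>'
          '<input type="checkbox" id="c3">')
print(unlabeled_controls(facets))  # only the control with id "c3"
```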

VCU Collection Browse

UCF Civil War Collection Browse Page

There is plenty of work, beyond the initial automated check, that must be done in order to evaluate and improve the accessibility of digital library collections on the CONTENTdm platform. Each of the likely problems and potential problems identified in the AChecker report should be reviewed to determine whether additional action is needed to provide accessible content. Some of the potential problems identified by AChecker include items that need a long description, issues with table structure or display, and areas where visual context may be required. Correcting potential problems related to the need for visual context, where understanding the information in an image requires being able to see the image, will provide at least some of the information needed to ensure that individual items in the collection are accessible. After corrections are made, re-evaluate the pages with the AChecker tool. Follow up automated accessibility evaluation with manual evaluation and, whenever possible, involve users with disabilities in the evaluation (Henry, 2006; W3C, 2005b). Although many people with visual impairments do not use screen readers, screen readers are invaluable evaluation tools, especially for projects where users with disabilities are not directly involved in the testing process (Southwell & Slater, 2012; W3C, 2005b).

Limitations and Recommendations for Further Research

The primary weakness of this research report is that it only scratches the surface of evaluating the accessibility of digital library content using CONTENTdm. Accessibility evaluation was conducted using only one automated assessment tool, the AChecker Web Accessibility Evaluation Tool. As Gay and Li (2010) point out, different automated accessibility evaluation tools perform different checks and identify different problems. Comparing the results from a selection of automated accessibility evaluation tools would provide valuable information about the individual strengths and weaknesses of the tools, and about when one tool can prove more beneficial than another. Although a CONTENTdm-driven landing page and browse collection page were evaluated for accessibility, no individual item detail page was evaluated. While evaluating an individual item detail page would not necessarily inform the discussion regarding individual collection item accessibility, such an analysis could identify other potentially inaccessible system structures. Another limitation of the current study is that only the accessibility issues identified as known problems were analyzed to inform the results. A great deal of data from the initial automated accessibility evaluation remains untapped in this study. Providing additional detail regarding the issues identified as likely problems and potential problems would allow for a more comprehensive view of the accessibility of the CONTENTdm system, even though this study identified some specific structural changes that are needed for accessibility. Further, accessibility assessments using other tools, such as screen readers, and additional manual accessibility evaluation would help fill in gaps in the information currently available about the accessibility of the system.
Finally, conducting accessibility studies of the CONTENTdm system with users with disabilities would help to identify any lingering accessibility issues that were not identified in the previously mentioned methods.


Accessibility management platform. (n.d.). Retrieved March 1, 2015, from

Adam, A., & Kreps, D. (2009). Disability and discourses of web accessibility. Information, Communication & Society, 12(7), 1041-1058. doi: 10.1080/13691180802552940

Bertot, J. C., Snead, J. T., Jaeger, P. T., & McClure, C. R. (2006). Functionality, usability, and accessibility. Performance Measurement and Metrics, 7(1), 17-28. doi:10.1108/14678040610654828

Billingham, L. (2014). Improving academic library website accessibility for people with disabilities. Library Management, 35(8/9), 565-581. doi: 10.1108/LM-11-2013-0107

Chowdhury, S., Landoni, M., & Gibb, F. (2006). Usability and impact of digital libraries: a review. Online Information Review, 30(6), 656-680. doi:10.1108/14684520610716153

Clark, J. (2006, May 23). To Hell with WCAG 2. Retrieved March 14, 2015, from

CONTENTdm in action. (n.d.). Retrieved January 31, 2015, from

Crawford, W. (2011). Open Access: What you need to know now. Chicago, IL, USA: American Library Association.

Functional Accessibility Evaluator 2.0. (n.d.). Retrieved March 1, 2015, from

Gay, G., & Li, C. Q. (2010). AChecker: open, interactive, customizable, web accessibility checking. Paper presented at the Proceedings of the 2010 International Cross Disciplinary Conference on Web Accessibility (W4A), Raleigh, North Carolina.

Henry, S. L. (2006). Understanding web accessibility. Web Accessibility (pp. 1-51): Apress.

HiSoftware Cynthia says portal. (n.d.). Retrieved March 1, 2015, from

HTML_CodeSniffer. (n.d.). Retrieved March 1, 2015, from

Kleynhans, S. A., & Fourie, I. (2014). Ensuring accessibility of electronic information resources for visually impaired people. Library Hi Tech, 32(2), 368-379. doi:10.1108/LHT-11-2013-0148

Mireia, R., Merce, P., Marc, B., Miquel, T., Andreu, S., & Pilar, P. (2009). Web content accessibility guidelines 2.0. Program, 43(4), 392-406. doi:10.1108/00330330910998048

Moreno, L., & Martinez, P. (2013). Overlapping factors in search engine optimization and web accessibility. Online Information Review, 37(4), 564-580. doi:10.1108/OIR-04-2012-0063

Nielsen, J. (2012). Usability 101: Introduction to usability. Retrieved October 21, 2014, from

Ringlaben, R., Bray, M., & Packard, A. (2014). Accessibility of American university special education departments’ web sites. Universal Access in the Information Society, 13(2), 249-254. doi:10.1007/s10209-013-0302-7

Rømen, D., & Svanæs, D. (2012). Validating WCAG versions 1.0 and 2.0 through usability testing with disabled users. Universal Access in the Information Society, 11(4), 375-385. doi:10.1007/s10209-011-0259-3

Rumsey, E. (2014, July). Responsive design sites: Higher ed, libraries, notables. Retrieved March 14, 2015, from

Southwell, K. L., & Slater, J. (2012). Accessibility of digital special collections using screen readers. Library Hi Tech, 30(3), 457-471. doi:10.1108/07378831211266609

Total validator. (n.d.). Retrieved March 1, 2015, from

U.S. Census Bureau. (2013). DP02 Selected Social Characteristics in the United States [Data]. 2013 American Community Survey 1-Year Estimates. Retrieved from

W3C. (2005a). Introduction to web accessibility. Retrieved October 3, 2014, from

W3C. (2005b). Selecting web accessibility evaluation tools. Retrieved March 5, 2015, from

W3C. (2008, December 11). Web content accessibility guidelines (WCAG) 2.0. Retrieved March 5, 2015, from

W3C. (2014, March). Easy checks – A first review of web accessibility. Retrieved March 5, 2015, from

W3C. (2014, December 18). Web accessibility evaluation tools list. Retrieved March 1, 2015, from

WAVE web accessibility tool. (n.d.). Retrieved March 1, 2015, from

WebAIM. (2014, April 22). Introduction to web accessibility. Retrieved March 5, 2015, from

How to advance open data research: Renewing our focus on the demand of open data, user needs and data for society. / Open Knowledge Foundation

Ahead of this year’s International Open Data Conference #iodc16, Danny Lämmerhirt and Stefaan Verhulst provide information on the Measuring and Increasing Impact Action Session, which will be held on Friday October 7, 2016 at IODC in Room E. Further information on the session can be found here.

Lord Kelvin’s famous quote “If you can not measure it, you can not improve it” equally applies to open data. Without more evidence of how open data contributes to meeting users’ needs and addressing societal challenges, efforts and policies toward releasing and using more data may be misinformed and based upon untested assumptions.

When done well, assessments, metrics, and audits can guide both (local) data providers and users to understand, reflect upon, and change how open data is designed. What we measure, and how we measure it, is therefore decisive for advancing open data.

Back in 2014, the Web Foundation and the GovLab at NYU brought together open data assessment experts from Open Knowledge International, the Organisation for Economic Co-operation and Development, the United Nations, Canada’s International Development Research Centre, and elsewhere to explore the development of common methods and frameworks for the study of open data. The result was a draft template, or framework, for measuring open data. Despite growing awareness of the need for more evidence-based approaches to open data, assessment methods have advanced only slowly since 2014. At the same time, governments publish more of their data openly, and more civil society groups, civil servants, and entrepreneurs use open data for many purposes: the broader public may detect environmental issues and advocate for policy changes, neighbourhood projects use data to enable marginalized communities to participate in urban planning, public institutions may enhance their information exchange, and entrepreneurs embed open data in new business models.


In 2015, the International Open Data Conference roadmap made the following recommendations on how to improve the way we assess and measure open data.

  1. Reviewing and refining the Common Assessment Methods for Open Data framework. This framework lays out four areas of inquiry: the context of open data, the data published, use practices and users, and the impact of opening data.
  2. Developing a catalogue of assessment methods to monitor progress against the International Open Data Charter (based on the Common Assessment Methods for Open Data).
  3. Networking researchers to exchange common methods and metrics. This helps to build methodologies that are reproducible and increase credibility and impact of research.
  4. Developing sectoral assessments.

In short, the IODC called for refining our assessment criteria and metrics, connecting researchers, and applying the assessments to specific sectors. It is hard to tell how much progress has been made on these recommendations, but there is a sense among researchers and practitioners that the first two have yet to be fully addressed.

Instead, we have seen various disparate, if well-meaning, efforts to improve understanding of the release and impact of open data. A working group was created to measure progress on the International Open Data Charter, which provides governments with principles for implementing open data policies. While this working group compiled a list of studies and their methodologies, it has not (yet) deepened the common framework of definitions and criteria to assess and measure the implementation of the Charter. In addition, there has been an increase in sector- and case-specific studies that are often more descriptive and context-specific in nature, yet do address the need for examples that illustrate the value proposition of open data.

As such, there seems to be a disconnect between top-level frameworks and on-the-ground research, preventing the sharing of common methods and the distilling of replicable experiences about what works and what does not. How to proceed and what to prioritize will be the core focus of the “Action Track: Measurement” at IODC 2016. The role of research for (scaling) open data practice and policy, and how to develop a common open data research infrastructure, will also be discussed at various workshops during the Open Data Research Symposium, and the findings will be shared during the Action Track.

In particular, the Action Track will seek to focus on:

  • Demand and use: Specifically, whether and how to study the demand for and use of open data, including user needs and data life cycle analysis (as opposed to focusing mainly on data supply or on capturing evidence of impact), given the nascent nature of many initiatives around the world; and how to identify the ways in which variables such as local context, data supply, types of users, and impact relate to each other, instead of treating them separately. To be more deductive and explanatory, and to generate insights that are operational (for instance, about which data sets to release), there may be a need to expand the area of demand and use case studies (such as org).
  • Informing supply and infrastructure: How to develop deeper collaboration between researchers and domain experts to help identify “key data” and inform the government data infrastructure needed to provide them. Principle 1 of the International Open Data Charter states that governments should provide key data open by default, yet the question remains how to identify “key” data (e.g., would that mean data relevant to society at large?). Which governments (and other public institutions) should be expected to provide key data, and what information do we need to better understand governments’ role in providing it? How can we evaluate progress in publishing these data coherently if countries organize the capture, collection, and publication of this data differently?
  • Networking research and researchers: How to develop more and better exchange within the research community to identify gaps in knowledge, develop common research methods and frameworks, and learn from each other? Possible topics to consider and evaluate include collaborative platforms to share findings (such as the Open Governance Research Exchange, OGRX), expert networks, governance for collaboration, dedicated funding, research symposia (more on ODRS below), and interdisciplinary research projects.

Make the most of this Action Track: Your input is needed

To maximize outcomes, the Measurement Action Area will gather input from conversations held before the IODC. Researchers who want to shape the future agenda of open data research are strongly encouraged to participate in the following channels:

1) The Measurement and Increasing Impact Action Session, which will take place on Friday October 7, 2016 at IODC in Room E (more details here).

2) The Open Data Research Symposium, which is further outlined below. You can follow this event on Twitter with the hashtag #ODRS16.


The Open Data Research Symposium

The Measurement and Increasing Impact Action Session will be complemented by the second Open Data Research Symposium (#ODRS16), held prior to the International Open Data Conference on October 5, 2016 from 9:00am to 5:00pm (CEST) in Madrid, Spain (view map here for exact location). Researchers interested in the Measurement and Increasing Impact Action Session are encouraged to participate in the Open Data Research Symposium.

The symposium offers open data researchers an opportunity to reflect critically on the findings of their completed research and to formulate the open data research agenda.

Special attention is paid to the question of how we can increase our understanding of open data’s use and impacts. View the list of selected papers here and the tentative conference program here.

Interested researchers may register here. Please note that registration is mandatory for participation.

This piece originally appeared on the IODC blog and is reposted with permission.

nicolini-5 / Ed Summers

In Chapter 5 Nicolini looks at how practice theories have been informed by activity theory. Activity theory was pioneered by the psychologist Lev Vygotsky in the 1920s and 1930s. Since Vygotsky, activity theory has grown and evolved in a variety of directions, all characterized by attention to the role of objects and to the role of conflict, or dialectic, in human activity. Nicolini focuses specifically on cultural and historical activity theory, which centers on practice and has been taken up by organization and management studies.

Things start off by talking about Marx again, specifically the description of work in Das Kapital, where work is broken up into a set of interdependent components:

  1. the worker
  2. the material upon which the worker works
  3. the instruments used to carry out the work
  4. the actions of the worker
  5. the goal towards which the worker works
  6. the product of the work

The identity of the worker is a net effect of this process. Vygotsky and other activity theorists took these rough categories and refined them. Vygotsky in particular focused attention on mediation: how we as humans typically interact with our environments through cultural artifacts (things designed by people), language itself being one such artifact. These artifacts transform both the person using them and the environment: workers are transformed by their tools.

Instead of focusing on individual behavior, activity theorists often examine how actions are materially situated at various levels (activities, actions, and operations), which follows from thinking about the collective effort involved. This idea was introduced by Leont’ev (1978). (???) is cited a few times, which is interesting because Kuutti & Bannon (2014) is how I found out about Nicolini in the first place (small world). To illustrate the various levels, Leont’ev gives the example of shifting gears in a car with a manual transmission: while learning, a person performs the individual actions of shifting gears deliberately, but eventually these become automatic operations performed without much thought during other activities, such as speeding up, stopping, or going up hills. The operations can also be dismantled, reassembled, and recomposed to create new actions. I’m reminded of push-starting my parent’s VW Bug when the battery was dead. The example of the manual transmission is particularly poignant given the prevalence of automatic cars today, where those shifting actions have been subsumed into, or embodied in, the automatic transmission. The actions can no longer be decomposed, at least not by most of us non-mechanics. It makes me wonder briefly about the power dynamics that are embodied in that change.

It wasn’t until Engeström (1987) that the focus came explicitly to bear on the social. Yrjö Engeström (who is referenced and linked in Wikipedia, though there is not yet an article about him) is credited with starting the influential Scandinavian strand of activity theory and with helping bring it to the West. The connection to Scandinavia makes me think about participatory design, which came from that region, and what connections there are between it and activity theory. Action research also seems similarly inflected, but perhaps it’s more of a Western rebranding? At any rate, Engeström got people thinking about an activity system, which Nicolini describes as a “collective, systemic, object-oriented formation” and which is summarized in this diagram:

Activity System

This makes me wonder whether there might be something in this conceptual diagram from Engeström for me to use in analyzing my interviews with web archivists. It’s kind of strange to run across this idea of object orientation again outside of the computer science context. I can’t help but wonder how much cross-talk there was between psychology/sociology and computer science. The phrase is also being deployed in humanistic circles with the focus on object-oriented ontology. It’s kind of ironic given how object-oriented programming has fallen out of favor a bit in software development, with a resurgence of interest in functional programming.

Kuutti, K., & Bannon, L. J. (2014). The turn to practice in HCI: Towards a research agenda. In Proceedings of the 32nd annual ACM Conference on Human Factors in Computing Systems (pp. 3543–3552). Association for Computing Machinery. Retrieved from

Leont’ev, A. N. (1978). Activity, consciousness, personality. Prentice Hall.

Copyright Clearance Center charges a mark-up / District Dispatch

It all started when I polled some librarians about recent permission fees paid for journal articles, just to have more background on the current state of interlibrary loan. If permission fees were unreasonably high, it might be a data point to share if the House Judiciary Subcommittee on Courts, Intellectual Property, and the Internet considers the U.S. Copyright Office’s senseless proposal to rewrite Section 108. I expected to be shocked by high permission fees (and I was), but I also discovered something else that I just had to share.

I received a few examples from librarians regarding a particular journal. One in particular struck me. “I received a request today for a five page article from The Journal of Nanoscience and Nanotechnology and while processing it through ILLiad, the Copyright Clearance Center (CCC) indicated a fee of $503.50. So that would be a $100 a page — call me crazy, but something doesn’t seem right to me with that fee. I went to the publisher’s website and the article is available for $113, just over $20 a page.”

I then asked CCC to explain why an article cost nearly four and a half times as much through CCC as the very same article direct from the publisher. I received a quick response from CCC that said “Unfortunately, the prices that appear in our system are subject to change at the publishers’ discretion. CCC only processes the fees that the publisher provides us.”

I discovered that the publisher, which allegedly sets the price of the permission fee, also used Ingenta document delivery as an additional online permissions service. Just as the librarian said, Ingenta only charged $113 (which is still a big number for a five-page article). I contacted the journal editor and asked about the difference, and he responded immediately via email, “You are right that article is available for $113 from Ingenta. Just download from the Ingenta website.”

The difference in price can only be explained as a huge markup by CCC. Surely processing a 5-page article request cannot cost CCC an additional $400. Think about it. CCC is giving the rights holder $113 and taking the other $390.50. Deep pockets, right?
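For readers who want to check the numbers, here is the arithmetic on the figures quoted in this post as a short Python sketch. It assumes, as the post does, that the rights holder receives the same $113 through CCC that Ingenta charges, so that the remainder is CCC's share; the variable and function names are illustrative.

```python
# Figures quoted in the post for one five-page article.
PAGES = 5
CCC_FEE = 503.50      # fee shown by CCC when processed through ILLiad
INGENTA_FEE = 113.00  # same article via Ingenta document delivery


def per_page(fee, pages=PAGES):
    """Cost per page of a permission fee."""
    return fee / pages


# Assumption: the rights holder gets the Ingenta price; the rest stays with CCC.
markup = CCC_FEE - INGENTA_FEE
ratio = CCC_FEE / INGENTA_FEE

print(f"CCC per page:     ${per_page(CCC_FEE):.2f}")      # $100.70
print(f"Ingenta per page: ${per_page(INGENTA_FEE):.2f}")  # $22.60
print(f"Difference:       ${markup:.2f}")                 # $390.50
print(f"CCC / Ingenta:    {ratio:.2f}x")                  # 4.46x
```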

But wait, there’s more. I discovered that the publisher of the journal is American Scientific Publishers, a publisher on the predatory journal blacklist. (Holy cow!) Predatory journals are bogus journals that charge publication fees to gullible scholars and researchers, essentially posing as reputable publications. Because these journals have no editorial board and no peer review, academics are duped into publishing with a journal they believe to be trustworthy.

Here’s where we are. CCC is collecting permission fees nearly four and a half times the amount charged by other permission services for journal articles from likely bogus publications. Are they sending any of the permission fees collected to the predatory journal publishers? And if they are, isn’t this a way to help predatory journals stay in business? Trustworthy publishers surely would not like that. In any case, with predatory journals numbering in the thousands, CCC has discovered a very large cash cow.

For years, the CCC masqueraded as a non-profit organization until the Commissioner of Internal Revenue caught up with them in 1982, in Copyright Clearance Center, Inc. v. Commissioner of Internal Revenue. Now that CCC is a privately held, for-profit company, we have limited information on its financials, but we do know that in 2011 (according to a CCC press release), they distributed over 188 million dollars to rights holders. That’s a big number from five years ago. How much money they pocketed for themselves is unknown, but I think we can rest assured that it was more than enough to jointly fund (with the Association of American Publishers) Cambridge University Press et al v. Patton et al, a four-year-long lawsuit against Georgia State University’s e-reserve service. (They lost but are requesting an appeal.)

CCC is making a lot of money collecting permission fees, even on public domain materials and disreputable journal publications. Their profit margin could be as high as Elsevier’s! Academics are duped by predatory journals that are apparently doing fairly well financially. Libraries are paying high permission fees to the CCC unless they know to pay the predatory journal directly, and either way they keep the predatory journal people in the black. As if the traditional scholarly communication cycle could get any more absurd!

The post Copyright Clearance Center charges a mark-up appeared first on District Dispatch.

News Search at Bloomberg / SearchHub

As we count down to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Solr Committer Ramkumar Aiyengar’s talk, “Building the News Search Engine”.

Meet the backend that drives News Search at Bloomberg LP. In this session, Ramkumar Aiyengar talks about how he and his colleagues have successfully pushed Solr into uncharted territory over the last three years, delivering a real-time search engine critical to the workflow of hundreds of thousands of customers worldwide.

Ramkumar Aiyengar leads the News Search backend team at the Bloomberg R&D office in London. He joined Bloomberg from his university in India and has been with the News R&D team for nine years. He started working with Apache Solr/Lucene four years ago, and is now a committer to the project. Ramkumar is especially curious about Solr’s search distribution, architecture, and cloud functionality. He considers himself a Linux evangelist, and is one of those weird geeky creatures who considers Lisp beautiful and believes that Emacs is an operating system.

Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloomberg LP from Lucidworks

Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr, on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

Using Google Statistics for your Repository – a new LITA webinar / LITA

Beyond Usage Statistics: How to use Google Analytics to Improve your Repository

Presenter: Hui Zhang
Tuesday, October 11, 2016
11:00 am – 12:30 pm Central Time

Register Online, page arranged by session date (login required)

Librarians and repository managers are increasingly asked to take a data-centric approach to content management and impact measurement. Usage statistics, such as page views and downloads, have been widely used to demonstrate repository impact. However, usage statistics limit your ability to identify user trends and patterns, such as how many visits come from crawlers, originate from a mobile device, or are referred by a search engine. Knowing these figures will help librarians optimize digital content for better usability and discoverability. This 90-minute webinar will teach you the concepts of metrics and dimensions, along with hands-on activities showing how to use Google Analytics (GA) on library data from an institutional repository. Be sure to check the details page for takeaways and prerequisites.
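In Google Analytics terms, a metric is a quantity (sessions, pageviews, downloads) and a dimension is an attribute used to break that quantity down (device category, traffic source). The minimal Python sketch below illustrates the distinction on a handful of hypothetical session records; it does not call any Google Analytics API, and the field names, including the `is_bot` crawler flag, are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical session records from a repository. Field names are
# illustrative and are NOT the Google Analytics schema.
sessions = [
    {"device": "desktop", "source": "google",   "is_bot": False, "downloads": 1},
    {"device": "mobile",  "source": "google",   "is_bot": False, "downloads": 0},
    {"device": "desktop", "source": "(direct)", "is_bot": True,  "downloads": 3},
    {"device": "tablet",  "source": "bing",     "is_bot": False, "downloads": 2},
    {"device": "mobile",  "source": "(direct)", "is_bot": False, "downloads": 1},
]


def sessions_by(records, dimension, exclude_bots=True):
    """Metric = session count, broken down by one dimension."""
    return Counter(
        r[dimension] for r in records if not (exclude_bots and r["is_bot"])
    )


def metric_by(records, metric, dimension, exclude_bots=True):
    """Sum any numeric metric for each value of a dimension."""
    totals = Counter()
    for r in records:
        if exclude_bots and r["is_bot"]:
            continue  # drop crawler traffic before aggregating
        totals[r[dimension]] += r[metric]
    return totals


print(dict(sessions_by(sessions, "device")))
print(dict(metric_by(sessions, "downloads", "source")))
```

Filtering crawlers before aggregating is what separates this from raw usage counts: the same downloads metric looks different once the bot session is excluded.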

Details here and Registration here

Hui Zhang is the Digital Application Librarian at Oregon State University Libraries and Press. He has years of experience in generating impact reports with major platforms such as DSpace and Hydra Sufia using Google Analytics or a local statistics index. Beyond repository development, his interests include altmetrics, data visualization, and linked data.

And don’t miss other upcoming LITA fall continuing education offerings:

Social Media For My Institution; from “mine” to “ours”
Instructor: Plamen Miltenoff
Starting Wednesday October 19, 2016, running for 4 weeks
Register Online, page arranged by session date (login required)

Online Productivity Tools: Smart Shortcuts and Clever Tricks
Presenter: Jaclyn McKewan
Tuesday November 8, 2016
11:00 am – 12:30 pm Central Time
Register Online, page arranged by session date (login required)

Questions or Comments?

For questions or comments, contact LITA at (312) 280-4268 or Mark Beatty,

Carousels Are Okay / LibUX

I recorded this episode at 2 a.m. this morning, because I’ve been feeling pretty good about the consistency of this podcast lately and by gosh I am not going to ruin it over a little something like sleep. No fooling, I am pretty entertained. This one’s a shorty, in which I make some enemies and defend the use of carousels on behalf of actually good user experiences – maybe.

Also, thank you for your kind reviews! Brief reviews wherever you listen to LibUX make the show easier for others to discover.

Listen and please subscribe!

If you like, you can download the MP3 or subscribe to LibUX on Stitcher, iTunes, YouTube, Soundcloud, Google Play Music, or just plug our feed straight into your podcatcher of choice.