September 02, 2014


Libraries Making an Impact on the Web of Data

It has been no secret that we are using a vocabulary developed by Google, Microsoft, and others for exposing structured metadata on the web. Documented at Schema.org, this vocabulary is important because all of the major search engines are primed to look for it and will use it when they find it. Therefore, simply by using this vocabulary to describe library resources on the web, we are adding structured data about library materials into all of the major search engines.

But as you can imagine, this vocabulary doesn’t (yet) identify everything we may wish to describe within the library world. Thus we launched an effort, headed up by our Technology Evangelist Richard Wallis, to extend the vocabulary. That initiative, supported by the W3C, brought together some 80 or more cultural heritage institution professionals from around the world to help specify new terms and properties to use in conjunction with Schema.org.

And then a really cool thing happened.

The folks who manage Schema.org decided to adopt some of our elements directly into the vocabulary. To find out the details, see Richard’s post about it. But suffice it to say that this is huge. We have demonstrated the ability of libraries, museums, and archives to make a difference in the growing linked data ecology of the Internet.

Rather than being a metadata backwater as we have been since time immemorial, where no one but librarians understands our metadata, we are now embedding our descriptions of cultural heritage resources directly into the web itself. And what’s not to like about that?

About Roy Tennant

Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.

by Roy at September 02, 2014 09:04 PM

FOSS4Lib Recent Releases

DSpace - 4.2

Last updated September 2, 2014. Created by Peter Murray on September 2, 2014.

Release Date: Thursday, July 24, 2014

by Peter Murray at September 02, 2014 08:38 PM

District Dispatch

Idaho library welcomes FCC Commissioner

Idaho Library

The article below comes from Ann Joslin, who is the Idaho State Librarian and president of the Chief Officers of State Library Agencies (COSLA).

On August 19, 2014, Idaho had the privilege of hosting Federal Communications Commission (FCC) Commissioner Michael O’Rielly at LinkIDAHO’s Broadband Summit in Boise. He was the keynote speaker and moderated a panel discussion on “Filling the Gaps in Broadband Delivery in Rural and Remote Areas.” His visit also provided an opportunity to showcase Idaho public library services with a trip to the Ada Community Library’s Lake Hazel Branch.

Director Mary DeWalt and her staff prepared a brief fact sheet (pdf) with a general description of the library district and details of the broadband access and services they provide. We toured Lake Hazel’s activity room where their LSTA-funded Make-It materials (building kits, robotics, 3D printer) are housed, and the Commissioner could see several works-in-progress. In addition to a Lego robot and an FM radio, there were repair parts for a 3D printer with plastic that had melted in a hot car, all of them teen group projects. This prompted the Commissioner to describe his view of the roles libraries play today, from traditional to community center to public access technology provider, and serving all age groups. We certainly agree on that point, and Lake Hazel is a perfect example of this!

The conversation was informal and ranged from local clientele, programming, and Internet capacity to statewide broadband capacity challenges. The staff described a variety of ways adults are using the library that involve computers and technology, such as downloading media, social networking, and workforce development. They have also seen an increasing number of parents come with their children to build something together using the Make-It tools.

In light of the FCC’s current order on E-rate modernization, Commissioner O’Rielly referenced his opposition to focusing the one-time $2 billion on Wi-Fi upgrades (internal connections). In their filings, both COSLA (Chief Officers of State Library Agencies) and ALA placed priority on bringing scalable and affordable broadband to more libraries, as well as increasing funding for internal connections (see filings here and here). The Commissioner granted that he was on the short end of that FCC vote, and expressed confidence that the library community can develop a better formula for distributing E-rate funds in the future.

The Lake Hazel Branch is illustrative of several challenges common to Idaho public libraries:

Ann Joslin

The FCC is an important regulatory agency for the public library community, and one whose policies and procedures we need to better understand. We appreciated Commissioner O’Rielly’s visit to the Lake Hazel Branch Library as a way for him to see first-hand the successes and challenges public libraries face on a daily basis and the impact that E-rate modernization will have on their ability to deliver their broadband-based services. Many thanks to Larra Clark, director of the ALA program on Networks, for contacting his office to suggest the visit.

The post Idaho library welcomes FCC Commissioner appeared first on District Dispatch.

by Larra Clark at September 02, 2014 07:06 PM

Chris Harris appointed OITP Fellow for youth and technology initiatives

Chris Harris

Today, we welcome Chris Harris to his latest role for the Office for Information Technology Policy (OITP). Chris will serve as Fellow for the emerging OITP program on Children and Youth Initiatives.

In his other life, Chris is the director of the School Library System for the Genesee Valley Educational Partnership, an educational services agency supporting the libraries of 22 small, rural districts in western New York. Most recently, Chris integrated his personal interest in gaming with his passion for education and non-traditional learning and is editorial director of Play Play Learn.

Chris brings with him to OITP a long history of out-of-the-box thinking when it comes to libraries (especially school libraries) and innovation in learning and library services. He was a participant in the first American Library Association (ALA) Emerging Leaders program in 2007 and was honored as a Library Journal Mover and Shaker in 2008. Chris also writes a regular technology column for School Library Journal talking about “The Next Big Thing.” Chris has been deeply involved with the ALA Digital Content Working Group, overseeing the E-Content blog, and he just finished his term as Chair of the OITP Advisory Committee. In addition to his work as a Fellow, Chris continues to be active with committee work on behalf of school libraries as a member of the Library Advisory Committee for OITP’s Policy Revolution! initiative.

Needless to say, we at OITP are thrilled to have Chris join us as a Fellow. Chris is in on the ground floor as OITP develops its new program and will be integral in shaping it as well as helping to coordinate with ALA’s youth divisions, the American Association of School Librarians, the Association for Library Service to Children, and the Young Adult Library Services Association.

“Since OITP began looking at children and youth issues, Chris has not only brought his own expertise and interest to our discussions, he challenges all of us to view library services for young people in the broadest possible light,” said OITP Advisory Committee Chair, Dan Lee. “Chris has helped make it clear that OITP’s mission clearly intersects with the growing understanding inside and outside the library profession that youth and information technology policy issues need to be, and are front and center in public policy conversations.”

We are not shy about putting our Fellows to work (immediately). As Fellow, among other things, Chris will:

Personally, I am very happy to have Chris join the force of experts we have on hand as I will be working closely with Chris in the coming months to further define OITP’s work in this area. I already have a very long To Do list labeled, “Check with Chris.”

The post Chris Harris appointed OITP Fellow for youth and technology initiatives appeared first on District Dispatch.

by Marijke Visser at September 02, 2014 06:55 PM

Response to GPO’s call for feedback

A benefit of an association like ALA is the diversity of opinions and experiences that exists among the membership and this was expressed in my recent call for feedback. I would like to thank all of you who took the time to provide your thoughts on the proposals that the Government Printing Office (GPO) has made! Before attempting to craft an ALA response, it was important to hear from the membership. As in the past, there were divergent viewpoints and a concerted effort was made to reflect both sides of the coin. This letter was submitted to GPO last week and as I learn more from GPO, I will be sure to share it. Thank you again to all who commented and thank you to the chairs of the Committee on Legislation and the Government Information Subcommittee!

The post Response to GPO’s call for feedback appeared first on District Dispatch.

by jmcgilvray at September 02, 2014 05:42 PM

Richard Wallis

A Step for Schema.org – A Leap for Bib Data on the Web

Regular readers of this blog may well know I am an enthusiast for Schema.org – the generic vocabulary for describing things on the web as structured data, backed by the major search engines Google, Bing, Yahoo! & Yandex.  When I first got my head around it back in 2011 I soon realised its potential for making bibliographic resources, especially those within libraries, a heck of a lot more discoverable.  To be frank, library resources did not, and still don’t, exactly leap into view when searching the web – a bit of a problem when most people start searching for things with Google et al – and do not look elsewhere.

Schema.org, as a generic vocabulary to describe most stuff, easily embedded in your web pages, has been a great success.  As was reported by Google’s Dan Brickley at the recent Semantic Technology and Business Conference in San Jose, a sample of 12B pages showed approximately 21% containing Schema.org markup.

Right from the beginning, however, I had concerns about its applicability to the bibliographic world – great start with the Book type, but there were gaps in the coverage for such things as journal issues & volumes, multi-volume works, citations, and the relationship between a work and its editions.  Discovering others shared my combination of enthusiasm and concerns, I formed a W3C Community Group – Schema Bib Extend – to propose some bibliographic focused extensions to Schema.org. Which brings me to the events behind this post…

The SchemaBibEx group have had several proposals accepted over the last couple of years, such as making the [commercial] Offer more appropriate for describing loanable materials, and broadening of the citation property. Several other significant proposals were brought together in a package which I take great pleasure in reporting was included in the latest v1.9 release of Schema.org.  For many in our group these latest proposals were a long time coming after their initial proposal.  Although frustrating, the delays were symptomatic of a very healthy process.

Our proposals to add hasPart, isPartOf, exampleOfWork, and workExample to the CreativeWork Type will be available to many, as CreativeWork is the superclass to many types in many areas. Our proposals for issueNumber on PublicationIssue and volumeNumber on PublicationVolume are very similar to others in the vocabulary, such as seasonNumber and episodeNumber in TV & Radio.  Under Dan Brickley’s careful organisation, tweaks and adjustments were made across a few areas resulting in a consistent style across parts of the vocabulary underpinned by CreativeWork.
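As a rough illustration of how these properties could fit together, here is a minimal sketch that builds two JSON-LD descriptions in Ruby. It is not taken from the proposals themselves: the example.org identifiers, titles and numbers are invented, the Periodical type is my assumption for the top of the chain, and the nesting shown is only one plausible way to use the terms.

```ruby
require 'json'

# Illustrative values only: all example.org URLs, names and numbers are made up.

# A journal issue that is part of a volume, which is part of a periodical.
issue = {
  '@context' => 'http://schema.org',
  '@type' => 'PublicationIssue',
  '@id' => 'http://example.org/journal/v12/i3',
  'issueNumber' => '3',
  'isPartOf' => {
    '@type' => 'PublicationVolume',
    '@id' => 'http://example.org/journal/v12',
    'volumeNumber' => '12',
    'isPartOf' => {
      '@type' => 'Periodical',
      '@id' => 'http://example.org/journal',
      'name' => 'Example Journal of Library Data'
    }
  }
}

# A specific edition pointing back at the conceptual work it is an example of.
edition = {
  '@context' => 'http://schema.org',
  '@type' => 'Book',
  '@id' => 'http://example.org/edition/1',
  'name' => 'An Example Novel (paperback edition)',
  'exampleOfWork' => {
    '@type' => 'CreativeWork',
    '@id' => 'http://example.org/work/1',
    'name' => 'An Example Novel'
  }
}

puts JSON.pretty_generate(issue)
puts JSON.pretty_generate(edition)
```

The inverse links (hasPart on the volume, workExample on the work) could be asserted in the same way from the other side; which direction to publish is a design choice for the data provider.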

Although the number of new types and properties is small, their addition to Schema opens up potential for much better description of periodicals and creative work relationships. To introduce the background to this, SchemaBibEx member Dan Scott and I were invited to jointly post on the Blog.

So, another step forward for Schema.org.  I believe that it is more than just a step, however, for those wishing to make bibliographic resources more visible on the Web.  There has been some criticism that Schema.org has been too simplistic to be able to represent some of the relationships and subtleties from our world.  Criticism that was not unfounded.  Now, with these enhancements, many of these criticisms are answered. There is more to do, but the major objective of the group that proposed them has been achieved – to lay the broad foundation for the description of bibliographic, and creative work, resources in sufficient detail for them to be understood by the search engines to become part of their knowledge graphs. Of course that is not the final end we are seeking.  The reason we share data is so that folks are guided to our resources – by sharing, using the well understood vocabulary,

Examples of a conceptual creative work being related to its editions, using exampleOfWork and workExample, have been available for some time.  In anticipation of their appearance in Schema, they were introduced into the OCLC WorldCat release of 194 million Work descriptions (for example: with the inverse relationship being asserted in an updated version of the basic WorldCat linked data that has been available since 2012.

by Richard Wallis at September 02, 2014 05:20 PM


Back to School with DPLA

It’s the first day of school for most kids in the United States, and so a good time to highlight the resources the Digital Public Library of America has ready and waiting for students and teachers this school year. Just like kids, DPLA spent the summer growing and maturing, adding new partners, new staff, and over a half-million items along the way. And we’ve been thinking a lot about how we can be most helpful in the classroom; this fall we will be talking to many educators from K-12 through college to get their advice.

Meanwhile, we encourage everyone to tell a teacher or student this week about these handy DPLA features:

From all of us at the Digital Public Library of America, we wish you a great school year! And don’t forget to let us know how you’re using DPLA for your homework or in the classroom.

Featured image credit: Detail of “Catherine M. Rooney, 6th grade teacher instructs her alert pupils on the way and how of War Ration Book Two,” circa 1943. Courtesy the National Archives and Records Administration (view original record).

All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

by Dan Cohen at September 02, 2014 05:00 PM

OCLC Dev Network

Nominations Open for the Next Developer House

OCLC invites you to nominate library technologists for our next Developer House event, where participants will identify and work on hands-on projects putting OCLC Web services to work solving real-world problems for libraries and their users.

Sponsored by the OCLC Developer Network, Developer House is a place where library technologists can gather together for five days to share their perspectives and expertise as they hack on OCLC web services.

by Shelley Hostetler at September 02, 2014 05:00 PM

District Dispatch

Health happens in libraries

In the past year, Americans from all walks of life turned to public libraries to learn more about the federal health insurance marketplace. With the next open enrollment period spanning November 15, 2014 – February 15, 2015, there are more opportunities for libraries to learn more about the health marketplace.

On September 24, 2014, WebJunction will offer a free webinar on the Affordable Care Act (Register now). This webinar will provide an overview of the 2015 open enrollment period and review opportunities to connect community members to health marketplace information through library service priorities and partnerships.

In this webinar, participants will receive an overview of objectives and resources for the 2015 open enrollment period from representatives from the Centers for Medicare and Medicaid Services. Webinar participants will learn about the Coverage to Care initiative, which supports individuals in learning how to utilize health coverage, and review opportunities to connect community members to marketplace information through library service priorities and partnerships. Participate in the conversation online with the hashtags #wjwebinar and #libs4health.

Webinar presenters:

Date: September 24, 2014
Time: 2:00 PM – 3:00 PM EDT
Register for the free event

If you cannot attend this live session, a recorded archive will be available to view at your convenience. Sign up here if you’d like to receive notifications about this project, including when the archive is available.

The post Health happens in libraries appeared first on District Dispatch.

by Jazzy Wright at September 02, 2014 04:37 PM

David Rosenthal

Interesting report on Digital Legal Deposit

Last month the International Publishers Association (IPA) put out an interesting report about the state of digital legal deposit for copyright purposes, with extended status reports from the national libraries of Germany, the Netherlands, the UK, France and Italy, and short reports from many other countries. The IPA's conclusions echo some themes I have mentioned before:

My reason for saying these things is based on experience. It shows that, no matter what the law says, if the publishers don't want you to collect their stuff, you will have a very hard time collecting it. On-line publishers need to have robust defenses against theft, which even national libraries would have difficulty overcoming without the publishers' cooperation.

The publishers' reason for saying these things is different. What are the publishers' "key concerns" on which voluntary collaboration is needed?
In other words, they are happy to deposit their content only under conditions that make it almost useless, such as that it only be accessible to one reader at a time physically at the library, just like a paper book.

Given that the finances of many national libraries are in dire straits, the publishers have a helpful suggestion:
"Countries might usefully consider other models, such as larger publishers self-archiving material, agreeing to make it available on request to libraries."
Or, in other words, let's just forget the whole idea of legal deposit.

Note: everything in quotes is from the report, emphasis in the original.

by David. at September 02, 2014 03:00 PM

Jonathan Rochkind

Defeating IE forced ‘compatibility mode’

We recently deployed a new version of our catalog front end (Rails, Blacklight), which is based on Bootstrap 3 CSS.

Bootstrap3 supports IE10 fine, IE9 mostly, and IE8 . IE8 has no media queries out of the box, so columns will be collapsed to single-column small-screen versions in Bootstrap3’s mobile-first CSS — although you can use the third party respond.js to bring media queries to IE8.  We tested IE8 with respond.js, and everything

IE7 according to bootstrap “should look and behave well enough… though not officially supported.”  We weren’t aware of any on-campus units that still had IE7 installed (although we certainly can’t say with certainty there aren’t any), and in general decided that IE7 was old enough that we were comfortable no longer supporting it (especially if the alternative was essentially not upgrading to latest version of Blacklight).

I did do some limited testing with IE7, and found that our Bootstrap3-based app definitely, as expected, fell back to a single column view on all monitor sizes (IE7 lacks media queries).   In a limited skim, all functionality did seem available, although some screen areas on some pages could look pretty jumbled and messy.

Meanwhile, however, Bootstrap also says that “Bootstrap is not supported in the old Internet Explorer compatibility modes.”

What we did not anticipate is that some units in our large and heterogeneous academic/medical organization(s) use not only a fairly old version of IE (we were able to convince them to upgrade from IE8 to IE9, but no further), but also one that was configured by group policy to use ‘compatibility mode’ for all websites. IE9 would have been great, but ‘compatibility mode’ not so much.

They reported that the upgraded catalog was unusable on their browsers.

The Bootstrap web site recommends adding a meta tag to your pages to “be sure you’re using the latest rendering mode for IE”:

<!-- note: not what we ended up doing or recommend -->
<meta http-equiv="X-UA-Compatible" content="IE=edge">

However, we didn’t have much luck getting this to work. Google research suggested that it probably would have worked if it were placed immediately after the opening <head> tag (and not in a conditional comment), to make sure IE encounters it before its ‘rendering mode’ is otherwise fixed. But this seemed fragile and easy for us to accidentally break with future development, especially when there’s no good way to have an automated test ensuring this is working, and we don’t have access to an IE configured exactly like theirs to test ourselves either.

What did work was sending that as an actual HTTP header: “X-UA-Compatible: IE=edge,chrome=1”

In a Rails4 app, this can be easily configured in your config/application.rb:

config.action_dispatch.default_headers.merge!('X-UA-Compatible' => 'IE=edge,chrome=1')

After adding this header, affected users reported that the catalog site was displaying manageably again. 
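To make the effect of the header concrete, here is a minimal sketch of the same idea written as a Rack-style middleware in plain Ruby, with no gems required. This is not what we deployed (we used the Rails configuration setting); ForceIeEdge is a hypothetical class name, and the lambda is only a stand-in for a real application.

```ruby
# Minimal Rack-style middleware sketch. ForceIeEdge is a hypothetical name,
# not part of Rails, Rack or Blacklight.
class ForceIeEdge
  HEADER = 'X-UA-Compatible'.freeze
  VALUE  = 'IE=edge,chrome=1'.freeze

  def initialize(app)
    @app = app
  end

  # Rack contract: call(env) returns [status, headers, body].
  # The wrapped app's response is passed through with one header added.
  def call(env)
    status, headers, body = @app.call(env)
    [status, headers.merge(HEADER => VALUE), body]
  end
end

# Stand-in for the real application: any object responding to call(env).
inner_app = ->(_env) { [200, { 'Content-Type' => 'text/html' }, ['<html></html>']] }

status, headers, _body = ForceIeEdge.new(inner_app).call({})
puts "#{status} #{headers['X-UA-Compatible']}"
```

Unlike the meta tag, a header set this way can be checked in an ordinary automated test of the response, which covers at least the part of the fix that does not need a specially configured IE.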

Also, I discovered that I could mimic the forced compatibility mode at least to some extent in my own IE11, by clicking on the settings sprocket icon, choosing “Compatibility View Settings”, and then adding our top level domain to “Websites you’ve added to Compatibility View.”  Only top-level domains are accepted there. This did successfully force our catalog to be displayed in horrible compatibility mode, but only until we added that header. I can’t say this is identical to an IE9 set by group policy to display all websites in compatibility mode, but in this case it seemed to behave equivalently.

I think, with enough work on our CSS, we could have made the site display in an ugly but workable single-column layout even in IE8 with compatibility mode. It wasn’t doing that initially; many areas of pages were entirely missing. But it probably would have been quite a bit of work, and with this simple alternate solution it’s displaying much better than we ever could have reached with that approach.


Filed under: Uncategorized

by jrochkind at September 02, 2014 02:29 PM

Library of Congress: The Signal

Stewarding Early Space Data: An Interview with Emily Frieda Shaw


Emily Frieda Shaw, Head of Preservation and Reformatting at Ohio State University

Preserving and managing research data is a significant concern for scientists and staff at research libraries. With that noted, many likely don’t realize the length of time in which valuable scientific data has accrued on a range of media in research settings. That is, data management often needs to be both backward- and forward-looking, considering a range of legacy media and formats as well as contemporary practice. To that end, I am excited to interview Emily Frieda Shaw, Head of Preservation and Reformatting at Ohio State University (prior to August 2014 she was the Digital Preservation Librarian at the University of Iowa Libraries). Emily talked about her work on James Van Allen’s data from the Explorer satellites launched in the 1950s at the Digital Preservation 2014 conference and I am excited to explore some of the issues that work raises.

Trevor: Could you tell us a bit about the context of the data you are working with? Who created it, how was it created, what kind of media is it on?

Emily: The data we’re working with was captured on reel-to-reel audio tapes at receiving stations around the globe as Explorer 1 passed overhead in orbit around Earth in the early months of 1958. Explorer predated the founding of NASA and was sent into orbit by a research team led by Dr. James Van Allen, then a Professor of Physics at the University of Iowa, to observe cosmic radiation. Each reel-to-reel Ampex tape contains up to 15 minutes of data on 7 tracks, including time stamps, station identifications and weather reports from station operators, and the “payload” data consisting of clicks, beeps and squeals generated by on-board instrumentation measuring radiation, temperature and micrometeorite impacts.

Once each tape was recorded, it was mailed to Iowa for analysis by a group of graduate students. A curious anomaly quickly emerged: At certain altitudes, the radiation data disappeared. More sensitive instruments sent into orbit by Dr. Van Allen’s team soon after Explorer 1 confirmed what this anomaly suggested: the Earth is surrounded by belts of intense radiation, dubbed soon thereafter as the Van Allen Radiation Belts. When the Geiger counter on board Explorer 1 registered no radiation at all, it was, in fact, actually overwhelmed by extremely high radiation.

We believe these tapes represent the first data set ever transmitted from outside Earth’s atmosphere. Thanks to the hard work and ingenuity of our friends at The MediaPreserve, and some generous funding from the Carver Foundation, we now have about 2 TB of .wav files converted from the Explorer 1 tapes, as well as digitized lab notebooks and personal journals of Drs. Van Allen and Ludwig, along with graphs, correspondence, photos, films and audio recordings.

In our work with this collection, the biggest discovery was a 700-page report from Goddard composed almost entirely of data tables that represent the orbital ephemeris data set from Explorer 1. This 1959 report was digitized a few years back from the collections at the University of Illinois at Urbana-Champaign as part of the Google Books project and is being preserved in HathiTrust. This data set holds the key to interpreting the signals we hear on the tapes. There are some fascinating interplays between analog and digital, past and present, near and far in this project, and I feel very lucky to have landed in Iowa when I did.

Trevor: What challenges does this data present for getting it off of its original media and into a format that is usable?

Emily: When my colleagues were first made aware of the Explorer mission tapes in 2009, they had been sitting in the basement of a building on the University of Iowa’s campus for decades. There was significant mold growth on the boxes and the tapes themselves, and my colleagues secured an emergency grant from the state to clean, move and temporarily rehouse the tapes. Three tapes were then sent to The MediaPreserve to see if they could figure out how to digitize the audio signals. Bob Strauss and Heath Condiotte hunted down a huge, of-the-era machine that could play back all of the discrete tracks on these tapes. As I understand it, Heath had to basically disassemble the entire thing and replace all of the transistors before he got it to work properly. Fortunately, we were able to play some of the digitized audio tracks from these test reels for Dr. George Ludwig, one of the key researchers on Dr. Van Allen’s team, before he passed away in 2012. Dr. Ludwig confirmed that they sounded — at least to his naked ear — as they should, so we felt confident proceeding with the digitization.

Explorer I data tape

So, soon after I was hired in 2012, we secured funding from a private foundation to digitize the Explorer 1 tapes and proceeded to courier all 700 tapes to The MediaPreserve for thorough cleaning, rehousing and digital conversion. The grant is also funding the development and design of a web interface to the data and accompanying archival materials, which we [Iowa] hope to launch (pun definitely intended) some time this fall.

Trevor: What stakeholders are involved in the project? Specifically, I would be interested to hear how you are working with scientists to identify what the significant properties of these particular tapes are.

Emily: No one on the project team we assembled within the Libraries has any particular background in near-Earth physics. So we reached out to our colleagues in the University of Iowa Department of Physics, and they have been tremendously helpful and enthusiastic. After all, this data represents the legacy of their profession in a big picture sense, but also, more intimately, the history of their own department (their offices are in Van Allen Hall). Our colleagues in Physics have helped us understand how the audio signals were converted into usable data, what metadata might be needed in order to analyze the data set using contemporary tools and methods, how to package the data for such analysis, and how to deliver it to scientists where they will actually find and be able to use it.

We’re also working with a journalism professor from Northwestern University, who was Dr. Van Allen’s biographer, to weave an engaging (and historically accurate) narrative to tell the Explorer story to the general public.

Trevor: How are you imagining use and access to the resulting data set?

Emily: Unlike the digitized photos, books, manuscripts, music recordings and films we in libraries and archives have become accustomed to working with, we’re not sure how contemporary scientists (or non-scientists) might use a historic data set like this. Our colleagues in Physics have assured us that once we get this data (and accompanying metadata) packaged into the Common Data Format and archived with the National Space Science Data Center, analysis of the data set will be pretty trivial. They’re excited about this and grateful for the work we’re doing to preserve and provide access to early space data, and believe that almost as quickly as we are able to prepare the data set to be shared with the physics community, someone will pick it up and analyze it.

This is the earliest known orbital data set, so we know that it holds great historical significance. But the more we learn about Explorer 1, the less confident we are that the data from this first mission is/was scientifically significant. The Explorer I data (or rather, the points in its orbit during which the instruments recorded no data at all) hinted at a big scientific discovery. But it was really Explorer III, sent into orbit in the summer of 1958 with more sophisticated instrumentation, that produced the data that led to the big “ah-hah” moment. So, we’re hoping to secure funding to digitize the tapes from that mission, which are currently in storage.

I also think there might be some interesting, as-yet-unimagined artistic applications for this data. Some of the audio is really pretty eerie and cool space noise.

Trevor: More broadly, how will this research data fit into the context of managing research data at the university? Is data management something that the libraries are getting significantly involved in? If so could you tell us a bit about your approach.

Emily: The University of Iowa, like all of our peers, is thinking and talking a lot about research data management. The Libraries are certainly involved in these discussions, but as far as I can tell, the focus is, understandably, on active research and is motivated primarily by the need to comply with funding agency requirements. In libraries, archives and museums, many of us are motivated by a moral imperative to preserve historically significant information. However, this ethos does not typically pervade the realm of active, data-intensive research. Once the big discovery has been made and the papers have been published, archiving the data set is often an afterthought, if not a burden. The fate of the Explorer tapes, left to languish in a damp basement for decades, is a case in point. Time will not be so kind to digital data sets, so we have to keep up the hard work of advocating, educating and partnering with our research colleagues, and building up the infrastructure and services they need to lower the barriers to data archiving and sharing.

Trevor: Backing up out of this particular project, I don’t think I have spoken with many folks with the title “Digital Preservation Librarian.” Other than this, what kinds of projects are you working on and what sort of background did you have to be able to do this sort of work? Could you tell us a bit about what that role means in your case? Is it something you are seeing crop up in many research libraries?

Emily: My professional focus is on the preservation of collections, whether they are manifest in physical or digital form, or both. I’ve always been particularly interested in the overlaps, intersections, and interdependencies of physical/analog and digital information, and motivated to play an active role in the sociotechnical systems that support its creation, use and preservation. In graduate school at the University of Illinois, I worked both as a research assistant with an NSF-funded interdisciplinary research group focused on information technology infrastructure, and in the Library’s Conservation Lab, making enclosures, repairing broken books, and learning the ins and outs of a robust research library preservation program. After completing my MLIS, I pursued a Certificate of Advanced Study in Digital Libraries while working full-time in Preservation & Conservation, managing multi-stream workflows in support of UIUC’s scanning partnership with Google Books.

I came to Iowa at the beginning of 2012 into the newly-created position of Digital Preservation Librarian. My role here has shifted with the needs and readiness of the organization, and has included the creation and management of preservation-minded workflows for digitizing collections of all sorts, the day-to-day administration of digital content in our redundant storage servers, researching and implementing tools and processes for improved curation of digital content, piloting workflows for born-digital archiving, and advocating for ever-more resources to store and manage all of this digital stuff. Also, outreach and inreach have both been essential components of my work. As a profession, we’ve made good progress toward raising awareness of digital stewardship, and many of us have begun making progress toward actually doing something about it, but we still have a long way to go.

And actually, I will be leaving my current position at Iowa at the end of this month to take on a new role as the Head of Preservation and Reformatting for The Ohio State University Libraries. My experience as a hybrid preservationist with understanding and appreciation of both the physical and digital collections will give me a broad lens through which to view the challenges and opportunities for long-term preservation and access to research collections. So, there may be a vacancy for a digital preservationist at Iowa in the near future :)

by Trevor Owens at September 02, 2014 12:26 PM

Ed Summers

On Archiving Tweets

After my last post about collecting 13 million Ferguson tweets, Laura Wrubel from George Washington University’s Social Feed Manager project recommended looking at how Mark Phillips made his Yes All Women collection of tweets available in the University of North Texas Digital Library. By the way, both are awesome projects to check out if you are interested in how access informs digital preservation.

If you take a look you’ll see that only the Twitter ids are listed in the data that you can download. The full metadata that Mark collected (with twarc incidentally) doesn’t appear to be there. Laura knows from her work on the Social Feed Manager that it is fairly common practice in the research community to only openly distribute lists of Tweet ids instead of the raw data. I believe this is done out of concern for Twitter’s terms of service (1.4.A):

If you provide downloadable datasets of Twitter Content or an API that returns Twitter Content, you may only return IDs (including tweet IDs and user IDs).

You may provide spreadsheet or PDF files or other export functionality via non-programmatic means, such as using a “save as” button, for up to 100,000 public Tweets and/or User Objects per user per day. Exporting Twitter Content to a datastore as a service or other cloud based service, however, is not permitted.

There are privacy concerns here (redistributing data that users have chosen to remove). But I suspect Twitter has business reasons to discourage widespread redistribution of bulk Twitter data, especially now that they have bought the social media data provider Gnip.

I haven’t really seen a discussion of this practice of distributing Tweet ids, and its implications for research and digital preservation. I see that the International Conference on Weblogs and Social Media now has a dataset service where you need to agree to their “Sharing Agreement”, which basically prevents re-sharing of the data.

Please note that this agreement gives you access to all ICWSM-published datasets. In it, you agree not to redistribute the datasets. Furthermore, ensure that, when using a dataset in your own work, you abide by the citation requests of the authors of the dataset used.

I can certainly understand wanting to control how some of this data is made available, especially after the debate after Facebook’s Emotional Contagion Study went public. But this does not bode well for digital preservation where lots of copies keeps stuff safe. What if there were a standard license that we could use that encouraged data sharing among research data repositories? A viral license like the GPL that allowed data to be shared and reshared within particular contexts? Maybe the CC-BY-NC, or is it too weak? If each tweet is copyrighted by the person who sent it, can we even license them in bulk? What if Twitter’s terms of service included a research clause that applied to more than just Twitter employees, but to downstream archives?

Back of the Envelope

So if I were to make the Ferguson tweet ids available, to work with the dataset you would need to refetch the data using the Twitter API, one tweet at a time. I did a little bit of reading and poking at the Twitter API and it appears an access token is limited to 180 requests every 15 minutes. So how long would it take to reconstitute 13 million Twitter ids?

13,000,000 tweets / 180 tweets per interval = 72,222 intervals
72,222 intervals * 15 minutes per interval =  1,083,330 minutes

1,083,330 minutes is two years of constant accesses to the Twitter API. Please let me know if I’ve done something conceptually/mathematically wrong.
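The arithmetic can be re-run as a quick sanity check. This is only a sketch under the post's own assumptions (13 million ids, 180 requests per 15-minute window, one tweet per request); the function and constant names are made up for illustration.

```python
# Figures taken from the post; names are illustrative only.
TWEETS = 13_000_000
REQUESTS_PER_WINDOW = 180   # requests allowed per access token per window
WINDOW_MINUTES = 15


def minutes_to_fetch(tweets, tweets_per_request=1):
    """Minutes of continuous API use needed to fetch `tweets` tweets."""
    windows = tweets / (REQUESTS_PER_WINDOW * tweets_per_request)
    return windows * WINDOW_MINUTES


minutes = minutes_to_fetch(TWEETS)
days = minutes / 60 / 24
years = days / 365
print(f"{minutes:,.0f} minutes = {days:,.0f} days = {years:.2f} years")
```

Under those assumptions the estimate holds up: the total comes out at a little over two years of uninterrupted requests.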

Update: it turns out the statuses/lookup API call can return full tweet data for up to 100 tweets per request. So a single access token could fetch about 72,000 tweets per hour (100 per request, 180 requests per 15 minutes) … which only amounts to 180 hours, which is just over a week. James Jacobs rightly points out that a single application could use multiple access tokens, assuming users allowed the application to use them. So if 7 Twitter users donated their Twitter account API quota, the 13 million tweets could be reconstituted from their ids in roughly a day. So the situation is definitely not as bad as I initially thought. Perhaps there needs to be an app that allows people to donate some of the API quota for this sort of task? I wonder if that’s allowed by Twitter’s ToS.
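A rough sketch of what batching buys, using only the numbers in the update (100 ids per request, 180 requests per 15 minutes per token). The helper names are hypothetical, and the actual HTTP call to the lookup endpoint is deliberately left out.

```python
from itertools import islice

BATCH_SIZE = 100            # ids per lookup request (figure from the post)
REQUESTS_PER_WINDOW = 180   # requests per token per window (from the post)
WINDOW_HOURS = 0.25         # a 15-minute window


def batches(ids, size=BATCH_SIZE):
    """Yield successive lists of at most `size` ids."""
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def hours_to_rehydrate(total_ids, tokens=1):
    """Estimated hours to look up every id using `tokens` access tokens."""
    requests = -(-total_ids // BATCH_SIZE)   # ceiling division
    requests_per_hour = REQUESTS_PER_WINDOW / WINDOW_HOURS * tokens
    return requests / requests_per_hour


print(round(hours_to_rehydrate(13_000_000), 1))            # one token
print(round(hours_to_rehydrate(13_000_000, tokens=7), 1))  # seven donated tokens
```

Each list produced by `batches` would become one lookup request; spreading the lists across donated tokens is what brings the estimate down from about a week to about a day.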

The big assumption here is that the Twitter API continues to operate as it currently does. If Twitter changes its API, or ceases to exist as a company, there would be no way to reconstitute the data. But what if there were a functioning Twitter archive that could reconstitute the original data using the list of Twitter ids…

Digital Preservation as a Service

I’ve hesitated to write about LC’s Twitter archive while I was an employee. But now that I’m no longer working there I’ll just say I think this would be a perfect experimental service for them to consider providing. If a researcher could upload a list of Twitter ids to a service at the Library of Congress and get them back a few hours, days or even weeks later, this would be much preferable to managing a two year crawl of Twitter’s API. It also would allow an ecosystem of Twitter ID sharing to evolve.

The downside here is that all the tweets are in one basket, as it were. What if LC’s Twitter archiving program is discontinued? Does anyone else have a copy? I wonder if Mark kept the original tweet data that he collected, and it is private, available only inside the UNT archive? If someone could come and demonstrate to UNT that they have a research need to see the data, perhaps they could sign some sort of agreement, and get access to the original data?

I have to be honest, I kind of loathe the idea of libraries and archives being gatekeepers to this data. Having to decide what is valid research and what is not seems fraught with peril. But on the flip side Maciej has a point:

These big collections of personal data are like radioactive waste. It’s easy to generate, easy to store in the short term, incredibly toxic, and almost impossible to dispose of. Just when you think you’ve buried it forever, it comes leaching out somewhere unexpected.

Managing this waste requires planning on timescales much longer than we’re typically used to. A typical Internet company goes belly-up after a couple of years. The personal data it has collected will remain sensitive for decades.

It feels like we (the research community) need to manage access to this data so that it’s not just out there for anyone to use. Maciej’s essential point is that businesses (and downstream archives) shouldn’t be collecting this behavioral data in the first place. But what about a tweet (its metadata) is behavioral? Could we strip it out? If I squint right, or put on my NSA-colored glasses, even the simplest metadata such as who is tweeting to who seems behavioral.

It’s a bit of a platitude to say that social media is still new enough that we are still figuring out how to use it. Does a legitimate right to be forgotten mean that we forget everything? Can businesses blink out of existence leaving giant steaming pools of informational toxic waste, while research institutions aren’t able to collect and preserve small portions as datasets? I hope not.

To bring things back down to earth, how should I make this Ferguson Twitter data available? Is a list of tweet ids the best the archiving community can do, given the constraints of Twitter’s Terms of Service? Is there another way forward that addresses very real preservation and privacy concerns around the data? Some archivists may cringe at the cavalier use of the word “archiving” in the title of this post. However, I think the issues of access and preservation bound up in this simple use case warrant the attention of the archival community. What archival practices can we draw and adapt to help us do this work?

by ed at September 02, 2014 01:40 AM

DuraSpace News

PROGRESS REPORT: DuraSpace 2014 Membership Campaign Update

2014 Membership Update Chart

by carol at September 02, 2014 12:00 AM

Proposed Changes to Ranking Web of World Repositories

Winchester, MA  "The Ranking Web of World Repositories" is an initiative of the Cybermetrics Lab, a research group belonging to the Consejo Superior de Investigaciones Científicas (CSIC). A goal of the Ranking is to promote Open Access initiatives and global access to academic knowledge.

by carol at September 02, 2014 12:00 AM

September 01, 2014

FOSS4Lib Recent Releases

Invenio - 1.1.4

Release Date: 
Sunday, August 31, 2014

Last updated September 1, 2014. Created by David Nind on September 1, 2014.

Invenio v1.1.4 was released on August 31, 2014.

This stable release contains a number of bugfixes and improvements.

It is recommended to all Invenio sites using v1.1.3 or previous stable release series (v0.99, v1.0).

by David Nind at September 01, 2014 08:18 PM


PeerLibrary 0.3 is here featuring improved importing, privacy...

PeerLibrary 0.3 is here featuring improved importing, privacy settings everywhere, feedback on publications processing, lists of all entities to navigate, improved invitations, optimized loading time, and many many more fixes and improvements around.

by retronator at September 01, 2014 06:39 PM


Linked Data Survey results 3–Why and what institutions are consuming


OCLC Research conducted an international linked data survey for implementers between 7 July and 15 August 2014. This is the third post in the series reporting the results.

The two main reasons given by the 70 linked data projects/services described in the survey that consume linked data are: to enhance their own data by consuming linked data from other sources (36) and to provide a richer experience for users (34). Other reasons, in descending order: more effective internal metadata management; to experiment with combining different types of data into a single triple store; heard about linked data and wanted to try it out using linked data sources; a wish for greater accuracy and scope in search results; to improve Search Engine Optimization (SEO); and to meet a grant requirement.

The ways projects/services are using linked data sources (in order of the most frequently cited):

The linked data sources that are used the most:

Here’s the alphabetical list of the sources used; those that include uses by FAST, VIAF, and Works are asterisked.

 Source # of Projects FAST VIAF Works
British National Bibliography 3       *
Canadian Subject Headings 2
DBpedia 25    *
Dewey Decimal Classification 4       *
Europeana 5
FAST 11       *       *
GeoNames 25    *
Getty’s AAT 9 29    *    *       *       *
ISNI 4    *
RDF Book Mashup 1
The European Library 5
VIAF 23    *       *       *
Wikidata 7 12    *       *       * Works 6    *       *       *
Other 20

The other linked data sources consumed include:

Asked whether there were other data sources the respondent wished were available as linked data but aren’t yet, respondents noted:

Barriers or challenges encountered in using linked data resources included:

Coming next: Linked Data Survey results-Why and what institutions are publishing


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.

by Karen at September 01, 2014 08:00 AM

August 31, 2014

John Miedema

Physika, the next phase: Text analysis of the novel. Selected book reviews are back.

NovelTM is an international collaboration of academic and non-academic partners to produce the first large-scale quantitative history of the novel. It is a natural fit with my interests in cognitive technologies, text analytics, and literature. I am getting to know the players, and hope to contribute. Given that, I have reorganized things a bit here at my blog. The next “Wilson” iteration of my basement build of a cognitive system will focus on text analysis of the novel. Note too, I have brought back a number of book reviews related to text analysis of the novel. In particular, note my review of Orality and Literacy by Ong. In that review, back in 2012, I noted, “It blows my information technology mind to think how these properties might be applied to the task of structuring data in unstructured environments, e.g., crawling the open web. I have not stopped thinking about it. It may take years to unpack.” Two years later, I am slowly unpacking that insight at this blog. 

by johnmiedema at August 31, 2014 04:22 PM

Ed Summers

A Ferguson Twitter Archive

Much has been written about the significance of Twitter as the recent events in Ferguson echoed round the Web, the country, and the world. I happened to be at the Society of American Archivists meeting 5 days after Michael Brown was killed. During our panel discussion someone asked about the role that archivists should play in documenting the event.

There was wide agreement that Ferguson was a painful reminder of the type of event that archivists working to “interrogate the role of power, ethics, and regulation in information systems” should be documenting. But what to do? Unfortunately we didn’t have time to really discuss exactly how this agreement translated into action.

Fortunately the very next day the Archive-It service run by the Internet Archive announced that they were collecting seed URLs for a Web archive related to Ferguson. It was only then, after also having finally read Zeynep Tufekci’s terrific Medium post, that I slapped myself on the forehead … of course, we should try to archive the tweets. Ideally there would be a “we” but the reality was it was just “me”. Still, it seemed worth seeing how much I could get done.


I had some previous experience archiving tweets related to Aaron Swartz using Twitter’s search API. (Full disclosure: I also worked on the Twitter archiving project at the Library of Congress, but did not use any of that code or data then, or now.) I wrote a small Python command line program named twarc (a portmanteau for Twitter Archive), to help manage the archiving.

You give twarc a search query term, and it will plod through the search results, in reverse chronological order (the order that they are returned in), while handling quota limits, and writing out line-oriented JSON, where each line is a complete tweet. It worked quite well to collect 630,000 tweets mentioning “aaronsw”, but I was starting late out of the gate, 6 days after the events in Ferguson began. One downside to twarc is it is completely dependent on Twitter’s search API, which only returns results for the past week or so. You can search back further in Twitter’s Web app, but that seems to be a privileged client. I can’t seem to convince the API to keep going back in time past a week or so.
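For readers who have not met the format: with one complete tweet per line, the archive can be streamed one line at a time instead of being parsed as a single huge document. A minimal sketch follows; the sample tweets are invented and carry only two fields.

```python
import io
import json


def read_tweets(lines):
    """Yield one parsed tweet (a dict) per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Two made-up tweets standing in for a real archive file.
sample = io.StringIO(
    '{"id_str": "2", "text": "second"}\n'
    '{"id_str": "1", "text": "first"}\n'
)
for tweet in read_tweets(sample):
    print(tweet["id_str"], tweet["text"])
```

The same function works on an open file handle, so memory use stays flat no matter how many tweets the file holds.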

So time was of the essence. I started up twarc searching for all tweets that mention ferguson, but quickly realized that the volume of tweets, and the order of the search results meant that I wouldn’t be able to retrieve the earliest tweets. So I tried to guesstimate a Twitter ID far enough back in time to use with twarc’s --max_id parameter to limit the initial query to tweets before that point in time. Doing this I was able to get back to 2014-08-10 22:44:43; most of August 9th and 10th had slipped out of the window. I used a similar technique of guessing an ID further in the future in combination with the --since_id parameter to start collecting from where that snapshot left off. This resulted in a bit of a fragmented record, which you can see visualized (sort of) below:
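The guesswork can be narrowed if one assumes tweet ids follow Twitter's Snowflake layout, in which the bits above the lowest 22 hold milliseconds since a fixed epoch in early November 2010. The sketch below rests on that assumption and is not how the ids in this post were actually chosen; the function names are made up, and the example timestamp is treated as UTC, which the post does not state.

```python
from datetime import datetime, timezone

# Assumption: ids are Snowflake ids with this epoch (in milliseconds).
TWITTER_EPOCH_MS = 1288834974657


def id_for_time(dt):
    """Smallest Snowflake-style id that could have been minted at `dt`."""
    ms = round(dt.timestamp() * 1000)
    return (ms - TWITTER_EPOCH_MS) << 22


def time_for_id(tweet_id):
    """Approximate creation time encoded in a Snowflake-style id."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


cutoff = datetime(2014, 8, 10, 22, 44, 43, tzinfo=timezone.utc)
guess = id_for_time(cutoff)
print(guess, time_for_id(guess).isoformat())
```

An id produced this way could be passed as a max_id or since_id boundary in place of a hand-picked guess.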

In the end I collected 13,480,000 tweets (63G of JSON) between August 10th and August 27th. There were some gaps because of mismanagement of twarc, and the data just moving too fast for me to recover from them: most of August 13th is missing, as well as part of August 22nd. I’ll know better next time how to manage this higher volume collection.

Apart from the data, a nice side effect of this work is that I fixed a socket timeout error in twarc that I hadn’t noticed before. I also refactored it a bit so I could use it programmatically like a library instead of only as a command line tool. This allowed me to write a program to archive the tweets, incrementing the max_id and since_id values automatically. The longer continuous crawls near the end are the result of using twarc more as a library from another program.
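The automatic stepping of max_id might look roughly like the loop below. It is a sketch of the general paging idea, not twarc's actual code: `search` is a hypothetical callable that returns one page of tweets, newest first, with ids no greater than `max_id` and greater than `since_id`.

```python
def crawl_backwards(search, max_id=None, since_id=None):
    """Yield tweets newest-first, lowering max_id after every page.

    `search` is a hypothetical callable(max_id=..., since_id=...) that
    returns a list of tweet dicts, or an empty list when nothing is left.
    """
    while True:
        page = search(max_id=max_id, since_id=since_id)
        if not page:
            return
        for tweet in page:
            yield tweet
        # max_id is treated as inclusive, so step just below the oldest id seen.
        max_id = min(int(t["id_str"]) for t in page) - 1


def fake_search(max_id=None, since_id=None):
    """Stand-in for a real search call: ten fake tweets, three per page."""
    ids = [i for i in range(10, 0, -1)
           if (max_id is None or i <= max_id)
           and (since_id is None or i > since_id)]
    return [{"id_str": str(i)} for i in ids[:3]]


print([t["id_str"] for t in crawl_backwards(fake_search, since_id=2)])
```

Recording the highest id seen in one run and passing it as `since_id` on the next run lets a later crawl pick up where the earlier one stopped.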

Bag of Tweets

To try to arrange/package the data a bit I decided to order all the tweets by tweet id, and split them up into gzipped files of 1 million tweets each. Sorting 13 million tweets was pretty easy using leveldb. I first loaded all 16 million tweets into the db, using the tweet id as the key, and the JSON string as the value.

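import json
import leveldb
import fileinput

# Load every tweet into leveldb keyed by tweet id. leveldb keeps its
# keys in sorted order, so walking the index later returns the tweets
# sorted by id.
db = leveldb.LevelDB('./tweets.db')
for line in fileinput.input():
    tweet = json.loads(line)
    db.Put(tweet['id_str'], line)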

This took almost 2 hours on a medium EC2 instance. Then I walked the leveldb index, writing out the JSON as I went, which took 35 minutes:

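import leveldb

# Walk the index in key order, writing each tweet's JSON line to stdout.
# The trailing comma stops print from adding a second newline (Python 2).
db = leveldb.LevelDB('./tweets.db')
for k, v in db.RangeIter(None, include_value=True):
    print v,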

After splitting them up into 1 million line files with split and gzipping them I put them in a Bag and uploaded it to S3 (8.5G).
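The splitting and compressing step can also be done in one pass from Python. This is an alternative sketch rather than the commands actually used; the function name, file naming pattern and output directory are all invented for illustration.

```python
import gzip
import os


def split_gzip(lines, out_dir, lines_per_file=1_000_000, prefix="tweets"):
    """Write `lines` into numbered gzip files of at most `lines_per_file` lines."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    for n, line in enumerate(lines):
        if n % lines_per_file == 0:
            # Start a new chunk, closing the previous one if there is one.
            if out is not None:
                out.close()
            path = os.path.join(out_dir, "%s-%04d.json.gz" % (prefix, len(paths)))
            out = gzip.open(path, "wt", encoding="utf-8")
            paths.append(path)
        out.write(line if line.endswith("\n") else line + "\n")
    if out is not None:
        out.close()
    return paths
```

Calling `split_gzip(open("sorted.json"), "chunks")` on the sorted output (both names hypothetical) would leave a directory of gzipped files ready to be packaged as a Bag.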

I am planning on trying to extract URLs from the tweets to try to come up with a list of seed URLs for the Archive-It crawl. If you have ideas of how to use it definitely get in touch. I haven’t decided yet if/where to host the data publicly. If you have ideas please get in touch about that too!
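One way the URL extraction could go, assuming each tweet carries the usual `entities.urls` list with an `expanded_url` field, as tweets from the v1.1 API do. The function names are made up, and this is a sketch rather than a finished seed-list tool.

```python
import json


def expanded_urls(tweet):
    """Yield the URLs in one tweet, preferring the expanded form."""
    for url in tweet.get("entities", {}).get("urls", []):
        link = url.get("expanded_url") or url.get("url")
        if link:
            yield link


def seed_urls(lines):
    """Yield each distinct URL once, in order of first appearance."""
    seen = set()
    for line in lines:
        for link in expanded_urls(json.loads(line)):
            if link not in seen:
                seen.add(link)
                yield link
```

Feeding the line-oriented JSON files through `seed_urls` gives a deduplicated list; counting occurrences instead would show which links were shared most and might deserve priority in a crawl.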

by ed at August 31, 2014 04:16 PM

Journal of Web Librarianship

Tutorials on Google Analytics: How to Craft a Web Analytics Report for a Library Web Site

Journal of Web Librarianship, Ahead of Print.

by Le Yang at August 31, 2014 05:11 AM

August 30, 2014

Patrick Hochstenbach

Cat doodles

Created during my commute this week from Brugge to Ghent. Keeping myself busy by creating cat doodles.

by hochstenbach at August 30, 2014 01:42 PM


Join the PeerLibrary Team

PeerLibrary is a rapidly expanding open source project that seeks to provide a collaborative layer of knowledge over academic publications by allowing users to share real-time highlights and annotations. We need your help to keep it growing. 

We are currently looking for programmers, designers, and community managers. These are volunteer positions*, and we are seeking a commitment of at least 3 to 5 hours per week. We are based in Berkeley, CA but welcome collaborators from everywhere. Interested? Get in touch with us at

*UC Berkeley students may receive academic credit through the Undergraduate Research Apprentice Program.


Volunteer will program components of the platform and gain experience working on an open source project. Optionally, there are also opportunities to learn more about reputation systems, ranking, and search.

Desired experience: Volunteers should know how to program in CoffeeScript and preferably have experience with the Meteor framework.

Interaction Designer

Volunteer will focus on designing the user interfaces of the project and their implementation. In the process, the designer will learn about designing user interfaces, evaluating them, and incorporating feedback from users.

Desired experience: Volunteers should be familiar with HTML and CSS, preferably the Stylus CSS dialect as well.

Community Manager

PeerLibrary is an open source project, which means that we rely heavily on the participation and contributions of the community around our project. There are two main segments of a community: developers who contribute to source code, and users who contribute, collaborate, and engage with content. Community managers would work with both segments of the community. Community managers will engage with new contributors and users, identify issues for contributors, and streamline the process for joining the project. Community managers will also be responsible for external communications, such as: reaching out to new users, keeping the Open Access / Open Knowledge community abreast of new developments in the project and presenting the project to the public. Community managers will learn about developer relations, how open source projects are structured and run, and gain experience in communications and community management.

Desired experience: Community managers should have strong written English and interest in issues related to Open Access and Open Knowledge, more broadly. Bring enthusiasm; we will provide the rest.

by quoththepythoness at August 30, 2014 04:46 AM

August 29, 2014

Open Library

Time travel through millions of historic Open Library images

The BBC has an article about Kalev Leetaru’s project to extract images from millions of Open Library pages.

You can read about how it works…

The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text. As part of the process, the software recognised which parts of a page were pictures in order to discard them.

Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format. The software also copied the caption for each image and the text from the paragraphs immediately preceding and following it in the book. Each Jpeg and its associated text was then posted to a new Flickr page, allowing the public to hunt through the vast catalogue using the site’s search tool.

“I think one of the greatest things people will do is time travel through the images,” Mr Leetaru said.

… or just check out some of the results. Images plus citations plus metadata! We couldn’t be happier. Free to use with no restrictions.

Image from page 301 of "The New England magazine" (1887)

Image from page 788 of "St. Nicholas [serial]" (1873)

Image from page 210 of "Farmington, Connecticut, the village of beautiful homes" (1906)

Image from page 1121 of "The Saturday evening post" (1839)

Image from page 368 of "New England; a human interest geographical reader" (1917)

Image from page 902 of "Canadian grocer July-December 1896" (1889)

Image from page 249 of "Gleanings in bee culture" (1874)

Image from page 411 of "The Canadian druggist" (1889)

I even found a photo of my house!

Image from page 75 of "A text book of the geography, history, constitution and civil government of Vermont; also Constitution and civil government of the U. S., a publication expressly prepared to comply with Vermont's state school laws" (1915)

Read more details at the Internet Archive’s blog or on Flickr’s “Welcome to the Commons” post.

by Jessamyn West at August 29, 2014 10:52 PM


Linked Data Survey results 2: Examples in production




OCLC Research conducted an international linked data survey for implementers between 7 July and 15 August 2014. This is the second post in the series reporting the results.

The survey received responses describing 38 linked data projects/services that are in production and accessible to others.

American Antiquarian Society’s General Catalog, which includes the North American Imprints Program.

American Numismatic Society’s thesaurus of numismatic concepts, used by archeological projects and museum databases, Archives, biographies, and online corpus of Roman imperial coins.

Archaeology Data Service’s STELLAR (Semantic Technologies Enhancing Links and Linked data for Archaeological Resources) project to enhance the discoverability, accessibility, impact and sustainability of ADS datasets.

British Library’s British National Bibliography, released to expose linked data in bulk, to explore feasibility of service delivery with semantic web technology, to develop our knowledge of semantic web technology and make our data available to new audiences.

The British Museum’s Semantic Web Collection, providing a SPARQL endpoint using the CIDOC CRM ontology for the Museum’s entire collection.

Charles University in Prague’s OpenData, an aggregation of public institutions’ datasets converted to linked data, and Drug Encyclopedia for physicians.

Colorado College Tutt Library’s TIGER catalog that connects to a Linked-Data back-end semantic server.

Colorado State University’s archived datasets from the NSF-funded Shortgrass Steppe-Long-Term Ecological Research station in northern Colorado, linking both the data and everything associated with the data (images, technical reports, theses and dissertations, articles, etc.) in its digital repository.

DANS’ (Data Archiving and Networked Services, Royal Netherlands Academy of Arts and Sciences) Cedar project, which takes Dutch census data as its starting point to build a semantic data-web of historical information.

Europeana, which aggregates metadata for digital objects from museums, libraries, archives and audiovisual archives across Europe. It has developed the Europeana Data Model (EDM) which is based on the principles of the Semantic Web.

Fundación Ignacio Larramendi’s Polymath Virtual Library, which brings together information, data, digital texts and websites about Spanish, Hispano-American, Brazilian and Portuguese polymaths from all times. It aggregates information about the thinking, philosophy, politics, science, etc. from Spain, Hispano-America, Portugal and Brazil written in any language (Latin, Arabic, Hebrew, Spanish, Portuguese …) and at any time (since Seneca in the first century BC to the present).

Goldsmith College’s metadata for a collection of 16th-century printed music sources from the British Library and Transforming Musicology project.

Library of,  a Linked Data Service providing access to LC authority and vocabulary data for consumption by anyone using the data, and BIBFRAME, the technical website for the Bibliographic Framework Initiative hosting the vocabulary and  tools to assist with evaluating the nascent vocabulary and model by viewing actual data encoded using the vocabulary.

Missoula Public Library’s Newspaper Index.

North Carolina State University’s Organization Name Linked Data, a tool to manage the variant forms of names for journal and e-resource publishers, providers, and vendors in its locally-developed electronic resource management system.

Norwegian University of Science and Technology’s special collections data, to provide a functional workflow and website for digitized manuscript production and delivery based on RDF and linked data, an index to various linked data projects, and personal name authorities from the BIBSYS consortium.

OCLC’s Dewey Decimal Classification in multiple languages and multiple editions as linked data; FAST (Faceted Application of Subject Terminology) Linked Data, for sharing the FAST authorities so they can be used elsewhere; ISNI (International Standard Name Identifier) providing identifiers, names, and links to sources; VIAF (Virtual International Authority File), which merges name authority files from around the world, primarily from national libraries;, representing the world’s largest network of library content and services (making it available as linked data was a proof of concept to learn about data mining, semantic content publishing, data modeling and gaps in technologies); Works, high-level descriptions of the resources in WorldCat, containing information such as author, title, descriptions, subjects etc., common to all editions of the work, plus links to the record-level descriptions already shared in the experimental WorldCat Linked Data.

Oslo Public Library’s library catalog data converted from MARC to RDF linked data, enriched with information harvested from external sources and constructed with SPARQL update queries, to fuel their own services depending on more complex queries and data not included in MARC records, such as FRBR relations, book reviews, cover images etc.; collection of book reviews written by Norwegian libraries and linked to bibliographic data; and a web directory for pupils with links to selected webpages, described with metadata and  connected to a topic structure linked to DBpedia resources.

Public Record Office, Victoria’s PROV wiki, a very early attempt to structure archival metadata using semantic mediawiki technology and to provide an alternate gateway into a small selection of the holdings that was also machine readable.

Smithsonian Libraries’ Books Online, scanned books from their collection that are presented with RDFa bibliographic data to make their content more reusable and available to others.

University College Dublin’s Digital Library, where all resources captured and managed within the UCD digital repository environment are exposed as linked data, their approach to managing metadata for digital collections,  to provide maximum flexibility in resource discovery and dissemination.

University of Alberta, partner in the Pan-Canadian Documentary Heritage Network’s Out of the Trenches, a “proof-of-concept” to showcase a sampling of the network’s wealth of digital resources using linked open data and principles of  the semantic web to maximize discovery by a broad user community.

University of North Texas’ UNT Name App, providing authority services for the UNT Digital Libraries’ ETD Collection and UNT Scholarly Works; supports linking between local authority and other systems including VIAF, Wikipedia, ORCID and local faculty profiles which can be used in metadata creation.

Yale Center for British Art’s art collection data set, a Linked Open Data resource to build an environment for the development of British Art scholarship.


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.

by Karen at August 29, 2014 06:49 PM

Library of Congress: The Signal

Upgrading Image Thumbnails… Or How to Fill a Large Display Without Your Content Team Quitting

The following is a guest post by Chris Adams from the Repository Development Center at the Library of Congress, the technical lead for the World Digital Library.

Preservation is usually about maintaining as much information as possible for the future but access requires us to balance factors like image quality against file size and design requirements. These decisions often require revisiting as technology improves and what previously seemed like a reasonable compromise now feels constricting.

I recently ran into an example of this while working on the next version of the World Digital Library website, which still has substantially the same look and feel as it did when the site launched in April of 2009. The web has changed considerably since then with a huge increase in users on mobile phones or tablets and so the new site uses responsive design techniques to adjust the display for a wide range of screen sizes. Because high-resolution displays are becoming common, this has also involved serving images at larger sizes than in the past — perfectly in keeping with our goal of keeping the focus on the wonderful content provided by WDL partners.

When viewing the actual scanned items, this is a simple technical change to serve larger versions of each but one area posed a significant challenge: the thumbnail or reference image used on the main item page. These images are cropped from a hand-selected master image to provide consistently sized, interesting images which represent the nature of the item – a goal which could not easily be met by an automatic process. Unfortunately the content guidelines used in the past specified a thumbnail size of only 308 by 255 pixels, which increasingly feels cramped as popular web sites feature much larger images and modern operating systems display icons as large as 256×256 or even 512×512 pixels. A “Retina” icon is significantly larger than the thumbnail below:

Icon Sizes

Going back to the source

All new items being processed for WDL now include a reference image at the maximum possible resolution, which the web servers can resize as necessary. This left around 10,000 images which had been processed before the policy changed and nobody wanted to take time away from expanding the collection to reprocess old items. The new site design allows flexible image sizes but we wanted to find an automated solution to avoid a second-class presentation for the older items.

Our original master images are much higher resolution and we had a record of the source image for each thumbnail but not the crop or rotation settings which had been used to create the original thumbnail. Researching the options for reconstructing those settings led me to OpenCV, a popular open-source computer vision toolkit.

At first glance, the OpenCV template matching tutorial appears to be perfect for the job: give it a source image and a template image and it will attempt to locate the latter in the former. Unfortunately, the way it works is by sliding the template image around the source image one pixel at a time until it finds a close match, a common approach but one which fails when the images differ in size or have been rotated or enhanced.
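To make that limitation concrete, here is a minimal pure-Python sketch of sliding-template matching. It is not OpenCV's implementation: the function name `best_template_offset`, the sum-of-squared-differences score, and the tiny list-of-lists "images" are all assumptions made for illustration.

```python
def best_template_offset(source, template):
    """Slide `template` over `source` one pixel at a time.

    Both images are grayscale grids (lists of rows of numbers). Returns
    (score, x, y) for the offset with the lowest sum of squared
    differences; a score of 0 means a pixel-perfect match.
    """
    src_h, src_w = len(source), len(source[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = None
    for y in range(src_h - tpl_h + 1):
        for x in range(src_w - tpl_w + 1):
            score = sum(
                (source[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(tpl_h)
                for dx in range(tpl_w)
            )
            if best is None or score < best[0]:
                best = (score, x, y)
    return best


# A 5x5 "master" image with unique pixel values, and a 2x2 crop of it.
master = [[10 * row + col for col in range(5)] for row in range(5)]
crop = [row[2:4] for row in master[1:3]]

# The unmodified crop is found exactly where it was cut out.
exact = best_template_offset(master, crop)

# Rotating the crop by a quarter turn leaves no exact match anywhere.
rotated = [list(row) for row in zip(*crop[::-1])]
rotated_match = best_template_offset(master, rotated)
```

The untouched crop scores a perfect 0 at its original position, while the rotated copy never does, which is the failure mode described above.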

Fortunately, there are far more advanced techniques available for what is known as scale and rotation invariant feature detection and OpenCV has an extensive feature detection suite. Encouragingly, the first example in the documentation shows a much harder variant of our problem: locating a significantly distorted image within a photograph – fortunately we don’t have to worry about matching the 3D distortion of a printed image!

Finding the image

The locate-thumbnail program works in four steps:

  1. Locate distinctive features in each image, where features are simply mathematically interesting points which will hopefully be relatively consistent across different versions of the image – resizing, rotation, lighting changes, etc.
  2. Compare the features found in each image and attempt to identify the points in common
  3. If a significant number of matches were found, replicate any rotation which was applied to the original image
  4. Generate a new thumbnail at full resolution and save the matched coordinates and rotation as a separate data file in case future reprocessing is required
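The geometry behind steps 2 and 3 can be illustrated without a computer-vision library. The sketch below assumes a feature matcher has already produced pairs of corresponding points, and fits a scale, rotation, and offset to them by least squares, treating 2D points as complex numbers. The function name `fit_similarity` and the sample coordinates are hypothetical; the actual locate-thumbnail program relies on OpenCV's feature detection, which is not reproduced here.

```python
import cmath
import math


def fit_similarity(thumb_points, master_points):
    """Fit master = a * thumb + b by least squares.

    Points are (x, y) tuples treated as complex numbers x + yj, so the
    complex factor `a` encodes both scale (its magnitude) and rotation
    (its angle). Returns (scale, rotation_in_degrees, (offset_x, offset_y)).
    """
    p = [complex(x, y) for x, y in thumb_points]
    q = [complex(x, y) for x, y in master_points]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    numerator = sum(
        (qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q)
    )
    denominator = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = numerator / denominator
    b = q_mean - a * p_mean
    return abs(a), math.degrees(cmath.phase(a)), (b.real, b.imag)


# Hypothetical matches: the master region is 4x larger than the thumbnail,
# turned by a quarter turn, and anchored at (100, 50) in the master image.
thumb = [(0, 0), (10, 0), (10, 5), (0, 5)]
master = [(100, 50), (100, 90), (80, 90), (80, 50)]

scale, rotation, offset = fit_similarity(thumb, master)
```

With the scale, rotation, and offset recovered, the same crop can be cut from the full-resolution master, and the three values can be saved alongside it as described in step 4.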

You can see this process in the sample visualizations below which have lines connecting each matched point in the thumbnail and full-sized master image:

An Actor in the Role of Sato Norikiyo who Becomes Saigyo: An Actor in the Role of Yoshinaka.

Maps of Ezo, Sakhalin, and Kuril Islands - note the rotation.

The technique even works surprisingly well with relatively low-contrast images such as this 1862 photograph from the Thereza Christina Maria Collection courtesy of the National Library of Brazil where the original thumbnail crop included a great deal of relatively uniform sky or water with few unique points:

“Gloria Neighborhood”

Scaling up

After successful test runs on a small number of images, locate-thumbnail was ready to try against the entire collection. We added a thumbnail reconstruction job to our existing task queue system and over the next week each item was processed using idle time on our cloud servers. Based on the results, some items were reprocessed with different parameters to better handle some of the more unusual images in our collection, such as this example where the algorithm matched only a few points in the drawing, producing an interesting but rather different result:

Vietnam Veterans Memorial, Competition Drawing.

Reviewing the results

Automated comparison

For the first pass of review, we wanted a fast way to compare images which should be very close to identical. For this work, we turned to libphash, which attempts to calculate the perceptual difference between two images, so we could find gross failures rather than cases where the original thumbnail had been slightly adjusted or was shifted by an insignificant amount. This approach is commonly used to detect copyright violations but it also works well as a way to quickly and automatically compare images or even cluster a large number of images based on similarity.
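As a rough illustration of the idea, the sketch below implements a much simpler "average hash" in pure Python and compares hashes by counting differing bits. This is a stand-in technique, not the algorithm libphash uses, and the function names and sample pixel grids are invented for the example.

```python
def average_hash(pixels):
    """Return a bit list: 1 where a pixel is brighter than the image mean.

    `pixels` is a small grayscale grid (list of rows). Real tools first
    shrink the image to a fixed size such as 8x8; that step is assumed to
    have happened already.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(hash_a, hash_b))


original = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
# Same picture, slightly brightened: the hash should not change.
brightened = [[value + 5 for value in row] for row in original]
# Left and right halves swapped: a gross failure the hash should catch.
swapped = [row[2:] + row[:2] for row in original]

distance_brightened = hamming_distance(
    average_hash(original), average_hash(brightened)
)
distance_swapped = hamming_distance(
    average_hash(original), average_hash(swapped)
)
```

A small distance means the two images look alike even if individual pixel values differ, so minor adjustments pass while badly misplaced crops stand out.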

A simple Python program was created and run across all of the reconstructed images, reporting the similarity of each pair for human review. The gross failures were used to correct bugs in the reconstruction routine and to identify a few interesting cases where the thumbnail had been significantly altered, such as this cover page where a stamp added by a previous owner had been digitally removed:

7778 original; 7778 reconstructed





 now shows that this was corrected to follow the policy of fidelity to the physical item.

Human review

The entire process until this point has been automated but human review was essential before we could use the results. A simple webpage was created which offered fast keyboard navigation and the ability to view sets of images at either the original or larger sizes:

This was used to review items which phash had flagged as matching below a particular threshold and to randomly sample items to confirm that the phash algorithm wasn’t masking differences which a human would notice.
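The selection logic for such a review queue might look like the following sketch. It assumes similarity scores between 0 and 1 keyed by item id; the function name `select_for_review`, the threshold, the sample size, and the item ids are all hypothetical rather than taken from the actual review tool.

```python
import random


def select_for_review(similarities, threshold, sample_size, seed=0):
    """Pick items for human review.

    `similarities` maps an item id to a similarity score between 0 and 1.
    Every item scoring below `threshold` is flagged, and `sample_size`
    of the remaining items are drawn at random as a spot check.
    """
    flagged = sorted(
        item for item, score in similarities.items() if score < threshold
    )
    passed = sorted(item for item in similarities if item not in flagged)
    rng = random.Random(seed)  # fixed seed so a review batch is reproducible
    spot_check = rng.sample(passed, min(sample_size, len(passed)))
    return flagged, spot_check


# Hypothetical scores keyed by made-up item ids.
scores = {
    "item-1": 0.99,
    "item-2": 0.42,
    "item-3": 0.97,
    "item-4": 0.88,
    "item-5": 0.95,
}
flagged, spot_check = select_for_review(scores, threshold=0.9, sample_size=2)
```

Sampling from the items that passed is what guards against the automated score hiding differences a person would notice.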

In some cases where the source image had interacted poorly with the older down-sampling, the results are dramatic – the reviewers reported numerous eye-catching improvements such as this example of an illustration in an Argentinian newspaper:

Illustration from “El Mosquito, March 2, 1879” (original).

Illustration from “El Mosquito, March 2, 1879” (reconstructed).



This project was completed towards the end of this spring and I hope you will enjoy the results when the new version of the site launches soon. On a wider scale, I also look forward to finding other ways to use computer-vision technology to process large image collections – many groups are used to sophisticated bulk text processing but many of the same approaches are now feasible for image-based collections and there are a number of interesting possibilities such as suggesting items which are visually similar to the one currently being viewed or using clustering or face detection to review incoming archival batches.

Most of the tools referenced above have been released as open-source and are freely available:

by Butch Lazorchak at August 29, 2014 05:56 PM

August 28, 2014


Linked Data Survey results 1 – Who’s doing it

OCLC Research conducted an international linked data survey for implementers between 7 July and 15 August 2014. This is the first post in the series reporting the results.

We received 92 responses to the international linked data survey conducted between 7 July and 15 August 2014. (So who is using linked data? And for what?) OCLC colleagues also responded to the survey and reported on 6 projects/services for comparison. So now we have some answers to share!

Although the survey was designed specifically for implementers of linked data projects/services, 26 of the 92 responses said they had not implemented nor were implementing a linked data project. Seven of them plan on implementing a linked data project within the next two years and 10 are planning to apply for funding to implement one. Some of them also pointed to interesting linked data projects they’re tracking, which included respondents to the survey (Oslo Public Library, BIBFRAME, Europeana, Yale Center for British Art).

The remaining 66 responses reported implementing 160 linked data projects/services; 68 of them are described. 23 of the projects consume linked data; 3 publish linked data; 42 both consume and publish linked data.  We have a good international representation. Just over half are linked data projects/services in the US, but the rest are from 14 countries: Australia, Canada, Czech Republic, France, Germany, Ireland, Italy, The Netherlands, Norway, Singapore, South Korea, Spain, Switzerland, and the UK. I’ve appended the list of the 44 institutions that took the time to describe their linked data projects/services below (several described multiple projects).

Of the 68 projects:

27 are not yet in production;
9 have been in production for less than one year;
12 have been in production for more than one year but less than two years;
20 have been in production for more than two years.

Four of the projects are “private”, for that institution’s use only. Most projects/services that have been implemented have been receiving an average of fewer than 1,000 requests a day over the last six months. The most heavily used non-OCLC linked data datasets as measured by average number of requests a day:

For comparison, these are the responses for six OCLC linked data projects/services:

Project/Service | Consume or Publish LD | How long in production | Av. no. of requests/day
Dewey | Publish | More than two years | 10,000 – 50,000/day *
FAST | Both consume & publish | More than two years | 1,000 – 10,000/day
ISNI | Publish | Less than one year | Fewer than 1,000/day
VIAF | Both consume & publish | More than two years | More than 100,000/day
 | Both consume & publish | More than two years | 16 million/day **
Works | Both consume & publish | Less than one year | More than 100,000/day

*  Reflects only HTML pages served, not requests to the RDF data.
**  All pages include linked data (RDFa)

Since so many projects have not been implemented or implemented relatively recently, only 41 could assess whether the linked data project/service was successful in achieving its desired outcome. More than half (28) said it was successful or mostly successful.  Success measures included “increasing international use”, improved workflow, moving from a pilot to production, ability to link data by relationships, just making the data available as linked data, and professional development of staff. Several noted the need for better metrics to assess the service value, the challenges of harmonizing the data, and the lack of identifiers to link headings to.

Parts of institution involved: Most of the respondents reported that multiple units of the institution were involved in their linked data project/service. When only one part of an institution was involved, it was most likely the research and development group (4 projects, though cited in 15 projects in total). Library and/or archives were involved the most, cited in 43 projects. Metadata services was the next most involved, in 34 projects. Digital library services and library systems/IT or campus IT were involved in a third or more of the projects. Seventeen involved digital humanities and/or faculty in academic departments. The University College Dublin’s Digital Library involved the most units of all the described projects, with 10: library, archives, metadata services, digital library services, library systems/information technology, research and development group, computer science department, digital humanities, campus museum, faculty in academic departments.

External groups involved: Seventeen of the projects did not involve any external groups or organizations.  Thirteen were part of a national and/or international collaboration. Twenty-one involved other libraries or archives and 18 other universities or research institutions. Thirteen involved a systems vendor or a corporation/company.  Eight collaborated with other consortium members; eight were part of a discipline-specific collaboration. Europeana listed the most external groups, with 6: other libraries/archives, other universities/research institutions, other members of their consortium, part of a discipline-specific collaboration, part of an international collaboration, and a large network of experts working in cultural heritage.

Staffing:  Almost all of the institutions that have implemented or are implementing linked data projects/services (55) have added linked data to the responsibilities of current staff; only 10 have not. Nine have staff dedicated to linked data projects (five of them in conjunction with adding linked data to the responsibilities of current staff). Five are adding or have added new staff with linked data expertise; ten are adding or have added temporary staff with linked data expertise; and seven are hiring or have hired external consultants with linked data expertise.

Funding: Twenty-nine of the projects received grant funding to implement linked data; most projects (46) were covered by the library/archive or the parent institution. Three linked data projects received funding support from partner institutions; four linked data projects were privately funded.

Linked data survey respondents describing linked data projects/services

American Antiquarian Society
American Numismatic Society
Archaeology Data Service (UK)
Biblioteca della Camera dei deputati (Italy)
British Library
British Museum
Carleton College
Charles University in Prague
Colorado College
Colorado State University
Cornell University
Data Archiving and Networked Services, Royal Netherlands Academy of Arts and Sciences
Digital Public Library of America
Europeana Foundation
Fundación Ignacio Larramendi (Spain)
Goldsmiths’ College
Library of Congress
Minnesota Historical Society
Missoula Public Library
National Library Board (NLB) of Singapore
National Library of Medicine
North Carolina State University Libraries
NTNU (Norwegian University of Science and Technology) University Library
Oslo Public Library
Public Record Office, Victoria (Australia)
Queen’s University Library (Canada)
Stanford University
The University of Texas at Austin
Tresoar (Leeuwarden – The Netherlands)
University College Dublin (Ireland)
University College London (UCL)
University of Alberta Libraries
University of Bergen Library (Norway)
University of British Columbia
University of California-Irvine
University of Illinois at Urbana-Champaign
University of North Texas
University of Oxford
University of Pennsylvania Libraries
Western Michigan University
Yale Center for British Art

by Karen at August 28, 2014 09:11 PM

FOSS4Lib Recent Releases

Koha - 3.16.3

Release Date: 
Wednesday, August 27, 2014

Last updated August 28, 2014. Created by David Nind on August 28, 2014.
Log in to edit this page.

This is a maintenance release. It also includes a major enhancement to the circulation pages.

by David Nind at August 28, 2014 08:18 PM


The Nicollet County Historical Society and the Digital Public Library of America

The Nicollet County Historical Society is located in St. Peter, Minnesota.  Like other such societies, we seek to tell the story of our county through the use of exhibits, books, newspapers, documents, photographs, artifacts, and other means.  We are based in the Treaty Site History Center at 1851 North Minnesota Avenue, and we own the 1871 E. St. Julien Cox Historic Home in St. Peter.

We were among the first of the state’s historical societies to participate in the Minnesota Digital Library’s Minnesota Reflections project. We have posted a large number of photographs that span the years from 1861 to 1998.  Other digital highlights from our collections include prints of Dakota life by Seth Eastman, maps of ghost towns in Nicollet County, the muster roll of Company E of the First Mounted Rangers, letters from the Dakota War, 5×7 inch photographic negatives for making postcards, a ledger containing commercial letterheads and standardized forms and documents from the 1800s, the St. Peter School Board’s meeting minutes from 1865 to 1899, and four atlases or plat books for the years 1885, 1899, 1913, and 1927.  It is of interest that the names of three men who served as governors of Minnesota appear on page one of the school board’s minutes. All of this content has also been shared with the DPLA in order to provide national exposure to our unique materials.

If you’ve ever seen a presentation by DPLA staff, you may recall this image of a Maxwell automobile from the Nicollet County Historical Society. Rudolph Volk and Martin Klein in an old automobile, east of St. Peter, Minnesota, ca. 1907. Nicollet County Historical Society via the Minnesota Digital Library.

The Minnesota Digital Library has been extremely generous in their support of our work during several years of participation in the Minnesota Reflections project.  Their wonderful assistance has been very greatly appreciated!

Having a large and diverse amount of material on the Internet has greatly improved our ability to serve our patrons.  Requests for images are numerous, especially for photographs of people, buildings, and places.  Images have been sent to many locations throughout Minnesota and other states, and to other countries as well.

When the local grocery store was being remodeled they asked us if we could provide a number of images showing how the city of St. Peter has evolved over time.  Because so many images are available online through both the Minnesota Reflections and DPLA websites, it was very easy for the store representatives to find images that fit their needs.  Today, large reprints of several photographs are on display throughout the store.  We also provided the owners of a local restaurant with images of St. Peter that are now featured prominently on their walls and menus.  Such images can be seen in several other businesses in the community. Having our historic images displayed throughout our county has provided the society with additional opportunities to share the history of Nicollet County and to highlight the important work we do in preserving that story.

Being involved in the MDL and the DPLA is a golden opportunity.  The items that we submit are digitized and placed on the Internet.  We receive high-resolution copies of all of the submitted items, which we can store off-site as backup copies of our material.  This collaborative project vastly increases attention and access to our unique holdings.

Featured image credit: Detail of Valley of the St. Peters, Minnesota, 1941-1855. Eastman, Seth, 1808-1875. Nicollet County Historical Society via the Minnesota Digital Library.

All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

by Amy Rudersdorf at August 28, 2014 06:00 PM

Library of Congress: The Signal

Perpetual Access and Digital Preservation at #SAA14

A panel discussion at the SAA 2014 conference. Photo credit: Trevor Owens.

I had the distinct pleasure of moderating the opening plenary session of the Joint Annual Meeting of COSA, NAGARA and SAA in Washington D.C. in early August. The panel was on the “state of access,” and I shared the dais with David Cuillier, an Associate Professor and Director of the University of Arizona School of Journalism, as well as the president of the Society of Professional Journalists; and Miriam Nisbet, the Director of the Office of Government Information Services at the National Archives and Records Administration.

The panel was a great opportunity to tease out the spaces between the politics of “open government” and the technologies of “open data” but our time was much too short and we had to end just when the panelists were beginning to get to the juicy stuff.

There were so many more places we could have taken the conversation:

I must admit that when I think of “access” and “open information” I’m thinking almost exclusively about digital data because that’s the sandbox I play in. At past SAA conferences I’ve had the feeling that the discussion of digital preservation and stewardship issues was something that happened in the margins. At this year’s meeting those issues definitely moved to the center of the conversation.

Just look at this list of sessions running concurrently during a single hour on Thursday August 14, merely the tip of the iceberg:

There were also a large number of web archiving-related presentations and panels including the SAA Web Archiving Roundtable meeting (with highlights of the upcoming NDSA Web Archiving Survey report), the Archive-IT meetup and very full panels Friday and Saturday.

I was also pleased to see that the work of NDIIPP and the National Digital Stewardship Alliance was getting recognized and used by many of the presenters. There were numerous references to the 2014 National Agenda for Digital Stewardship and the Levels of Preservation work and many NDSA members presenting and in the audience. You’ll find lots more on the digital happenings at SAA on the #SAA14 Twitter stream.

We even got the chance to celebrate our own Trevor Owens as the winner of the SAA Archival Innovator award!

The increased focus on digital is great news for the archival profession. Digital stewardship is an issue where our expertise can really be put to good use and where we can have a profound impact. Younger practitioners have recognized this for years and it’s great that the profession itself is finally getting around to it.

by Butch Lazorchak at August 28, 2014 05:50 PM


Open Legal Committee Call: September 3, 2:00 PM Eastern

The Legal Committee will hold an open call on Wednesday, September 3 at 2:00 PM EDT. The agenda can be found below. To register, follow the link below and complete the short form.

You can register for the call by visiting


  1. The DPLA/Europeana rights metadata meeting in October
  2. Upcoming additional House Judiciary Committee hearings on copyright reform
  3. Brief review of the Authors Guild v. HathiTrust appeal outcome, and an update on the Authors Guild v. Google appeal
  4. Likelihood of additional reports from the Copyright Office, and the release of the USPTO’s “White Paper” on copyright reform, expected in early 2015
  5. Recent articles from the Berkeley Symposium on copyright reform, including papers on fair use and “permitted but paid” uses (by Jane Ginsburg), legislating digital exhaustion (by Jason Schultz and Aaron Perzanowski) and a paper on Section 108 reform by David Hansen

by Kenny Whitebloom at August 28, 2014 03:43 PM

District Dispatch

“How do you plead?” Guilty on all counts for thinking E-rate is cool

Phone rings. Caller asks: “What are you working on?”

Answer: “E-rate.”

Dinner conversation: “What did you talk about today, mom?”

Answer (supplied by child 2 and 3 in unison): “E-rate.”

Latest office joke: “Why is Marijke in traffic court?”

Answer: “E-rate.”

E-rate, broadband, and the goals that have guided us

For those of you following the American Library Association’s (ALA) E-rate year, we are working through the fifth major installment in a series of actions by the Federal Communications Commission (FCC), responding to the Further Notice of Proposed Rulemaking (FNPRM) issued in July as part of the E-rate Modernization Order. And, because we have been immersed in E-rate pretty steadily for a year, the topic “E-rate Modernization” is really the only answer to questions about what I do.

As we prepare for both ALA’s comments on the FNPRM (comments due to the Commission September 15—so why am I writing a blog post instead of drafting a response to questions on multi-year contracts?) and for our panel session at the 2014 Telecommunications Policy Research Conference (which will take place amazingly on September 12th), I have been reflecting over the ways in which we have engaged with the Commission, the Hill, our coalitions, our members, other library organizations, the press, and others to make strategic decisions and identify ALA’s policy positions.

If I am boastful, I can say that we worked diligently over the last year. If I am critical, I can see a series of tipping points where we chose one path over another, which opened opportunities while closing off others. Either way, the decisions we made were in line with the goals we set for ALA at the beginning of the E-rate proceeding, which we saw then and now as an opportunity to increase the percentage of libraries with high-capacity broadband at affordable rates. The goals we set include:

Shaping what you have into what you want

Our first choice for the Commission would have been to tackle the “fiber gap” (a term that has emerged in the second phase of the Modernization) among libraries before addressing the Wi-Fi gap. However, when it became apparent that the Commission would address the lack of Wi-Fi capacity for libraries and schools first, we focused on that priority and the commitment of the Commission that this was one phase of a multi-phase process.

At that point we had to answer the question, “What could we gain for libraries in this first step while holding out for a larger pay-off in the second phase?” The interplay between this short-term and long-term strategizing colored the last stages of our advocacy at the Commission and among stakeholders, and with our members and library organizations. Now that we are officially in the second phase, our sleeves are rolled up, teeth bared and claws extended.

These thoughts, as well as “OMG, what are we doing about data that describes the costs to get libraries up to the 1 gigabit goal” ran through my head while I waited for my turn before the bench at traffic court. And why did I spend my afternoon in traffic court? Did you think this was going to be a concise blog post? Is anything E-rate related concise?

E-rate enables anywhere, anytime learning

At the height of negotiations to come to an Order—which we fully supported happening—there was significant back-and-forth among various stakeholders (each with different agendas), numerous ex parte meetings with Commission staff and long phone calls and meetings at the Washington Office. Coincidentally, my then 10th-grade son’s history class unit was on regulatory agencies, and being a teacher at heart, how could I help myself? The E-rate Modernization proceeding makes a perfect case study for a lesson on the responsibilities of federal regulatory agencies and of Congress and how good public policy is made. Poor kid, right? On the contrary.

Explaining E-rate and talking about how a relatively small player like ALA advocates effectively became an exemplary mashup of teen culture and wonky discussions. For example, what do you say to someone who shares information that is not ready to be shared? “Not cool dude.” Getting libraries included routinely in mentions of E-rate? “That’s a mission.” If, in a public document, there is language that could be interpreted such that it clearly dismissed one perspective in favor of another but not overtly, how would you describe this action? “Sneak dissing.” And to the discussion that resulted in traffic court, how does an advocate tread the fine line between passion for an issue and rational decision-making and how does an advocate prevent a personal agenda from influencing strategy on behalf of stakeholders?

Despite my New England pragmatism and Dutch stubbornness, I have a good dose of southern French exuberance. So, in the heat of describing the latest battle, making an extremely important point to the 10th grader about the appalling vitriol that had emerged at the tail end of the proceeding before the July Commission vote that resulted in the Order and FNPRM and how that vitriol was unfortunately influencing policy… I may have not come to a complete stop. Result? An afternoon in traffic court. “Kiiillll” said with appropriate sighing and disbelief (reflecting the sentiment in teen-speak). This may be the only record of a moving violation caused by E-rate (“That’s a bet” or more simply, “bet!”).

So at the recommendation of the police officer issuing the ticket, I plead “guilty with an explanation.” My explanation? “E-rate is really cool.”

The post “How do you plead?” Guilty on all counts for thinking E-rate is cool appeared first on District Dispatch.

by Marijke Visser at August 28, 2014 02:58 PM

OCLC Dev Network

Software Development Practices: Telling Your User's Story

Last week was the first in a series of posts about product development practices. In that post, Shelley discussed how important it is to identify the problem that the user needs to solve. This post will describe how anyone can write and understand User Stories that articulate the problem.

User Stories are informal, simple, short and concise statements about what the user wants to accomplish. They focus entirely on what the user wants to do and deliberately avoid talking about how the technical solution will work.

by Karen Coombs at August 28, 2014 02:00 PM

Tara Robertson

Internet Use Policy across Canadian public libraries

I’ve been pretty critical of Vancouver Public Library’s new Internet Use Policy. After sending a letter to their Board I was wondering what other public library policies were like. VPL is a member of the Canadian Urban Libraries Council, so I thought it would be interesting to see what other member libraries’ policies were.

I put up a spreadsheet on Google Drive and got help from some other librarians (thanks Justin, Myron and Sarah for your help and translations). Here are my initial thoughts.

VPL’s policy isn’t the worst.

Here are some things that I was a bit shocked to learn:

I was surprised at how many libraries’ policies include phrases like sexually explicit materials, pornography, overt sexual images. Richmond Hill Library and Regina Public Library’s policies mention “illicit drug literature”. A few libraries mention hate literature, hate speech or incitement to hate and hateful propaganda. A handful of libraries mention that copyright infringement is prohibited.

It was disappointing that some libraries (Bibliothèque Ville de Laval, and Guelph Public Library) don’t seem to have their internet use policies published on their website.

So many of these policies sound like the 90s. There’s a lot of language about the internet being unregulated and that some of the information on the library may not be accurate, complete, or current and there may be controversial information out there. I read the phrase “The Library is not responsible for the site content of links or secondary links from its home pages” more than once. I think that these days we accept these things as common knowledge. Greater Victoria Public Library’s policy states that their “website recommends sites that provide quality information resources for both adults and children.” This seems like a very dated way of viewing information literacy.

Toronto Public Library’s policy is worth reading. I like that it’s written in plain English. I think they do a good job of acknowledging that users are sharing public space without singling out sexually explicit content:

Internet workstations are situated in public areas, and users are expected to use the Internet in accordance with this environment. All users of the Toronto Public Library, including users of the Library’s Internet services, are also expected to follow the Library’s Rules of Conduct which are designed to ensure a welcoming environment. Disruptive, threatening, or otherwise intrusive behaviour is not allowed and Library staff are authorized to take action.

I’m not sure how this policy is being applied; it could be good or a bit of a disaster. I don’t know.

by Tara Robertson at August 28, 2014 05:07 AM

Jonathan Rochkind

UIUC and Academic Freedom

Professor Steven Salaita was offered a job at the University of Illinois in Urbana-Champaign (UIUC), as associate professor of American Indian Studies, in October 2013. He resigned his previous position at Virginia Tech, and his partner also made arrangements to move with him. 

On August 1 2014, less than a month before classes were to begin, the UIUC Chancellor rescinded the offer, due to angry posts he had made on Twitter about Israel’s attack on Gaza. 

This situation seems to me to be a pretty clear assault on academic freedom. I don’t think the UIUC or its chancellor dispute these basic facts. Chancellor Wise’s letter and the Board of Trustees’ statement of support for the Chancellor claim that “The decision regarding Prof. Salaita was not influenced in any way by his positions on the conflict in the Middle East nor his criticism of Israel”, but they are somewhat less direct in explaining on what grounds ‘the decision’ was made. They imply that Salaita’s tweets constituted “personal and disrespectful words or actions that demean and abuse either viewpoints themselves or those who express them,” and that this is good cause to rescind a job offer (that is, effectively fire a professor). (Incidentally, Salaita has a proven history of excellence in classroom instruction, including respect for diverse student opinions.)

[I have questions about what constitutes "demeaning and abusing viewpoints themselves", and generally thought that "demeaning viewpoints themselves", although never one's academic peers personally, was a standard and accepted part of scholarly discourse. But anyway.]

I’ve looked through Salaita’s tweets, and am not actually sure which ones are supposed to be the ones justifying effective dismissal.   I’m not sure Chancellor Wise or the trustees are either.  The website Inside Higher Ed made an open records request and received emails indicating that pressure from U of I funders motivated the decision — there are emails from major donors and university development (fund-raising) administrators pressuring the Chancellor to get rid of Salaita. 

This raises academic freedom issues not only in relation to firing a professor because of his political beliefs; but also issues of faculty governance and autonomy, when an administrator rescinds a job offer enthusiastically made by an academic department because of pressure from funders. 

I’ve made no secret of my support for Palestinian human rights, and an end to the Israeli occupation and apartheid system. However, I stop to consider whether I would have the same reaction if a hypothetical professor had made the same sorts of tweets about the Ukraine/Russia conflict (partisan to either side), or had tweeted anti-Palestinian content about Gaza instead. I am confident I would be just as alarmed about an assault on academic freedom. However, the fact that it’s hard to imagine funders exerting concerted pressure because of a professor’s opinions on Ukraine (or a professor’s anti-Palestinian opinions) is telling about the political context here, and I think indicates that this really is about Salaita’s “positions on the conflict in the Middle East and his criticism of Israel.”

So lots of academics are upset about this. So many that I suspected, when this story first developed, the UIUC would clearly have to back down, but instead they dug in further. The American Association of University Professors (AAUP) has expressed serious concern about violations of Salaita’s academic freedom — and the academic freedom of the faculty members who selected him for hire. The AAUP also notes that they have “long objected to using criteria of civility and collegiality in faculty evaluation,” in part just because of how easy it is to use those criteria as a cover for suppression of political dissent. 

The Chronicle of Higher Ed, in a good article covering the controversy, reports that “Thousands of scholars in a variety of disciplines signed petitions pledging to avoid the campus unless it reversed its decision to rescind the job offer,” and some have already carried through on their pledge of boycott, including David J. Blacker, director of the Legal Studies Program and a professor of Philosophy at the University of Delaware, who cancelled an appearance in a prestigious lecture series. The UIUC Education Justice Project cancelled a conference due to the boycott. The executive council of the Modern Language Association has sent a letter to UIUC urging them to reconsider.

This isn’t a partisan issue. Instead, it’s illustrative of the increasingly corporatized academy, where administrative decisions in deference to donor preferences or objections take precedence over academic freedom or faculty decisions about their own departmental hiring and other scholarly matters. Also, the way the university was willing to rescind a job offer due to political speech, after Salaita had resigned his previous position, reminds us of the general precarity of junior faculty careers, and the lack of respect and dignity faculty receive from university administration.

A variety of disciplinary-specific open letters and boycott pledges have been started in support of Salaita.

I think librarians have a special professional responsibility to stand up for academic freedom.  

Dr. Sarah T. Roberts, a UIUC LIS alumna and professor of Media Studies at Western University in Ontario, hosts a pledge in support of Salaita from LIS practitioners, students and scholars, with a boycott pledge to “not engage with the University of Illinois at Urbana-Champaign, including visiting the campus, providing workshops, attending conferences, delivering talks or lectures, offering services, or co-sponsoring events of any kind.”

I’ve signed the letter, and I encourage you to consider doing so as well. I know I see at least one other signer I know from the Code4Lib community already.   I think it is important for librarians to take action to stand up for academic freedom. 


by jrochkind at August 28, 2014 04:27 AM

Eric Lease Morgan

Hundredth Psalm to the Tune of "Green Sleeves": Digital Approaches to Shakespeare's Language of Genre

Provides a set of sound arguments for the use of computers to analyze texts, and uses DocuScope as an example.

by Eric Lease Morgan at August 28, 2014 04:00 AM

Evergreen ILS

Bug Squashing Day Wrap-Up

We Came, We Saw, We Squashed Bugs.

The Evergreen community held its first Bug Squashing Day August 26. The day was an opportunity for the entire community to focus on bugs: confirming bugs, coding bug fixes, testing patches, and merging signed-off patches into the core code. By the end of the day, eleven bug fixes were merged into the Evergreen core code. There were also several other bugs that made forward progress as testers provided feedback and contributors created patches. You can see a synopsis of the day’s activities in our August 2014 Evergreen Bug Squashing Day Activity sheet.

Here are some highlights from the day:

Although Bug Squashing Day officially ended August 26, the momentum continued through the 27th as patches worked on during Bug Squashing Day continued to make their way to the Evergreen working repository.

Special thanks go to Blake Henderson (MOBIUS) and Thomas Berezansky (MVLC) for setting up the Sandboxes that made it easy for many in the community to test these bug fixes, and to Justin Hopkins (MOBIUS) and Jason Stephenson (MVLC) for volunteering them. The hardware for the sandboxes was provided by MOBIUS and MassLNC.

Also, a big thank you to the people listed below who participated in Bug Squashing Day and to the institutions that employ them for supporting their efforts to improve Evergreen for everyone.

Although Bug Squashing Day is over, the bug wrangling, fixing and testing doesn’t need to end. Sandboxes will continue to be available to the community beyond Bug Squashing Day. Anyone interested in testing a bug fix can submit a request with our Sandbox Request Form.

by Kathy Lussier at August 28, 2014 02:46 AM

DuraSpace News

German DSpace User Group Meeting Set for Oct. 28

From Pascal Becker, Technische Universität Berlin

by carol at August 28, 2014 12:00 AM

Update 4: Announcing the Second Beta Release of Fedora 4.0

From David Wilcox, Fedora Product Manager
Winchester, MA  This is the fourth in a series of updates on the status of Fedora 4.0 as we move from the recently launched Beta [1] to the Production Release. The updates are structured around the goals and activities outlined in the July-December 2014 Planning document [2], and will serve to both demonstrate progress and call for action as needed. New information since the last status update is highlighted in bold text.

by carol at August 28, 2014 12:00 AM

Cineca to Provide National Institute of Education of Singapore with DSpace Services

From Michele Mennielli, Cineca

by carol at August 28, 2014 12:00 AM

August 27, 2014

OCLC Dev Network

VIAF Update Rescheduled for Friday

The VIAF update originally planned for today that includes both a Modification to VIAF application/xml Response Title Location and some additional fixes has been rescheduled for this Friday, August 29th. 

by Shelley Hostetler at August 27, 2014 09:00 PM


The Code4Lib Journal – Renewing UPEI’s Institutional Repository: New Features for an Islandora-based Environment

by mjhoy at August 27, 2014 07:15 PM


Community Rep Works with Design Student to Develop Awesome DPLA “Swag Caddy”

DPLA Community Rep Sarah Huber and student designer Jenna Mae Weiler.


Every February I receive an all staff work email titled, “Packaging Clients Needed!” I work at Dunwoody College of Technology in Minneapolis, Minnesota. The email is sent out from the Design & Packaging Technology’s Introduction to Packaging Design class, which focuses on cardboard and paperboard packaging of products. The teacher, Pete Rivard, asks Dunwoody staff to be “clients” and for each to bring in an item for which a student can design and manufacture a package that would market the item. I presented to the class that I wanted a box of some sort to carry the DPLA swag (stickers, pins and pens) to the different places I intended to be presenting about the DPLA. None of the students had heard of the DPLA, so I spent time talking about the DPLA mission and what the site offers. I said that I wanted to be able to walk into a training, DPLA caddy in hand, ready for action. I wanted something that would catch people’s eye enough for them to ask about it, dig into it and walk away with a DPLA sticker or any of the other swag. After I presented, I had so many students ask to be my client that I had to set up interviews. It was tough to choose one person, but the student I chose, Jenna Mae Weiler, came to me with several ideas that I thought were promising.

Jenna and I set up our first appointment to discuss options. She came with three different detailed drawings. One was of a small, portable card catalog. The card catalog had drawers to store the different swag. Then there was a book that opened with compartments to hold different items. The last was a simple, modern looking box that opened to have a banner with the DPLA website logo and drawers beneath it that people could open to get the pens, stickers and pins. Once folded down, handles could be secured to easily carry the box (soon to be called DPLA caddy) around.

DPLA "Swag Caddy" designed and developed by Jenna Mae Weiler.


Our following meetings revolved around print and color. Kenny Whitebloom, my staff contact at DPLA, was happy to provide the DPLA logo and to hear about our project. I didn’t want Jenna or me to misrepresent DPLA, because I really did want to carry it with me to DPLA talks and trainings. Jenna was able to match the colors of the DPLA website and the font. We both agreed how much we liked the aesthetic of the site, remarking on the simplicity and clean lines of the design. We wanted to remain in that mindset. Another goal we set was a design that could be shipped flat to any other DPLA Community Reps who liked it, so that, with instructions, they could put it together themselves.

Well, Jenna set to work and she probably thought I was just a doting grandmother, because I didn’t have a single criticism at any stage of the process. I truly thought her design work was fantastic. I just kept saying, “I love it!”

DPLA "Swag Caddy" designed and developed by Jenna Mae Weiler.


Jenna designed the structure of the caddy in Esko’s ArtiosCAD and did the design work in Adobe’s Illustrator software, printed the box on an inkjet printer and cut the box out on our in-house CAD table cutter. We presented the final product and our process to her classmates and the other clients. Jenna gave the details through a PowerPoint presentation and I just kept saying, “I love it.”

Kenny contacted me recently asking how my community rep work has been going, and I told him about the caddy and sent photos. He too thought it was great. So I asked if we could send one caddy assembled and one flat with assembly directions. It is rounding out the project in a great way: we are now working with Jenna’s instructors and a second student, who has graduated from the program and works in the packaging industry, on the full package, which includes sending the caddy out to community reps with instructions to assemble it. The whole process has been a fun experience that has gotten the word out about DPLA in a very different way, and building something that is a small extension of DPLA has also felt like a connection to it.

All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

by DPLA at August 27, 2014 04:00 PM

Library of Congress: The Signal

Untangling the Knot of CAD Preservation


T-FLEX-CAD-12-Rus from Wikimedia Commons.

At the 2014 Society of American Archivists meeting, the CAD/BIM Taskforce held a session titled “Frameworks for the Discussion of Architectural Digital Data” to consider the daunting matter of archiving computer-aided design and Building Information Modelling files. This was the latest evidence that — despite some progress in standards and file exchange — archivists and the international digital preservation community at large are trying to get a firm grasp on the slippery topic of preserving CAD files.

CAD is a suite of design tools, software for 3-D modelling, simulation and testing. It is used in architecture, geographic information systems, archaeology, survey data, geophysics, 3-D printing, engineering, gaming, animation and just about any situation that requires a 3-D virtual model. It comprises geometry, intricate calculations, vector graphics and text.

The data in CAD files resides in structurally complex inter-related layers that are capable of much more than displaying models. For example, engineers can calculate stress and load, volume and weight for specific materials, and the center of gravity, and they can visualize cause and effect. Individual CAD files often relate and link to other CAD files to form a greater whole, such as parts of a machine or components in a building. Revisions are quick in CAD’s virtual environment, compared to paper-based designs, so CAD has eclipsed paper as the tool of choice for 3-D modelling.

CAD files — particularly as used by scientists, engineers and architects — can contain vital information. Still, CAD files are subject to the same risk that threatens all digital files, major and minor: failure of accessibility — being stuck on obsolete storage media or dependent on a specific program, in a specific version, on a specific operating system. In particular, the complexity and range of specifications and formats for CAD files make them even more challenging than many other kinds of born-digital materials.


Skylab from NASA.

As for CAD software, commerce thrives on rapid technological change, new versions of software and newer and more innovative software companies. This is the natural evolution of commercial technology. But each new version and type of CAD software increases the risk of software incompatibility and inaccessibility for CAD files created in older versions of software. Vendors, of course, do not have to care about that; the business of business is business — though, in fairness, businesses may continually surpass customer needs and expectations by creating newer and better features. That said, many CAD customers have long realized that it is important — and may someday be crucial — to be able to archive and access older CAD files.

Design for a Flying Machine by Leonardo da Vinci


Building Information Modelling files and Project Lifecycle Management files also require a digital-preservation solution. BIM and PLM integrate all the information related to a major project, not only the CAD files but also the financial, legal, email and other ancillary files.

Part of a digital preservation workflow is compatibility and portability between systems. One of the most significant standards for exchanging the product manufacturing information in CAD files is ISO 10303, known as the “Standard for the Exchange of Product model data” or STEP. Michael J. Pratt, of the National Institute of Standards and Technology, wrote in 2001, “the development of STEP has been one of the largest efforts ever undertaken by ISO.”

The types of systems that use STEP are CAD, computer-aided engineering and computer-aided manufacturing.


CAD rendering of Sialk ziggurat based on archeological evidence from Wikimedia Commons.

One simple piece of preservation advice that comes up repeatedly is to save the original CAD file in its original format. Save the hardware, software and system that runs it too, if you can. Save any metadata or documentation and document a one-to-one relationship with each CAD file’s plotted sheet.

The usual digital-preservation practice applies, which is to organize the files, back up the files to a few different storage devices and put one in a geographically remote location in case of disaster, and every seven years or so migrate to a current storage medium to keep the files accessible. But what should you preserve? And why? Given the complexity of these files, and recognizing that at its heart digital preservation is an attempt to hedge our bets about mitigating a range of potential risks, it is advisable to try to generate a range of derivative files which are likely to be more viable in the future. That is, keep the originals, and try to also export to other formats that may lose some functionality and properties but which are far more likely to be able to be opened in the future. The final report from the FACADE project makes this recommendation: “For 3-D CAD models we identified the need for four versions with distinct formats to insure long-term preservation. These are:

1. Original (the originally submitted version of the CAD model)
2. Display (an easily viewable format to present to users, normally 3D PDF)
3. Standard (full representation in preservable standard format, normally IFC or STEP)
4. Dessicated (simple geometry in a preservable standard format, normally IGES)”

CAD files now join paper files — such as drawings, plans, elevations, blueprints, images, correspondence and project records — in institutional archives and firms’ libraries. In addition to the ongoing international work on standards and preservation, there needs to be a dialog with the design-software industry to work toward creating archival CAD files in an open-preservation format. Finally, trained professionals need to make sense of the CAD files to better archive them and possibly get them up and running again for production, academic, legal or other professional purposes. That requires knowledge of CAD software, file construction and digital preservation methods.

Either CAD users need better digital curatorial skills to manage their CAD archives or digital archivists need better CAD skills to curate the archives of CAD users. Or both.

by Mike Ashenfelder at August 27, 2014 03:42 PM

ACRL TechConnect

Bootstrap Responsibly

Bootstrap is the most popular front-end framework used for websites. An estimate by meanpath several months ago put it firmly behind 1% of the web, and for good reason: Bootstrap makes it relatively painless to puzzle together a pretty awesome plug-and-play, component-rich site. Its modularity is its key feature, developed so Twitter could rapidly spin up internal microsites and dashboards.

Oh, and it’s responsive. This is kind of a thing. There’s not a library conference today that doesn’t showcase at least one talk about responsive web design. There’s a book, countless webinars, courses, whole blogs dedicated to it (ahem), and more. The pressure for libraries to have responsive, usable websites can seem to come more from the likes of us than from the patron base itself, but don’t let that discredit it. The trend is clear and it is only a matter of time before our libraries have their mobile moment.

Library websites that aren’t responsive feel dated, and more importantly they are missing an opportunity to reach a bevy of mobile-only users that in 2012 already made up more than a quarter of all web traffic. Library redesigns are often quickly pulled together in a rush to meet the growing demand from stakeholders, pressure from the library community, and users. The sprint makes the allure of frameworks like Bootstrap that much more appealing, but Bootstrapped library websites often suffer the cruelest of responsive ironies:

They’re not mobile-friendly at all.

Assumptions that Frameworks Make

Let’s take a step back and consider whether using a framework is the right choice at all. A front-end framework like Bootstrap is a Lego set with all the pieces conveniently packed. It comes with a series of templates, a blown-out stylesheet, and scripts tuned to the environment that let users essentially copy-and-paste fairly complex web-machinery into being. Carousels, tabs, responsive dropdown menus, all sorts of buttons, alerts for every occasion, gorgeous galleries, and very smart decisions made by a robust team of far more capable developers than we.

Except for the specific layout and the content, every Bootstrapped site is essentially a complete organism years in the making. This is also the reason that designers sometimes scoff, joking that these sites look the same. Decked-out frameworks are ideal for rapid prototyping with a limited timescale and budget because the design decisions have by and large already been made. They assume you plan to use the framework as-is, and they don’t make customization easy.

In fact, Bootstrap’s guide points out that any customization is better suited to be cosmetic than a complete overhaul. The trade-off is that Bootstrap is otherwise complete. It is tried, true, usable, accessible out of the box, and only waiting for your content.

Not all Responsive Design is Created Equal

It is still common to hear the selling point for a swanky new site is that it is “responsive down to mobile.” The phrase probably rings a bell. It describes a website that collapses its grid as the width of the browser shrinks until its layout is appropriate for whatever screen users are carrying around. This is kind of the point – and cool, as any of us with a browser-resizing obsession could tell you.

Today, “responsive down to mobile” has a lot of baggage. Let me explain: it represents a telling and harrowing ideology that for these projects mobile is the afterthought when mobile optimization should be the most important part. Library design committees don’t actually say aloud or conceive of this stuff when researching options, but it should be implicit. When mobile is an afterthought, the committee presumes users are more likely to visit from a laptop or desktop than a phone (or refrigerator). This is not true.

See, a website, responsive or not, originally laid out for a 1366×768 desktop monitor in the designer’s office, wistfully depends on visitors with that same browsing context. If it looks good in-office and loads fast, then looking good and loading fast must be the default. “Responsive down to mobile” is divorced from the reality that a similarly wide screen is not the common denominator. As such, responsive down to mobile sites have a superficial layout optimized for the developers, not the user.

In a recent talk at An Event Apart–a conference–in Atlanta, Georgia, Mat Marquis stated that 72% of responsive websites send the same assets to mobile sites as they do desktop sites, and this is largely contributing to the web feeling slower. While setting img { width: 100%; } will scale media to fit snugly to the container, it is still sending the same high-resolution image to a 320px-wide phone as a 720px-wide tablet. A 1.6mb page loads differently on a phone than the machine it was designed on. The digital divide with which librarians are so familiar is certainly nowhere near closed, but while internet access is increasingly available its ubiquity doesn’t translate to speed:

  1. 50% of users ages 12-29 are “mostly mobile” users, and you know what wireless connections are like,
  2. even so, the weight of the average website (currently 1.6mb) is increasing.

Last December, analysis of data from pagespeed quantiles during an HTTP Archive crawl tried to determine how fast the web was getting slower. The fastest sites are slowing at a greater rate than the big bloated sites, likely because the assets we send–like increasingly high resolution images to compensate for increasing pixel density in our devices–are getting bigger.

The havoc this wreaks on the load times of “mobile friendly” responsive websites is detrimental. Why? Well, we know that

eep O_o.

A Better Responsive Design

So there was a big change to Bootstrap in August 2013 when it was restructured from a “responsive down to mobile” framework to “mobile-first.” It has also been given a simpler, flat design, which has 100% faster paint time – but I digress. “Mobile-first” is key. Emblazon this over the door of the library web committee. Strike “responsive down to mobile.” Suppress the record.

Technically, “mobile-first” describes the structure of the stylesheet using CSS3 Media Queries, which determine when certain styles are rendered by the browser.

.example {
  styles: these load first;
}

@media screen and (min-width: 48em) {

  .example {

    styles: these load once the screen is 48 ems wide;

  }

}

The most basic styles are loaded first. As more space becomes available, designers can assume (sort of) that the user’s device has a little extra juice, that their connection may be better, so they start adding pizzazz. One might make the decision that, hey, most of the devices less than 48em (approximately 768px with a base font size of 16px) are probably touch only, so let’s not load any hover effects until the screen is wider.
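As a rough sketch of that hover decision, the stylesheet below keeps the base link styles outside any media query and only declares a hover effect once the viewport reaches 48em. The class name and colors are hypothetical, chosen only for illustration, and screen width is at best a loose proxy for whether a device is touch only. Strictly speaking the browser still downloads the whole stylesheet; the media query controls which rules it applies.

```css
/* Base styles: applied at every width, so the smallest screens get them first. */
.example-link {
  color: #1a4f8b;
  text-decoration: underline;
}

/* Hypothetical breakpoint taken from the text: 48em and wider. */
@media screen and (min-width: 48em) {
  .example-link {
    text-decoration: none;
    transition: background-color 0.2s ease;
  }

  /* The hover effect is only declared for wider screens. */
  .example-link:hover {
    background-color: #e8f0fa;
    text-decoration: underline;
  }
}
```

Narrow screens keep the always-underlined link, which stays usable without a pointer; wider screens get the extra polish.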


In a literal sense, mobile-first is asset management. More than that, mobile-first is this philosophical undercurrent, an implicit zen of user-centric thinking that aligns with libraries’ missions to be accessible to all patrons. Designing mobile-first means designing to the lowest common denominator: functional and fast on a cracked Blackberry at peak time; functional and fast on a ten-year-old machine in the bayou, a browser with fourteen malware toolbars trudging through the mire of a dial-up connection; functional and fast [and beautiful?] on a 23″ iMac. Thinking about the mobile layout first makes design committees more selective of the content squeezed onto the front page, which makes committees more concerned with the quality of that content.
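One way to picture mobile-first as asset management is a banner that is a flat color by default and only asks for a large photograph at wider widths. This is a minimal sketch, not anything Bootstrap ships: the class name and image path are hypothetical, and it assumes the commonly observed browser behavior of not fetching a background image whose media query does not match, which is worth testing in the browsers your patrons actually use.

```css
/* Default (small screens): a flat color banner, no image request at all. */
.hero {
  background-color: #23395d;
  color: #fff;
  padding: 1em;
}

/* Wider screens: add the heavier asset. The path below is hypothetical. */
@media screen and (min-width: 48em) {
  .hero {
    background-image: url("images/reading-room-large.jpg");
    background-size: cover;
    padding: 4em 2em;
  }
}
```

The banner is complete without the photograph, so a phone on a slow connection loses nothing essential.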

The Point

This is the important statement that Bootstrap now makes. It expects the design committee to think mobile-first. It comes with all the components you could want, but they want you to trim the fat.

Future Friendly Bootstrapping

This is what you get in the stock Bootstrap:

That’s almost 250kb of website. This is like a browser eating a brick of Mackinac Island Fudge – and this high calorie bloat doesn’t include images. Consider that if the median load time for a 700kb page is 10-12 seconds on a phone, half that time with out-of-the-box Bootstrap is spent loading just the assets.

“While it’s not totally deal-breaking, 100kb is 5x as much CSS as an average site should have, as well as 15%-20% of what all the assets on an average page should weigh.” (Josh Broton)

To put this in context, I like to fall back on Ilya Grigorik’s example comparing load time to user reaction in his talk “Breaking the 1000ms Time to Glass Mobile Barrier.” If the site loads in just 0-100 milliseconds, this feels instant to the user. By 100-300ms, the site already begins to feel sluggish. At 300-1000ms, uh – is the machine working? After 1 second there is a mental context switch, which means that the user is impatient, distracted, or consciously aware of the load-time. After 10 seconds, the user gives up.
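Those thresholds are easy to turn into a quick sanity check for your own pages. This is just a sketch: the function name is made up, the labels paraphrase the reactions listed above, and which bucket an exact boundary value (say, 300ms on the nose) falls into is my assumption.

```javascript
// Map a load time in milliseconds to the user reaction described above.
function perceivedSpeed(loadMs) {
  if (loadMs <= 100) return "instant";
  if (loadMs <= 300) return "sluggish";
  if (loadMs <= 1000) return "is the machine working?";
  if (loadMs <= 10000) return "mental context switch";
  return "user gives up";
}

// Try a few sample load times.
for (const ms of [80, 250, 900, 4000, 12000]) {
  console.log(`${ms}ms: ${perceivedSpeed(ms)}`);
}
```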

By choosing not to pare down, your Bootstrapped Library starts off on the wrong foot.

The Temptation to Widgetize

Even though Bootstrap provides modals, tabs, carousels, autocomplete, and other modules, this doesn’t mean a website needs to use them. Bootstrap lets you tailor which jQuery plugins are included in the final script. The hardest part of any redesign is letting quality content determine the tools, rather than letting the ability to tabularize or scrollspy become an excuse to implement them. Oh, don’t Google those. I’ll touch on tabs and scrollspy in a few minutes.

I am going to be super presumptuous now and walk through the total Bootstrap package, then make recommendations for lightening the load.


Transitions.js is a fairly lightweight CSS transition polyfill. What this means is that the script checks to see if your user’s browser supports CSS Transitions, and if it doesn’t then it simulates those transitions with javascript. For instance, CSS transitions often handle the smooth, uh, transition between colors when you hover over a button. They are also a little more than just pizzazz. In a recent article, Rachel Nabors shows how transition and animation increase the usability of the site by guiding the eye.

With that said, CSS Transitions have pretty good browser support and they probably aren’t crucial to the functionality of the library website on IE9.
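For the curious, the support check at the heart of a script like this can be sketched in a few lines. This is an illustration of the general feature-detection technique, not Bootstrap’s actual source; the function name is hypothetical, and the plain objects below stand in for the `style` object of a real element so the sketch runs outside a browser.

```javascript
// Property names that different browser generations have used for transitions.
const TRANSITION_PROPERTIES = [
  "transition",
  "WebkitTransition",
  "MozTransition",
  "OTransition",
];

// Return the first transition property the style object knows about,
// or null if the browser has no CSS transition support at all.
function findTransitionProperty(style) {
  return TRANSITION_PROPERTIES.find((prop) => prop in style) || null;
}

// In a browser you would pass document.createElement("div").style.
// These stand-ins mimic a modern browser and a very old one.
const modernStyle = { color: "", transition: "" };
const ancientStyle = { color: "" };

console.log(findTransitionProperty(modernStyle));
console.log(findTransitionProperty(ancientStyle));
```

When the check comes back empty, a script can either fake the effect with javascript or simply skip it, which is the cheaper choice.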

Recommendation: Don’t Include.


“Modals” are popup windows. There are plenty of neat things you can do with them. Additionally, modals are a pain to design consistently for every browser. Let Bootstrap do that heavy lifting for you.

Recommendation: Include


It’s hard to get out of a library website design committee without a lot of links in your menu bar. Dropdown menus are kind of tricky to code, and Bootstrap does a really nice job of keeping them consistent and responsive.

Recommendation: Include


If you have a fixed sidebar or menu that follows the user as they read, scrollspy.js can highlight the section of that menu you are currently viewing. This is useful if your site has a lot of long-form articles, or if it is a one-page app that scrolls forever. I’m not sure this describes many library websites, but even if it does, you probably want more functionality than Scrollspy offers. I recommend jQuery-Waypoints - but only if you are going to do something really cool with it.

Recommendation: Don’t Include


Tabs are a good way to break up a lot of content without actually putting it on another page. A lot of libraries use some kind of tab widget to handle the different search options. If you are writing guides or tutorials, tabs could be a nice way to display the text.

Recommendation: Include


Tooltips are often descriptive popup bubbles of a section, option, or icon requiring more explanation. Tooltips.js helps handle the predictable positioning of the tooltip across browsers. With that said, I don’t think tooltips are that engaging; they’re sometimes appropriate, but you definitely used to see more of them in the past. Your library’s time is better spent de-jargoning any content that would warrant a tooltip. Need a tooltip? Why not just make whatever needs the tooltip more obvious O_o?

Recommendation: Don’t Include


Popovers: even fancier tooltips.

Recommendation: Don’t Include


Alerts.js lets your users dismiss alerts that you might put in the header of your website. It’s always a good idea to give users some kind of control over these things. Better they read and dismiss than get frustrated from the clutter.

Recommendation: Include


The collapse plugin allows for accordion-style sections, which distribute content much as tabs do. The ease-in-ease-out animation can trigger motion sickness and other aaarrghs among users with vestibular disorders. You could just use tabs.

Recommendation: Don’t Include


Button.js gives a little extra jolt to Bootstrap’s buttons, allowing them to communicate an action or state. By that, imagine you fill out a reference form and you click “submit.” Button.js will put a little loader icon in the button itself and change the text to “sending ….” This way, users are told that the process is running, and maybe they won’t feel compelled to click and click and click until the page refreshes. This is a good thing.
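The idea is simple enough to sketch without the plugin. This is not Button.js itself, just a bare-bones illustration of the same pattern; the function name is made up, and the plain object stands in for a real button element so the sketch can run anywhere.

```javascript
// Put a button into a "busy" state and hand back a function that undoes it.
function showLoading(button, loadingText = "sending ...") {
  const originalText = button.textContent;
  button.textContent = loadingText;
  button.disabled = true; // blocks repeat clicks while the request runs

  return function restore() {
    button.textContent = originalText;
    button.disabled = false;
  };
}

// Stand-in for a real <button> element.
const submitButton = { textContent: "Submit", disabled: false };

const restoreSubmit = showLoading(submitButton);
console.log(submitButton.textContent, submitButton.disabled);

restoreSubmit(); // call this when the form submission finishes
console.log(submitButton.textContent, submitButton.disabled);
```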

Recommendation: Include


Carousels are the most popular design element on the web. They let a website slideshow content like upcoming events or new material. Carousels exist because design committees must be appeased. There are all sorts of reasons why you probably shouldn’t put a carousel on your website: they are largely inaccessible, have low engagement, are slooooow, and kind of imply that libraries hate their patrons.

Recommendation: Don’t Include.


I’m not exactly sure what this one (Affix) does. I think it’s a fixed-menu thing. You probably don’t need this. You can use CSS.

Recommendation: Don’t Include

Now, Don’t You Feel Better?

Compare the bootstrap.js and bootstrap.min.js files between out-of-the-box Bootstrap and a build tailored to the specs above. This doesn’t consider the differences in the CSS or the weight of the images no longer included in a carousel (not to mention the unquantifiable amount of pain you would have inflicted), but the numbers are telling:

File              Before  After
bootstrap.js      54kb    19kb
bootstrap.min.js  29kb    10kb
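If you want the savings as a percentage, the arithmetic is a one-liner. The file sizes are the ones in the table; the function name is just for illustration.

```javascript
// Percentage of weight removed, rounded to the nearest whole percent.
function percentSaved(beforeKb, afterKb) {
  return Math.round(((beforeKb - afterKb) / beforeKb) * 100);
}

// Sizes taken from the before/after table.
const files = [
  { name: "bootstrap.js", beforeKb: 54, afterKb: 19 },
  { name: "bootstrap.min.js", beforeKb: 29, afterKb: 10 },
];

for (const file of files) {
  const saved = percentSaved(file.beforeKb, file.afterKb);
  console.log(`${file.name}: ${saved}% smaller`);
}
```

Either way you slice it, roughly two-thirds of the script is gone.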

So, Bootstrap Responsibly

There is more to say. When bouncing this topic around Twitter a while ago, Jeremy Prevost pointed out that Bootstrap’s minified assets can be GZipped down to about 20kb total. This is the right way to serve assets from any framework. It requires an Apache config or .htaccess rule. Here is the .htaccess file used in HTML5 Boilerplate. You’ll find it well commented and modular: go ahead and just copy and paste the parts you need. You can eke out even more performance by “lazy loading” scripts at a given time, but that is a little out of the scope of this post.
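You can get a feel for why GZip helps so much with a quick experiment in Node.js, using its built-in zlib module. Fair warning: the stylesheet below is a made-up stand-in that is far more repetitive than any real CSS, so the ratio it prints overstates what you would see in practice and says nothing about Bootstrap’s actual numbers. In production the web server does the compressing, not a script like this.

```javascript
const zlib = require("zlib");

// Made-up stand-in for a minified stylesheet. Framework CSS repeats the
// same selectors and properties a lot, which is what GZip feeds on.
const fakeCss =
  ".btn{display:inline-block;padding:6px 12px;border:1px solid transparent}".repeat(500);

// Compare the size before and after compression.
const rawBytes = Buffer.byteLength(fakeCss);
const gzippedBytes = zlib.gzipSync(fakeCss).length;

console.log(`raw: ${rawBytes} bytes, gzipped: ${gzippedBytes} bytes`);
```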

Here’s the thing: when we talk about having good library websites we’re mostly talking about the look. This is the wrong discussion. Web designs driven by anything but the content they already have make grasping assumptions about how slick it would look to have this killer carousel, these accordions, nifty tooltips, and of course a squishy responsive design. Consequently, these responsive sites miss the point: if anything, they’re mobile unfriendly.

Much of the time, a responsive library website is used as a marker that such-and-such site is credible and not irrelevant, but as such the website reflects a lack of purpose (e.g., “this website needs to increase library-card registration”). A superficial understanding of responsive webdesign and easy-to-grab frameworks entail that the patron is the lowest priority.


About Our Guest Author:

Michael Schofield is a front-end librarian in south Florida, where it is hot and rainy – always. He tries to do neat things there. You can hear him talk design and user experience for libraries on LibUX.

by Michael Schofield at August 27, 2014 01:00 PM

In the Library, With the Lead Pipe

Call for Social Media Editor

In the Library with the Lead Pipe is seeking applications for a Social Media Editor. This volunteer position will serve on the Lead Pipe Editorial Board for a two-year term of service.

Lead Pipe is an open access, open peer reviewed journal founded and run by an international team of librarians working in various types of libraries. In addition to publishing articles and editorials by Editorial Board members, Lead Pipe publishes articles by authors representing diverse perspectives including educators, administrators, library support staff, technologists, and community members. Lead Pipe intends to help improve communities, libraries, and professional organizations. Our goal is to explore new ideas and start conversations, to document our concerns and argue for solutions.

The Lead Pipe Editorial Board is committed to collegiality and consensus decision-making. Applicants should be prepared to participate in discussions that may be forthright and frank, but always respectful and solution-focused. For many of us, the work we do for Lead Pipe is among our most professionally rewarding, and even though we interact primarily via email and a monthly hangout, we have grown to treasure the relationships we’ve formed with each other.

Lead Pipe currently has a social media presence on Facebook, Twitter, and Google+ and seeks to improve its efficacy within these venues.

The Social Media Editor will be considered an equal member of the Lead Pipe Editorial Board and will be given the opportunity to engage in other Lead Pipe Editorial Board responsibilities, such as editing articles and recruiting authors.

The expected time commitment is approximately 10-20 hours per month.


To be considered for this position, please send a statement of interest, along with your name and email address to Your statement should be succinct and should describe your relevant experience as well as at least one idea for improving Lead Pipe‘s current social media presence. We want you to demonstrate that you’ve looked at our channels and thought critically about them, and that you have a coherent approach or philosophy regarding social media for organizations. In addition, if you have one, be sure to link to your online portfolio or any social media presence you manage.

This position will remain open until filled with priority given to applications received prior to Wednesday, September 24th, 2014.

Any questions may be directed to

Many thanks to Nicole Helregel from Hack Library School for reviewing!

by Ellie Collier at August 27, 2014 10:00 AM

August 26, 2014

Casey Bisson

A/B Split Testing Calculators

Mixpanel’s A/B testing calculator is a competent performer and valuable tool:

mixpanel ab test calculator

Thumbtack’s split testing calculator, however, is a surprise standout:

thumbtack ab test calculator

That their code is on GitHub is especially delightful.

by Casey Bisson at August 26, 2014 10:45 PM

Tara Robertson

letter to the Vancouver Public Library Board

I am writing to urge you to reconsider the changes in the Public Internet Use policy that the Board recently passed. These are bad policy changes that erode intellectual freedom, are problematic for library workers and are harmful to libraries. I have many concerns both as a library user and as a librarian.

I served as the chair of the BC Library Association’s Intellectual Freedom Committee from 2006-2008, have blogged about intellectual freedom issues in libraries for 8 years and sit on an editorial committee for an encyclopedia on intellectual freedom for libraries.

According to the VPL’s 2013 Annual Report there were 1.3 million internet sessions and 1.1 million wireless sessions. The management report cites 31 complaints out of a total 2.6 million internet sessions. This is not enough of a problem to justify a drastic policy change.

I appreciate that the management report dated July 17, 2014 references the Canadian Library Association’s Statement on Intellectual Freedom and talks about VPL’s commitment to this core library value. This policy does not “guarantee and facilitate access to all expressions of knowledge and intellectual activity”; in fact, it erodes these freedoms. The phrase “explicit sexual images” is highly problematic and extremely vague. Who decides what is sexually explicit? A colleague at a public library told me about a complaint from a patron about another patron who was apparently looking at pornography. This person turned out to be watching an online video of childbirth.

It seems like there is confusion about what intellectual freedom looks like online versus the library’s traditional print collections. If someone were to read an ebook version of the graphic novel Lost Girls on a tablet device, or search for online information about sexual health or human sexuality, or watch a video of well-known contemporary performance artist Annie Sprinkle–would VPL staff or security come and kick them out of the library? While some people might find these topics offensive, they are all legitimate information needs.

Reading the current practice of what happens when someone reports seeing something offensive really troubles me. The management report states that either staff or a security guard asks the user to stop viewing the inappropriate material; if the library user does not comply, they are asked to leave the library. I’m concerned that there isn’t an evaluation of whether the material is acceptable or not. Also, having a security guard come up to you and possibly kick you out of the library is a scary and intimidating experience, especially for many socially excluded individuals.

The management report describes this as being a problem primarily at the Central library and Mount Pleasant branch. This sounds like a design challenge: “how do you design public spaces so that library users’ freedom to access does not impact staff members’ freedom to work without seeing things that offend them?” As the Central branch has moved to a roving reference model, perhaps it is time to rethink how the seating areas and computers are set up.

Again, I ask you to reconsider this policy decision.

by Tara Robertson at August 26, 2014 10:09 PM

Dave Pattern

Hello world!

Welcome to The Hitchcock Zone Sites. This is your first post. Edit or delete it, then start blogging!

by Dave Pattern at August 26, 2014 07:26 PM

OCLC Dev Network

Additional Fixes in Tomorrow's VIAF Update

In addition to the Modification to VIAF application/xml Response Title Location we told you about a couple of weeks ago, tomorrow's VIAF update will also include a couple of bonus fixes:

by Shelley Hostetler at August 26, 2014 06:00 PM