August 28, 2014


Linked Data Survey results – Who’s doing it

OCLC Research conducted an international linked data survey for implementers between 7 July and 15 August 2014. This is the first post in the series reporting the results.

We received 92 responses to the survey, which asked who is using linked data, and for what. OCLC colleagues also responded, reporting on 6 OCLC projects/services for comparison. So now we have some answers to share!

Although the survey was designed specifically for implementers of linked data projects/services, 26 of the 92 respondents said they had neither implemented nor were implementing a linked data project. Seven of them plan to implement a linked data project within the next two years, and 10 are planning to apply for funding to implement one. Some also pointed to interesting linked data projects they're tracking, including projects from survey respondents (Oslo Public Library, BIBFRAME, Europeana, Yale Center for British Art).

The remaining 66 respondents reported implementing 160 linked data projects/services, 68 of which are described: 23 of these consume linked data, 3 publish linked data, and 42 both consume and publish it. The representation is international. Just over half of the projects/services are in the US, and the rest come from 14 other countries: Australia, Canada, Czech Republic, France, Germany, Ireland, Italy, The Netherlands, Norway, Singapore, South Korea, Spain, Switzerland, and the UK. I've appended below the list of the 44 institutions that took the time to describe their linked data projects/services (several described multiple projects).

Of the 68 projects:

27 are not yet in production;
9 have been in production for less than one year;
12 have been in production for more than one year but less than two years;
20 have been in production for more than two years.
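The counts reported so far can be cross-checked with a few lines of Python. This is only an illustrative sketch: the numbers are the ones given in this post, while the variable names are made up for the example.

```python
# Illustrative cross-check of the survey counts reported in this post.
# Variable names are hypothetical; the numbers come from the text.

TOTAL_RESPONSES = 92
NOT_IMPLEMENTING = 26
DESCRIBED_PROJECTS = 68

# Respondents that reported implementing linked data projects/services.
implementing = TOTAL_RESPONSES - NOT_IMPLEMENTING

# Two independent breakdowns of the same 68 described projects.
by_role = {"consume": 23, "publish": 3, "both": 42}
by_status = {
    "not yet in production": 27,
    "under one year": 9,
    "one to two years": 12,
    "over two years": 20,
}

# Projects already in production (every status except "not yet").
in_production = DESCRIBED_PROJECTS - by_status["not yet in production"]

for label, breakdown in (("role", by_role), ("status", by_status)):
    total = sum(breakdown.values())
    verdict = "matches" if total == DESCRIBED_PROJECTS else "does not match"
    print(f"{label}: {total} {verdict} {DESCRIBED_PROJECTS}")

print(f"respondents implementing: {implementing}")
print(f"projects in production: {in_production}")
```

Both breakdowns sum to 68, and the 41 projects in production match the 41 projects that, as reported further on, could assess whether they had succeeded.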

Four of the projects are "private," for that institution's use only. Most of the implemented projects/services received an average of fewer than 1,000 requests a day over the last six months. The most heavily used non-OCLC linked data datasets, as measured by average number of requests a day:

For comparison, these are the responses for six OCLC linked data projects/services:

Project/Service | Consume or publish LD | How long in production | Avg. no. of requests/day
Dewey | Publish | More than two years | 10,000 – 50,000 *
FAST | Both consume & publish | More than two years | 1,000 – 10,000
ISNI | Publish | Less than one year | Fewer than 1,000
VIAF | Both consume & publish | More than two years | More than 100,000
 | Both consume & publish | More than two years | 16 million **
Works | Both consume & publish | Less than one year | More than 100,000

*  Reflects only HTML pages served, not requests to the RDF data.
**  All pages include linked data (RDFa)

Because so many projects have not yet been implemented, or were implemented only recently, only 41 could assess whether the linked data project/service had achieved its desired outcome. More than half (28) said it was successful or mostly successful. Success measures included "increasing international use," improved workflow, moving from a pilot to production, the ability to link data by relationships, simply making the data available as linked data, and professional development of staff. Several respondents noted the need for better metrics to assess the service's value, the challenges of harmonizing the data, and the lack of identifiers to link headings to.

Parts of institution involved: Most respondents reported that multiple units of the institution were involved in their linked data project/service. When only one part of an institution was involved, it was most often the research and development group (4 projects; the R&D group was cited in 15 projects in total). Libraries and/or archives were involved the most, cited in 43 projects. Metadata services was the next most involved, in 34 projects. Digital library services and library systems/IT or campus IT were involved in a third or more of the projects. Seventeen involved digital humanities and/or faculty in academic departments. University College Dublin's Digital Library involved the most units of any described project, 10 in all: library, archives, metadata services, digital library services, library systems/information technology, research and development group, computer science department, digital humanities, campus museum, and faculty in academic departments.

External groups involved: Seventeen of the projects did not involve any external groups or organizations.  Thirteen were part of a national and/or international collaboration. Twenty-one involved other libraries or archives and 18 other universities or research institutions. Thirteen involved a systems vendor or a corporation/company.  Eight collaborated with other consortium members; eight were part of a discipline-specific collaboration. Europeana listed the most external groups, with 6: other libraries/archives, other universities/research institutions, other members of their consortium, part of a discipline-specific collaboration, part of an international collaboration, and a large network of experts working in cultural heritage.

Staffing:  Almost all of the institutions that have implemented or are implementing linked data projects/services (55) have added linked data to the responsibilities of current staff; only 10 have not. Nine have staff dedicated to linked data projects (five of them in conjunction with adding linked data to the responsibilities of current staff). Five are adding or have added new staff with linked data expertise; ten are adding or have added temporary staff with linked data expertise; and seven are hiring or have hired external consultants with linked data expertise.

Funding: Twenty-nine of the projects received grant funding to implement linked data; most projects (46) were covered by the library/archive or the parent institution. Three linked data projects received funding support from partner institutions; four linked data projects were privately funded.

Linked data survey respondents describing linked data projects/services

American Antiquarian Society
American Numismatic Society
Archaeology Data Service (UK)
Biblioteca della Camera dei deputati (Italy)
British Library
British Museum
Carleton College
Charles University in Prague
Colorado College
Colorado State University
Cornell University
Data Archiving and Networked Services, Royal Netherlands Academy of Arts and Sciences
Digital Public Library of America
Europeana Foundation
Fundación Ignacio Larramendi (Spain)
Goldsmiths’ College
Library of Congress
Minnesota Historical Society
Missoula Public Library
National Library Board (NLB) of Singapore
National Library of Medicine
North Carolina State University Libraries
NTNU (Norwegian University of Science and Technology) University Library
Oslo Public Library
Public Record Office, Victoria (Australia)
Queen’s University Library (Canada)
Stanford University
The University of Texas at Austin
Tresoar (Leeuwarden – The Netherlands)
University College Dublin (Ireland)
University College London (UCL)
University of Alberta Libraries
University of Bergen Library (Norway)
University of British Columbia
University of California-Irvine
University of Illinois at Urbana-Champaign
University of North Texas
University of Oxford
University of Pennsylvania Libraries
Western Michigan University
Yale Center for British Art

About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.

by Karen at August 28, 2014 09:11 PM

FOSS4Lib Recent Releases

Koha - 3.16.3

Release Date: 
Wednesday, August 27, 2014

Last updated August 28, 2014. Created by David Nind on August 28, 2014.

This is a maintenance release. It also includes a major enhancement to the circulation pages.

by David Nind at August 28, 2014 08:18 PM


The Nicollet County Historical Society and the Digital Public Library of America

The Nicollet County Historical Society is located in St. Peter, Minnesota.  Like other such societies, we seek to tell the story of our county through the use of exhibits, books, newspapers, documents, photographs, artifacts, and other means.  We are based in the Treaty Site History Center at 1851 North Minnesota Avenue, and we own the 1871 E. St. Julien Cox Historic Home in St. Peter.

We were among the first of the state’s historical societies to participate in the Minnesota Digital Library’s Minnesota Reflections project. We have posted a large number of photographs that span the years from 1861 to 1998.  Other digital highlights from our collections include prints of Dakota life by Seth Eastman, maps of ghost towns in Nicollet County, the muster roll of Company E of the First Mounted Rangers, letters from the Dakota War, 5×7 inch photographic negatives for making postcards, a ledger containing commercial letterheads and standardized forms and documents from the 1800s, the St. Peter School Board’s meeting minutes from 1865 to 1899, and four atlases or plat books for the years 1885, 1899, 1913, and 1927.  It is of interest that the names of three men who served as governors of Minnesota appear on page one of the school board’s minutes. All of this content has also been shared with the DPLA in order to provide national exposure to our unique materials.

If you've ever seen a presentation by DPLA staff, you may recall this image of a Maxwell automobile from the Nicollet County Historical Society: Rudolph Volk and Martin Klein in an old automobile, east of St. Peter, Minnesota, ca. 1907. Nicollet County Historical Society via the Minnesota Digital Library.

The Minnesota Digital Library has been extremely generous in its support of our work during several years of participation in the Minnesota Reflections project. We greatly appreciate the MDL's wonderful assistance!

Having a large and diverse amount of material on the Internet has greatly improved our ability to serve our patrons.  Requests for images are numerous, especially for photographs of people, buildings, and places.  Images have been sent to many locations throughout Minnesota and other states, and to other countries as well.

When the local grocery store was being remodeled they asked us if we could provide a number of images showing how the city of St. Peter has evolved over time.  Because so many images are available online through both the Minnesota Reflections and DPLA websites, it was very easy for the store representatives to find images that fit their needs.  Today, large reprints of several photographs are on display throughout the store.  We also provided the owners of a local restaurant with images of St. Peter that are now featured prominently on their walls and menus.  Such images can be seen in several other businesses in the community. Having our historic images displayed throughout our county has provided the society with additional opportunities to share the history of Nicollet County and to highlight the important work we do in preserving that story.

Being involved in the MDL and the DPLA is a golden opportunity.  The items that we submit are digitized and placed on the Internet.  We receive high-resolution copies of all of the submitted items, which we can store off-site as backup copies of our material.  This collaborative project vastly increases attention and access to our unique holdings.

Featured image credit: Detail of Valley of the St. Peters, Minnesota, 1941-1855. Eastman, Seth, 1808-1875. Nicollet County Historical Society via the Minnesota Digital Library.

All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

by Amy Rudersdorf at August 28, 2014 06:00 PM

Library of Congress: The Signal

Perpetual Access and Digital Preservation at #SAA14

A panel discussion at the SAA 2014 conference. Photo credit: Trevor Owens.

A panel discussion at the SAA 2014 conference. Photo credit: Trevor Owens.

I had the distinct pleasure of moderating the opening plenary session of the Joint Annual Meeting of COSA, NAGARA and SAA in Washington D.C. in early August. The panel was on the “state of access,” and I shared the dais with David Cuillier, an Associate Professor and Director of the University of Arizona School of Journalism, as well as the president of the Society of Professional Journalists; and Miriam Nisbet, the Director of the Office of Government Information Services at the National Archives and Records Administration.

The panel was a great opportunity to tease out the spaces between the politics of "open government" and the technologies of "open data," but our time was much too short, and we had to end just when the panelists were beginning to get to the juicy stuff.

There were so many more places we could have taken the conversation:

I must admit that when I think of “access” and “open information” I’m thinking almost exclusively about digital data because that’s the sandbox I play in. At past SAA conferences I’ve had the feeling that the discussion of digital preservation and stewardship issues was something that happened in the margins. At this year’s meeting those issues definitely moved to the center of the conversation.

Just look at this list of sessions running concurrently during a single hour on Thursday August 14, merely the tip of the iceberg:

There were also a large number of web archiving-related presentations and panels including the SAA Web Archiving Roundtable meeting (with highlights of the upcoming NDSA Web Archiving Survey report), the Archive-IT meetup and very full panels Friday and Saturday.

I was also pleased to see that the work of NDIIPP and the National Digital Stewardship Alliance was being recognized and used by many of the presenters. There were numerous references to the 2014 National Agenda for Digital Stewardship and the Levels of Preservation work, and many NDSA members were presenting and in the audience. You'll find lots more on the digital happenings at SAA on the #SAA14 Twitter stream.

We even got the chance to celebrate our own Trevor Owens as the winner of the SAA Archival Innovator award!

The increased focus on digital is great news for the archival profession. Digital stewardship is an issue where our expertise can really be put to good use and where we can have a profound impact. Younger practitioners have recognized this for years and it’s great that the profession itself is finally getting around to it.

by Butch Lazorchak at August 28, 2014 05:50 PM


Open Legal Committee Call: September 3, 2:00 PM Eastern

The Legal Committee will hold an open call on Wednesday, September 3 at 2:00 PM EDT. The agenda can be found below. To register, follow the link below and complete the short form.

You can register for the call by visiting


  1. The DPLA/Europeana rights metadata meeting in October
  2. Upcoming additional House Judiciary Committee hearings on copyright reform
  3. Brief review of the Authors Guild v. HathiTrust appeal outcome, and an update on the Authors Guild v. Google appeal
  4. Likelihood of additional reports from the Copyright Office, and the release of the USPTO’s “White Paper” on copyright reform, expected in early 2015
  5. Recent articles from the Berkeley Symposium on copyright reform, including papers on fair use and “permitted but paid” uses (by Jane Ginsburg), legislating digital exhaustion (by Jason Schultz and Aaron Perzanowski) and a paper on Section 108 reform by David Hansen


by Kenny Whitebloom at August 28, 2014 03:43 PM

District Dispatch

“How do you plead?” Guilty on all counts for thinking E-rate is cool


Phone rings. Caller asks: “What are you working on?”

Answer: “E-rate.”

Dinner conversation: “What did you talk about today, mom?”

Answer (supplied by child 2 and 3 in unison): “E-rate.”

Latest office joke: “Why is Marijke in traffic court?”

Answer: “E-rate.”

E-rate, broadband, and the goals that have guided us

For those of you following the American Library Association’s (ALA) E-rate year, we are working through the fifth major installment in a series of actions by the Federal Communications Commission (FCC), responding to the Further Notice of Proposed Rulemaking (FNPRM) issued in July as part of the E-rate Modernization Order. And, because we have been immersed in E-rate pretty steadily for a year, the topic “E-rate Modernization” is really the only answer to questions about what I do.

As we prepare both ALA's comments on the FNPRM (comments are due to the Commission September 15, so why am I writing a blog post instead of drafting a response to questions on multi-year contracts?) and our panel session at the 2014 Telecommunications Policy Research Conference (which, amazingly, will take place on September 12th), I have been reflecting on the ways in which we have engaged with the Commission, the Hill, our coalitions, our members, other library organizations, the press, and others to make strategic decisions and identify ALA's policy positions.

If I am boastful, I can say that we worked diligently over the last year. If I am critical, I can see a series of tipping points where we chose one path over another, which opened opportunities while closing off others. Either way, the decisions we made were in line with the goals we set for ALA at the beginning of the E-rate proceeding, which we saw then, and see now, as an opportunity to increase the percentage of libraries with high-capacity broadband at affordable rates. The goals we set include:

Shaping what you have into what you want

Our first choice for the Commission would have been to tackle the "fiber gap" (a term that has emerged in the second phase of the Modernization) among libraries before addressing the Wi-Fi gap. However, when it became apparent that the Commission would address the lack of Wi-Fi capacity for libraries and schools first, we focused on that priority and on the Commission's commitment that this was one phase of a multi-phase process.

At that point we had to answer the question, “What could we gain for libraries in this first step while holding out for a larger pay-off in the second phase?” The interplay between this short-term and long-term strategizing colored the last stages of our advocacy at the Commission and among stakeholders, and with our members and library organizations. Now that we are officially in the second phase, our sleeves are rolled up, teeth bared and claws extended.

These thoughts, as well as “OMG, what are we doing about data that describes the costs to get libraries up to the 1 gigabit goal” ran through my head while I waited for my turn before the bench at traffic court. And why did I spend my afternoon in traffic court? Did you think this was going to be a concise blog post? Is anything E-rate related concise?

E-rate enables anywhere, anytime learning

At the height of negotiations to come to an Order—which we fully supported happening—there was significant back-and-forth among various stakeholders (each with different agendas), numerous ex parte meetings with Commission staff and long phone calls and meetings at the Washington Office. Coincidentally, my then 10th-grade son’s history class unit was on regulatory agencies, and being a teacher at heart, how could I help myself? The E-rate Modernization proceeding makes a perfect case study for a lesson on the responsibilities of federal regulatory agencies and of Congress and how good public policy is made. Poor kid, right? On the contrary.

Explaining E-rate and talking about how a relatively small player like ALA advocates effectively became an exemplary mashup of teen culture and wonky discussions. For example, what do you say to someone who shares information that is not ready to be shared? “Not cool dude.” Getting libraries included routinely in mentions of E-rate? “That’s a mission.” If, in a public document, there is language that could be interpreted such that it clearly dismissed one perspective in favor of another but not overtly, how would you describe this action? “Sneak dissing.” And to the discussion that resulted in traffic court, how does an advocate tread the fine line between passion for an issue and rational decision-making and how does an advocate prevent a personal agenda from influencing strategy on behalf of stakeholders?

Despite my New England pragmatism and Dutch stubbornness, I have a good dose of southern French exuberance. So in the heat of describing the latest battle to the 10th grader, making an extremely important point about the appalling vitriol that had emerged at the tail end of the proceeding, before the July Commission vote that resulted in the Order and FNPRM, and about how that vitriol was unfortunately influencing policy… I may not have come to a complete stop. Result? An afternoon in traffic court. "Kiiillll," said with appropriate sighing and disbelief (reflecting the sentiment in teen-speak). This may be the only record of a moving violation caused by E-rate ("That's a bet" or, more simply, "bet!").

So at the recommendation of the police officer issuing the ticket, I plead “guilty with an explanation.” My explanation? “E-rate is really cool.”

The post “How do you plead?” Guilty on all counts for thinking E-rate is cool appeared first on District Dispatch.

by Marijke Visser at August 28, 2014 02:58 PM

OCLC Dev Network

Software Development Practices: Telling Your User's Story

Last week was the first in a series of posts about product development practices. In that post, Shelley discussed how important it is to identify the problem that the user needs to solve. This post will describe how anyone can write and understand User Stories that articulate the problem.

User Stories are informal, simple, short and concise statements about what the user wants to accomplish. They focus entirely on what the user wants to do and deliberately avoid talking about how the technical solution will work.
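To make that definition concrete, here is a small sketch. A widely used convention, not necessarily the one this series will recommend, phrases each story as "As a <role>, I want <goal> so that <benefit>." The `UserStory` class and its `mentions_implementation` check are hypothetical, written only to illustrate the idea that a story names what the user wants to accomplish and says nothing about how the technical solution will work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    """A user story in the common 'As ..., I want to ... so that ...' form."""

    role: str     # who the user is, including the article, e.g. "a cataloger"
    goal: str     # what the user wants to accomplish
    benefit: str  # why it matters to the user

    def __str__(self) -> str:
        return f"As {self.role}, I want to {self.goal} so that {self.benefit}."

    def mentions_implementation(self, tech_terms: set[str]) -> bool:
        """Crude check: does the goal or benefit use any technical vocabulary?

        tech_terms should be lowercase single words, e.g. {"sql", "api"}.
        """
        text = f"{self.goal} {self.benefit}".lower()
        words = {word.strip(".,;:!?") for word in text.split()}
        return not words.isdisjoint(tech_terms)


story = UserStory(
    "a cataloger",
    "find every record that uses an outdated subject heading",
    "I can update them in one pass",
)
print(story)
```

The word-list check is deliberately simple: it only flags obvious technical vocabulary (a story that asks to "run a SQL query against the catalog database" would be flagged) and is no substitute for a person reading the story.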

by Karen Coombs at August 28, 2014 02:00 PM

Tara Robertson

Internet Use Policy across Canadian public libraries

I've been pretty critical of Vancouver Public Library's new Internet Use Policy. After sending a letter to their Board, I wondered what other public library policies were like. VPL is a member of the Canadian Urban Libraries Council, so I thought it would be interesting to see what other member libraries' policies were like.

I put up a spreadsheet on Google Drive and got help from some other librarians (thanks Justin, Myron and Sarah for your help and translations). Here are my initial thoughts.

VPL’s policy isn’t the worst.

Here are some things that I was a bit shocked to learn:

I was surprised at how many libraries' policies include phrases like "sexually explicit materials," "pornography," and "overt sexual images." Richmond Hill Library's and Regina Public Library's policies mention "illicit drug literature." A few libraries mention hate literature, hate speech, incitement to hate, or hateful propaganda. A handful of libraries mention that copyright infringement is prohibited.

It was disappointing that some libraries (Bibliothèque Ville de Laval, and Guelph Public Library) don’t seem to have their internet use policies published on their website.

So many of these policies sound like the 90s. There's a lot of language about the internet being unregulated, about some of the information available online not being accurate, complete, or current, and about controversial information being out there. I read the phrase "The Library is not responsible for the site content of links or secondary links from its home pages" more than once. I think that these days we accept these things as common knowledge. Greater Victoria Public Library's policy states that their "website […] recommends sites that provide quality information resources for both adults and children." This seems like a very dated way of viewing information literacy.

Toronto Public Library's policy is worth reading. I like that it's written in plain English. I think they do a good job of acknowledging that users are sharing public space without singling out sexually explicit content:

Internet workstations are situated in public areas, and users are expected to use the Internet in accordance with this environment. All users of the Toronto Public Library, including users of the Library’s Internet services, are also expected to follow the Library’s Rules of Conduct which are designed to ensure a welcoming environment. Disruptive, threatening, or otherwise intrusive behaviour is not allowed and Library staff are authorized to take action.

I'm not sure how this policy is being applied in practice; it could work well, or it could be a bit of a disaster.

by Tara Robertson at August 28, 2014 05:07 AM

Jonathan Rochkind

UIUC and Academic Freedom

Professor Steven Salaita was offered a job at the University of Illinois in Urbana-Champaign (UIUC), as associate professor of American Indian Studies, in October 2013. He resigned his previous position at Virginia Tech, and his partner also made arrangements to move with him. 

On August 1, 2014, less than a month before classes were to begin, the UIUC Chancellor rescinded the offer because of angry posts Salaita had made on Twitter about Israel's attack on Gaza.

This situation seems to me a pretty clear assault on academic freedom. I don't think UIUC or its chancellor dispute these basic facts. Chancellor Wise's letter and the Board of Trustees' statement of support for the Chancellor claim that "The decision regarding Prof. Salaita was not influenced in any way by his positions on the conflict in the Middle East nor his criticism of Israel," but they are somewhat less direct about the grounds on which 'the decision' was made. They imply that Salaita's tweets constituted "personal and disrespectful words or actions that demean and abuse either viewpoints themselves or those who express them," and that this is good cause to rescind a job offer (that is, effectively to fire a professor). (Incidentally, Salaita has a proven history of excellence in classroom instruction, including respect for diverse student opinions.)

[I have questions about what constitutes "demeaning and abusing viewpoints themselves", and generally thought that "demeaning viewpoints themselves", although never one's academic peers personally, was a standard and accepted part of scholarly discourse. But anyway.]

I’ve looked through Salaita’s tweets, and am not actually sure which ones are supposed to be the ones justifying effective dismissal.   I’m not sure Chancellor Wise or the trustees are either.  The website Inside Higher Ed made an open records request and received emails indicating that pressure from U of I funders motivated the decision — there are emails from major donors and university development (fund-raising) administrators pressuring the Chancellor to get rid of Salaita. 

This raises academic freedom issues not only in relation to firing a professor because of his political beliefs; but also issues of faculty governance and autonomy, when an administrator rescinds a job offer enthusiastically made by an academic department because of pressure from funders. 

I’ve made no secret of my support for Palestinian human rights, and an end to the Israeli occupation and apartheid system.  However, I stop to consider whether I would have the same reaction if a hypothetical professor had made the same sorts of tweets about the Ukraine/Russia conflict (partisan to either side), or tweeting anti-Palestinian content about Gaza instead. I am confident I would be just as alarmed about an assault on academic freedom. However, the fact that it’s hard to imagine funders exerting concerted pressure because of a professor’s opinions on Ukraine — or a professor’s anti-Palestinian opinions — is telling about the political context here, and I think indicates that this really is about Salaita’s “positions on the conflict in the Middle East and his criticism of Israel.”

So lots of academics are upset about this. So many, in fact, that when this story first developed I suspected UIUC would have to back down; instead, it dug in further. The American Association of University Professors (AAUP) has expressed serious concern about violations of Salaita's academic freedom, and of the academic freedom of the faculty members who selected him for hire. The AAUP also notes that it has "long objected to using criteria of civility and collegiality in faculty evaluation," in part because those criteria are so easily used as cover for suppressing political dissent.

The Chronicle of Higher Ed, in a good article covering the controversy, reports that "Thousands of scholars in a variety of disciplines signed petitions pledging to avoid the campus unless it reversed its decision to rescind the job offer," and some have already carried through on their pledge of boycott, including David J. Blacker, director of the Legal Studies Program and a professor of Philosophy at the University of Delaware, who cancelled an appearance in a prestigious lecture series. The UIUC Education Justice Project cancelled a conference due to the boycott. The executive council of the Modern Language Association has sent a letter to UIUC urging it to reconsider.

This isn’t a partisan issue. Instead, it’s illustrative of the increasingly corporatized academy, where administrative decisions in deference to donor preferences or objections take precedence over academic freedom or faculty decisions about their own departmental hiring and other scholarly matters.  Also, the way the university was willing to rescind a job offer due to political speech after Salaita had resigned his previous position, reminds us of the general precarity of junior faculty careers, and the lack of respect and dignity faculty receive from university administration.  

A variety of disciplinary-specific open letters and boycott pledges have been started in support of Salaita.

I think librarians have a special professional responsibility to stand up for academic freedom.  

Dr. Sarah T. Roberts, a UIUC LIS alumnus and professor of Media Studies at Western University in Ontario, hosts a pledge in support of Salaita from LIS practitioners, students and scholars, with a boycott pledge to “not engage with the University of Illinois at Urbana-Champaign, including visiting the campus, providing workshops, attending conferences, delivering talks or lectures, offering services, or co-sponsoring events of any kind.”  

I’ve signed the letter, and I encourage you to consider doing so as well. I already see at least one other signer I know from the Code4Lib community. I think it is important for librarians to take action to stand up for academic freedom.


by jrochkind at August 28, 2014 04:27 AM

Eric Lease Morgan

Hundredth Psalm to the Tune of "Green Sleeves": Digital Approaches to Shakespeare's Language of Genre

Provides a set of sound arguments for the use of computers to analyze texts, and uses DocuScope as an example.

by Eric Lease Morgan at August 28, 2014 04:00 AM

Evergreen ILS

Bug Squashing Day Wrap-Up

We Came, We Saw, We Squashed Bugs.

The Evergreen community held its first Bug Squashing Day on August 26. The day was an opportunity for the entire community to focus on bugs: confirming bugs, coding bug fixes, testing patches, and merging signed-off patches into the core code. By the end of the day, eleven bug fixes were merged into the Evergreen core code. Several other bugs also moved forward as testers provided feedback and contributors created patches. You can see a synopsis of the day’s activities in our August 2014 Evergreen Bug Squashing Day Activity sheet.

Here are some highlights from the day:

Although Bug Squashing Day officially ended August 26, the momentum continued through the 27th as patches worked on during Bug Squashing Day continued to make their way to the Evergreen working repository.

Special thanks go to Blake Henderson (MOBIUS) and Thomas Berezansky (MVLC) for setting up the Sandboxes that made it easy for many in the community to test these bug fixes, and to Justin Hopkins (MOBIUS) and Jason Stephenson (MVLC) for volunteering them. The hardware for the sandboxes was provided by MOBIUS and MassLNC.

Also, a big thank you to the people listed below who participated in Bug Squashing Day and to the institutions that employ them for supporting their efforts to improve Evergreen for everyone.

Although Bug Squashing Day is over, the bug wrangling, fixing and testing doesn’t need to end. Sandboxes will continue to be available to the community beyond Bug Squashing Day. Anyone interested in testing a bug fix can submit a request with our Sandbox Request Form.

by Kathy Lussier at August 28, 2014 02:46 AM

DuraSpace News

German DSpace User Group Meeting Set for Oct. 28

From Pascal Becker, Technische Universität Berlin

by carol at August 28, 2014 12:00 AM

Update 4: Announcing the Second Beta Release of Fedora 4.0

From David Wilcox, Fedora Product Manager
Winchester, MA – This is the fourth in a series of updates on the status of Fedora 4.0 as we move from the recently launched Beta [1] to the Production Release. The updates are structured around the goals and activities outlined in the July-December 2014 Planning document [2], and will serve to both demonstrate progress and call for action as needed. New information since the last status update is highlighted in bold text.

by carol at August 28, 2014 12:00 AM

Cineca to Provide National Institute of Education of Singapore with DSpace Services

From Michele Mennielli, Cineca

by carol at August 28, 2014 12:00 AM

August 27, 2014

OCLC Dev Network

VIAF Update Rescheduled for Friday

The VIAF update originally planned for today, which includes both a Modification to VIAF application/xml Response Title Location and some additional fixes, has been rescheduled for this Friday, August 29th.

by Shelley Hostetler at August 27, 2014 09:00 PM


The Code4Lib Journal – Renewing UPEI’s Institutional Repository: New Features for an Islandora-based Environment

by mjhoy at August 27, 2014 07:15 PM


Community Rep Works with Design Student to Develop Awesome DPLA “Swag Caddy”

DPLA Community Rep Sarah Huber and student designer Jenna Mae Weiler.

Every February I receive an all-staff work email titled, “Packaging Clients Needed!” I work at Dunwoody College of Technology in Minneapolis, Minnesota. The email is sent out from the Design & Packaging Technology’s Introduction to Packaging Design class, which focuses on cardboard and paperboard packaging of products. The teacher, Pete Rivard, asks Dunwoody staff to be “clients” and to each bring in an item for which a student can design and manufacture a package that would market the item. I told the class that I wanted a box of some sort to carry the DPLA swag (stickers, pins and pens) to the places where I planned to present about the DPLA. None of the students had heard of the DPLA, so I spent time talking about the DPLA mission and what the site offers. I said that I wanted to be able to walk into a training, DPLA caddy in hand, ready for action. I wanted something that would catch people’s eye enough for them to ask about it, dig into it and walk away with a DPLA sticker or any of the other swag. After I presented, so many students asked to be my client that I had to set up interviews. It was tough to choose one person, but the student I chose, Jenna Mae Weiler, came to me with several ideas that I thought were promising.

Jenna and I set up our first appointment to discuss options. She came with three different detailed drawings. One was of a small, portable card catalog with drawers to store the different swag. Then there was a book that opened to reveal compartments holding different items. The last was a simple, modern-looking box that opened to show a banner with the DPLA website logo, with drawers beneath it that people could open to get the pens, stickers and pins. Once folded down, handles could be secured to easily carry the box (soon to be called the DPLA caddy) around.

DPLA "Swag Caddy" designed and developed by Jenna Mae Weiler.

Our following meetings revolved around print and color. Kenny Whitebloom, my staff contact at DPLA, was happy to provide the DPLA logo and to hear about our project. I didn’t want Jenna or me to misrepresent DPLA, because I really did want to carry the caddy with me to DPLA talks and trainings. Jenna was able to match the colors and the font of the DPLA website. We both liked the aesthetic of the site, remarking on the simplicity and clean lines of the design, and we wanted to remain in that mindset. We also set another goal: if other DPLA Community Reps liked the caddy, could Jenna make a design that could be shipped flat to them, so they could put it together themselves from instructions?

Well, Jenna set to work, and she probably thought I was just a doting grandmother, because I didn’t have a single criticism at any stage of the process. I truly thought her design work was fantastic. I just kept saying, “I love it!”

DPLA "Swag Caddy" designed and developed by Jenna Mae Weiler.

Jenna designed the structure of the caddy in Esko’s ArtiosCAD, did the design work in Adobe Illustrator, printed the box on an inkjet printer and cut it out on our in-house CAD table cutter. We presented the final product and our process to her classmates and the other clients. Jenna gave the details through a PowerPoint presentation and I just kept saying, “I love it.”

Kenny contacted me recently asking how my community rep work has been going, and I told him about the caddy and sent photos. He too thought it was great. So I asked if we could send one caddy assembled and one flat with assembly directions. This is rounding out the project in a great way: we are now working with Jenna’s instructors and a second student, who has graduated from the program and is working in the packaging industry, on the full package, which includes sending the caddy out to community reps with instructions to assemble it. The whole process has been a fun experience that has gotten the word out about DPLA in a very different way, and it has also made me feel connected to DPLA through building something that is a small extension of it.

All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

by DPLA at August 27, 2014 04:00 PM

Library of Congress: The Signal

Untangling the Knot of CAD Preservation

T-FLEX-CAD-12-Rus from Wikimedia Commons.

At the 2014 Society of American Archivists meeting, the CAD/BIM Taskforce held a session titled “Frameworks for the Discussion of Architectural Digital Data” to consider the daunting matter of archiving computer-aided design and Building Information Modelling files. This was the latest evidence that — despite some progress in standards and file exchange — archivists and the international digital preservation community at large are trying to get a firm grasp on the slippery topic of preserving CAD files.

CAD is a suite of design tools, software for 3-D modelling, simulation and testing. It is used in architecture, geographic information systems, archaeology, survey data, geophysics, 3-D printing, engineering, gaming, animation and just about any situation that requires a 3-D virtual model. It comprises geometry, intricate calculations, vector graphics and text.

The data in CAD files resides in structurally complex inter-related layers that are capable of much more than displaying models.  For example, engineers can calculate stress and load, volume and weight for specific materials, the center of gravity and visualize cause-and-effect.  Individual CAD files often relate and link to other CAD files to form a greater whole, such as parts of a machine or components in a building. Revisions are quick in CAD’s virtual environment, compared to paper-based designs, so CAD has eclipsed paper as the tool of choice for 3-D modelling.

CAD files — particularly as used by scientists, engineers and architects — can contain vital information. Still, CAD files are subject to the same risk that threatens all digital files, major and minor: failure of accessibility — being stuck on obsolete storage media or dependent on a specific program, in a specific version, on a specific operating system. In particular, the complexity and range of specifications and formats for CAD files make them even more challenging than many other kinds of born-digital materials.


Skylab from NASA.

As for CAD software, commerce thrives on rapid technological change, new versions of software and newer and more innovative software companies. This is the natural evolution of commercial technology. But each new version and type of CAD software increases the risk of software incompatibility and inaccessibility for CAD files created in older versions of software. Vendors, of course, do not have to care about that; the business of business is business — though, in fairness, businesses may continually surpass customer needs and expectations by creating newer and better features. That said, many CAD customers have long realized that it is important — and may someday be crucial — to be able to archive and access older CAD files.

Design for a Flying Machine by Leonardo da Vinci

Building Information Modelling files and Project Lifecycle Management files also require a digital-preservation solution. BIM and PLM integrate all the information related to a major project, not only the CAD files but also the financial, legal, email and other ancillary files.

Part of a digital preservation workflow is compatibility and portability between systems. So one of the most significant standards for the exchange of product manufacturing information of CAD files is ISO 10303, known as the “Standard for the Exchange of Product model data” or STEP. Michael J. Pratt, of the National Institute of Standards and Technology, wrote in 2001 (pdf), “the development of STEP has been one of the largest efforts ever undertaken by ISO.”

The types of systems that use STEP are CAD, computer-aided engineering and computer-aided manufacturing.

CAD rendering of Sialk ziggurat based on archeological evidence from Wikimedia Commons.

One simple piece of preservation advice comes up repeatedly: save the original CAD file in its original format. Save the hardware, software and system that runs it too, if you can. Save any metadata or documentation, and document the one-to-one relationship between each CAD file and its plotted sheet.

The usual digital-preservation practice applies: organize the files, back up the files to a few different storage devices, put one in a geographically remote location in case of disaster, and every seven years or so migrate to a current storage medium to keep the files accessible. But what should you preserve? And why? Given the complexity of these files, and recognizing that at its heart digital preservation is an attempt to hedge our bets about mitigating a range of potential risks, it is advisable to try to generate a range of derivative files which are likely to be more viable in the future. That is, keep the originals, and also try to export to other formats that may lose some functionality and properties but which are far more likely to be able to be opened in the future. The final report from the FACADE project makes this recommendation: “For 3-D CAD models we identified the need for four versions with distinct formats to insure long-term preservation. These are:

1. Original (the originally submitted version of the CAD model)
2. Display (an easily viewable format to present to users, normally 3D PDF)
3. Standard (full representation in preservable standard format, normally IFC or STEP)
4. Dessicated (simple geometry in a preservable standard format, normally IGES)”
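To make the FACADE recommendation concrete, here is a minimal Python sketch of how a repository might track the four renditions for each CAD model and record a SHA-256 fixity value for each file, so that every storage migration can be checked for silent corruption. The class names, the `model_id` field, and the example file names are hypothetical; the role names and format hints come from the list above.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path

# The four FACADE renditions and the formats the report suggests for each.
REQUIRED_ROLES = {
    "original": "as submitted",
    "display": "3D PDF",
    "standard": "IFC or STEP",
    "desiccated": "IGES",
}


@dataclass
class Rendition:
    role: str          # one of the REQUIRED_ROLES keys
    path: Path
    sha256: str = ""   # fixity value, filled in by record_fixity()


@dataclass
class CadPackage:
    model_id: str      # hypothetical identifier for the CAD model
    renditions: list[Rendition] = field(default_factory=list)

    def missing_roles(self) -> list[str]:
        """List the FACADE renditions this package does not yet contain."""
        present = {r.role for r in self.renditions}
        return [role for role in REQUIRED_ROLES if role not in present]


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large CAD files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record_fixity(package: CadPackage) -> None:
    """Store a checksum for every rendition; re-run after each migration."""
    for r in package.renditions:
        r.sha256 = sha256_of(r.path)


pkg = CadPackage("example-model", [
    Rendition("original", Path("example-model.dwg")),
    Rendition("display", Path("example-model.pdf")),
])
print(pkg.missing_roles())  # ['standard', 'desiccated']
```

Comparing a freshly computed checksum with the stored one after copying files to new media is one common way to confirm that a migration did not alter them.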

CAD files now join paper files — such as drawings, plans, elevations, blueprints, images, correspondence and project records — in institutional archives and firms’ libraries. In addition to the ongoing international work on standards and preservation, there needs to be a dialog with the design-software industry to work toward creating archival CAD files in an open-preservation format. Finally, trained professionals need to make sense of the CAD files to better archive them and possibly get them up and running again for production, academic, legal or other professional purposes. That requires knowledge of CAD software, file construction and digital preservation methods.

Either CAD users need better digital curatorial skills to manage their CAD archives or digital archivists need better CAD skills to curate the archives of CAD users. Or both.

by Mike Ashenfelder at August 27, 2014 03:42 PM

ACRL TechConnect

Bootstrap Responsibly

Bootstrap is the most popular front-end framework used for websites. An estimate by meanpath several months ago placed it firmly behind 1% of the web, and for good reason: Bootstrap makes it relatively painless to puzzle together a pretty awesome plug-and-play, component-rich site. Its modularity is its key feature, developed so Twitter could rapidly spin up internal microsites and dashboards.

Oh, and it’s responsive. This is kind of a thing. There’s not a library conference today that doesn’t showcase at least one talk about responsive web design. There’s a book, countless webinars, courses, whole blogs dedicated to it (ahem), and more. The pressure for libraries to have responsive, usable websites can seem to come more from the likes of us than from the patron base itself, but don’t let that discredit it. The trend is clear and it is only a matter of time before our libraries have their mobile moment.

Library websites that aren’t responsive feel dated, and more importantly they are missing an opportunity to reach a bevy of mobile-only users that in 2012 already made up more than a quarter of all web traffic. Library redesigns are often quickly pulled together in a rush to meet the growing demand from stakeholders, pressure from the library community, and users. The sprint makes the allure of frameworks like Bootstrap that much more appealing, but Bootstrapped library websites often suffer the cruelest of responsive ironies:

They’re not mobile-friendly at all.

Assumptions that Frameworks Make

Let’s take a step back and consider whether using a framework is the right choice at all. A front-end framework like Bootstrap is a Lego set with all the pieces conveniently packed. It comes with a series of templates, a blown-out stylesheet, and scripts tuned to the environment that let users essentially copy-and-paste fairly complex web machinery into being. Carousels, tabs, responsive dropdown menus, all sorts of buttons, alerts for every occasion, gorgeous galleries, and very smart decisions made by a robust team of far more capable developers than we are.

Except for the specific layout and the content, every Bootstrapped site is essentially a complete organism years in the making. This is also the reason that designers sometimes scoff, joking that these sites look the same. Decked-out frameworks are ideal for rapid prototyping with a limited timescale and budget because the design decisions have by and large already been made. They assume you plan to use the framework as-is, and they don’t make customization easy.

In fact, Bootstrap’s guide points out that any customization is better suited to be cosmetic than a complete overhaul. The trade-off is that Bootstrap is otherwise complete. It is tried, true, usable, accessible out of the box, and only waiting for your content.

Not all Responsive Design is Created Equal

It is still common to hear that the selling point for a swanky new site is that it is “responsive down to mobile.” The phrase probably rings a bell. It describes a website that collapses its grid as the width of the browser shrinks until its layout is appropriate for whatever screen users are carrying around. This is kind of the point – and cool, as any of us with a browser-resizing obsession could tell you.

Today, “responsive down to mobile” has a lot of baggage. Let me explain: it represents a telling and harrowing ideology that for these projects mobile is the afterthought when mobile optimization should be the most important part. Library design committees don’t actually say this aloud, or even consciously think it, when researching options, but the assumption is implicit. When mobile is an afterthought, the committee presumes users are more likely to visit from a laptop or desktop than a phone (or refrigerator). This is not true.

See, a website, responsive or not, originally laid out for a 1366×768 desktop monitor in the designer’s office, wistfully depends on visitors with that same browsing context. If it looks good in-office and loads fast, then looking good and loading fast must be the default. “Responsive down to mobile” is divorced from the reality that a similarly wide screen is not the common denominator. As such, responsive down to mobile sites have a superficial layout optimized for the developers, not the user.

In a recent talk at An Event Apart (a conference) in Atlanta, Georgia, Mat Marquis stated that 72% of responsive websites send the same assets to mobile devices as they do to desktops, and this is a large part of why the web feels slower. While setting img { width: 100%; } will scale media to fit snugly in its container, it still sends the same high-resolution image to a 320px-wide phone as to a 720px-wide tablet. A 1.6 MB page loads very differently on a phone than on the machine it was designed on. The digital divide with which librarians are so familiar is certainly nowhere near closed, and while internet access is increasingly available, its ubiquity doesn’t translate to speed:

  1. 50% of users ages 12-29 are “mostly mobile” users, and you know what wireless connections are like;
  2. even so, the weight of the average website (currently 1.6 MB) is increasing.
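The fix for sending one oversized image to every device is to offer several renditions and let each device take the smallest one that still looks sharp. The Python below is a simplified model of that idea, assuming the choice is "smallest width at least as large as the viewport width times the device pixel ratio"; real browsers apply their own heuristics when choosing from a `srcset`, and the `pick_rendition` name and rendition widths are hypothetical.

```python
def pick_rendition(viewport_css_px: int, device_pixel_ratio: float,
                   rendition_widths: list[int]) -> int:
    """Return the smallest rendition width (in image pixels) that covers
    the viewport at the given pixel density; fall back to the largest."""
    needed = viewport_css_px * device_pixel_ratio
    for width in sorted(rendition_widths):
        if width >= needed:
            return width
    return max(rendition_widths)


# Hypothetical set of pre-generated image widths.
RENDITIONS = [320, 640, 1024, 1600]

print(pick_rendition(320, 1.0, RENDITIONS))   # 320
print(pick_rendition(320, 2.0, RENDITIONS))   # 640
print(pick_rendition(720, 1.0, RENDITIONS))   # 1024
print(pick_rendition(1366, 2.0, RENDITIONS))  # 1600 (largest available)
```

Under this model, a 320px phone at 1x density downloads the 320px file instead of the 1600px one, which is exactly the asset saving that `img { width: 100%; }` alone cannot deliver.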

Last December, analysis of data from pagespeed quantiles during an HTTP Archive crawl tried to determine how fast the web was getting slower. The fastest sites are slowing at a greater rate than the big bloated sites, likely because the assets we send–like increasingly high resolution images to compensate for increasing pixel density in our devices–are getting bigger.

The havoc this wreaks on the load times of “mobile friendly” responsive websites is detrimental. Why? Well, we know that

eep O_o.

A Better Responsive Design

So there was a big change to Bootstrap in August 2013 when it was restructured from a “responsive down to mobile” framework to “mobile-first.” It has also been given a simpler, flat design, which has 100% faster paint time – but I digress. “Mobile-first” is key. Emblazon this over the door of the library web committee. Strike “responsive down to mobile.” Suppress the record.

Technically, “mobile-first” describes the structure of the stylesheet using CSS3 Media Queries, which determine when certain styles are rendered by the browser.

.example {
  /* Base styles: these apply first, at every screen width. */
  padding: 0.5em;
}

@media screen and (min-width: 48em) {
  .example {
    /* These apply only once the screen is at least 48em wide. */
    padding: 1em;
  }
}
The most basic styles are loaded first. As more space becomes available, designers can assume (sort of) that the user’s device has a little extra juice, that their connection may be better, so they start adding pizzazz. One might make the decision that, hey, most of the devices narrower than 48em (768px with a base font size of 16px) are probably touch only, so let’s not load any hover effects until the screen is wider.
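The mobile-first cascade can be modeled in a few lines: the base layer always applies, and each `min-width` layer is added on top once the viewport is wide enough. The Python sketch below assumes a 16px browser default font size for converting ems to pixels (48em works out to 768px), and its layer names are hypothetical; it illustrates the logic only and is not how a browser evaluates CSS.

```python
BASE_FONT_PX = 16  # assumed browser default font size

# Hypothetical mobile-first layers: (min-width in ems, description).
LAYERS = [
    (0, "base styles"),
    (48, "hover effects and extra pizzazz"),
]


def em_to_px(ems: float, base_px: int = BASE_FONT_PX) -> float:
    """Convert a media-query length in ems to CSS pixels."""
    return ems * base_px


def active_layers(viewport_px: float) -> list[str]:
    """Every layer whose min-width is satisfied applies, in source order."""
    return [name for min_em, name in LAYERS
            if viewport_px >= em_to_px(min_em)]


print(em_to_px(48))         # 768
print(active_layers(320))   # ['base styles']
print(active_layers(1024))  # ['base styles', 'hover effects and extra pizzazz']
```

Because the layers only ever add to the base, a narrow screen never receives the wide-screen rules, which is the point of writing the stylesheet mobile-first.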


In a literal sense, mobile-first is asset management. More than that, mobile-first is this philosophical undercurrent, an implicit zen of user-centric thinking that aligns with libraries’ missions to be accessible to all patrons. Designing mobile-first means designing to the lowest common denominator: functional and fast on a cracked Blackberry at peak time; functional and fast on a ten-year-old machine in the bayou, a browser with fourteen malware toolbars trudging through the mire of a dial-up connection; functional and fast [and beautiful?] on a 23″ iMac. Thinking about the mobile layout first makes design committees more selective about the content squeezed onto the front page, which makes committees more concerned with the quality of that content.

The Point

This is the important statement that Bootstrap now makes. It expects the design committee to think mobile-first. It comes with all the components you could want, but it wants you to trim the fat.

Future Friendly Bootstrapping

This is what you get in the stock Bootstrap:

That’s almost 250kb of website. This is like a browser eating a brick of Mackinac Island Fudge – and this high calorie bloat doesn’t include images. Consider that if the median load time for a 700kb page is 10-12 seconds on a phone, half that time with out-of-the-box Bootstrap is spent loading just the assets.

“While it’s not totally deal-breaking, 100kb is 5x as much CSS as an average site should have, as well as 15%-20% of what all the assets on an average page should weigh.” – Josh Broton

To put this in context, I like to fall back on Ilya Grigorik’s example comparing load time to user reaction in his talk “Breaking the 1000ms Time to Glass Mobile Barrier.” If the site loads in just 0-100 milliseconds, this feels instant to the user. By 100-300ms, the site already begins to feel sluggish. At 300-1000ms, uh – is the machine working? After 1 second there is a mental context switch, which means that the user is impatient, distracted, or consciously aware of the load time. After 10 seconds, the user gives up.
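The thresholds above can be written down as a small lookup, which makes it easy to see where a real measurement lands. This Python sketch treats each range as half-open (for example, exactly 100ms counts as "sluggish"), which is an assumption, since the talk gives the ranges only approximately.

```python
def perceived_delay(ms: float) -> str:
    """Map a load time in milliseconds to the user reactions described above."""
    if ms < 100:
        return "instant"
    if ms < 300:
        return "sluggish"
    if ms < 1000:
        return "is the machine working?"
    if ms < 10_000:
        return "mental context switch"
    return "user gives up"


for ms in (50, 200, 500, 5_000, 12_000):
    print(ms, perceived_delay(ms))
```

By this scale, the 10-12 second median phone load time for a 700kb page mentioned earlier is already past the point where users give up.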

By choosing not to pare down, your Bootstrapped Library starts off on the wrong foot.

The Temptation to Widgetize

Even though Bootstrap provides modals, tabs, carousels, autocomplete, and other modules, this doesn’t mean a website needs to use them. Bootstrap lets you tailor which jQuery plugins are included in the final script. The hardest part of any redesign is letting quality content determine the tools, rather than letting the ability to tabularize or scrollspy become an excuse to implement them. Oh, don’t Google those. I’ll touch on tabs and scrollspy in a few minutes.

I am going to be super presumptuous now and walk through the total Bootstrap package, then make recommendations for lightening the load.

Transition.js
Transitions.js is a fairly lightweight CSS transition polyfill. What this means is that the script checks to see if your user’s browser supports CSS Transitions, and if it doesn’t then it simulates those transitions with javascript. For instance, CSS transitions often handle the smooth, uh, transition between colors when you hover over a button. They are also a little more than just pizzazz. In a recent article, Rachel Nabors shows how transition and animation increase the usability of the site by guiding the eye.

With that said, CSS Transitions have pretty good browser support and they probably aren’t crucial to the functionality of the library website on IE9.

Recommendation: Don’t Include.

Modal.js
“Modals” are popup windows. There are plenty of neat things you can do with them. Additionally, modals are a pain to design consistently for every browser. Let Bootstrap do that heavy lifting for you.

Recommendation: Include

Dropdown.js
It’s hard to conclude a library website design committee without a lot of links in your menu bar. Dropdown menus are kind of tricky to code, and Bootstrap does a really nice job keeping it a consistent and responsive experience.

Recommendation: Include

Scrollspy.js
If you have a fixed sidebar or menu that follows the user as they read, scrollspy.js can highlight the section of that menu you are currently viewing. This is useful if your site has a lot of long-form articles, or if it is a one-page app that scrolls forever. I’m not sure this describes many library websites, but even if it does, you probably want more functionality than Scrollspy offers. I recommend jQuery-Waypoints - but only if you are going to do something really cool with it.

Recommendation: Don’t Include

Tab.js
Tabs are a good way to break-up a lot of content without actually putting it on another page. A lot of libraries use some kind of tab widget to handle the different search options. If you are writing guides or tutorials, tabs could be a nice way to display the text.

Recommendation: Include

Tooltip.js
Tooltips are often descriptive popup bubbles of a section, option, or icon requiring more explanation. Tooltips.js helps handle the predictable positioning of the tooltip across browsers. With that said, I don’t think tooltips are that engaging; they’re sometimes appropriate, but you definitely used to see more of them in the past. Your library’s time is better spent de-jargoning any content that would warrant a tooltip. Need a tooltip? Why not just make whatever needs the tooltip more obvious O_o?

Recommendation: Don’t Include

Popover.js
Even fancier tooltips.

Recommendation: Don’t Include

Alert.js
Alerts.js lets your users dismiss alerts that you might put in the header of your website. It’s always a good idea to give users some kind of control over these things. Better they read and dismiss than get frustrated from the clutter.

Recommendation: Include

Collapse.js
The collapse plugin allows for accordion-style sections, for content you might otherwise split across tabs. The ease-in-ease-out animation triggers motion sickness and other aaarrghs among users with vestibular disorders. You could just use tabs.

Recommendation: Don’t Include

Button.js
Button.js gives a little extra jolt to Bootstrap’s buttons, allowing them to communicate an action or state. By that, imagine you fill out a reference form and you click “submit.” Button.js will put a little loader icon in the button itself and change the text to “sending ….” This way, users are told that the process is running, and maybe they won’t feel compelled to click and click and click until the page refreshes. This is a good thing.

Recommendation: Include

Carousel.js
Carousels are the most popular design element on the web. They let a website present content like upcoming events or new material as a slideshow. Carousels exist because design committees must be appeased. There are all sorts of reasons why you probably shouldn’t put a carousel on your website: they are largely inaccessible, have low engagement, are slooooow, and kind of imply that libraries hate their patrons.

Recommendation: Don’t Include.

Affix.js
I’m not exactly sure what this does. I think it’s a fixed-menu thing. You probably don’t need this. You can use CSS.

Recommendation: Don’t Include
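Since Bootstrap lets you tailor which plugins go into the final script, the recommendations above amount to a short build list. The Python sketch below turns them into an ordered list of plugin files and concatenates them into one bundle, assuming you have the individual plugin sources on disk as separate files (for example `modal.js`). The concatenation order is an assumption; the one dependency it enforces is that popovers need the tooltip plugin loaded first. The `plugin_files` and `build_bundle` names are hypothetical, and you would still minify the result.

```python
from pathlib import Path

# The recommendations from this post (True = include).
RECOMMENDED = {
    "transition": False, "modal": True, "dropdown": True,
    "scrollspy": False, "tab": True, "tooltip": False,
    "popover": False, "alert": True, "collapse": False,
    "button": True, "carousel": False, "affix": False,
}

# Assumed concatenation order; tooltip must come before popover.
ORDER = ["transition", "alert", "button", "carousel", "collapse", "dropdown",
         "modal", "tooltip", "popover", "scrollspy", "tab", "affix"]

# Popovers are built on the tooltip plugin.
REQUIRES = {"popover": ["tooltip"]}


def plugin_files(choices: dict[str, bool]) -> list[str]:
    """Return the plugin files to bundle, adding required dependencies."""
    wanted = {name for name, keep in choices.items() if keep}
    for name in list(wanted):
        wanted.update(REQUIRES.get(name, []))
    return [f"{name}.js" for name in ORDER if name in wanted]


def build_bundle(src_dir: Path, out_file: Path, choices: dict[str, bool]) -> None:
    """Concatenate the chosen plugin sources into one custom script."""
    parts = [(src_dir / name).read_text(encoding="utf-8")
             for name in plugin_files(choices)]
    out_file.write_text("\n".join(parts), encoding="utf-8")


print(plugin_files(RECOMMENDED))
# ['alert.js', 'button.js', 'dropdown.js', 'modal.js', 'tab.js']
```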

Now, Don’t You Feel Better?

Compare just the bootstrap.js and bootstrap.min.js files from out-of-the-box Bootstrap and from a build tailored to the specs above. This of course ignores the differences in the CSS and the weight of the images not included in a carousel (not to mention the unquantifiable amount of pain you would have inflicted), but the numbers are telling:

File               Before   After
bootstrap.js       54 KB    19 KB
bootstrap.min.js   29 KB    10 KB
So, Bootstrap Responsibly

There is more to say. When I was bouncing this topic around Twitter a while ago, Jeremy Prevost pointed out that Bootstrap’s minified assets can be gzipped down to about 20kb total. This is the right way to serve assets from any framework. It requires an Apache config or .htaccess rule. Here is the .htaccess file used in HTML5 Boilerplate. You’ll find it well commented and modular: go ahead and just copy and paste the parts you need. You can eke out even more performance by “lazy loading” scripts only when they are needed, but that is a little out of the scope of this post.
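To see why gzip matters so much for text assets, you can compress a file yourself and compare sizes. This Python sketch uses the standard-library `gzip` module on a stand-in stylesheet; the sample is deliberately repetitive, so it will compress better than a real Bootstrap file would, and the `gzip_savings` name is hypothetical. On a live site the server does this compression for each response (on Apache, typically through the rules in an .htaccess file like the one mentioned above), so you would not gzip by hand.

```python
import gzip


def gzip_savings(data: bytes, level: int = 9) -> tuple[int, int]:
    """Return (original size, gzipped size) in bytes."""
    compressed = gzip.compress(data, compresslevel=level)
    return len(data), len(compressed)


# Stand-in for a real stylesheet: repetitive text, as CSS tends to be.
sample_css = (".btn { display: inline-block; padding: 6px 12px; }\n" * 500).encode("utf-8")

original, zipped = gzip_savings(sample_css)
print(f"{original} bytes -> {zipped} bytes")
```

Compression is lossless, so the browser receives byte-for-byte the same stylesheet after decompressing it.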

Here’s the thing: when we talk about having good library websites we’re mostly talking about the look. This is the wrong discussion. Web designs driven by anything but the content they already have make grasping assumptions about how slick it would look to have this killer carousel, these accordions, nifty tooltips, and of course a squishy responsive design. Consequently, these responsive sites miss the point: if anything, they’re mobile unfriendly.

Much of the time, a responsive library website is used as a marker that such-and-such site is credible and not irrelevant, but the website itself often lacks a clear purpose (e.g., “this website needs to increase library-card registration”). A superficial understanding of responsive web design and easy-to-grab frameworks means the patron becomes the lowest priority.


About Our Guest Author:

Michael Schofield is a front-end librarian in south Florida, where it is hot and rainy – always. He tries to do neat things there. You can hear him talk design and user experience for libraries on LibUX.

by Michael Schofield at August 27, 2014 01:00 PM

In the Library, With the Lead Pipe

Call for Social Media Editor

In the Library with the Lead Pipe is seeking applications for a Social Media Editor. This volunteer position will serve on the Lead Pipe Editorial Board for a two-year term of service.

Lead Pipe is an open access, open peer reviewed journal founded and run by an international team of librarians working in various types of libraries. In addition to publishing articles and editorials by Editorial Board members, Lead Pipe publishes articles by authors representing diverse perspectives including educators, administrators, library support staff, technologists, and community members. Lead Pipe intends to help improve communities, libraries, and professional organizations. Our goal is to explore new ideas and start conversations, to document our concerns and argue for solutions.

The Lead Pipe Editorial Board is committed to collegiality and consensus decision-making. Applicants should be prepared to participate in discussions that may be forthright and frank, but always respectful and solution-focused. For many of us, the work we do for Lead Pipe is among our most professionally rewarding, and even though we interact primarily via email and a monthly hangout, we have grown to treasure the relationships we’ve formed with each other.

Lead Pipe currently has a social media presence on Facebook, Twitter, and Google+ and seeks to improve its efficacy within these venues.

The Social Media Editor will be considered an equal member of the Lead Pipe Editorial Board and will be given the opportunity to engage in other Lead Pipe Editorial Board responsibilities, such as editing articles and recruiting authors.

The expected time commitment is approximately 10-20 hours per month.


To be considered for this position, please send a statement of interest, along with your name and email address to Your statement should be succinct and should describe your relevant experience as well as at least one idea for improving Lead Pipe’s current social media presence. We want you to demonstrate that you’ve looked at our channels and thought critically about them, and that you have a coherent approach or philosophy regarding social media for organizations. In addition, if you have one, be sure to link to your online portfolio or any social media presence you manage.

This position will remain open until filled with priority given to applications received prior to Wednesday, September 24th, 2014.

Any questions may be directed to

Many thanks to Nicole Helregel from Hack Library School for reviewing!

by Ellie Collier at August 27, 2014 10:00 AM

August 26, 2014

Casey Bisson

A/B Split Testing Calculators

Mixpanel’s A/B testing calculator is a competent performer and valuable tool:

mixpanel ab test calculator

Thumbtack’s split testing calculator, however, is a surprise standout:

thumbtack ab test calculator

That their code is on GitHub is especially delightful.

by Casey Bisson at August 26, 2014 10:45 PM

Tara Robertson

letter to the Vancouver Public Library Board

I am writing to urge you to reconsider the changes in the Public Internet Use policy that the Board recently passed. These are bad policy changes that erode intellectual freedom, are problematic for library workers and are harmful to libraries. I have many concerns both as a library user and as a librarian.

I served as the chair of the BC Library Association’s Intellectual Freedom Committee from 2006-2008, have blogged about intellectual freedom issues in libraries for 8 years and sit on an editorial committee for an encyclopedia on intellectual freedom for libraries.

According to the VPL’s 2013 Annual Report there were 1.3 million internet sessions and 1.1 million wireless sessions. The management report cites 31 complaints out of a total of 2.6 million internet sessions. This is not enough of a problem to justify a drastic policy change.

I appreciate that the management report dated July 17, 2014 references the Canadian Library Association’s Statement on Intellectual Freedom and talks about VPL’s commitment to this core library value. This policy does not “guarantee and facilitate access to all expressions of knowledge and intellectual activity”; in fact, it erodes these freedoms. The phrase “explicit sexual images” is highly problematic and extremely vague. Who decides what is sexually explicit? A colleague at a public library told me about a complaint from a patron about another patron who was apparently looking at pornography. This person turned out to be watching an online video of childbirth.

It seems like there is confusion about what intellectual freedom looks like online versus in the library’s traditional print collections. If someone were to read an ebook version of the graphic novel Lost Girls on a tablet device, search for online information about sexual health or human sexuality, or watch a video of well-known contemporary performance artist Annie Sprinkle, would VPL staff or security come and kick them out of the library? While some people might find these topics offensive, they are all legitimate information needs.

The current practice for handling reports of offensive material really troubles me. The management report states that either staff or a security guard asks the user to stop viewing the inappropriate material; if the library user does not comply, they are asked to leave the library. I’m concerned that there is no evaluation of whether the material is acceptable or not. Also, having a security guard come up to you and possibly kick you out of the library is a scary and intimidating experience, especially for many socially excluded individuals.

The management report describes this as being a problem primarily at the Central library and the Mount Pleasant branch. This sounds like a design challenge: “how do you design public spaces so that library users’ freedom to access does not impact staff members’ freedom to work without seeing things that offend them?” As the Central branch has moved to a roving reference model, perhaps it is time to rethink how the seating areas and computers are set up.

Again, I ask you to reconsider this policy decision.

by Tara Robertson at August 26, 2014 10:09 PM

Dave Pattern

Hello world!

Welcome to The Hitchcock Zone Sites. This is your first post. Edit or delete it, then start blogging!

by Dave Pattern at August 26, 2014 07:26 PM

OCLC Dev Network

Additional Fixes in Tomorrow's VIAF Update

In addition to the Modification to VIAF application/xml Response Title Location we told you about a couple of weeks ago, tomorrow's VIAF update will also include a couple of bonus fixes:

by Shelley Hostetler at August 26, 2014 06:00 PM


Summer of Archives: Gorgeous historical book covers

Who doesn’t love a beautiful book cover? Our latest installment in the Summer of Archives series contains a smattering of stunning 19th and 20th-century book covers designed by the likes of Margaret Armstrong (1867-1944) and Will Bradley (1868–1962), among others. All images courtesy University of North Carolina at Greensboro, via the North Carolina Digital Heritage Center and DPLA.


All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

by Kenny Whitebloom at August 26, 2014 05:23 PM

Roy Tennant

The Ignorance of What it Will Take

I noted in July’s issue of Current Cites that we had ended our 24th year of continuous monthly publication and were entering our 25th. Of course the real celebrations will happen a year from now, but I thought that it was worth noting.

As I thought more about it, I remembered (again) that I had started the publication at UC Berkeley a little less than three years before my twins were born. Now they are in college. That got me to thinking that had I known I would still be doing this, month in and month out, over 20 years later, I’m not sure what I would have done.

Would I have been proud? Ready to jump in and put my shoulder to the wheel for the next 20-something years, every month? I don’t know. I really don’t. In some ways, it’s like parenthood. Although having children is the best thing I’ve ever done in my life, I can’t help thinking that if I had known all of the impacts that were about to occur for the rest of my life, it would at least have given me pause.

Some of my favorite statements about parenthood include these chestnuts (old, but nonetheless true): “having children is like deciding to let your heart live outside of your body” and “parenthood is the hardest job you will ever love”. Because, in the end, we don’t really know what we are getting into until it’s too late. And that is a good thing.

Because if we truly understood all of the many impacts on our lives without also truly understanding the benefits, we probably would never do anything. And that’s not good.

So although it might sound strange, color me happy for ignorance. It definitely has its place.

by Roy Tennant at August 26, 2014 04:49 PM

District Dispatch

Add your voice to FCC public comment on network neutrality

The Federal Communications Commission (FCC) has heard from more than 1 million commenters on proposed rulemaking to Protect and Promote the Open Internet, including from the American Library Association (ALA), Association for College & Research Libraries (ACRL), the Chief Officers of State Library Agencies (COSLA) and the Association of Research Libraries (ARL). But it’s not too late to add your voice in support of network neutrality.

September is a perfect time to add more voices from the library and education community. Working with EDUCAUSE, ALA has developed a template letter of support for our comments that you can use to amplify our voice. Click here (doc) to open the document, customize with your information and follow guidelines for submission to FCC.

ALA is meeting with FCC officials, and there is definite interest in our perspective as advocates for intellectual freedom and equity of access to information for all. Please consider strengthening our presence as a community in the public record.

The formal “reply” comment period of the FCC proceeding will close September 15, but “ex parte” comments will be accepted until further notice. The FCC hoped to deliver a new Order on network neutrality by the end of the year, but this could be delayed as the commission considers the broad public input and a range of proposals and perspectives.

As always, more background and related news can be found online. Stay tuned!

The post Add your voice to FCC public comment on network neutrality appeared first on District Dispatch.

by Larra Clark at August 26, 2014 03:21 PM

Lukas Koster

Library Linked Data Happening


On August 14 the IFLA 2014 Satellite Meeting ‘Linked Data in Libraries: Let’s make it happen!’ took place at the National Library of France in Paris. Rurik Greenall (who also wrote a very readable conference report) and I had the opportunity to present our paper ‘An unbroken chain: approaches to implementing Linked Open Data in libraries; comparing local, open-source, collaborative and commercial systems’. In this paper we do not go into reasons for libraries to implement linked open data, nor into detailed technical implementation options. Instead we focus on the strategies that libraries can adopt for the three objectives of linked open data: original cataloguing/creating of linked data, exposing legacy data as linked open data, and consuming external linked open data. Possible approaches are local development, using Free and Open Source Software, participating in consortia or service centres, relying on commercial vendors, or any combination of these. Our main conclusions and recommendations are: identify your business case, be part of some community if you’re not big enough to go it alone, and take lifecycle planning seriously.

The other morning presentations provided some interesting examples of a number of approaches we described in our talk. Valentine Charles presented the work of Europeana and The European Library on aggregating library and heritage data from a large number of heterogeneous sources in different languages; both institutions de facto function as large consortia or service centres for exposing and enriching data. Both platforms expose their aggregated content not only in web pages for human consumption but also as linked open data, alongside other so-called machine-readable formats. Moreover, they enrich their aggregated content by consuming data from their own network of providers and from external sources, for instance multilingual “value vocabularies” like thesauri, authority lists and classifications. The idea is to use concepts/URIs together with display labels in multiple languages. For Europeana these sources currently are GeoNames, DBpedia and GEMET. Work is being done on including the Getty Art and Architecture Thesaurus (AAT), which was recently published as Linked Open Data. Besides using VIAF for person authorities, The European Library has started adding multilingual subject headings by integrating the Common European Research Classification Scheme, part of the CERIF format. The use of MACS (Multilingual Access to Subjects) as Linked Open Data is being investigated. This topic was also discussed during the informal networking breaks. Questions that were asked: is MACS valuable for libraries, who should be responsible for MACS, and how can administering MACS in a Linked Open Data environment best be organized? Personally I believe that a multilingual, concept-based subject authority file for libraries, archives, museums and related institutions is long overdue and will be extremely valuable, and not only in Linked Open Data environments.
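The idea of pairing one concept URI with display labels in several languages can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Europeana or The European Library store their data: the URI and the `CONCEPTS` table are hypothetical, and the fallback order (preferred languages, then a default language, then the bare URI) is one reasonable choice among several.

```python
# Hypothetical concept URI and multilingual labels, for illustration only
PAINTING = "http://example.org/concept/painting"
CONCEPTS = {
    PAINTING: {"en": "painting", "fr": "peinture", "de": "Malerei"},
}


def display_label(uri, preferred_langs, fallback="en"):
    """Pick a display label for a concept URI in the user's preferred language."""
    labels = CONCEPTS.get(uri, {})
    for lang in preferred_langs:
        if lang in labels:
            return labels[lang]
    # Fall back to a default language, and finally to the URI itself
    return labels.get(fallback, uri)


print(display_label(PAINTING, ["nl", "fr"]))
```

Because records point at the URI rather than at a string, the same enriched record can be displayed in whatever language the user prefers.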

The importance of multilingual issues and the advantages that Linked Open Data can offer in this area were also demonstrated in the presentation about the Linked Open Authority Data project at the National Diet Library of Japan. The Web NDL Authorities are strongly connected to VIAF and LCSH among others.

The presentation of the Linked Open Data environment of the National Library of France (BnF) highlighted a very interesting collaboration between a large library with considerable resources in expertise, people and funding on the one hand, and the non-library commercial IT company Logilab on the other. The result of this project is a very sophisticated local environment consisting of the aggregated data sources of the National Library and a dedicated application based on the free software tool CubicWeb. An interesting situation arose when Logilab itself asked whether the developed applications could be released as Open Source by the National Library. The BnF representative Gildas Illien (also one of the organizers of the meeting, together with Emmanuelle Bermes) replied with considerations about planning, support and scalability, which is completely understandable from the perspective of lifecycle planning.

With all these success stories about exposing and publishing Linked Open Data, the question always remains whether the data is actually used by others. It is impossible to incorporate this in project planning and results evaluation. Regarding the BnF data, this question was answered in the presentation about Linked Open Data in the book industry. The Electre and Antidot project uses linked open data from among others

The afternoon presentations focused on creating, maintaining and using various data models, controlled vocabularies and knowledge organisation systems (KOS) as Linked Open Data: EDM (the Europeana Data Model), UNIMARC and MODS. Gordon Dunsire presented an interesting perspective on versioning vocabularies in a linked data world. Vocabularies change over time, so the assignment of a URI for a vocabulary concept should always contain version information (such as timestamps and/or version numbers) in order to be able to identify the intended meaning at the time of assignment.
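To make the versioning point concrete, here is a minimal sketch in which each URI assignment records the vocabulary version in force at the time, so the original meaning can be recovered later. The vocabulary history, URI, labels and the `ConceptAssignment` type are all hypothetical; the talk did not prescribe a data structure.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical history of one concept: (version, label) pairs, oldest first
C42 = "http://example.org/vocab/c42"
HISTORY = {
    C42: [("1.0", "Computers"), ("2.0", "Computing devices")],
}


@dataclass(frozen=True)
class ConceptAssignment:
    uri: str
    version: str       # vocabulary version in force when the URI was assigned
    assigned_on: date  # timestamp of the assignment


def meaning_at_assignment(a: ConceptAssignment) -> str:
    """Return the concept's label as it was in the recorded vocabulary version."""
    for version, label in HISTORY[a.uri]:
        if version == a.version:
            return label
    raise KeyError(f"{a.uri} has no version {a.version}")


old = ConceptAssignment(C42, "1.0", date(2012, 5, 1))
print(meaning_at_assignment(old))
```

Without the `version` field, a record catalogued under version 1.0 would silently be reinterpreted with the version 2.0 label.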

The meeting was concluded with a panel with representatives of three commercial companies involved in library systems and Linked Open Data developments: Ex Libris, OCLC and the aforementioned Logilab. The fact that this panel with commercial companies on library linked data took place was significant and important in itself, regardless of the statements that were made about the value and importance of Linked Open Data in library systems. After years of dedicated, temporarily funded proof-of-concept projects, this may be an indication that Linked Open Data in libraries is slowly becoming mainstream.



by Lukas Koster at August 26, 2014 01:52 PM

State Library of Denmark

Ten times faster

One week ago I complained about Solr’s two-phase distributed faceting being slow in the second phase – ten times slower than the first phase. The culprit was the fine-counting of top-X terms, with each term-count being done as an intersection between regular hits and a special search for the term in question.

Let’s have a quick look at the numbers from last week (note that the scales are logarithmic):

256GB RAM, 1 thread, 12 shards, 10TB, random words, sparse faceting on URL, phase 1 and 2 separately, numbers from the individual shard requests


Imprecise facet counting aka cheating

The simple way to get fast distributed faceting for high cardinality fields is to skip the second phase and accept that the term counts for faceting might be a bit off, where “a bit” is highly dependent on corpus & query. An extremely quick ad-hoc test with our corpus suggested around 10% deviation from the correct counts. The skipping requires just 3 lines of code, strategically placed.
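Why skipping the second phase gives imprecise counts can be shown with a toy model. This is a simplified sketch, not Solr's actual code: each shard is a plain dict of term counts, phase 1 merges only each shard's local top-k, and the optional phase 2 re-counts the chosen terms in every shard. Real Solr requests more than k terms per shard, which reduces but does not remove the error.

```python
from collections import Counter


def shard_top(shard, k):
    """Phase 1 on one shard: return only the shard's local top-k term counts."""
    return dict(Counter(shard).most_common(k))


def distributed_facet(shards, k, refine=True):
    merged = Counter()
    for shard in shards:
        merged.update(shard_top(shard, k))  # Counter.update adds counts
    top_terms = [term for term, _ in merged.most_common(k)]
    if not refine:
        # Skipping phase 2: a term missing from some shard's top-k is undercounted
        return {term: merged[term] for term in top_terms}
    # Phase 2 (refinement): fine-count every candidate term in every shard
    return {term: sum(s.get(term, 0) for s in shards) for term in top_terms}


shards = [{"a": 5, "b": 4, "c": 1}, {"c": 6, "a": 2, "b": 1}]
print(distributed_facet(shards, 2, refine=False))  # c is reported as 6, true count is 7
print(distributed_facet(shards, 2, refine=True))
```

Here term `c` falls outside the first shard's top-2, so its count of 1 from that shard is lost unless the refinement phase runs.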

256GB RAM, 1 thread, 12 shards, 10TB, random words, faceting on URL, numbers from full distributed faceting


Apologies for the colors not matching between the charts. The median for plain Solr faceting is 1608 ms. For imprecise facet counting, using sparse faceting for the first phase, it is 168 ms. Quite a speed-up! Read Whale hunting with Solr for an explanation of the weird response times below 100 hits.

Since we are already cheating and getting imprecise counts, we might as well limit the maximum count for each term. In our 12 shards, the maximum count for a single URL in any shard is a little below 5000, with the long tail dropping very quickly. Most counts are below 10 in a single shard. With a count of 5000, we need 13 bits to hold the counter, meaning 3.6 billion terms × 13 bits/term ≈ 5.44 GB for all counter structures, or about 0.45 GB / shard / thread. If we lower this max count to 255 / shard, so that a single counter fits in a byte, we get faster faceting and reduce the memory overhead to 3.6 GB total or 300 MB / shard / thread.
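The counter arithmetic above can be checked with a short script. It assumes bit-packed counters (one fixed-width counter per term, packed without padding) and takes the 3.6 billion terms and 12 shards from the post. Note that the 5.44 GB figure matches binary gigabytes (GiB), while 3.6 GB and 300 MB match decimal units.

```python
def counter_bits(max_count: int) -> int:
    """Bits needed to store any count in the range 0..max_count."""
    return max(1, max_count.bit_length())


def counter_memory_bytes(num_terms: int, max_count: int) -> int:
    """Bytes for one bit-packed counter per term, rounded up to whole bytes."""
    return (num_terms * counter_bits(max_count) + 7) // 8


TERMS = 3_600_000_000  # total terms, figure from the post
SHARDS = 12

for max_count in (5000, 255):
    total = counter_memory_bytes(TERMS, max_count)
    print(f"max {max_count}: {counter_bits(max_count)} bits/term, "
          f"{total / 2**30:.2f} GiB ({total / 1e9:.2f} GB) total, "
          f"{total / SHARDS / 1e6:.0f} MB per shard/thread")
```

Capping at 255 trades count accuracy for byte-aligned counters, which are both smaller and cheaper to update than 13-bit packed ones.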

Alas, some of us think that all this cheating is getting out of hand…

Once more, with feeling!

It was possible to speed up the first phase of Solr faceting by doing sparse counting, so let’s try that again: for the second phase, we do a near-complete repetition of the first phase, so the counts for all terms in the facet field are calculated. However, instead of calculating and extracting the top-X terms, only the counts for the requested terms are extracted from the counting structure. Extracting a count from the structure requires resolving the ordinal for the term in question. This does take some time, but the hope was that it would not add too much overhead. So did it help?
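The second-phase idea (count everything, then read out only the requested terms by resolving each term to its ordinal) can be sketched as follows. This is a toy model, not the sparse faceting code itself: ordinals are simply positions in a sorted term list, documents are lists of ordinals, and resolution is a binary search with `bisect`.

```python
from bisect import bisect_left


def count_ordinals(doc_term_ords, hits, num_terms):
    """Count term occurrences for all matching docs, indexed by term ordinal."""
    counts = [0] * num_terms
    for doc in hits:
        for ordinal in doc_term_ords[doc]:
            counts[ordinal] += 1
    return counts


def lookup_counts(terms_sorted, counts, requested):
    """Phase 2: resolve each requested term to its ordinal and read its count."""
    result = {}
    for term in requested:
        i = bisect_left(terms_sorted, term)  # ordinal = position in sorted terms
        found = i < len(terms_sorted) and terms_sorted[i] == term
        result[term] = counts[i] if found else 0
    return result


terms = ["a.com", "b.org", "c.net"]
docs = [[0], [0, 2], [1], [2]]  # term ordinals per document
counts = count_ordinals(docs, hits=[0, 1, 3], num_terms=len(terms))
print(lookup_counts(terms, counts, ["c.net", "b.org"]))
```

The ordinal resolution is the extra cost mentioned above: one binary search per requested term, which is small next to the full counting pass.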

256GB RAM, 1 thread, 12 shards, 10TB, random words, faceting on URL, numbers from full distributed faceting



This is getting a bit chaotic and it is hard to see all the cyan dots hiding between the green ones. Trusty old percentile plot to the rescue:

256GB RAM, 1 thread, 12 shards, 10TB, random words, faceting on URL, numbers from full distributed faceting with sparse term lookup


Now we can see! With the second phase of faceting being nearly as fast as first phase, total faceting time for small result sets is looking quite good. If we lump all the measurements for each method together, we get

256GB RAM, 1 thread, 12 shards, 10TB, random words, faceting on URL, numbers from full distributed faceting


Note how the median for skip_secondary is a lot lower than in the previous test run; it seems that someone else was using the machine at that time. The outputs have been verified by random inspection: it really only takes a few hundred milliseconds to facet on small result sets out of more than 3 billion documents on a single machine. Just as it should, one might add. It remains to be tested whether smaller setups benefit just as much.

We’re not done yet

The implementation is not ideal, as the exact same work (counting all term occurrences) is very often done twice. It would be a lot better to cache it. When releasing a counter structure to the pool of structures, it could be released with a tag stating whether it would probably be used again (first phase of distributed faceting) or would probably not be needed any more (after the second phase). Guesstimating, this should shave 30-40% off the total time for faceting with sparse term lookup.
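The proposed tagging scheme could look roughly like the sketch below. Everything here is hypothetical (the `CounterPool` class, the query key, the method names); it only illustrates keeping a filled counter after phase 1 so that phase 2 can skip recounting. A real implementation would need eviction and would clear only the touched counters, as sparse counting does, rather than the whole array.

```python
class CounterPool:
    """Hypothetical pool that keeps phase-1 counters around for phase-2 reuse."""

    def __init__(self, num_terms):
        self.num_terms = num_terms
        self.cached = {}  # query key -> counter already filled by phase 1
        self.free = []    # cleared counters ready for any query

    def acquire(self, query_key):
        """Return (counter, already_counted)."""
        if query_key in self.cached:
            return self.cached.pop(query_key), True
        counter = self.free.pop() if self.free else [0] * self.num_terms
        return counter, False

    def release(self, query_key, counter, likely_reused):
        if likely_reused:
            # Tagged after phase 1: keep the counts for the coming phase 2
            self.cached[query_key] = counter
        else:
            # After phase 2: clear and return to the free list (no eviction here)
            for i in range(len(counter)):
                counter[i] = 0
            self.free.append(counter)
```

Usage: phase 1 calls `release(key, counter, likely_reused=True)`; phase 2 then calls `acquire(key)`, gets `already_counted=True`, and only has to look up the requested terms.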

Should anyone want to try sparse faceting for themselves, then visit and check out branch lucene_solr_4_8_sparse or lucene_solr_4_9_sparse. You will need an existing Solr index for proper testing. Refer to the file for options. The defaults work fine if the parameters facet.sparse=true and facet.sparse.termlookup=true are given, the requested facet field has over 10,000 unique values, and facet.method=fc. To disable the second phase completely, add the parameter facet.sparse.skiprefinements=true. Proper documentation pending.
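A request using the parameters named above might be assembled like this. The base URL and field name are placeholders, and the facet.sparse.* parameters exist only in the sparse faceting branches, not in stock Solr; facet, facet.field and facet.method are standard Solr parameters.

```python
from urllib.parse import urlencode


def sparse_facet_url(base, query, field, skip_refinements=False):
    """Build a select URL with the sparse faceting parameters from the post."""
    params = {
        "q": query,
        "facet": "true",
        "facet.field": field,
        "facet.method": "fc",
        "facet.sparse": "true",
        "facet.sparse.termlookup": "true",
    }
    if skip_refinements:
        # Disables the second (refinement) phase: faster, but counts may be off
        params["facet.sparse.skiprefinements"] = "true"
    return f"{base}/select?{urlencode(params)}"


# Placeholder host and core name
print(sparse_facet_url("http://localhost:8983/solr/collection1", "hello world", "url"))
```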

If you want to see this in Solr, visit SOLR-5894 and vote for it.

Update 20140826 22:26

To verify things, the experiment was repeated with 10 minute running time for each faceting method (as opposed to 3 minutes). This did not affect the conclusion, but might add a little bit of information.

256GB RAM, 1 thread, 12 shards, 10TB, random words, faceting on URL, numbers from full distributed faceting


The first thing to note is how response time rises nearly linearly with result set size when the result set is 5M or more. It would be interesting to investigate what happens around the 300M mark (8% of the corpus size), which is the limit chosen for sparse faceting in this setup.

The second observation is that all methods except stock Solr (the blue dots) seem to have two clusters, one below 100 ms and one above. As the no_facet method is single-phase (note to self: double-check if this is true), this cannot be explained by the second phase being skipped. Maybe there is some caching effect? The queries should be as good as unique, so it is not just simple request caching.

For an alternative illustration, here’s the same data as above but without the logarithmic y-scale:

256GB RAM, 1 thread, 12 shards, 10TB, random words, faceting on URL, numbers from full distributed faceting


by Toke Eskildsen at August 26, 2014 12:53 PM

FOSS4Lib Upcoming Events

13th International PASIG Meeting

Tuesday, September 16, 2014 - 08:00 to Thursday, September 18, 2014 - 17:00

Last updated August 26, 2014. Created by Peter Murray on August 26, 2014.

Wednesday, September 17

9:15 – 10:45 Research Data: Small Sciences
Fedora 4 for Research Data – David Wilcox, Duraspace

11:15 – 12:45 Research Data: Big Data
Islandora: Mining the Open Source Ecosystem for Data Management - Erin Tripp, DiscoveryGarden

Full Agenda – PASIG

by Peter Murray at August 26, 2014 12:22 PM

Terry Reese

MarcEdit’s MARCNext: JSON Object Viewer

As I noted in my last post, I’ll be adding a new area to the MarcEdit application called MARCNext. This will be used to expose a number of research tools for users interested in working with BibFrame data. In addition to the BibFrame Testbed, I’ll also be releasing a JSON Object Viewer. The JSON Object Viewer is a specialized viewer designed to parse JSON text and provide an object visualization of the data. The idea is that this tool could be used to render MARC data translated into BibFrame as JSON for easy reading. However, I’m sure that there will be other uses as well. I’ve tried to keep the interface simple. Essentially, you point the tool at a JSON file and the tool will render the file as objects. From there, you can search and query the data, view the JSON file in Object or Plain text mode, and ultimately, copy data for use elsewhere.
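The basic operations described (parse JSON, render it as a tree of objects, search it) can be sketched in Python. This is not MarcEdit's implementation, which is a separate application; the record below is a hypothetical JSON document rather than real BibFrame output, and the `$.a[0].b` path notation is just one convenient way to report search hits.

```python
import json


def render(node, name="root", depth=0, lines=None):
    """Render parsed JSON as indented object/array/value lines."""
    if lines is None:
        lines = []
    pad = "  " * depth
    if isinstance(node, dict):
        lines.append(f"{pad}{name} (object, {len(node)} keys)")
        for key, value in node.items():
            render(value, key, depth + 1, lines)
    elif isinstance(node, list):
        lines.append(f"{pad}{name} (array, {len(node)} items)")
        for i, value in enumerate(node):
            render(value, f"[{i}]", depth + 1, lines)
    else:
        lines.append(f"{pad}{name}: {json.dumps(node)}")
    return lines


def find_key(node, key, path="$"):
    """Return the paths of every object member whose name equals key."""
    hits = []
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}"
            if k == key:
                hits.append(child)
            hits.extend(find_key(v, key, child))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            hits.extend(find_key(v, key, f"{path}[{i}]"))
    return hits


doc = json.loads('{"title": "Moby Dick", "creator": [{"name": "Melville"}]}')
print("\n".join(render(doc)))
print(find_key(doc, "name"))
```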


Some additional testing needs to be done to make sure the program works well when coming across poorly formed data – but this tool will be a part of the next update.


by reeset at August 26, 2014 04:31 AM

Tara Robertson


by Tara Robertson at August 26, 2014 01:51 AM

August 25, 2014

Casey Bisson

Algolia Search

The multi-category autocomplete and autocomplete on filtering operators demos are interesting:

multi-category autocomplete

autocomplete on filtering operators

by Casey Bisson at August 25, 2014 10:43 PM


Sarah Lewis on mastery:

Mastery is in the reaching, not the arriving. It’s in constantly wanting to close that gap between where you are and where you want to be

by Casey Bisson at August 25, 2014 10:34 PM

FOSS4Lib Recent Releases

Sufia - 4.0

Release Date: 
Thursday, August 21, 2014

Last updated August 25, 2014. Created by Peter Murray on August 25, 2014.

This new version includes a complete redesign of Sufia's user interface, based on user feedback and design studies conducted with the staff and faculty at Penn State University. Sufia’s features now include:

by Peter Murray at August 25, 2014 05:50 PM

Journal of Web Librarianship


Journal of Web Librarianship, Volume 8, Issue 3, pages 324-325, July-September 2014.

by David Gibbs at August 25, 2014 03:50 PM


Journal of Web Librarianship, Volume 8, Issue 3, pages 328-329, July-September 2014.

by Paula Barnett-Ellis at August 25, 2014 03:50 PM


Journal of Web Librarianship, Volume 8, Issue 3, pages 325-326, July-September 2014.

by Dena L. Luce at August 25, 2014 03:49 PM


Journal of Web Librarianship, Volume 8, Issue 3, pages 326-327, July-September 2014.

by John Rodzvilla at August 25, 2014 03:49 PM


Journal of Web Librarianship, Volume 8, Issue 3, pages 327-328, July-September 2014.

by Bradford Lee Eden at August 25, 2014 03:49 PM

Discovering Usability: Comparing Two Discovery Systems at One Academic Library

Journal of Web Librarianship, Volume 8, Issue 3, pages 263-285, July-September 2014.

by Mireille Djenno at August 25, 2014 03:49 PM

Office Location Map of Individuals in the Library and Other College Campus Buildings: A Proof-of-Concept Wayfinding System

Journal of Web Librarianship, Volume 8, Issue 3, pages 305-323, July-September 2014.

by Naresh Kumar Agarwal at August 25, 2014 03:49 PM

Indonesian LIS Professionals’ Understanding of Library 2.0: A Pilot Study

Journal of Web Librarianship, Volume 8, Issue 3, pages 286-304, July-September 2014.

by Bekti Mulatiningsih at August 25, 2014 03:49 PM

The Library and the Web: Graduate Students’ Selection of Open Access Journals for Empirical Literature Searches

Journal of Web Librarianship, Volume 8, Issue 3, pages 243-262, July-September 2014.

by Ethan J. Allen at August 25, 2014 03:49 PM

Library of Congress: The Signal

What Do You Do With 100 Million Photos? David A. Shamma and the Flickr Photos Dataset


David Ayman Shamma, a scientist and senior research manager with Yahoo Labs and Flickr. Photo from xeeliz on Flickr.

Every day, people from around the world upload photos to share on a range of social media sites and web applications. The results are astounding; collections of billions of digital photographs are now stored and managed by several companies and organizations. In this context, Yahoo Labs recently announced that they were making a data set of 100 million Creative Commons photos from Flickr available to researchers. As part of our ongoing series of Insights Interviews, I am excited to discuss potential uses and implications for collecting and providing access to digital materials with David Ayman Shamma, a scientist and senior research manager with Yahoo Labs and Flickr.

Trevor: Could you give us a sense of the scope and range of this corpus of photos? What date ranges do they span? The kinds of devices they were taken on? Where they were taken? What kinds of information and metadata they come with? Really, anything you can offer for us to better get our heads around what exactly the dataset entails.

Ayman: There’s a lot to answer in that question. Starting at the beginning, Flickr was an early supporter of the Creative Commons and since 2004 devices have come and gone, photographic volume has increased, and interests have changed.  When creating the large-scale dataset, we wanted to cast as wide a representative net as possible.  So the dataset is a fair random sample across the entire corpus of public CC images.  The photos were uploaded from 2004 to early 2014 and were taken by over 27,000 devices, including everything from camera phones to DSLRs. The dataset is a list of photo IDs with a URL to download a JPEG or video plus some corresponding metadata like tags and camera type and location coordinates.  All of this data is public and can generally be accessed from an unauthenticated API call; what we’re providing is a consistent list of photos in a large, rolled-up format.  We’ve rolled up some but not all of the data that is there.  For example, about 48% of the dataset has longitude and latitude data which is included in the rollup, but comments on the photos have not been included, though they can be queried through the API if someone wants to supplement their research with it.


Data, data, data… A glimpse of a small piece of the dataset. Image shared by aymanshamma on Flickr.

Trevor: In the announcement about the dataset you mention that there is a 12 GB data set, which seems to have some basic metadata about the images and a 50 TB data set containing the entirety of the collection of images. Could you tell us a bit about the value of each of these separately, the kinds of research both enable and a bit about the kinds of infrastructure required to provide access to and process these data sets?

Ayman: Broadly speaking, research on Flickr can be categorized into two non-exclusive topic areas: social computing and computer vision. In the latter, one has to compute what are called ‘features’ or pixel details about luminosity, texture, cluster and relations to other pixels. The same is true for audio in the videos. In effect, it’s a mathematical fingerprint of the media. Computing these fingerprints can take quite a bit of computational power and time, especially at the scale of 100 million items. While the core dataset of metadata is only 12 GB, a large collection of features reaches into the terabytes. Since these are all CC media files, we thought to also share these computed features. Our friends at the International Computer Science Institute and Lawrence Livermore National Labs were more than happy to compute and host a standard set of open features for the world to use. What’s nice is this expands the dataset’s utility. If you’re from an institution (academic or otherwise), computing the features yourself could cost a lot of compute time.
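As a toy illustration of what a pixel "feature" is, the sketch below computes a normalized luminosity histogram for a list of RGB pixels using the ITU-R BT.601 luma weights. It is an assumption-laden stand-in: the features actually computed for the Flickr dataset are far richer (texture, clusters, audio) and were not specified in the interview.

```python
def luminosity_histogram(pixels, bins=8):
    """Normalized histogram of pixel luma; a very simple image fingerprint."""
    hist = [0] * bins
    for r, g, b in pixels:
        y = 0.299 * r + 0.587 * g + 0.114 * b  # BT.601 luma, range 0..255
        idx = min(int(y * bins / 256), bins - 1)
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist] if total else hist


print(luminosity_histogram([(0, 0, 0), (255, 255, 255)], bins=2))
```

Even this cheap feature requires touching every pixel of every image, which hints at why computing features for 100 million items is costly.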


A 1 million photo sample of the 48 million geotagged photos from the dataset plotted around the globe. Image shared by aymanshamma on Flickr.

Trevor: The dataset page notes that the dataset has been reviewed to meet “data protection standards, including strict controls on privacy.” Could you tell us a bit about what that means for a dataset like this?

Ayman: The images are all under one of six Creative Commons licenses implemented by Flickr. However, there were additional protections that we put into place. For example, you could upload an image with the license CC Attribution-NoDerivatives and mark it as private. Technically, the image is in the public CC; however, Flickr’s agreement with its users supersedes the CC distribution rights. With that, we only sampled from Flickr’s public collection. There are also some edge cases.  Some photos are public and in the CC but the owner set the geo-metadata to private. Again, while the geo-data might be embedded in the original JPEG and is technically under CC license, we didn’t include it in the rollup.
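The two rules described here (sample only public CC photos, and drop coordinates the owner marked private) can be sketched as a filter. The `Photo` record, its field names and the license strings are hypothetical; this shows the logic as described, not Yahoo's actual pipeline.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# The six Creative Commons licenses, written as hypothetical identifier strings
CC_LICENSES = {"CC BY", "CC BY-SA", "CC BY-ND", "CC BY-NC", "CC BY-NC-SA", "CC BY-NC-ND"}


@dataclass(frozen=True)
class Photo:
    photo_id: str
    license: str
    is_public: bool
    geo_is_public: bool
    latlon: Optional[Tuple[float, float]]


def rollup(photos):
    out = []
    for p in photos:
        if not p.is_public or p.license not in CC_LICENSES:
            continue  # private photos are excluded even when CC-licensed
        if not p.geo_is_public:
            p = replace(p, latlon=None)  # owner hid location: leave it out
        out.append(p)
    return out
```

Note that the check is on the owner's privacy settings, not just the license, which mirrors the point that Flickr's agreement with its users takes precedence.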

Trevor: Looking at the Creative Commons page for Flickr, it would seem that this isn’t the full set of Creative Commons images. By my count, there are more than 300 million Creative Commons-licensed photos there. How were the 100 million selected, and what factors went into deciding to release a subset rather than the full corpus?

Ayman: We wanted to create a solid dataset given the potential public dataset size; 100 million seemed like a fair sample size that could bring in close to 50% geo-tagged data and about 800 thousand videos. We envision researchers from all over the world accessing this data, so we did want to account for the overall footprint and feature sizes.  We’ve chatted about the possibility of ‘expansion packs’ down the road, both to increase the size of the dataset and to include things like comments or group memberships on the photos.

Trevor: These images are all already licensed for these kinds of uses, but I imagine that it would have been impractical for someone to collect this kind of data via the API. How does this dataset extend what researchers could already do with these images based on their licenses? Researchers have already been using Flickr photos as data; what does bundling them up as a dataset do to enable further or better research?

Ayman: Well, what’s been happening in the past is people have been harvesting the API or crawling the site.  However, there are a few problems with these one-off research collections; the foremost is replication.  By having a large and flexible corpus, we aim to set a baseline reference dataset for others to see if they can replicate or improve upon new methods and techniques.  A few academic and industry players have created targeted datasets for research, such as ImageNet from Stanford or Yelp’s release of its Phoenix-area reviews. Yahoo Labs itself has released a few small targeted Flickr datasets in the past as well.  But in today’s research world, the new paradigm and new research methods require large and diverse datasets, and this is a new dataset to meet the research demands.

Trevor: What kinds of research are you and your colleagues imagining folks will do with these photographs? I imagine a lot of computer science and social network research could make use of them. Are there other areas you imagine these being used in? It would be great if you could mention some examples of existing work that folks have done with Flickr photos to illustrate their potential use.

Ayman: Well, part of the exciting bit is finding new research questions.  In one recent example, we began to examine the shape and structure of events through photos.  Here, we needed to temporally align geo-referenced photos to see when and where a photo was taken. As it turns out, the time the photo was taken and the time reported by the GPS are off by as much as 10 minutes in 40% of the photos.  So, in work that will be published later this year, we designed a method for correcting timestamps that are in disagreement with the GPS time.  It’s not something we would have thought we’d encounter, but it’s an example of what makes a good research question.  With a large corpus available to the research world at-large, we look forward to others also finding new challenges, both immediate and far-reaching.

Trevor: Based on this and similar Webscope datasets, I would be curious to hear any thoughts and reflections you might offer for libraries, archives and museums looking at making large-scale datasets like this available to researchers. Are there any lessons learned you can share with our community?

Ayman: There’s a fair bit of care and precaution that goes into making collections like this; rarely is it ever just a scrape of public data, since ownership and copyright do play a role. These datasets are large collections that reflect people’s practices, behavior and engagement with media like photos, tweets or reviews. So, coming to understand what these datasets mean with regard to culture is something to set our sights on. This applies both to the libraries and archives that set out to preserve collections and to the researchers and scientists, social and computational alike, who aim to understand them.

by Trevor Owens at August 25, 2014 03:20 PM

Peter Murray

Kuali Reboots Itself into a Commercial Entity

Did you feel a great disturbance in the open source force last week? At noon on Friday in a conference call with members of the Kuali community, the Kuali Foundation Board of Directors announced a change of direction:

We are pleased to share with you that the Kuali Foundation is creating a Professional Open Source commercial entity to help achieve these goals. We expect that this company will engage with the community to prioritize investments in Kuali products, will hire full-time personnel, will mesh a “software startup” with our current culture, and will, over time, become self-sustaining. It enables an additional path for investment to accelerate existing and create new Kuali products.

As outlined in the Kuali 2.0 FAQ:

The Kuali Foundation (.org) will still exist and will be a co-founder of the company. It will provide assurance of an ongoing open source code base and still enable members to pool funds to get special projects done that are outside the company’s roadmap. The fees for Foundation membership will be reduced.

There have been some great observations on Twitter this morning. First, a series of tweets from Roger Schonfeld:

Community source models have proved inadequate to HighWire & Kuali: both have reorganized as profit-seeking initiatives. 1/3

— Roger C. Schonfeld (@rschon) August 25, 2014

As collaborative software/hosting specialized to higher ed, did they have trouble recapitalizing in the community following start up? 2/3

— Roger C. Schonfeld (@rschon) August 25, 2014

And if so what should planners for other community collaborative initiatives such as HathiTrust bear in mind? 3/3

— Roger C. Schonfeld (@rschon) August 25, 2014

Lisa Hinchliffe points out a similar struggle by the Sakai Foundation last year.

.@rschon Different path but perhaps also lessons from Sakai?

— Lisa Hinchliffe (@lisalibrarian) August 25, 2014

Dan Cohen adds:

@DataG @griffey Hmm, looks more radical than an “adding a vendor” move or even a Mozilla Foundation->Mozilla Corporation move.

— Dan Cohen (@dancohen) August 25, 2014

And lastly (for the moment) Bryan Alexander adds a brief quote from Brad Wheeler’s conference call:

@rschon Cf Brad Wheeler: "college leaders perceive companies as more stable than communal projects" @dancohen @DataG @griffey

— Bryan Alexander (@BryanAlexander) August 25, 2014

My first interpretation of this is that there is a fundamental shift afoot in the perception of open source by senior leadership at higher education institutions. Maybe it is a lack of faith in the “community source” model of software development. Having a company out there that is formally responsible for the software rather than your own staff’s sweat equity makes it easier to pin the blame for problems on someone else. Or maybe it is that highly distributed open source projects for large enterprise-wide applications aren’t feasible — are communication barriers and the accountability overhead too large to move fast?

I do wonder what this means for the Kuali Open Library Environment (OLE) project. Kuali OLE just saw its first two installations go live this week. Will Kuali’s pivot towards a for-profit company make OLE more attractive to academic libraries or less? Does it even matter?

Lots of questions, and lots to think about.

by Peter Murray at August 25, 2014 02:10 PM

August 24, 2014

Patrick Hochstenbach

META Cover image

Woo-hoo my super librarian girl was used as the cover image of the META magazine for library professionals!

by hochstenbach at August 24, 2014 12:28 PM

Terry Reese

MarcEdit’s Research Toolkit – MARCNext

While developing MarcEdit 6, one of the areas that I spent a significant amount of time working on was the MarcEdit Research Toolkit. The Research Toolkit is an Easter egg of sorts: it’s a set of tools and utilities that I’ve developed to support my own personal research interests around library metadata, specifically around the future of library metadata, including topics like the current BibFrame testing and linked data. I’ve kept these tools private because they tend not to be fully realized concepts or ideas and have very little in the way of a user interface. Just as important, many of these tools represent work being created to engage in the conversation that the library community is having around library metadata formats and standards, so things can and do change or drop out of the conversation and are then removed from my toolkit.

While developing MarcEdit 6, one of the goals of the project was to find a way to make some of these tools, or parts of them, available to the general MarcEdit community. To that end, I’ll be making a new area available within MarcEdit called MARCNext. MARCNext will provide a space to make proof-of-concept tools available for anyone to use, and offer a simple interface that anyone can use to test new bibliographic concepts like BibFrame.

Presently, I’m evaluating my current workbench to see which of the available tools can be made public.  I have a handful that I think may be applicable – but will need some time to move them from concept to a utility for public consumption.  With that said, I will be making one tool immediately available as part of the next MarcEdit update, and that will be the BibFrame Testbed.  This is code that utilizes the LC XQuery files being developed and distributed at: with a handful of changes made to provide better support within MarcEdit.  These are my base files that will enable librarians to easily model their MARC metadata in a variety of serializations.  And using this initial work, I’ll likely add some additional serializations to the list. 

I have two goals for making this particular tool available. First and foremost, I would like to give anyone who is interested the ability to take their existing library metadata and model it using Bibframe concepts. Currently, the Library of Congress makes available a handful of command-line tools that users can utilize to process their metadata, but these tools tend not to be designed for the average user. By making this functionality available in MarcEdit, I’m hoping to lower the barrier so that anyone can model their data and then engage in the larger discussion around this work.

Secondly, I’m currently engaging in some work with Zepheira and other early implementers to take Bibframe testing mainstream.  Given the number of users working with MarcEdit, it made a lot of sense to provide tools to support this level of integration.  Likewise, by taking the time to move this work from the concept stage, I’ve been able to develop the start of a framework around these concepts. 

So how is this going to work?  On the next update, you will see a new link within the Main MarcEdit Window called MARCNext. 

MarcEdit Main Window

Click on the MARCNext link, and you will be taken to the public version of the Research Toolkit. At this point, the only tool being made publicly available is the BibFrame Testbed, though this will change.

MarcEdit’s MARCNext Window

Selecting the BibFrame Testbed initializes a simple dialog box to allow a user to select from a variety of library metadata types and convert them using BibFrame principles into a user-defined serialization. 

BibFrame Testbed window

As noted above, this testbed will be the first of a handful of tools that I will eventually be making available. Will they be useful to anyone? Who knows. Honestly, the questions that these tools are working to answer are not ones that come up on the listserv, and at present, they aren’t going to help much in one’s daily cataloging work. But hopefully they will give every cataloger who wants it the ability to engage with some of these new metadata concepts and at least take their existing data and see how it may change under different serializations and concepts.

Questions – feel free to ask.


by reeset at August 24, 2014 04:36 AM

August 23, 2014

Nicole Engard

Bookmarks for August 23, 2014

Today I found the following resources and bookmarked them.

The post Bookmarks for August 23, 2014 appeared first on What I Learned Today....

by Nicole C. Engard at August 23, 2014 08:30 PM

August 22, 2014

District Dispatch

Now available: Archived copyright session

Video from the interactive copyright webinar “International Copyright and Library Practices” is now available. The online seminar covered the basics of international copyright and how it applies to use of foreign works by libraries and in educational settings in the United States. The American Library Association’s (ALA) Office for Information Technology Policy Copyright Education Subcommittee hosted the educational webinar.

Janice T. Pilch discussed international copyright practices during the webinar. Pilch is a copyright and licensing librarian, a member of the faculty of Rutgers University Libraries, and a former chair of the ALA OITP Copyright Education Subcommittee. From 2007 to 2011, Pilch served as an international copyright advocate for the Library Copyright Alliance (LCA) at the World Intellectual Property Organization (WIPO) and other international organizations to promote fair and equitable access to information. She served as Visiting Program Officer on International Copyright for the Association of Research Libraries (ARL) from 2009 to 2010. She is currently the U.S. representative to the International Federation of Library Associations (IFLA) Committee on Copyright and Other Legal Matters, and chairs a permanent committee on copyright issues within the Association of Slavic, East European and Eurasian Studies.

The post Now available: Archived copyright session appeared first on District Dispatch.

by Carrie Russell at August 22, 2014 09:44 PM