Planet Code4Lib

After privacy glitch, the ball is now in our court / District Dispatch

Photo by John Leben Art Prints via Deviant Art

Last week, Adobe announced that with its software update (Digital Editions 4.0.1), the collection and transmission of user data has been secured. Adobe was true to its word that a fix would be made by the week of October 20 correcting this apparent oversight.

For those who might not know, a recap: Adobe Digital Editions is widely used software in the e-book trade for both library and commercial ebook transactions to authenticate legitimate library users, apply DRM to encrypt e-book files, and in general facilitate the e-book circulation process, such as deleting an e-book from a device after the loan period has expired. Earlier in October, librarians and others discovered that the new Adobe Digital Editions software (4.0) had a tremendous security and privacy glitch. A large amount of unencrypted data reflecting e-book loan and purchase transactions was being collected and transmitted to Adobe servers.

The collection of data “in the clear” is a hacker’s dream because it can be so easily obtained. Information about books, including publisher, title and other metadata, was also unencrypted, raising alarms about reader privacy and the collection of personal information. Some incorrectly reported that Adobe was scanning hard drives and spying on readers. After various librarians conducted a few tests, they confirmed that Adobe was not scanning or spying, but nonetheless this was clearly a security nightmare and an alleged assault on reader privacy.

ALA contacted Adobe about the breach and asked to talk to Adobe about what was going on. Conversations did take place and Adobe responded to several questions raised by librarians.

Now that the immediate problem of unencrypted data is fixed, let’s step back and consider what we have learned and ponder what to do next.

We learned that few librarians have the knowledge base to explain how these software technologies work. To a great extent, users (librarians and otherwise) do not know what is going on behind the curtain (without successfully hacking various layers of encryption).

We can no longer ensure user privacy by simply destroying circulation records, or refusing to reveal information without a court order. This just isn’t enough in the digital environment. Data collection is a permanent part of the digital landscape. It is lucrative and highly valued by some, and is often necessary to make things work.

We learned that most librarians continue to view privacy as a fundamental value of the profession, and something we should continue to support through awareness and action.

We should hold vendors and other suppliers to account: any data collected to enable services should be encrypted and retained for only as long as necessary, with no personal information collected, shared or sold.

What’s next? We have excellent policy statements regarding privacy, but we do not have a handy dandy guide to help us and our library communities understand how digital technologies work and how they can interfere with reader privacy. We need a handy dandy guide with diagrams and narrative that is not too technicalese (new word, modeled after “legalese”).

We have to inform our users that whenever they key in their name for a service or product, all privacy bets are off. We need to understand how data brokers amass boat loads of data and what they do with it. We need to know how to opt out of data collection when possible, or never opt in in the first place. We need to better inform our library communities.

A good suggestion is to collaborate with vendors and other suppliers and not just talk to one another at the license negotiating table. By working together we can renew our commitment to privacy. The vendors have extended an invitation by asking to work with us on best practices for privacy. Let’s RSVP “yes.”

The post After privacy glitch, the ball is now in our court appeared first on District Dispatch.

Webinar archive available: “$2.2 Billion reasons libraries should care about WIOA” / District Dispatch

Photo by the Knight Foundation

On Monday, more than one thousand people participated in the American Library Association’s (ALA) webinar “$2.2 Billion Reasons to Pay Attention to WIOA,” an interactive webinar that focused on ways that public libraries can receive funding for employment skills training and job search assistance from the recently-passed Workforce Innovation and Opportunity Act (WIOA).

During the webinar, leaders from the Department of Education and the Department of Labor explored the new federal law. Watch the webinar.

An archive of the webinar is available now:

The Workforce Innovation and Opportunity Act allows public libraries to be considered additional One-Stop partners, prohibits federal supervision or control over selection of library resources and authorizes adult education and literacy activities provided by public libraries as an allowable statewide employment and training activity. Additionally, the law defines digital literacy skills as a workforce preparation activity.

View slides from the webinar presentation:

Webinar speakers included:

  • Susan Hildreth, director, Institute of Museum and Library Services
  • Kimberly Vitelli, chief of Division of National Programs, Employment and Training Administration, U.S. Department of Labor
  • Heidi Silver-Pacuilla, team leader, Applied Innovation and Improvement, Office of Career, Technical, and Adult Education, U.S. Department of Education

We are in the process of developing a WIOA Frequently Asked Questions guide for library leaders, and we’ll publish the guide on the District Dispatch shortly. Subscribe to the District Dispatch, ALA’s policy blog, to be alerted when additional WIOA information becomes available.

The post Webinar archive available: “$2.2 Billion reasons libraries should care about WIOA” appeared first on District Dispatch.

Gossiping About Digital Preservation / Library of Congress: The Signal

ANTI-ENTROPY by user 51pct on Flickr.

In September the Library held its annual Designing Storage Architectures for Digital Collections meeting. The meeting brings together technical experts from the computer storage industry with decision-makers from a wide range of organizations with digital preservation requirements to explore the issues and opportunities around the storage of digital information for the long-term. I always learn quite a bit during the meeting and more often than not encounter terms and phrases that I’m not familiar with.

One I found particularly interesting this time around was the term “anti-entropy.” I’ve been familiar with the term “entropy” for a while, but I’d never heard “anti-entropy.” One definition of “entropy” is a “gradual decline into disorder.” So is “anti-entropy” a “gradual coming-together into order?” Turns out that the term has a long history in information science and is key to understanding some very important digital preservation processes regarding file storage, file repair and fixity checking.

The “entropy” we’re talking about when we talk about “anti-entropy” might also be called “Shannon Entropy” after the legendary information scientist Claude Shannon. His ideas on entropy were elucidated in a 1948 paper called “A Mathematical Theory of Communication” (PDF), developed while he worked at Bell Labs. For Shannon, entropy was the measure of the unpredictability of information content. He wasn’t necessarily thinking about information in the same way that digital archivists think about information as bits, but the idea of the unpredictability of information content has great applicability to digital preservation work.

“Entropy” represents the idea of the “noise” that begins to slip into information processes over time. It made sense that computer science would co-opt the term, and in that context “anti-entropy” has come to mean “comparing all the replicas of each piece of data that exist (or are supposed to) and updating each replica to the newest version.” In other words, what information scientists call “bit flips” or “bit rot” are examples of entropy in digital information files, and anti-entropy protocols (a subtype of “gossip” protocols) use methods to ensure that files are maintained in their desired state. This is an important concept to grasp when designing digital preservation systems that take advantage of multiple copies to ensure long-term preservability, LOCKSS being the most obvious example of this.
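To make the quoted definition a little more concrete, here is a minimal sketch of my own, not taken from any real storage system. It assumes each replica keeps a simple version number alongside every file; the `Replica` class and the `anti_entropy` function are hypothetical names invented for illustration. Note that it only catches missing or stale copies; detecting silent bit rot needs a content comparison such as a checksum.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One storage node; maps a file name to a (version, content) pair."""
    name: str
    store: dict = field(default_factory=dict)


def anti_entropy(replicas):
    """Update every replica to the newest version of every file.

    Returns a list of (replica name, file name) pairs that were repaired.
    """
    repaired = []
    all_files = set()
    for replica in replicas:
        all_files.update(replica.store)

    for file_name in sorted(all_files):
        # Find the newest (version, content) pair held by any replica.
        newest = max(
            (r.store[file_name] for r in replicas if file_name in r.store),
            key=lambda pair: pair[0],
        )
        for replica in replicas:
            held = replica.store.get(file_name)
            # A replica is out of date if the file is missing or older.
            if held is None or held[0] < newest[0]:
                replica.store[file_name] = newest
                repaired.append((replica.name, file_name))
    return repaired


a = Replica("a", {"report.pdf": (2, b"second draft")})
b = Replica("b", {"report.pdf": (1, b"first draft")})
c = Replica("c")  # never received the file
print(anti_entropy([a, b, c]))
```

Running the pass a second time should report nothing to repair, which is the “desired state” the protocols aim to maintain.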

gossip_bench by user ricoslounge on Flickr.

Anti-entropy and gossip protocols are the means to ensure the automated management of digital content that can take some of the human overhead out of the picture. Digital preservation systems invoke some form of content monitoring in order to do their job. Humans could do this monitoring, but as digital repositories scale up massively, the idea that humans can effectively monitor the digital information under their control with something approaching comprehensiveness is a fantasy. Thus, we’ve got to be able to invoke anti-entropy and gossip protocols to manage the data.

An excellent introduction to how gossip protocols work can be found in the paper “GEMS: Gossip-Enabled Monitoring Service for Scalable Heterogeneous Distributed Systems.” The authors note three key parameters to gossip protocols: monitoring, failure detection and consensus. Not coincidentally, LOCKSS “consists of a large number of independent, low-cost, persistent Web caches that cooperate to detect and repair damage to their content by voting in ‘opinion polls’” (PDF). In other words, gossip and anti-entropy.
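The “opinion poll” idea can be illustrated with a toy majority vote over checksums. This is only a sketch under my own assumptions and is not the actual LOCKSS polling protocol, which is considerably more sophisticated; `run_poll` and `repair` are hypothetical names, and the peers are just entries in a dictionary.

```python
import hashlib
from collections import Counter


def run_poll(copies):
    """Vote on the SHA-256 digest of each peer's copy.

    copies maps a peer name to the bytes that peer holds.
    Returns (winning digest, sorted list of peers that disagree).
    """
    digests = {peer: hashlib.sha256(data).hexdigest()
               for peer, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority we cannot tell which copy is intact.
    if votes <= len(copies) / 2:
        raise ValueError("no majority; cannot decide which copy is intact")
    damaged = sorted(peer for peer, d in digests.items() if d != winner)
    return winner, damaged


def repair(copies):
    """Overwrite every losing copy with a copy that won the poll."""
    winner, damaged = run_poll(copies)
    good = next(data for data in copies.values()
                if hashlib.sha256(data).hexdigest() == winner)
    for peer in damaged:
        copies[peer] = good
    return damaged


copies = {
    "cache-1": b"Call me Ishmael.",
    "cache-2": b"Call me Ishmael.",
    "cache-3": b"Call me Ishmaul.",  # simulated bit rot
}
print(repair(copies))
```

Here the content itself is compared, so a silently damaged copy is outvoted and replaced even though no version number changed.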

I’ve only just encountered these terms, but they’ve been around for a long while.  David Rosenthal, the chief scientist of LOCKSS, has been thinking about digital preservation storage and sustainability for a long time and he has given a number of presentations at the LC storage meetings and the summer digital preservation meetings.

LOCKSS is the most prominent example in the digital preservation community of the use of gossip protocols, but these protocols are widely used in distributed computing. If you really want to dive deep into the technology that underpins some of these systems, start reading about distributed hash tables, consistent hashing, versioning, vector clocks and quorum in addition to anti-entropy-based recovery. Good luck!

One of the more hilarious anti-entropy analogies was recently supplied by the Register, which suggested that a new tool that supports gossip protocols “acts like [a] depressed teenager to assure data reliability” and “constantly interrogates itself to make sure data is ok.”

You learn something new every day.

Web for Libraries: The UX Bandwagon / LibUX

This issue of The Web for Libraries was mailed Wednesday, October 29th, 2014. Want to get the latest from the cutting-edge web made practical for libraries and higher ed every Wednesday? You can subscribe here!

The UX Bandwagon

Is it a bad thing? Throw a stone and you’ll hit a user experience talk at a library conference (or even a whole library conference). There are books, courses, papers, more books, librarians who understand the phrase “critical rendering path,” this newsletter, this podcast, interest groups, and so on.

It is the best fad that could happen for library perception. The core concept behind capital-u Usability is continuous data-driven decision making that invests in the library’s ability to iterate upon itself. Usability testing that stops is usability testing done wrong. What’s more, libraries concerned with UX are thus concerned about measurable outward perception – marketing–which libraries used to suck at–that can neither be haphazard nor half-assed. This bandwagon values experimentation, permits change, and increases the opportunities to create delight.

Latest Podcast: A High-Functioning Research Site with Sean Hannan

Sean Hannan talks about designing a high-functioning research site for the Johns Hopkins Sheridan Libraries and University Museums. It’s a crazy fast API-driven research dashboard mashing up research databases, LibGuides, and a magic, otherworldly carousel actually increasing engagement. Research tools are so incredibly difficult to build well, especially when libraries rely so heavily on third parties, that I’m glad to have taken the opportunity to pick Sean’s brain. You can catch this and every episode on Stitcher, iTunes, or on the Web.

Top 5 Problems with Library Websites – a Review of Recent Usability Studies

Emily Singley looked at 16 library website usability studies over the past two years and broke down the biggest complaints. Can you guess what they are?

“Is the semantic web still a thing?”

Jonathan Rochkind sez: “The entire comment, and, really the entire thread, are worth a read. There seems to be a lot of energy in libraryland behind trying to produce “linked data”, and I think it’s important to pay attention to what’s going on in the larger world here.
Especially because much of the stated motivation for library “linked data” seems to have been: “Because that’s where non-library information management technology is headed, and for once let’s do what everyone else is doing and not create our own library-specific standards.” It turns out that may or may not be the case ….”

How to Run a Content-Planning Workshop

Let’s draw a line. There are libraries that blah-blah “take content seriously” enough that they pare down the content patrons don’t care about, ensure that hours and suchlike are findable, that their #libweb is ultimately usable. Then there are libraries that dive head-first into content creation. They podcast, make lists, write blogs, etc. For the latter, the library without a content strategy is going to be a mess, and I think these suggestions by James Deer on Smashing Magazine are really helpful.

New findings: For top ecommerce sites, mobile web performance is wildly inconsistent

I’m working on a new talk and maybe even a #bigproject about treating library web services and apps as e-commerce – because, think about it, what a library website does and what a web-store wants you to do aren’t too dissimilar. That said, I think we need to pay a lot of attention to stats that come out of e-commerce. Every year, Radware studies the mobile performance of the top 100 ecommerce sites to see how they measure up to user expectations. Here’s the latest report.

These are a few gems I think particularly important to us:

  • 1 out of 4 people worldwide own a smartphone
  • On mobile, 40% will abandon a page that takes longer than 3 seconds to load
  • Slow pages are the number one issue that mobile users complain about. 38% of smartphone users have screamed at, cursed at, or thrown their phones when pages take too long to load.
  • The median page is 19% larger than it was one year ago

There is also a lot of ink dedicated to sites that serve m-dot versions to mobile users, mostly making the point that this is ultimately dissatisfying and, moreover, tablet users definitely don’t want that m-dot site.

The post Web for Libraries: The UX Bandwagon appeared first on LibUX.

Reaching LITA members: a datapoint / Galen Charlton

I recently circulated a petition to start a new interest group within LITA, to be called the Patron Privacy Technologies IG.  I’ve submitted the formation petition to the LITA Council, and a vote on the petition is scheduled for early November.  I also held an organizational meeting with the co-chairs; I’m really looking forward to what we all can do to help improve how our tools protect patron privacy.

But enough about the IG, let’s talk about the petition! To be specific, let’s talk about when the signatures came in.

I’ve been on Twitter since March of 2009, but a few months ago I made the decision to become much more active there (you see, there was a dearth of cat pictures on Twitter, and I felt it my duty to help do something about it).  My first thought was to tweet the link to a Google Form I created for the petition. I did so at 7:20 a.m. Pacific Time on 15 October:

Since I wanted to gauge whether there was interest beyond just LITA members, I also posted about the petition on the ALA Think Tank Facebook group at 7:50 a.m. on the 15th.

By the following morning, I had 13 responses: 7 from LITA members, and 6 from non-LITA members. An interest group petition requires 10 signatures from LITA members, so at 8:15 on the 16th, I sent another tweet, which got retweeted by LITA:

By early afternoon, that had gotten me one more signature. I was feeling a bit impatient, so at 2:28 p.m. on the 16th, I sent a message to the LITA-L mailing list.

That opened the floodgates: 10 more signatures from LITA members arrived by the end of the day, and 10 more came in on the 17th. All told, 42 responses to the form were submitted between the 15th and the 23rd.

The petition didn’t ask how the responder found it, but if I make the assumption that most respondents filled out the form shortly after they first heard about it, I arrive at my bit of anecdata: over half of the petition responses were inspired by my post to LITA-L, suggesting that the mailing list remains an effective way of getting the attention of many LITA members.

By the way, the petition form is still up for folks to use if they want to be automatically subscribed to the IG’s mailing list when it gets created.

The Society of Motion Picture and Television Engineers (SMPTE) Archival Technology Medal Awarded to Neil Beagrie / DuraSpace News

From William Kilbride, Digital Preservation Coalition

Heslington, York  At a ceremony in Hollywood on October 23, 2014, the Society of Motion Picture and Television Engineers® (SMPTE®) awarded the 2014 SMPTE Archival Technology Medal to Neil Beagrie in recognition of his long-term contributions to the research and implementation of strategies and solutions for digital preservation.

ALA opposes e-book accessibility waiver petition / District Dispatch

ALA and the Association of Research Libraries (ARL) renewed their opposition to a petition filed by the Coalition of E-book Manufacturers seeking a waiver from complying with disability legislation and regulation (specifically Sections 716 and 717 of the Communications Act as enacted by the Twenty-First Century Communications and Video Accessibility Act of 2010). Amazon, Kobo, and Sony are the members of the coalition, and they argue that they do not have to make their e-readers’ Advanced Communications Services (ACS) accessible to people with print disabilities.

Why? The coalition argues that because basic e-readers (Kindle, Sony Reader, Kobo E-Reader) are primarily used for reading and have only rudimentary ACS, they should be exempt from CVAA accessibility rules. People with disabilities can buy other more expensive e-readers and download apps in order to access content. To ask the Coalition to modify their basic e-readers is a regulatory burden, will raise consumer prices, will ruin the streamlined look of basic e-readers, and inhibit innovation (I suppose for other companies and start-ups that want to make even more advanced inaccessible readers).

The library associations have argued that these basic e-readers do have ACS capability as a co-primary use. In fact, the very companies asking for this waiver market their e-readers as being able to browse the web, for example. The Amazon Webkit that comes with the basic Kindle can “render HyperText Markup Language (HTML) pages, interpret JavaScript code, and apply webpage layout and styles from Cascading Style Sheets (CSS). The combination of HTML, JavaScript, and CSS demonstrates that this basic e-reader’s browser leaves open a wide array of ACS capability, including mobile versions of Facebook, Gmail, and Twitter, to name a few widely popular services.”

We believe denying the Coalition’s petition will not only increase access to ACS, but also increase access to more e-content for more people. As we note in our FCC comments: “Under the current e-reader ACS regime proposed by the Coalition and tentatively adopted by the Commission, disabled persons must pay a ‘device access tax.’ By availing oneself of one of the ‘accessible options’ as suggested by the Coalition, a disabled person would pay at minimum $20 more a device for a Kindle tablet that is heavier and has less battery life than a basic Kindle e-reader.” Surely it is right that everyone ought to be able to buy and use basic e-readers just like everybody has the right to drink from the same water fountain.

This decision will rest on the narrow question of whether or not ACS is offered, marketed and used as a co-primary purpose in these basic e-readers. We believe the answer to that question is “yes,” and we will continue our advocacy to support more accessible devices for all readers.

The post ALA opposes e-book accessibility waiver petition appeared first on District Dispatch.

GITenberg: Modern Maintenance Infrastructure for Our Literary Heritage / Eric Hellman

One day back in March, the Project Gutenberg website thought I was a robot and stopped letting me download ebooks. Frustrated, I resolved to put some Project Gutenberg ebooks into GitHub, where I could let other people fix problems in the files. I decided to call this effort "Project Gitenhub". On my second or third book, I found that Seth Woodworth had had the same idea a year earlier, and had already moved about a thousand ebooks into GitHub. That project was named "GITenberg". So I joined his email list and started submitting pull requests for PG ebooks that I was improving.

Recently, we've joined forces to submit a proposal to the Knight Foundation's News Challenge, whose theme is "How might we leverage libraries as a platform to build more knowledgeable communities?". Here are some excerpts:
Project Gutenberg (PG) offers 45,000 public domain ebooks, yet few libraries use this collection to serve their communities. Text quality varies greatly, metadata is all over the map, and it's difficult for users to contribute improvements. 
We propose to use workflow and software tools developed and proven for open source software development- GitHub- to open up the PG corpus to maintenance and use by libraries and librarians. 
The result- GITenberg- will include MARC records, covers, OPDS feeds and ebook files to facilitate library use. Version-controlled fork and merge workflow, combined with a change triggered back-end build environment will allow scaleable, distributed maintenance of the greatest works of our literary heritage.  
Libraries need metadata records in MARC format, but in addition they need to be able to select from the corpus those works which are most relevant to their communities. They need covers to integrate the records with their catalogs, and they need a level of quality assurance so as not to disappoint patrons. Because this sort of metadata is not readily available, most libraries do not include PG records in their catalogs, resulting in unnecessary disappointment when, for example, a patron wants to read Moby Dick from the library on their Kindle.
43,000 books and their metadata have been moved to the git version control software; this will enable librarians to collaboratively edit and control the metadata. The GITenberg website, mailing list and software repository have been launched at . Software for generating MARC records and OPDS feeds has already been written.
Modern software development teams use version control, continuous integration, and workflow management systems to coordinate their work. When applied to open-source software, these tools allow diverse teams from around the world to collaboratively maintain even the most sprawling projects. Anyone wanting to fix a bug or make a change first forks the software repository, makes the change, and then makes a "pull request". A best practice is to submit the pull request with a test case verifying the bug fix. A developer charged with maintaining the repository can then review the pull request and accept or reject the change. Often, there is discussion asking for clarification. Occasionally versions remain forked and diverge from each other. GitHub has become the most popular site for this type of software repository because of its well developed workflow tools and integration hooks.
The leaders of this team recognized the possibility to use GitHub for the maintenance of ebooks, and we began the process of migrating the most important corpus of public domain ebooks, Project Gutenberg, onto GitHub, thus the name GITenberg. Project Gutenberg has grown over the years to 50,000 ebooks, audiobooks, and related media, including all the most important public domain works of English language literature. Despite the great value of this collection, few libraries have made good use of this resource to serve their communities. There are a number of reasons why. The quality of the ebooks and the metadata around the ebooks is quite varied. MARC records, which libraries use to feed their catalog systems, are available for only a subset of the PG collection. Cover images and other catalog enrichment assets are not part of PG. 
To make the entire PG corpus available via local libraries, massive collaboration among librarians and ebook developers is essential. We propose to build integration tools around GitHub that will enable this sort of collaboration to occur.
  1. Although the PG corpus has been loaded into GITenberg, we need to build a backend that automatically converts the version-controlled source text into well-structured ebooks. We expect to define a flavor of MarkDown or Asciidoc which will enable this automatic, change-triggered building of ebook files (EPUB, MOBI, PDF). (MarkDown is a human-readable plain text format used on GitHub for documentation; MarkDown for ebooks is being developed independently by several teams of developers. Asciidoc is a similar format that works nicely for ebooks.)
  2. Similarly, we will need to build a parallel backend server that will produce MARC and XML formatted records from version-controlled plain-text metadata files.
  3. We will generate covers for the ebooks using a tool recently developed by NYPL and include them in the repository.
  4. We will build a selection tool to help libraries select the records best suited to their libraries.
  5. Using a set of "cleaned up" MARC records from NYPL, and adding custom cataloguing, we will seed the metadata collection with ~1000 high quality metadata records.
  6. We will provide a browsable OPDS feed for use in tablet and smartphone ebook readers.
  7. We expect that the toolchain we develop will be reusable for creation and maintenance of a new generation of freely licensed ebooks.
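
To give a flavor of items 2 and 6, here is a minimal sketch of turning a plain-text metadata file into an OPDS (Atom) catalog entry. It is not the GITenberg code: the "key: value" file layout, the field names, the `parse_metadata` and `opds_entry` helpers, and the example.org URL are all assumptions made up for illustration. Only the Atom namespace and the OPDS acquisition link relation come from the published specifications.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def parse_metadata(text):
    """Parse hypothetical 'key: value' lines into a dict (keys lowercased)."""
    record = {}
    for line in text.splitlines():
        if ":" in line:
            # Split on the first colon only, so URLs in values survive.
            key, value = line.split(":", 1)
            record[key.strip().lower()] = value.strip()
    return record


def opds_entry(record):
    """Build an Atom <entry> with an OPDS acquisition link for one ebook."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = record["title"]
    ET.SubElement(entry, f"{{{ATOM}}}id").text = record["id"]
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = record["updated"]
    author = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author, f"{{{ATOM}}}name").text = record["author"]
    ET.SubElement(entry, f"{{{ATOM}}}link", {
        "rel": "http://opds-spec.org/acquisition",
        "type": "application/epub+zip",
        "href": record["epub"],
    })
    return entry


# Hypothetical metadata file as it might sit in a book's repository.
metadata = """\
Title: Moby Dick
Author: Herman Melville
ID: urn:example:moby-dick
Updated: 2014-10-29T00:00:00Z
EPUB: https://example.org/books/moby-dick.epub
"""

ET.register_namespace("", ATOM)
print(ET.tostring(opds_entry(parse_metadata(metadata)), encoding="unicode"))
```

Because the metadata lives in a plain-text file under version control, a librarian could correct a title or author with an ordinary pull request, and a change-triggered build could regenerate the feed entry.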

The rest of the proposal is on the Knight News Challenge website. If you like the idea of GITenberg, you can "applaud" it there. The "applause" is not used in the judging of the proposals, but it makes us feel good. There are lots of other interesting and inspiring proposals to check out and applaud, so go take a look!

Building the newest DPLA student exhibition, “From Colonialism to Tourism: Maps in American Culture” / DPLA

Oregon Territory, 1835. Courtesy of David Rumsey.

Two groups of MLIS students from the University of Washington’s Information School took part in a DPLA pilot called the Digital Curation Program during the 2013-2014 academic year. The DPLA’s Amy Rudersdorf worked with iSchool faculty member Helene Williams as we created exhibits for the DPLA for the culminating project, or Capstone, in our degree program. The result is the newest addition to DPLA’s exhibitions, called “From Colonialism to Tourism: Maps in American Culture.”

My group included Kili Bergau, Jessica Blanchard, and Emily Felt; we began by choosing a common interest from the list of available topics, and became “Team Cartography.” This project taught us about online exhibit creation and curation of digital objects, copyright and licensing, and took place over two quarters. The first quarter was devoted to creating a project plan and learning about the subject matter. We asked questions including: What is Cartography? What is the history of American maps? How are they represented within the DPLA collections?

Girl & road maps, Southern California, 1932. Courtesy of the University of Southern California Libraries.

As we explored the topic, the project became less about librarianship and more about our life as historians. Cartography, or the creation of maps, slowly transformed into the cultural “maps in history” as we worked through the DPLA’s immense body of aggregated images. While segmenting history and reading articles to learn about the pioneers, the Oregon Trail, the Civil War, and the 20th Century, we also learned about the innards of the DPLA’s curation process. We learned how to use Omeka, the platform for creating the exhibitions, and completed forms for acquiring usage rights for the images we would use in our exhibit.

One of the greatest benefits of working with the team was the opportunity to investigate niche areas among the broad topics, as well as leverage each other’s interests to create one big fascinating project. With limited time, we soon had to focus on selecting images and writing the exhibit narrative. We wrote, and revised, and wrote again. We waded through hundreds of images to determine which were the most appropriate, and then gathered appropriate metadata to meet the project requirements.

Our deadline for the exhibit submission was the end of the quarter, and our group was ecstatic to hear the night of the Capstone showcase at the UW iSchool event that the DPLA had chosen our exhibit for publication. Overjoyed, we celebrated remotely, together. Two of us had been in Seattle, one in Maine, and I had been off in a Dengue Fever haze in rural Cambodia (I’m better now).

The Negro Travelers' Green Book [Cover], 1956. Courtesy of the University of South Carolina, South Caroliniana Library via the South Carolina Digital Library.

Shortly after graduation in early June, Helene asked if I was interested in contributing further to this project: over the summer, I worked with DPLA staff to refine the exhibit and prepare it for public release. Through rigorous editing, some spinning of various themes in new directions, and a wild series of conversations over Google Hangouts about maps, maps, barbecue, maps, libraries, maps, television, movies, and more maps, the three of us had taken the exhibition to its final state.

Most experiences in higher education, be they on the undergrad or graduate levels (sans PhD), fail to capture a sense of endurance and longevity. The exhibition was powerful and successful throughout the process from many different angles. For me, watching its transformation from concept to public release has been marvelous, and has prepared me for what I hope are ambitious library projects in my future.

View this exhibition

A huge thanks to Amy Rudersdorf for coordinating the program, Franky Abbott for her work editing and refining the exhibition, Kenny Whitebloom for Omeka wrangling, and the many Hubs and their partners for sharing their resources. 

All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

Open Access in Ireland: A case-study / Open Knowledge Foundation

Following last week’s Open Access Week blog series, we continue our celebration of community efforts in this field. Today we give the microphone to Dr. Salua Nassabay from Open Knowledge Ireland in a great account from Ireland, originally posted on the Open Knowledge Ireland blog.

In Ireland, awareness of OA has increased within the research community nationally, particularly since institutional repositories have been built in each Irish university. Advocacy programmes and funder mandates (IRCSET, SFI, HEA) have had a positive effect; but there is still some way to go before the majority of Irish researchers will automatically deposit their papers in their local OA repository.

Brief Story

In summer 2004, the Irish Research eLibrary (IReL) was launched, giving online access to a wide range of key research journals. The National Principles on Open Access Policy Statement were launched on Oct 23rd 2012 at the Digital Repository of Ireland Conference by Sean Sherlock, Minister of State, Department of Enterprise, Jobs & Innovation and Department of Education & Skills with responsibility for Research & Innovation. The policy consists of a ‘Green way’ mandate and encouragement to publish in ‘Gold’ OA journals. It aligns with the European policy for Horizon 2020. OA at the national level is managed by the National Steering Committee on OA Policy; see Table 3.

A Committee of Irish research organisations is working in partnership to coordinate activities and to combine expertise at a national level to promote unrestricted, online access to outputs which result from research that is wholly or partially funded by the State:

National Principles on Open Access Policy Statement

Definition of OA

Reaffirm: freedom of researchers; increase visibility and access; support international interoperability, link to teaching and learning, and open innovation.

Defining Research Outputs:

“include peer-reviewed publications, research data and other research artefacts which feed the research process”.

General Principle (1): all researchers to have deposit rights for an OA repository.

Deposit: post-print/publisher version and metadata; peer-reviewed journal articles and conference publications. Others where possible; at time of acceptance for publication; in compliance with national metadata standards.

General Principle (2): Release: immediate for metadata; respect publisher copyright, licensing and embargo (not normally exceeding 6 months/12 months).

Green route policy – not exclusive

Suitable repositories

Research data linked to publications.

High-level principles:

Infrastructure and sustainability: depositing once, harvesting, interoperability and long-term preservation.

Advocacy and coordination: mechanisms for and monitoring of implementation, awareness raising and engagement for ALL.

Exploiting OA and implementation: preparing metadata and national value-added metrics.

Table 1. National Principles on Open Access Policy Statement.

There are seven universities in Ireland. These universities received government funding to build institutional repositories and to develop a federated harvesting and discovery service via a national portal. It is intended that this collaboration will be expanded to embrace all Irish research institutions in the future. OA repositories are currently available in all Irish universities and in a number of other higher education institutions and government agencies:

Higher Education

Government Agency

Institutional repositories

Subject repository

Dublin Business School; Dublin City University; Dublin Institute of Technology; Dundalk Institute of Technology; Mary Immaculate College; National University of Ireland, Galway; National University of Ireland, Maynooth; Royal College of Surgeons in Ireland; Trinity College Dublin; University College Cork; University College Dublin; University of Limerick; Waterford Institute of Technology

Irish Virtual Research Library & Archive, UCD

Health Service Executive Lenus; All-Ireland electronic Health Library (AieHL); Marine Institute; Teagasc

Table 2. Currently available repositories in Ireland

OA Ireland’s statistics show more than 58,859 OA publications in 13 repositories, distributed as shown in Figures 1 and 2.

Figure 1. Publications in repositories. From (date: 16/9/2014).

Some samples of Irish OA journals are:

- Crossings: Electronic Journal of Art and Technology

- Economic and Social Review

- Journal of the Society for Musicology in Ireland

- Journal of the Statistical and Social Inquiry Society of Ireland

- Minerva: an Internet Journal of Philosophy

- The Surgeon: Journal of the Royal Colleges of Surgeons of Edinburgh and Ireland

- Irish Journal of Psychological Medicine

Figure 2. Publications by document type. From (date: 16/9/2014).

Institutional OA policies:



OA mandatory

OA Infrastructure

Health Research Board (HRB) - Funders





Science Foundation Ireland (SFI) – Funders





Higher Education Authority (HEA) – Funders





Department of Agriculture, Food and Marine (DAFM) – Funders



Yes effective 2013


Environmental Protection Agency (EPA) – Funders






Marine Institute (MI) – Funders






Irish Research Council (IRC) – Funders





Teagasc – Funders






Institute of Public Health in Ireland (IPH) – Funders





Irish Universities Association (IUA) – Researchers

Representative body for Ireland’s seven universities:

Yes effective 2010


Health Service Executive (HSE) – Researchers




Yes effective 2013


Institutes of Technology Ireland (IOTI) – Researchers




Dublin Institute of Technology (DIT) – Researchers






Royal College of Surgeons in Ireland (RCSI) – Researchers






Consortium of National and University Libraries (CONUL) – Library and Repository





IUA Librarians’ Group (IUALG) - Library and Repository





Digital Repository of Ireland (DRI) - Library and Repository

Website and Repository:

DRI Position Statement on Open Access for Data:


effective 2014


EdepositIreland - Library and Repository






*IRC: Some exceptions like books. See policy.

*Teagasc: Material in the repository is licensed under the Creative Commons Attribution-NonCommercial Share-Alike License

*DIT: Material that is to be commercialised, or which can be regarded as confidential, or the publication of which would infringe a legal commitment of the Institute and/or the author, is exempt from inclusion in the repository.

*RCSI: Material in the repository is licensed under the Creative Commons Attribution-NonCommercial Share-Alike License

Table 3. Institutional OA Policies in Ireland

Funder OA policies:

Major research funders in Ireland

Department of Agriculture, Fisheries and Food:

IRCHSS (Irish Research Council for Humanities and Social Sciences): No Open Access policies as yet.

Enterprise Ireland: No Open Access policies as yet.

IRCSET (Irish Research Council for Science, Engineering and Technology): OA Mandate from May 1st 2008:

HEA (Higher Education Authority): OA Mandate from June 30th 2009:

Marine Institute: No Open Access policies as yet

HRB (Health Research Board): OA Recommendations, Policy:

SFI (Science Foundation Ireland): OA Mandate from February 1st 2009:

Table 4. Open Access funders in Ireland.

Figure 3. Public sources of funds for Open Access. From (date: 16/9/2014).

Infrastructural support for OA:

Open Access organisations and groups

Open Access projects and initiatives. The Open Access to Irish Research Project. Associated National Initiatives

RIAN Steering Group. IUA (Irish Universities Association) Librarians’ Group (Coordinating body). RIAN is the outcome of a project to build online open access to institutional repositories in all seven Irish universities and to harvest their content to the national portal.

NDLR (National Digital Learning Repository):

National Steering Group on Open Access Policy. See Table 3

RISE Group (Research Information Systems Exchange)

Irish Open Access Repositories Support Project Working Group. ReSupIE:

Repository Network Ireland is a newly formed group of Repository managers, librarians and information:

Digital Repository of Ireland (DRI) is a trusted national repository for Ireland’s humanities and social sciences data (@dri_ireland)

Table 5. Open Access infrastructural support.

Challenges and ongoing developments

Ireland already has considerable expertise in developing Open Access to publicly funded research, aligned with international policies and initiatives, and is now seeking to strengthen its approach to support international developments on Open Access led by the European Commission, Science Europe and other international agencies.

The greatest challenge is the increasing pressure faced by publishers in a fast-changing environment.


The launch of Ireland’s national Open Access policy has put Ireland ahead of many European partners. Irish research organisations are particularly successful in the following areas of research: Information and Communication Technologies, Health and Food, Agriculture, and Biotechnology.


- Repository Network Ireland

- Open Access Scholarly Publishers

- OpenDOAR – Directory of Repositories

- OpenAIRE – Open Access Infrastructure for research in Europe

- Repositories Support Ireland

- UCD Library News

- Trinity’s Open Access News

- RIAN

Contact person: Dr. Salua Nassabay; twitter: @OKFirl


2014 DPOE Training Needs Assessment Survey / Library of Congress: The Signal

The following is a guest post by Barrie Howard, IT Project Manager at the Library of Congress.

Last month the Digital Preservation Outreach and Education (DPOE) Program wrapped up the “2014 DPOE Training Needs Assessment Survey” in an effort to get a sense of the state of digital preservation practice and understand more about what capacity exists for organizations and professionals to effectively preserve digital content. This survey is a follow up to a similar survey that was conducted in 2010, and mentioned in a previous blog post.

The 17-question survey was open for seven weeks to relevant organizations and received 436 responses, which is excellent considering summer vacation schedules and survey fatigue. The questions addressed issues like primary function (library, archive, museum, etc.), staff size and responsibilities, collection items, preferred training content and delivery options and financial support for professional development and training.

Response rates from libraries, archives, museums, and historical societies were similar in 2010 and 2014, with a notable increase this year in participation from state governments. There was good geographic coverage, including responses from organizations in 48 states, DC and Puerto Rico (see below), and none of the survey questions were skipped by any of the respondents.

Figure 1. Geographic coverage of survey respondents.

The most significant takeaways are: 1) an overwhelming expression of concern that respondents ensure their digital content is accessible for 10 or more years (84%), and 2) evidence of a strong commitment to support employee training opportunities (83%). Other important discoveries reveal changes in staff size and configuration over the last four years. There was a marked 6% decrease in staff size at smaller organizations ranging from 1-50 employees, and a slight 2% drop in staff size at large organizations with over 500 employees. In comparison, medium-size organizations reported a 4% uptick in the staff range of 51-200 and 3% for the 201-500 tier. There was a substantial 13% increase across all organizations in paid full-time or part-time professional staff with practitioner experience, and a 5% drop in organizations reporting no staff at all. These findings suggest positive trends across the digital preservation community, which bode well for the long-term preservation of our collective cultural heritage.

One survey question tackled the issue of what type of digital content is held by each institution. While reformatted material digitized from collections already held has the highest frequency across all respondents (83%), born-digital content created by and for your organization trails close behind (76.4%). Forty-five percent of all respondents reported that their institution had deposited digital materials managed for other individuals or institutions. These results reflect prevailing trends, and it will be interesting to see how things change between now and the next survey.

Figure 2. Types of digital content held by each responding organization (percentages are a portion of the 436 respondents, and each respondent was allowed to choose multiple types).

The main purpose of the survey was to collect data about the training needs of these organizations, and half a dozen questions were devoted to this task. Interestingly, while online training is trending across many sectors to meet the constraints of reduced travel budgets, the 2014 survey results find that respondents still value intimate, in-person workshops. In-person training often comes at a higher price than online, and the survey attempted to find out how much money an employee would receive annually for training. Not surprisingly, the largest share (25%) of respondents didn’t know, and equally important, another 24% reported a modest budget range of $0-$250.

When given the opportunity to be away from their place of employment, respondents preferred half or full-day training sessions over 2-3 days or week-long intensives. They showed a willingness to travel off-site up to a 100-mile radius of their places of work. There was a bias towards training on applicable skills, rather than introductory material on basic concepts, and respondents identified training investments that result in an increased capacity to work with digital objects and metadata management as the most beneficial outcome for their organization.

DPOE currently offers an in-person, train-the-trainer workshop, and is exploring options for extending the workshop curriculum to include online delivery options for the training modules. These advancements will address some of the issues raised in the survey, and may include regularly scheduled webinars, on-demand videos and pre- and post-workshop videos. The 2014 survey results will be released in a forthcoming report, which will be made available in November, so keep a watchful eye on the DPOE website and The Signal for the report and subsequent DPOE training materials as they become available.

Psoriatic arthritis awareness / Coral Sheldon-Hess

October 29 is World Psoriasis Day. I’ve already missed World Arthritis Day (Facebook link), which was October 12th. (I was too busy to write, then, anyway.) I’m going to bullet point out the conclusions I want you to draw from this post, before I get to the post itself. Consider this a TL;DR:

  • Not all disabilities are visible. Many people are fighting battles that you can’t perceive.
  • Some people literally have fewer hours in their day than you do, either because they need more hours of sleep per night, or because their body requires extra daily maintenance to work; some people have the same thing, figuratively, because their energy is sapped by pain, by their immune system, or by dealing with constant microaggressions. Read this fantastic post, so that you have a good mental model for understanding what this is like.
  • Don’t judge people for their clothing choices or their shoe choices or whether they take an elevator or don’t want to go on a long walk or… just don’t judge people.
  • Please go ahead and assume that someone living with a chronic illness knows how to treat it, or is seeing a professional who knows how to treat it. Advice is welcome only if it’s requested.

Facts and figures about psoriasis, arthritis, and psoriatic arthritis

Psoriatic arthritis is an autoimmune condition, which means it is one in a class of diseases in which one’s own immune system attacks their body tissues; in the case of psoriasis, the immune system causes visible problems with one’s skin. (Different kinds of psoriasis cause different effects. I won’t go into detail, except to remind you that none of them are contagious. There’s a lot of stigma around skin conditions, and people with visible psoriasis are far too often mistreated for something that isn’t their fault.) Psoriasis is the most common autoimmune condition in the US, affecting 2.2 percent of the population. Out of that 2.2 percent, the National Psoriasis Foundation estimates that between 10 and 30 percent of people develop psoriatic arthritis(1). This means that, in addition to skin issues (and other symptoms, which I’ll get to in a moment), the immune system also attacks joints and causes inflammation (e.g. pain, swelling).

A lot of what I’m going to say about psoriatic arthritis is true of other kinds of arthritis, so I’m going to share this statistic, too: according to the CDC, arthritis affects at least 22.7% of adults in the US and limits the activities of at least 9.8% of adults(1). It’s not true that all of these people are of advanced age, either: many kinds of arthritis can strike as early as one’s twenties. I developed psoriatic arthritis a little on the early side, but within the normal range: in my early 30s.

Osteoarthritis and rheumatoid arthritis are much more common than psoriatic arthritis, and probably as a result, treatments for PsA tend to come out later than treatments for other types of arthritis.

Because psoriatic arthritis is an autoimmune disease, as you’d imagine, the medications people take for it are designed to decrease (or, ideally, just redirect) immune response, preventing the immune system from attacking healthy cells. I’m not going to lie: all of these drugs are pretty scary. Methotrexate pills are probably the most common treatment, or at least the one doctors seem to try first; methotrexate is also used in chemotherapy, albeit in much larger doses. Even for low doses, it is sufficiently toxic that blood tests are required quarterly, to ensure no liver damage has occurred. Women taking methotrexate need to stop six months before trying to conceive a child, and must stay off of it throughout birth and breastfeeding; for people with sufficiently bad arthritis (like me), obviously, that is not workable. Drinking is contraindicated, although my doctor said I could have up to four drinks per week, provided I wait at least 36 hours after I take my dose. Its side effects vary by individual; I experience mild nausea—usually, so mild that a single ginger ale takes care of it—and low energy/decreased concentration on the day after I take it. And I have reduced immunity to disease, though I am not formally considered immunodeficient.

There are a number of other treatments, including injected methotrexate, alone or combined with another drug, and TNF inhibitors. They’re all scary.

Some people ask why people with psoriatic arthritis would take such awful drugs—which, by the way, isn’t a nice question; I guarantee they’ve done their research and thought hard about it. But this post is here to educate, so: if you wonder why one might consider it worthwhile, 1) probably you don’t experience constant pain, which is awesome for you! 2) With a warning that it isn’t pretty, I invite you to click on this Google image search link, to see what psoriatic arthritis looks like when it goes untreated. 3) Speaking only for myself, I was promised that, with the right treatment, I could go back to full functioning of my affected joints and be pain-free. While my treatment has helped, it is not sufficient to achieve that goal.

What it’s like to live with this disease: one person’s story

In the interest of personalizing this disease, so you can understand how one person experiences it and (I hope) act with empathy toward other people who might be experiencing it, or something similar, I’m going to share as much about this as I am comfortable sharing. … More, actually, because there’s a risk to saying any of this. Our biases against people with disabilities run strong and deep, and I’ve met good people who would judge me not worth investing in, on the basis of the next few paragraphs; at one time in my life, before all this, I might have. (I’d like to think not, but I don’t know.) Saying all of this may have a direct impact on my career, both short and long term. That said, I have a proven track record as a driven and successful individual in two different fields (going on three), and I am proud of what my CV/resume says about me. So… I’m prepared to take my chances.

My case of psoriasis is very small and went unnoticed, or at least untested, until after the arthritis started. It’s on my scalp. When I take my medicine, manage my stress, and get enough rest, it fades almost entirely away. It’s not painful and is almost never itchy. Its only effect on my life is that I occasionally appear to have dandruff, and I’m afraid to get my hair cut; I’ve had my spouse cut my hair at home, since it started. I don’t want to have the conversation with a hairdresser about it.

The arthritis, in contrast, affects my life daily. It is most apparent in my right hand and my left foot, and I’m not sure which is worse.

My thumb and wrist both have decreased range of motion and pain, and my wrist mostly can’t bear weight. (I can’t ever pick up a gallon of milk with my right hand alone; on a bad day, I can’t even pick up a full 4-cup coffee pot or a heavy plate. Yes, that is my dominant hand.) My middle finger and pinky take turns acting up, too. But it could be worse: I can type just fine, and I can mouse right-handed, though I am more comfortable using my left. I can’t use a trackpad with my right hand for long, so I avoid working on a laptop without a separate mouse and keyboard. Hand writing is a little unpleasant, but doable. I seem to be able to play guitar OK, and I can crochet and do beading for a little while at a time. I suspect cross-stitch would be too painful, but I haven’t tried it. Reading paper books hurts my hand (I love my Kindle), and I’m not supposed to use my iPhone with my right thumb.

As for the foot, the ball of the foot hurts. Rounded-bottom shoes and custom orthotics help, but standing still for very long, or walking very far, still hurts. (This is a sad thing. Walking used to be how I dealt with stress, and it’s the number one suggestion of pretty much every arthritis-related organization.) Also, my fourth toe (what would be the ring toe, if toes were fingers) has changed shape; if you’ve heard of “hammer toe,” that’s kind of what’s going on. Until I get surgery to fix it—which is pointless until I’m on medication that will definitively prevent it from getting worse—any shoe with a restricted toe box is absolute torture to wear. Only last week did I find another pair of shoes that is almost as low-pain as Skechers Shape-Ups (which, yes, are ugly; I hate the look people give me for wearing them) or the Teva sandals I found in Hawaii. Zappos probably hates me.

On really bad days, the arthritis affects my upper thighs and hips. Even on good days, my flexibility is decreased. But if you’re hanging out with me, and you see that I am having trouble getting up from a chair without using the arm rests or a table, it’s a bad day. (Then again, if I don’t try to disguise it, it’s a VERY bad day.) Before methotrexate, my hips hurt every day; I had to literally pull myself up the banister, to walk up stairs, and I was almost incapable of standing up from a chair without support (as in, I could do it, but it was excruciating — and worse the longer I sat before trying to stand). Now, I can walk up stairs without support on a good day and with only minimal support most days, but I still avoid them: I feel uncomfortable twinges in my hips, when I push it. (I avoid walking down stairs because of the foot, too. People glare at me for using elevators, probably in part because I’m also not a thin woman. I hate that.)

The hips and the wrist combined mean that I also can’t push myself to standing position from the floor without a chair or table for support. You’ll never see me use a bean bag or sit on the ground in public, because, dignity.

Finally, it’s not usually a big deal, but some of my joints are extremely painful if squeezed, though they don’t bother me any other time. (Poor rheumatologists. I grow to dislike them over time, because that’s an important piece of diagnostic information.)

None of that sounds all that bad, from an employer’s perspective—1-2 doctor’s appointments per quarter (more if I see a hand therapist and podiatrist, which so far I have not done in VA), and I’m less fun at happy hours. But here’s the kicker: like many others with autoimmune diseases, I need more sleep than most people (I’ve found that 9 hours is a hard minimum; 10 is better), and even when I get enough sleep, I have some days of fairly severe exhaustion. I do better than the person who wrote the spoons article, but not as well as someone without an autoimmune disease. If I go too long with insufficient sleep (more than a day or two), my arthritis gets noticeably more painful, my hips and knees begin to hurt, and my mind gets a little cloudy, so that I can’t focus and have trouble remembering things.

I’m lucky, because it doesn’t happen to me often, at least not when I’m taking care of myself outside of work. But that’s a whole other can of worms, right? I’m supposed to treat my hand and foot with heat or ultrasound; I’m supposed to get exercise that ideally doesn’t hurt my foot; I’m supposed to eat healthy food with a low inflammatory index (which, for me, appears to preclude gluten and at least some nightshades); I’m supposed to massage sore spots with a therapy ball; and I’m supposed to do things that I find calming, each day, since stress also increases inflammatory response and makes chronic conditions like mine worse. Somehow I’m supposed to reserve enough spoons—and enough time—to accomplish all of that, while working and getting enough sleep and meeting whatever other commitments I’ve made for myself. You can guess how well that goes.

Just a last couple of thoughts

I’ve tried to put myself in the shoes of people who might read this, so I can answer your questions, fill in any gaps in my behavior you might have noticed; if I missed anything, today is your one-time free pass to ask, because I’ve set aside some of those metaphorical spoons for explaining. (You can ask on another day, but I reserve the right to ignore the question, if I’m not up to it right then.) (Also, this is the third time I’m linking the spoons article, for anyone who didn’t click on it the first two times. Please read it. It’s very helpful.)

I’m afraid that those closest to (or employing) me might be angry that I kept this from you, or only told you part of the story; I’m sympathetic to that. My one defense is that it’s kind of a lot to get across (as I cross the 2200 word mark), and people’s responses, in the past, have varied from thinly veiled disbelief to making me feel super awkward by constantly reminding me of how different I am. Some people have pushed alternative therapies on me, um, fairly persistently. (I don’t mean the copper bracelets, Mom; science says they’re useless, but they’re pretty. :)) Telling people about how much my daily life doesn’t look like theirs is super awkward. And I guess I have one other defense, too: there’s a part of me that says that I shouldn’t have to disclose my disability in order to work somewhere that strives to be inclusive, as my employer does, or to spend time with friends. I work in tech; what field is there that could possibly have more space for someone with physical limitations? (I know that this is theory, not reality, and I hate that.) And most of my friends are pretty nerdy—with hobbies that don’t require great dexterity or strength; quite a few are pretty introverted—so it shouldn’t matter if I have to say no to going out (every single Friday, lately). But in little, subtle ways, it does end up mattering, both at work and in life.

So general awareness-raising wasn’t my only goal with this post; I am also deciding to stop trying to hide my disability. Not disclosing has made my life a little harder, because we don’t live in a society that makes room for disabled bodies. And I guess I’m tired of trying to hide something that is a big part of my life—and tired of fighting the tiny battles I have to fight, to keep it hidden.

So now I have this giant blog post that I can point people to, and maybe that’ll make the disclosure process easier for me. Or maybe it won’t, but I am hoping it at least makes some people stop judging one another for stupid stuff like shoes and elevators.

Analyzing EZProxy Logs / ACRL TechConnect

Analyzing EZProxy logs may not be the most glamorous task in the world, but it can be illuminating. Depending on your EZProxy configuration, log analysis can allow you to see the top databases your users are visiting, the busiest days of the week, the number of connections to your resources occurring on or off-campus, what kinds of users (e.g., staff or faculty) are accessing proxied resources, and more.

What’s an EZProxy Log?

EZProxy logs are not significantly different from regular server logs.  Server logs are generally just plain text files that record activity that happens on the server.  Logs that are frequently analyzed to provide insight into how the server is doing include error logs (which can be used to help diagnose problems the server is having) and access logs (which can be used to identify usage activity).

EZProxy logs are a kind of modified access log, which record activities (page loads, http requests, etc.) your users undertake while connected in an EZProxy session. This article will briefly outline five potential methods for analyzing EZProxy logs:  AWStats, Piwik, ezPAARSE, a custom Python script for parsing starting-point URL (SPU) logs, and a paid option called Splunk.

The ability of  any log analyzer will of course depend upon how your EZProxy log directives are configured.  You will need to know your LogFormat and/or LogSPU directives in order to configure most log file analyzing solutions.  In EZProxy, you can see how your logs are formatted in config.txt/ezproxy.cfg by looking for the LogFormat directive, 1  e.g.,

LogFormat %h %l %u %t "%r" %s %b "%{user-agent}i"

and / or, to log Starting Point URLs (SPUs):

LogSPU -strftime log/spu/spu%Y%m.log %h %l %u %t "%r" %s %b "%{ezproxy-groups}i"
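Both directives above produce lines shaped much like a common web server access log, so a single regular expression can split them into named fields. The following is a minimal sketch rather than a definitive parser: it assumes %t is written as a bracketed timestamp such as [10/Oct/2014:13:55:36 -0400], the sample line is invented, and the field name "extra" is a hypothetical label for the final quoted value (the user agent under LogFormat, the EZProxy groups under LogSPU).

```python
import re
from datetime import datetime

# Matches: %h %l %u %t "%r" %s %b "<final quoted field>"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<extra>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumption: timestamps follow the common log format with a UTC offset.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    return fields

# Invented sample line for illustration only.
sample = ('203.0.113.7 - jdoe [10/Oct/2014:13:55:36 -0400] '
          '"GET http://www.example.com:80/search HTTP/1.1" 200 5120 "Mozilla/5.0"')
record = parse_line(sample)
print(record["host"], record["status"], record["request"])
```

If your own directive adds or reorders fields, the pattern needs to change to match; lines that do not fit simply come back as None so they can be counted or inspected separately.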

Logging Starting Point URLs can be useful because those tend to be users either clicking into a database or the full-text of an article, but no activity after that initial contact is logged.  This type of logging does not log extraneous resource loading, such as loading scripts and images – which often clutter up your traditional LogFormat logs and lead to misleadingly high hits.  LogSPU directives can be defined in addition to traditional LogFormat to provide two different possible views of your users’ data.  SPULogs can be easier to analyze and give more interesting data, because they can give a clearer picture of which links and databases are most popular  among your EZProxy users.  If you haven’t already set it up, SPULogs can be a very useful way to observe general usage trends by database.
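As a rough illustration of the "most popular databases" view described above, the sketch below tallies hostnames from SPU log lines. It rests on assumptions: that the quoted %r field holds an absolute URL for the target resource, and that hostname is a good enough stand-in for "database". The sample lines, hosts, and group names are invented, and top_databases is a hypothetical helper name.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# First double-quoted field on the line, i.e. the %r request string.
REQUEST = re.compile(r'"([^"]*)"')

def top_databases(lines, n=10):
    """Return the n most requested hostnames as (hostname, hits) pairs."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match is None:
            continue  # not a log line in the expected shape
        parts = match.group(1).split()
        if len(parts) < 2:
            continue  # request string has no URL
        host = urlsplit(parts[1]).hostname
        if host:
            counts[host] += 1
    return counts.most_common(n)

# Invented SPU log lines for illustration only.
sample_lines = [
    '203.0.113.7 - - [10/Oct/2014:09:15:02 -0400] '
    '"GET http://db-one.example.com:80/search HTTP/1.1" 302 0 "Default+Student"',
    '203.0.113.8 - - [10/Oct/2014:09:16:40 -0400] '
    '"GET http://db-two.example.com:80/article/42 HTTP/1.1" 302 0 "Default+Faculty"',
    '203.0.113.9 - - [10/Oct/2014:09:20:11 -0400] '
    '"GET http://db-one.example.com:80/browse HTTP/1.1" 302 0 "Default+Student"',
]
for host, hits in top_databases(sample_lines):
    print(host, hits)
```

Counting by hostname keeps the sketch short; a real report would probably map hostnames to vendor or database names using your own resource list.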

You can find some very brief anonymized EZProxy log sample files on Gist:

On a typical EZProxy installation, historical monthly logs can be found inside the ezproxy/log directory.  By default they will rotate out every 12 months, so you may only find the past year of data stored on your server.
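Since the monthly files sit together in one directory, a short script can walk all of them at once, for example to find the busiest days of the week. This is a sketch under assumptions: the default file pattern "spu*.log" is inferred from the LogSPU directive shown earlier and may not match your installation, timestamps are assumed to look like [10/Oct/2014:13:55:36 -0400], and hits_by_weekday is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime
from pathlib import Path

# Date portion of a bracketed common-log timestamp, e.g. [10/Oct/2014:...
TIMESTAMP = re.compile(r'\[(\d{2}/\w{3}/\d{4}):')
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def hits_by_weekday(log_dir, pattern="spu*.log"):
    """Count logged lines per weekday across every matching file in log_dir."""
    counts = Counter()
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                match = TIMESTAMP.search(line)
                if match is None:
                    continue  # skip lines without a recognizable timestamp
                day = datetime.strptime(match.group(1), "%d/%b/%Y")
                counts[DAYS[day.weekday()]] += 1
    return counts

# Example call (the path is an assumption; adjust to your installation):
# print(hits_by_weekday("ezproxy/log/spu").most_common())
```

Because only about a year of logs may be retained, copying the monthly files elsewhere before they rotate out is worth considering if you want longer trends.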


Get It:

Best Used With:  Full Logs or SPU Logs

Code / Framework:  Perl

An example AWStats monthly history report. Can you tell when our summer break begins?

AWStats Pros:

  • Easy installation, including on localhost
  • You can define your unique LogFormat easily in AWStats’ .conf file.
  • Friendly, albeit a little bit dated looking, charts show overall usage trends.
  • Extensive (but sometimes tricky) customization options can be used to more accurately represent sometimes unusual EZProxy log data.

Hourly traffic distribution in AWStats. While our traffic peaks during normal working hours, we have steady usage going on until about Midnight, after which point it crashes pretty hard. We could use this data to determine how much virtual reference staffing we should have available during these hours.
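An hourly distribution like this one can also be computed directly from log timestamps. The sketch below is not how AWStats does it; it simply counts entries per hour, assuming Apache-style timestamps, with `hits_by_hour` and the sample values invented for illustration.

```python
from collections import Counter
from datetime import datetime

def hits_by_hour(timestamps):
    """Return a 24-item list: number of log entries seen in each hour of the day."""
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        counts[moment.hour] += 1  # hour as logged, in the log's own time zone
    return [counts[hour] for hour in range(24)]

# Invented timestamps for illustration
stamps = [
    "27/Oct/2014:14:05:12 -0400",
    "27/Oct/2014:14:45:00 -0400",
    "27/Oct/2014:23:10:30 -0400",
]
hourly = hits_by_hour(stamps)
for hour, hits in enumerate(hourly):
    if hits:
        print(f"{hour:02d}:00  {hits}")
```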


AWStats Cons:

  • If you make a change to .conf files after you’ve ingested logs, the changes do not take effect on already ingested data.  You’ll have to re-ingest your logs.
  • Charts and graphs are not particularly (at least easily) customizable, and are not very modern-looking.
  • Charts are static and not interactive; you cannot easily cross-section the data to make custom charts.


Get It:

Best Used With:  SPULogs, or embedded on web pages as a web traffic analytics tool

Code / Framework:  PHP (the log import script is Python)

The Piwik visitor dashboard showing visits over time. Each point on the graph is interactive. The report shown actually is only displaying stats for a single day. The graphs are friendly and modern-looking, but can be slow to load.

Piwik Pros:

  • The charts and graphs generated by Piwik are much more attractive and interactive than those produced by AWStats, with report customizations very similar to what’s available in Google Analytics.
  • If you are comfortable with Python, you can do additional customizations to get more details out of your logs.

Piwik file ingestion in PowerShell. Ingesting a single monthly log took several hours. On the plus side, with this running on one of Lauren’s monitors, anytime someone walked into her office they thought she was doing something *really* technical.

Piwik Cons:

  • By default, parsing of large log files seems to be pretty slow, but performance may depend on your environment, the size of your log files and how often you rotate your logs.
  • In order to fully take advantage of the library-specific information your logs might contain and your LogFormat setup, you might have to do some pretty significant modification of Piwik’s script.

When looking at popular pages in Piwik, you’re somewhat at the mercy of whether the subdirectories of database URLs have meaningful labels; luckily EBSCO’s do, as shown here. We have a lot of users looking at EBSCO Ebooks, apparently.
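The same kind of "popular pages by subdirectory" view can be approximated outside Piwik by grouping URLs on their first path segment. This is a sketch of the idea only, not Piwik's method; `top_sections` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

def top_sections(urls, limit=5):
    """Count URLs by their first path segment, e.g. /ebooks/123 counts as 'ebooks'."""
    counts = Counter()
    for url in urls:
        segments = [part for part in urlsplit(url).path.split("/") if part]
        label = segments[0] if segments else "(root)"
        counts[label] += 1
    return counts.most_common(limit)

# Invented URLs for illustration
urls = [
    "http://db.example.com/ebooks/123",
    "http://db.example.com/ebooks/456",
    "http://db.example.com/articles/9",
    "http://db.example.com/",
]
top = top_sections(urls)
print(top)
```

As the caption notes, this only helps when a vendor's URL paths carry meaningful labels.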


Get It

Best Used With:  Full Logs or SPULogs

Code / Framework:  Node.js

ezPaarse’s friendly drag and drop interface. You can also copy/paste lines for your logs to try out the functionality by creating an account at

EZPaarse Pros:

  • Has a lot of potential to be used to analyze existing log data to better understand e-resource usage.
  • Drag-and-drop interface, as well as copy/paste log analysis
  • No command-line needed
  • Its goal is to be able to associate meaningful metadata (domains, ISSNs) to provide better electronic resource usage statistics.

ezPaarse Excel output generated from a sample log file, showing type of resource (article, book, etc.) ISSN, publisher, domain, filesize, and more.

EZPaarse Cons:

  • This isn’t really a con per se, but it is under development.  In Lauren’s testing, we couldn’t get our logs to ingest correctly (perhaps due to a somewhat non-standard EZProxy LogFormat), but the sample files provided worked well.  As development continues, it can be expected to become more flexible, with different kinds of log formats supported.
  • It’s tricky to customize the log formatting correctly, and in Lauren’s testing, if bibliographic information cannot be found for your electronic resources, the data is returned a little strangely.
  • Output is in Excel Sheets rather than a dashboard-style format.

Write Your Own with Python

Get Started With:

Best used with: SPU logs

Code / Framework:  Python


Screenshot of a Python script, available at Robin Davis’ Github


Custom Script Pros:

  • You will have total control over what data you care about. DIY analyzers are usually written up because you’re looking to answer a specific question, such as “How many connections come from within the Library?”
  • You will become very familiar with the data! As librarians in an age of user tracking, we need to have a very good grasp of the kinds of data that our various services collect from our patrons, like IP addresses.
  • If your script is fairly simple, it should run quickly. Robin’s script took 5 minutes to analyze almost 6 years of SPU logs.
  • Your output will probably be a CSV, a flexible and useful data format, but could be any format your heart desires. You could even integrate Python libraries like Plotly to generate beautiful charts in addition to tabular data.
  • If you use Python for other things in your day-to-day, analyzing structured data is a fun challenge. And you can impress your colleagues with your Pythonic abilities!


Action shot: running the script from the command line.

Custom Script Cons:

  • If you have not used Python to input/output files or analyze tables before, this could be challenging.
  • The easiest way to run the script is within an IDE or from the command line; if this is the case, it will likely only be used by you.
  • You will need to spend time ascertaining what’s what in the logs.
  • If you choose to output data in a CSV file, you’ll need more elbow grease to turn the data into a beautiful collection of charts and graphs.

Output of the sample script is a labeled CSV that divides connections by locations and user type (student or faculty). (Source)
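For readers who want a starting point, here is a minimal sketch of that kind of summary. It is not Robin's script: the campus network range, the group labels, and the helper names (`location_of`, `summarize`, `to_csv`) are all assumptions made for this example.

```python
import csv
import io
import ipaddress
from collections import Counter

# Hypothetical on-campus range (a reserved documentation network)
CAMPUS_NETWORK = ipaddress.ip_network("192.0.2.0/24")

def location_of(ip_text):
    """Label an IP address as on campus or off campus."""
    inside = ipaddress.ip_address(ip_text) in CAMPUS_NETWORK
    return "on campus" if inside else "off campus"

def summarize(records):
    """records: iterable of (ip, group) pairs taken from parsed SPU log lines."""
    return Counter((location_of(ip), group) for ip, group in records)

def to_csv(summary):
    """Render the summary as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["location", "user_type", "connections"])
    for (location, group), total in sorted(summary.items()):
        writer.writerow([location, group, total])
    return buffer.getvalue()

# Invented records for illustration
records = [
    ("192.0.2.10", "Student"),
    ("198.51.100.7", "Student"),
    ("198.51.100.8", "Faculty"),
    ("192.0.2.11", "Student"),
]
summary = summarize(records)
print(to_csv(summary))
```

Writing to a `StringIO` buffer keeps the example self-contained; in practice you would open a file and pass it to `csv.writer` instead.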

Splunk (Paid Option)

Best Used with:  Full Logs and SPU Logs

Get It (as a free trial):

Code / Framework:  Various, including Python

A Splunk distribution showing traffic by days of the week. You can choose to visualize this data in several formats, such as a bar chart or scatter plot. Notice that this chart was generated by a syntactical query in the upper left corner: host=lmagnuson| top limit=20 date_wday
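For comparison, a rough Python counterpart of that day-of-week tally might look like the sketch below. It is not Splunk's implementation; it assumes Apache-style timestamps, and `top_weekdays` and the sample values are invented for illustration.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def top_weekdays(timestamps, limit=20):
    """Return (day, count, percent) tuples, most frequent day first."""
    days = Counter()
    for stamp in timestamps:
        moment = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        days[DAY_NAMES[moment.weekday()]] += 1
    total = sum(days.values())
    return [(day, count, round(100 * count / total, 1))
            for day, count in days.most_common(limit)]

# Invented timestamps for illustration
stamps = [
    "27/Oct/2014:09:00:00 -0400",
    "27/Oct/2014:10:00:00 -0400",
    "28/Oct/2014:09:00:00 -0400",
]
ranking = top_weekdays(stamps)
print(ranking)
```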

Splunk Pros:  

  • Easy to use interface, no scripting/command line required (although command line interfacing (CLI) is available)
  • Incredibly fast processing.  As soon as you import a file, splunk begins ingesting the file and indexing it for searching
  • It’s really strong in interactive searching.  Rather than relying on canned reports, you can dynamically and quickly search by keywords or structured queries to generate data and visualizations on the fly.

Here’s a search for log entries containing a URL (, which Splunk uses to display a chart showing the hours of the day that this URL is being accessed. This particular database is most popular around 4 PM.

Splunk Cons:

    • It has a little bit of a learning curve, but it’s worth it for the kind of features and intelligence you can get from Splunk.
    • It’s the only paid option on this list.  You can try it out for 60 days with up to 500MB a day, and certain non-profits can apply to continue using Splunk under the 500MB/day limit.  Splunk can be used with any server access or error log, so a library might consider partnering with other departments on campus to purchase a license (see note 2).

What should you choose?

It depends on your needs, but AWStats is always a tried-and-true solution that is easy to install and maintain.  If you have the knowledge, a custom Python script is definitely better, but obviously takes time to test and develop.  If you have money and could partner with others on your campus (or just need a one-time report generated through a free trial), Splunk is very powerful, generates some slick-looking charts, and is definitely worth looking into.  If there are other options not covered here, please let us know in the comments!

About our guest author: Robin Camille Davis is the Emerging Technologies & Distance Services Librarian at John Jay College of Criminal Justice (CUNY) in New York City. She received her MLIS from the University of Illinois Urbana-Champaign in 2012 with a focus in data curation. She is currently pursuing an MA in Computational Linguistics from the CUNY Graduate Center.

  1. Details about LogFormat and what each %/letter value means can be found at; LogSPU details can be found
  2. Another paid option that offers a free trial, and comes with extensions made for parsing EZProxy logs, is Sawmill:

PARTICIPATE: DuraSpace Projects Launch Leadership Group Elections / DuraSpace News

Winchester, MA  DuraSpace’s open source projects—DSpace, Fedora, and VIVO—are officially launching the nominations phase of the Leadership Group elections to expand the community's role in setting strategic direction and priorities for each project.

New Project Governance

“Is the semantic web still a thing?” / Jonathan Rochkind

A post on Hacker News asks:

A few years ago, it seemed as if everyone was talking about the semantic web as the next big thing. What happened? Are there still startups working in that space? Are people still interested?

Note that “linked data” is basically talking about the same technologies as “semantic web”, it’s sort of the new branding for “semantic web”, with some minor changes in focus.

The top-rated comment in the discussion says, in part:

A bit of background, I’ve been working in environments next to, and sometimes with, large scale Semantic Graph projects for much of my career — I usually try to avoid working near a semantic graph program due to my long histories of poor outcomes with them.

I’ve seen uncountably large chunks of money put into KM projects that go absolutely nowhere and I’ve come to understand and appreciate many of the foundational problems the field continues to suffer from. Despite a long period of time, progress in solving these fundamental problems seem hopelessly delayed.

The semantic web as originally proposed (Berners-Lee, Hendler, Lassila) is as dead as last year’s roadkill, though there are plenty out there that pretend that’s not the case. There’s still plenty of groups trying to revive the original idea, or like most things in the KM field, they’ve simply changed the definition to encompass something else that looks like it might work instead.

The reasons are complex but it basically boils down to: going through all the effort of putting semantic markup with no guarantee of a payoff for yourself was a stupid idea.

The entire comment, and, really the entire thread, are worth a read. There seems to be a lot of energy in libraryland behind trying to produce “linked data”, and I think it’s important to pay attention to what’s going on in the larger world here.

Especially because much of the stated motivation for library “linked data” seems to have been: “Because that’s where non-library information management technology is headed, and for once let’s do what everyone else is doing and not create our own library-specific standards.”  It turns out that may or may not be the case. If your motivation for library linked data was “so we can be like everyone else,” that simply may not be an accurate motivation: everyone else doesn’t seem to be heading there in the way people hoped a few years ago.

On the other hand, some of the reasons that semantic web/linked data have not caught on are commercial and have to do with business models.

One of the reasons that whole thing died was that existing business models simply couldn’t be reworked to make it make sense. If I’m running an ad driven site about Cat Breeds, simply giving you all my information in an easy to parse machine readable form so your site on General Pet Breeds can exist and make money is not something I’m particularly inclined to do. You’ll notice now that even some of the most permissive sites are rate limited through their API and almost all require some kind of API key authentication scheme to even get access to the data.

It may be that libraries and other civic organizations, without business models predicated on competition, may be a better fit for implementation of semantic web technologies.  And the sorts of data that libraries deal with (bibliographic and scholarly) may be better suited for semantic data as well compared to general commercial business data.  It may be that at the moment libraries, cultural heritage, and civic organizations are the majority of entities exploring linked data.

Still, the coarsely stated conclusion of that top-rated HN comment is worth repeating:

going through all the effort of putting semantic markup with no guarantee of a payoff for yourself was a stupid idea.

Putting data into linked data form simply because we’ve been told that “everyone is doing it” without carefully understanding the use cases such reformatting is supposed to benefit and making sure that it does — risks undergoing great expense for no payoff. Especially when everyone is not in fact doing it.


Taking the same data you already have and reformatting as “linked data” does not necessarily add much value. If it was poorly controlled, poorly modelled, or incomplete data before, it still is, even in RDF.   You can potentially add a lot more value and more additional uses of your data by improving the data quality than by working to reformat it as linked data/RDF.  The idea that simply reformatting it as RDF would add significant value was predicated on the idea of an ecology of software and services built to use linked data, software and services exciting enough that making your data available to them would result in added value.  That ecology has not really materialized, and it’s hardly clear that it will (and to the extent it does, it may only be if libraries and cultural heritage organizations create it; we are unlikely to get a free ride on more general tools from a wider community).

But please do share your data

To be clear, I still highly advocate taking the data you do have and making it freely available under open (or public domain) license terms. In whatever formats you’ve already got it in.  If your data is valuable, developers will find a way to use it, and simply making the data you’ve already got available is much less expensive than trying to reformat it as linked data.  And you can find out if anyone is interested in it. If nobody’s interested in your data as it is — I think it’s unlikely the amount of interest will be significantly greater after you model it as ‘linked data’. The ecology simply hasn’t arisen to make using linked data any easier or more valuable than using anything else (in many contexts and cases, it’s more troublesome and challenging than less abstract formats, in fact).

Following the bandwagon vs doing the work

Part of the problem is that modelling data is inherently a context-specific act. There is no universally applicable model — and I’m talking here about the ontological level of entities and relationships, what objects you represent in your data as distinct entities and how they are related. Whether you model it as RDF or just as custom XML, the way you model the world may or may not be useful or even usable by those in different contexts, domains and businesses.  See “Schemas aren’t neutral” in the short essay by Cory Doctorow linked to from that HN comment.  But some of the linked data promise is premised on the idea that your data will be both useful and integrate-able nearly universally with data from other contexts and domains.

These are not insoluble problems, they are interesting problems, and they are problems that libraries as professional information organizations rightly should be interested in working on. Semantic web/linked data technologies may very well play a role in the solutions (although it’s hardly clear that they are THE answer).

It’s great for libraries to be interested in working on these problems. But working on these problems means working on these problems, it means spending resources on investigation and R&D and staff with the right expertise and portfolio. It does not mean blindly following the linked data bandwagon because you (erroneously) believe it’s already been judged as the right way to go by people outside of (and with the implication ‘smarter than’) libraries. It has not been.

For individual linked data projects, it means being clear about what specific benefits they are supposed to bring to use cases you care about — short and long term — and what other outside dependencies may be necessary to make those benefits happen, and focusing on those too.  It means understanding all your technical options and considering them in a cost/benefit/risk analysis, rather than automatically assuming RDF/semantic web/linked data and as much of it as possible.

It means being aware of the costs and the hoped for benefits, and making wise decisions about how best to allocate resources to maximize chances of success at those hoped for benefits.   Blindly throwing resources into taking your same old data and sharing it as “linked data”, because you’ve heard it’s the thing to do,  does not in fact help.

Filed under: General

Islandora Camp Colorado - Bringing Islandora and Drupal closer / Cherry Hill Company

From October 13 - 16, 2014, I had the opportunity to go to (and the privilege to present at) Islandora Camp Colorado. These were four fairly intensive days, including a last day workshop looking to the future with Fedora Commons 4.x. We had a one day introduction to Islandora, a day of workshops, and a final day of community presentations on how libraries (and companies that work with libraries, such as ours) are using Islandora. The future looks quite interesting for the relationship between Fedora Commons and Drupal.

  • The new version of Islandora allows you to regenerate derivatives on the fly. You can specify which datastreams are derivatives of (what I am calling) parent datastreams. As a result, the new feature allows you to regenerate a derivative through the UI or possibly via Drush, which is something the Colorado Alliance is working to have working with the ...

Service-Proxy - 0.39 / FOSS4Lib Recent Releases

Release Date: 
Monday, October 27, 2014

Last updated October 28, 2014. Created by Peter Murray on October 28, 2014.

New IdentityLayer field, indexIconUrl, meant for defining another per-user logo, supplementing iconUrl.

Archivematica - 1.3.0 / FOSS4Lib Recent Releases

Release Date: 
Friday, October 24, 2014

Last updated October 28, 2014. Created by Peter Murray on October 28, 2014.

Important note: this is not a required upgrade from 1.2.x. Only new users, those wanting to try out 14.04, or DuraCloud account holders need this release.

Bookmarks for October 28, 2014 / Nicole Engard

Today I found the following resources and bookmarked them:

  • ZenHub
    ZenHub provides a project management solution to GitHub with customizable task boards, peer feedback, file uploads, and more.
  • Thingful
    Thingful® is a search engine for the Internet of Things, providing a unique geographical index of connected objects around the world, including energy, radiation, weather, and air quality devices as well as seismographs, iBeacons, ships, aircraft and even animal trackers. Thingful’s powerful search capabilities enable people to find devices, datasets and realtime data sources by geolocation across many popular Internet of Things networks
  • Zanran Numerical Data Search
    Zanran helps you to find ‘semi-structured’ data on the web. This is the numerical data that people have presented as graphs and tables and charts. For example, the data could be a graph in a PDF report, or a table in an Excel spreadsheet, or a barchart shown as an image in an HTML page. This huge amount of information can be difficult to find using conventional search engines, which are focused primarily on finding text rather than graphs, tables and bar charts.
  • Gwittr
    Gwittr is a Twitter API based search website. It allows you to better search any Twitter account for older tweets, linked web pages and pictures.
  • ThingLink
    Easily create interactive images and videos for your websites, infographics, photo galleries, presentations and more!


Midwinter Workshop Highlight: Meet the Field Research Presenter! / LITA

We asked our LITA Midwinter Workshop Presenters to tell us a little more about themselves and what to expect from their workshops in January. This week, we’re hearing from Wayne Johnston, who will be presenting the workshop:

Developing mobile apps to support field research
(For registration details, please see the bottom of this blog post)

LITA: Can you tell us a little more about you?

Wayne: I am currently Head of Research Enterprise and Scholarly Communication at the University of Guelph Library. Prior to joining the Library I worked for the United Nations in both New York and Geneva. My international experience includes work I’ve done in Ghana, Nepal, Croatia and Canada’s Arctic.

LITA: Who is your target audience for this workshop?

Wayne: I think this workshop will be most relevant to academic librarians who are supporting research activity on their campuses.  It may be of particular interest to those working in research data management.  Beyond that, anyone interested in mobile technology and/or open source software will find the workshop of interest.

LITA: How much experience with programming do attendees need to succeed in the workshop?

Wayne: None whatsoever.  Some experience with examples of field research undertaken by faculty and/or graduate students would be useful.

LITA: If you were a character from the Marvel or Harry Potter universe, which would it be, and why?

Wayne: How about the Silver Surfer?  By living vicariously through the field research I support I feel that I glide effortlessly to the far corners of the world.

LITA: Name one concrete thing your attendees will be able to take back to their libraries after participating in your workshop.

Wayne: You will be equipped to enable researchers on your campus to dispense with paper data collection and discover new efficiencies and data security by using mobile technology.

LITA: What kind of gadgets/software do your attendees need to bring?

Wayne: Nothing required but any mobile devices would be advantageous.  If possible, have an app that enables you to read QR codes.

LITA: Respond to this scenario: You’re stuck on a desert island. A box washes ashore. As you pry off the lid and peer inside, you begin to dance and sing, totally euphoric. What’s in the box?

Wayne: A bottle of craft beer.

More information about Midwinter Workshops. 

Registration Information:
LITA members get one third off the cost of Midwinter workshops. Use the discount promotional code LITA2015 during online registration to automatically receive your member discount.  Start the process at the ALA web sites:
Conference web site:
Registration start page:
LITA Workshops registration descriptions:
When you start the registration process and BEFORE you choose the workshop, you will encounter the Personal Information page.  On that page there is a field to enter the discount promotional code:  LITA2015
If you do so, then when you get to the workshop selection page the discounted price of $235 is automatically displayed and entered.  The discounted total will be reflected in the Balance Due line on the payment page.
Please contact the LITA Office if you have any registration questions.

Data Infrastructure, Education & Sustainability: Notes from the Symposium on the Interagency Strategic Plan for Big Data / Library of Congress: The Signal

Last week, the National Academies Board on Research Data and Information hosted a Symposium on the Interagency Strategic Plan for Big Data. Staff from the National Institutes of Health, the National Science Foundation, the U.S. Geological Survey and the National Institute of Standards and Technology presented on ongoing work to establish an interagency strategic plan for Big Data. In this short post I recap some of the points and issues that were raised in the presentations and discussion and provide links to some of the projects and initiatives that I think will be of interest to readers of The Signal.

Vision and Priority Actions for National Big Data R&D

Slide with the vision for the interagency big data activity.

Part of the occasion for this event is the current “Request for Input (RFI)-National Big Data R&D Initiative.” Individuals and organizations have until November 14th to provide comments on “The National Big Data R&D Initiative: Vision and Actions to be Taken” (pdf). This short document is intended to inform policy for research and development across various federal agencies. Relevant to those working in digital stewardship and digital preservation, the draft includes a focus on issues related to trustworthiness of data and resulting knowledge, investing in both domain-specific and shared cyberinfrastructure to support research and improving data analysis education and training and a focus on “ensuring the long term sustainability” of data sets and data resources.

Sustainability as the Elephant in the Room

In the overview presentation about the interagency big data initiative, Allen Dearry from the National Institute of Environmental Health Sciences noted that sustainability and preservation infrastructure for data remains the “elephant in the room.” This comment resonated with several of the subsequent presenters and was referenced several times in their remarks. I was glad to see sustainability and long-term access getting this kind of attention. It is also good to see that “sustainability” is specifically mentioned in the draft document referenced above. With that noted, throughout discussion and presentations it was clear that the challenges of long-term data management are only becoming more and more complex as more and more data is collected to support a range of research.

From “Data to Knowledge” as a Framework

The phrase “Data to Knowledge” was repeated in several of the presentations. The interagency team working in this space has often made use of it, for example, in relation to last year’s “Data to Knowledge to Action” event (pdf). From a stewardship/preservation perspective, it is invaluable to recognize that the focus on the resulting knowledge and action that comes from data puts additional levels of required assurance on the range of activities involved in the stewardship of data. This is not simply an issue of maintaining data assets, but a more complex activity of keeping data accessible and interpretable in ways that support generating sound knowledge.

Some of the particular examples discussed under the heading of “data to knowledge” illustrate the significance of the concept to the work of data preservation and stewardship. One of the presenters mentioned the importance of publishing negative results and the analytic process of research. A presenter noted that open source platforms like iPython notebook are making it easier for scientists to work on and share their data, code and research. This discussion connected rather directly with many of the issues that were raised in the 2012 NDIIPP content summit Science@Risk: Toward a National Strategy for Preserving Online Science and in its final report (pdf). There is a whole range of seemingly ancillary material that makes data interpretable and meaningful. I was pleased to see one of those areas, software, receive recognition at the event.

Recognition of Software Preservation as Supporting Data to Knowledge

Sky Bristol from USGS presenting on sustainability issues related to big data to an audience at the National Academies of Science in Washington DC.

The event closed with presentations from two projects that won National Academies Board on Research Data and Information’s Data and Information Challenge Awards. Adam Asare of the Immune Tolerance Network presented on “ITN Trial Share: Enabling True Clinical Trial Transparency” and Mahadev Satyanarayanan from the Olive Executable Archive presented on “Olive: Sustaining Executable Content Over Decades.” Both of these projects represent significant progress supporting the sustainability of access to scientific data.

I was particularly thrilled to see the issues around software preservation receiving this kind of national attention. As explained in much greater depth in the Preserving.exe report, arts, culture and scientific advancement are increasingly dependent on software. In this respect, I found it promising to see a project like Olive, which has considerable implications for the reproducibility of analysis and for providing long-term access to data and interpretations of data in their native formats and environments, receiving recognition at an event focused on data infrastructure. For those interested in the further implications of this kind of work for science, this 2011 interview with the Olive project explores many of them.

Education and Training in Data Curation

Slide from presentation on approaches to analytical training for working with data for all learners.

Another subject I imagine readers of The Signal are tracking is education and training in support of data analysis and curation. Michelle Dunn from the National Institutes of Health presented on an approach NIH is taking to develop the kind of workforce that is necessary in this space. She mentioned a range of vectors for thinking about data science training, including traditional academic programs as well as the potential for the development of open educational resources. For those interested in this topic, it’s worth reviewing the vision and goals outlined in the NIH Data Science “Education, Training, and Workforce Development” draft report (pdf). As libraries increasingly become involved in the curation and management of research data, and as library and information science programs increasingly focus on preparing students to work in support of data-intensive research, it will be critical to follow developments in this area.

Familiarity Breeds Contempt / David Rosenthal

In my recent Internet of Things post I linked to Jim Gettys' post Bufferbloat and Other Challenges. In it Jim points to a really important 2010 paper by Sandy Clark, Matt Blaze, Stefan Frei and Jonathan Smith entitled Familiarity Breeds Contempt: The Honeymoon Effect and the Role of Legacy Code in Zero-Day Vulnerabilities.

Clark et al. analyze databases of vulnerabilities to show that the factors influencing the rate of discovery of vulnerabilities are quite different from those influencing the rate of discovery of bugs. They summarize their findings thus:
“We show that the length of the period after the release of a software product (or version) and before the discovery of the first vulnerability (the ‘Honeymoon’ period) is primarily a function of familiarity with the system. In addition, we demonstrate that legacy code resulting from code re-use is a major contributor to both the rate of vulnerability discovery and the numbers of vulnerabilities found; this has significant implications for software engineering principles and practice.”
Jim says:
“our engineering processes need fundamental reform in the face of very long lived devices.”
Don't hold your breath. The paper's findings also have significant implications for digital preservation, because external attack is an important component of the threat model for digital preservation systems:
  • Digital preservation systems are, like devices in the Internet of Things (IoT), long-lived.
  • Although they are designed to be easier to update than most IoT devices, they need to be extremely cheap to run. Resources to make major changes to the code base within the "honeymoon" period will be inadequate.
  • Scarce resources and adherence to current good software engineering practices already mean that much of the code in these systems is shared.
Thus it is likely that digital preservation systems will be more vulnerable than the systems whose content they are intended to preserve. This is a strong argument for diversity of implementation, which has unfortunately turned out to increase costs significantly. Mitigating the threat from external attack increases the threat of economic failure.

Lots more on researcher identifiers / HangingTogether

I blogged earlier this year inviting feedback on the OCLC Research Registering Researchers in Authority Files Task Group’s draft report, and we did receive some, which was much appreciated. Now the report is published!

Along with it, we’ve published supplementary datasets detailing our research:

  • our use case scenarios
  • characteristics profiles of 20 research networking or identifier systems
  • an Excel workbook with
    • links to 100 systems the task group considered
    • the functional requirements derived from the use case scenarios and their associated stakeholders
    • compilation of the 20 characteristics profiles for easy comparison
    • the 20 profiled systems mapped to their functional requirements.

The report, supplementary datasets, and a slide with the Researcher Identifier Information Flow diagram used in the report (and which can be repurposed, with attribution) are all available on the Registering Researchers in Authority Files report landing page.

If I had to choose the key message from all of this, it would be that research institutions and libraries need to recognize that “authors are not strings” and that persistent identifiers are needed to accurately link their researchers with their scholarly output and to funders.
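
To make the “authors are not strings” point concrete, here is a small sketch in Python. The records, the name variants, and the identifier values are all invented for illustration; none of them come from the report or from any real identifier system. It only contrasts grouping publications by name string with grouping them by a persistent identifier.

```python
from collections import defaultdict

# Hypothetical publication records; names and identifiers are made up.
RECORDS = [
    {"title": "Paper A", "author_name": "J. Smith", "author_id": "id-0001"},
    {"title": "Paper B", "author_name": "Jane Smith", "author_id": "id-0001"},
    {"title": "Paper C", "author_name": "J. Smith", "author_id": "id-0002"},
]


def group_titles(records, key):
    """Group publication titles by the given record field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


# Grouping by the name string versus grouping by the identifier.
by_name = group_titles(RECORDS, "author_name")
by_id = group_titles(RECORDS, "author_id")
```

In this toy data, grouping by name merges two different people who share the string “J. Smith” and splits one person across two name variants; grouping by identifier does neither.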

The report could be considered the “executive summary” of the task group’s two years’ worth of research. No one identifier or system will ever include all researchers, or meet all functional requirements of every stakeholder. If you’re weighing pros and cons of different identifier systems, I’d suggest you look at the profiles and our mappings to the functional requirements.

Collaborating with such talented experts on the task group has been a great pleasure. Now that we’ve delivered our final output, I’m looking forward to your reactions and feedback!

About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.

iCampCO's Secret Sauce - a Guest Blog / Islandora

The following guest post was written by Islandora Camp Colorado attendee Bryan Brown, who joined us from Florida State University:

Islandora Camp CO was over a week ago now, but I’m still digesting the experience. Having been to several conferences before, I was expecting something similar where a Sage on the Stage lectures about some abstract topic while the audience passively listens (or doesn’t). I was pleasantly surprised at the smaller and more personal atmosphere of iCamp, where we were free to ask questions in the middle of presentations and instructors revised their talks based on what the audience was most interested in. Instead of canned slideshows, Islandora Camp is an interactive experience that could vary wildly depending on who attends. This is because the core theme of Islandora Camp, and maybe even Islandora in general, is community.

From the first day where we all introduced ourselves and how we are using Islandora, I quickly felt like I knew everyone at camp and felt no hesitation to strike up a conversation with others about their work. The conversations I had with other campers about how they are using Islandora stuck with me just as much as the presentations and workshops. I met a lot of interesting developers and administrators who are working on projects similar to my own and came back to Florida with a greatly extended network of fellow Islandorians I could work with to solve shared problems. Instead of treating our Islandora instances like unique snowflakes and solving our problems in a vacuum, we need to come together and discuss these problems as a community so we can create better solutions that help more people.

The future of Islandora rests not with the Islandora Foundation or Discovery Garden, but with Islandora users. If you want Islandora to be better, it’s not enough to sit around and wait for new modules to come out or complain about the problems those modules might have. File bug reports when you find an issue. Volunteer to test modules for new releases. Contribute your patches as a pull request. Join an interest group. There are lots of ways to get involved in the Islandora community, even if you aren’t a developer. Since we are all using the same system, we are all in the same boat. This sense of connectedness might just be the secret sauce that makes iCamp such a great experience.

Unit testing WordPress plugins / Casey Bisson

We’ve been unit testing some of our plugins using the old WordPress-tests framework and tips from this 2012 blog post. The good news is that the framework has since been incorporated into core WP; the bad news is that it was changed along the way, and it wasn’t exactly easy to get the test environment set up correctly for the old WordPress-tests.

I’ve had a feeling there must be a better way, and today I discovered there is. WP-CLI has plugin unit test scaffolding that’s easy to install. Pippin’s Plugins’ guide to the scaffold is helpful as well. My experience was pretty smooth, with the following caveats:

  • cd $(wp plugin path --dir my-plugin) is just another way of saying “cd into the plugin’s directory.” It’s good to see the example of how WP-CLI can be used that way, but it’s way easier for me to type the path.
  • bin/ came out with some unexpected permissions. I did a chmod 550 bin/ and was a lot happier. It’s possible (perhaps likely) that I’m missing a sexy unix permissions trick there, and the permissions are intentionally non-executable for non-root users, but there’s no obvious documentation for that.
  • The script in bin/ needs to run as a user that can create databases (probably root for many people). I’m usually pretty particular about this permission, but the convenience factor here depends on it.
  • The old framework expected -test.php to be the file suffix; the new approach expects the files to be prefixed with test-.
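
The last caveat is mechanical enough to script. Here is a minimal sketch in Python, assuming the old-style test files sit in a single directory and are named <name>-test.php. The function name rename_old_style_tests and that directory layout are hypothetical; neither is part of WP-CLI or the WordPress test framework.

```python
from pathlib import Path


def rename_old_style_tests(tests_dir):
    """Rename <name>-test.php files to test-<name>.php; return the new names."""
    suffix = "-test.php"
    renamed = []
    for old_path in sorted(Path(tests_dir).glob("*" + suffix)):
        # Strip the old suffix to recover the base name, e.g. "widget".
        base = old_path.name[: -len(suffix)]
        if not base:
            # A file literally named "-test.php" has no base name; skip it.
            continue
        new_path = old_path.with_name("test-" + base + ".php")
        if new_path.exists():
            # Never overwrite an existing file; leave it for manual review.
            continue
        old_path.rename(new_path)
        renamed.append(new_path.name)
    return renamed
```

Calling rename_old_style_tests("tests") inside a plugin checkout would be one way to use it; running it on a git-tracked copy makes the result easy to review or revert.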

All those are pretty minor, however. I think this approach will make it far easier to make tests distributable. The support for Travis (and from there to Github) is super sexy. All together, this should make tests easier to write and use.


I’ve added the scaffold to some of my most popular plugins:

Only bCMS has a meaningful test, written by Will Luo, but we’ll see where it goes from here. I’m still working out issues getting the test environment set up both locally and in Travis. Plugin dependencies, configuration, and git submodules are among the problems.

Policy Revolution! and COSLA in Wyoming: Bountiful in bibliophiles but barren of bears / District Dispatch

Jenny Lake, Wyoming

I just returned from the Annual Meeting of the Chief Officers of State Library Agencies (COSLA), held in Teton Village, Wyo., just down the road from Grand Teton National Park and Jackson. From the moment I left the airport, I knew I was not in D.C. any longer, as there were constant reminders about avoiding animals. There were road signs informing drivers about “moose on the loose”; strong suggestions to hike in groups and carry bear spray; and warnings about elk hunting, so “please wear bright colors.” In D.C., we only worry about donkeys and elephants engaging in political shenanigans.

Work on our Policy Revolution! Initiative attracted me to the COSLA meeting, to leverage the presence of the state librarians, and also librarians from the mountain states. Our session focused on four aspects of work related to developing a national public policy agenda:

  • From a library leader’s perspective, what are the most important national goals that would advance libraries in the next 5-10 years?
  • From the U.S. President’s perspective, how could libraries and librarians best contribute to the most important national goals, and what national initiatives are needed to realize these contributions?
  • From the many good ideas that we can generate, how can we prioritize among them?
  • What does a national public policy agenda look like? What are its characteristics?


The wide open spaces and rugged individualistic culture of Wyoming, symbolized by Steamboat, reminded me of the vastness of the United States, and great resources and resourcefulness of our people. In this time of library revolution, we need to move beyond our conventional views of the world to figure out how libraries may best serve the nation for decades to come. With the next presidential election just around the corner, and with it the certainty of a new occupant in the White House, it is timely and urgent to develop and coalesce around a common library vision.

One thought on the way home was stimulated by the Wyoming session. What should be the priority for national action? Three possibilities occur to me:

  • Increase direct funding (i.e., show me the money)
  • Effect public policy changes that may or may not directly implicate funding, such as copyright, privacy, licensing regimes, accommodations for people with disabilities, but are changes that can only be achieved at the national level, or at least best addressed at the national level
  • Promote a new vision and positioning for libraries in national conversation (i.e., bully pulpit)

Should a national public policy agenda systematically favor one of these directions?

Teton County Library

Many thanks to COSLA for hosting us, with particular thanks to Ann Joslin and Tim Cherubini (and his staff). I also appreciated the opportunity to sit in on a number of sessions that included generous doses of our long-time friends E-rate, ebooks and digital services. We had a special treat as Wyoming’s senior U.S. Senator, Michael Enzi (R-WY), addressed the group, regaling the audience with his love of reading and libraries.

I had the opportunity for a quick tour around the area. I was impressed with the large, modern Teton County Library (in Jackson), which has good wireless access—yay! After seeing the Grand Tetons and tooling about Jenny Lake, it is gonna be hard to settle back down to the political chaos that is Washington, D.C.

The post Policy Revolution! and COSLA in Wyoming: Bountiful in bibliophiles but barren of bears appeared first on District Dispatch.

What’s this Jane-athon thing? / Manage Metadata (Diane Hillmann and Jon Phipps)

Everyone is getting tired of the sage-on-the-stage style of preconferences, so when Deborah Fritz suggested a hackathon (thank you Deborah!) to the RDA Dev Team, we all climbed aboard and started thinking about what that kind of event might look like, particularly in the ALA Midwinter context. We all agreed: there had to be a significant hands-on aspect to really engage those folks who were eager to learn more about how the RDA data model could work in a linked data environment, and, of course, in their own home environment.

We’re calling it a Jane-athon, which should give you a clue about the model for the event: a hackathon, of course! The Jane Austen corpus is perfect to demonstrate the value of FRBR, and there’s no lack of interesting material to look at (media materials, series, spin-offs of every description) in addition to the well known novels. So the Jane-athon will be partially about creating data, and partially about how that data fits into a larger environment. And did you know there is a Jane Austen bobblehead?

We think there will be a significant number of people who might be interested in attending, and we figured that getting the word out early would help prospective participants make their travel arrangements with attendance in mind. Sponsored by ALA Publishing, the Jane-athon will be on the Friday before the Midwinter conference (the traditional pre-conference day), and though we don’t yet have registration set up, we’ll make sure everyone knows when that’s available. If you think, as we do, that this event will be the hit of Midwinter, be sure to watch for that announcement, and register early! If the event is successful, you’ll be seeing others in subsequent ALA conferences.

So, what’s the plan and what will participants get out of it?

The first thing to know is that there will be tables and laptops to enable small groups to work together for the ‘making data’ portion of the event. We’ll be asking folks who have laptops they can bring to Chicago to plan on bringing theirs. We’ll be using the latest version of a new bibliographic metadata editor called RIMMF (“RDA In Many Metadata Formats”), which is not yet publicly available but will be soon (watch for it on the TMQ website). We encourage interested folks to download the current beta version and play with it. It’s a cool tool and really is a good one to learn about.

In the morning, we’ll form small cataloging groups and use RIMMF to do some FRBRish cataloging, starting from MARC21 and ending up with RDA records exported as RDF Linked Data. In the afternoon we’ll all take a look at what we’ve produced, share our successes and discoveries, and discuss the challenges we faced. In true hackathon tradition we’ll share our conclusions and recommendations with the rest of the library community on a special Jane-athon website set up to support this and subsequent Jane-athons.

Who should attend?

We believe that there will be a variety of people who could contribute important skills and ideas to this event. Catalogers, of course, but also every flavor of metadata people, vendors, and IT folks in libraries would be warmly welcomed. But wouldn’t tech services managers find it useful? Oh yes, they’d be welcomed enthusiastically, and I’m sure their participation in the discussion portion of the event in the afternoon will bring out issues of interest to all.

Keep in mind, this is not cataloging training, nor Toolkit training, by any stretch of the imagination. Neither will it be RIMMF training or have a focus on the RDA Registry, although all those tools are relevant to the discussion. For RIMMF, particularly, we will be looking at ways to ensure that there will be a cadre of folks who’ve had enough experience with it to make the hands-on portion of the day run smoothly. For that reason, we encourage as many as possible to play with it beforehand!

Our belief is that the small group work and the discussion will be best with a variety of experience informing the effort. We know that we can’t provide the answers to all the questions that will come up, but the issues that we know about (and that come up during the small group work) will be aired and discussed.

Photos from iCampCO / Islandora

I mentioned the location of our latest Islandora Camp was beautiful, right? Well, don't take my word for it. One of our campers shared these lovely photos from around town:

(also, check out Ashok Modi's blog about his experiences at camp)

IL2014: Driving Our Own Destinies / Nicole Engard

Brendan Howley opened up the Internet Librarian conference this year. Brendan designs stories that incite people to “do something”. He’s here to talk to us about the world of media and desired outcomes – specifically the desired outcomes for our libraries. Brendan collected stories from local library constituents to find out what libraries needed to do to get to the next step. He found (among other things) that libraries should be hubs for culture and should connect community media.

Three things internet librarians need to know:

  1. why stories work and what really matters
  2. why networks form (power of the weak not the strong)
  3. why culture eats strategy for lunch (Peter Drucker)

“The internet means that libraries are busting out of their bricks and mortars”

Brendan shared with us how stories are not about dumping data, they’re about sharing data and teachable moments.

How storytelling affects the brain

Data is a type of story and where data and stories meet is where change is found. If you want to speak to your community you need to keep in mind that we’re in a society of “post-everything” – there is only one appetite left in terms of storytelling – “meaning”. People need to find it relevant and find meaning in the story. The most remarkable thing about librarians is that we give “meaning” away every day.

People want to know what we stand for and why – values are the key piece to stories. People want to understand why libraries still exist. People under the age of 35 want to know how to find the truth out there – the reliable sources – they don’t care about digital literacy. It’s those who are scared of being left behind – those over 35 (in general) who care about digital literacy.

The recipe for a successful story is: share the why of the how of what you do.

The sharing of stories creates networks. Networks lead to the opportunity to create value – and when that happens you’ve proved your worth as a civic institution. Networks are the means by which those values spread. They are key to the future of libraries.

A Pattern Language by Christopher Alexander is a must-read for anyone designing systems/networks.

You need to understand that it’s the weak ties that matter. Strong ties are really quite rare – this sounds a lot like the long tail to me.

Libraries are in the business of giving away context – that means that where stories live, breathe, gather and cause people to do things is in the context. We’re in a position where we can give this context away. Libraries need to understand that we’re cultural entrepreneurs. Influencers fuel culture – and that’s the job description for librarians.

The post IL2014: Driving Our Own Destinies appeared first on What I Learned Today....

Wagging the Long Tail Again / Islandora

It has been a while since our last foray into the Long Tail of Islandora. Some of those modules have moved all the way from the tail to the head and become part of our regular release. We have been quietly gathering them in our Resources section, but it's more than time for another high level review of the awesome modules that are out there in the community, just waiting to make your repo better.

Islandora XQuery

The ability to batch edit has long been the impossible dream in Islandora. Well, with this little module from discoverygarden, Inc., the dream has arrived. With a basic knowledge of XQuery, you can attack the metadata in your Fedora repository en masse. 

Putting Islandora XQuery into production should be approached with caution for the same reason that batch editing has been so long elusive: if you mass-edit your data, you can break things. That said, the module does come with a helpful install script, so getting it working in your Islandora Installation may be the easiest part!

Islandora Entity Bridge

Much like Islandora Sync, Ashok Modi's Islandora Entity Bridge endeavours to build relationships between Fedora objects and Drupal so you can apply a wider variety of Drupal modules to the contents of your repository without recreating your objects as nodes.

Ashok presented on this module at the recent Islandora Camp in Denver, so you can learn more from his slides here.

Islandora Plupload

This simple but very effective module has been around a while. It makes use of the Plupload library to allow you to exceed PHP file limits when uploading large files.

Islandora Feeds

Mark Jordan has created this tool so you can use the Feeds contrib module to create Islandora objects. This module is still in development, so you can help it to move forward by telling Mark your use cases.

Islandora Meme Solution Pack

The latest in Islandora demo/teaching modules, this one was developed at Islandora Camp Colorado by dev instructors Daniel Lamb and Nick Ruest to help demonstrate the joys of querying Solr. This module is not meant to be used in your repo, but rather to act as a learning tool, especially when used in combination with our Islandora VM.

Are you an iPad or a laptop? / LITA

I’ve never been a big tablet user. This may come as a surprise to some, given that I assist patrons with their tablets every day at the public library. Don’t get me wrong, I love my Nexus 7 tablet. It’s perfect for reading ebooks, using Twitter, and watching Netflix; but the moment I want to respond to an email, edit a photo, or work my way through a Treehouse lesson, I feel helpless. Several library patrons have asked me if our public computers will be replaced by iPads and tablets. It’s hard to say where technology will take us in the coming years, but I strongly believe that a library without computers would leave us severely handicapped.

One of our regular library patrons, let’s call her Jane, is a diehard iPad fan. She is constantly on the hunt for the next great app and enjoys sharing her finds with me and my colleagues. Jane frequently teases me about preferring computers and whenever I’m leading a computer class she’ll ask “Can I do it on my iPad?” She’s not the only person I know who thinks that computers are antiquated and on their way to obsolescence, but I have plenty of hope for computers regardless of the iPad revolution.

In observing how patrons use technology, and reflecting on how I use technology in my personal and professional life, I find that tablets are excellent tools for absorbing and consuming information. However, they are not designed for creation. 9 times out of 10, if you want to make something, you’re better off using a computer. In a recent Wired article about digital literacy, Ari Geshner poses the question “Are you an iPad or are you a laptop? An iPad is designed for consumption.” He explains that literacy “means moving beyond a passive relationship with technology.”

So Jane is an iPad and I am a laptop. We’ve managed to coexist and I think that’s the best approach. Tablets and computers may both fall under the digital literacy umbrella, but they are entirely different tools. I sincerely hope that public libraries will continue to consider computers and tablets separately, encouraging a thirst for knowledge as well as a desire to create.

Research information management systems - a new service category? / Lorcan Dempsey

It has been interesting watching Research Information Management or RIM emerge as a new service category in the last couple of years. RIM is supported by a particular system category, the Research Information Management System (RIMs), sometimes referred to by an earlier name, the CRIS (Current Research Information System).

For reasons discussed below, this area has been more prominent outside the US, but interest is also now growing in the US. See for example, the mention of RIMs in the Library FY15 Strategic Goals at Dartmouth College.

Research information management

The name is unfortunately confusing - a reserved sense living alongside more general senses. What is the reserved sense? Broadly, RIM is used to refer to the integrated management of information about the research life-cycle, and about the entities which are party to it (e.g. researchers, research outputs, organizations, grants, facilities, etc.). The aim is to synchronize data across parts of the university, reducing the burden to all involved of collecting and managing data about the research process. An outcome is to provide greater visibility into institutional research activity. Motivations include better internal reporting and analytics, support for compliance and assessment, and improved reputation management through more organized disclosure of research expertise and outputs.

A major driver has been the need to streamline the provision of data to various national university research assessment exercises (for example, in the UK, Denmark and Australia). Without integrated support, responding to these is costly, with activities fragmented across the Office of Research, individual schools or departments, and other support units, including, sometimes, the library. (See this report on national assessment regimes and the roles of libraries.)

Some of the functional areas covered by a RIM system may be:

  • Award management and identification of award opportunities. Matching of interests to potential funding sources. Supporting management of and communication around grant and contracts activity.
  • Publications management. Collecting data about researcher publications. Often this will be done by searching in external sources (Scopus and Web of Science, for example) to help populate profiles, and to provide alerts to keep them up to date.
  • Coordination and publishing of expertise profiles. Centralized upkeep of expertise profiles. Pulling of data from various systems. This may be for internal reporting or assessment purposes, to support individual researchers in providing personal data in a variety of required forms (e.g. for different granting agencies), and for publishing to the web through an institutional research portal or other venue.
  • Research analytics/reporting. Providing management information about research activity and interests, across departments, groups and individuals.
  • Compliance with internal/external mandates.
  • Support of open access. Synchronization with institutional repository. Managing deposit requirements. Integration with sources of information about Open Access policies.

To meet these goals, a RIM system will integrate data from a variety of internal and external systems. Typically, a university will currently manage information about these processes across a variety of administrative and academic departments. Required data also has to be pulled from external systems, notably data about funding opportunities and publications.
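
As a rough illustration of what that integration involves, here is a toy sketch in Python. It assumes three hypothetical feeds (an HR system, a grants office, and an external publication index) that all key their rows on a shared researcher identifier. The field names and the build_profiles function are invented for this example and do not describe how any of the products mentioned here work.

```python
def build_profiles(hr_rows, grant_rows, publication_rows):
    """Combine rows from three hypothetical source systems into one profile per researcher."""
    profiles = {}
    # Internal system: HR supplies the list of researchers.
    for row in hr_rows:
        profiles[row["researcher_id"]] = {
            "name": row["name"],
            "department": row["department"],
            "grants": [],
            "publications": [],
        }
    # Internal grants office and external publication index both add to a profile.
    for field, rows in (("grants", grant_rows), ("publications", publication_rows)):
        for row in rows:
            profile = profiles.get(row["researcher_id"])
            if profile is not None:  # skip rows for people HR does not know about
                profile[field].append(row["title"])
    return profiles
```

Even in a sketch this small, the join only works because every source uses the same identifier for the same person, which is the point at which researcher identity becomes central.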


Several products have emerged specifically to support RIM in recent years. This is an important reason for suggesting that it is emerging as a recognized service category.

  • Pure (Elsevier). "Pure aggregates your organization's research information from numerous internal and external sources, and ensures the data that drives your strategic decisions is trusted, comprehensive and accessible in real time. A highly versatile system, Pure enables your organization to build reports, carry out performance assessments, manage researcher profiles, enable expertise identification and more, all while reducing administrative burden for researchers, faculty and staff." [Pure]
  • Converis (Thomson Reuters). "Converis is the only fully configurable research information management system that can manage the complete research lifecycle, from the earliest due diligence in the grant process through the final publication and application of research results. With Converis, understand the full scope of your organization's contributions by building scholarly profiles based on our publishing and citations data--then layer in your institutional data to more specifically track success within your organization." [Converis]
  • Symplectic Elements. "A driving force of our approach is to minimise the administrative burden placed on academic staff during their research. We work with our clients to provide industry leading software services and integrations that automate the capture, reduce the manual input, improve the quality and expedite the transfer of rich data at their institution." [Symplectic]

Pure and Converis are parts of broader sets of research management and analytics services from, respectively, Elsevier (Elsevier research intelligence) and Thomson Reuters (Research management and evaluation). Each is a recent acquisition, providing an institutional approach alongside the aggregate, network level approach of each company's broader research analytics and management services.

Symplectic is a member of the very interesting Digital Science portfolio. Digital Science is a company set up by Macmillan Publishers to incubate start-ups focused on scientific workflow and research productivity. These include, for example, Figshare and Altmetric.

Other products are also relevant here. As RIM is an emerging area, it is natural to expect some overlap with other functions. For example, there is definitely overlap with backoffice research administration systems - Ideate from Consilience or solutions from infoEd Global, for example. And also with more publicly oriented profiling and expertise systems on the front office side.

With respect to the latter, Pure and Symplectic both note that they can interface to VIVO. Furthermore, Symplectic can provide "VIVO services that cover installation, support, hosting and integration for institutions looking to join the VIVO network". It also provides implementation support for the Profiles Research Networking Software.

As I discuss further below, one interesting question for libraries is the relationship between the RIMs or CRIS and the institutional repository. Extensions have been written for both DSpace and EPrints to provide some RIMs-like support. For example, DSpace-CRIS extends the DSpace model to cater for the CERIF entities. This is based on work done for the Scholar's Hub at Hong Kong University.

It is also interesting to note that none of the three open source educational community organizations - Kuali, The DuraSpace Foundation, or The Apereo Foundation - has a directly comparable offering, although there are some adjacent activities. In particular, Kuali Coeus for Research Administration is "a comprehensive system to manage the complexities of research administration needs from the faculty researcher through grants administration to federal funding agencies", based on work at MIT. DuraSpace is now the organizational home for VIVO.

Finally, there are some national approaches to providing RIMs or CRIS functionality, associated with a national view of research outputs. This is the case in South Africa, Norway and The Netherlands, for example.


Another signal that this is an emerging service category is the existence of active standards activities. Two are especially relevant here: CERIF (Common European Research Information Format) from EuroCRIS, which provides a format for exchange of data between RIM systems, and the CASRAI dictionary. CASRAI is the Consortia Advancing Standards in Research Administration Information.


So, what about research information management (in this reserved sense) and libraries? One of the interesting things to happen in recent years is that a variety of other campus players are developing service agendas around digital information management that may overlap with library interests. This has happened with IT, learning and teaching support, and with the University press, for example. This coincides with another trend, the growing interest in tracking, managing and disclosing the research and learning outputs of the institution: research data, learning materials, expertise profiles, research reports and papers, and so on. The convergence of these two trends means that the library now has shared interests with the Office of Research, as well as with other campus partners. As both the local institutional and public science policy interest in university outputs grows, this will become a more important area, and the library will increasingly be a partner. Research Information Management is a part of a slowly emerging view of how institutional digital materials will be managed more holistically, with a clear connection to researcher identity.

As noted above, this interest has been more pronounced outside the US to date, but will I think become a more general interest in coming years. It will also become of more general interest to libraries. Here are some contact points.

  • The institutional repository boundary. It is acknowledged that Institutional Repositories (IRs) have been a mixed success. One reason for this is that they are to one side of researcher workflows, and not necessarily aligned with researcher incentives. Although also an additional administrative overhead, Research Information Management is better aligned with organizational and external incentives. See for example this presentation (from Royal Holloway, U of London) which notes that faculty are more interested in the CRIS than they had been in the IR, 'because it does more for them'. It also notes that the library no longer talks about the 'repository' but about updating profiles and loading full-text. There is a clear intersection between RIMs and the institutional repository and the boundary may be managed in different ways. Hong Kong University, for example, has evolved its institutional repository to include RIMs or CRIS features. Look at the publications or presentations of David Palmer, who has led this development, for more detail. There is a strong focus here on improved reputation management on the web through effective disclosure of researcher profiles and outputs. Movement in the other direction has also occurred, where a RIMs or CRIS is used to support IR-like services. Quite often, however, the RIMs and IR are working as part of an integrated workflow, as described here.
  • Management and disclosure of research outputs and expertise. There is a growing interest in researcher and research profiles, and the RIMs may support the creation and management of a 'research portal' on campus. An important part of this is assisting researchers to more easily manage their profiles, including prompting with new publications from searches of external sources. See the research portal at Queen's University Belfast for an example of a site supported by Pure. Related to this is general awareness about promotion, effective publishing, bibliometrics, and management of online research identity. Some libraries are supporting the assignment of ORCIDs. The presentations of Wouter Gerritsma, of Wageningen University in The Netherlands, provide useful pointers and experiences.
  • Compliance with mandates/reporting. The role of RIMs in supporting research assessment regimes in various countries was mentioned earlier: without such workflow support, participation was expensive and inefficient. Similar issues are arising as compliance to institutional or national mandates needs to be managed. Earlier this year, the California Digital Library announced that it had contracted with Symplectic "to implement a publication harvesting system in support of the UC Open Access Policy". US Universities are now considering the impact of the OSTP memo "Increasing Access to the Results of Federally Funded Scientific Research," [PDF] which directs funding agencies with an annual R&D budget over $100 million to develop a public access plan for disseminating the results of their research. ICPSR summarises the memo and its implications here. It is not yet clear how this will be implemented, but it is an example of the growing science and research policy interest in the organized disclosure of information about, and access to, the outputs of publicly funded research. This drives a University wide interest in research information management. In this context, SHARE may provide some focus for greater RIM awareness.
  • Management of institutional digital materials. I suggest above that RIM is one strand of the growing campus interest in managing institutional materials - research data, video, expertise profiles, and so on. Clearly, the relationship between research information management, whatever becomes of the institutional repository, and the management of research data is close. This is especially the case in the US, given the inclusion of research data within the scope of the OSTP memo. The library provides a natural institutional partner and potential home for some of this activity, and also expertise in what Arlitsch and colleagues call 'new knowledge work', thinking about the identifiers and markup that the web expects.

Whether or not Research Information Management becomes a new service category in the US in quite the way I have discussed it here, it is clear the issues raised will provide important opportunities for libraries to become further involved in supporting the research life of the university.

Mozilla Festival Day 2: Webmaking in Higher Education / Cynthia Ng

We had a short session on looking at how we might use webmaker in a higher education context. Facilitator Helen Lee Open (Free) Tools We use github firefox webmaker drupal wordpress/buddypress Linux Arduino Pinterest & other social media platforms FLAC Why They Are Awesome content is shareable, reusable/remixable easy to use, quick to do creates online […]

Federated search engine of European Poetical databases / Péter Király

- A gentle proposal, v2.0 -

by Levente Seláf1 and Péter Király2

1levente.selaf (.), ELTE, Budapest
2peter.kiraly (.), The Göttingen Society for Scientific Data Processing


This is a technical proposal for implementing a federated search engine that gives researchers a tool for querying multiple poetical databases simultaneously. The proposal is based on the experience of a pilot project, MegaRep, which queries two such databases: Le Nouveau Naetebus – Répertoire des poèmes strophiques non-lyriques en langue française d'avant 1400, and Répertoire de la poésie hongroise ancienne (abbreviated as RPHA), both created at Eötvös Loránd University, Budapest.

The main usage scenario of the tool is the following. The end user (the researcher) creates a query in a user interface. The user interface hides the technical, formal details and provides human-readable dropdown lists, radio buttons and similar standard web user interface elements. When the user submits the form, the tool creates a more-or-less language-independent formal query and sends it to the individual databases. The databases receive the query, transform it to their own query language, run the search, transform the hit list to an XML-based common format, and send it back to the caller, the federated search engine. The tool collects the results, transforms the XML to HTML, and displays the merged list to the end user.
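The scenario above can be sketched in a few lines. This is only an illustration of the fan-out and merge steps, not the MegaRep code: the function name `federated_search`, the stub backends and their sample records are all assumptions, and real backends would answer over HTTP rather than as local functions.

```python
from typing import Callable, Dict, List

# Hypothetical backend type: takes the formal query, returns a list of hits.
Backend = Callable[[str], List[Dict[str, str]]]


def federated_search(query: str, backends: Dict[str, Backend]) -> List[Dict[str, str]]:
    """Send one formal query to every backend and merge the hit lists."""
    merged = []
    for name, search in backends.items():
        for hit in search(query):
            # Tag each hit with its source repository so the merged list stays traceable.
            merged.append({"repository": name, **hit})
    return merged


# Stub backends standing in for the real databases (invented sample data).
def fake_rpha(query: str) -> List[Dict[str, str]]:
    return [{"id": "0373", "title": "sample RPHA record"}]


def fake_naetebus(query: str) -> List[Dict[str, str]]:
    return [{"id": "12", "title": "sample Naetebus record"}]


results = federated_search("meter:02", {"RPHA": fake_rpha, "Naetebus": fake_naetebus})
for hit in results:
    print(hit["repository"], hit["id"], hit["title"])
```

Keeping the repository name on every hit is what later allows the merged list to link each record back to its own database.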

The communication is based on the OpenSearch protocol. It is a widely accepted and widely used industry standard; among others, web browsers use it to communicate with custom search engines. The standard is pretty straightforward: we send a URL in a specific format to the server, which sends back a hit list in Atom RSS format.


The simplicity of OpenSearch comes at a price: it does not specify the format of the query itself, and because of this limitation we cannot use custom URL parameters (such as &meter=hexameter); we have to send our complex query in a single parameter (called searchTerms). The solution is to use a popular and well-documented formal query language, Lucene's query syntax.

The request URL which should be implemented by all participants:

[base URL]
?searchTerms=[query string]
&startIndex=[the index of first hit (default is 1)]
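As a minimal sketch of how a client might assemble this request, the snippet below percent-encodes the query with Python's standard library. The helper name `build_request_url` and the base URL `http://example.org/opensearch` are placeholders, not part of the proposal.

```python
from urllib.parse import urlencode


def build_request_url(base_url: str, query: str, start_index: int = 1) -> str:
    """Build the request URL; urlencode escapes spaces and colons in the query."""
    params = {"searchTerms": query, "startIndex": start_index}
    return base_url + "?" + urlencode(params)


# Placeholder base URL; each participant would publish its own.
url = build_request_url(
    "http://example.org/opensearch",
    "meter:01 AND meter_qualifier:01-01",
)
print(url)
```

Encoding matters here because the Lucene-style query contains spaces and colons, which are not safe to send raw in a query string.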

The details of Lucene's query syntax are given in the Apache Lucene query parser documentation.

This proposal suggests implementing only a limited subset of the whole grammar, namely:

  • simple field-value pair
    [field]:[value]
    meaning: the record has a field named field with value as its value
    SQL equivalent: field = "value"

  • boolean AND, OR, NOT between field-value pairs
    [field1]:[value1] AND [field2]:[value2]
    meaning: the record has a field1 field with value1 as its value, and another field2 field with value2 as its value
    SQL equivalent: field1 = "value1" AND field2 = "value2"

  • boolean AND, OR, NOT within one field
    [field]:([value1] AND [value2])
    meaning: the record has a field field with both value1 and value2 as its values
    SQL equivalent: field = "value1" AND field = "value2" for a repeatable field (the OR form corresponds to field IN ("value1", "value2"))
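The three forms above are easy to generate programmatically. The helpers below are a hypothetical sketch (the names `pair`, `combine` and `within_field` are ours, not part of any library); values containing whitespace are wrapped in double quotes, which is how Lucene's syntax expresses a phrase.

```python
OPERATORS = ("AND", "OR", "NOT")


def pair(field: str, value: str) -> str:
    """Form 1: [field]:[value]; quote values that contain whitespace."""
    if any(ch.isspace() for ch in value):
        value = '"' + value + '"'
    return f"{field}:{value}"


def combine(operator: str, *clauses: str) -> str:
    """Form 2: join field-value pairs with a boolean operator."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {operator} ".join(clauses)


def within_field(field: str, operator: str, *values: str) -> str:
    """Form 3: [field]:([value1] OP [value2])."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f"{field}:(" + f" {operator} ".join(values) + ")"


print(combine("AND", pair("meter", "01"), pair("meter_qualifier", "01-01")))
print(within_field("language", "OR", "fr", "hu"))
print(pair("metrical_scheme", "12 16"))
```

Restricting the builder to these three shapes mirrors the proposal: each participating database only has to translate a small, predictable subset of the grammar.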

All of this concerns the formal structure of the query, but we also have to define a semantic structure: an initial set of fields and possible values, as a kind of common vocabulary for the concepts described in poetic databases.


We defined an initial structure. This can be extended in a later phase of the project. We tried to find those concepts which are common to the databases used in our pilot. In the design of the vocabulary we had two rules: 1) it should be language agnostic where possible, so where we applied categories, we denoted them by numerical values; 2) it should be extendable later. We have a two-level hierarchy: some elements have qualifiers; for example, we can distinguish between subcategories of Graeco-Roman metrical versification.

In the tables the header contains the field names. In the body of each table the first column, or the first two columns, contain the possible values; the last column contains the meaning of the field value.


meter meter_qualifier
01 Graeco-Roman Metrical Versification
01-01-01 hexameter – one verse
01-01-02 hexameter – several verses
01-02-01 distich – one
01-02-02 distichs – several
01-03 Graeco-Roman metrical poetry (classical meter, different from hexameter or pentameter)
01-04 Graeco-Roman metrical versification – new meters without classical antecedents
02 syllabic
03 tendency to be syllabic
04 tonic
05 each word is a foot
06 free verse
07 syllabo-tonic
07-01 German or English syllabo-tonic versification
07-02 Graeco-Roman Metrical Versification combined with strict syllabism
08 Mixed Compositions (different parts of the text in different metrical systems)


?searchTerms=meter:01 AND meter_qualifier:01-01 &startIndex=1


segmentation segmentation_qualifier
01 strophic – more than one stanza




02 strophic – one strophe
03 rhyming couplets
04 laisses
05 rimes couées, serventese
06 terza rima


?searchTerms=segmentation:01 AND segmentation_qualifier:02 &startIndex=1


rhyme  rhyme_qualifier  meaning
01                      No end-rhymes
       01               alliterating, non-rhyming
       02               non-alliterating, non-rhyming
02                      rhyming
03                      assonanced
04                      word-refrain rhyming


?searchTerms=rhyme:01 AND rhyme_qualifier:02 &startIndex=1

Rhyming Structure of the Stanza

The field name is rhyme_scheme. It contains the rhyming structure as free text in a scholarly accepted notation (such as AABA).



Metrical Structure (verse length)

The field name is metrical_scheme. It contains the metrical structure as free text in a scholarly accepted notation (such as 12 16).


?searchTerms=metrical_scheme:12 16&startIndex=1

Declination of line

01 rythme de vers descendant
02 rythme de vers ascendant
03 critere non applicable



Gonic Structure – level of the poem

01 homogonical
02 heterogonical



Gonic Structure

The field name is declination_scheme. It contains the gonic structure as free text in a scholarly accepted notation, i.e. one or more 'M', 'm', 'F', or 'f' characters, where 'M' and 'm' mean masculine rhyme, 'F' and 'f' mean feminine rhyme, and uppercase characters denote the beginning of a strophe.
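To make the notation concrete, here is a small sketch that validates such a string and splits it into strophes at each uppercase letter. The function name `split_strophes` and the sample scheme are illustrative assumptions only.

```python
from typing import List


def split_strophes(scheme: str) -> List[str]:
    """Split a gonic scheme into strophes; an uppercase letter opens a new strophe."""
    if not scheme or any(ch not in "MmFf" for ch in scheme):
        raise ValueError("scheme may only contain the characters M, m, F and f")
    strophes: List[str] = []
    for ch in scheme:
        if ch.isupper() or not strophes:
            strophes.append(ch)   # start a new strophe
        else:
            strophes[-1] += ch    # continue the current strophe
    return strophes


# Invented sample: two four-line strophes.
print(split_strophes("MfmfFmfm"))
```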



Number of lines

The field name is number_of_lines. It contains a number denoting the number of lines.



Number of strophes

The field name is number_of_strophes. It contains a number denoting the number of strophes.




The field name is author. It contains free text denoting the author of the poem.




date date_qualifier
[ISO date format] ['before'|'after'|'circa'|'between']


?searchTerms=date:1321-00-00&startIndex=1 (the year 1321)
?searchTerms=date:1321-01-00&startIndex=1 (January, 1321)
?searchTerms=date:1321-01-01&startIndex=1 (1st of January, 1321)
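The three examples suggest that 00 stands for an unknown month or day. Under that reading, a receiving database could interpret a date value roughly as follows; the function names `parse_partial_date` and `precision` are hypothetical.

```python
from typing import Optional, Tuple


def parse_partial_date(value: str) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (year, month, day); parts given as 00 become None."""
    year_s, month_s, day_s = value.split("-")
    month, day = int(month_s), int(day_s)
    return int(year_s), month or None, day or None


def precision(value: str) -> str:
    """Report how precise the date is: 'year', 'month' or 'day'."""
    _, month, day = parse_partial_date(value)
    if month is None:
        return "year"
    if day is None:
        return "month"
    return "day"


for sample in ("1321-00-00", "1321-01-00", "1321-01-01"):
    print(sample, parse_partial_date(sample), precision(sample))
```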


melody  melody_qualifier  meaning
01                        poem was sung
        01                has musical notation
        02                has no musical notation
02                        poem was not sung
03                        undecidable




The field name is genre. It contains the genre of the poem. It should refer to a genre classification that is yet to be elaborated.


The field name is caesuras. It contains the free text description of caesuras in the poem.


language language_qualifier Language
[text: ISO 639-1, 639-2, and 639-3 language codes] (repeatable) one language
01 sporadic bilinguism
02 change language by verses
03 change language by strophes
04 the refrain and body of the strophe are in different languages

Interstrophical relations – level of rhymes

01 coblas singulars
02 coblas unissonans
03 coblas doblas
04 coblas ternas
05 coblas alternas

Interstrophical relations - primary level note

The field name is interstrophical_relations_level1_note. It contains the free text note related to the previous field.

Interstrophical relations - secondary level

01 coblas capcaudadas
02 coblas capfinidas
03 coblas capdenals (niveau des strophes)
04 rimes constantes
05 acrostichon
06 telestichon
07 prayer with glosses
08 alphabetical poem
09 coblas retrogradadas
10 dialogue (the participants recite the strophe in alternance)
11 cantio cum auctoritate

Interstrophical relations - secondary level note

The field name is interstrophical_relations_level2_note. It contains the free text note related to the previous field.


refrain refrain_qualifier
01 without refrain
02 with refrain
02-01-01 identical refrain
02-01-02 variation at the beginning
03 with (a joint) refrain
03-01-01 initial refrain
03-01-02 not initial refrain
04 multiple refrains


Each OpenSearch implementor should publish its implementation via a descriptor file so that the search engine understands the implementation details it supports. The descriptor file is described in detail in the OpenSearch standard. Here we show an example, RPHA's description file:

   RPHA Web Search
   RPHA Web Search
   RPHA OpenSearch interface
   Seláf Levente, Király Péter
   rpha poems web
   Creative Commons



The base structure of the response conforms to Atom RSS. The top-level element holds some header fields, which contain information relevant to the whole response, and a number of child elements for the individual results. In the header part of the response there are some important elements:

  • opensearch:totalResults: the total number of results
  • opensearch:startIndex: the index of the first element of the returned part of the hit list (important: the first element's index is 1, not 0)
  • opensearch:itemsPerPage: the number of records in one response
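These three header values (total number of results, 1-based index of the first returned hit, and number of records per response) are enough for the federated engine to page through a hit list. A minimal sketch, with a hypothetical helper name:

```python
from typing import Optional


def next_start_index(total_results: int, start_index: int, items_per_page: int) -> Optional[int]:
    """Return the startIndex of the next page, or None after the last page.

    Indexes are 1-based, as required for the response header.
    """
    candidate = start_index + items_per_page
    return candidate if candidate <= total_results else None


# Walk through a hit list of 25 records, 10 per response.
index: Optional[int] = 1
while index is not None:
    print("request startIndex =", index)
    index = next_start_index(25, index, 10)
```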

In each result element, implementors should provide three elements in a project-specific way:

  1. the element should contain the identifier of the repository, and the identifier of the record separated by a space character. For example: RPHA 0373
  2. the element should contain the URL of the record
  3. the element should make use of fields defined inside the project's own namespace (which is in the sample implementation). The fields are the same as those we use in the query terms.
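On the client side, the combined identifier from point 1 can be taken apart again with a single split. This is a sketch with a hypothetical function name; the record number is kept as a string so that leading zeros survive.

```python
from typing import Tuple


def split_record_id(value: str) -> Tuple[str, str]:
    """Split 'RPHA 0373' into the repository identifier and the record identifier."""
    repository, record = value.split(" ", 1)
    return repository, record


print(split_record_id("RPHA 0373"))
```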

An example:

  RPHA results
  RPHA results
    RPHA 0373
       Emlékezzünk, én uraim, régen lett dologról
       Rusztán császár históriája
       Drávamelléki Névtelen
    RPHA 0381
       Én lelkecském, búdosócskám, hízelkedőcském
       Magyari István
    RPHA 2052
       Az elefánt nagy, mégis megöletik
       Bornemisza Péter?
    RPHA 2053
       Én császár nem lennék
       Bornemisza Péter?
       6 6 6 7
    RPHA 1340
       Szólok szerelem dolgáról nektek
       Paris és Görög Ilona históriája
       Lévai Névtelen
    RPHA 2054
       Bújdosó édes lelkecském
       8 8 910 9
    RPHA 3216
       Ó, Istennek teste édesség, e világnak oltalma
       Könyörgés a kenyér színe alatt jelenlévő Krisztushoz
    RPHA 3211
       Krisztus feltámada menten nagy kínjából
       Húsvéti népének
       6 7 7 7 4
    RPHA 3209
       Jephtes históriája
       Balassi Bálint
    RPHA 3202
       Az újesztendő kezdessék tőled, Úristen

In these items you can see that the field names are the same as those described in the vocabulary section of this paper. There are some minor differences, however: the federated genre classification has not been created, so RPHA uses its own classification, and the date does not fully conform to the ISO date standard.


The MegaRep source code is available as Open Source software, and a working implementation is online. The RPHA source code is also available, along with its OpenSearch endpoint. Both implementations were written in Java using the Apache Struts 1.0 framework. MegaRep contains translation files for English, French and Hungarian, so both search and record retrieval are available in all three languages.


The background of this proposal, i.e. the services and the data dictionary, was created 5 years ago, but it was never documented other than in a bunch of spreadsheets and readme files. Recently I had to find the documentation for my work on RPHA, and unexpectedly I also found MegaRep's files, so I thought it was high time to write this proposal, even if I think some parts are outdated in the light of the advances of TEI and Linked Data. There is a Hungarian proverb: it is better to do it late than never. So while we do not come with an up-to-date proposal, here you can read and use this one. If you have any suggestions, please write to us.


Mozilla Festival Day 2: Notes from Having Fun and Sharing Gratitude in Distributed Online Communities / Cynthia Ng

Interesting session on Having Fun and Sharing Gratitude in Distributed Online Communities. Here are some notes. Facilitators J.Nathan Matias (MIT Media Lab / Awesome Knowledge Foundation) @natematias – research on gratitude Vanessa Gennarelli (P2PU) – build communities online Fewer options to celebrate things together in distributed communities. Examples: Yammer Praise KudoNow (performance review) Wikipedia Thanks (for […]

Let’s imagine a creative format for Open Access / Open Knowledge Foundation

This post is part of our Open Access Week blog series to highlight great work in Open Access communities around the world. It is written by Celya Gruson-Daniel from Open Knowledge France and reports from “Open Access Xsprint”, a creative workshop held on October 20 in the biohackerspace La Paillasse in Paris – as announced here.

More and more information is available online about Open Access. However, it is difficult to process all this content when one is a busy PhD student or researcher. Moreover, the main audience often consists of people who are already informed and convinced. The question thus becomes: how do we spread the word about Open Access to a large audience (researchers and students, but also people who are not directly concerned)? With the HackYourPhD community, we have been developing initiatives to invent new creative formats and to raise curiosity and/or interest about Open Access. Open Access Week was a perfect occasion to propose workshops to experiment with those kinds of formats.

An Open Access XSprint at La Paillasse

During Open Access Week, HackYourPhD and Sharelex designed a creative workshop called the Open Access Xsprint (X standing for media). The evening was held on October 20 in the biohackerspace La Paillasse in Paris with the financial support of a Generation Open Grant (Right to Research Coalition).

The main objective was to produce appealing guidelines about the legal aspects and issues of Open Access through innovative formats such as livesketching or comics. HackYourPhD has been working with Sharelex on this topic for several months. Sharelex aims at providing access to the law to everyone through a collaborative workshop and forum. A first set of content has been produced in French and was used during the Open Access XSprint.

One evening to invent creative formats about Open Access

The session brought together illustrators, graphic designers, students and researchers. After a short introduction to get to know each other, the group discussed the meaning of Open Access and its definition. The first livesketches and illustrations emerged.


Next, two groups were formed. One group worked on the different meanings of Open Access with a focus on the Creative Commons licences.


The other group discussed the development of the different Open Access models and their evolution (Green Open Access, 100% Gold Open Access, hybrid journal, Diamond, Platinum). The importance of evaluation was raised. It appears to be one of the brakes on the Open Access transition.

After an open buffet, each group presented its work. A future project was proposed. It will consist of personalizing a scientific article and inventing its different "lives": an ingenious way to present the different Open Access models.


Explore also our storify “Open Access XSprint”

Next Step: Improvisation Theatre and Open Access

To conclude the Open Access Week, another event will be organized on October 24 in a science center (Espace Pierre Gilles de Gennes) with HackYourPhD and Sharelex, and the financial support of Couperin/FOSTER.

This event aims at exploring new formats for communicating about Open Access. An improvisational theatre company will take part. The presentations of different speakers about Open Access will be interspersed with short improvisations. The main topic of the evening will be the stereotypes or false ideas about Open Access. Bringing an entertaining and original view is a way to discuss Open Access with a broad public, and maybe a starting point to help them become curious and continue exploring this crucial topic for researchers and all citizens.

Creative Commons License: This work is made available under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.

Nature-branded journal goes Open Access-only: Can we celebrate already? / Open Knowledge Foundation

This post is part of our Open Access Week blog series to highlight great work in Open Access communities around the world. It is written by Miguel Said from Open Knowledge Brazil and is a translated version of the original that can be found on the Brazilian Open Science Working Group's blog.

Nature Publishing Group reported recently that in October, its Nature Communications journal will become open access only: all articles published after this date will be available for reading and re-using, free of charge (by default they will be published under a Creative Commons Attribution license, allowing virtually every type of use). Nature Communications was a hybrid journal, publishing articles with the conventional, proprietary model, or as open access if the author paid a fee; but now it will be exclusively open access. The publishing group that owns Science recently also revealed an open access only journal, Science Advances – but with a default CC-NC license, which prevents commercial usages.

So we made it: the greatest bastions of traditional scientific publishing are clearly signaling support for open access. Can we pop the champagne already?

This announcement obviously has positive aspects: for example, lives can be saved in poor countries where doctors may have access to the most up-to-date scientific information – information that was previously behind a paywall, unaffordable for most of the Global South. Papers published under open access also tend to achieve more visibility, and that can benefit the research in countries like Brazil, where I live.

The overall picture, however, is more complex than it seems at first sight. In both cases, Nature and Science adopt a specific model of open access: the so-called "gold model", where publication in journals is usually subject to a fee paid by authors of approved manuscripts (the article processing charge, or APC). In this model, access to articles is thus open to readers and users, but access to the publication space is closed, in a sense, being only available to the authors who can afford the fee. In the case of Nature Communications, the APC is $5000, certainly among the highest in any journal (in 2010, the largest recorded APC was US $ 3900 – according to the abstract of this article… which I cannot read, as it is behind a paywall).

This amounts to two months of the net salary of a professor in state universities in Brazil (those in private universities would have to work even longer, as their pay is generally lower). Who is up for spending 15%+ of their annual income to publish a single article? Nature reported that it will waive the fee for researchers from a list of countries (which does not include Brazil, China, India, Pakistan and Libya, among others), and for researchers from elsewhere on a "case by case" basis – but they did not provide any further objective information about this policy. (I suspect it is better not to count on the generosity of a publisher that charges us $32 to read a single article, or $18 for a single piece of correspondence [!] from its journals.)

On the other hand, the global trend seems to be that the institutions with which researchers are affiliated (the universities where they work, or the scientific foundations that fund their research) bear part of these charges, partly because of the value these institutions attach to publishing in high-impact journals. In Brazil, for example, FAPESP (one of the largest research foundations in Latin America) provides a specific line of funding to cover these fees, and also considers them as eligible expenses for project grants and scholarships. As it happens, however, the funds available for this kind of support are limited, and in general they are not awarded automatically; in the example of FAPESP, researchers compete heavily for funding, and one of the main evaluation criteria is – as in so many situations in academic bureaucracy today – the researcher's past publication record:

Analysis criteria [...] a) Applicant's Academic Record a.1) Quality and regularity of scientific and / or technological production. Important elements for this analysis are: list of publications in journals with selective editorial policy; books or book chapters [...]

For this reason, the payment of APCs by institutions has a good chance of feeding the so-called "cumulative advantage" feedback loop, in which researchers who are already publishing in major journals get more money and more chances to publish, while the underfunded remain that way.

The advancement of open access via the gold model also involves another risk: the proliferation of predatory publishers. They are the ones that make open access publishing (with payment by authors or institutions) a business where profit is maximized through the drastic reduction of quality standards in peer review – or even the virtual elimination of any review: if you pay, you are published. The risk is that on the one hand, predatory publishing can thrive because it satisfies the productivist demands imposed on researchers (whose careers are continually judged under the light of the publish or perish motto); and on the other hand, that with the gold model the act of publishing is turned into a commodity (to be sold to researchers), marketable under high profit rates - even without the intellectual property-based monopoly that was key to the economic power mustered by traditional scientific publishing houses. In this case, the use of a logic that treats scientific articles strictly as commodities results in pollution and degradation of humankind's body of scientific knowledge, as predatory publishers are fundamentally interested in maximizing profits: the quality of articles is irrelevant, or only a secondary factor.

Naturally, I do not mean to imply that Nature has become a predatory publisher; but one should not ignore that there is a risk of a slow corruption of the review process (in order to make publishing more profitable), particularly among those publishing houses that are "serious" but do not have as much market power as Nature. And, as we mentioned, on top of that is the risk of proliferation of bogus journals, in which peer review is a mere facade. In the latter case, unfortunately this is not a hypothetical risk: the shady "business model" of predatory publishing has already been put in place in hundreds of journals.

Are there no alternatives to this commodified, market-oriented logic currently in play in scientific publishing? Will this logic (and its serious disadvantages) always be dominant, regardless of whether the journal is "proprietary" or open access? Well, not necessarily: even within the gold model, there are promising initiatives that do not adhere strictly to this logic – that is the case of the Public Library of Science (PLOS), an open access publishing house that charges for publication, but works as a nonprofit organization; because of that, it has no reason to eliminate quality criteria in the selection of articles in order to obtain more profits from APCs. Perhaps this helps explain the fact that PLOS has a broader and more transparent fee waiver policy for poor researchers (or poor countries) than the one offered by Nature. And finally, it is worth noting that the gold model is not the only open access model: the main alternative is the "green model", based on institutional repositories. This model involves a number of challenges regarding coordination and funding, but it also tends not to follow a strictly market-oriented logic, and to be more responsive to the interests of the academic community. The green model is hardly a substitute for the gold one (not least because it is not designed to cover the costs of peer review), but it is important that we join efforts to strengthen it and avoid a situation where the gold model becomes the only way for scientists and scholars in general to release their work under open access.

(My comments here are directly related to my PhD thesis on commons and commodification, where these issues are explored in a bit more detail – especially in the Introduction and in Chapter 4, pp. 17-20 and 272-88; unfortunately, it's only available in Portuguese as of now. This post was born out of discussions in the Brazilian Open Science Working Group's mailing list; thanks to Ewout ter Haar for his help with the text.)

Mozilla Festival Day 1: Closing Keynotes / Cynthia Ng

We ended the first day with closing plenary featuring numerous people. Marc Surman was back on stage to help set the context of the evening talks. 10 5 minute talks, relay race. Mobile and the Future Emerging Markets and Adoption Chris Locke emerging markets in explosion of adoption of mobile social good example: mobile to […]

Citations get HOT / Karen Coyle

The Public Library of Science research section, PLOS Labs, has announced some very interesting news about the work that they are doing on citations, which they are calling "Rich Citations".

Citations are the ultimate "linked data" of academia, linking new work with related works. The problem is that the link is human-readable only and has to be interpreted by a person to understand what the link means. PLOS Labs have been working to make those citations machine-expressive, even though they don't natively provide the information needed for a full computational analysis.

Given what one does have in a normal machine-readable document with citations, they are able to pull out an impressive amount of information:
  • What section the citation is found in. There is some difference in meaning whether a citation is found in the "Background" section of an article, or in the "Methodology" section. This gives only a hint to the meaning of the citation, but it's more than no information at all.
  • How often a resource is cited in the article. This could give some weight to its importance to the topic of the article.
  • What resources are cited together. Whenever a sentence ends with "[3][7][9]", you at least know that those three resources equally support what is being affirmed. That creates a bond between those resources.
  • ... and more
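The signals listed above can be illustrated with a minimal Python sketch. It is not PLOS Labs' implementation or their API: the function name `citation_features`, the input shape (a dict of section title to text), and the bracketed numeric marker format are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Assumed marker style: a run of adjacent numeric markers such as "[3][7][9]".
GROUP = re.compile(r"(?:\[\d+\])+")
MARKER = re.compile(r"\[(\d+)\]")


def citation_features(sections):
    """Collect simple citation signals.

    sections: dict mapping section title -> section text (hypothetical shape).
    Returns (counts, found_in, co_cited).
    """
    counts = Counter()            # how often each reference is cited
    found_in = defaultdict(set)   # which sections cite each reference
    co_cited = Counter()          # how often two references share one marker run
    for title, text in sections.items():
        for run in GROUP.finditer(text):
            refs = [int(n) for n in MARKER.findall(run.group())]
            for ref in refs:
                counts[ref] += 1
                found_in[ref].add(title)
            # Every pair inside one run is treated as cited together.
            for pair in combinations(sorted(set(refs)), 2):
                co_cited[pair] += 1
    return counts, found_in, co_cited


# Made-up article text, for illustration only.
article = {
    "Background": "Earlier surveys disagree [3][7][9]. One was later revised [3].",
    "Methodology": "We reuse the sampling procedure of [7].",
}
counts, found_in, co_cited = citation_features(article)
```

Here `counts` gives a rough weight for each reference, `found_in` records the section context, and `co_cited` captures the bond between references that support the same sentence.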
As an open access publisher, they also want to be able to take users as directly as possible to the cited resources. For PLOS publications, they can create a direct link. For other resources, they make use of the DOI to provide links. Where possible, they reveal the license of cited resources, so that readers can know which resources are open access and which are pay-walled.

This is just a beginning, and their demo site, appropriately named "alpha," uses their rich citations on a segment of the PLOS papers. They also have an API that developers can experiment with.

I was fortunate to be able to spend a day recently at their Citation Hackathon where groups hacked on ongoing aspects of this work. Lots of ideas floated around, including adding abstracts to the citations so a reader could learn more about a resource before retrieving it. Abstracts also would add search terms for those resources not held in the PLOS database. I participated in a discussion about coordinating Wikidata citations and bibliographies with the PLOS data.

Being able to datamine the relationships inherent in the act of citation is a way to help make visible and actionable what has long been the rule in academic research, which is to clearly indicate upon whose shoulders you are standing. This research is very exciting, and although the PLOS resources will primarily be journal articles, there are also books in their collection of citations. The idea of connecting those to libraries, and eventually connecting books to each other through citations and bibliographies, opens up some interesting research possibilities.

Open Access Week in Nepal / Open Knowledge Foundation

This post is part of our Open Access Week blog series to highlight great work in Open Access communities around the world.

Open Access Week was celebrated for the first time in Nepal on the week's opening two days, October 20 and 21. The event, which was led by the newly founded Open Access Nepal and supported by EIFL and R2RC, featured a series of workshops, presentations, and peer-to-peer discussions and training by country leaders in Open Access, Open Knowledge, and Open Data, including a 3-hour workshop on Open Science and Collaborative Research by Open Knowledge Nepal on the second day.

Open Access Nepal is a student-led initiative whose members are mostly MBBS students. Most of the audience at the Open Access Week celebrations here was therefore made up of medical students, but engineering students, management students, librarians, professionals, and academics were also well represented. Participants discussed open access developments in Nepal and their roles in promoting and advancing open access.

EIFL and the Right to Research Coalition provided financial support for Open Access Week in Nepal. EIFL Open Access Program Manager Iryna Kuchma attended the conference as a speaker and workshop facilitator.

Open Knowledge Nepal hosted an interactive session on Open Science and Collaborative Research on the second day. The session was led by Kshitiz Khanal, Team Leader of Open Access / Open Science for Open Knowledge Nepal, with support from Iryna Kuchma and Nikesh Balami, Team Leader of Open Government Data. About 8-10 of the country's Open Access experts were present in the hall to assist participants. The session began half an hour before lunch: participants were first asked to brainstorm, until lunch was over, about what they think Open Science and Collaborative Research are, and about the challenges relevant to Open Access that they have faced or might face in their research endeavors. The participants were seated at round tables in groups of 7-8 persons, making a total of 5 groups.

After lunch, one member from each group took turns at the front to present a summary of their brainstorming on colored chart paper. Participants came up with near-exact definitions and reflected on the troubles researchers in the country have been facing regarding Open Access. As we can expect of industrious students, some groups impressed the session hosts and experts with interesting graphical illustrations.

Iryna followed the group presentations with her own, in which she introduced the concept, principles, and examples of Open Science. Kshitiz then gave his presentation on Collaborative Research.

The session on Collaborative Research featured industry-academia collaborations facilitated by government. Collaborative Research needs more attention in Nepal, as World Bank data for Nepal show that total R&D investment is equivalent to only 0.3% of GDP. The Lambert Toolkit, created by the Intellectual Property Office of the UK, was also discussed. The toolkit provides sample agreements for industry-university collaborations and multi-party consortiums, and a few decision guides for such collaborations. The session also introduced version control and discussed simple web-based tools for Collaborative Research such as Google Docs, Etherpads, Dropbox, Evernote, and Skype.

On the same day, Open Nepal also hosted a workshop about open data, and the organizers hosted a session on the Open Access Button. Sessions on the previous day introduced the audience to Open Access, Open Access repositories, and growing Open Access initiatives all over the world.

This event dedicated to Open Access in Nepal was well received by the Open Communities of Nepal, which have mostly concerned themselves with Open Data, Open Knowledge, and Open Source Software. A new audience became aware of the philosophy of Open. This author believes the event was a success story.

IL2014: More Library Mashups Signing/Talk / Nicole Engard

I’m headed to Monterey for Internet Librarian this weekend. Don’t miss my talk on Monday afternoon followed by the book signing for More Library Mashups.

From Information Today Inc:

This October, Information Today, Inc.’s most popular authors will be at Internet Librarian 2014. For attendees, it’s the place to meet the industry’s top authors and purchase signed copies of their books at a special 40% discount.

The following authors will be signing at the Information Today, Inc., on Monday, October 27 from 5:00 to 6:00 P.M. during the Grand Opening Reception

Book Signing

The post IL2014: More Library Mashups Signing/Talk appeared first on What I Learned Today....

Mozilla Festival Day 1: CC Tools for Makers / Cynthia Ng

Creative Commons folks hosted a discussion on barriers and possible solutions to publishing and using CC licensed content. Facilitators: Ryan Merkley (CEO), Matt Lee (Tech Lead), Ali Al Dallal (Mozilla Foundation). Our Challenge: our tech is old, user needs are unmet (can be confusing, don't know how to do attribution), focus on publishing vs. sharing […]