Planet Code4Lib

LibGuides, you’re not “Web 2.0” without an open API / Meredith Farkas

Springshare  LibGuides   Web 2.0 for Library 2.0

Update: I’ve been in touch with a Springshare representative who tells me that things like the contextually aware D2L widget from Portland State University will work in LibGuides 2.0 and apparently, the responses we’d received from support were based on hypotheticals (though we’d explicitly sent the link to PSU’s code in our emails to support). This is very good news, but I am dismayed that it takes a blog post to receive a straight answer, because what we’d heard from support originally was that there was a change in access to the API. What I do know for sure is that if you want to use JSON data and full access to the API, you will need to upgrade to the CMS product and that what they used to call an API wasn’t a true API. So there isn’t access to the full API with LibGuides 2.0, but apparently there never was, FWIW.

When you think of Web 2.0 (a term I know you know I dislike), what do you think of? Rounded corners? The read/write web? Social media? Collaboration? The wisdom of crowds? How about open APIs? Maybe that last one doesn’t come to mind for most people who aren’t web developers, but open APIs are critical to so many of the “2.0” web services we rely on. For those who don’t know what an API is, it is short for Application Programming Interface, and it is what allows developers to pull content or data from one web service into another. One application will make a call to the other through the API to pull in updated data/content regularly. It is the technology behind many incredible mashups out there (I’m particularly in love with those that layer data on top of Google Maps) and is how so many of our web services connect to one another. Use the wonderfully clever IFTTT (if this then that)? Won’t work without APIs. You don’t have to be a programmer to recognize the value of all that.
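For readers who want to see what such a call looks like, here is a minimal Ruby sketch of one application pulling data from another through a web API. It assumes a hypothetical endpoint (`https://api.example.edu/guides`) that returns a JSON array of records with `name` and `url` fields; the endpoint, the field names, and the method names are illustrative assumptions, not any real service's API.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical endpoint standing in for whatever service exposes the data.
GUIDES_ENDPOINT = URI("https://api.example.edu/guides")

# Turn the service's JSON reply into simple Ruby hashes another site can use.
# Assumes each record has "name" and "url" keys (an assumption of this sketch).
def parse_guides(json_text)
  JSON.parse(json_text).map do |record|
    { title: record.fetch("name"), url: record.fetch("url") }
  end
end

# The call through the API: an HTTP GET, then parse the response body.
# A consuming site would run this on a schedule to keep its copy current.
def fetch_guides(endpoint = GUIDES_ENDPOINT)
  parse_guides(Net::HTTP.get(endpoint))
end
```

`parse_guides` is kept separate from the network call so it can be tried on a sample response without contacting any server.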

I thought APIs were so integral to social software back in 2005 that I wrote about them in my book Social Software in Libraries. My writing on the topic was originally going to be an entire chapter (Chapter 2 to be precise), but the powers-that-be wanted it edited down, so it ended up in the chapter on “the future.” And, as predicted, APIs have become an increasingly important part of Web 2.0 services in the ensuing years.

LibGuides was originally marketed as the Web 2.0 version of subject and course guides. It offered a Web 2.0 look-and-feel as well as tools to gather user feedback. And it certainly made it easier for web design novices to develop decent-looking guides (and ugly ones too as I’m sure we’ve all seen). One great 2.0 feature of LibGuides was the open API, which allowed you to pull content from LibGuides into other websites, like the library website or a Learning Management System (like D2L, Blackboard, Canvas, etc.). This was what my colleague Mike at Portland State relied on to create our contextually aware D2L widget that connected students from their D2L course homepage directly to the appropriate course guide (where one existed) or subject guide to support their research. I led our adoption of LibGuides at PSU and this widget was one of the best things to come out of it.
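The widget's core decision (the course guide if one exists, otherwise the subject guide) can be sketched in a few lines of Ruby. This is not Mike's PSU code: the course-code format (`"WR 121"`), the lookup tables, and the URLs are hypothetical, and a real widget would fill the tables from LibGuides through the API rather than hard-coding them.

```ruby
# Hypothetical lookup tables; a real widget would populate these via the API.
COURSE_GUIDES = {
  "WR 121" => "https://guides.example.edu/wr121"
}.freeze

SUBJECT_GUIDES = {
  "WR"  => "https://guides.example.edu/writing",
  "BIO" => "https://guides.example.edu/biology"
}.freeze

# Return the best guide URL for a course such as "WR 121":
# the course guide when one exists, else the guide for its subject prefix,
# else nil (nothing to show).
def guide_for(course_code, course_guides: COURSE_GUIDES, subject_guides: SUBJECT_GUIDES)
  course_guides.fetch(course_code) do
    subject_prefix = course_code.split.first
    subject_guides[subject_prefix]
  end
end
```

In the LMS, the course code would come from the course homepage the student is viewing.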

I’m now in serious deja vu mode as I work with colleagues here at Portland Community College to implement LibGuides (they’d been using the open source Library a la Carte for many years). It’s actually great to have a second chance to implement LibGuides knowing what I know now and what I wish I’d done early on. Many of our students transfer to Portland State eventually, so the college tries to create a consistent experience for students wherever possible. Moving to LibGuides is another positive step in that direction and we planned to use the PSU widget to make our D2L instance even more in-step with PSU (and easier for students to use). However, all our plans went on hold when we were told by Springshare that the open API was not a part of their LibGuides offerings in LibGuides 2.0 (they told us we have access to Tools –> Widgets, “many of which replicate what you might recall as API in v1”). They did, however, happily inform us that it was a part of their CMS product, so if we were willing to give them more money, we could make the widget work.

LibGuides promoted itself early on as being different from other library vendors, yet this move is exactly what I’d expect from a vendor that knows they have a critical mass of customers, next-to-no competition, and knows users by and large won’t leave. So, in order to pull more customers into their more expensive product, they make a certain feature that was part of their cheaper product now only available in the more expensive one. This is similar to the move EBSCO made when they pulled certain critical history journals out of Academic Search Premier and made them only available in their America: History and Life and Historical Abstracts full text products (which mostly otherwise contained junk at that point). Do you really want to be on par with EBSCO, Springshare?

But the funny thing in this case is that I can’t find this information about the API no longer being open to LibGuides customers anywhere on their website. In fact, I find evidence of the opposite being true. So, if this is the case, it is not only a crappy thing to do, but, unless I’ve lost my ability to search a website, on nebulous legal ground, because people are making purchasing decisions based on the evidence on their website that they will have the same API access in LibGuides as in the CMS. As the person who promoted and helped get LibGuides adopted at two institutions, I am seriously pissed off.

Comparing LibGuides to CMS

My next American Libraries column is all about how we can’t be complacent with library vendors, some of whom continually change (and in many cases decrease) their offerings without decreasing their price tag. This is just another example of the sort of stuff that goes on all the time and that we should not accept without a fight. These companies cannot survive without us, yet they know that ditching them will be painful for the library and for our patrons. We have to find ways to flex our collective muscle even when it hurts (especially when we are part of large and powerful consortia… cough cough…Orbis Cascade Alliance… cough cough) to advocate for what will best serve our patrons. Otherwise, we are not acting as good advocates for our patrons nor good stewards of the funds we receive.

Image credit: From 2007 LibGuides website, which focused on how it brought “the benefits of Library 2.0 to your institution.” Courtesy of the Wayback Machine.

Steve Reich phase pieces with Sonic Pi / William Denton

The first two of the phase pieces Steve Reich made in the sixties, working with recorded sounds and tape loops, were It’s Gonna Rain (1965) and Come Out (1966), both of which are made of two loops of the same fragment of speech slowly going out of phase with each other and then coming back together as the two tape players run at slightly different speeds. I was curious to see if I could make phase pieces with Sonic Pi, and it turns out it takes little code to do it.

Here is the beginning of Reich’s notes on “It’s Gonna Rain (1965)” in Writings on Music, 1965–2000 (Oxford University Press, 2002):

Late in 1964, I recorded a tape in Union Square in San Francisco of a black preacher, Brother Walter, preaching about the Flood. I was extremely impressed with the melodic quality of his speech, which seemed to be on the verge of singing. Early in 1965, I began making tape loops of his voice, which made the musical quality of his speech emerge even more strongly. This is not to say that the meaning of his words on the loop, “it’s gonna rain,” were forgotten or obliterated. The incessant repetition intensified their meaning and their melody at one and the same time.


I discovered the phasing process by accident. I had two identical tape loops of Brother Walter saying “It’s gonna rain,” and I was playing with two inexpensive tape recorders—one jack of my stereo headphones plugged into machine A, the other into machine B. I had intended to make a specific relationship: “It’s gonna” on one loop against “rain” on the other. Instead, the two machines happened to be lined up in unison and one of them gradually started to get ahead of the other. The sensation I had in my head was that the sound moved over to my left ear, down to my left shoulder, down my left arm, down my leg, out across the floor to the left, and finally began to reverberate and shake and become the sound I was looking for—“It’s gonna/It’s gonna rain/rain”—and then it started going the other way and came back together in the center of my head. When I heard that, I realized it was more interesting than any one particular relationship, because it was the process (of gradually passing through all the canonic relationships) making an entire piece, and not just a moment in time.

The audio sample

First I needed a clip of speech to use. Something with an interesting rhythm, and something I had a connection with. I looked through recordings I had on my computer and found an interview my mother, Kady MacDonald Denton, had done in 2007 on CBC Radio One after winning the Elizabeth Mrazik-Cleaver Canadian Picture Book Award for Snow.

She said something I’ve never forgotten that made me look at illustrated books in a new way:

A picture book is a unique art form. It is the two languages, the visual and the spoken, put together. It’s sort of like a—almost like a frozen theatre in a way. You open the cover of the books, the curtain goes up, the drama ensues.

I noticed something in “theatre in a way,” a bit of rhythmic displacement: there’s a fast ONE-two-three one-two-THREE rhythm.

That’s the clip to use, I decided: the length is right (1.12 seconds), the four words are a little mysterious when isolated like that, and the rhythm ought to turn into something interesting.

Interesting bricks.

Phase by fractions

The first way I made a phase piece was not with the method Reich used but with a process simpler to code: here one clip repeats over and over while the other starts progressively later in the clip each iteration, with the missing bit from the start added on at the end.

The start and finish parameters specify where to clip a sample: 0 is the beginning, 1 is the end, and ratio grows from 0 to 1. I loop n+1 times to make the loops play one last time in sync with each other.
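Before the Sonic Pi code, the start/finish bookkeeping can be checked in plain Ruby. This sketch (the method name `fraction_schedule` is my own, not part of Sonic Pi) lists, for each of the n+1 passes, the two slices of the moving clip and how long the loop sleeps after each. The two sleeps always add up to one clip length, so both voices stay on the same grid while the moving voice's starting point advances by length/n per pass.

```ruby
# Model of the "phase by fractions" loop: just the numbers, no audio.
# length: clip duration in seconds; n: number of steps to go once around the clip.
def fraction_schedule(length, n)
  (0..n).map do |t|
    ratio = t.to_f / n # Float division, as in the Sonic Pi code
    {
      tail:   { start: ratio, finish: 1.0, sleep: length - length * ratio }, # rest of the clip
      head:   { start: 0.0, finish: ratio, sleep: length * ratio },          # wrapped-around beginning
      offset: length * ratio # how far the moving voice has shifted, in seconds
    }
  end
end
```

For the 1.12-second clip and n = 100, the moving voice shifts by about 0.0112 seconds per pass, and on the last pass (ratio = 1) its offset equals the whole clip, which puts it back in unison with the steady voice. The Sonic Pi version: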

full_quote = "~/music/sonicpi/theatre-in-a-way/frozen-theatre-full-quote.wav"

theatre_in_a_way = "~/music/sonicpi/theatre-in-a-way/frozen-theatre-theatre-in-a-way.wav"

length = sample_duration theatre_in_a_way

puts "Length: #{length}"

sample full_quote
sleep sample_duration full_quote
sleep 1

# Play the clip four times on its own, with a short gap between plays
4.times do
  sample theatre_in_a_way
  sleep length + 0.3
end

# Moving ahead by fractions of a second

n = 100

(n+1).times do |t|
  ratio = t.to_f/n # t is an Integer, but we need ratio to be a Float
  # This one never changes
  sample theatre_in_a_way, pan: -0.5
  # This one progresses through: first the rest of the clip from ratio onwards...
  sample theatre_in_a_way, start: ratio, finish: 1, pan: 0.5
  sleep length - length * ratio
  # ...then the missing bit from the start, added on at the end
  sample theatre_in_a_way, start: 0, finish: ratio, pan: 0.5
  sleep length*ratio
end

This is the result:

“Music as a gradual process”

A few quotes from Steve Reich’s “Music as a Gradual Process” (1968), also in Writings on Music, 1965–2000:

I do not mean the process of composition but rather pieces of music that are, literally, processes.

The distinctive thing about musical processes is that they determine all the note-to-note (sound-to-sound) details and the overall form simultaneously. (Think of a round or infinite canon.)

Although I may have the pleasure of discovering musical processes and composing the material to run through them, once the process is set up and loaded it runs by itself.

What I’m interested in is a compositional process and a sounding music that are one and the same thing.

When performing and listening to gradual musical processes, one can participate in a particular liberating and impersonal kind of ritual. Focusing in on the musical process makes possible that shift of attention away from he and she and you and me outwards toward it.

In Sonic Pi we do all this with code.

Mesopotamian wall cone mosaic at the Metropolitan Museum of Art in New York City.

Phase by speed

The second method is to run one loop a tiny bit faster than the other and wait for it to eventually come back around and line up with the fixed loop. This is what Reich did, but here we achieve the effect with code, not analog tape players.

The rate parameter controls how fast a sample is played (< 1 is slower, > 1 is faster), and if n is how many times we want the fixed sample to loop, then the faster loop will repeat every phased_length = length - (length / n) seconds and play the sample at rate (1 - 1/n.to_f) (the number needs to be converted to a Float for this to work). It needs to loop (n * length / phased_length) times to end up in sync with the steady loop. (Again I add 1 to play both clips in sync at the end as they did in the beginning.)

For example, if the sample is 1 second long and n = 100, then the phased sample would play at rate 0.99, repeat every 0.99 seconds, and play 101 times to end up, after 100 seconds (actually 99.99, but close enough), back in sync with the steady loop, which took 100 seconds to play 1 second of sound 100 times.

It took me a bit of figuring to realize I had to convert numbers to Float or Integer here and there to make it all work, which is why to_f and to_i are scattered around.
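The arithmetic above can be checked in plain Ruby before turning to the Sonic Pi code. This sketch (the method name `speed_phase_plan` is my own) reuses the formulas from the code and reports when each loop begins its final, in-sync play. It also shows one detail, assuming Sonic Pi's `rate:` stretches or shrinks playback time in inverse proportion (a rate below 1 plays lower and longer): each play at rate 0.99 sounds for about 1.0101 seconds while being retriggered every 0.99 seconds, so the phasing comes from the `sleep phased_length` timing rather than from the rate.

```ruby
# Plain-Ruby version of the "phase by speed" numbers (no audio).
# length: clip duration in seconds (a Float); n: plays of the steady loop before realignment.
def speed_phase_plan(length, n)
  phased_length = length - (length / n)              # period of the faster loop
  rate = 1 - 1 / n.to_f                               # to_f avoids integer division (1/100 == 0)
  phased_plays = ((n * length / phased_length) + 1).to_i
  {
    phased_length: phased_length,
    rate: rate,
    steady_plays: n + 1,
    phased_plays: phased_plays,
    steady_final_start: n * length,                         # steady loop's last play begins here
    phased_final_start: (phased_plays - 1) * phased_length, # faster loop's last play begins here
    sounding_length: length / rate                          # how long one play at this rate lasts
  }
end
```

For the 1-second example, `speed_phase_plan(1.0, 100)` gives 101 steady plays and 102 phased plays, with the final plays starting at 100 and about 99.99 seconds, matching the figures above. Since n * length / phased_length simplifies to n²/(n - 1), the play count does not depend on the clip length. The Sonic Pi version: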

full_quote = "~/music/sonicpi/theatre-in-a-way/frozen-theatre-full-quote.wav"

theatre_in_a_way = "~/music/sonicpi/theatre-in-a-way/frozen-theatre-theatre-in-a-way.wav"

length = sample_duration theatre_in_a_way

puts "Length: #{length}"

sample full_quote
sleep sample_duration full_quote
sleep 1

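# Play the clip four times on its own, with a short gap between plays
4.times do
  sample theatre_in_a_way
  sleep length + 0.3
end

# Speed phasing

n = 100

phased_length = length - (length / n)

# Steady loop, running in its own thread alongside the phasing loop
in_thread do
  (n+1).times do
    sample theatre_in_a_way
    sleep length
  end
end

# Phasing loop: retriggers slightly sooner each time, so it drifts ahead
((n * length / phased_length) + 1).to_i.times do
  sample theatre_in_a_way, rate: (1 - 1/n.to_f)
  sleep phased_length
end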

This is the result:

Set n to 800 and it takes over fifteen minutes to evolve. The voice gets lost and just sounds remain.

“Time for something new”

In notes for “Clapping Music (1972)” (which I also did on Sonic Pi), Reich said:

The gradual phase shifting process was extremely useful from 1965 through 1971, but I do not have any thoughts of using it again. By late 1972, it was time for something new.

ALA responds to proposed changes to the Code of Federal Regulations / District Dispatch

Photo by Wknight94

On Monday, ALA submitted comments to the National Archives and Records Administration’s Administrative Committee of the Federal Register regarding the proposed changes to the Code of Federal Regulations (CFR).

As states, “The Federal Register is an official daily legal publication that informs citizens of: rights and obligations, opportunities for funding and Federal benefits, and actions of Federal agencies for accountability to the public” and the CFR contains “Federal rules that have: general applicability to the public, current and future effect as of the date specified”.

Both documents are important in aiding researchers and promoting a more transparent government. Given libraries’ role in providing public access to government information of all types, ALA was pleased to have the opportunity to submit comments.

The post ALA responds to proposed changes to the Code of Federal Regulations appeared first on District Dispatch.

National Library Legislative Day 2015 / District Dispatch

Lisa Rice visits with Rep. Brett Guthrie (R-KY), NLLD 2014

Good news! Registration for the 41st annual National Library Legislative Day is now open!

This two-day advocacy event brings hundreds of librarians, trustees, library supporters, and patrons to Washington, D.C. to meet with their Members of Congress to rally support for library issues and policies.

Registration information and hotel booking information are available on the ALA Washington Office website.

This year, National Library Legislative Day will be held May 4-5, 2015. Participants will receive advocacy tips and training, along with important issues briefings prior to their meetings.

First-time participants are eligible for a unique scholarship opportunity. The White House Conference on Library and Information Services Taskforce (WHCLIST) and the ALA Washington Office are calling for nominations for the 2015 WHCLIST Award. The award provides a stipend ($300 and two free nights at a D.C. hotel) to a non-librarian participant in National Library Legislative Day.

For more information about the WHCLIST award or National Library Legislative Day, visit Questions or comments can be directed to grassroots coordinator Lisa Lindle.

The post National Library Legislative Day 2015 appeared first on District Dispatch.

LC and OCLC Collaborate on Linked Data / HangingTogether

As we have said before, the Library of Congress and OCLC have been sharing information and approaches regarding library linked data. In a nutshell, we have two different use cases and strategies that we believe are compatible and complementary.

Now, in a white paper we have just co-published, we are beginning to share more details and evidence that this is the case.

This is actually just a high-level view of a more technical review of our approaches, and more details will be forthcoming in the months ahead. The Library of Congress’ main use case is to transition from MARC into a linked data world that will enable a much richer and more full-featured interface to library data. OCLC’s use case is to syndicate library data at scale into the wider web, as well as to enable richer online interactions for end-users.

OCLC is of course committed to enabling our member libraries to obtain the vital metadata they need for their work in appropriate formats, including BIBFRAME. This is one of the things we make clear in this paper.

As always, we want to know what you think. So download the paper, read it, and tell us what you think in the comments below or by email to the authors (their addresses are on the title page verso).

About Roy Tennant

Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.

A Life Worth Noting / Ed Summers

There are no obituaries for the war casualties that the United States inflicts, and there cannot be. If there were to be an obituary, there would have had to have been a life, a life worth noting, a life worth valuing and preserving, a life that qualifies for recognition. Although we might argue that it would be impractical to write obituaries for all those people, or for all people, I think we have to ask, again and again, how the obituary functions as the instrument by which grievability is publicly distributed. It is the means by which a life becomes, or fails to become, a publicly grievable life, an icon for national self-recognition, the means by which a life becomes noteworthy. As a result, we have to consider the obituary as an act of nation-building. The matter is not a simple one, for, if a life is not grievable, it is not quite a life; it does not qualify as a life and is not worth a note. It is already the unburied, if not the unburiable.

Precarious Life by Judith Butler (p. 34)

Why We Need to Encrypt The Whole Web… Library Websites, Too / LITA

The Patron Privacy Technologies Interest Group was formed in the fall of 2014 to help library technologists improve how well our tools protect patron privacy. Please enjoy this guest post by Alison Macrina, the first in a series of posts on technical matters concerning patron privacy.

When using the web for activities like banking or shopping, you’ve likely seen a small lock symbol appear at the beginning of the URL and noticed the “HTTP” in the site’s address switch to “HTTPS”. You might even know that the “s” in HTTPS stands for “secure”, and that all of this means that the website you’ve accessed is using the TLS/SSL protocol. But what you might not know is that TLS/SSL is one of the most important yet most underutilized internet protocols, and that all websites, not just those transmitting “sensitive” information, should be using HTTPS by default.

To understand why TLS/SSL is so important for secure web browsing, a little background is necessary. TLS/SSL is the colloquial way of referring to this protocol, but the term is slightly misleading – TLS and SSL are essentially different versions of a similar protocol. Secure Sockets Layer (SSL) was the first protocol used to secure applications over the web, and Transport Layer Security (TLS) was built from SSL as a standardized version of the earlier protocol. The convention of TLS/SSL is used pretty often, though you might see TLS or SSL alone. However written, it all refers to the layer of security that HTTP runs over; HTTPS is simply HTTP sent through a TLS/SSL connection. HTTP, or HyperText Transfer Protocol, is the protocol that governs how websites send and receive data, and how that data is formatted. TLS/SSL adds three things to HTTP: authentication, encryption, and data integrity. Let’s break down those three components:

Authentication: When you visit a website, your computer asks the server on the other end for the information you want to access, and the server responds with the requested information. With TLS/SSL enabled, your computer also reviews a security certificate that guarantees the authenticity of that server. Without TLS/SSL, you have no way of knowing if the website you’re visiting is the real website you want, and that puts you at risk of something called a man-in-the-middle attack, which means data going to and from your computer can be intercepted by an entity masquerading as the site you intended to visit.
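In code, authentication amounts to the client refusing to continue unless the server's certificate checks out. Here is a minimal sketch using Ruby's standard `net/http` and `openssl` libraries; the helper names and the example host are my own, and nothing touches the network until `fetch_securely` is called.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Build (but do not open) an HTTPS connection that insists on a certificate
# that verifies against the default set of trusted certificate authorities.
def build_verified_client(url)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true                          # speak HTTP inside a TLS/SSL connection
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER # reject certificates that fail verification
  http
end

# Fetch a page over the verified connection and return its body.
def fetch_securely(url)
  uri = URI(url)
  build_verified_client(url).start { |http| http.get(uri.request_uri).body }
end
```

If the certificate cannot be verified, for example because a machine in the middle presents its own, `fetch_securely` raises `OpenSSL::SSL::SSLError` instead of returning the page.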

Fig. 1: Clicking the lock icon next to a site with TLS/SSL enabled will bring up a window that looks like the one above. You can see here that Twitter is running on HTTPS, signed by the certificate authority Symantec. [Image courtesy Alison Macrina]

Fig. 2: Clicking “more information” in the first window will bring up this window. In the security tab, you can see the owner of the site, the certificate authority that verified the site, and the encryption details. [Image courtesy Alison Macrina]

Fig. 3: Lastly, clicking the “view certificate” option in the previous window will bring up even more technical details, including the site’s fingerprints and the certificate expiration date. [Image courtesy Alison Macrina]

Data encryption: Encryption is the process of scrambling messages into a secret code so they can only be read by the intended recipient. When a website uses TLS/SSL, the traffic between you and the server hosting that website is encrypted, providing you with a measure of privacy and protection against eavesdropping by third parties.
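To illustrate what scrambling a message means in practice, here is a sketch using AES-256 in GCM mode from Ruby's `openssl` library. It covers only the cipher step: in real TLS/SSL the browser and server first agree on a fresh key during a handshake, which this sketch skips by having both sides share a key generated locally. The method names are my own.

```ruby
require 'openssl'

# Encrypt a message with AES-256-GCM. Returns everything the recipient needs
# except the key itself, which must be agreed on separately (TLS negotiates it).
def encrypt_message(key, plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv       # fresh nonce for every message
  cipher.auth_data = ""       # no extra associated data in this sketch
  ciphertext = cipher.update(plaintext) + cipher.final
  { iv: iv, ciphertext: ciphertext, tag: cipher.auth_tag }
end

# Decrypt; raises OpenSSL::Cipher::CipherError if the key, nonce, or tag does not match.
def decrypt_message(key, box)
  decipher = OpenSSL::Cipher.new("aes-256-gcm")
  decipher.decrypt
  decipher.key = key
  decipher.iv = box[:iv]
  decipher.auth_tag = box[:tag]
  decipher.auth_data = ""
  decipher.update(box[:ciphertext]) + decipher.final
end
```

Without the key, the ciphertext is unreadable to anyone watching the network; with the wrong key, decryption fails outright rather than producing garbage.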

Data integrity: Finally, TLS/SSL sends a check value along with the data in transit, computed from the data with a secret key, so that the data sent between you and a TLS/SSL secured website cannot be tampered with or altered to add malicious code without the change being detected.
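The integrity check works along the lines of a message authentication code: a short value computed from the data and a shared secret key, which the receiver recomputes and compares. This sketch uses HMAC-SHA256 from Ruby's `openssl` library to show the idea; the method names are my own, and the exact way each TLS version attaches such values to its records differs from this simplified version.

```ruby
require 'openssl'

# Compute an authentication tag over the data with a key both ends share.
def mac_for(key, data)
  OpenSSL::HMAC.digest("SHA256", key, data)
end

# Receiver side: recompute the tag and compare it byte by byte,
# without stopping early, so timing does not reveal where a mismatch is.
def intact?(key, data, tag)
  expected = mac_for(key, data)
  return false unless expected.bytesize == tag.bytesize
  diff = 0
  expected.bytes.zip(tag.bytes) { |a, b| diff |= a ^ b }
  diff.zero?
end
```

Changing even one word of the data, or using a different key, makes `intact?` return false, which is how tampering in transit is caught.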

Authentication, encryption, and integrity work in concert to protect the data you send out over TLS/SSL enabled websites. In this age of widespread criminal computer hacking and overbroad surveillance from government entities like the NSA, encrypting the web against interception and tampering is a social necessity. Unfortunately, most of the web is still unencrypted, because enabling TLS/SSL can be confusing, and often some critical steps are left out. But the digital privacy rights advocates at the Electronic Frontier Foundation are aiming to change that with Let’s Encrypt, a free and automated way to deploy TLS/SSL on all websites, launching in Summer 2015. EFF has also built a plugin called HTTPS Everywhere, which forces TLS/SSL encryption on websites where the protocol is supported but not fully set up (a frequent occurrence).
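The idea behind HTTPS Everywhere can be sketched as a rewrite step: before a request goes out, an `http://` address for a site known to support HTTPS is switched to `https://`. This is a simplification under my own assumptions; the real add-on uses curated rulesets with per-site patterns, while here a plain list of hostnames (hypothetical examples) stands in for them.

```ruby
require 'uri'

# Hypothetical "ruleset": hosts known to serve the same site over HTTPS.
HTTPS_CAPABLE_HOSTS = ["catalog.example-library.org", "www.example-library.org"].freeze

# Return the URL to actually request: upgraded to https (default port) when the
# host is known to support it, otherwise unchanged.
def upgrade_url(url, hosts = HTTPS_CAPABLE_HOSTS)
  uri = URI.parse(url)
  return url unless uri.scheme == "http" && hosts.include?(uri.host.to_s.downcase)
  URI::HTTPS.build(host: uri.host, path: uri.path, query: uri.query, fragment: uri.fragment).to_s
end
```

Links to hosts without HTTPS support pass through untouched, which mirrors the add-on's behavior of upgrading only where a rule says it is safe.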

As stewards of information and providers of public internet access, librarians have a special duty to protect the privacy of our patrons and honor the public trust we’ve worked hard to earn. Just as we continue to protect patron checkout histories from unlawful snooping, we should be actively protecting the privacy of patrons using our website, catalog, and public internet terminals:

  • Start by enabling TLS/SSL on our library websites and catalog. Some instructions are here and here; if those are too confusing, Let’s Encrypt goes live this summer. If your website is hosted on a server that is managed externally, ask your administrator to set up TLS/SSL for you.
  • Install the HTTPS Everywhere add-on on all library computers. Tell your patrons what it is and why it’s important for their digital privacy.
  • Urge vendors, database providers, and other libraries to take a stand for privacy and start using TLS/SSL.

Privacy is essential to democratic institutions like libraries; let’s show our patrons that we take that seriously.

Alison Macrina is an IT librarian in Massachusetts and the founder of the Library Freedom Project, an initiative aimed at bringing privacy education and tools into libraries across the country. Her website doesn’t have any content on it right now, but hey, at least it’s using HTTPS! 

The inaugural in-person meeting of the LITA Patron Privacy Technologies Interest Group is at Midwinter 2015 on Saturday, January 31st, at 8:30 a.m. Everybody interested in learning about patron privacy and data security in libraries is welcome to attend! You can also subscribe to the interest group’s mailing list.

Why GitHub is Important for Book Publishing / Eric Hellman

How do you organize large numbers of people for a common purpose? For millennia, the answer has been some sort of hierarchical organization. An army, or a feudal system topped with a king. To reach global scale, these hierarchies propagated customs and codes for behavior: laws, religions, ideology. Most of what you read in history books is really the history of these hierarchies. It wasn't possible to orchestrate big efforts or harness significant resources any other way.

In the 20th century, mass media redistributed much of this organizational power. In politics, charismatic individuals could motivate millions of people independently of the hierarchies that maintain command and control. But for the most part, one hierarchy got swapped for another. In business, production innovations such as Henry Ford's assembly line needed the hierarchy to support the capital investments.

I think the history of the 21st century will be the story of non-hierarchical systems of human organization enabled by the Internet. From this point of view, Wikipedia is particularly important not only for its organization of knowledge, but because it demonstrated that thousands of people can be organized with extremely small amounts of hierarchy. Anyone can contribute, anyone can edit, and many do. Bitcoin, or whatever cryptocurrency wins out, won't be successful because of a hierarchy but rather because of a framework of incentives for a self-interested network of entities to work together. Crowdfunding will enable resources to coalesce around needs without large hierarchical foundations or financial institutions.

So let's think a bit about book publishing. Through the 20th century, publishing required a significant amount of investment in capital: printing presses, warehouses, delivery trucks, bookstores, libraries, and people with specialized skills and abilities. A few large publishing companies emerged along with big-box retailers that together comprised an efficient machine for producing, distributing and monetizing books of all kinds. The transition from print to digital has eliminated the need for the physical aspects of the book publishing machine, but the human components of that machine remain essential. It's no longer clear that the hierarchical organization of publishing is necessary for the organization of publishing's human effort.

I've already mentioned Wikipedia's conquest of encyclopedia publishing, by dint of its large scale and wide reach. But equally important to its success has been a set of codes and customs bound together in a suite of collaboration and workflow tools. Version tracking allows for easy reversion of edits. "Talk pages" and notifications facilitate communication and collaboration. (And edit-wars and page locking, but that's another bucket of fish.)

Most publishing projects have audiences that are too small or requirements too specific to support Wikipedia's anyone-can-edit-or-revert model of collaboration. A more appropriate model for collaboration in publishing  is one widely used for software development.

Modern software development requires people with different skills to work together. Book publishing is the same. Designers, engineers, testers, product managers, writers, and subject domain experts may each have an important role in creating a software application; authors, editors, proofreaders, illustrators, designers, subject experts, agents, and publicists may all work together on a book. Book publishing and software can be either open or proprietary. The team producing a book or a piece of software might number from one to a hundred. Books and programs can go into maintenance mode or be revised in new editions or versions. Translation into new languages happens for both. Assets from one project can be reused in other projects.

Open source software has been hugely successful over the past few decades. Along the way, an ecosystem of collaboration tools and practices has evolved to support both open source development and software development in general. Many aspects of this ecosystem have been captured in GitHub.

The "Git" in GitHub comes from git, an open source distributed version control system initially written by Linus Torvalds, the Linus behind Linux. It's fast, and it lets you work on a local code repository and then merge your changes with a repository stored somewhere else.

In just two sentences, I've touched on several concepts that may be foreign to many book publishing professionals. Microsoft Word's "track changes" is probably the closest that most authors get to a version control system. The big difference is that "track changes" is designed to facilitate collaboration between a maximum of two people. Git works easily with many contributors. A code "repository" holds more than just code; it can contain all the assets, documentation, and licenses associated with a project. And unlike "track changes", Git remembers the entire history of your project. Many book publishers still don't keep together all the assets that go into a book. And I'm guessing that publishers are still working on centralizing their asset stores instead of distributing them!

Git is just one of the useful aspects of GitHub. I think the workflow tools are perhaps more important. Developers talk about the workflow variants such as "git-flow" and "GitHub-flow", but the differences are immaterial to this discussion. Here's what it boils down to: Someone working on a project will first create a "feature branch", a copy of the repository that adds a feature or fixes a bug. When the new feature has been tested and is working, the changes will be "committed". Each set of changes is given an identifier and a message explaining what has been changed. The branch's developer then sends a "pull request" to the maintainers of the repository. A well crafted pull request will provide tests and documentation for the new feature. If the maintainers like the changes, they "pull" the changes into the main branch of the repository. Each of these steps is a push of a button on GitHub, and GitHub provides annotation, visualization and commenting tools that support discussions around each pull request, as well as issue lists and wiki pages.
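The identifiers mentioned above are SHA-1 hashes in a default git repository, and git names every stored file version the same way, which is part of how it remembers a project's whole history. Ruby's standard `digest` library can reproduce the name git gives a file's contents (a "blob"); commit identifiers are computed the same way over a short text record naming the snapshot, the parent commit, the author, and the message. The method name is my own.

```ruby
require 'digest'

# The identifier git assigns to a file's contents: SHA-1 over a short header
# ("blob", the size in bytes, a NUL byte) followed by the contents themselves.
def git_blob_id(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
end
```

Changing a single character in a chapter yields a completely different identifier, so every committed revision stays distinct and retrievable.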

The reason the workflow tools and the customs surrounding their use are so important is that anyone who has used them already knows how to participate in another project. For an excellent non-programming example, take a look at the free-programming-books repository, which is a basic list of programming books available online for free. As of today, 512 different people have contributed a total of 2,854 sets of changes to the repository, have expanded it to books in 23 languages, and have added free courses, screencasts and interactive tutorials. The maintainers enforce some basic standards and make sure that the list is free of pirated books and the like.

It's also interesting that there are 7,229 "forks" of free-programming-books. Each of these could be different. If the main free-programming-books repo disappears, or if the maintainers go AWOL, one of these forks could become the main fork. Or if one group of contributors wants to move the project in a different direction from the maintainers, it's easy to do.

Forking a book is a lot more common than you might think. Consider the book Robinson Crusoe by Daniel Defoe. OCLC's WorldCat lists 7,459 editions of this book, each one representing significantly more effort than a button push in a workflow system. It's common to have many editions of out-of-copyright books of course, but it's also becoming common for books developed with open processes. As an example, look at the repository for Amy Brown and Greg Wilson's Architecture of Open Source Applications.  It has 5 contributors, and has been forked 58 times. For another example of using GitHub to write a book, read Scott Chacon's description of how he produced the second edition of Pro Git. (Are you surprised that a founder of GitHub is using GitHub to revise his book about Git?)

There's another aspect of modern software engineering with GitHub support that could be very useful for book publishing and distribution. "Continuous integration" is essential for development of complex software systems because changes in one component can have unintended effects on other components. For that reason, when a set of changes is committed to a project, the entire project needs to be rebuilt and retested. GitHub supports this via "hooks". For example, a "post-commit" hook can trigger a build-test apparatus; hooks can even be used to automatically deploy the new software version into production environments. In the making of a book, the insertion of a sentence might necessitate re-pagination and re-indexing. With continuous integration, you can imagine the correction of a typo immediately resulting in changes in all the copies of a textbook for sale. (Or even the copies that had already been purchased!)
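
The point about re-pagination and re-indexing can be illustrated with a toy "build" step of the kind a hook might run after each commit. Everything here is hypothetical: `build_book`, the fixed words-per-page rule, and the index term list are invented for illustration, and a real pipeline would call a typesetting or ebook toolchain instead. The sketch shows why the whole book gets rebuilt: inserting one sentence early on pushes later text onto different pages, so the index has to be regenerated as well.

```python
import re

WORDS_PER_PAGE = 10  # hypothetical, kept tiny so the effect is visible


def paginate(text: str, words_per_page: int = WORDS_PER_PAGE) -> list[str]:
    """Split the text into fixed-size 'pages' of words (a stand-in for real layout)."""
    words = text.split()
    return [" ".join(words[i:i + words_per_page])
            for i in range(0, len(words), words_per_page)]


def build_index(pages: list[str], terms: list[str]) -> dict[str, list[int]]:
    """Map each index term to the (1-based) pages where it appears as a whole word."""
    index = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        index[term] = [n for n, page in enumerate(pages, start=1) if pattern.search(page)]
    return index


def build_book(text: str, terms: list[str]) -> dict:
    """Rebuild pages and index from source; a post-commit hook could run this."""
    pages = paginate(text)
    return {"pages": pages, "index": build_index(pages, terms)}


terms = ["island", "Friday"]
before = "Crusoe was shipwrecked alone on a remote island for years. Then one day he met Friday."
after = before.replace("years.", "years. He built a hut and kept a careful journal.")
print(build_book(before, terms)["index"])  # {'island': [1], 'Friday': [2]}
print(build_book(after, terms)["index"])   # {'island': [1], 'Friday': [3]}
```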

A number of startups have recognized the applicability of Git and GitHub to book publishing. Leanpub, GitBook, and Penflip are supporting GitHub backends for open publishing models; so far adoption has been most rapid in author communities that already "get" GitHub, for example, software developers. The company that is best able to teach a GitHub-like toolset to non-programmers will have a good and worthy business, I think.

As more people learn and exercise the collaboration culture of GitHub, new things will become possible. Last year, I became annoyed that I couldn't fix a problem I found with an ebook from Project Gutenberg. It seemed obvious to me that I should put my contributions into a GitHub repo so that others could easily make use of my work. I created a GitHub organization for "Project GitenHub". In the course of creating my third GitenHub book, I discovered that someone named Seth Woodward had done the same thing a year before me, and he had moved over a thousand Project Gutenberg texts onto GitHub, in the "GITenberg" organization. Since I knew how to contribute to a GitHub project, I knew that I could start sending pull requests to GITenberg to add my changes to its repositories. And so Seth and I started working together on GITenberg.

Seth has now loaded over 50,000 books from Project Gutenberg onto GitHub. (The folks at Project Gutenberg are happy to see this happening, by the way.) Seth and I are planning out how to make improved quality ebooks and metadata for all of these books, which would be impossible without a way to get people to work together. We put in a funding proposal to the Knight Foundation's NewsChallenge competition. And we were excited to learn that (as of Jan 1, 2015) the Text Creation Partnership has added 25,000 texts from EEBO (Early English Books Online) to GitHub. So it's an exciting time for books on GitHub.

There's quite a bit of work to do. Having 50,000 repositories in an organization strains some GitHub tools. We need to figure out how to explain the GitHub workflow to potential contributors who aren't software developers. We need to make bibliographic metadata more git-friendly. And we need to create a "continuous integration system" for building ebooks.
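
One common way to make metadata "git-friendly" (offered here as an illustration, not as the GITenberg plan) is to serialize each record deterministically, with one field per line and keys in a fixed order. That way, a correction shows up in a diff as a single changed line rather than a rewritten file. The record fields below are hypothetical; the sketch uses only Python's standard `json` and `difflib` modules.

```python
import difflib
import json


def to_git_friendly(record: dict) -> str:
    """Serialize a metadata record deterministically: sorted keys, one field per line."""
    return json.dumps(record, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


# Hypothetical record; the field names are illustrative, not a GITenberg schema.
record = {
    "title": "The Life and Adventures of Robinson Crusoe",
    "creator": "Defoe, Daniel",
    "language": "en",
    "subjects": ["Castaways -- Fiction", "Islands -- Fiction"],
}

old = to_git_friendly(record)
record["language"] = "eng"          # e.g. switch to an ISO 639-2 language code
new = to_git_friendly(record)

diff = difflib.unified_diff(
    old.splitlines(keepends=True), new.splitlines(keepends=True),
    fromfile="a/metadata.json", tofile="b/metadata.json",
)
print("".join(diff))
```

Because the keys are sorted, two people who edit different fields of the same record touch different lines, which makes Git's line-based merging far more likely to succeed without conflicts.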

Who knows, it might work.

People Don't Read on the Web / Library Tech Talk (U of Michigan)

How much do people actually read on the web? Not much. UX Myths presents the evidence.

Make your mark on national policy agenda for libraries! / District Dispatch

As many of us bundle up and prepare to head to Chicago for the ALA Midwinter Meeting, the ALA has added another discussion item for attendees, and beyond. Today the American Library Association (ALA) Office for Information Technology Policy (OITP) released a discussion draft of a national policy agenda for libraries to guide a proactive policy shift.

Donald Harrison Health Sciences Library, OH

As ALA President Courtney Young states clearly: “Too often, investment in libraries and librarians lags the opportunities we present. Libraries provide countless benefits to U.S. communities and campuses, and contribute to the missions of the federal government and other national institutions. These benefits must be assertively communicated to national decision makers and influencers to advance how libraries may best contribute to society in the digital age.”

The draft agenda is the first step towards answering the questions “What are the U.S. library interests and priorities for the next five years that should be emphasized to national decision makers?” and “Where might there be windows of opportunity to advance a particular priority at this particular time?”

The draft agenda provides an umbrella of timely policy priorities and is understood to be too extensive to serve as the single policy agenda for any given entity in the community. Rather, the goal is that various library entities and their members can fashion their national policy priorities under the rubric of this national public policy agenda.

Outlining this key set of issues and context is being pursued through the Policy Revolution! Initiative, led by ALA OITP and the Chief Officers of State Library Agencies (COSLA) with guidance from a Library Advisory Committee that includes broad representation from across the library community. The three-year initiative, funded by the Bill & Melinda Gates Foundation, has three major elements: to develop a national public policy agenda, to initiate and deepen national stakeholder interactions based on policy priorities, and to build library advocacy capacity for the long term.

“In a time of increasing competition for resources and challenges to fulfilling our core missions, libraries and library organizations must come together to advocate proactively and strategically,” said COSLA President Kendall Wiggin. “Sustainable libraries are essential to sustainable communities.”

The draft national public policy agenda will be vetted, discussed, and further elaborated upon in the first quarter of 2015, also seeking to align with existing and emerging national library efforts. Several members of the team that worked on the agenda will discuss the Policy Revolution! Initiative and invite input into the draft agenda at the 2015 ALA Midwinter Meeting on February 1 from 1-2:30 p.m. in the McCormick Convention Center, room W196A.

From this foundation, the ALA Washington Office will match priorities to windows of opportunity and confluence, and in mid-2015 will begin advancing policy priorities in partnership with other library organizations and allies with whom there is alignment.

Please join us in this work. Feedback should be sent by February 27, 2015, to oitp[at]alawash[dot]org, and updates will be available online.

The post Make your mark on national policy agenda for libraries! appeared first on District Dispatch.

Public libraries top public Wi-Fi spot for African Americans, Latinos / District Dispatch

A first-of-its-kind survey (pdf) finds that public libraries are the most common public Wi-Fi access point for African Americans and Latinos, with roughly one-third of these communities using public library Wi-Fi. By comparison, 23 percent of white people use public library Wi-Fi; they list school as their top public Wi-Fi spot.

The study of Wi-Fi usage patterns by John Horrigan and Jason Llorenz for WifiForward also finds that communities of color are more likely to use Wi-Fi networks in public places, use them more often, and report greater positive impacts of Internet use than their white counterparts. A majority of all online users have at some point used Wi-Fi networks in public places.

The new report also shows that Wi-Fi boosts how people view the Internet’s benefits. Across all racial and ethnic categories, users of public Wi-Fi networks reported higher levels of satisfaction with how the Internet impacts their lives. African Americans and Latinos are more likely to report that the Internet—in general—has a beneficial impact on education, saving time and searching for jobs. This pattern holds when examining Wi-Fi users.

Clearly, library Wi-Fi is no longer ‘nice to have.’ It is essential to support The E’s of Libraries™—Education, Employment, Entrepreneurship, Empowerment and Engagement—in cities and towns nationwide. In fact, the latest data from the Digital Inclusion Survey finds that virtually all (98%) public libraries now offer Wi-Fi, up from 18 percent a decade ago. By offering free public access to the Internet via wireless connections, libraries serve as community technology hubs that enable digital opportunity and full participation in the nation’s economy.

The survey finds strong support for investing in wireless networks. Two-thirds of people, for instance, think improving Wi-Fi at libraries and schools would be a good thing. By far the most common response to a question about what stakeholders could do to improve the Internet, though, was to make it easier for people to keep their personal information secure. Both findings are relevant to libraries: new funding is now available through the E-rate program to improve library and school Wi-Fi access, and digital literacy training clearly demands attention to data privacy and security concerns.

Patrons using Wi-Fi at the MLK Digital Commons in Washington D.C.

The findings highlight the importance of improving the environment for wireless Internet use, including making more Wi-Fi spectrum available (and sharing what we already have) at low, medium and high spectrum bands, because each band offers different opportunities for Wi-Fi. As a founding member of WiFiForward, ALA actively advocates for ensuring adequate unlicensed spectrum to support the next generation of technologies needed for our libraries and communities. Wi-Fi contributes close to $100 billion each year to the U.S. economy, and libraries depend on unlicensed spectrum to support everything from self-checkout and circulation systems to mobile learning labs.

Library broadband and Wi-Fi access are clearly part of the solution to narrowing the Digital Divide that still exists for many people, and to supporting the full range of modern library services. If you’d like to share, we’d love to hear in the comments how your library Wi-Fi is making a difference in your community or on your campus.

The post Public libraries top public Wi-Fi spot for African Americans, Latinos appeared first on District Dispatch.

2014 Blog Year Stats / Cynthia Ng

Love that WordPress will compile a stats report for you if you’re using their hosted version. As with previous years, the most visits in a day are from Code4Lib, although the most views in a day is lower than in previous years. Of course, the most popular post that day was a Code4Lib one. Also very … Continue reading 2014 Blog Year Stats

Tell the IRS your thoughts! / District Dispatch

Photo by AgriLifeToday via Flickr

Want to comment on the Internal Revenue Service’s (IRS) tax form delivery service? Discuss your experiences obtaining tax forms for your library at “Tell the IRS: Tax Forms in the Library,” a session that takes place during the 2015 American Library Association (ALA) Midwinter Meeting in Chicago. The session will be held from 11:30 a.m.–12:30 p.m. on Sunday, February 1, 2015.

The interactive conference session has a new speaker: L’Tanya Brooks, director of media and publications for the Internal Revenue Service (IRS), who will lead a discussion exploring library participation in the agency’s Tax Forms Outlet Program (TFOP). The TFOP offers tax forms and products to the American public primarily through participating libraries and post offices. During the conference program, Brooks will discuss the IRS’ ongoing efforts to create a library-focused group that works with library staff members.

The session takes place in the McCormick Place Convention Center in room W187. Add the conference program to your scheduler.

The post Tell the IRS your thoughts! appeared first on District Dispatch.

Islandora Show and Tell: Marsden Online Archive / Islandora

It's time for our first Islandora Show and Tell of 2015: the Marsden Online Archive, by the University of Otago. This digital archive, launched in November 2014 on Islandora 7.x-1.2, houses the letters and journals of Reverend Samuel Marsden, a key figure in New Zealand colonial history.

This collection really shows the Book Solution Pack at its finest, with transcripts provided alongside scans of the original text, making the documents both searchable and easily readable. Presented with supplementary material such as a biography and timeline of Samuel Marsden, a cast of characters from his journals, and a primer on the Māori language, the collection is a snapshot of early 19th-century New Zealand as seen through the eyes of an English missionary, magistrate, and superintendent of government affairs.

Altogether, the team made 72 customizations and configurations to the repository, from minor tweaks like removing the title link from the toolbar in the Internet Archive Book Reader, to major undertakings such as customizing the Book Reader to highlight searched text in the viewer that displays transcripts. While most of these customizations are not yet ready to share, the team hopes to give the community access to their METS Form, Internet Archive customizations, and the changes they have made to Book Batch. One tool that is available for public use is ACEMC (Automatic Content Extraction & Metadata Creation Tool), which prepares data for ingest via Book Batch. It was designed specifically for the Mining Marsden Project and provides a range of functionality, including:

  • Extraction of data for use in TEI and MODS metadata.
  • Creation of METS metadata.
  • Creation of MODS metadata.
  • Creation of TEI text mark-up (including strikethrough and underline).
  • Creation of alternative spellings for Solr.
  • Creation of valid metadata packages and packaging data into the directory structure required for ingest.
  • Conversion of .doc files to HTML.
  • Separation of HTML into separate pages.
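
To give a sense of what "creation of MODS metadata" involves, here is a minimal Python sketch that builds a bare-bones MODS record with the standard library's `xml.etree.ElementTree`. It is not ACEMC's code, whose internals aren't described here. The element names come from the MODS schema (`titleInfo`/`title`, `name`/`namePart`, `originInfo`/`dateIssued`), while the `make_mods` helper, the input dictionary, and its sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def make_mods(item: dict) -> bytes:
    """Build a minimal MODS record (title, personal name, date) from a dict."""
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = item["title"]
    name = ET.SubElement(mods, q("name"), {"type": "personal"})
    ET.SubElement(name, q("namePart")).text = item["author"]
    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateIssued")).text = item["date"]
    return ET.tostring(mods, encoding="utf-8")


# Hypothetical sample values, not a real catalogue record.
letter = {"title": "Sample letter", "author": "Marsden, Samuel", "date": "1815"}
print(make_mods(letter).decode("utf-8"))
```

A production record would add elements such as `typeOfResource`, `physicalDescription`, and identifiers, and should be validated against the MODS XML schema before ingest.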

My standard 'cat' search backfired somewhat in the case of the Marsden Online Archive. The good Reverend had little to do with the creatures, but he did decline to eat one.

Marsden project manager Vanessa Gibbs answered some questions about how this project came together:

What is the primary purpose of your repository? Who is the intended audience?

The Marsden Online Archive is used to store and share the collection material that Hocken Collections has on the Church Missionary Society’s settlement in New Zealand. These are 200-year-old manuscripts that document New Zealand’s first Christian mission. The audience for the Marsden Online Archive is anyone with interest in the material. Researchers, academics, lecturers, students, members of the public and specifically Māori (Indigenous New Zealanders) are more than welcome to use the Archive.

Why did you choose Islandora?

We chose Islandora after an extensive evaluation process. We evaluated it in terms of:

  1. Our Business Requirements (the number of business requirements that are met)
  2. Adaptability (if it can be easily adapted for other material)
  3. Architecture (looking at the state and complexity of the code)
  4. Platform Compatibility (does the technology used fit/work with the Library’s enterprise systems)
  5. Independence and Support (plug-ins, flexibility, community support etc.)
  6. Resource Commitment (the amount of resources required to work on the code)
  7. Extensibility (the implementation takes future growth into consideration)

Which modules or solution packs are most important to your repository?

Islandora Book Batch was a useful solution pack that reduced the effort required to upload the material. It required a bit of customization for our material; however, it works well with our Archive. Also important are the Internet Archive Book Reader, the Book Solution Pack and Islandora OpenSeadragon, all of which allow us to display our collection material.

What feature of your repository are you most proud of?

We are most proud of the customizations that we have made to the Internet Archive Book Reader. Using transcripts instead of OCR presented some challenges; however, the way the material is displayed is completely fit for purpose. The text displays alongside the manuscript image, and search terms can be highlighted in the text.
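
Highlighting search hits in a transcript comes down to finding each occurrence of the query in the text and wrapping it in markup. The team's actual change was made inside the Internet Archive Book Reader, which is JavaScript. The Python sketch below only illustrates the idea; the `highlight` function and the choice of a `<mark>` tag are assumptions of this example. It matches against the raw transcript and escapes each piece as it builds the output, so that characters such as `<` or `&` in the text cannot break the page and query terms never match inside escaped entities.

```python
import html
import re


def highlight(transcript: str, query: str) -> str:
    """Return HTML with case-insensitive, whole-word matches of each query term in <mark>."""
    terms = query.split()
    if not terms:
        return html.escape(transcript)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    parts, last = [], 0
    for match in pattern.finditer(transcript):
        parts.append(html.escape(transcript[last:match.start()]))        # text before the hit
        parts.append("<mark>" + html.escape(match.group(0)) + "</mark>")  # the hit itself
        last = match.end()
    parts.append(html.escape(transcript[last:]))                          # trailing text
    return "".join(parts)


print(highlight("Marsden wrote to Kendall & the mission.", "kendall MISSION"))
```

Handwritten manuscripts add a wrinkle that OCR-based search does not: period spellings differ from modern queries, which is one reason a project might also index alternative spellings, as ACEMC does for Solr.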

Who built/developed/designed your repository (i.e., who was on the team?)

The development team consisted of four staff: Emmanuel Delaborde; Hailing Situ; Shahne Rodgers; and Allison Brown.

Do you have plans to expand your site in the future?

We have plans to extend both the content contained within the site and the functionality. We hope to upload the remaining material that we have on Marsden and add functionality such as user accounts and allow for scholarly commentary.

What is your favourite object in your collection to show off?

There is one letter that documents 300 ‘useful’ words in Māori: body parts, numbers, natural features and phenomena, animals and plants, verbs of movement, and other sundry objects, as well as a few phrases. Kendall, the missionary who wrote the letter, was aware of his shortcomings, writing ‘I have no doubt but I shall find it necessary to make many alterations in the above words when I get better acquainted with the Language’.


Out of Control / LITA

Image courtesy of Flickr user Eric Peacock

Last week I found myself in a grey area. I set up a one-on-one tech appointment with a patron to go over the basics of her new Android tablet. Once we met in person I learned that what she really wanted was to monitor her daughter’s every move online. It felt like a typical help session as I showed her how to check the browsing history and set up parental controls. She had all the necessary passwords for her daughter’s email and Facebook accounts, which made it even easier. It wasn’t until she left that I realized I had committed a library crime: I completely ignored the issue of privacy.

I’m still mulling this over in my head, trying to decide how I should have acted. I’m not a parent, so I can’t speak to the desire to protect children from the dangers of the Internet. Chances are her daughter can work around her mom’s snooping anyhow. But as a librarian, a champion of privacy, how could I have disregarded the issue?

A friend of mine put it best when he said that situations like this devalue what we do. We’re here to help people access information, not create barriers. Being a parent in the age of the Internet must be a scary thing, but that doesn’t mean that any regard for privacy goes out the window. At the same time, it’s not our job to judge. If the same patron came in and said she wanted to learn about parental controls for a research paper, I wouldn’t have given it a second thought. You can see how the issue gets cloudy.

Ultimately, I keep going back to a phrase I learned from Cen Campbell, founder of Little eLit, at ALA last year: “We are media mentors.” We are not parents, and we’re not teachers; rather, we are media mentors. It’s our job to work with parents, educators, and kids to foster a healthy relationship with technology. Regardless of right or wrong, I was too quick to jump in and give her the answers, without going through a proper reference interview. I suspect that she was afraid of all the things she doesn’t know about technology: the great unknown that her daughter is entering when she opens her web browser. That was an opportunity for me to answer questions about things like Facebook, YouTube, and Snapchat, instead of blindly leading her to the parental controls. After all this, one thing I know for certain is that the next time I find myself in this situation, I’ll be slow to act and quick to listen.

I would love to hear back from other librarians. How would you act in this situation? What’s the best way to work with parents when it comes to parental controls and privacy?

LITA Interest Group Events at ALA Midwinter / LITA

Are you headed to ALA Midwinter this weekend and curious about what the LITA interest groups will be up to? See below for a current listing of LITA IG events!

Saturday, January 31, 2015

10:30am to 11:30am

Imagineering Interest Group, Hyatt Regency McCormick Adler/CC 24C

The Imagineering Interest Group will meet to plan for future ALA Annual programs and meetings. We will also talk about future group endeavors, such as creating online resources. Please attend if you are interested in working with the group.  Additional Information: Librarianship, Adult Services, Collection Development, Popular Culture, Reader’s Advisory

Open Source Systems Interest Group, Hyatt Regency McCormick Burnham/CC 23C

Meeting to discuss future projects for the Open Source Systems Interest Group.

Search Engine Optimization, Hyatt Regency McCormick Jackson Park/CC 10D

Attendees will have an opportunity to share their experiences with search engine optimization. We will also discuss the SEO Best Practices Wiki entry in Library Success: A Best Practices Wiki as well as the latest SEO tools.

ALCTS/LITA ERM Interest Group, MCP W194a

The ALCTS/LITA ERM Interest Group will host a panel entitled “Data-Driven Decision Making in E-Resources Management: Beyond Cost per Use.”

1:00pm to 2:30pm

Library Code Year, MCP W175c

Are you ever in a meeting where people throw around terms like front-end, back-end, Bootstrap, git, JavaScript, agile, XML, PHP, Python, WordPress, and Drupal, but you are not sure what they mean in the library context (even after you looked the terms up on your phone covertly under the table)? If so, please join us for an informal and lively discussion about decoding technology jargon.

Sunday, February 1, 2015

8:30am to 10:00am

LITA/ALCTS Linked Library Data Interest Group, MCP W192b

The ALCTS/LITA Linked Library Data Interest Group is hosting three presentations during its meeting at the ALA Midwinter Conference in Chicago. The meeting will be held on Sunday, February 1, from 8:30-10:00, in McCormick Place West, room W192b. To read the speaker abstracts, go here.

Nancy Lorimer, Interim Head of the Metadata Department at Stanford University Libraries, will present “The Linked Data for Libraries project: An Update.”

Kristi Holmes, the Director of Galter Health Sciences Library at Northwestern University and a VIVO Project Engagement Lead, will speak about VIVO in “Opening up science with VIVO.”

Victoria Mueller, Senior Information Architect and System Librarian at Zepheira, will present “BIBFRAME: A Way Forward. Moving Libraries into a linked data world!”

10:30am to 11:30am

Drupal4Lib Interest Group, MCP W186c

The Drupal4Lib Interest Group was established to promote the use and understanding of the Drupal content management system by libraries and librarians. Join members of the group for a lively discussion of current issues facing librarians working with Drupal at any skill level. Bring your questions and meet your colleagues!

Game Making Interest Group, Hyatt Regency McCormick, DuSable/CC 21AB

The Game Making Interest Group will meet to discuss how we use games in libraries and to plan for our meeting and informal presentations at ALA Annual and future plans for the group. Please join us if you are interested in using games in libraries.

Library Consortia Automated Systems Interest Group, Hyatt Regency McCormick, Jackson Park/CC 10C

Managing IT services in a consortium has its own particular challenges and opportunities. The Library Consortia Automated Systems Interest Group provides an informal forum where people working in a consortium environment can share ideas and seek advice.

Public Library Technology Interest Group, MCP W194a

The group will meet to discuss trends in technology that are applicable to public libraries.

User Experience Interest Group Meeting, MCP W176b

The LITA User Experience IG seeks 2-3 short presentations (10-15 minutes) on UX and Web usability for the upcoming 2015 ALA Midwinter Conference. This will be an in-person meeting, so presenters and attendees must be physically present at ALA Midwinter. The LITA UX IG is also seeking suggestions for discussion topics: things you have been working on, plan to work on, or want to work on in terms of UX/usability. All suggestions and presentation topics are welcome and will be considered for presentation and discussion. Please submit your topic in the comments section in ALA Connect. You may also e-mail us off-list. Bohyun Kim, LITA UX IG chair, and Rachel Clark, LITA UX IG vice-chair

1:00pm to 2:30pm

Heads of Library Technology Interest Group, MCP W176b

HoLT IG provides a forum and support network for those individuals with administrative responsibility for computing and technology in library settings. Anyone is welcome to give a short presentation on a library technology project they are working on, to explore issues of planning and implementation, technology management, support, leadership, and other areas of interest in library technology.

LITA/ALCTS Authority Control Interest Group – until 5:30pm, MCP W474b

The joint LITA/ALCTS Authority Control Interest Group provides a forum for discussion of a variety of issues related to authority control for online catalogs and for international sharing of authority data.

From the Field: More Insight Into Digital Preservation Training Needs / Library of Congress: The Signal

The following is a guest post by Jody DeRidder, Head of Digital Services at the University of Alabama Libraries.  This post reports on efforts in the digital preservation community that align with the Library’s Digital Preservation Outreach & Education (DPOE) Program. Jody, among many other accomplishments, has completed one of the DPOE Train-the-Trainer workshops and delivered digital preservation training online to the Association of Southeastern Research Libraries (ASERL).


Jody DeRidder

As previously discussed on The Signal, DPOE has conducted two surveys to better understand the digital preservation capacities of cultural heritage institutions. Respondents provided insight into their digital preservation practice, the types of training needed to address their staffing needs, and their preferred delivery options for training events. Between the 2010 and 2014 DPOE surveys, I conducted an interim survey in 2012 to identify the digital preservation topics and types of materials most important to webinar attendees and their institutions. A comparison of the information uncovered by these three surveys provides insight into changing needs and priorities, and indicates what type of training is most needed and in what venues.

In terms of topics, technical training (to assist practitioners in understanding and applying techniques) is the clear top preference in all three surveys. In the 2010 DPOE survey, the highest percentage of respondents (32%) ranked technical training as their top choice. This was echoed in the 2014 DPOE survey as well. In my 2012 survey, this question was represented by multiple options. (Each of the rankings referenced is the percentage of participants who considered training in this topic to be extremely important.) The top two selected were training in “methods of preservation metadata extraction, creation, and storage” (77%) and “determining what metadata to capture and store” (68%). Both of these could easily be considered technical training.

Other technical training options included:

  • File conversion and migration issues (59%).
  • Validating files and capturing checksums (54%).
  • Monitoring status of files and media (53%).
  • How to inventory content to be managed for preservation (42%).
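
Capturing and validating checksums, one of the technical topics above, is a good example of a preservation task that is easy to automate. The following is a generic Python sketch using `hashlib.sha256`, not a procedure taken from any of the surveys. It records a manifest of file checksums once, then recomputes them later to detect files that have changed or gone missing. The manifest layout (a dictionary mapping relative path to hex digest) and the function names are assumptions of this example; established tools such as BagIt manifests serve the same purpose in practice.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large TIFFs or videos need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under `root`, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Re-hash files and sort them into ok / changed / missing."""
    report = {"ok": [], "changed": [], "missing": []}
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
        elif sha256_of(path) != expected:
            report["changed"].append(rel_path)
        else:
            report["ok"].append(rel_path)
    return report


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scan_001.tif").write_bytes(b"image data")
    (root / "transcript.txt").write_text("Dear Sir,")
    manifest = make_manifest(root)
    (root / "transcript.txt").write_text("Dear Sir, (edited)")  # simulate a silent change
    print(verify(root, manifest))  # {'ok': ['scan_001.tif'], 'changed': ['transcript.txt'], 'missing': []}
```

Running `verify` on a schedule and alerting on anything in `changed` or `missing` also covers the related topic of monitoring the status of files and media.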

These preferences are echoed in the DPOE 2014 survey, where respondents identified training investments that result in “an increased capacity to work with digital objects and metadata management” as the most beneficial outcome with a three-year horizon.

In the 2010 DPOE survey, the need for “project management,” “management and administration,” and “strategic planning” followed “technical training” in priority (in that order). By 2014, this had shifted a bit: “strategic planning” led “management and administration,” followed by “project management.” Last in importance to participants in both surveys was fundamentals (described as “basic knowledge for all levels of staff”).

Has the need for strategic planning increased? Topics in the 2012 survey that related to management included:

  • Planning for provision of access over time (the third highest ranking: 65%).
  • Developing your institution’s preservation policy and planning team (51%).
  • Legal issues surrounding access, use, migration, and storage (43%).
  • Self-assessment and external audits of your preservation implementation (34%).

Strategic planning might include the following topics from the 2012 survey:

  • Developing selection criteria, and setting the scope for what your institution commits to preserving (52%).
  • Selecting file formats for archiving (45%).
  • Selecting storage options and number of copies (44%).
  • Security and disaster planning at multiple levels of scope (33%).
  • Business continuity planning (28%).

Thus it seems that in the 2012 survey, strategic planning was still secondary to management decisions, but that may have shifted, as indicated in the DPOE 2014 survey. A potential driving force for this shift could well be the increased investment in digital preservation in recent years.

When asked in 2010 about the types of digital content in organizational holdings, 94% of the respondents to the DPOE survey selected reformatted material digitized from collections, and 39.5% indicated deposited digital materials. In 2014 the reformatted content had dropped to 83%, deposited digital materials had increased to 44%, and a new category, “born digital,” was selected by over 76% of participants. Within these categories, digital images, PDFs and audiovisual materials were the most selected types of content, followed closely by office files. Research data and websites were secondary contenders, with architectural drawings third, followed by geospatial information and finally “other.”


From the 2012 survey, with the numbers representing percentages of the types of content in organizational holdings.

In the 2012 survey, participants were only asked to rank categories of digital content in terms of importance for preservation at their institution. Within this, 65% selected born-digital special collections materials as extremely important, 63% selected born-digital institutional records, and 61% selected digitized (reformatted) collections. “Other” was selected by 47%, and comments indicate that most of this was audiovisual materials, followed by state archives content and email records. The lowest-ranked categories were digital scholarly content (institutional repository or grey literature, 37%), digital research data (34%), and web content (31%).

Clearly, preservation of born-digital content has become a priority for survey respondents over the past few years, though concern for preservation of reformatted content remains strong. As born-digital content continues to pour into special collections and archives, the pressure to meet the burgeoning challenge of long-term access is likely to increase.

In both the 2010 and 2014 DPOE surveys, an overwhelming number of participants (84%) expressed the importance of ensuring digital content is accessible for 10 years or more. Training is a critical requirement to support this process. While the 2012 survey focused only on webinars as an option, both of the DPOE surveys indicated that respondents preferred small, in-person training events, on-site or close to home. However, webinars were the second choice in both 2010 and 2014, and self-paced, online courses were the third choice in 2014. As funding restrictions on travel and training continue, an increased focus on webinars and nearby workshops will be best-suited to furthering the capacity for implementing long-term access for valuable digital content.

In the interest of high impact for low cost, the results of these surveys can help to fine-tune digital preservation training efforts in terms of topics, content and venues in the coming months.

Launching Open Data Day Coalition Micro-Grant Scheme: Apply Today! / Open Knowledge Foundation

OPEN DATA DAY 2015 is coming, and a coalition of partners have come together to provide a limited number of micro-grants designed to help communities organise ODD activities all over the world!

Open Data Day (ODD) is one of the most exciting events of the year. As a volunteer-led event, with no organisation behind it, Open Data Day provides the perfect opportunity for communities all over the world to convene, celebrate and promote open data in ways most relevant to their fellow citizens. This year, Open Data Day will take place on Saturday, the 21st of February 2015, and a coalition of partners have gotten together to help make the event bigger (and hopefully better) than it has ever been before!

While Open Data Day has always been a volunteer-led initiative, organising an event often comes with quite a hefty price tag. From hiring a venue, to securing a proper wifi connection, to feeding and caffeinating the volunteer storytellers, data wranglers and developers who donate their Saturday to ensuring that open data empowers citizens in their communities, there are costs associated with convening people! Our Open Data Day Coalition is made up of open data, open knowledge and open access organisations who are interested in providing support for communities organising ODD activities. This idea emerged from an event that was organised in Kenya last year, where a small stipend helped local organisers create an amazing event, exposing a number of new people to open data. This is exactly what we are trying to achieve on Open Data Day!

As such, this year, for the first time ever, we are proud to announce the availability of a limited number of micro grants of up to $300 to help communities organise amazing events without incurring prohibitive personal costs. The coalition will also provide in-kind support in the form of mentorship and guidance, or simply by providing a list of suggested activities proven effective at engaging new communities!

The coalition consists of the following organisations (in alphabetical order): Caribbean Open Institute, Code for Africa, DAL, E-Democracy, ILDA, NDI, Open Access Button, Open Coalition, Open Institute, Open Knowledge, Sunlight Foundation and Wikimedia UK. Want to join? Read on.


Applying for a Microgrant!

Any group or organisation from any country can apply. Given the different focus of our partners, grants in Latin America will be handled and awarded by ILDA. In the Caribbean, the Caribbean Open Institute will handle the process. Finally, the Partnership for Open Data will focus on other low- to middle-income countries. Of course, in order to ensure that we are able to award the maximum number of grants, we will coordinate this effort!

You can find the application form here. The deadline to apply is February 3rd and we aim to let you know whether your grant was approved ASAP.

Currently, we have one micro grant, provided by The Sunlight Foundation, for a group organising open data day activities in a country of any income level. We would love to provide additional support for groups organising in any country; as such, if you are interested in helping us find (or have!) additional funding (or other forms of in-kind support, such as an event space!), do get in touch (see below how to join the coalition). We will make sure to spread the word far and wide once we have additional confirmed support!

How to Apply for an Open Data Day Micro Grant

If you are organising an event and would like additional support, apply here. If your grant is approved, you will be asked to provide us with bank transfer details and proof of purchase. If it is not possible for you to make the purchases in advance and be reimbursed, we will be sure to find an alternative solution.

Is this your first Open Data Day event? Fear not! In addition to the grant itself, our coalition of partners is here to provide you with the support you need to ensure that your event is a success. Whether you need help publicising the event, deciding what to do, or some tips on event facilitation, we are here to help!


All groups who receive support will be asked to add their event to the map by registering it here, as well as by adding it to the list of events on the Open Data Day wiki.

After the event, organisers will be asked to share a short blog post or video discussing the event! What data did you work with, how many people attended, and are you planning on organising additional events? We’d also love to hear about what you learned, what the challenges were, and what you would have done differently.

You can publish this in any language, but if possible, we would love an English translation that we can share in a larger blog series about Open Data Day. If you would like to have your event included in our summary blog series but are not comfortable writing in English, write to us at local [at] okfn [dot] org and we will help you translate (or connect you with someone who can!).

What To Do Now

The next step is to start organising your event so that you can apply for your micro-grant ASAP! We are aware that we are a bit late getting started and that communities will need time to organise! As such, we aim to let you know whether your grant has been approved ASAP, and ideally by February 6th, 2015. If February 3rd proves to be too tight a deadline, we will extend!

Finally, if you need inspiration for what to do on the day, we are building a menu of suggested activities on the Open Data Day wiki. Go here for inspiration or add your ideas and inspire others! For further inspiration and information, check out the Open Data Day website, which the community will be updating and improving as we move closer to the big day. If you need help, reach out to us at local [at] okfn [dot] org, or check in with one of the other organisations in the coalition.

Interested in joining the coalition?

We have a limited number of grants available and expect a large demand! If you are interested in joining the coalition and have financial and/or in-kind support available, do get in touch and help us make Open Data Day 2015 the largest open data hackday our community and the world has ever seen!

Wikipedia’s Waterloo? / Roy Tennant

If you are involved in technology at all, you no doubt have heard about GamerGate. Normally at this point I would say that if you hadn’t heard about it, go read about it and come back.

But that would be foolish.

You would likely never come back. Perhaps it would be from disgust at how women have been treated by many male gamers. Perhaps it would be because you can’t believe you have just wasted hours of your life that you are never getting back. Or perhaps it is because you disappeared down the rat hole of controversy and won’t emerge until either hunger or your spouse drags you out. Whatever. You aren’t coming back. So don’t go before I explain why I am writing about this.

Wikipedia has a lot to offer. Sure, it has some gaping holes you could drive a truck through: just about any controversial subject can end up with a sketchy page as warring factions battle it out, and the lack of pages on women worthy of them is striking.

You see, it is well known that Wikipedia has a problem with female representation — both with the percentage of pages devoted to deserving women as well as the number of editors building the encyclopedia.

So perhaps it shouldn’t come as a surprise that Wikipedia has now sanctioned the editors trying to keep a GamerGate Wikipedia page focused on what it is really all about — the misogynistic actions of a number of male gamers. But the shocking part to me is that it even extends beyond that one controversy into really dangerous muzzling territory. According to The Guardian, these women editors* have been banned from editing “any other article about ‘gender or sexuality, broadly construed'”.

I find that astonishingly brutal. Especially for an endeavor that tries to pride itself on an egalitarian process.

Get your act together, Wikipedia.


* My bad. Editors were banned. They are not necessarily women. Or even feminists.

Photo hipster: playing with 110 cameras / Casey Bisson

After playing with Fuji Instax and Polaroid (with The Impossible Project film) cameras, I realized I had to do something with Kodak. My grandfather worked for Kodak for years, and I have many memories of the stories he shared of that work. He retired in the late 70s, just as the final seeds of Kodak’s coming downfall were being sown, but well before anybody could see them for what they were.

The most emblematic Kodak camera and film I could think of was the 110 cartridge film type, and that’s what I used to capture this picture of Cliff Pearson and Millicent Prancypants.

Cliff, with Millie

I bought two cameras and a small bundle of film from various eBay sellers. They look small in the following photo, but they’re significantly larger and less pocketable than even my iPhone 6 Plus.

Pocket Instamatic 60 110 film camera, with film

Developing is $4 per cartridge at Adolph Gasser’s, but they can’t print or scan the film there, so that had me looking for other solutions. I couldn’t find a transparency scanner that had film holders for 110 film. That isn’t surprising, but it did leave me wondering and hesitant long enough to look for other ways to capture this film. For these shots I re-photographed them with my EOS M:

film "scanning"

Writing has changed with digital technology, but much is the same. Pirsig’s slip-based writing system was inspired by information technology. / John Miedema

Writing has changed with digital technology, but much is the same. The Lila writing technology builds on both the dynamic and the static features of writing.

Writers traditionally spend considerable time reading individual works closely and carefully. The emergence of big data and analytic technologies causes a shift toward distant reading, the ability to analyze a large volume of text in terms of statistical patterns. Lila uses these technologies to select relevant content for deeper reading.

Writing, as always, occurs in many locations, from a car seat to a coffee shop to a desk. Digital technology makes it easier to aggregate text from these different locations. Existing technologies like Evernote and Google Drive can gather these pieces for Lila to perform its cognitive functions.

Writing is performed on a variety of media. In the past it might have been napkins, stickies and binder sheets. Today it includes a greater variety, from cell phone notes to email and word processor documents. Lila can only analyze digital media. It is understood that there is still much text in the world that is not digital. Going forward, text will likely always be digital.

Writing tends to be more fragmented today, occurring in smaller units of text. Letter-length writing has given way to cell phone texts, tweets, and short emails. The phrase “too long; didn’t read” is used on the internet for overly long statements. Digital books are shorter than print books. Lila is expressly designed around a “slip”-length unit of text, ranging from at least a tweet-length subject line up to a few paragraphs. It would be okay to call a slip a note. Unlike tweets, slips will have no hard limit on the number of characters.
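
To make the slip unit a little more concrete, here is a minimal JavaScript sketch, assuming a slip is one or more paragraphs grouped up to a soft size limit. It is not Lila’s code (none is shown here); the name `toSlips`, the 1,500-character soft limit, and the 140-character subject line are illustrative assumptions. Because there is no hard cap, a single paragraph longer than the limit stays whole.

```javascript
// Hypothetical sketch: split text into "slips" of one or more paragraphs.
// softLimit is an illustrative assumption, not a Lila setting; a paragraph
// longer than the limit is kept whole, since slips have no hard cap.
function toSlips(text, softLimit = 1500) {
  const paragraphs = text
    .split(/\n\s*\n/) // blank lines separate paragraphs
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const slips = [];
  let current = [];
  let currentLength = 0;

  for (const p of paragraphs) {
    // Close the current slip if this paragraph would push it past the limit.
    if (current.length > 0 && currentLength + p.length > softLimit) {
      slips.push(current.join("\n\n"));
      current = [];
      currentLength = 0;
    }
    current.push(p);
    currentLength += p.length;
  }
  if (current.length > 0) slips.push(current.join("\n\n"));

  // Use the first line, cut to tweet length (assumed 140), as a subject line.
  return slips.map((body) => ({
    subject: body.split("\n")[0].slice(0, 140),
    body,
  }));
}
```

For example, `toSlips("a\n\nb\n\nc", 2)` yields two slips, `"a\n\nb"` and `"c"`: the grouping is greedy, so each slip fills up before a new one starts.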

A work is written by one or many authors. Print magazines and newspapers are compilations of work by multiple authors, as are many websites. Books still tend to be written by a single author, but Lila’s function of compiling content into views will make it easier for authors to collaborate on a work with the complexity and coherence of a book.

In the past, the act of writing was more isolated. There was a clear separation between authors and readers. Today, writing is more social. Authors blog their way through books and get immediate feedback. Readers talk with authors during their readings. Fans publish their own spin on book endings. Lila extends reading and writing capabilities. I have considered additional capabilities with regard to publishing drafts to the web for feedback and iteration. A WordPress integration perhaps.

Pirsig’s book, Lila, was published in 1991, not long after the advent of the personal computer and just at the dawn of the web. His slip-based writing system used print index cards, but he deliberately chose that unit of text over pages because it allowed for “more random access.” He also categorized some slips as “program” cards, instructions for organizing other slips. As cards about cards, they were powerful, he said, in the way that John von Neumann explained the power of computers: “the program is data and can be treated like any other data.” Pirsig’s slip-based writing system was no doubt inspired by the developments in information technology.

Exploring a personal Twitter network / Alf Eaton, Alf

  1. Fetch the IDs of users I follow on Twitter, using vege-table:

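    var url = '';
    var params = {
      screen_name: 'invisiblecomma',
      stringify_ids: true,
      count: 5000
    };
    var collection = new Collection(url, params);
    // Pull the array of user IDs out of each page of results.
    collection.items = function(data) {
      return data.ids;
    };
    // Pagination handler (the property name it is assigned to is missing
    // from the original snippet): stop when there is no next cursor,
    // otherwise request the next page using the string form of the cursor.
    collection.… = function(data) {
      if (!data.next_cursor) {
        return null;
      }
      params.cursor = data.next_cursor_str;
      return [url, params];
    };
    return collection.get('json');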
  2. Using similar code, fetch the list of users that each of those users follows.

  3. Export the 10,000 user IDs with the highest intra-network follower counts.

  4. Fetch the details of each Twitter user:

    return Resource('', {
      user_id: user_id
    }).get('json').then(function(data) {
      // The response is an array; keep its first (only) user object.
      return data[0];
    });
  5. Process those two CSV files into a list of pairs of connected identifiers suitable for import into Gephi.

  6. In Gephi, drag the “Topology > In Degree Range” filter into the Queries section, and adjust the range until a small enough number of users with the most followers is visible:

    filter screenshot
  7. Set the label size to be larger for users with more incoming links:

    label size screenshot
  8. Set the label colour to be darker for users with more incoming links:

    label colour screenshot
  9. Apply the ForceAtlas 2 layout, then the Expansion layout a few times, then the Label Adjust layout:

    layout screenshot
  10. Switch to the Preview window and adjust the colour and opacity of the edges and labels appropriately. Hide the nodes, set the label font to Roboto, then export to PDF.

  11. Use ImageMagick to convert the PDF to JPEG: convert -density 200 twitter-foaf.pdf twitter-foaf.jpg
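
Steps 3 and 5 can be sketched in plain JavaScript. The sketch assumes the data from steps 1 and 2 has been gathered into an object `follows` that maps each user ID to the array of IDs that user follows. The function names are hypothetical. The code counts each account’s followers within the collected network, keeps the top N, and writes a `Source,Target` edge list restricted to those accounts. That is a layout Gephi’s spreadsheet import should accept as an edge table.

```javascript
// Hypothetical sketch of steps 3 and 5. `follows` maps each user ID (string)
// to the array of IDs that user follows, as collected in steps 1 and 2.
function topByInDegree(follows, limit) {
  const inDegree = new Map();
  for (const targets of Object.values(follows)) {
    for (const target of new Set(targets)) { // ignore duplicate IDs
      inDegree.set(target, (inDegree.get(target) || 0) + 1);
    }
  }
  // Highest intra-network follower count first; ties broken by ID so the
  // output is stable between runs.
  return [...inDegree.entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, limit)
    .map(([id]) => id);
}

// Emit a CSV edge list containing only edges between the kept accounts.
function toGephiEdgeCsv(follows, keepIds) {
  const keep = new Set(keepIds);
  const lines = ["Source,Target"];
  for (const [source, targets] of Object.entries(follows)) {
    if (!keep.has(source)) continue;
    for (const target of new Set(targets)) {
      if (keep.has(target)) lines.push(`${source},${target}`);
    }
  }
  return lines.join("\n");
}
```

With the numbers from step 3, the call would be `toGephiEdgeCsv(follows, topByInDegree(follows, 10000))`.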

It would probably be possible to automate this whole sequence - perhaps in a Jupyter Notebook. The part that takes the longest is fetching the data from Twitter, due to the low API rate limits.
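
To get a feel for why fetching dominates, here is a rough estimate of how long step 2 takes under a per-window rate limit. The limit of 15 requests per 15-minute window is an assumption for illustration, so check the limits that apply to the endpoint you use. The page size of 5,000 matches `count: 5000` in step 1. The function name is hypothetical.

```javascript
// Rough estimate of fetch time for step 2 under a request-per-window limit.
// friendCounts: how many accounts each user follows (one entry per user).
// requestsPerWindow and windowMinutes are assumed values, not documented ones.
function estimateFetchMinutes(friendCounts, {
  pageSize = 5000,        // matches `count: 5000` in step 1
  requestsPerWindow = 15, // assumed
  windowMinutes = 15,     // assumed
} = {}) {
  // Each user needs at least one request, plus one per extra page of IDs.
  const requests = friendCounts.reduce(
    (sum, n) => sum + Math.max(1, Math.ceil(n / pageSize)), 0);
  const windows = Math.ceil(requests / requestsPerWindow);
  return { requests, windows, minutes: windows * windowMinutes };
}
```

For example, `estimateFetchMinutes(new Array(1000).fill(300))` gives 1,000 requests in 67 windows, or about 1,005 minutes (nearly 17 hours) under those assumptions.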

What do we put in our BagIt bag-info.txt files? / Mark E. Phillips

The UNT Libraries makes heavy use of the BagIt packaging format throughout our digital repository infrastructure. I’m of the opinion that BagIt is one of the technologies that has contributed the most toward moving digital preservation forward in the last ten years, more than almost any other single technology, service, or specification. The UNT Libraries uses BagIt for our Submission Information Packages (SIP), our Archival Information Packages (AIP), our Dissemination Information Packages (DIP), and our local Access Content Packages (ACP).

For those who don’t know BagIt, it is a set of conventions for packaging content into a directory structure in a consistent and repeatable way. There are a number of other descriptions of BagIt that do a very good job of explaining the conventions and some of the more specific parts of the specification.

There are a number of great tools for creating, modifying and validating BagIt bags, and my favorite for a long time has been bagit-python from the Library of Congress. (To be honest, I usually use Ed Summers’ fork, which I grab from here.)

The BagIt specification defines a metadata file, stored in the root of a bag, called bag-info.txt. The specification defines a number of fields for this file, which are stored as key-value pairs in the following format:

key: value
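
Reading this format back takes only a few lines. Here is a minimal JavaScript sketch (not bagit-python’s implementation). It assumes two things: a long value may wrap onto continuation lines that begin with a space or tab, as the External-Description field in the example below does, and a label may appear more than once. For that reason each label maps to an array of values. The function name `parseBagInfo` is hypothetical.

```javascript
// Minimal sketch of a bag-info.txt reader. Each "Label: value" line starts a
// field; a line beginning with a space or tab continues the previous value.
// Labels may repeat, so each label maps to an array of values.
function parseBagInfo(text) {
  const fields = new Map();
  let last = null; // label of the most recently read field
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    if (/^[ \t]/.test(line)) {
      // Continuation line: fold it into the previous value with one space.
      if (last === null) throw new Error("Continuation line with no preceding field");
      const values = fields.get(last);
      values[values.length - 1] += " " + line.trim();
      continue;
    }
    const colon = line.indexOf(":"); // first colon ends the label
    if (colon === -1) throw new Error(`Malformed line: ${line}`);
    const label = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    if (!fields.has(label)) fields.set(label, []);
    fields.get(label).push(value);
    last = label;
  }
  return fields;
}
```

Because only the first colon ends the label, values that contain colons, such as timestamps, come through intact.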

I thought it might be helpful for those new to using BagIt bags to see what kinds of information we are putting into these bag-info.txt files, and also to explain some of the unique fields that we are adding to the file for managing items in our system. Below is a typical bag-info.txt file from one of our AIPs in the Coda Repository.

Bag-Size: 28.32M
Bagging-Date: 2015-01-23
CODA-Ingest-Batch-Identifier: f2dbfd7e-9dc5-43fd-975a-8a47e665e09f
CODA-Ingest-Timestamp: 2015-01-22T21:43:33-0600
Contact-Name: Mark Phillips
Contact-Phone: 940-369-7809
External-Description: Collection of photographs held by the University of North
 Texas Archives that were taken by Junebug Clark or other family
 members. Master files are tiff images.
External-Identifier: ark:/67531/metadc488207
Internal-Sender-Identifier: UNTA_AR0749-002-0016-0017
Organization-Address: P. O. Box 305190, Denton, TX 76203-5190
Payload-Oxum: 29666559.4
Source-Organization: University of North Texas Libraries

In the example above, several of the fields are boilerplate, and others are machine-generated.

Field                          How we create the value
Bag-Size                       Machine
Bagging-Date                   Machine
CODA-Ingest-Batch-Identifier   Machine
CODA-Ingest-Timestamp          Machine
Contact-Email                  Boilerplate
Contact-Name                   Boilerplate
Contact-Phone                  Boilerplate
External-Description           Changes per “collection”
External-Identifier            Machine
Internal-Sender-Identifier     Machine
Organization-Address           Boilerplate
Payload-Oxum                   Machine
Source-Organization            Boilerplate
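
Of the machine-generated fields, Payload-Oxum is the easiest to reproduce by hand. In the BagIt specification it is the payload’s octet count, a period, and its stream (file) count, so 29666559.4 above means four payload files totalling 29,666,559 bytes. Here is a sketch for Node.js that walks a bag’s data/ directory. It is not the UNT tooling, and the function name is hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Sketch: compute a BagIt Payload-Oxum ("<total bytes>.<file count>") by
// walking the bag's data/ directory. Only regular files are counted;
// symlinks and other special entries are skipped in this sketch.
function payloadOxum(bagDir) {
  let octets = 0;
  let streams = 0;
  const walk = (dir) => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        octets += fs.statSync(full).size;
        streams += 1;
      }
    }
  };
  walk(path.join(bagDir, "data"));
  return `${octets}.${streams}`;
}
```

Comparing this value with the stored one is a cheap first check before the slower checksum comparison.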

You can tell from looking at the example bag-info.txt file above that some of the fields are self-explanatory. I’m going to run over a few of the fields that are either non-standard or that we’ve made explicit decisions about as we implemented BagIt.

CODA-Ingest-Batch-Identifier is a UUID for each batch of content added to our Coda Repository. This helps us identify other items that may have been added during a specific run of our ingest process, which is helpful for troubleshooting.

CODA-Ingest-Timestamp is the timestamp when the AIP was added to the Coda Repository.
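
Neither CODA field comes from the BagIt specification, so the following is only a sketch of how such values could be produced in Node.js. It is not the Coda ingest code. It assumes one UUID is generated per ingest run and shared by every AIP in that run, and that the timestamp is local time with a numeric UTC offset and no colon, as in the example above. The function names are hypothetical, and `crypto.randomUUID()` requires Node.js 14.17 or later.

```javascript
const crypto = require("crypto");

// One random (version 4) UUID per ingest batch; every AIP in the run shares it.
function newIngestBatchId() {
  return crypto.randomUUID();
}

// Format a Date as local time plus a numeric UTC offset without a colon,
// e.g. 2015-01-22T21:43:33-0600 as in the example bag-info.txt.
function ingestTimestamp(date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const offsetMin = -date.getTimezoneOffset(); // minutes east of UTC
  const sign = offsetMin >= 0 ? "+" : "-";
  const abs = Math.abs(offsetMin);
  return (
    `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}` +
    `T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}` +
    `${sign}${pad(Math.floor(abs / 60))}${pad(abs % 60)}`
  );
}
```

Generating the batch ID once, before the run starts, is what lets every item ingested together be found again later.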

External-Description will change for each collection that gets processed. It has just enough information about the collection to help jog someone’s memory about where this item came from and why it was created.

External-Identifier is the ARK identifier assigned to the item on ingest into one of the Aubrey systems where we access the items or manage the descriptive metadata.

Internal-Sender-Identifier is the locally important (often not unique) identifier for the item as it is being digitized or collected. It often takes the shape of an accession number from our University Special Collections, or the folder name of a newspaper issue.

We currently have 1,070,180 BagIt bags in our Coda Repository, and they have been instrumental in allowing us to scale our digital library infrastructure and verify that each item is just the same as when we added it to our collection.
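
Verifying that an item is unchanged comes down to recomputing checksums and comparing them with the bag’s manifest. Tools like bagit-python already do full validation. The following Node.js sketch only illustrates the idea, assuming a `manifest-md5.txt` whose lines hold a checksum followed by a path relative to the bag root. It does not decode percent-encoded paths. It does not flag files in data/ that are missing from the manifest, and it reads each file fully into memory. The function name is hypothetical.

```javascript
const fs = require("fs");
const path = require("path");
const crypto = require("crypto");

// Sketch of a fixity check: recompute each listed file's checksum and compare
// it with the manifest ("<checksum> <relative path>" per line).
function verifyManifest(bagDir, algorithm = "md5") {
  const manifestPath = path.join(bagDir, `manifest-${algorithm}.txt`);
  const problems = [];
  for (const line of fs.readFileSync(manifestPath, "utf8").split(/\r?\n/)) {
    if (line.trim() === "") continue;
    const match = line.match(/^(\S+)\s+(.+)$/);
    if (!match) {
      problems.push(`malformed line: ${line}`);
      continue;
    }
    const [, expected, relPath] = match;
    const full = path.join(bagDir, relPath);
    if (!fs.existsSync(full)) {
      problems.push(`missing: ${relPath}`);
      continue;
    }
    const actual = crypto
      .createHash(algorithm)
      .update(fs.readFileSync(full))
      .digest("hex");
    if (actual !== expected.toLowerCase()) problems.push(`checksum mismatch: ${relPath}`);
  }
  return problems; // an empty array means every listed file matched
}
```

Returning a list of problems, rather than stopping at the first failure, makes it easier to report everything wrong with a bag in one pass.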

If you have any specific questions for me, let me know on Twitter.

Writing non-fiction is mostly reading, thinking, and sorting; the rest is just keystrokes. Lila is for writing non-fiction; poetry, not so much. / John Miedema

Writing non-fiction is mostly reading, thinking, and sorting; the rest is just keystrokes. And style. Think clearly and the rest comes easy. Lila is designed to extend human writing capabilities by performing cognitive work:

  1. The work of reading, especially during the early research phase. Writers can simply drop unread digital content onto disk, and Lila will convert it into manageable chunks: slips. These slips are shorter than the full-length originals, making them quicker to evaluate. More important, these slips are embedded in the context of relevant content written by the author; context is meaning, so unread content will be easier to evaluate.
  2. The work of analyzing content and sorting it into the best view, using visualization. As Pirsig said, “Instead of asking ‘Where does this metaphysics of the universe begin?’ – which was a virtually impossible question – all he had to do was just hold up two slips and ask, ‘Which comes first?'” This work builds a table of contents, a hierarchical view of the content. Lila will show multiple views so the author can choose the best one.
  3. The ability to uncover bias and ensure completeness of thought. Author bias may filter out content when reading, but Lila will compel a writer to notice relevant content.
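
Pirsig’s “Which comes first?” is, in effect, a comparison sort. This JavaScript sketch orders slips using only pairwise questions. It is not how Lila builds its views, which isn’t described here. The callback `comesFirst(a, b)` is hypothetical: it could be a person answering, or any scoring rule, and it returns true when slip a belongs before slip b. Binary insertion keeps the number of questions to roughly n log2 n rather than asking about every pair.

```javascript
// Sketch: order slips by asking only "Which comes first?" questions.
// comesFirst(a, b) is a hypothetical callback returning true if a precedes b.
function orderSlips(slips, comesFirst) {
  const ordered = [];
  let questions = 0;
  for (const slip of slips) {
    // Binary search for the first position whose slip should come after this one.
    let lo = 0;
    let hi = ordered.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      questions += 1;
      if (comesFirst(slip, ordered[mid])) {
        hi = mid;
      } else {
        lo = mid + 1;
      }
    }
    ordered.splice(lo, 0, slip); // insert; ties keep their original order
  }
  return { ordered, questions };
}
```

For example, ordering `["c", "a", "d", "b"]` alphabetically with `(x, y) => x < y` takes four questions and returns `["a", "b", "c", "d"]`.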

Lila’s cognitive abilities depend on the author’s engagement in a writing project, generating content that guides the above work. Lila is designed expressly for the writing of non-fiction; poetry, not so much. The cognitive work is performed in most kinds of writing, though, so Lila will also help with some kinds of fiction. Both fiction and creative non-fiction still require substantial stylistic work after Lila has done her part.

CrossRef Indicators / CrossRef

Updated January 20, 2015

Total no. participating publishers & societies 5736
Total no. voting members 3022
% of non-profit publishers 57%
Total no. participating libraries 1926
No. journals covered 37,469
No. DOIs registered to date 71,820,143
No. DOIs deposited in previous month 648,271
No. DOIs retrieved (matched references) in previous month 46,260,320
DOI resolutions (end-user clicks) in previous month 134,057,984

New CrossRef Members / CrossRef

Updated January 20, 2015

Voting Members
All-Russia Petroleum Research Exploration Institute (VNIGRI)
Barbara Budrich Publishers
Botanical Research Institute of Texas
Faculty of Humanities and Social Sciences, University of Zagreb
Graduate Program of Management and Business, Bogor Agricultural University
IJSS Group of Journals
IndorSoft, LLC
Innovative Pedagogical Technologies LLC
International Network for Social Network Analysts
Slovenian Chemical Society
Subsea Diving Contractor di Stefano Di Cagno Publisher
The National Academies Press
Wisconsin Space Grant Consortium

Represented Members
Artvin Coruh Universitesi Orman Fakultesi Dergisi
Canadian Association of Schools of Nursing
Indian Society for Education and Environment
Journal for the Education of the Young Scientist and Giftedness
Kastamonu University Journal of Forestry Faculty
Korean Society for Metabolic and Bariatric Surgery
Korean Society of Acute Care Surgery
The Korean Ophthalmological Society
The Pharmaceutical Society of Korea
Uludag University Journal of the Faculty of Engineering
YEDI: Journal of Art, Design and Science

Last updated January 12, 2015

Voting Members
Association of Basic Medical Sciences of FBIH
Emergent Publications
Kinga - Service Agency Ltd.
Participatory Educational Research (PER)
Robotics: Science and Systems Foundation
University of Lincoln, School of Film and Media and Changer Agency
Uniwersytet Przyrodniczy w Poznaniu (Poznan University of Life Sciences)
Voronezh State University
Wyzsza Szkola Logistyki (Poznan School of Logistics)

Represented Members
Journal of the Faculty of Engineering and Architecture of Gazi University
Korean Insurance Academic Society
Korean Neurological Association
Medical Journal of Suleyman Demirel University

Upcoming CrossRef Webinars / CrossRef

Introduction to CrossCheck
Date: Tuesday, Jan 27, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossRef Text and Data Mining
Date: Thursday, Jan 29, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossRef Technical Basics
Date: Wednesday, Feb 11, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Patricia Feeney

CrossCheck: iThenticate Admin Webinar
Date: Thursday, Feb 19, 2015
Time: 7:00 am (San Francisco), 10:00 am (New York), 3:00 pm (London)
Moderator: iThenticate

Introduction to CrossRef
Date: Wednesday, Mar 4, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Patricia Feeney

Introduction to CrossCheck
Date: Tuesday, Mar 17, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 3:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossRef Technical Basics
Date: Wednesday, Mar 18, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 3:00 pm (London)
Moderator: Patricia Feeney

Introduction to CrossRef Text and Data Mining
Date: Thursday, Mar 19, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 3:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossCheck
Date: Tuesday, May 5, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossRef Text and Data Mining
Date: Thursday, May 7, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossCheck
Date: Tuesday, July 21, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossRef Text and Data Mining
Date: Thursday, July 23, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Library of Alexandria v2.0 / Ed Summers

In case you missed it, Jill Lepore has written a superb article for the New Yorker about the Internet Archive and archiving the Web in general. The story of the Internet Archive is largely the story of its creator, Brewster Kahle. If you’ve heard Kahle speak, you’ve probably heard the Library of Alexandria v2.0 metaphor before. As a historian, Lepore is particularly attuned to this dimension of the story of the Internet Archive:

When Kahle started the Internet Archive, in 1996, in his attic, he gave everyone working with him a book called “The Vanished Library,” about the burning of the Library of Alexandria. “The idea is to build the Library of Alexandria Two,” he told me. (The Hellenism goes further: there’s a partial backup of the Internet Archive in Alexandria, Egypt.)

I’m kind of embarrassed to admit that until reading Lepore’s article I never quite understood the metaphor…but now I think I do. The Web is on fire and the Internet Archive is helping save it, one HTTP request and response at a time. Previously I couldn’t get past the image of this vast collection of Web content that the Internet Archive is building as yet another centralized collection of valuable material that, as with v1.0, is vulnerable to disaster or, more likely, as Heather Phillips writes, to creeping neglect:

Though it seems fitting that the destruction of so mythic an institution as the Great Library of Alexandria must have required some cataclysmic event like those described above – and while some of them certainly took their toll on the Library – in reality, the fortunes of the Great Library waxed and waned with those of Alexandria itself. Much of its downfall was gradual, often bureaucratic, and by comparison to our cultural imaginings, somewhat petty.

I don’t think it can be overstated: like the Library of Alexandria before it, the Internet Archive is an amazingly bold and priceless resource for human civilization. I’ve visited the Internet Archive on multiple occasions, and each time I’ve been struck by how unlikely it is that such a small and talented team have been able to build and sustain a service with such impact. It’s almost as if it’s too good to be true. I’m nagged by the thought that perhaps it is.

Herbert van de Sompel is quoted by Lepore:

A world with one archive is a really bad idea.

Van de Sompel and his collaborator Michael Nelson have repeatedly pointed out just how important it is for there to be multiple archives of Web content, and for there to be a way for them to be discoverable, and work together. Another thing I learned from Lepore’s article is that Brewster’s initial vision for the Internet Archive was much more collaborative, which gave birth to the International Internet Preservation Consortium, which is made up of 32 member organizations who do Web archiving.

A couple of weeks ago, one prominent IIPC member, the California Digital Library, announced that it was retiring its in-house archiving infrastructure and outsourcing its operation to Archive-It, the subscription web archiving service from the Internet Archive.

The CDL and the UC Libraries are partnering with Internet Archive’s Archive-It Service. In the coming year, CDL’s Web Archiving Service (WAS) collections and all core infrastructure activities, i.e., crawling, indexing, search, display, and storage, will be transferred to Archive-It. The CDL remains committed to web archiving as a fundamental component of its mission to support the acquisition, preservation and dissemination of content. This new partnership will allow the CDL to meet its mission and goals more efficiently and effectively and provide a robust solution for our stakeholders.

I happened to tweet this at the time:

Which at least inspired some mirth from Jason Scott, who is an Internet Archive employee, and also a noted Internet historian and documentarian.

Jason is also well known for his work with ArchiveTeam, which quickly mobilizes volunteers to save content on websites that are being shut down. This content is often then transferred to the Internet Archive. He gets his hands dirty doing the work, and inspires others to do the same. So I deserved a bit of derisive laughter for my hand-wringing.

But here’s the thing. What does it mean if one of the pre-eminent digital library organizations needs to outsource its Web archiving operation? And what if, as the announcement indicates, Harvard, MIT, Stanford, UCLA, and others might not be far behind? Should we be concerned that the technical expertise and infrastructure for doing this work is becoming consolidated in a single organization? What does it say about our Web archiving tools that it is more cost-effective for CDL to outsource this work?

The situation isn’t as dire as it might sound, since Archive-It subscribers retain the right to download their content and store it themselves. How many institutions do that with regularity isn’t well known (at least to me). But Web content isn’t like paper that you can put in a box in a climate-controlled room and return to years hence. As Matt Kirschenbaum has pointed out:

the preservation of digital objects is logically inseparable from the act of their creation — the lag between creation and preservation collapses completely, since a digital object may only ever be said to be preserved if it is accessible, and each individual access creates the object anew

Can an organization download its WARC content, provide no meaningful access to it, and say that it is being preserved? I don’t think so. You can’t do digital preservation without thinking about some kind of access to make sure things are working and people can use the stuff. If the content you are accessing is on a platform somewhere else that you have no control over, you should probably be concerned.

I’m hopeful that this partnership between CDL, Archive-It, and other organizations will prove fruitful and lead to improved tools. But I’m worried that it will mean organizations can simply outsource the expertise and infrastructure of Web archiving, while helping reinforce what is already a huge single point of failure. David Rosenthal of Stanford University notes that diversity is a vital component of digital preservation:

Media, software and hardware must flow through the system over time as they fail or become obsolete, and are replaced. The system must support diversity among its components to avoid monoculture vulnerabilities, to allow for incremental replacement, and to avoid vendor lock-in.

I’d like to see more Web archiving classes in iSchools and computer science departments. I’d like to see improved and simplified tools for doing the work of Web archiving. Ideally I’d like to see more in-house crawling and access of Web archives, not less. I’d like to see more organizations like the Internet Archive that are not just technically able to do this work, but are also bold enough to collect what they think is important to save on the Web and make it available. If we can’t do this together, I think the Library of Alexandria metaphor will be all too literal.

Islandora Conference: Registration Now Open / Islandora

The Islandora Foundation is thrilled to invite you to the first-ever Islandora Conference, taking place August 3 - 7, 2015 in the birthplace of Islandora: Charlottetown, PEI.

This full-week event will consist of sessions from the Islandora Foundation, interest groups, community presentations, two full days of hands-on Islandora training, and will end with a Hackfest where we invite you to make your mark in the Islandora code and work together with your fellow Islandorians to complete projects selected by the community.

Our theme for the conference is Community: the Islandora community, the community of people our institutions serve, the community of researchers and librarians and developers who work together to curate digital assets, and the community of open source projects that work together and in parallel.

Registration is now open, with an Early Bird rate available until the end of March. Institutional rates are also available for groups of three or more.

For more information or to sign up for the conference, please visit our conference website:

Thank you,

The Islandora Team

Your Job Has Been Robot-sourced / M. Ryan Hess


“People are racing against the machine, and many of them are losing that race…Instead of racing against the machine, we need to learn to race with the machine.”

- Erik Brynjolfsson, Innovation Researcher

Libraries are busy making lots of metadata and data networks. But who are we making this for anyway? Answer: The Machines

I spent the last week catching up on what the TED Conference has to say on robots, artificial intelligence and what these portend for the future of humans…all with an eye on the impact on my own profession: librarians.

A digest of the various talks would go as follows:

    • Machine learning and AI capabilities are advancing at an exponential rate, just as forecast
    • Robots are getting smarter and more ubiquitous by the year (Roomba, Siri, Google self-driving cars, drone strikes)
    • Machines are replacing humans at an increasing rate and impacting unemployment rates

The experts are personally torn on the rise of the machines, noting that there are huge benefits to society, but that we are facing a future where almost every job will be at risk of being taken by a machine. Jeremy Howard used words like “wonderful” and “terrifying” in his talk about how quickly machines are getting smarter (quicker than you think!). Erik Brynjolfsson (quoted above) shared a mixed optimism about the prospects this robotification holds for us, saying that a major retooling of the workforce and even the way society shares wealth is inevitable.

Personally, I’m thinking this is going to be more disruptive than the Industrial Revolution, which stirred up some serious feelings, as you may recall: unionization, urbanization, anarchism, Bolshevism… but also some nice stuff (once we got through the riots, revolutions and Pinkertons), like most of the world no longer having to shovel animal manure and live in sod houses on the prairie. But what a ride!

This got me thinking about the end game the speakers were loosely describing and how it relates to libraries. In their estimation, we will see many, many jobs disappear in our lifetimes, including lots of knowledge worker jobs. Brynjolfsson says the way we need to react is to integrate new human roles into the work of the machines, for example by having AI partners that act as consultants to human workers. In this scenario (already happening in healthcare with IBM Watson), machines scour huge datasets and then give their advice or prognosis to a human, who still gets to make the final call. That might work for some jobs, but it isn’t hard to imagine the human in that partnership becoming a little redundant at some point, especially when you’re talking about machines that may even be smarter than their human partners.

But still, let’s take the typical public-facing librarian, already under threat from an ever-improving Google. As I discussed briefly in Rise of the Machines, services like Google, IBM Watson and Siri are only getting better and will likely, possibly very soon, put the reference aspect of librarianship out of business altogether. In fact, because these automated information services exist in mobile and online environments with no library required, they will likely exacerbate the library relevance issue, at least as far as traditional library models are concerned.

Of course, we’re quickly reinventing ourselves (read how in my post Tomorrow’s Tool Library on Steroids), but one thing is clear: the library as the community’s warehouse and service center for information will be replaced by machines. In fact, a more likely model would be one where libraries pool community resources to provide access to cutting-edge AI services and expensive data resources, if proprietary data even exists in the future (a big if, IMO).

What is ironic is that technical services librarians are actually laying the groundwork for this transformation of the library profession. Every time technical services librarians work out a new metadata schema, mark up digital content with microdata, write a line of RDF, enhance the SEO of their collections or connect a record to linked data, they are really setting the stage for machines to not only index knowledge, but understand its semantic and ontological relationships. That is, they’re building the infrastructure for the robot-infused future. Funny that.

As Brynjolfsson suggests, we will have to create new roles where we work side-by-side with the machines, if we are to stay employed.

On this point, I’d add that we very well could see that human creativity still trumps machine logic. It might be that this particular aspect of humanity doesn’t translate into code all that well. So maybe the robots will be a great liberation and we all get to be artists and designers!

Or maybe we’ll all lose our jobs, unite in anguish with the rest of the unemployed 99%, and decide it’s time the other 1% shared the wealth so we can all live off the work of our robots, bliss out in virtual reality and plan our next vacations to Mars.

Or, as Ray Kurzweil would say, we’ll just merge with the machines and trump the whole question of unemployment, let alone mortality.

Or we could just outlaw AI altogether and hold back the tide permanently, like they did in Dune. Somehow that doesn’t seem likely…and the machines probably won’t allow it. LOL

Anyway, food for thought. As Yoda said: “Difficult to see. Always in motion is the future.”

Meanwhile, speaking of movies…

If this subject intrigues you, Hollywood has also jumped on this intellectual meme, pushing out several robot and AI films over the last couple of years. If you’re interested, here’s my list of the ones I’ve watched, ordered by my rating (good to less good).

  1. Her: Wow! Spike Jonze gives his quirky, moody, emotion-driven interpretation of the AI question. Thought provoking and compelling in every regard.
  2. Black Mirror, S02E01 – Be Right Back: Creepy to the max and coming to a bedroom near you soon!
  3. Automata: Bleak but interesting. Be sure NOT to read the expository intro text at the beginning. I kept thinking this was unnecessary to the film and ruined the mystery of the story. But still pretty good.
  4. Transcendence: A play on Ray Kurzweil’s singularity concept, but done with explosions and Hollywood formulas.
  5. The Machine: You can skip it.

Two more are on my must-watch list: Chappie and Ex Machina, both of which look like they’ll be quality films that explore human-robot relations. They may be machines, but I love when we dress them up with emotions…I guess that’s what you should expect from a human being. :)

Repox / FOSS4Lib Updated Packages

Last updated January 23, 2015. Created by Peter Murray on January 23, 2015.

REPOX is a framework to manage data spaces. It comprises several
channels to import data from data providers, services to transform
data between schemas according to rules specified by the user, and
services to expose the results to external systems.
This tailored version of REPOX aims to provide all TEL and Europeana
partners with a simple solution to import, convert and expose their
bibliographic data via OAI-PMH, by the following means:

  • Cross platform
    It is developed in Java, so it can be deployed in any
    operating system that has an available Java virtual machine.
  • Easy deployment
    It is available with an easy installer, which includes
    all the required software.
  • Support for several data formats and encodings
    It supports UNIMARC and MARC21 schemas, and encodings in ISO 2709 (including several variants),
    MarcXchange or MARCXML. During the course of the TELplus project, support
    will be added for other possible encodings required by the partners.
  • Data crosswalks
    It offers crosswalks for converting UNIMARC and MARC21 records to simple
    Dublin Core as well as to TEL-AP (TEL Application
    Profile). A simple user interface makes it possible to customize these
    crosswalks and create new ones for other formats.
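To make the idea of a crosswalk concrete, here is a minimal sketch of a MARC21-to-simple-Dublin-Core mapping that emits an oai_dc record of the kind an OAI-PMH service exposes. It is not REPOX’s code or configuration: the field mapping is a deliberately small, hypothetical subset (100 to creator, 245 to title, 260 to publisher and date, 650 to subject), and the record format (a list of tag/subfield tuples) is a toy stand-in for a parsed MARC record. Only the Python standard library is used; the two namespace URIs are the standard oai_dc and Dublin Core element namespaces.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the OAI-PMH oai_dc metadata format.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Hypothetical, deliberately partial mapping of (MARC21 tag, subfield code)
# to simple Dublin Core elements; real crosswalks cover far more fields.
MARC21_TO_DC = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
}

# Trailing ISBD punctuation often left on MARC subfield values.
ISBD_PUNCTUATION = " /:;,"


def marc_to_dc(record):
    """Convert a toy MARC21 record into an oai_dc XML element.

    `record` is a list of (tag, [(subfield_code, value), ...]) pairs, a
    simplified stand-in for a parsed MARC21 record (not a real MARC API).
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for tag, subfields in record:
        for code, value in subfields:
            element = MARC21_TO_DC.get((tag, code))
            if element is None:
                continue  # field not covered by this toy crosswalk
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value.strip(ISBD_PUNCTUATION)
    return root


if __name__ == "__main__":
    # Hypothetical sample record; all values are placeholders.
    sample = [
        ("100", [("a", "Doe, Jane,")]),
        ("245", [("a", "An example title /"), ("c", "Jane Doe.")]),
        ("260", [("a", "Anytown :"), ("b", "Example Press,"), ("c", "2015")]),
        ("650", [("a", "Web archiving")]),
    ]
    print(ET.tostring(marc_to_dc(sample), encoding="unicode"))
```

Keeping the mapping in a plain dictionary mirrors the point of a customizable crosswalk: changing or extending the output means editing data rather than code.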

2015 VIVO implementation Fest / FOSS4Lib Upcoming Events

Monday, March 16, 2015 - 08:00 to Wednesday, March 18, 2015 - 17:00

Last updated January 23, 2015. Created by Peter Murray on January 23, 2015.

The i-Fest will be held March 16-18 and is being hosted by the Oregon Health and Science University Library in Portland, Oregon.

For further details about the i-Fest program, registration, travel, and accommodations, visit the blog post at

What Do You Do With a 3D Printer? / LITA


“Big mac, 3D printer, 3D scanner” by John Klima is licensed under CC BY 2.0

This is the first in a series of posts about some technology I’ve introduced or will be introducing to my library. In my mind, the library is a place where the public can learn about new and emerging technologies without needing to invest in them. To that end, I’ve formed a technology committee at our library that will meet quarterly to talk about how we’re using the existing technology in the building and what type of technology we could introduce to the building.

The next two paragraphs have some demographic information so that you have an idea of whom I’m trying to serve (i.e., you can skip them if you want to get to the meat of the technology discussion).

I work at the Waukesha Public Library in the city of Waukesha, the seventh-largest municipality in Wisconsin, with around 72,000 people. We have a service population of almost 100,000. The building itself is about 73,000 square feet, with a collection of around 350,000 items.

Waukesha has a Hispanic population of about 10% with the remainder of our population being predominantly Caucasian. Our public is a pretty even mix across age groups and incomes. Technological interest also runs pretty evenly from early adopters to neophytes.

I’ve wanted a 3D printer forever. OK, only a few years, but in the world of technology a few years is almost forever. I didn’t bring up the idea to our executive director initially because I wasn’t sure I could justify the expense.

As assistant director in charge of technology at the library, I can justify spending up to a few hundred dollars on new technology. Try out a Raspberry Pi? Sure. Pick up a Surface? Go ahead. But spending a few thousand dollars? That felt like it needed more than my whim.

But after those few years went by and 3D printers were still a topic of discussion and I didn’t have one yet, I approached the executive director and our Friends group and got the money to buy a MakerBot Replicator 2 and a MakerBot Digitizer (it was the Digitizer that finally pushed me over the precipice to buy 3D equipment; more on that later).

So we bought the machine, set it up, and started printing a bunch of objects. At first it was just the models on the SD card in the printer: a nut-and-bolt set, a shark, chain links, a comb, and a bracelet.

People loved watching the machine work, particularly when it was making the chain links. They couldn’t understand how it could print interconnected chain links. I tried to explain that it printed in 100 micron thick layers (slightly thinner than a sheet of paper) and built the objects up one layer at a time, which let it make interconnected objects.

It made more sense if you could watch it.
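To put rough numbers on the layer-by-layer idea: assuming the 100 micron layer height mentioned above (actual print settings vary by printer and job), a simple ceiling division gives how many layers an object of a given height needs. The function name and the 2 cm example height are hypothetical, chosen only for illustration.

```python
def layer_count(height_um: int, layer_um: int = 100) -> int:
    """Number of layers needed to build an object of the given height.

    Heights are in whole micrometres; ceiling division means a partial
    final layer still counts as one full layer.
    """
    if height_um <= 0 or layer_um <= 0:
        raise ValueError("height and layer thickness must be positive")
    return -(-height_um // layer_um)


if __name__ == "__main__":
    # Hypothetical 2 cm (20,000 micrometre) tall object at 100-micron layers.
    print(layer_count(20_000))  # 200
```

At that resolution even a small trinket is hundreds of thin slices, which is why watching the print build up is so much more convincing than describing it.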

Our young adult librarian started making plans for her teen patrons. This past October we read Edgar Allan Poe as a community read, and she had her teens make story jars of different Edgar Allan Poe stories using objects we printed: hearts, ravens, bones, coffins, etc.

One of our children’s librarians used the printer to enhance a board-game design program he ran. He printed out dice, figures, and markers that the kids could use when designing a game. Then they got to take their game home when they finished it. More recently he printed out a chess set that assembles into a robot for the winner of our upcoming chess tournament.

I printed out hollow jack o’ lanterns that showed a spooky face when you placed a small electric light inside them. When I realized I needed a desk organizer for the 3D printer I printed one instead of buying one.


“Mushroom candy tin and friend” by John Klima is licensed under CC BY 2.0

Now, as for the Digitizer. We’ve tried digitizing objects. To me that was the coolest thing we could do: make copies of physical objects. Unfortunately, the Digitizer has worked poorly at best. It cannot handle small objects (things larger than an egg work best), and it cannot scan complicated or dull objects very well.

Our failures include a kaiju wind-up toy, a LEGO Eiffel Tower, and a squishy stressball brain. Our only success was a Mario Bros. mushroom candy tin. That scanned perfectly, but it’s round, shiny, and the perfect size. If you’re considering buying a digitizer, I would think twice about it (honestly, I’d recommend not getting one at this time).

Now the question I ask is: what’s next? The Replicator 2 isn’t the best machine to put out for public use as it would require quite a bit of staff oversight. There are some 3D printers—the Cube printer from 3D Systems for example—that are better suited for public use in my opinion. It’s currently a moot point as we don’t have space in our public area for one at this time, but I think offering one for public use is in our future plans somewhere down the line.

I’d like to use it more for programming in the library. I want to showcase it to the public more. Our technology committee will make plans so that we can do both of those things.

More importantly, what about the rest of you? Who has a 3D printer in their building? Do you use it for staff or public? Do you want to get a 3D printer for your library? What sorts of questions do you have about them?

Where the heck did all of these librarians come from? / District Dispatch

We’re taking part in Copyright Week, a series of actions and discussions supporting key principles that should guide copyright policy. Every day this week, various groups are taking on different elements of the law and addressing what’s at stake and what we need to do to make sure that copyright promotes creativity and innovation.

Today’s topic is transparency, but I chose to write about librarians.

We have a good number of librarians who, beyond a doubt, are copyright geeks, like me. In fact, we call ourselves copyright geeks, especially now that the term “geek” has gained such popularity. These are librarians (a few with JDs) who attend conferences like the Berkeley Center for Law and Technology’s symposium on copyright formalities. Really, who would find “Constraints and flexibilities in the Berne Convention” an attention-grabbing program? (I loved it!!)

What do we do? Crazy things like studying Congressional hearings from the 1970s, citing eBay v. MercExchange at CopyNight, and reading the entire 130-page Hargreaves Digital Economy Report. You can find our hoard at any American Library Association (ALA) conference program, meeting or discussion group that has anything to do with copyright. We make ourselves available to the profession, teaching other librarians about copyright, social responsibility and, of course, the four factors of fair use. We do not give legal advice, but we often know more about copyright law than the typical counsel retained by libraries or educational institutions. Yet we are not snobbish. We have our copyright scholar heroes, and we pester them, prizing any new copyright gem of knowledge that they might utter.

The increased interest in copyright is often interconnected with technological advancement and innovation (what else?), and the desire to use technology to the fullest extent – so we can preserve, lend, data mine, and rely on fair use. But way back in the day (yes, the time before the internet), there were librarians with copyright expertise formidable enough to represent library communities across this great nation in U.S. Congressional copyright policymaking, going back to before the Copyright Act of 1976. These librarians were primarily ALA and Association of Research Libraries (ARL) staff. Staff at these same associations, along with staff at the Association of College and Research Libraries (ACRL), formed a coalition more than 15 years ago called the Library Copyright Alliance (LCA). We were plodding along before the cooler kids (EFF and Public Knowledge) moved into the copyright neighborhood.

What sets librarian copyright geeks and their associations apart from the cool kids? We have continuing contact with the public, and we talk to them. If a member of the public has a copyright need, we help them. And if this member of the public has an issue with government copyright policy, we tell them how to contact their Member of Congress.

Plus we have thousands of association members who believe in civil society and are probably more likely to vote in an election. We might be stuck with the librarian stereotype, but on the other hand, our library communities have great trust in us. While it’s true that we don’t have the lobbying resources that large corporations have, and we can’t introduce folks to Angelina Jolie, we hold our own.

So in honor of Copyright Week, all hail the copyright librarians!! (Did you see – we even have a television show!!)

The post Where the heck did all of these librarians come from? appeared first on District Dispatch.