Planet Code4Lib

Hear How We’re All Makers – a LITA webinar / LITA

Sign up Now for

Diversity, Inclusion, and Empowerment in Library Makerspaces
Instructors: Sharona Ginsberg, Learning Technologies Librarian, SUNY Oswego and Lauren Di Monte, Data & Research Impact Librarian, University of Rochester
December 6, 2017, 11:00 am – 12:30 pm Central time


One oft-overlooked aspect of making and makerspaces is their potential for empowerment, especially among populations that are otherwise marginalized or underrepresented. This 90-minute webinar will discuss why making is important for these populations, and what libraries can do to ensure their makerspaces are safe spaces of diversity, inclusion, and accessibility.
The presenters will each speak from their respective institutional contexts, but they will also provide specific tips and actions librarians can take, regardless of institutional type or budget. The presentation will address issues of accessibility in the sense of eliminating barriers for those with disabilities, and will address inclusion in terms of physical ability, neurodiversity, age, race and ethnicity, religion, socioeconomic status, gender identity, sexual orientation, and community status (e.g., student or faculty).

View details and Register here.

Discover upcoming LITA webinars and web courses

Digital Life Decoded: A user-centered approach to cyber-security and privacy
Offered: December 12, 2017

Questions or Comments?

For all other questions or comments related to the course, contact LITA at (312) 280-4268 or Mark Beatty,

HTTPS only for MPG/SFX and MPG.eBooks / Max Planck Digital Library

As of next week, all http requests to the MPG/SFX link resolver will be redirected to a corresponding https request.

The Max Planck Society electronic Book Index is scheduled to be switched to https only access the week after, starting on November 27, 2017.

Regular web browser use of the above services should not be affected.

Please thoroughly test any solutions that integrate these services via their web APIs.

Please consider re-subscribing to MPG.eBooks RSS feeds.

What burns away / Karen G. Schneider

We are among the lucky ones. We did not lose our home. We did not spend day after day evacuated, waiting to learn the fate of where we live. We never lost power or Internet. We had three or four days where we were mildly inconvenienced because PG&E wisely turned off gas to many neighborhoods, but we showered at the YMCA and cooked on an electric range we had been planning to upgrade to gas later this fall (and just did, but thank you, humble Frigidaire electric range, for being there to let me cook out my anxiety). We kept our go-bags near the car, and then we kept our go-bags in the car, and then, when it seemed safe, we took them out again. That, and ten days of indoor living and wearing masks when we went out, was all we went through.

But we all bear witness.

The Foreshadowing

It began with a five-year drought that crippled forests and baked plains, followed by a soaking-wet winter and a lush spring that crowded the hillsides with greenery. Summer temperatures hit records several times, and the hills dried out as they always do right before autumn, but this time unusually crowded with parched foliage and growth.

The air in Santa Rosa was hot and dry that weekend, an absence of humidity you could snap between your fingers. In the southwest section of the city, where we live, nothing seemed unusual. Like many homes in Santa Rosa our home does not have air conditioning, so for comfort’s sake I grilled our dinner, our 8-foot backyard fence buffering any hint of the winds gathering speed northeast of us. We watched TV and went to bed early.

Less than an hour later one of several major fires would be born just 15 miles east of where we slept.

Reports vary, but accounts agree it was windy that Sunday night, with windspeeds ranging between 35 and 79 miles per hour, and a gust northwest of Santa Rosa reaching nearly 100 miles per hour. If the Diablo winds were not consistently hurricane-strength, they were exceptionally fast, hot, and dry, and they meant business.

A time-lapse map of 911 calls shows the first reports of downed power lines and transformers coming in around 10 pm. The Tubbs fire was named for a road that is named for a 19th-century winemaker who lived in a house in Calistoga that burned to the ground in an eerily similar fire in 1964. In three hours this fire sped 12 miles southwest, growing in size and intent as it gorged on hundreds and then thousands of homes in its way, breaching city limits and expeditiously laying waste to 600 homes in the Fountaingrove district before it tore through the Journey’s End mobile home park, then reared back on its haunches and leapt across a six-lane divided section of Highway 101, whereupon it gobbled up big-box stores and fast food restaurants flanking Cleveland Avenue, a business road parallel to the highway. Its swollen belly, fat with miles of fuel, dragged over the area and took out buildings in the random manner of fires. Kohl’s and KMart were totaled and Trader Joe’s was badly damaged, while across the street from KMart, JoAnn Fabrics was untouched. The fire demolished one Mexican restaurant, hopscotched over another, and feasted on a gun shop before turning its ravenous maw toward the quiet middle-class neighborhood of Coffey Park, making short work of thousands more homes.

Santa Rosa proper is itself only 41 square miles, approximately 13 miles north-south and 9 miles east-west, including the long tail of homes flanking the Annadel mountains. By the time Kohl’s was collapsing, the “wildfire” was less than 4 miles from our home.

I woke up around 2 am, which I tend to do a lot anyway. I walked outside and smelled smoke, saw people outside their homes looking around, and went on Twitter and Facebook. There I learned of a local fire, forgotten by most in the larger conflagration, but duly noted in brief by the Press Democrat: a large historic home at 6th and Pierson burned to the ground, possibly from a downed transformer, and the fire licked the edge of the Santa Rosa Creek Trail for another 100 feet. Others in the West End have reported the same experience of reading about the 6th Street house fire on social media and struggling to reconcile the reports of this fire with reports of panic and flight from areas north of us and videos of walls of flame.

At 4 am I received a call that the university had activated its Emergency Operations Center and I asked if I should report in. I showered and dressed, packed a change of clothes in a tote bag, threw my bag of important documents in my purse, and drove south on my usual route to work, Petaluma Hill Road. The hills east of the road flickered with fire, the road itself was packed with fleeing drivers, and halfway to campus I braked at 55 mph when a massive buck sprang inches in front of my car, not running in that “oops, is this a road?” way deer usually cross lanes of traffic but yawing to and fro, its eyes wide. I still wonder, was it hurt or dying.

As I drove onto campus I thought, the cleaning crew. I parked at the Library and walked through the building, already permeated with smoky air. I walked as quietly as I could, so that if they were anywhere in the building I would hear them. As I walked through the silent building I wondered, is this the last time I will see these books? These computers? The new chairs I’m so proud of? I then went to the EOC and found the cleaning crew had been accounted for, which was a relief.

At Least There Was Food And Beer

A few hours later I went home. We had a good amount of food in the house, but like many of us who were part of this disaster but not immediately affected by it, I decided to stock up. The entire Santa Rosa Marketplace–Costco, Trader Joe’s, Target–on Santa Rosa Avenue was closed, and Oliver’s had a line outside of people waiting to get in. I went to the “G&G Safeway”–the one that took over a down-at-the-heels family market known as G&G and turned it into a spiffy market with a wine bar, no less–and it was without power, but open for business and, thanks to a backup system, able to take ATM cards. I had emergency cash on me but was loath to use it until I had to.

Sweating through an N95 mask I donned to protect my lungs, I wheeled my cart through the dark store, selecting items that would provide protein and carbs if we had to stuff them in our go-bags, but also fresh fruit and vegetables, dairy and eggs–things I thought we might not see for a while, depending on how the disaster panned out. (Note, we do already have emergency food, water, and other supplies.) The cold case for beer was off-limits–Safeway was trying to retain the cold in its freezer and fridge cases in case it could save the food–but there was a pile of cases of Lagunitas Lil Sumpin Sumpin on sale, so that with a couple of bottles of local wine went home with me too.

And with one wild interlude, for most of the rest of the time we stayed indoors with the windows closed. I sent out email updates and made phone calls, kept my phone charged and read every Nixle alert, and people at work checked in with one another. My little green library emergency contact card stayed in my back pocket the entire time. We watched TV and listened to the radio, including extraordinary local coverage by KSRO, the Little Station that Could; patrolled newspapers and social media; and rooted for Sheriff Rob, particularly after his swift smack-down of a bogus, Breitbart-fueled report that an undocumented person had started the fires.

Our home was unoccupied for a long time before we moved in this September, possibly up to a decade, while it was slowly but carefully upgraded. The electric range was apparently an early purchase; it was a line long discontinued by Frigidaire, with humble electric coils. But it had been unused until we arrived, and was in perfect condition. If an electric range could express gratitude for finally being useful, this one did. I used it to cook homey meals: pork loin crusted with Smithfield bacon; green chili cornbread; and my sui generis meatloaf, so named because every time I make it, I grind and add meat scraps from the freezer for a portion of the meat mixture. (It would be several weeks before I felt comfortable grilling again.) We cooked. We stirred. We sauteed. We waited.

On Wednesday, we had to run an errand. To be truthful, it was an Amazon delivery purchased that Saturday, when the world was normal, and sent to an Amazon locker at the capacious Whole Foods at Coddingtown Mall, a good place to send a package until the mall closes down because the northeast section of the city is out of power and threatened by a massive wildfire. By Wednesday, Whole Foods had reopened, and after picking up my silly little order–a gadget that holds soda cans in the fridge–we drove past Russian River Brewing Company and saw it was doing business, so we had salad and beer for lunch, because it’s a luxury to have beer at lunch and the fires were raging and it’s so hard to get seating there nights and weekends, when I have time to go there, but there we were. We asked our waiter how he was doing, and he said he was fine but he motioned to the table across from ours, where a family was enjoying pizza and beer, and he said they had lost their homes.

There were many people striving for routine during the fires, and to my surprise, even the city planning office returned correspondence regarding some work we have planned for our new home, offering helpful advice on the permitting process required for minor improvements for homes in historic districts. Because it turns out developers and engineers could serenely ignore local codes and build entire neighborhoods in Santa Rosa in areas known to be vulnerable to wildfire; but to replace bare dirt with a little white wooden picket fence, or to restore front windows from 1950s-style plate glass to double-hung wooden windows with mullions–projects intended to reinstate our house to its historic accuracy, and to make it more welcoming–requires a written justification of the project, accompanying photos, “Proposed Elevations (with Landscape Plan IF you are significantly altering landscape) (5 copies),” five copies of a paper form, a Neighborhood Context and Vicinity Map provided by the city, and a check for $346, followed by “8-12 weeks” before a decision is issued.

The net result of this process is like the codes about not building on ridges, though much less dangerous; most people ignore the permitting process, so that the historic set piece that is presumably the goal is instead rife with anachronisms. And of course, first I had to bone up on the residential building code and the historic district guidelines, which contradict one another on key points, and because the permitting process is poorly documented I have an email traffic thread rivaling in word count Byron’s letters to his lovers.

But the planning people are very pleasant, and we all seemed to take comfort in plodding through the administrivia of city bureaucracy as if we were not all sheltering in place, masks over our noses and mouths, go-bags in our cars, while fires raged just miles from their office and our home.

The Wild Interlude, or, I Have Waited My Entire Career For This Moment

Regarding the wild interlude, the first thing to know about my library career is that nearly everywhere I have gone where I have had the say-so to make things happen, I have implemented key management. That mishmosh of keys in a drawer, the source of so much strife and so many arguments, becomes an orderly key locker with numbered labels. It doesn’t happen overnight, because keys are control and control is political and politics are what we tussle about in libraries because we don’t have that much money, but it happens.

Sometimes I even succeed in convincing people to sign keys out so we know who has them. Other times I convince people to buy a locker with a keypad so we sidestep the question of where the key to the key locker is kept. But mostly, I leave behind the lockers, and, I hope, an appreciation for lockers. I realize it’s not quite as impressive as founding the Library of Alexandria, and it’s not what people bring up when I am introduced as a keynote speaker, and I have never had anyone ask for a tour of my key lockers nor have I ever been solicited to write a peer-reviewed article on key lockers. However unheralded, it’s a skill.

My memory insists it was Tuesday, but the calendar says it was late Monday night when I received a call that the police could not access a door to an area of the library where we had high-value items. It would turn out that this was a rogue lock, installed sometime soon after the library opened in 2000, that unlike others did not have a master registered with the campus, an issue we have since rectified. But in any event, the powers that be had the tremendous good fortune to contact the person who has been waiting her entire working life to prove beyond doubt that KEY LOCKERS ARE IMPORTANT.

After a brief internal conversation with myself, I silently nixed the idea of offering to walk someone through finding the key. I said I knew where the key was, and I could be there in twenty minutes to find it. I wasn’t entirely sure this was the case, because as obsessed as I am with key lockers, this year I have been preoccupied with things such as my deanly duties, my doctoral degree completion, national association work, our home purchase and household move, and the selection of geegaws like our new gas range (double oven! center griddle!). This means I had not spent a lot of time perusing this key locker’s manifest. So there was an outside chance I would have to find the other key, located somewhere in another department, which would require a few more phone calls. I was also in that liminal state between sleep and waking; I had been asleep for two hours after being up since 2 am, and I would have agreed to do just about anything.

Within minutes I was dressed and again driving down Petaluma Hill Road, still busy with fleeing cars.  The mountain ridges to the east of the road roiled with flames, and I gripped the steering wheel, watching for more animals bolting from fire. Once in the library, now sour with smoke, I ran up the stairs into my office suite and to the key locker, praying hard that the key I sought was in it. My hands shook. There it was, its location neatly labeled by the key czarina who with exquisite care had overseen the organization of the key locker. The me who lives in the here-and-now profusely thanked past me for my legacy of key management, with a grateful nod to the key czarina as well. What a joy it is to be able to count on people!

Items were packed up, and off they rolled. After a brief check-in at the EOC, home I went, to a night of “fire sleep”–waking every 45 minutes to sniff the air and ask, is fire approaching?–a type of sleep I would have for the next ten days, and occasionally even now.

How we speak to one another in the here and now

Every time Sandy and I interact with people, we ask, how are you. Not, hey, how are ya, where the expected answer is “fine, thanks” even if you were just turned down for a mortgage or your mother died. But no, really, how are you. Like, fire-how-are-you. And people usually tell you, because everyone has a story. Answers range from: I’m ok, I live in Petaluma or Sebastopol or Bodega Bay (in SoCo terms, far from the fire), to I’m ok but I opened my home to family/friends/people who evacuated or lost their homes; or, I’m ok but we evacuated for a week; or, as the guy from Home Depot said, I’m ok and so is my wife, my daughter, and our 3 cats, but we lost our home.

Sometimes they tell you and they change the subject, and sometimes they stop and tell you the whole story: when they first smelled smoke, how they evacuated, how they learned they did or did not lose their home. Sometimes they have before-and-after photos they show you. Sometimes they slip it in between other things, like our cat sitter, who mentioned that she lost her apartment in Fountaingrove and her cat died in the fire but in a couple of weeks she would have a home and she’d be happy to cat-sit for us.

Now, post-fire, we live in that tritest of phrases, a new normal. The Library opened that first half-day back, because I work with people who like me believe that during disasters libraries should be the first buildings open and the last to close. I am proud to report the Library also housed NomaCares, a resource center for those at our university affected by the fire. That first Friday back we held our Library Operations meeting, and we shared our stories, and that was hard but good. But we also resumed regular activity, and soon the study tables and study rooms were full of students, meetings were convened, work was resumed, and the gears of life turned. But the gears turned forward, not back. Because there is no way back.

I am a city mouse, and part of moving to Santa Rosa was our decision to live in a highly citified section, which turned out to be a lucky call. But my mental model of city life has been forever twisted by this fire. I drive on 101 just four miles north of our home, and there is the unavoidable evidence of a fire boldly leaping into an unsuspecting city. I go to the fabric store, and I pass twisted blackened trees and a gun store totaled that first night. I drive to and from work with denuded hills to my east a constant reminder.

But that’s as it should be. Even if we sometimes need respite from those reminders–people talk about taking new routes so they won’t see scorched hills and devastated neighborhoods–we cannot afford to forget. Sandy and I have moved around the country in our 25 years together, and we have seen clues everywhere that things are changing and we need to take heed. People like to lapse into the old normal, but it is not in our best interests to do so.

All of our stories are different. But we share a collective loss of innocence, and we can never return to where we were. We can only move forward, changed by the fire, changed forever.

Org clocktables II: Summarizing a month / William Denton

In Org clocktables I: The daily structure I explained how I track my time working at an academic library, clocking in to projects that are categorized as PPK (“professional performance and knowledge,” our term for “librarianship”), PCS (“professional contributions and standing,” which covers research, professional development and the like) or Service. I do this by checking in and out of tasks with the magic of Org.

I’ll add a day to the example I used before, to make it more interesting. This is what the raw text looks like:

* 2017-12 December

** [2017-12-01 Fri]
CLOCK: [2017-12-01 Fri 09:30]--[2017-12-01 Fri 09:50] =>  0:20
CLOCK: [2017-12-01 Fri 13:15]--[2017-12-01 Fri 13:40] =>  0:25

*** PPK

**** Libstats stuff
CLOCK: [2017-12-01 Fri 09:50]--[2017-12-01 Fri 10:15] =>  0:25

Pull numbers on weekend desk activity for A.

**** Ebook usage
CLOCK: [2017-12-01 Fri 13:40]--[2017-12-01 Fri 16:30] =>  2:50

Wrote code to grok EZProxy logs and look up ISBNs of Scholars Portal ebooks.

*** PCS

*** Service

**** Stewards' Council meeting
CLOCK: [2017-12-01 Fri 10:15]--[2017-12-01 Fri 13:15] =>  3:00

Copious meeting notes here.

** [2017-12-04 Mon]
CLOCK: [2017-12-04 Mon 09:30]--[2017-12-04 Mon 09:50] =>  0:20
CLOCK: [2017-12-04 Mon 12:15]--[2017-12-04 Mon 13:00] =>  0:45
CLOCK: [2017-12-04 Mon 16:00]--[2017-12-04 Mon 16:15] =>  0:15

*** PPK

**** ProQuest visit
CLOCK: [2017-12-04 Mon 09:50]--[2017-12-04 Mon 12:15] =>  2:25

Notes on this here.

**** Math print journals
CLOCK: [2017-12-04 Mon 16:15]--[2017-12-04 Mon 17:15] =>  1:00

Check current subs and costs; update list of print subs to drop.

*** PCS

**** Pull together sonification notes
CLOCK: [2017-12-04 Mon 13:00]--[2017-12-04 Mon 16:00] =>  3:00

*** Service

All raw Org text looks ugly, especially all those LOGBOOK and PROPERTIES drawers. Don’t let that put you off. This is what it looks like on my screen with my customizations (see my .emacs for details):

[Screenshot: Much nicer in Emacs.]

At the bottom of the month I use Org’s clock table to summarize all this.

#+BEGIN: clocktable :maxlevel 3 :scope tree :compact nil :header "#+NAME: clock_201712\n"
#+NAME: clock_201712
| Headline             | Time  |      |      |
| *Total time*           | *14:45* |      |      |
| 2017-12 December     | 14:45 |      |      |
| \_  [2017-12-01 Fri] |       | 7:00 |      |
| \_    PPK            |       |      | 3:15 |
| \_    Service        |       |      | 3:00 |
| \_  [2017-12-04 Mon] |       | 7:45 |      |
| \_    PPK            |       |      | 3:25 |
| \_    PCS            |       |      | 3:00 |
#+END:

I just put in the BEGIN/END lines and then hit C-c C-c and Org creates that table. Whenever I add some more time, I can position the pointer on the BEGIN line and hit C-c C-c and it updates everything.

Now, there are lots of commands I could use to customize this, but this is pretty vanilla and it suits me. It makes it clear how much time I have down for each day and how much time I spent in each of the three pillars. It’s easy to read at a glance. I fiddled with various options but decided to stay with this.

It looks like this on my screen:

[Screenshot: Much nicer in Emacs.]

That’s a start, but the data is not in a format I can use as is. The times are split across different columns, there are multiple levels of indents, there’s a heading and a summation row, etc. But! The data is in a table in Org, which means I can easily ingest it and process it in any language I choose, in the same Org file. That’s part of the power of Org: it turns raw data into structured data, which I can process with a script into a better structure, all in the same file, mixing text, data and output.

Which language, though? A real Emacs hacker would use Lisp, but that’s beyond me. I can get by in two languages: Ruby and R. I started doing this in Ruby, and got things mostly working, then realized how it should go and what the right steps were to take, and switched to R.

Here’s the plan:

  • ignore “Headline” and “Total time” and “2017-12 December” … in fact, ignore everything that doesn’t start with “\_”
  • clean up the remaining lines by removing “\_”
  • the first line will be a date stamp, with the total day’s time in the first column, so grab it
  • after that, every line will either be a PPK/PCS/Service line, in which case grab that time
  • or it will be a new date stamp, in which case capture that information and write out the previous day’s information
  • continue on through all the lines
  • until the end, at which point a day is finished but not written out, so write it out
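The steps above can be sketched in base R with no packages at all. This is only an illustration under stated assumptions: the rows starting with “\_” are taken to be already filtered and cleaned into three columns (heading, day total, pillar subtotal), and the `rows` data frame and the `summarize_days` name are hypothetical, made up for this sketch.

```r
## Illustrative input: the cleaned "\_" rows of the December clocktable.
rows <- data.frame(
  heading  = c("[2017-12-01 Fri]", "PPK", "Service",
               "[2017-12-04 Mon]", "PPK", "PCS"),
  total    = c("7:00", "", "", "7:45", "", ""),
  subtotal = c("", "3:15", "3:00", "", "3:25", "3:00"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: walk the rows and build one record per day.
summarize_days <- function(rows) {
  days <- list()
  current <- NULL
  for (i in seq_len(nrow(rows))) {
    h <- rows$heading[i]
    if (h %in% c("PPK", "PCS", "Service")) {
      ## A pillar line: grab its time for the current day
      current[[tolower(h)]] <- rows$subtotal[i]
    } else {
      ## A new date stamp: write out the previous day, then start a new one
      if (!is.null(current)) days[[length(days) + 1]] <- current
      current <- list(date = substr(h, 2, 11), ppk = "0:00", pcs = "0:00",
                      service = "0:00", total = rows$total[i])
    }
  }
  ## At the end, the last day is finished but not yet written out
  if (!is.null(current)) days[[length(days) + 1]] <- current
  do.call(rbind, lapply(days, as.data.frame, stringsAsFactors = FALSE))
}

daily <- summarize_days(rows)
print(daily)
```

The times stay as plain "H:MM" strings here; turning them into real durations is a separate step.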

I did this in R, using three packages to make things easier. For managing the time intervals I’m using hms, which seems like a useful tool. It needs to be a very recent version to make use of some time-parsing functions, so it needs to be installed from GitHub. Here’s the R:

library(dplyr)   ## Assumed from the functions used below: pipe, filter, mutate, rename, select, tribble, add_row
library(stringr) ## Assumed from the functions used below: str_replace
library(hms) ## Right now, needs GitHub version

clean_monthly_clocktable <- function (raw_clocktable) {
  ## Clean up the table into something simple
  clock <- raw_clocktable %>%
    filter(grepl("\\\\_", Headline)) %>%
    mutate(heading = str_replace(Headline, "\\\\_ *", "")) %>%
    mutate(heading = str_replace(heading, "] .*", "]")) %>%
    rename(total = X, subtotal = X.1) %>%
    select(heading, total, subtotal)

  ## Set up the table we'll populate line by line
  newclock <- tribble(~date, ~ppk, ~pcs, ~service, ~total)

  ## The first line we know has a date and time, and always will
  date_old <- substr(clock[1,1], 2, 11)
  total_time_old <- clock[1,2]
  date_new <- NA
  ppk <- pcs <- service <- vacation <- total_time_new <- "0:00"

  ## Loop through all lines ...
  for (i in 2:nrow(clock)) {
    if      (clock[i,1] == "PPK")     { ppk      <- clock[i,3] }
    else if (clock[i,1] == "PCS")     { pcs      <- clock[i,3] }
    else if (clock[i,1] == "Service") { service  <- clock[i,3] }
    else {
      ## Anything else is a new date stamp carrying that day's total
      date_new <- substr(clock[i,1], 2, 11)
      total_time_new <- clock[i,2]
    }
    ## When we see a new date, add the previous date's details to the table
    if (! is.na(date_new)) {
      newclock <- newclock %>% add_row(date = date_old, ppk, pcs, service, total = total_time_old)
      ppk <- pcs <- service <- "0:00"
      date_old <- date_new
      date_new <- NA
      total_time_old <- total_time_new
    }
  }

  ## Finally, add the final date to the table, when all the rows are read.
  newclock <- newclock %>% add_row(date = date_old, ppk, pcs, service, total = total_time_old)
  newclock <- newclock %>%
    mutate(ppk = parse_hm(ppk), pcs = parse_hm(pcs), service = parse_hm(service),
           total = parse_hm(total), lost = as.hms(total - (ppk + pcs + service))) %>%
    mutate(date = as.Date(date))
}

All of that is in a SRC block like below, but I separated the two in case it makes the syntax highlighting clearer. I don’t think it does, but such is life. Imagine the above code pasted into this block:

#+BEGIN_SRC R :session :results values
#+END_SRC

Running C-c C-c on that will produce no output, but it does create an R session and set up the function. (Of course, all of this will fail if you don’t have R (and those three packages) installed.)

With that ready, now I can parse that monthly clocktable by running C-c C-c on this next source block, which reads in the raw clock table (note the var setting, which matches the #+NAME above), parses it with that function, and outputs cleaner data. I have this right below the December clock table.

#+BEGIN_SRC R :session :results values :var clock_201712=clock_201712 :colnames yes
clean_monthly_clocktable(clock_201712)
#+END_SRC

#+RESULTS:
|       date |      ppk |      pcs |  service |    total |     lost |
| 2017-12-01 | 03:15:00 | 00:00:00 | 03:00:00 | 07:00:00 | 00:45:00 |
| 2017-12-04 | 03:25:00 | 03:00:00 | 00:00:00 | 07:45:00 | 01:20:00 |

This is tidy data. It looks like this:

[Screenshot: Again, in Emacs]

That’s what I wanted. The code I wrote to generate it could be better, but it works, and that’s good enough.

Notice all of the same dates and time durations are there, but they’re organized much more nicely—and I’ve added “lost.” The “lost” count is how much time in the day was unaccounted for. This includes lunch (maybe I’ll end up classifying that differently), short breaks, ploughing through email first thing in the morning, catching up with colleagues, tidying up my desk, falling into Wikipedia, and all those other blocks of time that can’t be directly assigned to some project.
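The “lost” figure is plain arithmetic: the day's total minus the time clocked to the three pillars. Here is a minimal base-R check of the two days in the table above. The helper names `hm_to_minutes` and `minutes_to_hm` are hypothetical, written for this sketch, and are not part of hms.

```r
## Hypothetical helper: convert "H:MM" strings to whole minutes.
hm_to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}

## Hypothetical helper: convert whole minutes back to "H:MM".
minutes_to_hm <- function(m) sprintf("%d:%02d", m %/% 60L, m %% 60L)

## Values copied from the cleaned table (2017-12-01 and 2017-12-04)
total   <- hm_to_minutes(c("7:00", "7:45"))
ppk     <- hm_to_minutes(c("3:15", "3:25"))
pcs     <- hm_to_minutes(c("0:00", "3:00"))
service <- hm_to_minutes(c("3:00", "0:00"))

## lost = total - (ppk + pcs + service)
lost <- total - (ppk + pcs + service)
print(minutes_to_hm(lost))  # "0:45" "1:20"
```

On 2017-12-01, for instance, 7:00 minus (3:15 + 0:00 + 3:00) leaves 0:45, which matches the 00:45:00 in the table.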

My aim is to keep track of the “lost” time and to minimize it, by a) not wasting time and b) properly classifying work. Talking to colleagues and tidying my desk is work, after all. It’s not immortally important work that people will talk about centuries from now, but it’s work. Not everything I do on the job can be classified against projects. (Not the way I think of projects—maybe lawyers and doctors and the self-employed think of them differently.)

The one technical problem with this is that when I restart Emacs I need to rerun the source block with the R function in it, to set up the R session and the function, before I can rerun the simple “update the monthly clocktable” block. However, because I don’t restart Emacs very often, that’s not a big problem.

The next stage of this is showing how I summarize the cleaned data to understand, each month, how much of my time I spent on PPK, PCS and Service. I’ll cover that in another post.

House passes OPEN Act to improve public access to government data / District Dispatch


The House of Representatives passed the OPEN Government Data Act on Nov. 15, 2017, as part of the bipartisan Foundations for Evidence-Based Policymaking Act.

On Wednesday, November 15, the House of Representatives passed ALA-supported legislation to improve public access to government data. The Open, Public, Electronic, and Necessary (OPEN) Government Data Act was included as part of the Foundations for Evidence-Based Policymaking Act (H.R. 4174), which the House passed by voice vote. Passage of the bill represents a victory for library advocates, who have supported the legislation since it was first introduced last year.

The OPEN Government Data Act would make more government data freely available online, in machine-readable formats, and discoverable through a federal data catalog. The legislation would codify and build upon then-President Obama’s 2013 executive order. ALA President Jim Neal responded to passage of the bill by saying,

ALA applauds the House’s passage of the OPEN Government Data Act today. This bill will make it easier for libraries to help businesses, researchers and students find and use valuable data that makes American innovation and economic growth possible. The strong bipartisan support for this legislation shows access to information is a value we can all agree on.

With this vote, both the House and the Senate have now passed the OPEN Government Data Act, albeit in different forms. In September, the Senate passed the OPEN bill as an attachment to the annual defense bill, but the provision was later removed in conference with the House. This shows that the Senate supports the fundamental concepts of the OPEN bill – now the question is whether the Senate will agree to the particular details of H.R. 4174 (which also contains new provisions that will require negotiation).

ALA hopes that Congress will soon reach agreement to send the OPEN Government Data Act to the President’s desk so that taxpayers can make better use of these valuable public assets. ALA thanks House Speaker Paul Ryan (R-WI), Reps. Trey Gowdy (R-SC), Derek Kilmer (D-WA), and Blake Farenthold (R-TX), and Sens. Patty Murray (D-WA), Brian Schatz (D-HI), and Ben Sasse (R-NE), for their leadership in unlocking data that will unleash innovation.


The post House passes OPEN Act to improve public access to government data appeared first on District Dispatch.

Techno-hype part 2 / David Rosenthal

Don't, don't, don't, don't believe the hype!
Public Enemy

Enough about the hype around self-driving cars, now on to the hype around cryptocurrencies.

Sysadmins like David Gerard tend to have a realistic view of new technologies; after all, they get called at midnight when the technology goes belly-up. Sensible companies pay a lot of attention to their sysadmins' input when it comes to deploying new technologies.

Gerard's Attack of the 50 Foot Blockchain: Bitcoin, Blockchain, Ethereum & Smart Contracts is a must-read, massively sourced corrective to the hype surrounding cryptocurrencies and blockchain technology. Below the fold, some tidbits and commentary. Quotes not preceded by links are from the book, and I have replaced some links to endnotes with direct links.

Gerard's overall thesis is that the hype is driven by ideology, which has resulted in cult-like behavior that ignores facts, such as:
Bitcoin ideology assumes that inflation is a purely monetary phenomenon that can only be caused by printing more money, and that Bitcoin is immune due to its strictly limited supply. This was demonstrated trivially false when the price of a bitcoin dropped from $1000 in late 2013 to $200 in early 2015 - 400% inflation - while supply only went up 10%.
There's recent evidence for this in the collapse of the SegWit2x proposal to improve Bitcoin's ability to scale. As Timothy B Lee writes:
There's a certain amount of poetic justice in the fact that leading Bitcoin companies trying to upgrade the Bitcoin network were foiled by a populist backlash. Bitcoin is as much a political movement as it is a technology project, and the core idea of the movement is a skepticism about decisions being made behind closed doors.
Gerard quotes Satoshi Nakamoto's release note for Bitcoin 0.1:
The root problem with conventional currency is all the trust that's required to make it work. The central bank must be trusted not to debase the currency, but the history of fiat currencies is full of breaches of that trust. Banks must be trusted to hold our money and transfer it electronically, but they lend it out in waves of credit bubbles with barely a fraction in reserve. We have to trust them with our privacy, trust them not to let identity thieves drain our accounts. Their massive overhead costs make micropayments impossible.
And points out that:
Bitcoin failed at every one of Nakamoto's aspirations here. The price is ridiculously volatile and has had multiple bubbles; the unregulated exchanges (with no central bank backing) front-run their customers, paint the tape to manipulate the price, and are hacked or just steal their user's funds; and transaction fees and the unreliability of transactions make micropayments completely unfeasible.
Instead, Bitcoin is a scheme to transfer money from later to earlier adopters:
Bitcoin was substantially mined early on - early adopters have most of the coins. The design was such that early users would get vastly better rewards than later users for the same effort.

Cashing in these early coins involves pumping up the price, then selling to later adopters, particularly in the bubbles. Thus Bitcoin was not a Ponzi or pyramid scheme, but a pump-and-dump. Anyone who bought in after the earliest days is functionally the sucker in the relationship.
Satoshi Nakamoto mined (but has never used) nearly 5% of all the Bitcoin there will ever be, a stash now notionally worth $7.5B. The distribution of notional Bitcoin wealth is highly skewed:
a Citigroup analysis from early 2014 notes: "47 individuals hold about 30 percent, another 900 hold a further 20%, the next 10,000 about 25% and another million about 20%".
Not that the early adopters' stashes are circulating:
Dorit Ron and Adi Shamir found in a 2012 study that only 22% of then-existing bitcoins were in circulation at all, there were a total of 75 active users or businesses with any kind of volume, one (unidentified) user owned a quarter of all bitcoins in existence, and one large owner was trying to hide their pile by moving it around in thousands of smaller transactions.
In the Citigroup analysis, Steven Englander wrote:
The uneven distribution of Bitcoin wealth may be the price to be paid for getting a rapid dissemination of the Bitcoin payments and store of value technology. If you build a better mousetrap, everyone expects you to profit from your invention, but users benefit as well, so there are social benefits even if the innovator grabs a big share.
Well, yes, but in this case the 1% of the population who innovated appear to have grabbed about 80% of the wealth, which is a bit excessive.

Since there are very few legal things you can buy with Bitcoin (see Gerard's Chapter 7) this notional wealth is only real if you can convert it into a fiat currency such as USD with which you can buy legal things. There are two problems doing so.

First, Nakamoto's million-Bitcoin hoard is not actually worth $7.5B. It is worth however many dollars other people would pay for it, which would be a whole lot less than $7.5B:
large holders trying to sell their bitcoins risk causing a flash crash; the price is not realisable for any substantial quantity. The market remains thin enough that single traders can send the price up or down $30, and an April 2017 crash from $1180 to 6 cents (due to configuration errors on Coinbase's GDAX exchange) was courtesy of 100 BTC of trades.
Second, Jonathan Thornburg was prophetic but not the way he thought:
A week after Bitcoin 0.1 was released, Jonathan Thornburg wrote on the Cryptography and Cryptography Policy mailing list: "To me, this means that no major government is likely to allow Bitcoin in its present form to operate on a large scale."
Governments have no problem with people using electricity to compute hashes. As Dread Pirate Roberts found out, they have ways of making their unhappiness clear when this leads to large-scale purchases of illicit substances. But they get really serious when this leads to large-scale evasion of taxes and currency controls.

Governments and the banks they charter like to control their money. The exchanges on which, in practice, almost all cryptocurrency transactions take place are, in effect, financial institutions but are not banks. To move fiat money to and from users the exchanges need to use actual banks. This is where governments exercise control, with regulations such as the US Know Your Customer/Anti Money Laundering regulations. These make it very difficult to convert Bitcoin into fiat currency without revealing real identities and thus paying taxes or conforming to currency controls.

Gerard stresses that Bitcoin is in practice a Chinese phenomenon, both on the mining side:
From 2014 onward, the mining network was based almost entirely in China, running ASICs on very cheap subsidised local electricity (There has long been speculation that much of this is to evade currency controls - buy electricity in yuan, sell bitcoins for dollars)
And on the trading side:
Approximately 95% of on-chain transactions are day traders on Chinese exchanges; Western Bitcoin advocates are functionally a sideshow, apart from the actual coders who work on the Bitcoin core software.
Gerard agrees with my analysis in Economies of Scale in Peer-to-Peer Networks that economics made decentralization impossible to sustain:
Everything about mining is more efficient in bulk. By the end of 2016, 75% of the bitcoin hashrate was being generated in one building, using 140 megawatts - or over half the estimated power used by all of Google's data centres worldwide.
This is the one case where I failed to verify Gerard's citation. The post he links to at NewsBTC says (my emphasis):
According to available information, the Bitmain Cloud Computing Center in Xinjiang, Mainland China will be a 45 room facility with three internal filters maintaining a clean environment. The 140,000 kW facility will also include independent substations and office space.
The post suggests that the facility wasn't to be completed until the following month, and quotes a tweet from Peter Todd (my emphasis):
So that's potentially as much as 75% of the current Bitcoin hashing power in one place
Gerard appears to have been somewhat ahead of the game.

The most interesting part of the book is Gerard's discussion of Bitfinex, and his explanation for the current bubble in Bitcoin. You need to read the whole thing, but briefly:
  • Bitfinex was based on the code from Bitcoinica, written by a 16-year-old. The code was a mess.
  • As a result, in August 2016 nearly 120K BTC (then quoted at around $68M) was stolen from Bitfinex customer accounts.
  • Bitfinex avoided bankruptcy by imposing a 36% haircut across all its users' accounts.
  • Bitfinex offered the users "tokens", which they eventually, last April, redeemed for USD at roughly half what the stolen Bitcoins were then worth.
  • But by then Bitfinex's Taiwanese banks could no longer send USD wires, because Wells Fargo cut them off.
  • So the "USD" were trapped at Bitfinex, and could only be used to buy Bitcoin or other cryptocurrencies on Bitfinex. This caused the Bitcoin price on Bitfinex to go up.
  • Arbitrage between Bitfinex and the other exchanges (which also have trouble getting USD out) caused the price on other exchanges to rise.
Gerard points out that this mechanism drives the current Initial Coin Offering mania:
The trapped "USD" also gets used to buy other cryptocurrencies - the price of altcoins tends to rise and fall with the price of bitcoins - and this has fueled new ICOs ... as people desperately look for somewhere to put their unspendable "dollars". This got Ethereum and ICOs into the bubble as well.
In a November 3 post to his blog, Gerard reports that:
You haven’t been able to get actual money out of Bitfinex since mid-March, but now there are increasing user reports of problems withdrawing cryptos as well (archive).
Don't worry, the Bitcoin trapped like the USD at Bitfinex can always be used in the next ICO! Who cares about the SEC:
Celebrities and others have recently promoted investments in Initial Coin Offerings (ICOs).  In the SEC’s Report of Investigation concerning The DAO, the Commission warned that virtual tokens or coins sold in ICOs may be securities, and those who offer and sell securities in the United States must comply with the federal securities laws.
Or the Chinese authorities:
The People's Bank of China said on its website Monday that it had completed investigations into ICOs, and will strictly punish offerings in the future while penalizing legal violations in ones already completed. The regulator said that those who have already raised money must provide refunds, though it didn't specify how the money would be paid back to investors.
This post can only give a taste of an entertaining and instructive book, well worth giving to the Bitcoin enthusiasts in your life. Or you can point them to Izabella Kaminska's interview of David Gerard - it's a wonderfully skeptical take on blockchain technologies and markets.

Jobs in Information Technology: November 15, 2017 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Humboldt County, County Librarian, Eureka, CA




Colorado State University Libraries, Data Management Specialist, Fort Collins, CO

Colorado State University Libraries, Help Desk Director, Fort Collins, CO

Eastside Catholic School, Librarian, Sammamish, WA

Rice University, Fondren Library, Data and Government Information Librarian, Houston, TX

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

How mundane admin records helped open Finnish politics: An example of “impolite” transparency advocacy / Open Knowledge Foundation

This blogpost was jointly written by Aleksi Knuutila and Georgia Panagiotidou. Their bios can be found at the bottom of the page.

In a recent blog post Tom Steinberg, long-term advocate of transparency and open data, looked back on what advocacy groups working on open government had achieved in the past decade. Overall, progress is disappointing. Freedom of Information laws are under threat in many countries, and for all the enthusiasm for open data, much of the information that is in the public interest remains closed. Public and official support for transparency might be at an all-time high, but that doesn’t necessarily mean that governments are transparent.

Steinberg blames the poor progress on one vice of the advocacy groups: being excessively polite. In his interpretation, groups working on transparency, particularly in his native UK, have relied on collaborative, win-win solutions with public authorities. They had been “like a caged bear, tamed by a zookeeper through the feeding of endless tidbits and snacks”. Significant victories in transparency, however, always had associated losers. Meaningful information about institutions made public will have consequences for people in a position of power. That is why strong initiatives for transparency are rarely the result of “polite” efforts, of collaboration and persuasion. They happen when decision-makers face enough pressure to make transparency seem more attractive than any alternative.

The pressure for opening government information can result from transparency itself, especially when it is forced on government. Here the method with which information is made available matters a great deal. Metahaven, a Dutch design collective, coined the term black transparency for the situations in which disclosure happens in an uninvited or involuntary way. The exposed information may itself demonstrate how its systematic availability can be in the public interest. Yet what can be as revealing in black transparency is the response of the authorities, whose reactions in themselves can show their limited commitment to ideals of openness.

Over the past few years, a public struggle took place in Finland regarding information about who influences legislation. Open Knowledge Finland played a part in shifting the debate and agenda by managing to make public a part of the information in question. The story demonstrates both the value and limitations of opening up data as a method of advocacy.

Finland is not perfect after all

Despite its reputation for good governance, Finnish politics is exceptionally opaque when it comes to information about who wields influence in political decisions. In recent years lobbying has become more professional and increasingly happens through hired communications agencies. Large reforms, such as the overhaul of health care, have been marred by the revolving doors (many links in Finnish) between those who design the rules in government and the interest groups looking to exploit them. Yet lobbying in the country is essentially unregulated, and little information is available about who is consulted or how much different interest groups spend on lobbying. Regulating lobbying is a challenge, and transparency can remain toothless, but it can be done: the European Commission, for instance, keeps an open log of meetings with interest groups and requires them to publish information about their expenditure on lobbying.

Some mundane administrative records became surprisingly important in the public discussion about transparency. The Finnish parliament, like virtually any public building, keeps a log of people who enter and leave. These visitor logs are kept ostensibly for security and are not necessarily designed to be used for other purposes. Yet Finnish activists and journalists, associated with the NGO Open Ministry and the broadcaster Svenska Yle, seized on these records to study the influence of private interests. After an initiative to reform copyright law was dropped by parliament in 2014, the group filed freedom of information requests to access the parliament’s visitor log, to see who had met with the MPs influential in the case. Parliament refused to release the information, and over two years of debate in courts followed. In December 2016 the supreme administrative court declared the records public.

Despite the court’s decision, parliament still made access difficult. Following the judgment, the parliament administration began to delete the visitor log daily, making the most recent information about who MPs meet inaccessible. The court’s decision still forced them to keep an archive of older data. In apparent breach of law, the administration did not release this information in electronic format. When faced with requests for access to the records, parliament printed them on paper and insisted that people come to their office to view them. The situation was unusual: the institution responsible for legislation had also decided that it could choose not to follow the instructions of the courts that interpret law.

At this stage, Open Knowledge Finland secured the resources for a wider study of the parliament visitor logs. Because of the administration’s refusal to release the data electronically, we were uncertain what the best course of action was. Nobody knew what the content of the logs would be and whether going through them would be worth the effort. Still, we decided that we should collect and make the information available as soon as possible, while the archive that parliament kept still had some possible public relevance. Collecting and processing the data turned out to be a long process.

The hard work of turning documents into data

In the summer of 2017 the parliament’s administrative offices, on a side street behind the iconic main building, became familiar to us. After having our bags scanned in security, the staff would lead us to a meeting room. Two thick folders filled with papers had been placed on the table, containing the logs of all parliamentary meetings for a period of three months. We were always three people going to parliament, armed with cameras and staplers. After removing the staples from the printouts, we would photograph each page in a carefully standardised frame. To photograph the entire available archive, data from a complete year, required close to 2,000 images and four visits to the parliament offices.

Taking the photos in a carefully cropped way was important, since the next challenge was to turn these images into electronic format again. Only in this way could we have the data as a structured dataset that could be searched and queried. For this task open source tools proved invaluable. We used Tesseract for extracting the text from the images, and Tabula for making sure that the text was placed in structured tables. The process, so-called optical character recognition, was inevitably prone to errors. Some errors we were able to correct using tools such as OpenRefine, which is able to identify the likely mistakes in the dataset. Despite the corrections, we made sure the dataset includes references to the original photos, so that the digitised content could be verified from them.
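To illustrate the kind of check that surfaces likely OCR mistakes, here is a minimal Python sketch that groups near-identical strings so they can be reviewed by hand. It uses the standard library's difflib rather than OpenRefine's own clustering methods, and the names, the 0.85 threshold and the cluster_similar helper are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher


def cluster_similar(names, threshold=0.85):
    """Greedily group strings that closely resemble a cluster's first member.

    Each name is compared (case-insensitively) with the first member of every
    existing cluster; it joins the first cluster whose similarity ratio meets
    the threshold, otherwise it starts a new cluster. This is quadratic in the
    worst case, which is fine for a sketch but slow for large datasets.
    """
    clusters = []
    for name in names:
        for cluster in clusters:
            ratio = SequenceMatcher(None, name.lower(), cluster[0].lower()).ratio()
            if ratio >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical OCR output: one visitor read three slightly different ways.
ocr_names = ["Virtanen Matti", "Virtanen Mattl", "Korhonen Liisa", "Vlrtanen Matti"]

# Clusters with more than one member are candidates for manual correction.
suspects = [c for c in cluster_similar(ocr_names) if len(c) > 1]
print(suspects)
```

Clusters flagged this way still need a human decision about which spelling is correct, which is one reason to keep references to the original photos alongside the digitised rows.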

Transforming the paper documents into a useable database required roughly one month of full-time work, spread between our team members. Yet this was only the first step. The content of the visitor log itself was fairly sparse, in most cases only containing dates and names, and little information about people’s affiliations, let alone the content of their meetings. To refine it, we scraped the parliament’s website and connected the names that occur in the log with the identities and affiliations of members of parliament and party staff. Using simple crowdsourcing techniques and public sources of information, we looked at a sample of the 500 people that most frequently visited parliament and tried to understand who they were working for. This stage of refinement required some tricky editorial choices, determining which questions we wanted the data to answer. We chose for instance to classify the most frequent visitors, to be able to answer questions about what parties are most frequently connected to particular types of visitors.
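The enrichment step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used in the project: it assumes each log row pairs a visitor with a host, and the names, parties, visitor categories and the count_by_party_and_type helper are all hypothetical.

```python
from collections import Counter

# Hypothetical digitised log rows: (visitor name, host name).
visits = [
    ("Visitor A", "MP One"),
    ("Visitor B", "MP One"),
    ("Visitor A", "MP Two"),
    ("Visitor C", "MP Two"),
]

# Hypothetical lookup tables: party affiliations as they might be scraped
# from a parliament website, and visitor categories assigned by hand.
host_party = {"MP One": "Party X", "MP Two": "Party Y"}
visitor_type = {"Visitor A": "industry", "Visitor B": "union"}


def count_by_party_and_type(visits, host_party, visitor_type):
    """Count visits per (host party, visitor category) pair.

    Hosts or visitors missing from the lookup tables are kept and labelled
    explicitly, so gaps in the classification stay visible in the result.
    """
    counts = Counter()
    for visitor, host in visits:
        party = host_party.get(host, "unknown")
        kind = visitor_type.get(visitor, "unclassified")
        counts[(party, kind)] += 1
    return counts


counts = count_by_party_and_type(visits, host_party, visitor_type)
for (party, kind), n in sorted(counts.items()):
    print(party, kind, n)
```

Keeping unmatched names under explicit "unknown" and "unclassified" labels makes it clear how much of the log the classification actually covers.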

Collaboration with the media

For data geeks like us, being able to access this information was exciting enough. Yet for our final goal, making a case for better regulation on lobbying, releasing a dataset was not sufficient. We chose to partner with investigative journalists, who would be able to present, verify and contextualise the information to a broader audience. Our own analytical efforts focused on broader patterns and regularities in the data, while journalists who have been covering Finnish politics for a long time were able to find the most relevant facts and narratives from the data. We gave the data under an embargo to some key journalists, so they would have the time and resources to work on the information. Afterwards the data was available to all journalists who requested it for their own work.

We were lucky that there was sustained media interest in the information. Alfred Harmsworth, the founder of the Daily Mirror, is attributed with the quote “news is what somebody somewhere wants to suppress; the rest is advertising”. In the same vein, when the story broke that the Finnish parliament had started deleting the most recent data about visitors, the interest in the historical records was guaranteed.

Despite the heightened interest, we also became conscious of how difficult it was for the media to interpret data. This was not just because of a lack of technical skills. There simply was such a significant amount of information – details of about 25,000 visits to parliament – that isolating the most meaningful pieces of information or getting an overview of what had happened was a challenge. For news organisations, for whom the dedication of staff even for days on a topic was a significant undertaking, investing in this kind of research was a risk. Even if they spent the time going through the data, the returns were uncertain.

After we released the data to a wider range of publications, many news outlets ended up running fairly superficial stories based on the data, focusing on for instance the most frequently occurring names and other quantities, instead of going through the investigative effort of interrogating the significance of the meetings described in the logs. Information that is in the form of lists lends itself easily to clickbait-like titles. For media outlets that could not wait for their competition to beat them to it, this was to be expected. The news coverage was probably weakened by the fact that we could not share the data with a broader public, due to the fact that it contained personal details that were potentially sensitive. For instance Naomi Colvin has suggested that searchable public databases, that open information for wider scrutiny and discovery, can help to beat the fast tempo of the news cycle and maintain the relevance of datasets.

The stories that resulted from the data

What did journalists find when they wrote stories based on the data? Suomen Kuvalehti ran an in-depth feature that included investigations into the private companies that were most active in lobbying. These included a Russian-backed payday loans provider as well as Uber, whose well-funded efforts extend even to Finland. YLE, the Finnish public broadcaster, described the privileged access that representatives of nuclear power enjoyed, while the newspaper Aamulehti showed how individual meetings between legislators and the finance industry had managed to shift regulation. Our own study of the data showed how representatives of private industry were more likely to have access to parties of the governing coalition, while unions and NGOs met more often with opposition parties.

In essence, the stories provided detail about how well-resourced actors were best placed to influence legislation. It confirmed, a cynical person might note, what most people had thought to be the case in the first place. Yet having clear documentation of this phenomenon may well make it harder to ignore. This line of argumentation was often raised with recent large leaks, the value of which may not lie in making public new facts, but providing the concrete data that makes the issue impossible to ignore. “From now on, we can’t pretend we don’t know”, as Slavoj Zizek ironically noted on Wikileaks.

Overall the media response was large. According to our media tracking, at least 50 articles were written in response to the release of the data. Several national newspapers ran editorials on the need for establishing rules for lobbying. In response, four political parties, out of the eight represented in parliament, declared that they would start publishing their own meetings with lobbyists. Parliament was forced to concede, and began to release daily snapshots of data about meetings in an electronic format. These were significant victories, both in practices of transparency and in changing the policy agenda.

On the importance of time and resources

For a small NGO such as ours, digitising and processing information on this scale would obviously not have been possible until recently, perhaps even five years ago. Our work was expedited by the availability of powerful open source tools for difficult tasks such as optical character recognition and correcting errors. Being a small association had its advantages as well, as we were aided by the network around the organisation, from which we were able to draw volunteers in areas from data science to media strategy. In many cases governments contain the consequences of releasing information through a kind of excess of transparency: they release so many documents, often in formats that are hard to process, that their meaning becomes muddled. When documents can be automatically processed and queried, this strategy weakens.

Still, it would be naive to think that technology is enough to make information advocacy effective or enough to allow everybody to participate in it. This line of work was possible due to some people’s commitment and personal sacrifice that spanned several years, as well as significant amounts of funding at the right moments. Notably, no newsroom would by itself have had the resources to sustain the several months of labour that working through the data required. The strategy of being less “polite”, in Tom Steinberg’s terms, may well be desirable, but the obvious challenge is securing the resources to do it.


Author bios

Dr. Aleksi Knuutila is a social scientist with a focus on civic technologies and the politics of data, and an interest in applying both computational and qualitative methods for investigation and research. As a researcher with Open Knowledge Finland, he has advised the Finnish government on their personal data strategy and studied political lobbying using public sources of data. He is currently working on a toolkit for using freedom of information for investigating how data and analytics are used in the public sector.

Georgia Panagiotidou is a software developer and data visualisation designer, with a focus on the intersections between media and technology. She was part of the Helsingin Sanomat data desk, where she worked to make data stories more reader friendly. Now, among other things, she works on data journalism projects, most recently with Open Knowledge Finland to digitise and analyse the Finnish parliament visitor’s log. Her interests lie in open data, civic tech, data journalism and media art.

We would like to thank the following people, who made an invaluable contribution to the work: Sneha Das, Jari Hanska, Antti Knuutila, Erkka Laitinen, Juuso Parkkinen, Tuomas Peltomäki, Aleksi Romanov, Liisi Soroush, Salla Thure.

ALA signs trade policy principles / District Dispatch

Today ALA signed the Washington Principles on Copyright Balance in Trade Agreements, joining over 70 international copyright experts, think tanks and public interest groups. The Principles address the need for balanced copyright policy in trade agreements.

Over the years, trade policies have increasingly implicated copyright and other IP laws, sometimes creating international trade policies that conflict with U.S. copyright law by enforcing existing rights holder interests without considering the interests of new industry entrants and user rights to information. U.S. copyright law is exemplary in promoting innovation, creativity and information sector industries—software, computer design, research—because of fair use, safe harbor provisions, and other exceptions and limitations lacking in other countries.

The Principles were developed at a convening of U.S., Canadian and Mexican law professors and policy experts held by American University Washington College of Law’s Program on Information Justice and Intellectual Property (PIJIP). These three countries are currently engaged in NAFTA negotiations.

The Washington Principles:
• Protect and promote copyright balance, including fair use
• Provide technology-enabling exceptions, such as for search engines and text- and data-mining
• Require safe harbor provisions to protect online platforms from users’ infringement
• Ensure legitimate exceptions for anti-circumvention, such as documentary filmmaking, cybersecurity research, and allowing assistive reading technologies for the blind
• Adhere to existing multilateral commitments on copyright term
• Guarantee proportionality and due process in copyright enforcement

The Principles are supplemented by “Measuring the Impact of Copyright Balance,” new research from the American University that finds that balanced copyright policies in foreign countries have a positive effect on the information sector industries in terms of net income and total sales and in the local production of creative and scholarly works and other high-quality output. These positive effects, however, do not harm the revenues of traditional content and entertainment industries. This suggests that industry, creativity and research are more likely to thrive under more open user rights policies that allow for experimentation and transformative use.

The post ALA signs trade policy principles appeared first on District Dispatch.

Librarians comment on Education Department priorities / District Dispatch

The American Library Association and librarians across the country submitted comments to the Department of Education (ED) in response to its 11 proposed priorities. The priorities, standard for a new administration, are a menu of goals for the ED to use for individual discretionary grant competitions. Over 1,100 individual comments were filed with the ED, including several dozen from the library community.

ALA noted the important role of public and school libraries in several key priority areas and how librarians help students of all ages. ALA commented on the role of libraries in providing flexible learning environments, addressing STEM needs, promoting literacy, expanding economic opportunity, as well as assisting veterans in achieving their educational goals.

In its letter to the ED, ALA noted:

“Libraries play an instrumental role in advancing formal educational programs as well as informal learning from pre-school through post-secondary education and beyond. Libraries possess relevant information, technology, experts, and community respect and trust to propel education and learning.”

Many librarians responded to ALA’s Action Alert, urging the ED to include libraries in its priorities, reflecting the range of services available at public and school libraries.

Responding to the need for STEM and computer skills development in Priority 6, one Baltimore City library media specialist wrote:

“Computer science is now foundational knowledge every student needs, yet students, particularly students of color and students on free and reduced lunch in urban and rural areas, do not have access to high-quality computer science courses. Females are not participating in equal numbers in the field of computer science or K-12 computer science courses. This is a problem the computer science community can address by giving teachers access to high-quality computer science professional development and schools access to courses focused on serving underserved communities.”

Highlighting the importance of certified librarians at school libraries, one commenter noted that “certified librarians found in school libraries are instructional partners, curriculum developers, and program developers that meet the objectives of their individual school’s improvement plan. School libraries are a foundational support system for all students.”

Echoing these comments, another school library advocate stated: “School libraries and school librarians transform student learning. They help learners to become critical thinkers, enthusiastic readers, skillful researchers, and ethical users of information. They empower learners with the skills needed to be college, career, and community ready.”

The comment period has closed, but individual comments will be available on the ED website.

The post Librarians comment on Education Department priorities appeared first on District Dispatch.

COUNTER data made tidy / William Denton

At work I’m analysing usage of ebooks, as reported by vendors in COUNTER reports. The Excel spreadsheet versions are ugly but a little bit of R can bring them into the tidyverse and give you nice, clean, usable data that meets the three rules of tidy data:

  1. Each variable must have its own column.
  2. Each observation must have its own row.
  3. Each value must have its own cell.

There are two kinds of COUNTER reports for books: BR1 (“Number of Successful Title Requests by Month and Title”) counts how many times people looked at a book and BR2 (“Number of Successful Section Requests by Month and Title”) counts how many times they looked at a part (like a chapter) of a book. The reports are formatted in the same human-readable way, so this code works for both, but be careful to handle them separately.

Fragment of COUNTER report

They start with seven lines of metadata about the report, and then you get the actual data. There are a few required columns, one of which is the title of the book, but that column doesn’t have a heading! It’s blank! Further to the right are columns for each month of the reporting period. Rows are for books or sections, but there is also a “Total for all titles” row that sums them all up.

This formatting is human-readable but terrible for machines. Happily, that’s easy to fix.

First, in R, load in some packages:

  • the basic set of tidyverse packages;
  • readxl, to read Excel spreadsheets;
  • lubridate, to manipulate dates; and
  • yulr, my own package of some little helper functions. If you want to use it you’ll need to install it specially, as explained in its documentation.

As it happens the COUNTER reports are all in one Excel spreadsheet, organized by sheets. Brill’s 2014 report is in the sheet named “Brill 2014,” so I need to pick it out and work on it. The flow is:

  • load in the sheet, skipping the first seven lines (including the one that tells you if it’s BR1 or BR2)
  • cut out columns I don’t want with a minus select
  • use gather to reshape the table by moving the month columns to rows, where the month name ends up in a column named “month;” the other fields that are minus selected are carried along unchanged
  • rename two columns
  • reformat the month name into a proper date, and rename the unnamed title column (which ended up being called X__1) while truncating it to 50 characters
  • filter out the row that adds up all the numbers
  • reorder the columns for human viewing
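library(tidyverse)  # dplyr and tidyr verbs, and the %>% pipe
library(readxl)     # read_xlsx()
library(lubridate)  # floor_date()

# The 1900-01-01 origin is a couple of days off Excel's own (1899-12-30), but floor_date() snaps each value back to the first of its month.
brill_2014 <- read_xlsx("eBook Usage.xlsx", sheet = "Brill 2014", skip = 7) %>%
  select(-ISSN, -`Book DOI`, -`Proprietary Identifier`, -`Reporting Period Total`) %>%
  gather(month, usage, -X__1, -ISBN, -Publisher, -Platform) %>%
  rename(platform = Platform, publisher = Publisher) %>%
  mutate(month = floor_date(as.Date(as.numeric(month), origin = "1900-01-01"), "month"), title = substr(X__1, 1, 50)) %>%
  filter(title != "Total for all titles") %>%
  select(month, usage, ISBN, platform, publisher, title)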

Looking at this I think that date mutation business may not always be needed, but some of the date formatting I had was wonky, and this made it all work.
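To see the reshaping step in isolation, here is a minimal sketch on a made-up two-book table. The titles, ISBNs and counts are invented for illustration, and the month columns are assumed to arrive as ISO date strings rather than Excel serial numbers, so no origin arithmetic is needed.

```r
library(tibble)
library(dplyr)
library(tidyr)

# A made-up stand-in for a COUNTER sheet after the metadata lines have been
# skipped: one row per title, one column per month, plus a totals row.
wide <- tribble(
  ~title,                 ~ISBN,           ~`2014-01-01`, ~`2014-02-01`,
  "Total for all titles", NA_character_,   7,             3,
  "Book A",               "9780000000001", 5,             0,
  "Book B",               "9780000000002", 2,             3
)

tidy <- wide %>%
  gather(month, usage, -title, -ISBN) %>%      # month columns become rows
  mutate(month = as.Date(month)) %>%           # column names are ISO date strings here
  filter(title != "Total for all titles") %>%  # drop the summary row
  select(month, usage, ISBN, title)

print(tidy)
```

Each row of the result is one book in one month, which is what the three tidy data rules ask for. On newer versions of readxl the blank title heading may come through under a different generated name than X__1, so it is worth checking names() on the sheet before selecting on it.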

That line above just works for one year. I had four years of Brill data, and didn’t want to repeat the long line for each, because if I ever need to make a change I’d have to make it four times and if I missed one there’d be a problem. This is the time to create a function. Now my code looks like this:

counter_parse_brill <- function(x) {
  x %>%
    select(-ISSN, -`Book DOI`, -`Proprietary Identifier`, -`Reporting Period Total`) %>%
    gather(month, usage, -X__1, -ISBN, -Publisher, -Platform) %>%
    rename(platform = Platform, publisher = Publisher) %>%
    mutate(month = floor_date(as.Date(as.numeric(month), origin = "1900-01-01"), "month"), title = substr(X__1, 1, 50)) %>%
    filter(title != "Total for all titles") %>%
    select(month, usage, ISBN, platform, publisher, title)
}

brill_2014 <- read_xlsx("eBook Usage.xlsx", sheet = "Brill 2014", skip = 7) %>% counter_parse_brill()
brill_2015 <- read_xlsx("eBook Usage.xlsx", sheet = "Brill 2015", skip = 7) %>% counter_parse_brill()
brill_2016 <- read_xlsx("eBook Usage.xlsx", sheet = "Brill 2016", skip = 7) %>% counter_parse_brill()
brill_2017 <- read_xlsx("eBook Usage.xlsx", sheet = "Brill 2017", skip = 7) %>% counter_parse_brill()
brill <- rbind(brill_2014, brill_2015, brill_2016, brill_2017)

That looks much nicer in Emacs (in Org, of course):

R in Org

I have similar functions for other vendors. They are all very similar, but sometimes a (mandatory) Book DOI field or something else is missing, so a little fiddling is needed. Each vendor’s complete data goes into its own tibble, which I then glue together. Then I delete all the rows where no month is defined (which, come to think of it, I should investigate to make sure these aren’t being introduced by some mistake I made in reshaping the data), I add the ayear column so I can group things by academic year, and where a book had no usage in a given month, I make it 0 instead of NA.

ebook_usage <- rbind(brill, ebl, ebook_central, iet, scholars_portal, spie)

ebook_usage <- ebook_usage %>% filter(!is.na(month))
ebook_usage <- ebook_usage %>% mutate(ayear = academic_year(month))
ebook_usage$usage[is.na(ebook_usage$usage)] <- 0
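The cleanup described above can be tried on a toy tibble. This is only a sketch: academic_year_sketch() is a hypothetical stand-in for the yulr helper, and it assumes an academic year that starts in September and is labelled by the calendar year in which it starts. That assumption is consistent with January 2014 landing in 2013, but the real helper may use a different rule.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for yulr's academic_year(); assumes a September start.
academic_year_sketch <- function(d) {
  year  <- as.integer(format(d, "%Y"))
  month <- as.integer(format(d, "%m"))
  ifelse(month >= 9L, year, year - 1L)
}

# Invented rows: one with missing usage, one complete, one with no month.
raw <- tibble(
  month = as.Date(c("2014-01-01", "2014-09-01", NA)),
  usage = c(NA, 4, 1)
)

cleaned <- raw %>%
  filter(!is.na(month)) %>%               # drop rows where no month is defined
  mutate(usage = coalesce(usage, 0),      # missing usage becomes 0
         ayear = academic_year_sketch(month))

print(cleaned)
```

Counting how many rows the filter drops, for example with sum(is.na(raw$month)), is a cheap way to check whether the missing months come from the source data or from the reshaping.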

The data now looks like this (truncating the title even more for display here):

month       usage  ISBN           platform  publisher  title    ayear
2014-01-01  0      9789004216921  BOPI      Brill      A Comme  2013
2014-01-01  0      9789047427018  BOPI      Brill      A Wande  2013
2014-01-01  0      9789004222656  BOPI      Brill      A World  2013

> str(ebook_usage)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	1343899 obs. of  7 variables:
 $ month    : Date, format: "2014-01-01" "2014-01-01" "2014-01-01" "2014-01-01" ...
 $ usage    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ ISBN     : chr  "9789004216921" "9789047427018" "9789004222656" "9789004214149" ...
 $ platform : chr  "BOPI" "BOPI" "BOPI" "BOPI" ...
 $ publisher: chr  "Brill" "Brill" "Brill" "Brill" ...
 $ title    : chr  "A Commentary on the United Nations Convention on t" "A Wandering Galilean: Essays in Honour of Seán Fre" "A World of Beasts: A Thirteenth-Century Illustrate" "American Diplomacy" ...
 $ ayear    : int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...

The data is now ready for analysis.

Who should take the Survey of Research Information Management Practices? / HangingTogether

Is it someone in the library? Or maybe in the office of institutional research? What about the office of the vice president for research? Or maybe the provost’s or rector’s office?

It depends.

Practices vary regionally and also institutionally, and we hope to learn more about these variations through the international Survey of Research Information Management Practices, which has been collaboratively developed by OCLC Research and euroCRIS. As we articulated in a recent OCLC Research position paper, Research Information Management: Defining RIM and the Library’s Role, because of the enterprise nature of the data inputs and uses, there are numerous institutional stakeholders in research information management, including the research office, institutional research, provost or rector, library, human resources, registrar, and campus communications. In some institutions, such as the University of Arizona, the library has assumed a leading role. In other institutions, such as Universität Münster, the research office takes the lead.

Connecting the survey with the right person within each institution is a central challenge for this study. And that’s where we need your help in promoting this survey and getting it to the right person at your own institution. While completing the survey itself should take only 10-30 minutes, we recognize that it may take some additional legwork to answer all of the questions on behalf of your institution. That’s why we’ve provided a PDF version of the survey for you to review in advance. We also want to encourage all research institutions and universities to participate, regardless of the status of RIM adoption. One survey per institution, please.

In the meantime, OCLC and euroCRIS are working to promote the survey through multiple channels.

Our goal is to collect and share meaningful information on behalf of all stakeholders within the research information management landscape worldwide. We need the leadership and engagement of each of you in the community. Thanks for participating, and contact me if you have questions.

Rebecca Bryant, PhD

T-Shirts! Voting! / Evergreen ILS

Last year we did our first community t-shirt featuring a quote from a community member (pulled from the IRC quotes database). This year we are doing it again, but with a new quote. Please rank your favorites from 1 to 6; the one with the strongest weight will be used. This shirt will be on sale at the next International Evergreen Conference, along with the limited remaining stock of the existing “I’m not a cataloger but I know enough MARC to be fun at parties” shirt.

Voting is done here:

Only one vote per person but make your opinion known!

New 7.x Committer: Kim Pham / Islandora

The Islandora 7.x Project committers have asked Kim Pham to become a committer and we are very pleased to announce that she has accepted.
Kim has been a longstanding and proactive member of the Islandora community, deeply devoted to helping community members, colleagues, and peers. Among other highly appreciated contributions, she has made testing of Islandora code less of an obscure magic and more something that we can actually reproduce in a systematic way (which has helped to reveal some outstanding bugs!).
Kim is a convenor of the Oral Histories Interest group, a very capable developer with a well-formed knowledge of our stack and code, a socially involved coding instructor, a release tester (and testing manager), a release documenter and a regular member of our many and diverse Islandora Calls.
Further details of the rights and responsibilities of being an Islandora committer can be found here:
Please join me in congratulating Kim! We are indeed very lucky to have her working with/for/at Islandora 7.x.
Diego Pino

Techno-hype part 1 / David Rosenthal

Don't, don't, don't, don't believe the hype!
Public Enemy

New technologies are routinely over-hyped because people under-estimate the gap between a technology that works and a technology that is in everyday use by normal people.

You have probably figured out that I'm skeptical of the hype surrounding blockchain technology. Despite incident-free years spent routinely driving in company with Waymo's self-driving cars, I'm also skeptical of the self-driving car hype. Below the fold, an explanation.

Clearly, self-driving cars driven by a trained self-driving car driver in Bay Area traffic work fine:
We've known for several years now that Waymo's (previously Google's) cars can handle most road conditions without a safety driver intervening. Last year, the company reported that its cars could go about 5,000 miles on California roads, on average, between human interventions.
Crashes per 100M miles
Waymo's cars are much safer than almost all human drivers:
Waymo has logged over two million miles on U.S. streets and has only had fault in one accident, making its cars by far the lowest at-fault rate of any driver class on the road— about 10 times lower than our safest demographic of human drivers (60–69 year-olds) and 40 times lower than new drivers, not to mention the obvious benefits gained from eliminating drunk drivers.

However, Waymo’s vehicles have a knack for getting hit by human drivers. When we look at total accidents (at fault and not), the Waymo accident rate is higher than the accident rate of most experienced drivers ... Most of these accidents are fender-benders caused by humans, with no fatalities or serious injuries. The leading theory is that Waymo’s vehicles adhere to the letter of traffic law, leading them to brake for things they are legally supposed to brake for (e.g., pedestrians approaching crosswalks). Since human drivers are not used to this lawful behavior, it leads to a higher rate of rear-end collisions (where the human driver is at-fault).
Clearly, this is a technology that works. I would love it if my grand-children never had to learn to drive, but even a decade from now I think they will still need to.

But, as Google realized some time ago, just being safer on average than most humans almost all the time is not enough for mass public deployment of self-driving cars. Back in June, John Markoff wrote:
Three years ago, Google’s self-driving car project abruptly shifted from designing a vehicle that would drive autonomously most of the time while occasionally requiring human oversight, to a slow-speed robot without a brake pedal, accelerator or steering wheel. In other words, human driving was no longer permitted.

The company made the decision after giving self-driving cars to Google employees for their work commutes and recording what the passengers did while the autonomous system did the driving. In-car cameras recorded employees climbing into the back seat, climbing out of an open car window, and even smooching while the car was in motion, according to two former Google engineers.

“We saw stuff that made us a little nervous,” Chris Urmson, a roboticist who was then head of the project, said at the time. He later mentioned in a blog post that the company had spotted a number of “silly” actions, including the driver turning around while the car was moving.

Johnny Luu, a spokesman for Google’s self-driving car effort, now called Waymo, disputed the accounts that went beyond what Mr. Urmson described, but said behavior like an employee’s rummaging in the back seat for his laptop while the car was moving and other “egregious” acts contributed to shutting down the experiment.
Gareth Corfield at The Register adds:
Google binned its self-driving cars' "take over now, human!" feature because test drivers kept dozing off behind the wheel instead of watching the road, according to reports.

"What we found was pretty scary," Google Waymo's boss John Krafcik told Reuters reporters during a recent media tour of a Waymo testing facility. "It's hard to take over because they have lost contextual awareness." ...

Since then, said Reuters, Google Waymo has focused on technology that does not require human intervention.
Timothy B. Lee at Ars Technica writes:
Waymo cars are designed to never have anyone touch the steering wheel or pedals. So the cars have a greatly simplified four-button user interface for passengers to use. There are buttons to call Waymo customer support, lock and unlock the car, pull over and stop the car, and start a ride.
But, during a recent show-and-tell with reporters, they weren't allowed to press the "pull over" button:
a Waymo spokesman tells Ars that the "pull over" button does work. However, the event had a tight schedule, and it would have slowed things down too much to let reporters push it.
Google was right to identify the "hand-off" problem as essentially insoluble, because the human driver would have lost "situational awareness".

Jean-Louis Gassée has an appropriately skeptical take on the technology, based on interviews with Chris Urmson:
Google’s Director of Self-Driving Cars from 2013 to late 2016 (he had joined the team in 2009). In a SXSW talk in early 2016, Urmson gives a sobering yet helpful vision of the project’s future, summarized by Lee Gomes in an IEEE Spectrum article [as always, edits and emphasis mine]:

“Not only might it take much longer to arrive than the company has ever indicated — as long as 30 years, said Urmson — but the early commercial versions might well be limited to certain geographies and weather conditions. Self-driving cars are much easier to engineer for sunny weather and wide-open roads, and Urmson suggested the cars might be sold for those markets first.”
But the problem is actually much worse than either Google or Urmson say. Suppose, for the sake of argument, that self-driving cars three times as good as Waymo's are in wide use by normal people. A normal person would encounter a hand-off once in 15,000 miles of driving, or less than once a year. Driving would be something they'd be asked to do maybe 50 times in their life.

Even if, when the hand-off happened, the human was not "climbing into the back seat, climbing out of an open car window, and even smooching" and had full "situational awareness", they would be faced with a situation too complex for the car's software. How likely is it that they would have the skills needed to cope, when the last time they did any driving was over a year ago, and on average they've only driven 25 times in their life? Current testing of self-driving cars hands off to drivers with more than a decade of driving experience, well over 100,000 miles of it. It bears no relationship to the hand-off problem with a mass deployment of self-driving technology.
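The back-of-the-envelope numbers above can be reproduced in a few lines of R. This is a rough sketch, not a model: the 5,000 miles between interventions and the factor of three come from the text, while the annual mileage and the number of driving years are assumptions picked only to land near the "maybe 50 times" figure.

```r
# From the text: about 5,000 miles between interventions today,
# and a hypothetical fleet three times as good.
miles_between_handoffs <- 5000 * 3

# Assumptions, not from the text: annual mileage and years of driving.
miles_per_year <- 13000
driving_years  <- 58

handoffs_per_year <- miles_per_year / miles_between_handoffs
lifetime_handoffs <- handoffs_per_year * driving_years

# A driver facing hand-off number k has handled k - 1 before, so averaged
# over a lifetime the prior experience is about half the lifetime total.
mean_prior_handoffs <- mean(seq_len(round(lifetime_handoffs)) - 1)

print(c(per_year = handoffs_per_year,
        lifetime = lifetime_handoffs,
        mean_prior = mean_prior_handoffs))
```

Under these assumptions a driver sees fewer than one hand-off a year and about 50 in a lifetime, which is where the "on average 25 times" figure comes from. Changing the assumed mileage moves the totals but not the conclusion that hand-offs would be rare events for any individual.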

Remember the crash of AF447?
the aircraft crashed after temporary inconsistencies between the airspeed measurements – likely due to the aircraft's pitot tubes being obstructed by ice crystals – caused the autopilot to disconnect, after which the crew reacted incorrectly and ultimately caused the aircraft to enter an aerodynamic stall, from which it did not recover.
This was a hand-off to a crew that was highly trained, but had never before encountered a hand-off during cruise. What this means is that unrestricted mass deployment of self-driving cars requires Level 5 autonomy:
Level 5 - Full Automation

  • System capability: The driverless car can operate on any road and in any conditions a human driver could negotiate.
  • Driver involvement: Entering a destination.
Note that Waymo is just starting to work with Level 4 cars (the link is to a fascinating piece by Alexis C. Madrigal on Waymo's simulation and testing program). There are many other difficulties on the way to mass deployment, outlined by Timothy B. Lee at Ars Technica, although Waymo is actually testing Level 4 cars in the benign environment of Phoenix, AZ:
Waymo, the autonomous car company from Google’s parent company Alphabet, has started testing a fleet of self-driving vehicles without any backup drivers on public roads, its chief executive officer said Tuesday. The tests, which will include passengers within the next few months, mark an important milestone that brings autonomous vehicle technology closer to operating without any human intervention.
But the real difficulty is this. The closer the technology gets to Level 5, the worse the hand-off problem gets, because the human has less experience. Incremental progress in deployments doesn't make this problem go away. Self-driving taxis in restricted urban areas maybe in the next five years; a replacement for the family car, don't hold your breath. My grand-children will still need to learn to drive.

Islandora 200 / Islandora

We keep a public map and list of known Islandora installations around the world (is your site on there? Send us a note if it's not!). This past week that list reached a new milestone: 200 known Islandora sites. In fact, we're now sitting at 213. To celebrate, let's have a look at some of our favorite recent additions:

Louisiana Digital Library

Developed by the team at Louisiana State University Libraries, this beautifully designed multi-site encompasses 17 Louisiana archives, libraries, museums, and other repositories, with more than 144,000 digital items. Collections run the gamut from photographs, maps, manuscript materials, and books to oral histories.

The Advertising & Design Club of Canada

The Advertising & Design Club of Canada is a non-profit, non-political group dedicated to encouraging excellence in Canadian advertising and design. Islandora serves as the backbone for an archive of award winning ads and curated collections. 

Instituto Nacional de Antropología e Historia de México

An open access digital repository of the National Institute of Anthropology and History of Mexico, making cultural and historical heritage available to the public through a beautifully themed Islandora site. It marks our first map point in Mexico.

Latin American Digital Initiatives (LADI)

Latin American Digital Initiatives (LADI) is a collaborative project between LLILAS Benson and Latin American partner institutions that digitally preserves and provides access to unique archival documents from Latin America, with an emphasis on collections documenting human rights, race, ethnicity, and social exclusion in the region. Its stylishly themed collections are the work of the University of Texas at Austin.

Visual gateways into science: Why it’s time to change the way we discover research / Open Knowledge Foundation

Have you ever noticed that it is really hard to get an overview of a research field that you know nothing about? Let’s assume for a minute that a family member or a loved one of yours has fallen ill and unfortunately, the standard treatment isn’t working. Like many other people, you now want to get into the research on the illness to better understand what’s going on.

You proceed to type the name of the disease into PubMed or Google Scholar – and you are confronted with thousands of results, more than you could ever read.

It’s hard to determine where to start, because you don’t understand the terminology in the field, you don’t know what the main areas are, and it’s hard to identify important papers, journals, and authors just by looking at the results list. With time and patience you could probably get there. However, this is time that you do not have, because decisions need to be made. Decisions that may have grave implications for the patient.

If you have ever had a similar experience, you are not alone. We are all swamped with the literature, and even experts struggle with this problem. In the Zika epidemic in 2015 for example, many people scrambled to get an overview of what was until then an obscure research topic. This included researchers, but also practitioners and public health officials. And it’s not just medicine; almost all areas of research have become so specialized that they’re almost impenetrable from the outside.

But the thing is, there are many people on the outside that could benefit from scientific knowledge. Think about journalists, fact checkers, policy makers or students.

They all have the same problem – they don’t have a way in.

Reuse of scientific knowledge within academia is already limited, but when we’re looking at transfer to practice, the gap is even wider. Even in application-oriented disciplines, only a small percentage of research findings ever influence practice – and even if they do so, often with a considerable delay.

At Open Knowledge Maps, a non-profit organization dedicated to improving the visibility of scientific knowledge for science and society, it is our mission to change that. We want to provide visual gateways into research, because we think it is important that we not only provide access to research findings, but also enable discoverability of scientific knowledge.

At the moment, there is a missing link between accessibility and discoverability – and we want to provide that link.

Imagine a world where you can get an overview of any research field at a glance, meaning you can easily determine the main areas and relevant concepts in the field. In addition, you can instantly identify a set of papers that are relevant for your information need. We call such overviews knowledge maps. You can find an example for the field of heart diseases below. The bubbles represent the main areas and relevant papers are already attached to each of the areas.

Now imagine that each of these maps is adapted to the needs of different types of users, researchers, students, journalists or patients. And not only that: they are all structured and connected and they contain annotated pathways through the literature as to what to read first, and how to proceed afterwards.

This is the vision that we’ve been working on for the past 1.5 years as a growing community of designers, developers, communicators, advisors, partners, and users. On our website, we are offering an openly accessible service, which allows you to create a knowledge map for any discipline. Users can choose between two databases: Bielefeld Academic Search Engine (BASE) with more than 110 million scientific documents from all disciplines, and PubMed, the large biomedical database with 26 million references. We use the 100 most relevant results for a search term as reported by the respective database as a starting point for our knowledge maps. We use text similarity to create the knowledge maps. The algorithm groups those papers together that have many words in common. See below for an example map of digital education.
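To give a feel for what grouping papers by shared words can look like, here is a deliberately tiny sketch in base R. It is an illustration under stated assumptions, not the Open Knowledge Maps implementation: the titles are invented, and the choice of Jaccard similarity with average-linkage hierarchical clustering is ours, since the text above only says that text similarity is used.

```r
# Toy titles, invented for illustration.
titles <- c(
  "heart disease risk in older adults",
  "heart disease treatment outcomes",
  "digital education in primary schools",
  "digital education tools for teachers"
)

# Split each title into its set of distinct lower-case words.
words <- lapply(strsplit(tolower(titles), " ", fixed = TRUE), unique)

# Jaccard similarity: shared words divided by all words across both titles.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

n <- length(words)
sim <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sim[i, j] <- jaccard(words[[i]], words[[j]])
  }
}

# Cluster on distance (1 - similarity) and cut the tree into two groups.
groups <- cutree(hclust(as.dist(1 - sim), method = "average"), k = 2)
print(groups)
```

The two heart disease titles end up in one group and the two digital education titles in the other, because each pair shares more words with each other than with anything else. A real service would work on far more text per paper and would need to handle common words such as "in", which here creates a small spurious overlap between the first and third titles.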

We have received a lot of positive feedback on this service from the community. We are honored and humbled by hundreds of enthusiastic posts in blogs, and on Facebook and Twitter. The service has also been featured on the front pages of reddit and HackerNews, and recently, we won the Open Minds Award, the Austrian Open Source Award. Since the first launch of the service in May 2016, we have had more than 200,000 visits on Open Knowledge Maps. Currently, more than 20,000 users leverage Open Knowledge Maps for their research, work, and studies per month.

The “Open” in Open Knowledge Maps does not only stand for open access articles – we want to go the whole way of open science and create a public good.

This means that all of our software is developed open source. You can also find our development roadmap on GitHub and leave comments by opening an issue. The knowledge maps themselves are licensed under a Creative Commons Attribution license and can be freely shared and modified. We will also openly share the underlying data, for example as Linked Open Data. This way, we want to contribute to the open science ecosystem that our partners, including Open Knowledge Austria, rOpenSci, ContentMine, the Internet Archive Labs and Wikimedia are creating.

Open Knowledge International has played a crucial role in incubating the idea of an open discovery platform, by way of a Panton Fellowship where the first prototype of the search service was created. Since then, the Open Knowledge Network has enthusiastically supported the project, in particular the Austrian chapter as well as Open Knowledge International, Open Knowledge Germany and other regional organisations. Members of the international Open Knowledge community have become indispensable for Open Knowledge Maps, be it as team members, advisors or active supporters. A big shout-out and thank you to you!

As a next step, we want to work on structuring and connecting these maps – and we want to turn discovery into a collaborative process. Because someone has already gone that way before and they have all the overview and the insights. We want to enable people to communicate this knowledge so that we can start laying pathways through science for each other. We have created a short video to illustrate this idea:

How much metadata is practical? / HangingTogether

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Jennifer Baxmeyer of Princeton, MJ Han of University of Illinois at Urbana-Champaign, and Stephen Hearn of the University of Minnesota. With the increasing availability of online metadata, we are seeing metadata added to discovery environments representing objects of widely varying granularity. For example, an article in Physical Review Letters—Precision Measurement of the Top Quark Mass in Lepton + Jets Final State—has approximately 300 author names for a five page article (some pictured here).

This seems disproportionate, especially when other objects with many contributors such as feature films and orchestral recordings are represented by only a relative handful of the associated names. If all the names associated with a film or a symphony recording were readily available as metadata, would it be appropriate to include them in library cataloging? Ensuring optimal search results in an environment that mixes metadata from varying sources, with differing models of granularity and extensiveness, poses challenges for catalogers and metadata managers.

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form, if it exists. Some discovery systems show only the first two or three lines of author names.  Research Information Management systems make it easier to apply some identity management for local researchers so that they are correctly associated with the articles they have written, which are displayed as part of their profiles. (See for example, Scholars@Duke, Experts@Minnesota or University of Illinois at Urbana-Champaign’s Experts research profiles.)  A number noted that attempts to encourage their researchers to include an ORCID (Open Researcher and Contributor Identifier) in their profiles have met with limited success. Linked data was posited as a potential way to access information across many environments, including those in Research Information Systems, but would require wider use of identifiers.

Much of the discussion was about receiving too little good metadata from vendors rather than too much. A number of the metadata managers viewed quality as at least as important as granularity. Some libraries reject records that do not meet their minimum standards, while others apply batch updates before loading the records. One criterion for “good enough” metadata is whether it is sufficient to generate an accurate citation. Metadata quality has become a key concern, as evidenced by the Digital Library Federation’s Metadata Assessment Working Group, formed to “measure, evaluate and assess the metadata” in a variety of digital library systems. Record-by-record enhancement was widely considered impractical.

Information surplus will only increase, with accompanying varying levels of metadata granularity. It remains to be seen how the community can bridge if not integrate the various silos of information.



Overheard in LIL - Episode 2 / Harvard Library Innovation Lab

This week:

A chat bot that can sue anyone and everything!

Devices listening to our every move

And an interview with Jack Cushman, a developer in LIL, about built-in compassion (and cruelty) in law, why lawyers should learn to program, weird internet, and lovely podcast gimmicks (specifically that of Rachel and Griffin McElroy's Wonderful! podcast)

Starring Adam Ziegler, Anastasia Aizman, Brett Johnson, Casey Gruppioni, and Jack Cushman.

The Voyages of a Digital Collections Audit: Episode 1 (Charting Our Course) / Library Tech Talk (U of Michigan)

astronaut in space holding old books with earth in the background

The Digital Content & Collections department begins an ambitious full audit of our 280+ digital collections. In this first in a blog series about the endeavor, I note why we are doing this, how we surveyed the digital landscape, how we cemented alliances with others who will help us along the way, and where we're heading next.

Creating the NLLD special collection / District Dispatch

This is a guest post from Rosalind Seidel, our fall Special Collections Intern joining us from the University of Maryland (UMD). Rosie has one semester left in her MLIS program at UMD, and hopes to become a rare books and special collections librarian. She graduated from Loyola University New Orleans with a Bachelor of Arts in English Literature and Medieval Studies.

Covers of NLLD participant folders. From top, left to right: “The Card with the Charge” sticker from 1988; “Information Power” button from 1965; “A Word to the Wise” sticker from 1983; “Take time to read” sticker from 1987; “America’s greatest bargain: the library” sticker from 1980.

My first project working in the American Library Association’s Washington Office involved inventorying, processing and creating a finding aid for the wealth of National Library Legislative Day files. Next, this collection will be sent to the ALA Archives where it will be digitized for future access.

Before I began this project, I was unfamiliar with National Library Legislative Day (NLLD) and its purpose. Delving into the files, I quickly learned NLLD is an annual event spanning two days where hundreds of librarians, information professionals and library supporters from across the country come together in Washington, D.C., to meet with their representatives and to advocate for our nation’s libraries.

The files I worked with ranged in date from 1973 to 2016. Such an expanse of years saw quite the development in advocacy for libraries across a 43-year period. Through the files, I got to look at the country and information policy in a whole new light. I began with files from 2016, working my way backward. As I went, it was interesting to see where certain issues arose, and how long they remained focal points. It was surreal, for example, to reach 1994—the year I was born—and see what the ongoing dialogue was, such as “Kids Need Libraries” and “How Stupid Can We Get?”  Surely it is because of the work of library advocates on NLLD that I grew up with the state of libraries and access to information that I did, and I owe them a debt of gratitude. Going forward as a young information professional, it will be my place to do the same.

The recurring issues in NLLD’s long history include the Library Services and Construction Act, the Elementary and Secondary Education Act, the Higher Education Act, the White House Conference on Libraries and Information Services, copyright, Title 44, the Library Services and Technology Act, the National Endowment for the Humanities, federal funding for libraries, and access to government information… just to name a few.

What I liked about the NLLD files is that the handouts usually took into account all levels of expertise of NLLD participants which, in turn, made the files an approachable collection. The handouts worked to make all NLLD events, such as lobbying, accessible even to the newest of participants and they also informed and educated participants about the issues on the agenda. I also got to handle letters from various U.S. Presidents in support of National Library Week. From those and other documents, I was able to see how information professionals viewed different administrations and, because of that, when NLLD efforts needed to be strengthened.

Overall, I valued the opportunity I was given to learn more about policy, including policies I had been unaware of, which has better informed me about the history and state of librarianship. As my internship continues, I will be given the chance to explore the Washington Office’s history and the work they do even more. My internship has allowed me to interact with libraries, government, the information field, and history in incredible ways that I would never have anticipated. I look forward to what is to come.

NLLD 2018 will take place on May 7 and 8. Registration will open on December 1, 2017. To learn more about participating, visit:

The post Creating the NLLD special collection appeared first on District Dispatch.

Structured Data: Helping Google Understand Your Site / Richard Wallis

Firstly let me credit the source that alerted me to the subject of this post.

Jennifer Slegg of TheSEMPost wrote an article last week entitled Adding Structured Data Helps Google Understand & Rank Webpages Better. It was her report of a conversation with Google’s Webmaster Trends Analyst Gary Illyes at PUBCON Las Vegas 2017.

Jennifer’s report included some quotes from Gary which go a long way towards clarifying the power, relevance, and importance to Google of embedding Structured Data in web pages.

Those that have encountered my evangelism for doing just that, will know that there have been many assumptions about the potential effects of adding Structured Data to your HTML, but this is the first confirmation of those assumptions by Google folks that I am aware of.

To my mind there are two key quotes from Gary, firstly:

But more importantly, add structure data to your pages because during indexing, we will be able to better understand what your site is about.

In the early [only useful for Rich Snippets] days of Schema.org, Google representatives went out of their way to assert that adding Schema.org to a page would NOT influence its position in search results. More recently, ‘non-committal’ could be described as the answer to questions about Schema.org and indexing. Gary’s phrasing is a little interesting, “during indexing, we will be able to better understand”, but you can really only draw certain conclusions from it.

So, this answers one of the main questions I am asked by those looking to me for help in understanding and applying Schema.org:

“If I go to the trouble of adding Schema.org to my pages, will Google [and others] take any notice?” To paraphrase Mr Illyes: Yes.

The second key quote:

And don’t just think about the structured data that we documented on developers.google.com. Think about any schema that you could use on your pages. It will help us understand your pages better, and indirectly, it leads to better ranks in some sense, because we can rank easier.

So structured data is important, take any schema from Schema.org and implement it, as it will help.

This directly answers another common challenge I get when recommending the use of the whole Schema.org vocabulary, and its extensions, as a source of potential Structured Data types for marking up your pages.

The challenge being “where is the evidence that any schema, not documented in the Google Developers Structured Data Pages, will be taken notice of?”

So thanks Gary, you have just made my job, and the understanding of those that I help, a heck of a lot easier.

Apart from those two key points there are some other interesting takeaways from his session as reported by Jennifer.

Their recent increased emphasis on things Structured Data:

We launched a bunch of search features that are based on structured data. It was badges on image search, jobs was another thing, job search, recipes, movies, local restaurants, courses and a bunch of other things that rely solely on structure data, annotations.

It is almost like we started building lots of new features that rely on structured data, kind of like we started caring more and more and more about structured data. That is an important hint for you if you want your sites to appear in search features, implement structured data.

Google’s further increased Structured Data emphasis in the near future:

Next year, there will be two things we want to focus on. The first is structured data. You can expect more applications for structured data, more stuff like jobs, like recipes, like products, etc.

For those who have been sceptical as to the current commitment of Google and others to Schema.org and Structured Data, this should go some way towards settling your concerns.

It is at this point I add in my usual warning against rushing off and liberally sprinkling Schema.org terms across your web pages.  It is not like keywords.

The search engines are looking for structured descriptions (the clue is in the name) of the Things (entities) that your pages are describing; the properties of those things; and the relationships between those things and other entities.
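To make that a little more concrete, here is a minimal sketch in Python of what such a structured description might look like when serialised as JSON-LD, one of the common ways of embedding Schema.org data in a page. The book, person and organisation names are made up for illustration, and the helper functions `book_jsonld` and `as_script_tag` are hypothetical names of my own, not part of any library. The types and properties used (`Book`, `Person`, `Organization`, `name`, `author`, `publisher`) are taken from the Schema.org vocabulary.

```python
import json


def book_jsonld(title, author_name, publisher_name):
    """Build a Schema.org description of a book as a JSON-LD dictionary.

    Hypothetical helper for illustration only.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Book",          # the Thing (entity) the page describes
        "name": title,            # a property of that Thing
        "author": {               # a relationship to another entity
            "@type": "Person",
            "name": author_name,
        },
        "publisher": {            # another related entity
            "@type": "Organization",
            "name": publisher_name,
        },
    }


def as_script_tag(data):
    """Wrap JSON-LD in the script element used to embed it in an HTML page."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Illustrative values only; substitute the real entity your page describes.
    description = book_jsonld("An Example Book", "Jane Example", "Example Press")
    print(as_script_tag(description))
```

Note that the description names one entity, gives it properties, and links it to other typed entities, rather than scattering isolated terms through the page.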

Behind Schema.org and Structured Data are some well established Linked Data principles, and to get the most effect from your efforts, it is worth recognising them.

Applying Structured Data to your site is not rocket science, but it does need a little thought and planning to get it right.   With organisations such as Google taking notice, like most things in life, it is worth doing right if you are going to do it at all.


Prospectus / Ed Summers

I’ve been trying to keep this blog updated as I move through the PhD program at the UMD iSchool. Sometimes it’s difficult to share things here because of fear that the content or ideas are just too rough around the edges. The big assumption being that anybody even finds it, and then finds the time to read the content.

As with most PhD programs the work is leading up to the dissertation. I’m finishing my coursework this semester and so I have put together a prospectus for the research I’d like to do in my dissertation. I’m going to spend the next 8 months or so doing a lot of background reading and writing about it, in order to set up this research. I imagine this prospectus will get revised some more before I share it with my committee, and the trajectory itself will surely change as I work through it. But I thought I’d share the prospectus in this preliminary state to see if anyone has suggestions for things to read or angles to take.

Many thanks to my advisor Ricky Punzalan for his help getting me this far.

Appraisal Practices in Web Archives

It is difficult to imagine today’s scientific, cultural and political systems without the web and the underlying Internet. As the web has become a dominant form of global communications and publishing over the last 25 years we have witnessed the emergence of web archiving as an increasingly important activity. Web archiving is the practice of collecting content from the web for preservation, which is then made accessible at another part of the web known as a web archive. Developing record keeping practices for web content is extremely important for the production of history (Brügger, 2017) and for sustaining the networked public sphere (Benkler, 2006). However, even with widespread practice we still understand very little about the processes by which web content is being selected for an archive.

Part of the reason for this is that the web is an immensely large, decentralized and constantly changing information landscape. Despite efforts to archive the entire web (Kahle, 2007), the idea of a complete archive of the web remains both economically infeasible (Rosenthal, 2012), and theoretically impossible (Masanès, 2006). Features of the web’s Hypertext Transfer Protocol (HTTP), such as code-on-demand (Fielding, 2000), content caching (Fielding, Nottingham, & Reschke, 2014) and personalization (Barth, 2011), have transformed what was originally conceived of as a document oriented web into an information system that delivers information based on who you are, when you ask, and what software you use (Berners-Lee & Fischetti, 2000). The very notion of a singular artifact that can be archived, which has been under strain since the introduction of electronic records (Bearman, 1989), is now being pushed to its conceptual limit.

The web is also a site of constant breakdown (Bowker & Star, 2000) in the form of broken links, failed business models, unsustainable infrastructure, obsolescence and general neglect. Ceglowski (2011) has estimated that about a quarter of all links break every 7 years. Even within highly curated regions of the web, such as scholarly publishing (Sanderson, Phillips, & Sompel, 2011) and jurisprudence (Zittrain, Albert, & Lessig, 2014) rates of link rot can be as high as 50%. Web archiving projects work in varying measures to stem this tide of loss–to save what is deemed worth saving before it becomes 404 Not Found. In many ways web archiving can be seen as a form of repair or maintenance work (Graham & Thrift, 2007; Jackson, 2014) that is conducted by archivists in collaboration with each other, as well as with tools and infrastructures that support their efforts.

Deciding what to keep and what gets to be labeled archival has long been a topic of discussion in archival science. Over the past two centuries archival researchers have developed a body of literature around the concept of appraisal, which is the practice of identifying and retaining records of enduring value. The rapid increase in the amount of records being generated, which began in the mid-20th century, led to the inevitable realization that it is impractical to attempt to preserve the complete documentary record. Appraisal decisions must be made, which necessarily shape the archive over time, and by extension our knowledge of the past (Bearman, 1989; Cook, 2011). It is in the particular contingencies of the historical moment that the archive is created, sustained and used (Booms, 1987; Harris, 2002). The desire for a technology that enables a complete archival record of the web, where everything is preserved and remembered in an archival panopticon, is an idea that has deep philosophical roots, and many social and political ramifications (Brothman, 2001; Mayer-Schönberger, 2011).

Notwithstanding these theoretical and practical complexities, the construction of web archives presents new design opportunities for archivists to work in collaboration with each other, as well as with the systems, services and bespoke software solutions used for performing the work. It is essential for these designs to be informed by a better understanding of the processes by which web content is selected for an archive. What are the approaches and theoretical underpinnings for appraisal in web archiving as a sociotechnical appraisal practice? To lay the foundation for answering this question I will be reviewing and integrating the research literature in three areas: Archives and Memory, Sociotechnical Systems (STS), and Praxiography.

Clearly, a firm grounding in the literature of appraisal practices in archives is an important dimension to this research project. Understanding the various appraisal techniques that have been articulated and deployed will help in assessing how these techniques are being translated to the appraisal of web content (Maemura, Becker, & Milligan, 2016). Particular attention will be paid to emerging practices for the appraisal of electronic records and web content. Because the web is a significantly different medium than archives have traditionally dealt with, it is important to situate archival appraisal within the larger context of social or collective memory practices (Jacobsen, Punzalan, & Hedstrom, 2013). In addition, the emerging practice of participatory archiving will also be examined to gain insight into how the web is changing the gatekeeping role of the archivist.

Appraisal practices for web content necessarily involve the use of computer technology as both the means by which the archival processing is performed, and as the source of the content that is being archived. Any analysis of appraisal practices must account for the ways in which the archivist and the technology of the web work together as part of a sociotechnical system. While the specific technical implementations of web archiving systems are of interest, the subject of archival appraisal requires that these systems be studied for their social and cultural effects. The interdisciplinary approach of software studies provides a theoretical and methodological approach for analyzing computer technologies as assemblages of software, hardware, standards and social practices. Examining the literature of software studies as it relates to archival appraisal will also selectively include reading in the related areas of infrastructure, platform and algorithm studies.

Finally, since archival appraisal is at its core a practice, it is imperative to theoretically ground an analysis of appraisal using the literature of practice theory or praxiography. Praxiography is a broad interdisciplinary field of research that draws upon branches of anthropology, sociology, history of science and philosophy in order to understand practice as a sociomaterial phenomenon. Ethnographic attention to topics such as rules, strategies, outcomes, training, mentorship, artifacts, work and history also provides an approach to empirical study that I plan on using in my research.


2017-11-01 - Prospectus Draft

2017-12-01 - Prospectus Final Draft

2017-12-15 - Committee Review

2018-01-15 - Committee Approval Meeting

2018-09-01 - Proposal Final Draft

2018-10-01 - Proposal Defense


Archives and Memory

Anderson, K. D. (2011). Appraisal Learning Networks: How University Archivists Learn to Appraise Through Social Interaction. Los Angeles: University of California, Los Angeles.

Bond, L., Craps, S., and Vermeulen, P. (2017). Memory unbound: tracing the dynamics of memory studies. New York: Berghahn.

Bowker, G. C. (2005). Memory practices in the sciences. Cambridge: MIT Press.

Caswell, M. (2014). Archiving the Unspeakable: Silence, Memory, and the Photographic Record in Cambodia. Madison, WI: University of Wisconsin Press.

Daston, L., editor (2017). Science in the archives: pasts, presents, futures. Chicago: University of Chicago Press.

Gilliland, A. J., McKemmish, S., and Lau, A., editors (2016). Research in the Archival Multiverse. Melbourne: Monash University Press.

Halbwachs, M. (1992). On collective memory. Chicago: University of Chicago Press.

Hoskins, A., editor (2018). Digital memory studies: Media pasts in transition. London: Routledge.

Kosnik, A. D. (2016). Rogue Archives: Digital Cultural Memory and Media Fandom. Cambridge: MIT Press.

Van Dijck, J. (2007). Mediated memories in the digital age. Palo Alto: Stanford University Press.

Sociotechnical Theory

Berry, D. (2011). The philosophy of software: Code and mediation in the digital age. New York: Palgrave Macmillan.

Bowker, G. C. and Star, S. L. (2000). Sorting things out: Classification and its consequences. Cambridge: MIT Press.

Brunton, F. (2013). Spam: A shadow history of the Internet. Cambridge: MIT Press.

Chun, W. H. K. (2016). Updating to Remain the Same: Habitual New Media. Cambridge: MIT Press.

Cubitt, S. (2016). Finite Media: Environmental Implications of Digital Technologies. Durham: Duke University Press.

Hu, T. (2015). A Prehistory of the Cloud. Cambridge: MIT Press.

Dourish, P. (2017). The Stuff of Bits: An Essay on the Materialities of Information. Cambridge: MIT Press.

Edwards, P. N. (2010). A vast machine: Computer models, climate data, and the politics of global warming. Cambridge: MIT Press.

Emerson, L. (2014). Reading writing interfaces: From the digital to the bookbound. Minneapolis: University of Minnesota Press.

Kittler, F. A. (1999). Gramophone, film, typewriter. Palo Alto: Stanford University Press.

Galloway, A. R. (2004). Protocol: How control exists after decentralization. Cambridge: MIT Press.

Kelty, C. M. (2008). Two bits: The cultural significance of free software. Durham: Duke University Press.

Kitchin, R. and Dodge, M. (2011). Code/Space: Software and Everyday Life. Cambridge: MIT Press.

Rossiter, N. (2016). Software, Infrastructure, Labor: A Media Theory of Logistical Nightmares. Oxford: Routledge.

Russell, A. L. (2014). Open standards and the digital age. Cambridge: Cambridge University Press.

Practice Theory

Bourdieu, P. (1977). Outline of a Theory of Practice. Cambridge: Cambridge University Press.

Bräuchler, B. and Postill, J. (2010). Theorising media and practice. Bristol: Berghahn Books.

Foucault, M. (2012). Discipline & punish: The birth of the prison. New York: Vintage.

Latour, B. (2005). Reassembling the social: An introduction to actor-network-theory. Oxford: Oxford University Press.

Law, J. (2002). Aircraft stories: Decentering the object in technoscience. Durham: Duke University Press.

Schatzki, T. R., Cetina, K. K., and von Savigny, E. (2001). The practice turn in contemporary theory. Oxford: Routledge.

Wenger, E. (1998). Communities of Practice: Learning, meaning, and identity. Cambridge: Cambridge University Press.


Barth, A. (2011). HTTP state management mechanism (No. 6265). Internet Engineering Task Force. Retrieved from

Bearman, D. (1989). Archival methods. Archives and Museum Informatics, 3(1). Retrieved from

Benkler, Y. (2006). The wealth of networks: How social production transforms markets and freedom. Yale University Press.

Berners-Lee, T., & Fischetti, M. (2000). Weaving the web: The original design and ultimate destiny of the world wide web by its inventor. San Francisco: Harper.

Booms, H. (1987). Society and the formation of a documentary heritage: Issues in the appraisal of archival sources. Archivaria, 24(3), 69–107. Retrieved from

Bowker, G. C., & Star, S. L. (2000). Sorting things out: Classification and its consequences. MIT Press.

Brothman, B. (2001). The past that archives keep: Memory, history, and the preservation of archival records. Archivaria, 51, 48–80.

Brügger, N. (2017). The web as history. (N. Brügger & R. Schroeder, Eds.). UCL Press. Retrieved from

Ceglowski, M. (2011, May). Remembrance of links past. Retrieved from

Cook, T. (2011). We are what we keep; we keep what we are: Archival appraisal past, present and future. Journal of the Society of Archivists, 32(2), 173–189.

Fielding, R. (2000). Representational state transfer (PhD thesis). University of California at Irvine.

Fielding, R., Nottingham, M., & Reschke, J. (2014). Hypertext transfer protocol (http/1.1): Caching (No. 7234). Internet Engineering Task Force. Retrieved from

Graham, S., & Thrift, N. (2007). Out of order: Understanding repair and maintenance. Theory, Culture & Society, 24(3), 1–25.

Harris, V. (2002). The archival sliver: Power, memory, and archives in South Africa. Archival Science, 2(1-2), 63–86.

Jackson, S. J. (2014). Rethinking repair. In P. Boczkowski & K. Foot (Eds.), Media technologies: Essays on communication, materiality and society. MIT Press. Retrieved from

Jacobsen, T., Punzalan, R. L., & Hedstrom, M. L. (2013). Invoking “collective memory”: Mapping the emergence of a concept in archival science. Archival Science, 13(2-3), 217–251.

Kahle, B. (2007). Universal access to all knowledge. The American Archivist, 70(1), 23–31.

Maemura, E., Becker, C., & Milligan, I. (2016). Understanding computational web archives research methods using research objects. In IEEE big data: Computational archival science. IEEE.

Masanès, J. (2006). Web archiving methods and approaches: A comparative study. Library Trends, 54(1), 72–90.

Mayer-Schönberger, V. (2011). Delete: The virtue of forgetting in the digital age. Princeton University Press.

Rosenthal, D. (2012, May). Let’s just keep everything forever in the cloud. Retrieved from

Sanderson, R., Phillips, M., & Sompel, H. V. de. (2011). Analyzing the persistence of referenced web resources with Memento. Open Repositories 2011 Conference. Retrieved from

Zittrain, J., Albert, K., & Lessig, L. (2014). Perma: Scoping and addressing the problem of link and reference rot in legal citations. Legal Information Management, 14(02), 88–99.

Diversity, Equity and Inclusion in Open Research and Education / Tara Robertson

It was such an honor to be invited to speak on a panel at OpenCon with Denisse Albornoz, Thomas Mboa, and Siko Bouterse. Lorraine Chuen did an amazing job putting the panel together and moderating.

Lorraine’s questions were:

  • How do the solutions put forth by the Open movements reinforce Western dominance, colonialism, as well as barriers on the basis of race, class, gender, ability, etc…?
  • How does exclusion and a lack of diversity impact their own Open advocacy work in their communities and/or institutions?
  • How might they begin to address this in their own communities?

This panel starts at 7h47m and here are the group notes.

Here are my slides and speaking notes.

Tara Robertson, Who is missing?

Hello, my name is Tara Robertson. I am from Vancouver, Canada which is the unceded traditional territory of the Musqueam, Squamish and Tsleil-Waututh nations. Unceded means that the land was never sold, given, or released to any colonial government. In Canada we’re thinking a lot about relationships between settlers and First Nations in many areas of society, including education.

I am mixed race and queer, which means I’ve had a lot of life experiences where I don’t fit. Often being a misfit means that I’ve had a first hand personal view of power and group dynamics.

This month I changed careers and am part of the Diversity and Inclusion team at Mozilla, the organization that fights to keep the internet healthy, open, and accessible to all. Firefox Quantum launches on Tuesday, and if you’re not already using it as your web browser, you really should.

View from behind two people sitting at a board room table, across from 4 white people: 2 men and 2 women. Image credit:

In most social situations, I think it’s always interesting to observe:

  • Who is in the room?
  • Who is at the table?
  • Who speaks a lot?
  • Who has social capital?
  • Who feels welcome?
  • Whose ideas are respected and centered by default?

Who is missing?

I think even more interesting is to note:

  • Who is missing?
  • Who isn’t even in the room?
  • Who doesn’t have a seat at the table?
  • Who is sitting on the margins?
  • Who doesn’t feel welcome?
  • Who has to fight to have their viewpoints respected?

I think this simple question is useful to keep in mind as we move into the do-a-thon tomorrow.

I’m going to share 2 short examples with you to illustrate this point.

The first example I want to talk about is how I got involved in open textbooks.

For the last 5 years I was the Accessibility Librarian for an organization that serves students with print disabilities at 20 colleges and universities. We digitized their print textbooks and learning materials into digital and accessible versions. In Canada, students with disabilities can register with their Disability Service Office at their university. Students need to provide medical documentation or a psycho-educational assessment. Then they meet with a disability counselor who looks at the documentation, the academic program objectives and the course syllabus and then figures out what barriers exist and what the necessary accommodations are. All of this takes time, and often students with print disabilities don’t have access to the course materials until a couple of weeks after their classmates.

When I heard about the British Columbia open textbook project I saw an opportunity for us to move from remediating things that were broken to inserting ourselves at the beginning of the publishing workflow to make things that were accessible to everyone from the start.

As part of this process we worked with BCcampus and a group of students with print disabilities to test some of the first open textbooks that had been produced in British Columbia. Working with a group of students who were visually impaired or blind highlighted some access issues that we weren’t aware of.

Including students with visual impairments also made us think about how we worked and we learned some unexpected things. For example, when we were co-presenting at a conference I learned a lot about the lack of accessible signage in our light rail stations and the extra prep work that blind and visually impaired people need to do to travel somewhere new.

By including students with disabilities in this process we came up with a better product and we learned a lot about how to work in ways that are inclusive to people who are blind. The students said they felt like they were improving things for other students with visual impairments. The students were also paid and co-presented with us at a few conferences, which was awesome. It’s way more impactful for faculty to hear directly from students with disabilities, than for them to hear from me.

Amanda Coolidge, from BCcampus, Sue Doner, from Camosun College and I co-wrote The BC Open Textbook Accessibility Toolkit as a resource for faculty writing open textbooks to help them understand why this is important, who might be in their classroom and what they need to do to ensure their content is accessible from the start. I’m really proud that we won The Open Education Consortium Creative Innovation award for this work. Josie Gray, who is here, is working on updating this toolkit and working on making sure all of the BC Open Textbooks are accessible. The Toolkit is CC-BY licensed and has been translated into French, so feel free to use, reuse or remix this content.

Whose voices are missing? Students with disabilities

When working at the university, are you ensuring that things are accessible to students with disabilities from the start? What does it say about who belongs when we don’t design for inclusion?

Most universities in North America have a Disability Resource Centre. You can reach out and recruit students to help you user test for accessibility. It’s important that students with disabilities are paid for this work as they are experts in accessibility and often face economic exclusion as many student jobs aren’t accessible to them. Also, as most of us are paid for our work, it’s important to pay people who are co-designing with us.

Open Access logo

The second example is about open access.

I think that we would all agree that open access to information is a good thing. This is definitely one of my core values as a librarian. However, over the last couple of years I’ve come to realize that this isn’t an absolute and that there are some times where it’s not appropriate or ethical for information to be open to all.

Last spring I learned that Reveal Digital, a nonprofit that works with libraries, digitized On Our Backs, a lesbian porn magazine that ran from 1984-2004. It had actually been online for several years before I learned about it. For a brief moment I was really excited — porn that was nostalgic for me was online! Then I quickly thought about friends who appeared in this magazine before the internet existed. I was worried that this kind of exposure could be personally or professionally harmful for them. There are ethical issues with digitizing collections like this. Consenting to a porn shoot that would be in a queer print magazine with a limited run is a different thing to consenting to have your porn shoot be available online.

Over the last year I’ve been researching this topic—I visited Cornell University’s Rare Book and Manuscripts Collection and found the contributor contracts, learned a lot more about US copyright law, and most importantly I talked to queer women who modeled for On Our Backs about their thoughts and feelings about this.

"When I heard all the issues of the magazine are being digitized, my heart sank. I meant this work to be for my community and now I am being objectified in a way that I have no control over. People can cut up my body and make it a collage. My professional and public life can be hijacked. These are uses I never intended and I still don’t want."

When Reveal Digital digitized this collection, the content was licensed under a Creative Commons CC-BY license. This license allows feminist porn to be remixed in ways that could appropriate the content and demean women. This license allows for this content to be repackaged, in any format, and sold, as long as credit is given and a link to the license is provided.

This is a quote from one of the models from an email to me in July 2016. She writes: “People can cut up my body and make a collage. My professional and personal life can be hijacked. These are uses I never intended and still don’t want.”

Whose voices are missing? Marginalized communities

This research project has also been very personal and transformative for me.

In the past year, in my professional life I’ve come out as a former sex worker. I know what it’s like to have content about myself online that I didn’t consent to. In my case, it’s a newspaper article that appeared in a major newspaper that identifies me as a sex worker and a librarian. Throughout my career I’ve been terrified that my employer or my colleagues would find this out. We live in a judgmental society where there are many negative stereotypes about sex workers. I was worried that this would undermine my professional reputation.

Coming out as a former sex worker is one of the scariest things I’ve done in my career and thankfully I’ve only experienced support from colleagues. By coming out I made this potentially theoretical conversation about ethics an honest and messy conversation and named my stake in the broader conversation about The Right To Be Forgotten.

This conversation is about how we do good work in and with our communities. Being both a librarian and someone with sex work experience I have the privilege to speak from within our institutions. I choose to use that privilege to engage other librarians to consider the lives and perspectives of other queer sex workers.

Whose voice is missing? And how do we include these voices? So, I offer you these questions for tomorrow and for your work after OpenCon.

Whose voice is missing? Whose voice are we leaving out? And how do we change how we work to really include diverse voices?



ghost-to-wp / Hugh Rundle


A couple of weeks ago I made a script to translate a JSON export file from Ghost into an XML import ('WXR') file for WordPress. If you read my post from 2014 about why I moved from WordPress to Ghost this might strike you as an odd thing for me to do, and indeed I wasn't that happy about it. The reason I needed to migrate a Ghost blog to WordPress is that we're integrating the newCardigan website with CiviCRM soon, and Civi only runs on WordPress, Joomla! and Drupal. Joomla! is awful, none of the Cardi Core have any experience with Drupal, and all have experience with WordPress, so it was pretty easy to decide which to use. CiviCRM will allow us to move everything 'in house' - event bookings, website posts, cardicast episodes, memberships, and fundraising. Having one platform - one that we completely control - will make things easier for us and mean we're not forcing our community to give their details to yet another third party.

The first thing I did was, of course, search the web for someone else's solution. The official WordPress documentation is weirdly silent about the whole matter, although it could simply be that it's so poorly organised that it is there, but I couldn't find it. Tim Stephenson provides one possible solution, but it seemed very convoluted to me, involving re-arranging things in OpenRefine. It then looked like Ahmed Amayem had built the perfect tool to convert the Ghost JSON export to a usable XML import file for WordPress, but I couldn't get it to work. I'm not sure exactly what the problem was, but I ended up making my own tool to do the job.

Mostly the script simply moves data from a JSON field into a matching XML tag, plus adding some information in the header so that WordPress recognises it as a WXR file. The most complicated part was translating the logical but somewhat eccentric way Ghost stores tag information into information about which tags applied to which posts in WordPress. My initial version more or less worked, but with one important flaw: authors were not imported. It took me another week to realise that this was because every tag in the WordPress XML file is encased in CDATA except for the title, and I had neglected to escape the <, > and & characters. As soon as you use one of these characters (I'd used '&' in a couple of post titles), the XML breaks. This didn't seem to stop the posts being imported, but did stop WordPress recognising the authors. Once I added a section to escape these characters, it seems to work pretty well.
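As a rough illustration of the escaping step described above, here is a minimal sketch in JavaScript. The function name escapeXml and the sample title are assumptions made for this example; they are not taken from ghost-to-wp itself, which may handle this differently.

```javascript
// Hypothetical helper (not from ghost-to-wp): replace the three characters
// that break XML when they appear in text outside a CDATA section.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // ampersand first, so the entities added below are not escaped again
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Illustrative post title containing all three problem characters.
const title = 'Libraries & <open> data';
const xml = '<title>' + escapeXml(title) + '</title>';

console.log(xml);
// prints: <title>Libraries &amp; &lt;open&gt; data</title>
```

The ampersand is replaced before the other two characters because the replacements themselves contain ampersands; doing it last would turn &lt; into &amp;lt;.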

The final thing I learned making this tool is how the package.json for npm packages is supposed to work. I usually don't bother with one of these files, but by filling in a bit of JSON in this file I made it possible to simply download ghost-to-wp, and then (assuming you have a reasonably recent version of nodejs installed) type two commands:

npm install
npm start yourghostexportfile.json

A file called WP_import.xml will be created, and you can simply use the WordPress import tool to import all your posts.

If you are in the unfortunate situation where you need to migrate from Ghost to WordPress, ghost-to-wp should make it pretty easy for you. You should be able to migrate authors, posts (including published status and stickiness), and tags. The main thing that won't come across in the script is images, because Ghost doesn't have a nice way to export them, and WordPress doesn't have a usable way to import them.

Review of Statistics for Library and Information Services / William Denton

A few months ago I read a review of Alon Friedman’s Statistics for Library and Information Services: A Primer for Using Open Source R Software for Accessibility and Visualization and was intrigued. It seemed like it would give me a good refresher on statistics while being grounded in the library world. I bought a copy and to my dismay found the first few chapters so poorly written and riddled with so many errors that I’m going to recycle it.

I do not recommend this book to anyone or for any collection. I don’t normally post negative reviews of books, but I saw so many errors in this one that I feel people need to know.

I was concerned as soon as I started reading chapter 1, but “1.7 Open Source Software” was where I got really worried:

The term open source often refers to something that can be modified because its gate is publicly accessible. In the context of software, open source means that the software code can be modified or enhanced by anyone. The open source movement began in the late 1970s, when two separate organizations promoted the idea of software that is available for anyone to use or modify. The first organization that aimed to create a free operating system was General Public License (GPL). The leading person behind this movement was Richard Stallman. The second organization was Open Source Initiative (OSI), under the leadership of Bruce Perens and Eric S. Raymond.

“Its gate”? Further: the GPL is a license, not an organization. RMS has been working on and for free software (there’s a difference) since the seventies, but the Free Software Foundation wasn’t created until 1985. The Open Source Initiative began in 1998.

The next paragraph confuses R with its predecessor, S. The third paragraph begins:

R is similar to other programming languages, such as C, Java, and Perl, in that is helps people perform a wide variety of computing tasks by giving them access to various commands.

I don’t understand how that sentence could be written by anyone that actually programs.

Skipping ahead past the very introductory statistics stuff, which is confusing, let’s look at “4.3 Introduction of Basic Functionality in R.” It will mislead any reader.

For example on page 53, “4.3.2 Writing Functions” begins:

When you write an R function there are two things you should keep in mind: the arguments and the return value.

Certainly true! True in any language. This is not the time to introduce functions, however. It’s too early.

The book then gives this example:

> p <- c("p1", "p2", "p3", "p4")
> p
> [1] p1, p2, p3, p4

In reality this will look like:

> p <- c("p1", "p2", "p3", "p4")
> p
[1] "p1" "p2" "p3" "p4"

There are many, many code snippets in the book where the output is wrong and the formatting bad.

The section concludes:

We will encounter functions throughout the book. For example, there is a function named “+” that does simple addition. However, there are perhaps two thousand or so functions built in to R, many of which never get called by the user directly but serve to help out other functions.

This is the reader’s first introduction to functions!

“4.3.4 The Return Value” says:

In R, the return value is a function that has exactly one return value. If you need to return more than one thing, you’ll need to make a list (or possibly a matrix, data.frame, or table display). We will discuss matrix, data.frame, and table display in chapter 17.

That first sentence is incorrect, and the section is utterly unhelpful.

Moving to the next section, let’s look at a few examples from “4.4 Introduction to Variables in R.”

On page 54, it says, “In any programming language, we often encounter seven types of data,” namely numeric (“decimal values, also known as numeric values”), integers, strings, characters, factors, fractions (“represents a part of a whole or any number with equal parts”) and “logical.” I offer this as an essay question in first-year computer science exams: “‘In any programming language, we often encounter seven types of data.’ Discuss.”

On page 55 there’s discussion of assigning variables. It says, “The most common operator assignment is <- and ==, the first assignment being the preferred way.” == is the equality test! This should be =!

Then assign() is introduced, though surely no beginning R user needs to know about it, and it’s introduced incorrectly. Here’s the example:

> assign("j", "4")
> j
[1] 4

What is that trailing 4 doing there? I don’t know. Also, j is being made a string, but it’s shown here as an integer. The example should look like:

> assign("j", "4")
> j
[1] "4"

Next it says variable names can contain underscores, which is correct and certainly useful to know, but the example won’t work. This is what it shows:

> bob2 <- 38_a
> bob2
[1] 38_a

This is what happens:

> bob2 <- 38_a
Error: unexpected input in "bob2 <- 38_"

That’s because 38_a is not a valid variable name. A bit of looking around turns up the documentation in ?make.names, which says, “A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as ‘".2way"’ are not valid, and neither are the reserved words.” So a_38 is valid, but not 38_a.

Even if the example used a_38 it still wouldn’t work, because that variable hasn’t been defined yet:

> bob2 <- a_38
Error: object 'a_38' not found

Moving on to page 57, about characters, the book says, “A character is used to represent a single element such as a letter of the alphabet or punctuation. In order to convert objects into character values in R, we need to declare this value as character with the as.character() function.” The example is:

> x = as.character(0.14)
> x
[1] "0.14"
> class(x)
[1] character

That code works and is correct, but why change to using = for assignment instead of the usual <-? Also, “0.14” is not a “single element”! Also, there’s no point in using as.character there, because it’s unnecessary; the function could be introduced later when one needs to convert some other data type to a string.

Furthermore, strings aren’t actually a different class:

> class("foobar")
[1] "character"

In the section on fractions we see that fractions aren’t actually built into R, because they require a special package to use them. The instructions on how to install that package are incorrect and will cause an error. Somebody must have noticed that because fractions have disappeared in the version on the web site. “‘In any programming language, we often encounter six types of data.’ Discuss.”

The section on logical variables says, “The logical value makes a comparison between variables and provides additional functionality by adding/subtracting/etc.” What?

This is the example (which should use <- for assignment):

> x = 1; y = 2 # sample value
> i> x + y
> i
[1] 3

What? That makes no sense. This is what happens when you run it:

> x = 1; y = 2
> i > x + y
Error: object 'i' not found

Was it meant to be something like this?

> 1 > x + y

On the web site it looks like this:

> x = 1; y = 2
> x > y

That runs and is correct, but if this was the original intent, how on earth did it get mangled into what’s in the book? Why not say 1 > 2 and see what happens?

All of that is just between pages 53–59, where the book is introducing the most basic aspects of a programming language. I didn’t go further.

What little I read of basic statistics was confusing and unhelpful. I didn’t bother to go further on that either.

The web site has different code on it, but I don’t see any notices about errata or corrections.

Some of the many faults of the book could have been fixed by using methods of reproducible research. It’s possible to write a book mixing text and code in R and Markdown. Hadley Wickham and Garrett Grolemund did this in R for Data Science, an excellent book, and all of the source code is openly available.

Anyone in LIS looking to learn statistics and R is advised to look elsewhere. I will post recommendations as I find better books.

New CLAW Committer: Natkeeran Ledchumykanthan / Islandora

The Islandora CLAW committers have asked Natkeeran Ledchumykanthan to become a committer and we are pleased to announce that he has accepted!

Nat has worked with the project for a while now and has contributed both code and documentation to Islandora CLAW. He’s been very active in our development workflow, reviewing and testing pull requests and filing issues for bugs he encounters.

Thank you Nat for all the time you’ve put into this project. We are all looking forward to working with you as a committer and feel that you are a valuable addition to our team.

Further details of the rights and responsibilities of being an Islandora committer can be found here:

LITA, Day One / Harvard Library Innovation Lab

We're off to a great start here in Denver at the LITA 2017 Forum.

Casey Fiesler set the mood for the afternoon with a provoking discussion of algorithmically-aided decision-making and its effects on our daily lives. Do YouTube's copyright-protecting algorithms necessarily put fetters on Fair Use? Do personalized search results play to our unconscious tendency to avoid things we dislike? Neither "technological solutionism" nor technophobia is an adequate response. Fiesler calls for algorithmic openness (tell us when algorithms are in use, and what they are doing), and for widespread acknowledgment that human psychology and societal factors are deeply implicated as well.

In a concurrent session immediately afterwards, Sam Kome took a deep dive into the personally identifiable information (PII) his library (and certainly everyone else's) has been unwittingly collecting about their patrons, simply by using today's standard technologies. Kome is examining everything from the bottom up, scrubbing data and putting in place policies to ensure that little or no PII touches his library systems again.

Jayne Blodgett discussed her strategy for negotiating the sometimes tense relationship between libraries and their partners in IT; hot on the heels of the discussion about patron privacy and leaky web services, the importance of this relationship couldn't be more plain.

Samuel Willis addressed web accessibility and its centrality to the mission of libraries. He detailed his efforts to survey and improve the accessibility of resources for patrons with print disabilities, and offered suggestions for inducing vendors to improve their products. The group pondered how to maintain the privacy of patrons with disabilities, providing the services they require without demanding that they identify themselves as disabled, and without storing that personal information in library systems.

The day screeched to a close with a double-dose of web security awareness: Gary Browning and Ricardo Viera checked the security chops of the audience, and offered practical tips for foiling the hackers who can and do visit our libraries and access our libraries' systems. (Word to the wise: you probably should be blocking any unneeded USB ports in your public-facing technology with USB blockers.)

And that's just one path through the many concurrent sessions from this afternoon at LITA.

Looking forward to another whirlwind day tomorrow!

Jobs in Information Technology: November 8, 2017 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week




Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

2017 James Partridge Outstanding African American Information Professional Award recipient: my sister / District Dispatch

Guest writer Pat May is the ALA Washington Office’s Director of Administration

Last week I was honored to attend the presentation of the James Partridge Outstanding African American Information Professional Award – to my sister, Ruby Jaby.

Ruby Jaby holding award with four other supporters

Hampton “Skip” Auld, CEO Anne Arundel County (MD) Public Library;
Partridge Award recipient Ruby Jaby, Branch Manager, Crofton (MD) Community Library;
Catherine Hollerbach, Chief, Public Services and Branch Management, Anne Arundel County (MD) Public Library; Joe Thompson, Citizens for Maryland Libraries; B. Parker Hamilton, former Director of Montgomery County (MD) Public Library.

Awarded annually by the Citizens for Maryland Libraries and the University of Maryland’s College of Information Studies, the James Partridge Award is given to people who “exemplify the highest ideals of the library/information profession including career-long dedicated service, leadership and a commitment to the empowerment of those whom they serve.” The James Partridge Award was created in 1998 and named for its first recipient. I was not surprised to learn that ALA’s own Satia Marshall Orange, former director of the Office for Library Outreach Services, was a 2001 recipient of this Award.

Growing up, I wouldn’t have guessed that my sister would one day be honored in this way. Ruby was always the seriously creative one of us six siblings. We all thought that she would make music a career (she played and taught piano), and I would be the librarian (I was a book addict before I even knew how to read). However, we both went in different career directions, she into librarianship and I into the Navy before joining the staff of the ALA Washington Office as an office administrator. Clearly, the library profession was the right choice for her.

Ruby has been a librarian for more than 43 years, but it was not until I read the nomination information submitted on her behalf that I realized what award-worthy career accomplishments she’d achieved. After becoming the Branch Manager of the Crofton Public Library in Maryland 25 years ago, Ruby tapped into her deep well of creative customer service ideas. Under her management, the Crofton Public Library branch has become a center for community activity and connections that cater to the needs of its youngest patrons as well as its most senior citizens. Examples of how she and her staff have creatively served their patrons over the years include:

  • Sensory Storytime for children with autism spectrum disorders and other developmental disabilities.
  • a dedicated Teen Area in the library with furniture, equipment and other resources specifically catering to needs identified by a teen survey.
  • study chairs with arms for patrons who had difficulty getting out of easy chairs and more table space for the increased tutoring needs and for laptop users.
  • a welcoming lobby area that includes snack and beverage vending machines and café tables for patrons who spend long hours at the library and a bench for senior patrons to sit while waiting to be picked up.
  • wildly popular Star Wars events featuring the premier Star Wars costuming group the Old Line Garrison of the 501st Legion, and an annual Harry Potter event.
  • community partnerships that benefit the library with groups such as the Crofton Village Garden Club, the Red Cross and Boy Scouts.

Ruby has been quietly going about her duties in a profession she loves and making her library an invaluable resource to her community. In her own words, she “considers her branch more than a warehouse for books.” She sees it as “a community center for customers to come and relax and spend the whole day there in comfort and enjoyment.” Through creative ideas for meeting her patrons’ needs, Ruby is doing more than contributing to the positive image of the library profession. She and her staff are advocating for the right of all people to access information.

Needless to say, our family is very proud of Ruby’s accomplishments and very happy to see her dedication and hard work honored in this meaningful way.


Release Notice for Islandora 7.x-1.10 / Islandora

We’ve rolled another lobster! Islandora 7.x-1.10 is hot off the presses (well, cooling a bit since yesterday), and is available

  • By checking out the 7.x-1.10 branches of all our Git repos*

  • or, as a Virtual Machine [1] (not to be used in production)

  • or, as individual downloads on the Release Notes and Downloads page [2].

* Except Tuque, which has release branch 1.10, and Islandora Drupal Filter, which if you’ve already installed you don’t need to update.



Requirements for Updating 

There are a couple of new things to watch out for when you’re updating to this release.

  • Islandora_altmetrics was replaced by islandora_badges. The altmetrics module is now provided as a sub-module of badges. If you use Islandora Altmetrics, do not pull your code, as that GitHub repo has been deprecated. To update:

    1. Take note of your Islandora Altmetrics settings

    2. Uninstall Islandora Altmetrics (disable + uninstall and delete the module folder)

    3. Upgrade your code to 7.x-1.10, which now includes the islandora_badges module and its submodules.

    4. Run update.php (or drush updatedb and drush cc all).

    5. Enable Islandora Badges and Islandora Altmetrics.

    6. Configure Islandora Altmetrics as desired.

You can also enable two new badges sub-modules, Scopus and Web of Science (which require subscriptions to their respective services).

  • A new version of OpenSeadragon, 2.2.1, is now required; you can download it from the link below [3] or use the included drush script, `drush openseadragon-plugin`.


Selected New Features and Improvements: 

We’ve got a lot of great new stuff in this release - see the release notes [4] for a full outline - but here are some new features to be aware of.

  • Alphabetical facet sorting: You can now configure each Solr Facet to display its values sorted by label (alphabetically) instead of with the most frequent facet first (default).

  • Responding to checksums: Checksum Checker now calls a hook, so that other modules can respond to the result of checksum checking. See the new API file [5] for more details.

  • Rotate using OpenSeadragon. With the new version of OpenSeadragon (see above), you can now rotate images in the browser.

  • Compound thumbnails can be derived from the first child.

  • A variety of UX improvements by adding (or making available) more informative labels. There are too many to outline here, but they’re all in the release notes!

Thank You! 

This has been another community-led release, and thanks to all the folks who contributed over the last six months either with code, documentation, or interest groups. This release was only possible thanks to you, our hard working 7.x-1.10 release team! Thank you for all your work auditing, documenting, testing, and maintaining:

  • Adam Vessey
  • Alan Stanley
  • Bayard Miller
  • Brandon Weigel
  • Brian Harrington
  • Bryan Brown
  • Caleb Derven
  • Carolyn Moritz
  • Charles Hunt
  • Dan Aitken
  • Danny Lamb
  • Devin Soper
  • Diego Pino
  • Don Richards
  • Janice Banser
  • Jared Whiklo
  • Jonathan Green
  • Jordan Dukart
  • Keila Zayas-Ruiz
  • Kim Pham
  • Kirsta Stapelfeldt
  • Mark Baggett
  • Mark Jordan
  • Matthew Miguez
  • Melissa Anez
  • Nat Kanthan
  • Neil Mader
  • Nelson Hart
  • Paul Cummins
  • Rachel Smart
  • Robert Waltz
  • Robin Dean
  • Robin Naughton
  • Rosie Le Faive
  • Scott Ziegler
  • Wilhelmina Randtke
  • Will Panting
  • William Conlin

And special thanks to Kim Pham, our Testing Manager; Don Richards, our Documentation Manager; and Janice Banser, our Auditing Manager.


Your Release Managers,

Rosie Le Faive and Diego Pino

A Look Back at Open Access Week 2017 / ACRL TechConnect

This year’s Open Access Week at my institution was a bit different than before. With our time constrained by conference travel and staff shortages leaving everyone over-scheduled, we decided to aim for a week of “virtual programming”, with a week of blog posts and an invitation to view our open access research guide. While this lacked the splashiness of programming in prior years, in another way it felt important to do this work in this way. Yes, it may well be that only people already well-connected to the library saw any of this material. But promotion of open access requires a great deal of self-education among librarians or other library insiders before we can promote it more broadly. For many libraries, it may be the case that there are only a few “open access” people, and Open Access Week ends up being the only time during the year the topic is addressed by the library as a whole.

All the Colors of Open Access: Black and Green and Gold

There were a few shakeups in scholarly communication and open access over the past few months that made some of these discussions more broadly interesting across the academic landscape. The on-going saga of the infamous Beall’s List has been a major 2017 story. An article in the Chronicle of Higher Education about Jeffrey Beall was emailed to me more than once, and captured the complexity of why such a list is both an appealing solution to a problem but also reliant on sometimes questionable personal judgements. Jeffrey Beall’s attitude towards other scholarly communications librarians can be simplistic and vindictive, as an interview with Times Higher Education in August made clear. June saw the announcement of Cabell’s Blacklist, which is based on Beall’s list, and uses a list of criteria to judge journal quality. At my own institution I know this prompted discussions of what the purpose of a blacklist is, versus using a vetted list of open access journals like the Directory of Open Access Journals. As a researcher in an article in Nature about this product states, it’s likely that a blacklist is more useful for promotion and tenure committees or hiring committees to judge applicants more than for potential authors to find good journals in which to publish.

This also completely leaves aside the green open access options, in which authors can negotiate with their publisher to make a version of their article openly available, often the final published version, but at least the text before layout. While publishing an article in an open access journal has many benefits, green open access can meet the open access goals of faculty without worrying about paying additional fees or worrying about journal quality. But we still need to educate people on green open access. I was recently chatting with a friend who is an economist, and he was wondering about how open access worked in other disciplines, since he was used to all papers being released as working papers before being published in traditional journals. I contrast this conversation with another, in which someone in a very different discipline was concerned that posting even a summary of research could constitute prior publication. Given this wide disparity between disciplines, we will always struggle with widely casting a message about green open access. But I firmly believe that there are individuals within all disciplines who will be excited about open access, and that they will get at least some of their colleagues on board, or perhaps their graduate students. These people may be located on the interdisciplinary side, with one foot in a more preprint-friendly discipline: for instance, the bioethicists in the theology department, or the history of science people in the history department. And even the most well-meaning people forget to make their work open access, so make it as easy as possible, but not so easy that people don’t know why they would do it; make sure there are still avenues for conversation.

Shaky Platforms

Making things easy to do requires having a good platform, but that became more complicated in August when Elsevier acquired bepress, which prompted discussions among many librarians about their values around open access and whether relying on vendors for open access platforms was a foolish gamble (the Library Loon summarizes this discussion well). This is a complex question, as the kinds of services provided by bepress’s Digital Commons go well beyond a simple hosting platform, and go along with the strategy I pointed out Elsevier was pursuing in my Open Access 2016 post. Convincing faculty to participate in open access requires a number of strategies, and things like faculty profiles, readership dashboards, and attractive interfaces go a long way. No surprise that after purchasing platforms that make this easy, Elsevier (along with other publishers) would go after ResearchGate in October, which is even easier to use in some ways, and certainly appealing for researchers.

All the discussion of predatory journals and blacklists (not to mention SciHub being ordered blocked thanks to an ACS lawsuit) seems old to those of us who have been doing this work for years, but it is still a conversation we need to have. More importantly, focusing on the positive aspects of open access helps get at the reasons people participate in open access and moves the conversation forward. We can do work to educate our communities about finding good open access journals, and how to participate legally. I believe that publishers are providing more green access options because their authors are asking for them, and we are helping authors to know how to ask.

I hope we were not too despairing this Open Access Week. We are doing good work, even if there is still a lot of poisonous rhetoric floating around. In the years I’ve worked in scholarly communication I’ve helped make thousands of articles, book chapters, dissertations, and books open access. Those items have in turn gone on to be cited in new publications. The scholarly communication cycle still goes on.


Archived CopyTalk webinar on students and music sharing available / District Dispatch

An archived copy of the CopyTalk webinar “Music copyright: what do students know and what do we do about it?” is now available. Originally presented on November 2 by the Office for Information Technology Policy’s Copyright Education Subcommittee, the webinar features Kathleen DeLaurenti, Open Access Editor, Music Library Association, and Head Librarian, Arthur Friedheim Library at the Peabody Institute of the Johns Hopkins University.

DeLaurenti discusses her research project to identify how college-aged students perceive music copyright. Were they rabid music infringers because they didn’t understand copyright, or did what they understood guide them in a different direction? Of course, the answer is more nuanced than either choice, but the music industry might learn a lesson or two when developing educational copyright promotions and other tools based on this research. DeLaurenti won the Robert L. Oakley Memorial Scholarship in 2015, which in part helped fund her research. As a bonus, DeLaurenti’s project turned to a new direction with students creating YouTube videos that help to explain music copyright to peers. Try them out at your library!

Plan ahead! One hour CopyTalk webinars occur on the first Thursday of every month at 11am Pacific/2 pm Eastern Time. Free!

Our December 7 webinar will feature Emilie Algenio, Copyright/Fair Use Librarian at Texas A&M University. Algenio will discuss her first-year experience after appointment as a copyright librarian. This CopyTalk will be ideal for those librarians just starting their copyright gigs – don’t miss it!

The post Archived CopyTalk webinar on students and music sharing available appeared first on District Dispatch.

Use OCLC Research to Examine the Realities of Research Data Management at Your Institution / HangingTogether

Last week, I had the opportunity to meet with North American members of the OCLC Research Library Partnership in Baltimore, Maryland, and to engage in a day-long discussion about evolving scholarly services and workflows, particularly institutional repositories, research data management, and research information management. The OCLC Research Library Partnership provides a unique transnational collaborative network of peers to address common issues as well as the opportunity to engage directly with OCLC Research.

Much of our group discussion focused on research data management (RDM) services, which is a growing interest for libraries and a focus of inquiry here at OCLC Research. In particular, we engaged partners in conversation about the Realities of Research Data Management, a four-part series that explores how research universities are addressing the challenges of managing research data throughout the research life cycle. In the first report, we introduced a simple framework for describing the three major components of RDM services:

  • Education—educating researchers and other stakeholders on the importance of research data management and encouraging RDM skill-building
  • Expertise—providing decision support and customized solutions for researchers working through specific research data management problems
  • Curation—supplying technical infrastructure and related services that support data management throughout the research cycle

In the second report, we applied this framework to describe the RDM service bundles at four research institutions, such as at the University of Illinois at Urbana-Champaign, represented here.

Our exploration of RDM practices focused on four research universities with fairly mature service bundles. Some of our key findings include:

  • Each institution’s service offering is unique and shaped by both local and external conditions
  • It may not be necessary for an institution to implement the full range of possible RDM services
  • The RDM service bundle is NOT just a local offering but also includes the external resources (such as external data centers) in use by researchers

The goal of this research is to provide libraries and research institutions with concrete examples of RDM services at four case study institutions with rich offerings to explore. The RDM Service Categories framework is also intended to provide a way for institutions as well as the broader scholarly communications community to discuss RDM offerings. We used this model in our discussion at Baltimore, and I was delighted to find it seemed to work as intended—as a useful framework for discussion, sharing, and scoping. By talking about three broad, discrete categories of services, we were able to share and compare more easily, without having to articulate each specific offering.

I want to invite you to apply this framework to the RDM service bundle at your institution.

Two reports in this series are still forthcoming, with Part Three: Incentives for Building University RDM Services coming in December and a final report on the sourcing and scaling of RDM services to come early in 2018.

Our conversations with Research Library Partners extended beyond RDM and also included sharing about interoperability, identifiers, and collaborations with other campus units. I will be sharing more about these conversations in future blog posts.

Hack-A-Way 2017 @ Fort Benjamin Harrison / Evergreen ILS

Day 1 of the 2017 Hack-A-Way at Fort Benjamin Harrison is beginning, as most things do, with setup! Many thanks to the Indiana State Library, who have been gracious enough to host us two years in a row at the inn located in Fort Benjamin Harrison just outside Indianapolis, Indiana. This is one of ten buildings remaining from the original fort’s construction and in its time has served as a command post, hospital and more. In fact it was a place where soldiers returning from the front in 1918 were treated for both physical and mental health issues and was the source of an outbreak of influenza in this region of the United States after soldiers brought it back from Europe. This week, however, we promise to just be pushing git commits out of here!




And early birds here are Anna Goben and Jason Boyer from the Indiana State Library doing setup.





Keynote at Pacific Neighborhood Consortium / David Rosenthal

I was invited to deliver a keynote at the 2017 Pacific Neighborhood Consortium in Tainan, Taiwan. My talk, entitled The Amnesiac Civilization, was based on the series of posts earlier this year with the same title. The theme was "Data Informed Society", and my abstract was:
What is the data that informs a society? It is easy to think that it is just numbers, timely statistical information of the kind that drives Google Maps real-time traffic display. But the rise of text-mining and machine learning means that we must cast our net much wider. Historic and textual data is equally important. It forms the knowledge base on which civilization operates.

For nearly a thousand years this knowledge base has been stored on paper, an affordable, durable, write-once and somewhat tamper-evident medium. For more than five hundred years it has been practical to print on paper, making Lots Of Copies to Keep Stuff Safe. LOCKSS is the name of the program at the Stanford Libraries that Vicky Reich and I started in 1998. We took a distributed approach; providing libraries with tools they could use to preserve knowledge in the Web world. They could work the way they were used to doing in the paper world, by collecting copies of published works, making them available to readers, and cooperating via inter-library loan. Two years earlier, Brewster Kahle had founded the Internet Archive, taking a centralized approach to the same problem.

Why are these programs needed? What have we learned in the last two decades about their effectiveness? How does the evolution of Web technologies place their future at risk?
Below the fold, the text of my talk.


I'm honored to join the ranks of your keynote speakers, and grateful for the opportunity to visit beautiful Taiwan. You don't need to take notes, or photograph the slides, or even struggle to understand my English, because the whole text of my talk, with links to the sources and much additional material in footnotes, has been posted to my blog.

What is the data that informs a society? It is easy to think that it is just numbers, timely statistical information of the kind that drives Google Maps real-time traffic display. But the rise of text-mining and machine learning means that we must cast our net much wider. Historic and textual data is equally important. It forms the knowledge base on which civilization operates.

Qing dynasty print of Cai Lun
Ever since 105AD when Cai Lun (蔡伦) invented the process for making paper, civilizations have used it to record their history and its context in everyday life. Archives and libraries collected and preserved originals. Scribes labored to create copies, spreading the knowledge they contained. Bi Sheng's (毕昇) invention of movable type in the 1040s AD greatly increased the spread of copies and thus knowledge, as did Choe Yun-ui's (최윤의) 1234 invention of bronze movable type in Korea, Johannes Gutenberg's 1439 development of the metal type printing press in Germany, and Hua Sui's (华燧) 1490 introduction of bronze type in China.[1]

Thus for about two millennia civilizations have been able to store their knowledge base on this affordable, durable, write-once, and somewhat tamper-evident medium. For more than half a millennium it has been practical to print on paper, making Lots Of Copies to Keep Stuff Safe. But for about two decades the knowledge base has been migrating off paper and on to the Web.

Lots Of Copies Keep Stuff Safe is the name of the program at the Stanford Libraries that Vicky Reich and I started 19 years ago last month. We took a distributed approach to preserving knowledge; providing libraries with tools they could use to continue in the Web world their role in the paper world of collecting copies of published works and making them available to readers. Two years earlier, Brewster Kahle had founded the Internet Archive, taking a centralized approach to the same problem.

My talk will address three main questions:
  • Why are these programs needed?
  • What have we learned in the last two decades about their effectiveness?
  • How does the evolution of Web technology place their future at risk?

Why archive the Web?

Paper is a durable medium, but the Web is not. From its earliest days users have experienced "link rot", links to pages that once existed but have vanished. Even in 1997 they saw it as a major problem:
6% of the links on the Web are broken according to a recent survey by Terry Sullivan's All Things Web. Even worse, linkrot in May 1998 was double that found by a similar survey in August 1997.

Linkrot definitely reduces the usability of the Web, being cited as one of the biggest problems in using the Web by 60% of the users in the October 1997 GVU survey. This percentage was up from "only" 50% in the April 1997 survey.[4]
Figure 1b
Research at scale in 2001 by Lawrence et al validated this concern. They:
analyzed 270,977 computer science journal papers, conference papers, and technical reports ... From the 100,826 articles cited by another article in the database (thus providing us with the year of publication), we extracted 67,577 URLs. ... Figure 1b dramatically illustrates the lack of persistence of Internet resources. The percentage of invalid links in the articles we examined varied from 23 percent in 1999 to a peak of 53 percent in 1994.
The problem is worse than this. Martin Klein and co-authors point out that Web pages suffer two forms of decay or reference rot:
  • Link rot: The resource identified by a URI vanishes from the web. As a result, a URI reference to the resource ceases to provide access to referenced content.
  • Content drift: The resource identified by a URI changes over time. The resource’s content evolves and can change to such an extent that it ceases to be representative of the content that was originally referenced.
Similarity over time at arXiv
They examined scholarly literature on the Web and found:
one out of five STM articles suffering from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only considering STM articles that contain references to web resources, this fraction increases to seven out of ten.
The problem gets worse through time:
even for articles published in 2012 only about 25% of referenced resources remain unchanged by August of 2015. This percentage steadily decreases with earlier publication years, although the decline is markedly slower for arXiv for recent publication years. It reaches about 10% for 2003 through 2005, for arXiv, and even below that for both Elsevier and PMC.
Thus, as the arXiv graph shows, they find that, after a few years, it is very unlikely that a reader clicking on a web-at-large link in an article will see what the author intended.
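The two forms of reference rot described above can be told apart mechanically. The following is a minimal sketch, not the method Klein and co-authors used: the function name, the use of Python's difflib as the similarity measure, and the 0.9 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional


def classify_reference(original: str, current: Optional[str],
                       drift_threshold: float = 0.9) -> str:
    """Classify one cited URI by comparing two snapshots of its content.

    original: text captured when the citation was made.
    current:  text fetched today, or None if the URI no longer resolves.
    drift_threshold: hypothetical cut-off; below it the page counts as drifted.
    """
    if current is None:
        # Nothing comes back at all: the resource has vanished.
        return "link rot"
    similarity = SequenceMatcher(None, original, current).ratio()
    if similarity < drift_threshold:
        # Something comes back, but it is no longer what was cited.
        return "content drift"
    return "intact"


if __name__ == "__main__":
    cited = "Release notes for version 1.0 of the toolkit."
    print(classify_reference(cited, None))
    print(classify_reference(cited, cited))
    print(classify_reference(cited, "This domain is for sale."))
```

Note that the comparison needs a copy of the page as it was when cited, which is exactly what a Web archive supplies.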

This isn't just a problem for scholarly literature, it is even worse on the general Web. The British Library's Andy Jackson analyzed the UK Web Archive and:
was shocked by how quickly link rot and content drift come to dominate the scene. 50% of the content is lost after just one year, with more being lost each subsequent year. However, it’s worth noting that the loss rate is not maintained at 50%/year. If it was, the loss rate after two years would be 75% rather than 60%.[5]
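The arithmetic in that quote can be checked in a few lines. This is only a worked example of the figures quoted (50% lost after one year, 60% after two); the function names are illustrative.

```python
def cumulative_loss(annual_rate: float, years: int) -> float:
    """Fraction lost after `years` if the same loss rate applied every year."""
    return 1.0 - (1.0 - annual_rate) ** years


def implied_rate(lost_before: float, lost_after: float) -> float:
    """Loss rate over one period, relative to what survived at its start."""
    return (lost_after - lost_before) / (1.0 - lost_before)


# A constant 50%/year would leave 25% after two years, i.e. 75% lost.
print(f"constant 50%/year, two years: {cumulative_loss(0.5, 2):.0%}")  # 75%
# Going from 50% lost to 60% lost means a fifth of the survivors were lost.
print(f"implied second-year rate: {implied_rate(0.5, 0.6):.0%}")  # 20%
```

So the quoted figures imply that content surviving its first year is lost at about 20% in the second year, much more slowly than in the first.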
It isn't just that Web servers can go away or their contents be rewritten. Access to Web pages is mediated by the Domain Name System (DNS), and they can become inaccessible because the domain owner fails to pay the registrar, or their DNS service, or for political reasons:
In March 2010 every webpage with the domain address ending in .yu disappeared from the internet – the largest ever to be removed. This meant that the internet history of the former Yugoslavia was no longer available online. Dr Anat Ben-David, from the Open University in Israel, has managed to rebuild about half of the lost pages – pages that document the Kosovo Wars, which have been called "the first internet war”.[8]
The terms "link rot" and "content drift" suggest randomness but in many cases they hide deliberate suppression or falsification of information. More than a decade ago, in only my 6th blog post, I wrote:
Winston Smith in "1984" was "a clerk for the Ministry of Truth, where his job is to rewrite historical documents so that they match the current party line". George Orwell wasn't a prophet. Throughout history, governments of all stripes have found the need to employ Winston Smiths and the US government is no exception.
Examples of Winston Smith's work are everywhere, such as:
George W. Bush’s “Mission Accomplished” press release. The first press release read that ‘combat operations in Iraq have ceased.’ After a couple weeks, that was changed to ‘major combat operations have ceased.’ And then, the whole press release disappeared off the White House’s website completely.[7]
Britain has its Winston Smiths too:
One of the most enduring symbols of 2016's UK Brexit referendum was the huge red "battle bus" with its message, "We send the EU £350 million a week, let's fund our NHS instead. Vote Leave." ... Independent fact-checkers declared the £350 million figure to be a lie. Within hours of the Brexit vote, the Leave campaign scrubbed its website of all its promises, and Nigel Farage admitted that the £350 million was an imaginary figure and that the NHS would not see an extra penny after Brexit.[6]
Data on the Web is equally at risk. Under the Harper administration, Canadian librarians fought a long, lonely struggle with their Winston Smiths:
Protecting Canadians’ access to data is why Sam-Chin Li, a government information librarian at the University of Toronto, worked late into the night with colleagues in February 2013, frantically trying to archive the federal Aboriginal Canada portal before it disappeared on Feb. 12. The decision to kill the site, which had thousands of links to resources for Aboriginal people, had been announced quietly weeks before; the librarians had only days to train with web-harvesting software.[11]
EOT Crawl Statistics
A year ago, a similar but much larger emergency data rescue effort swung into action in the US:
Between Fall 2016 and Spring 2017, the Internet Archive archived over 200 terabytes of government websites and data. This includes over 100TB of public websites and over 100TB of public data from federal FTP file servers totaling, together, over 350 million URLs/files.
Partly this was the collaborative "End of Term" (EOT) crawl that is organized at each change of Presidential term, but this time there was added urgency:
Through the EOT project’s public nomination form and through our collaboration with the DataRefuge, Environmental Data and Governance Initiative (EDGI), and other efforts, over 100,000 webpages or government datasets were nominated by citizens and preservationists for archiving.
1437 nova remains
Note the emphasis on datasets. It is important to keep scientific data, especially observations that are not repeatable, for the long term. A recent example is Korean astronomers' records of a nova in 1437, which provide strong evidence that:
"cataclysmic binaries"—novae, novae-like variables, and dwarf novae—are one and the same, not separate entities as has been previously suggested. After an eruption, a nova becomes "nova-like," then a dwarf nova, and then, after a possible hibernation, comes back to being nova-like, and then a nova, and does it over and over again, up to 100,000 times over billions of years.[12]
By BabelStone, CC BY-SA 3.0
580 years is peanuts. An example more than 5 times older is from China. In the Shang dynasty:
astronomers inscribed eclipse observations on animal bones. About 3200 years later, researchers used these records to estimate that the accumulated clock error was about 7 hours. From this they derived a value for the viscosity of the Earth's mantle as it rebounds from the weight of the glaciers.
Today, those eclipse records would be on the Web, not paper or bone. Will astronomers 3200 or even 580 years from now be able to use them?

What have we learned about archiving the Web?

I hope I've convinced you that a society whose knowledge base is on the Web is doomed to forget its past unless something is done to preserve it[18]. Preserving Web content happens in three stages:
  • Collection
  • Preservation
  • Access

What have we learned about collection?

In the wake of NASA's March 2013 takedown of their Technical Report Server James Jacobs, Stanford's Government Documents librarian, stressed the importance of collecting Web content:
pointing to web sites is much less valuable and much more fragile than acquiring copies of digital information and building digital collections that you control. The OAIS reference model for long term preservation makes this a requirement ... “Obtain sufficient control of the information provided to the level needed to ensure Long-Term Preservation.” Pointing to a web page or PDF at is not obtaining any control.
Memory institutions need to make their own copies of Web content. Who is doing this?

Internet Archive HQ
The Internet Archive is by far the largest and most used Web archive, having been trying to collect the whole of the Web for more than two decades. Its "crawlers" start from a large set of "seed" web pages and follow links from them to other pages, then follow those links, according to a set of "crawl rules". Well-linked-to pages will be well represented; they may be important or they may be "link farms". Two years ago Kalev Leetaru wrote:
of the top 15 websites with the most snapshots taken by the Archive thus far this year, one is an alleged former movie pirating site, one is a Hawaiian hotel, two are pornography sites and five are online shopping sites. The second-most snapshotted homepage is of a Russian autoparts website and the eighth-most-snapshotted site is a parts supplier for trampolines.
The Internet Archive's highly automated collection process may collect a lot of unimportant stuff, but it is the best we have at collecting the "Web at large"[25]. The Archive's recycled church in San Francisco and its second site nearby sustain about 40Gb/s outbound and 20Gb/s inbound, serving about 4M unique IPs/day. Each stores over 3*10^11 Web pages, among much other content. The Archive has been for many years in the top 300 Web sites in the world. For comparison, the Library of Congress typically ranks between 4000 and 6000.
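The seed-and-follow collection process described above is, at its core, a breadth-first traversal of the link graph. The sketch below is not the Internet Archive's crawler. It runs over a small in-memory link graph, and the page names, the `allowed` predicate and the depth limit standing in for "crawl rules" are all hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List


def crawl(seeds: List[str], links: Dict[str, List[str]],
          allowed: Callable[[str], bool], max_depth: int) -> List[str]:
    """Return pages in the order a breadth-first crawler would collect them.

    links maps each page to the pages it links to (a stand-in for fetching
    and parsing). allowed and max_depth play the role of crawl rules.
    """
    seen = set()
    queue = deque()
    for seed in seeds:
        if allowed(seed) and seed not in seen:
            seen.add(seed)
            queue.append((seed, 0))
    collected = []
    while queue:
        page, depth = queue.popleft()
        collected.append(page)
        if depth >= max_depth:
            continue  # collect this page, but do not follow its links
        for target in links.get(page, []):
            if target not in seen and allowed(target):
                seen.add(target)
                queue.append((target, depth + 1))
    return collected


if __name__ == "__main__":
    toy_web = {
        "seed": ["news", "shop"],
        "news": ["story"],
        "shop": ["story", "x-linkfarm"],
        "story": ["deep-page"],
    }
    print(crawl(["seed"], toy_web, lambda p: not p.startswith("x-"), 2))
```

A page is reached only if some chain of links leads to it from a seed, which is why well-linked-to pages are well represented whether or not they are important.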

Network effects mean that technology markets in general and the Web in particular are winner-take-all markets. Just like Google in search, the Internet Archive is the winner in its market. Other institutions can't compete in archiving the whole Web; they must focus on curated collections.

UK Web Archive
The British Library, among other national libraries, has been collecting their "national Web presence" for more than a decade. One problem is defining "national Web presence". Clearly, it is more than the .uk domain, but how much more? The Portuguese Web Archive defines it as the .pt domain plus content embedded in or redirected from the .pt domain. That wouldn't work for many countries, where important content is in top-level domains such as .com.

Dr. Regan Murphy Kao of Stanford's East Asian Library described their approach to Web collecting:
Mao Kobayashi
we sought to archive a limited number of blogs of ground-breaking, influential figures – people whose writings were widely read and represented a new way of approaching a topic. One of the people we choose was Mao Kobayashi. ... Mao broke with tradition and openly described her experience with cancer in a blog that gripped Japan. She harnessed this new medium to define her life rather than allow cancer to define it.[3]
Curated collections have a problem. What made the Web transformational was the links (see Google's PageRank). Viewed in isolation, curated collections break the links and subtract value. But, viewed as an adjunct to broad Web archives they can add value in two ways:
UK Web Archive link analysis
  • By providing quality assurance, using greater per-site resources to ensure that important Web resources are fully collected.
  • By providing researchers better access to preserved important Web resources than the Internet Archive can. For example, better text search or data mining. The British Library has been a leader in this area.
Nearly one-third of a trillion Web pages at the Internet Archive is impressive, but in 2014 I reviewed the research into how much of the Web was then being collected[13] and concluded:
Somewhat less than half ... Unfortunately, there are a number of reasons why this simplistic assessment is wildly optimistic.
Costa et al ran surveys in 2010 and 2014 and concluded in 2016:
during the last years there was a significant growth in initiatives and countries hosting these initiatives, volume of data and number of contents preserved. While this indicates that the web archiving community is dedicating a growing effort on preserving digital information, other results presented throughout the paper raise concerns such as the small amount of archived data in comparison with the amount of data that is being published online.
I revisited this topic earlier this year and concluded that we were losing ground rapidly. Why is this? The reason is that collecting the Web is expensive, whether it uses human curators, or large-scale technology, and that Web archives are pathetically under-funded:
The Internet Archive's budget is in the region of $15M/yr, about half of which goes to Web archiving. The budgets of all the other public Web archives might add another $20M/yr. The total worldwide spend on archiving Web content is probably less than $30M/yr, for content that cost hundreds of billions to create[19].
My rule of thumb has been that collection takes about half the lifetime cost of digital preservation, preservation about a third, and access about a sixth. So the world may spend only about $15M/yr collecting the Web.
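As a worked example of that rule of thumb, the split of a roughly $30M/yr worldwide spend can be written out as follows. The shares and the $30M total come from the text above; the function and variable names are illustrative.

```python
from fractions import Fraction

# Rule-of-thumb shares of lifetime cost: a half, a third and a sixth.
SHARES = {
    "collection": Fraction(1, 2),
    "preservation": Fraction(1, 3),
    "access": Fraction(1, 6),
}


def split_budget(total_musd: int) -> dict:
    """Split a yearly budget (in $M) across the three stages."""
    return {stage: float(total_musd * share) for stage, share in SHARES.items()}


print(split_budget(30))
# {'collection': 15.0, 'preservation': 10.0, 'access': 5.0}
```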

As an Englishman, it is important in this forum that I also observe that, like the Web itself, archiving is biased towards English. For example, pages in Korean are less than half as likely to be collected as pages in English[20].

What have we learned about preservation?

Since Jeff Rothenberg's seminal 1995 Ensuring the Longevity of Digital Documents it has been commonly assumed that digital preservation revolved around the problem of formats becoming obsolete, and thus their content becoming inaccessible. But in the Web world formats go obsolete very slowly if at all. The real problem is simply storing enough bits reliably enough for the long term.[9]

Kryder's Law
Or rather, since we actually know how to store bits reliably, it is finding enough money to store enough bits reliably enough for the long term. This used to be a problem we could ignore. Hard disk, a 60-year-old technology, is the dominant medium for bulk data storage. It had a remarkable run of more than 30 years of 30-40%/year price declines; Kryder's Law, the disk analog of Moore's Law. The rapid cost decrease meant that if you could afford to store data for a few years, you could afford to store it "forever".

Kryder's Law breakdown
Just as Moore's Law slowed dramatically as the technology approached the physical limits, so did Kryder's Law. The slowing started in 2010, and was followed by the 2011 floods in Thailand, causing disk prices to double and not recover for 3 years. In 2014 we predicted Kryder rates going forward between 10-20%, the red lines on the graph, meaning that:
If the industry projections pan out ... by 2020 disk costs per byte will be between 130 and 300 times higher than they would have been had Kryder's Law continued.
Economic model output
So far, our prediction has proved correct, which is bad news. The graph shows the endowment, the money which, deposited with the data and invested at interest, will cover the cost of storage "forever". It increases strongly as the Kryder rate falls below 20%, which it has. Absent unexpected technological change, the cost of long-term data storage is far higher than most people realize.[10]
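To see why the endowment is so sensitive to the Kryder rate, consider a deliberately simplified model; it is not the economic model behind the graph. It assumes the yearly cost of storing the data falls by a constant Kryder rate, that the endowment earns a constant interest rate (2% here, a hypothetical figure), and it ignores everything else such as media replacement, power and staff.

```python
def endowment(first_year_cost: float, kryder_rate: float,
              interest_rate: float, years: int = 100) -> float:
    """Money needed up front to pay for `years` of storage.

    Year t costs first_year_cost * (1 - kryder_rate)**t, and a unit of money
    set aside now is worth (1 + interest_rate)**t by then, so each year's
    cost is discounted back to today and summed.
    """
    total = 0.0
    for t in range(years):
        cost_in_year_t = first_year_cost * (1.0 - kryder_rate) ** t
        total += cost_in_year_t / (1.0 + interest_rate) ** t
    return total


if __name__ == "__main__":
    for k in (0.40, 0.30, 0.20, 0.10):
        multiple = endowment(1.0, k, 0.02)
        print(f"Kryder rate {k:.0%}: endowment = {multiple:.2f} x first-year cost")
```

Even in this toy version the endowment grows faster and faster as the Kryder rate drops, because the sum is a geometric series whose ratio approaches 1.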

What have we learned about access?

There is clearly a demand for access to the Web's history. The Internet Archive's Wayback Machine provides well over 1M users/day access to it.

1995 Web page
On Jan 11th 1995, the late Mark Weiser, CTO of Xerox PARC, created Nijinsky and Pavlova's Web page, perhaps the start of the Internet's obsession with cat pictures. You can view this important historical artifact by pointing your browser to the Internet Archive's Wayback Machine, which captured the page 39 times between Dec 1st 1998 and May 11th 2008. What you see using your modern browser is perfectly usable, but it is slightly different from what Mark saw when he finished the page over 22 years ago.

Ilya Kreymer used two important recent developments in digital preservation to build a Web site that allows you to view preserved Web content using the browser that its author would have. The first is that emulation & virtualization techniques have advanced to allow Ilya to create on-the-fly behind the Web page a virtual machine running, in this case, a 1998 version of Linux with a 1998 version of the Mosaic browser visiting the page. Note the different fonts and background. This is very close to what Mark would have seen in 1995.

The Internet Archive is by far the biggest Web archive, for example holding around 40 times as much data as the UK Web Archive. But the smaller archives contain pages it lacks, and there is little overlap between them, showing the value of curation.

The second development is Memento (RFC7089), a Web protocol that allows access facilities such as Ilya's site to treat the set of compliant Web archives as if it were one big archive. The site aggregates many Web archives, pulling each Web resource a page needs from the archive with the copy closest in time to the requested date.[14]
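Two pieces of that mechanism are easy to sketch. In Memento, a client states the date it wants in an Accept-Datetime request header, formatted as an HTTP date in GMT, and an aggregator then picks, among the copies held by different archives, the one closest in time. The code below makes no network requests, and the archive names and capture dates in it are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import List, Tuple


def accept_datetime_header(when: datetime) -> str:
    """Format a timezone-aware datetime as an Accept-Datetime header value."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def closest_memento(wanted: datetime,
                    holdings: List[Tuple[str, datetime]]) -> Tuple[str, datetime]:
    """Pick the (archive, capture time) pair nearest to the wanted date."""
    return min(holdings, key=lambda held: abs(held[1] - wanted))


if __name__ == "__main__":
    wanted = datetime(1995, 1, 11, tzinfo=timezone.utc)
    holdings = [
        ("archive-a", datetime(1998, 12, 1, tzinfo=timezone.utc)),
        ("archive-b", datetime(2008, 5, 11, tzinfo=timezone.utc)),
    ]
    print("Accept-Datetime:", accept_datetime_header(wanted))
    print("closest copy:", closest_memento(wanted, holdings)[0])
```

Because the choice is made per resource, a single reconstructed page can mix images, stylesheets and text drawn from several archives.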

The future of Web preservation

There are two main threats to the future of Web preservation, one economic and the other a combination of technological and legal.

The economic threat

Preserving the Web and other digital content for posterity is primarily an economic problem. With an unlimited budget, collection and preservation aren't a problem. The reason we're collecting and preserving less than half the classic Web of quasi-static linked documents is that no-one has the money to do much better. The other half is more difficult and thus more expensive. Collecting and preserving the whole of the classic Web would need the current global Web archiving budget to be roughly tripled, perhaps an additional $50M/yr.

Then there are the much higher costs involved in preserving the much more than half of the dynamic "Web 2.0" we currently miss.

British Library real income
If we are to continue to preserve even as much of society's memory as we currently do we face two very difficult choices; either find a lot more money, or radically reduce the cost per site of preservation.

It will be hard to find a lot more money in a world where library and archive budgets are decreasing. For example, the graph shows that the British Library's income has declined by 45% in real terms over the last decade.[2]

The Internet Archive is already big enough to reap economies of scale, and it already uses innovative engineering to minimize cost. But Leetaru and others criticize it for:
  • Inadequate metadata to support access and research.
  • Lack of quality assurance leading to incomplete collection of sites.
  • Failure to collect every version of a site.
Generating good metadata and doing good QA are hard to automate and thus the first two are expensive. But the third is simply impossible.

The technological/legal threat

The economic problems of archiving the Web stem from its two business models, advertising and subscription:
Note weather in GIF of reloads of
CNN page from Wayback Machine[17]
  • To maximize the effectiveness of advertising, the Web now potentially delivers different content on every visit to a site. What does it mean to "archive" something that changes every time you look at it?
  • To maximize the effectiveness of subscriptions, the Web now potentially prevents copying content and thus archiving it. How can you "archive" something you can't copy?
Personalization, geolocation and adaptation to browsers and devices mean that each of the about 3.4*10^9 Internet users may see different content from each of about 200 countries they may be in, and from each of the say 100 device and browser combinations they may use. Storing every possible version of a single average Web page could thus require downloading about 160 exabytes, 8000 times as much Web data as the Internet Archive holds.
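The 160 exabyte figure follows from multiplying out those numbers. In the sketch below, the user, country and device counts come from the text. The average page size of about 2.3MB is an assumption, chosen because it reproduces the stated total; it is not given in the talk.

```python
USERS = 3_400_000_000          # about 3.4*10^9 Internet users
COUNTRIES = 200                # places each user might be browsing from
DEVICE_BROWSER_COMBOS = 100    # device and browser combinations
ASSUMED_PAGE_BYTES = 2_300_000  # hypothetical average page weight (~2.3MB)

EXABYTE = 10 ** 18
PETABYTE = 10 ** 15

versions = USERS * COUNTRIES * DEVICE_BROWSER_COMBOS
total_bytes = versions * ASSUMED_PAGE_BYTES

print(f"possible versions of one page: {versions:.1e}")
print(f"bytes to store them all: {total_bytes / EXABYTE:.0f} EB")
# "8000 times as much Web data as the Internet Archive holds" implies:
print(f"implied Archive Web holdings: {160 * EXABYTE / 8000 / PETABYTE:.0f} PB")
```

That is the cost for one page; the same multiplication applies to every page on the Web.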

The situation is even worse. Ads are inserted by a real-time auction system, so even if the page content is the same on every visit, the ads differ. Future scholars, like current scholars studying Russian use of social media in the 2016 US election, will want to study the ads but they won't have been systematically collected, unlike political ads on TV.[21]

The point here is that, no matter how much resource is available, knowing that an archive has collected all, or even a representative sample, of the versions of a Web page is completely impractical. This isn't to say that trying to do a better job of collecting some versions of a page is pointless, but it is never going to provide future researchers with the certainty they crave. And doing a better job of each page will be expensive.

Although it is possible to collect some versions of today's dynamic Web pages, it is likely soon to become impossible to collect any version of most Web pages. Against unprecedented opposition, Netflix and other large content owners with subscription or pay-per-view business models have forced W3C, the standards body for the Web, to mandate that browsers support Encrypted Media Extensions (EME) or Digital Rights Management (DRM) for the Web.[15]

EME data flows
The W3C's diagram of the EME stack shows an example of how it works. An application, i.e. a Web page, requests the browser to render some encrypted content. It is delivered, in this case from a Content Distribution Network (CDN), to the browser. The browser needs a license to decrypt it, which it obtains from the application via the EME API by creating an appropriate session then using it to request the license. It hands the content and the license to a Content Decryption Module (CDM), which can decrypt the content using a key in the license and render it.
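The flow just described can be mimicked with a toy model. The sketch below is not the EME API: every class and function name is hypothetical, the XOR "cipher" is a stand-in for real encryption, and nothing here enforces the isolation a real CDM relies on. It only shows who hands what to whom, and why a copy of the encrypted bytes alone is useless to an archive.

```python
from dataclasses import dataclass
from itertools import cycle


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR with a repeating key. A stand-in for real encryption, NOT secure."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


@dataclass(frozen=True)
class License:
    # Hypothetical license carrying the content key. In real EME the
    # license is an opaque blob as far as the browser is concerned.
    key: bytes


class LicenseServer:
    """Stands in for the content owner's license server."""

    def __init__(self, key: bytes) -> None:
        self._key = key

    def issue(self, session_id: int) -> License:
        return License(self._key)


class CDM:
    """Content Decryption Module: the only part meant to see key and plaintext."""

    def render(self, encrypted: bytes, lic: License) -> str:
        return toy_cipher(encrypted, lic.key).decode("utf-8")


class Browser:
    def __init__(self, cdm: CDM) -> None:
        self._cdm = cdm
        self._sessions = 0

    def play(self, encrypted: bytes, request_license) -> str:
        self._sessions += 1                    # create a session ...
        lic = request_license(self._sessions)  # ... and use it to request a license
        return self._cdm.render(encrypted, lic)  # the CDM decrypts and renders


KEY = b"secret-key"
PLAINTEXT = "one frame of video"

cdn_copy = toy_cipher(PLAINTEXT.encode("utf-8"), KEY)  # what the CDN delivers
server = LicenseServer(KEY)
browser = Browser(CDM())

rendered = browser.play(cdn_copy, server.issue)
print(rendered)
# An archive that saved only cdn_copy holds bytes it cannot read without the license.
print(cdn_copy != PLAINTEXT.encode("utf-8"))
```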

What is DRM trying to achieve? Ostensibly, it is trying to ensure that each time DRM-ed content is rendered, specific permission is obtained from the content owner. In order to ensure that, the CDM cannot trust the browser it is running in. For example, it must be sure that the browser can see neither the decrypted content nor the key. If the browser could see either one, and save it for future use, that would defeat the purpose of DRM. The license server will not be available to the archive's future users, so preserving the encrypted content without the license is pointless.

Content owners are not stupid. They realized early on that the search for uncrackable DRM was a fool's errand.[22] So, to deter reverse engineering, they arranged for the 1998 Digital Millennium Copyright Act (DMCA) to make any attempt to circumvent protections on digital content a criminal offense. US trade negotiations mean that almost all countries (except Israel) have DMCA-like laws.[23]

Thus we see that the real goal of EME is to ensure that, absent special legislation such as a few national libraries have, anyone trying to either capture the decrypted content, or preserve the license for future use, would be committing a crime. Even though the British Library, among others, has the legal right to capture the decrypted content, it is doubtful that they have the technical or financial resources to do so at scale.

Scale is what libraries and archives would need. Clearly, EME will be rapidly adopted for streaming video sites, not just Netflix but YouTube, Vimeo and so on. Even a decade ago, to study US elections you needed YouTube video, but it will no longer be possible to preserve Web video.

But that's not the big impact that EME will have on society's memory. It is intended for video and audio, but it will be hard for W3C to argue that other valuable content doesn't deserve the same protection, for example academic journals. DRM will spread to other forms of content. The business models for Web content are of two kinds, and both are struggling:
  • Paywalled content. It turns out that, apart from movies and academic publishing, only a very few premium brands such as The Economist, the Wall Street Journal and the New York Times have viable subscription business models based on (mostly) paywalled content. Even excellent journalism such as The Guardian is reduced to free access, advertising and voluntary donations. Part of the reason is that Googling the headline of paywalled news stories often finds open access versions of the content. Clearly, newspapers and academic publishers would love to use Web DRM to ensure that their content could be accessed only from their site, not via Google or Sci-Hub.
  • Advertising-supported content. The market for Web advertising is so competitive and fraud-ridden that Web sites have been forced to let advertisers run ads that are obnoxious and indeed riddled with malware, and to load up their sites with trackers, to the point that many users have rebelled.[24] They use ad-blockers; these days it is pretty much essential to do so to keep yourself safe and to reduce bandwidth consumption. Not to mention that sites such as Showtime are so desperate for income that their ads mine cryptocurrency in your browser. Sites are very worried about the loss of income from blocked ads. Some, such as Forbes, refuse to supply content to browsers that block ads (which, in Forbes' case, turned out to be a public service; the ads carried malware). DRM-ing a site's content will prevent ads being blocked. Thus ad space on DRM-ed sites will be more profitable, and sell for higher prices, than space on sites where ads can be blocked. The pressure on advertising-supported sites, which include both free and subscription news sites, to DRM their content will be intense.
Thus the advertising-supported bulk of what we think of as the Web, and the paywalled resources such as news sites that future scholars will need, will become un-archivable.


I wish I could end this talk on an optimistic note, but I can't. The information the future will need about the world of today is on the Web. Our ability to collect and preserve it has been both inadequate and decreasing. This is primarily due to massive under-funding: a few tens of millions of dollars per year worldwide, set against the trillions of dollars per year of revenue the Web generates. There is no realistic prospect of massively increasing the funding for Web archiving. The funding for the world's memory institutions, whose job it is to remember the past, has been under sustained attack for many years.

The largest component of the cost of Web archiving is the initial collection. The evolution of the Web from a set of static, hyper-linked documents to a JavaScript programming environment has been steadily raising the difficulty and thus cost of collecting the typical Web page. The increasingly dynamic nature of the resulting Web content means that each individual visit is less and less representative of "the page". What does it mean to "preserve" something that is different every time you look at it?

And now, with the advent of Web DRM, our likely future is one in which it is not simply increasingly difficult, expensive and less useful to collect Web pages, but actually illegal to do so.

Call To Action

So I will end with a call to action. Please:
  • Use the Wayback Machine's Save Page Now facility to preserve pages you think are important.
  • Support the work of the Internet Archive by donating money and materials.
  • Make sure your national library is preserving your nation's Web presence.
  • Push back against any attempt by W3C to extend Web DRM.


  1. For details, see volume 5 part 1 of Joseph Needham's Science and Civilisation in China.
  2. The nominal income data was obtained from the British Library's Annual Report series. The real income was computed from it using the Bank of England's official inflation calculator. More on this, including the data for the graph, is here.
  3. The BBC report on their nomination of Mao Kobayashi as one of their "100 Women 2016" includes:
    In Japan, people rarely talk about cancer. You usually only hear about someone's battle with the disease when they either beat it or die from it, but 34-year-old newsreader Mao Kobayashi decided to break the mould with a blog - now the most popular in the country - about her illness and how it has changed her perspective on life.
    It is noteworthy that, unlike the BBC, Dr. Kao doesn't link to Mao Kobayashi's blog, nor does she link to Stanford's preserved copy. Fortunately, the Internet Archive started collecting Kobayashi's blog in September 2016.
  4. Note that both links in this quote have rotted. I replaced them with links to the preserved copies in the Internet Archive's Wayback Machine.
  5. More on this and related research can be found here and here.
  6. Compare the Wayback Machine's capture of the Leave campaign's website  three days before the referendum with the day after (still featuring the battle bus with the £350 million claim) and nine weeks later (without the battle bus). This scrubbing of inconvenient history is a habit with UK Conservatives:
    The Conservatives have removed a decade of speeches from their website and from the main internet library – including one in which David Cameron claimed that being able to search the web would democratise politics by making "more information available to more people".

    The party has removed the archive from its public website, erasing records of speeches and press releases from 2000 until May 2010. The effect will be to remove any speeches and articles during the Tories' modernisation period, including its commitment to spend the same as a Labour government.
    In a remarkable step the party has also blocked access to the Internet Archive's Wayback Machine, a US-based library that captures webpages for future generations, using a software robot that directs search engines not to access the pages.
  7. Wikipedia has a comprehensive article on the "Mission Accomplished" speech. A quick-thinking reporter's copying of an on-line court docket revealed more history rewriting in the aftermath of the 2008 financial collapse. Another example of how the Wayback Machine exposed more of the Federal government's Winston Smith-ing is here.
  8. The quote is the abstract to a BBC World Series programme entitled Restoring a Lost Web Domain.
  9. To discuss the reliability requirements for long-term storage, I've been using "A Petabyte for a Century" as a theme on my blog since 2007. It led to a paper at iPRES 2008 and an article for ACM Queue entitled Keeping Bits Safe: How Hard Can It Be? which subsequently appeared in Communications of the ACM, triggering some interesting observations on copyright.
  10. I have been blogging about the costs of long-term storage with the theme "Storage Will Be Much Less Free Than It Used To Be" since at least 2010. The graph is from a simplified version of the economic model of long-term storage initially described in this 2012 paper. For a detailed discussion of technologies for long-term storage see The Medium-Term Prospects For Long-Term Storage Systems.
  11. The Maclean's article continues:
    The need for such efforts has taken on new urgency since 2014, says Li, when some 1,500 websites were centralized into one, with more than 60 per cent of content shed. Now that reporting has switched from print to digital only, government information can be altered or deleted without notice, she says. (One example: In October 2012, the word “environment” disappeared entirely from the section of the Transport Canada website discussing the Navigable Waters Protection Act.)
  12. Source
    The nova was recorded in the Joseonwangjosillok, the Annals of the Joseon Dynasty. Multiple copies were printed on paper, stored in multiple, carefully designed, geographically diverse archives, and faithfully tended according to a specific process of regular audit, the details of which were carefully recorded each time. Copies lost in war were re-created. I blogged about the truly impressive preservation techniques that have kept them legible for almost 6 centuries.
  13. Estimating what proportion of the Web is preserved is a hard problem. The numerator, the content of the world's Web archives, is fairly easy to measure. But the denominator, the size of the whole Web, is extremely hard to measure. I discuss some attempts here.
  14. Ilya Kreymer details the operation of in a guest post on my blog.
  15. For all the gory details on the problems EME poses for archives, the security of Web browsers, and many other areas, see The Amnesiac Civilization: Part 4.
  16. The image is a slide from a talk by the amazing Maciej Cegłowski entitled What Happens Next Will Amaze You, and it will. Cory Doctorow calls Cegłowski's talks "barn-burning", and he's right.
  17. This GIF is an animation of a series of reloads of a preserved version of CNN's home page from 27 July 2013. Nothing is coming from the live Web, all the different weather images are the result of a single collection by the Wayback Machine's crawler. GIF and information courtesy of Michael Nelson.
  18. See also this excellent brief video from, the Portuguese Web Archive, explaining Web archiving for the general public. It is in Portuguese but with English subtitles.
  19. "hundreds of billions" is guesswork on my part. I don't know any plausible estimates of the investment in Web content. In this video, Daniel Gomes claims that estimated that the value of the content it preserves is €216B, but the methodology used is not given. Their estimate seems high, Portugal's 2016 GDP was €185B.
  20. In Comparing the Archival Rate of Arabic, English, Danish, and Korean Language Web Pages Lulwah Alkwai, Michael Nelson and Michelle Weigle report that in their sample of Web pages:
    English has a higher archiving rate than Arabic, with 72.04% archived. However, Arabic has a higher archiving rate than Danish and Korean, with 53.36% of Arabic URIs archived, followed by Danish and Korean with 35.89% and 32.81% archived, respectively.
    Their Table 31 also reveals some problems with the quality of Korean Web page archiving.
    Table XXXI. Top 10 Archived Korean URI-Rs
    Korean URI-R     Memento Count   Category
    [missing]        […],339         Error page
    joins.com        17,096          News
    [missing]        […],046         Error page
    [missing]        […],414         Error page
    daum.net         14,305          Search engine
    [missing]        […],042         Error page
    [missing]        […],676         Error page
    hankooki.com     7,762           Search engine
    5 of the top 10 most frequently archived Korean URIs in their sample are what are called "soft 404s", pages that should return "404 Not Found" but instead return "200 OK".
  21. Senators have introduced a bill:
    the Honest Ads Act, would require companies like Facebook and Google to keep copies of political ads and make them publicly available. Under the act, the companies would also be required to release information on who those ads were targeted to, as well as information on the buyer and the rates charged for the ads.
  22. Based on reporting by Kyle Orland at Ars Technica, Cory Doctorow writes:
    Denuvo is billed as the video game industry's "best in class" DRM, charging games publishers a premium to prevent people from playing their games without paying for them. In years gone by, Denuvo DRM would remain intact for as long as a month before cracks were widely disseminated.

    But the latest crop of Denuvo-restricted games were all publicly cracked within 24 hours.

    It's almost as though hiding secrets in code you give to your adversary was a fool's errand.
  23. Notably, Portugal's new law on DRM contains several useful features including a broad exemption from anti-circumvention for libraries and archives, and a ban on applying DRM to public-domain or government-financed documents. For details, see the EFF's Deeplinks blog.
  24. Some idea of the level of fraud in Web advertising can be gained from an experiment by Business Insider:
    a Business Insider advertiser thought they had purchased $40,000 worth of ad inventory through the open exchanges when in reality, the publication only saw $97, indicating the rest of the money went to fraud.

    "There was more people saying they were selling Business Insider inventory then we could ever possibly imagine," ... "We believe there were 10 to 30 million impressions of Business Insider, for sale, every 15 minutes."

    To put the numbers in perspective, Business Insider says it sees 10 million to 25 million impressions a day.
  25. Last Thursday's example was New York's Gothamist and DNAinfo local news sites:
    A week ago, reporters and editors in the combined newsroom of DNAinfo and Gothamist, two of New York City’s leading digital purveyors of local news, celebrated victory in their vote to join a union.

    On Thursday, they lost their jobs, as Joe Ricketts, the billionaire founder of TD Ameritrade who owned the sites, shut them down.
    Twitter was unhappy:
    The unannounced closure prompted wide backlash from other members of the media on Twitter, with many pointing out that neither writers whose work was published on the sites nor readers can access their articles any longer.
    The Internet Archive has collected Gothamist since 2003; it has many but not all of the articles. By Saturday, Joe Ricketts thought better of the takedown; the sites were back up. But for how long? See also here.


I'm grateful to Herbert van de Sompel and Michael Nelson for constructive comments on drafts, and to Cliff Lynch, Michael Buckland and the participants in UC Berkeley's Information Access Seminar, who provided useful feedback on a rehearsal of this talk. That isn't to say they agree with it.

Org clocktables I: The daily structure / William Denton


Recently I started tracking my time at work using Org’s clocking feature, and it’s working out very well. The actual act of tracking my time makes me much more focused, and I’m wasting less time and working more on important, planned, relevant things. It’s also helping me understand how much time I spend on each of the three main pillars of my work (librarianship + research and professional development + service). In order to understand all this well I wrote some code to turn Org’s clocktable into something more usable. This is the first of two or three posts showing what I have.

Three pillars

In York University Libraries, where I work, librarians and archivists have academic status. We are not faculty (that’s the professors), but we’re very similar. We’re in the same union. We have academic freedom (important). We get “continuing appointment,” not tenure, but the process is much the same.

University professors have three pillars to their work: teaching, research and service. Service is basically work that contributes to the running of the university: serving on committees (universities have lots of committees, and they do important work, like vetting new courses and programs, allocating research funds, or deciding who gets tenure), being on academic governance bodies such as faculty councils and Senate, having a position in the union, etc. Usually there’s a 40/40/20 ratio on these three areas: people spend about 40% of their time on teaching, 40% on research and 20% on service. This fluctuates term to term and year to year—and person to person—but that’s the general rule in most North American universities, as I understand it.

Waiting for Sir Simon Rattle and the Berlin Philharmonic to enter Roy Thomson Hall, last November.

For librarians and archivists the situation can be different. Instead of teaching, let’s say we do “librarianship” as a catch-all term. (Or “archivy,” which the archivists assure me is a real word, but I still think it looks funny.) Then we also do professional development/research and service. In some places, like Laurentian, librarians have full parity with professors, and they have the 40/40/20 ratio. That is ideal. A regrettable example is Western, where librarians and archivists have to spend 75% of their time on professional work. That severely limits the contributions they can make both to the university and to librarianship and scholarship in general.

At York there is no defined ratio. For professors it’s understood to be the 40/40/20, but for librarians and archivists I think it’s understood that is not our ratio, but nothing is set out instead. (This, and that profs have a 2.5 annual course teaching load but we do not have an equivalent “librarianship load,” leads to problems.)

I have an idea of what the ratio should be, but I’m not going to say it here because this may become a bargaining issue. I didn’t know if my work matched that ratio because I don’t have exact details about how I spend my time. I’ve been doing a lot of service, but how much? How much of my time is spent on research?

This question didn’t come to me on my own. A colleague started tracking her time a couple of months ago, jotting things down each day. She said she hadn’t realized just how much damned time it takes to schedule things. I was inspired by her to start clocking my own time.

This is where I got to apply an aspect of Org I’d read about but never used. Org is amazing!

Work diary

I keep a file where I put notes on everything I do. I changed how I use subheadings and now I give every day this structure:

* 2017-12 December

** [2017-12-01 Fri]

*** PPK

*** PCS

*** Service

“PPK” is “professional performance and knowledge,” which is our official term for “librarianship” or “archivy.” “PCS” is “professional contribution and standing,” which is the umbrella term for research and more for faculty. Right now for us that pillar is called “professional development,” but that’s forty-year-old terminology we’re trying to change, so I use the term faculty use. (Check the T&P criteria for a full explanation.)

On my screen, because of my Emacs configuration, that looks like this:

Initial structure.

Clocking in

First thing in the morning, I create that structure, then under the date heading I run C-u C-u C-c C-x C-i (where C-c means Ctrl-c). Now, I realize that’s a completely ridiculous key combination to exist, but when you start using Emacs heavily, you get used to such incantations and they become second nature. C-c C-x C-i is the command org-clock-in. As the docs say, “With two C-u C-u prefixes, clock into the task at point and mark it as the default task; the default task will then always be available with letter d when selecting a clocking task.” That will make more sense in a minute.

When I run that command, Org adds a little block under the heading:

** [2017-12-01 Fri]
CLOCK: [2017-12-01 Fri 09:30]

*** PPK

*** PCS

*** Service

The clock is running, and a little timer shows up in my mode line that tells me how long I’ve been working on the current thing.

I’ll spend a while deleting email and checking some web sites, then let’s say I decide to respond to an email about reference desk statistics, because I can get it done before I have to head over to a 10:30 meeting. I make a new subheading under PPK, because this is librarianship work, and clock into it with C-c C-x C-i. The currently open task gets closed, the duration is noted, and a new clock starts.

** [2017-12-01 Fri]
CLOCK: [2017-12-01 Fri 09:30]--[2017-12-01 Fri 09:50] =>  0:20

*** PPK

**** Libstats stuff
CLOCK: [2017-12-01 Fri 09:50]

Pull numbers on weekend desk activity for A.

*** PCS

*** Service

(Remember this doesn’t look ugly the way I see it in Emacs. There’s another screenshot below.)

I work on that until 10:15, then I make a new task (under Service) and check into it (again with C-c C-x C-i). I’m going to a monthly meeting of the union’s stewards’ council, and walking to the meeting and back counts as part of the time spent. (York’s campus is pretty big.)

** [2017-12-01 Fri]
CLOCK: [2017-12-01 Fri 09:30]--[2017-12-01 Fri 09:50] =>  0:20

*** PPK

**** Libstats stuff
CLOCK: [2017-12-01 Fri 09:50]--[2017-12-01 Fri 10:15] =>  0:25

Pull numbers on weekend desk activity for A.

*** PCS

*** Service

**** Stewards' Council meeting
CLOCK: [2017-12-01 Fri 10:15]

The meeting ends at 1, and I head back to my office. Lunch was provided during the meeting (probably pizza or extremely bready sandwiches, but always union-made), so I don’t take a break for that. In my office I’m not ready to immediately settle into a task, so I hit C-u C-c C-x C-i (just the one prefix), which lets me “select the task from a list of recently clocked tasks.” This is where the d mentioned above comes in: a little list of recent tasks pops up, and I can just hit d to clock into the [2017-12-01 Fri] task.

** [2017-12-01 Fri]
CLOCK: [2017-12-01 Fri 09:30]--[2017-12-01 Fri 09:50] =>  0:20
CLOCK: [2017-12-01 Fri 13:15]

*** PPK

**** Libstats stuff
CLOCK: [2017-12-01 Fri 09:50]--[2017-12-01 Fri 10:15] =>  0:25

Pull numbers on weekend desk activity for A.

*** PCS

*** Service

**** Stewards' Council meeting
CLOCK: [2017-12-01 Fri 10:15]--[2017-12-01 Fri 13:15] =>  3:00

Copious meeting notes here.

Now I might get a cup of tea if I didn’t pick one up on the way, or check email or chat with someone about something. My time for the day is accruing, but not against any specific task. Then, let’s say it’s a focused day, and I settle in and work until 4:30 on a project about ebook usage. I clock in to that, then when I’m ready to leave I clock out of it with C-c C-x C-o.

** [2017-12-01 Fri]
CLOCK: [2017-12-01 Fri 09:30]--[2017-12-01 Fri 09:50] =>  0:20
CLOCK: [2017-12-01 Fri 13:15]--[2017-12-01 Fri 13:40] =>  0:25

*** PPK

**** Libstats stuff
CLOCK: [2017-12-01 Fri 09:50]--[2017-12-01 Fri 10:15] =>  0:25

Pull numbers on weekend desk activity for A.

**** Ebook usage
CLOCK: [2017-12-01 Fri 13:40]--[2017-12-01 Fri 16:30] =>  2:50

Wrote code to grok EZProxy logs and look up ISBNs of Scholars Portal ebooks.

*** PCS

*** Service

**** Stewards' Council meeting
CLOCK: [2017-12-01 Fri 10:15]--[2017-12-01 Fri 13:15] =>  3:00

Copious meeting notes here.

In Emacs, this looks much more appealing.

Final structure at end of day.

That’s one day of time clocked. In my next post I’ll add another day and a clocktable, and then I’ll show the code I use to summarize it all into tidy data.
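To give a feel for what a day like this adds up to, here is a minimal Python sketch that reads the finished structure above and totals the closed clocks under each third-level heading. It is not the code promised for the next post, just an illustration: the function name, the regular expressions and the "General" label for time clocked directly on the date heading are all assumptions made for this example, and clocks that are still running (no end time) are simply skipped.

```python
import re
from collections import defaultdict
from datetime import datetime

ORG_DAY = """\
** [2017-12-01 Fri]
CLOCK: [2017-12-01 Fri 09:30]--[2017-12-01 Fri 09:50] =>  0:20
CLOCK: [2017-12-01 Fri 13:15]--[2017-12-01 Fri 13:40] =>  0:25
*** PPK
**** Libstats stuff
CLOCK: [2017-12-01 Fri 09:50]--[2017-12-01 Fri 10:15] =>  0:25
**** Ebook usage
CLOCK: [2017-12-01 Fri 13:40]--[2017-12-01 Fri 16:30] =>  2:50
*** PCS
*** Service
**** Stewards' Council meeting
CLOCK: [2017-12-01 Fri 10:15]--[2017-12-01 Fri 13:15] =>  3:00
"""

HEADING = re.compile(r"^(\*+)\s+(.*)$")
# Capture date and time only; the day name is skipped so parsing
# does not depend on the locale.
STAMP = r"\[(\d{4}-\d{2}-\d{2}) \S+ (\d{2}:\d{2})\]"
CLOCK = re.compile(r"^\s*CLOCK: " + STAMP + "--" + STAMP)


def minutes_by_pillar(org_text):
    """Sum closed CLOCK lines under each level-3 heading (PPK, PCS, Service)."""
    totals = defaultdict(int)
    pillar = "General"  # time clocked on the date heading itself
    for line in org_text.splitlines():
        heading = HEADING.match(line)
        if heading:
            level = len(heading.group(1))
            if level <= 2:
                pillar = "General"
            elif level == 3:
                pillar = heading.group(2).strip()
            continue  # deeper headings keep the current pillar
        clock = CLOCK.match(line)
        if clock:
            d1, t1, d2, t2 = clock.groups()
            start = datetime.strptime(f"{d1} {t1}", "%Y-%m-%d %H:%M")
            end = datetime.strptime(f"{d2} {t2}", "%Y-%m-%d %H:%M")
            totals[pillar] += int((end - start).total_seconds() // 60)
    return dict(totals)


totals = minutes_by_pillar(ORG_DAY)
for name, minutes in totals.items():
    print(f"{name:8} {minutes // 60}:{minutes % 60:02d}")
```

For the day above this gives 0:45 of general time, 3:15 of PPK and 3:00 of Service, seven hours in all. PCS does not appear because nothing was clocked under it.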


I’m doing all this for my own use, to help me be as effective and efficient and aware of my work habits as I can be. I want to spend as much of my time as I can working on the most important work. Sometimes that’s writing code, sometimes that’s doing union work, sometimes that’s chatting with a colleague about something that’s a minor thing to me but that takes an hour because it’s important to them, and sometimes that’s watching a student cry in my office and waiting for the moment when I can tell them that as stressful as things are right now it’s going to get better. (Women librarians get much, much more of this than men do, but I still get some. It’s a damned tough thing, doing a university degree.) I recommend my colleague Lisa Sloniowski’s award-winning article Affective Labor, Resistance, and the Academic Librarian (Library Trends Vol. 64, No. 4, 2016) for a serious look at all this.

Unifying the Library Discovery Experience / Library Tech Talk (U of Michigan)

New Search Interface

How the University of Michigan Library is unifying the user experience of discovery across multiple kinds of information, from the catalog to licensed content, from subject expertise to library webpages and LibGuides.

Saying goodbye to the Library Success Wiki / Meredith Farkas


In July 2005, on the heels of the successful ALA Annual 2005 Wiki, I developed the Library Success Wiki. Here’s what I said about it then:


“I would like this wiki to be a one-stop-shop for inspiration. All over the country, librarians are developing successful programs and doing innovative things with technology that no one outside of their libraries knows about. There are lots of great blogs out there sharing information about the profession, but there is no one place where all of this information is collected and organized.

I originally got the idea for the wiki when I became frustrated by how large my Bloglines backlog had become as I’d bookmarked lots of posts with amazing ideas that I wanted to save for later (when they were more relevant to what I was working on). A blog is such an amazing medium for sharing information, but what do we do with the information once we’ve read it? Where do we collect it? In or Furl or whatever is the latest social bookmarking tool? In theory, people can find what other people bookmarked in, but in reality, with all the different tags we could use, it’s not quite so easy. And now there are so many social bookmarking tools that I find them more useful for bookmarking stuff for myself than in finding what other people bookmarked. I think a wiki is a fantastic place to collect all of these great ideas related to librarianship. All of those posts and websites you thought were brilliant. All of those successful initiatives you heard about. Wouldn’t it be great to be able to find it all in one place? So when you decide you want to bug your colleagues about switching to IM reference, you can easily find lots of posts and stories about other people who did the same thing.

If you’ve done something at your library that you consider a success, please write about it in the wiki or provide a link to outside coverage. If you have materials that would be helpful to other librarians, add them to the wiki. And if you know of a librarian or a library that is doing something great, feel free to include information about it or links to it. Basically, if you know of anything positive that might be useful to other librarians (including useful websites), this is the place to put it. I hope this wiki will be a venue where people can share ideas with one another and where librarians can learn to replicate the successes of other libraries/librarians.”

Knowledge-sharing has always been a passion of mine and a wiki was a good tool (at the time) for collecting knowledge from a diverse array of librarians across the world. In 2005, Facebook didn’t exist (to the public at least). Twitter didn’t exist. Google Docs didn’t exist. Google Sites didn’t exist. A whole bunch of other collaboration and CMS-type tools didn’t exist. At the time, a wiki was one of the only free ways to collect knowledge from lots of different people, many of whom the person creating the wiki didn’t know. And it received contributions from thousands of librarians and certain pages were THE place to find information on that topic.

But now, other more stable tools exist for this. Mediawiki software is vulnerable to spam and is not the most stable thing out there. I (and my husband when it’s beyond my capabilities) have spent so much time over the past twelve years troubleshooting the software, reverting spam, and blocking spammers. And, all the while, usage of the wiki has declined and many pages have become painfully stale and dated.

With a heavy heart, I’m announcing that, unless someone else wants to run the Library Success Wiki on their own server, the wiki will be going dark on February 2, 2018. This should give people time to move information important to them to other collaboration tools and for a knight in shining armor who wants the hassle of managing the wiki themselves to emerge. It can be hard to let go of services that no longer have the ROI they used to, and I’ve wrestled with the idea of saying goodbye to the wiki for years. It’s time. It’s past time.

Image source

Islandora Technical Advisory Group (TAG) First Meeting / Islandora

Just wanted everyone to know that Islandora's Technical Advisory Group had its first meeting to discuss PHP 5.3.3, given that we have had problems with Travis around it recently. Fortunately, Travis has merged in changes to address the issue. Community member Jonathan Green deserves big thanks for filing it!

Just as a heads up, Travis has recently announced that it will no longer be servicing the Ubuntu distribution we use to run tests against PHP 5.3. This doesn't mean it's no longer available, just that fixes to it are unlikely to be prioritized and there are really no guarantees. So we got lucky this time. In case we're not so lucky next time, there are ways to continue running tests against PHP 5.3.3; they just involve some work. Hopefully PHP 5.3.3 will remain a large enough need that those who use it will take on the workload to keep things going. However, in the event that there's no one willing or able to take on the work, the TAG has come up with a backup plan. In order to maintain limited support and guarantee nothing new will be introduced that breaks compatibility, the codebase's syntax will still be checked for PHP 5.3.3 even if the tests can't be run.
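As a rough idea of what a syntax-only check could look like, here is a small Python sketch that runs PHP's built-in lint mode (`php -l`) over a source tree. This is not the TAG's actual tooling: the function names and the list of file extensions are assumptions, and the check only says something about PHP 5.3.3 if the `php_binary` argument points at a PHP 5.3.3 executable.

```python
import subprocess
from pathlib import Path

# Assumption: extensions typically used for PHP source in Drupal-style modules.
PHP_EXTENSIONS = {".php", ".inc", ".module", ".install", ".test"}


def is_php_source(path):
    """True if the file name looks like PHP source, judged by extension only."""
    return Path(path).suffix in PHP_EXTENSIONS


def find_php_files(root):
    """All PHP-looking files under root, in a stable order."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and is_php_source(p))


def lint(path, php_binary="php"):
    """Run `php -l` on one file; True means no syntax errors were reported."""
    result = subprocess.run(
        [php_binary, "-l", str(path)], capture_output=True, text=True
    )
    return result.returncode == 0


def failing_files(root, php_binary="php"):
    """Files under root that fail the syntax check with the given PHP binary."""
    return [p for p in find_php_files(root) if not lint(p, php_binary)]
```

A CI job could then call something like `failing_files(".", php_binary="/path/to/php-5.3.3")` (the path is a placeholder) and fail the build if the list is not empty.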

The full notes on the meeting are here. And for the record, all agendas and notes are publicly available at the Islandora Github wiki alongside the Coordinating Committee's. For future meetings, I'll be announcing the agendas a day or so before we meet. And if you have an issue you want tackled by the TAG, you can always bring it up at a Committer's call or just message me directly.

Platform Monopolies and Archives / Archival Connections

I am at the InterPARES Trust North American Team meeting in Vancouver, and the issue of platform monopolies has risen to the top of my mind. Here is a quick list of readings I’ve thrown together while listening to and engaging in the discussion: For now, I don’t have much to say, other than this: As a … Continue reading Platform Monopolies and Archives

Spotlight Series: Rebecca McGuire / LITA

Allow me to introduce Rebecca McGuire, Visiting Instructional Tech Specialist at Mortenson Center for International Library Programs. A division of the University of Illinois Library, the Mortenson Center provides leadership and technology guidance to libraries throughout the world. Rebecca shares information about this unique role, her favorite tech blogs, and predictions about the future of libraries. A full transcript of the interview can be found here.

  1. What is your background?

“After getting a bachelor’s degree in Political Science and International Affairs, I spent a year teaching ESL students in a middle school. I loved teaching, but wanted to do it in a more informal environment, so I decided to get a Master’s in Library and Information Science at the University of Illinois. I also decided to pursue a certificate in Community Informatics, which really opened my eyes to how important access, understanding, and application of technology is to both personal and community development.”

  1. What were some of your early library jobs and how did they prepare you for your current position?

 Rebecca was able to explore and become comfortable with hardware and software while troubleshooting for the University of Illinois iSchool Tech Help Desk and teaching classes at the Instructional Technology Design Office at the iSchool. “I learned that you don’t necessarily need to be a technology genius or have a Computer Science degree to work with technology in a library setting; you just need to be able to solve problems, find answers, think critically, communicate clearly, and collaborate with people with varying levels of expertise. Also, patience is so important!”

  1. Tell me about your responsibilities as Visiting Instructional Technology Specialist at Mortenson Center for International Library Programs.

 “The Mortenson Center for International Library Programs is a small unit within the University of Illinois Library. We’re involved in a variety of projects around the world, and we primarily work with international partners to provide capacity building, professional development programs, and training for librarians from outside of the United States. My main responsibility is working on a grant-funded project developing an interactive and adaptable Library Leadership Training toolkit for librarians around the world [Strengthening Innovative Library Leaders or SILL]. This foundational 2-day training focuses on Leadership Styles, Communication, Innovation, and Planning. It’s meant to be delivered to public or community library workers at any level. The goal is that this training curriculum is easy to administer, translatable, adaptable to local contexts, and freely available online, even in places with low bandwidth and limited technology access.”


  1. What does a typical day look like?

 “When I’m working abroad, my days usually consist of trainings, where I help to facilitate the program and also video record the training. When I’m in my office at the University of Illinois, I work on editing videos and photos, creating and editing training materials, building the training toolkit website, and collaborating with training partners. I also coordinate other educational programs and events for the Mortenson Center and design promotional materials.”

  1. Tell me about libraries 10 years from now: what do they look like and what services do they offer?

 “Libraries will always be places where the community can access and learn how to utilize free resources, including print and online materials, computers, and additional technology they need. Now, libraries are becoming places to not only access, but also create content with maker spaces, video and audio studios, new technology, and educational workshops. I also appreciate the trend of libraries serving as community and student collaborative spaces, where all community members are able to work together on projects that are important to them. I also think libraries will continue to leave their physical buildings and grow to meet their community, throughout city buses, parks, community centers, and beyond.”

  1. What was the best advice you received while in school or early in your career?

“Someone gave me the advice to check out current job postings that interested me, then tailor my classes and volunteer experiences to match with the required skills for the jobs I wanted. This really helped me to narrow my focus and ensure that I was learning everything I needed to for a library career that I wanted.”

  1. How do you stay current on new technology?

“I get to help out in the Media Commons of the University of Illinois Undergraduate Library every week, which includes a video studio, audio booth, and multimedia workstations. They always have new emerging technology in the office that they’re testing, so I get to try new technology that can be applied to library settings, like VR. I also love using if I want to explore a program that’s new to me more in depth. In addition, I try to stay current on instructional technology trends by reading blogs and websites such as:

  1. Share technology that you can’t live or couldn’t do your job without.

“WordPress (for our training toolkit website), my Lumix GH4 and Lumix LX100 cameras and various audio recorders to capture trainings, and to create polished promotional materials for the Mortenson Center. I also use the Adobe Creative Suite often, especially Premiere Pro, Lightroom and Illustrator. Also Facebook, because it’s a great way to communicate and stay in touch with librarians I’ve worked with around the world.”

 I’m excited to announce that the next interview will be with Ken Varnum, new editor of Information Technology and Libraries (ITAL). Ken will be a speaker at the 2017 LITA Forum in Denver, and has kindly agreed to meet with me to discuss his vision for the future of ITAL, his favorite library technologies, and his early career ambitions in U.S./Soviet relations.

Hack-A-Way 2017 Is HERE! / Evergreen ILS

As I type, a small knot of Evergreeners has already congregated at the Atlanta airport and begun hashing out topics far and wide. While this certainly isn’t the Hack-A-Way proper, it’s appropriate that an informal event has an informal soft start. This isn’t to say that the Hack-A-Way is completely free form. Over the years the Hack-A-Way has become an increasingly important part of the community’s development cycle, and that has been reflected in its attendance and the infrastructure necessary to support it. While we still try to keep obstacles out of the way of development and discussion, one area where we have added governance is in affirming that the Hack-A-Way must be a welcoming space for attendees.

Towards this end, the Evergreen community as a whole, including the Hack-A-Way, has adopted a formal code of conduct:

Evergreen Event Code of Conduct

Inevitably, we can’t think of every possible thing to include in a code, so this is more representative than definitive. We therefore ask attendees to abide by its spirit as well as its letter, and remind them to be aware of the diversity in our community and to avoid being inflammatory if discussion touches on areas such as religion, politics, or even sports.

Additionally, we will have a photography and video policy in effect that we ask everyone to abide by:


Evergreen Event Photography/Audio/Video Policy


We have three emergency responders who will be available if any attendees have issues with other attendees' conduct:

Kathy Lussier of MassLNC

Galen Charlton of Equinox

Rogan Hamby of Equinox

Please don’t hesitate to contact any of us if you have an issue that you feel you need to report, or if you just need to talk to someone about something that made you uncomfortable and aren’t sure how you want to handle it.

#evgils #hackaway17

Hyku News / DuraSpace News

“Hyku” is the result of a thirty-month project to develop a scalable, performant and multi-tenant digital content repository solution within the Samvera (previously known as Hydra) framework. This work was done by Stanford University, DuraSpace, and The Digital Public Library of America (DPLA) through a generous grant from the Institute of Museum and Library Services (IMLS).