Planet Code4Lib

2017 LITA Forum – Save the Date / LITA

Save the date and you can …

Register now for the 2017 LITA Forum 
Denver, CO
November 9-12, 2017

Registration is Now Open!

Join us in Denver, Colorado, at the Embassy Suites by Hilton Denver Downtown Convention Center, for the 2017 LITA Forum, a three-day education and networking event featuring 2 preconferences, 2 keynote sessions, more than 50 concurrent sessions and 20 poster presentations. It’s the 20th annual gathering of the highly regarded LITA Forum for technology-minded information professionals. Meet with your colleagues involved in new and leading edge technologies in the library and information technology field. Registration is limited in order to preserve the important networking advantages of a smaller conference. Attendees take advantage of the informal Friday evening reception, networking dinners and other social opportunities to get to know colleagues and speakers.

Keynote Speakers:

The Preconference Workshops:

  • IT Security and Privacy in Libraries: Stay Safe From Ransomware, Hackers & Snoops, with Blake Carver, Lyrasis
  • Improving Technology Services with Design Thinking: A Workshop, with Michelle Frisque, Chicago Public Library

Comments from past attendees:

“Best conference I’ve been to in terms of practical, usable ideas that I can implement at my library.”
“I get so inspired by the presentations and conversations with colleagues who are dealing with the same sorts of issues that I am.”
“It was a great experience. The size was perfect, and it was nice to be able to talk shop with people doing similar work in person over food and drinks.”
“After LITA I return to my institution excited to implement solutions I find here.”
“This is always the most informative conference! It inspires me to develop new programs and plan initiatives.”
“I thought it was great. I’m hoping I can come again in the future.”

Get all the details, register and book a hotel room at the 2017 Forum website.

Questions or Comments?

Contact LITA at (312) 280-4268 or Mark Beatty,

See you in Denver.

Evergreen 3.0 development update #11: meet us in Chicago / Evergreen ILS

Mallard duck from the book Birds and nature (1904). Public domain image

Since the previous update, another 23 patches have been committed to the master branch.

This week also marks two maintenance releases, Evergreen 2.11.6 and 2.12.3, and most of the patches pushed were bug fixes for the web staff client.

I’m currently in Chicago for the American Library Association’s Annual Conference, and the Evergreen community is holding an event today! Specifically, on Saturday, 24 June from 4:30 to 5:30 in room W177 of McCormick Place, Ron Gagnon and Elizabeth Thomsen of NOBLE will be moderating a discussion of a recent feature in Evergreen to adjust the sorting of catalog search results based on the popularity of resources. Debbie Luchenbill of MOBIUS will also discuss Evergreen’s group formats and editions feature. Come see us!

Duck trivia

The color and patterns of duck plumage have long been studied as examples of sexual selection as a factor in evolution.


Updates on the progress to Evergreen 3.0 will be published every Friday until general release of 3.0.0. If you have material to contribute to the updates, please get them to Galen Charlton by Thursday morning.

Libraries across the U.S. are Ready to Code / District Dispatch

This post was originally published on Google’s blog The Keyword.

“It always amazes me how interested both parents and kids are in coding, and how excited they become when they learn they can create media on their own–all by using code.” – Emily Zorea, Youth Services Librarian, Brewer Public Library

Emily Zorea is not a computer scientist. She’s a Youth Services Librarian at the Brewer Public Library in Richland Center, Wisconsin, but when she noticed that local students were showing an interest in computer science (CS), she started a coding program at the library. Though she didn’t have a CS background, she understood that coding, collaboration and creativity were critical skills for students to approach complex problems and improve the world around them. Because of Emily’s work, the Brewer Public Library is now Ready to Code. At the American Library Association, we want to give librarians like Emily the opportunity to teach these skills, which is why we are thrilled to partner with Google on the next phase of the Libraries Ready to Code initiative — a $500,000 sponsorship from Google to develop a coding toolkit and make critical skills more accessible for students across 120,000 libraries in the U.S.

Libraries will receive funding, consulting expertise, and operational support from Google to pilot a CS education toolkit that equips any librarian with the ability to implement a CS education program for kids. The resources aren’t meant to transform librarians into expert programmers but will support them with the knowledge and skills to do what they do best: empower youth to learn, create, problem solve, and develop the confidence and skills to succeed in their future careers.

For libraries, by libraries
Librarians and staff know what works best for their communities, so we will rely on them to help us develop the toolkit. This summer a cohort of libraries will receive coding resources, like CS First, a free video-based coding club that doesn’t require CS knowledge, to help them facilitate CS programs. Then we’ll gather feedback from the cohort so that we can build a toolkit that is useful and informative for other libraries who want to be Ready to Code. The cohort will also establish a community of schools and libraries who value coding, and will use their knowledge and expertise to help that community.

Critical thinking skills for the future
Though every student who studies code won’t become an engineer, critical thinking skills are essential in all career paths. That is why Libraries Ready to Code also emphasizes computational thinking, a basic set of problem-solving skills, in addition to code, that is at the heart of connecting the libraries’ mission of fostering critical thinking with computer science.

Jason Gonzales, technology specialist, Muskogee Public Library, talking to students.

“Ready to Code means having the resources available so that if someone is interested in coding or wants to explore it further they are able to. Knowing where to point youth can allow them to begin enjoying and exploring coding on their own.” – Jason Gonzales, technology specialist, Muskogee Public Library

Many of our library educators, like Jason Gonzales, a technology specialist at the Muskogee Public Library, already have exemplary programs that combine computer science and computational thinking. His community is located about 50 miles outside of Tulsa, Oklahoma, so the need for new programming was crucial, given that most youth are not able to travel to the city to pursue their interests. When students expressed an overwhelming interest in video game design, he knew what the focus of a new summer coding camp would be. Long-term, he hopes students will learn more digital literacy skills so they are comfortable interacting with technology and applying it to other challenges now and in the future.

From left to right: Jessie 'Chuy' Chavez of Google, Inc. with Marijke Visser and Alan Inouye of ALA's OITP at the Google Chicago office.

When the American Library Association and Google announced the Libraries Ready to Code initiative last year, it began as an effort to learn about CS activities, like the ones that Emily and Jason led. We then expanded to work with university faculty at Library and Information Science (LIS) schools to integrate CS content into their tech and media courses. Our next challenge is scaling these successes to all our libraries, which is where our partnership with Google, and the development of a toolkit, becomes even more important. Keep an eye out in July for a call for libraries to participate in developing the toolkit. We hope it will empower any library, regardless of geography, expertise, or affluence to provide access to CS education and ultimately, skills that will make students successful in the future.

The post Libraries across the U.S. are Ready to Code appeared first on District Dispatch.

Apply to be the next ITAL Editor / LITA

Applications and nominations are invited for the position of editor of Information Technology and Libraries (ITAL), the flagship publication of the Library and Information Technology Association (LITA).

LITA seeks an innovative, experienced editor to lead its top-tier, open access journal with an eye to the future of library technology and scholarly publishing. The editor is appointed for a three-year term, which may be renewed for an additional three years. Duties include:

  • Chairing the ITAL Editorial Board
  • Managing the review and publication process:
    • Soliciting submissions and serving as the primary point of contact for authors
    • Assigning manuscripts for review, managing review process, accepting papers for publication
    • Compiling accepted and invited articles into quarterly issues
  • Liaising with service providers including the journal publishing platform and indexing services
  • Marketing and promoting the journal
  • Participating as a member of and reporting to the LITA Publications Committee

Some funding for editorial assistance plus a $1,500/year stipend are provided.

Please express your interest or nominate another person for the position using this online form:

Applications and nominations that are received by July 21 will receive first consideration. Applicants and nominees will be contacted by the search committee and an appointment will be made by the LITA Board of Directors upon the recommendation of the search committee and the LITA Publications Committee. Applicants must be a member of ALA and LITA at the time of appointment.

Contact with any questions.

Information Technology and Libraries (ISSN 2163-5226) publishes material related to all aspects of information technology in all types of libraries. Topic areas include, but are not limited to, library automation, digital libraries, metadata, identity management, distributed systems and networks, computer security, intellectual property rights, technical standards, geographic information systems, desktop applications, information discovery tools, web-scale library services, cloud computing, digital preservation, data curation, virtualization, search-engine optimization, emerging technologies, social networking, open data, the semantic web, mobile services and applications, usability, universal access to technology, library consortia, vendor relations, and digital humanities.

Seeking a Few Brass Tacks: Measuring the Value of Resource Sharing / HangingTogether

At the two most recent American Library Association conferences, I’ve met with a small ad hoc group of librarians to discuss how we might measure and demonstrate the value that sharing our collections delivers to various stakeholders: researchers, library administrators, parent organizations, service/content providers.

First we described our current collection sharing environment and how it is changing (Orlando, June 2016).

Then we walked through various ways in which, using data, we might effectively measure and demonstrate the value of interlending – and how some in our community are already doing it (Atlanta, January 2017).

Our next logical step will be to settle on some concrete actions we can take – each by ourselves, or working among the group, or collaborating with others outside the group – to begin to measure and demonstrate that value in ways that tell a meaningful and compelling story.

As the group prepares to meet for a third time – at ALA Annual in Chicago this weekend – I thought it might be useful to share our sense of what some of these actions might eventually look like, and what group members have been saying about the possibilities during our conversations.

“Value of Resource Sharing” discussion topics: Round III

We demonstrate value best by documenting the value we deliver to our patrons.

o “One could fruitfully explore how what patrons value (speed, convenience, efficiency, ease) determines whether resource sharing is ultimately perceived as valuable.”
o “Rather than focusing on systems and exploring the life cycle of the request, we should look at that of the learner.”
o “We need to support our value not just with numbers, which are important, but with human examples of how we make a difference with researchers.”
o “We are now sharing this [citation study] work with our faculty and learning a lot, such as their choice not to use the best, but most accessible material.”
o “Did they value what we provided, and, if so, why?”
o “We know that resource sharing supports research, course completion, and publishing, but it is usually a one-way street: we provide information on demand but don’t see the final result, the contribution of that material to the final product.”
o “We need to collect and tell the stories of how the material we obtain for our users transforms their studies or allows them to succeed as researchers.”
o “I think we need to explore how we can make the process smoother for both the patrons and library staff. We talk about the cost of resource sharing a lot but we haven’t really talked about how it could be easier or how policies get in the way or how our processes are so costly because they make so much busy work.”

Common methods of measuring and demonstrating value include determining how much it costs a library to provide a service, or how much a library service would cost if the patron had to purchase it from a commercial provider.

o Q: “How much did you spend on textbooks?” A: “None! ILL!”
o “Why not measure that expense [of providing access to academic databases to students]?”
o “Build an equation to calculate the costs of various forms of access: shelve/retrieve on campus, shelve/retrieve remotely, etc.”
o “Paul Courant did a study of what it cost to keep a book on the shelf on campus as opposed to in offsite storage….Are the numbers in the Courant study still right?”

Collections have long been a way for libraries to demonstrate value – by counting them and publicizing their size. The number of volumes to which you have access via consortia is becoming a more useful metric. Collections can have different values for an organization, depending upon context: where they are housed, how quickly they can be provided to users, and who wants access to them.

o “How can access to legacy print in high density storage be monetized? Perhaps a change in mindset is in order – to lower costs for institutions committed to perpetual preservation and access, and raise costs for institutions that do not.”
o “What would be the cost to retain a last copy in a secure climate controlled environment? Would we then be counting on ARLs to do the work of preserving our cultural heritage? We already know there are unique materials not held by ARLs, so how do the pieces fit together? How do we incorporate public libraries which also have many unique materials in their collections? How do we equitably share the resources and costs?”
o “We rely on redundancy…65% of…requests are for things…already owned.”

We can demonstrate value by providing new services to patrons that make their experience more like AmaZoogle.

o “How do we create delivery predictability models like everyone in e-commerce already offers? Are we just afraid to predict because we don’t want to be wrong? Or do we really not know enough to offer delivery information to users?”
o “I’m interested in focusing on the learning moments available throughout the resource sharing workflows and integrating stronger information literacy into the users’ experience…’We’ve begun processing your request for a dissertation…Did you know your library provides access to these peer reviewed journal articles that you might find helpful?’ or ‘You can expect this article to hit your inbox within 24 hours – are you ready to evaluate and cite it? You might find these research guides helpful…'”

What ideas do you have for measuring the value of sharing collections?  We’d love to hear from you about this.  Please leave us a comment below.

I’ll report out about takeaways from the group’s third meeting soon after ALA.

Implications/Questions / Ed Summers

… we are concerned with the argument, implicit if not explicit in many discussions about the pitfalls of interdisciplinary investigation, that one primary measure of the strength of social or cultural investigation is the breadth of implications for design that result (Dourish, 2006). While we have both been involved in ethnographic work carried out for this explicit purpose, and continue to do so, we nonetheless feel that this is far from the only, or even the most significant, way for technological and social research practice to be combined. Just as from our perspective technological artifacts are not purely considered as “things you might want to use,” from their investigation we can learn more than simply “what kinds of things people want to use.” Instead, perhaps, we look to some of the questions that have preoccupied us throughout the book: Who do people want to be? What do they think they are doing? How do they think of themselves and others? Why do they do what they do? What does technology do for them? Why, when, and how are those things important? And what roles do and might technologies play in the social and cultural worlds in which they are embedded?

These investigations do not primarily supply ubicomp practitioners with system requirements, design guidelines, or road maps for future development. What they might provide instead are insights into the design process itself; a broader view of what digital technologies might do; an appreciation for the relevance of social, cultural, economic, historical, and political contexts as well as institutions for the fabric of everyday technological reality; a new set of conceptual resources to bring to bear within the design process; and a new set of questions to ask when thinking about technology and practice.

Dourish & Bell (2011), pp. 191–192

I’m very grateful to Jess Ogden for pointing me at this book by Dourish and Bell when I was recently bemoaning the fact that I struggled to find any concrete implications for design in Summers & Punzalan (2017).


Dourish, P. (2006). Implications for design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 541–550). ACM.

Dourish, P., & Bell, G. (2011). Divining a digital future: Mess and mythology in ubiquitous computing. MIT Press.

Summers, E., & Punzalan, R. (2017). Bots, seeds and people: Web archives as infrastructure. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (pp. 821–834). New York, NY, USA: ACM.

Irrationality and Human-Computer Interaction / Dan Cohen

When the New York Times let it be known that their election-night meter—that dial displaying the real-time odds of a Democratic or Republican win—would return for Georgia’s 6th congressional district runoff after its notorious November 2016 debut, you could almost hear a million stiff drinks being poured. Enabled by the live streaming of precinct-by-precinct election data, the dial twitches left and right, pauses, and then spasms into another movement. It’s a jittery addition to our news landscape and the source of countless nightmares, at least for Democrats.

We want to look away, and yet we stare at the meter for hours, hoping, praying. So much so that, perhaps late at night, we might even believe that our intensity and our increasingly firm grip on our iPhones might affect the outcome, ever so slightly.

Which is silly, right?

*          *          *

Thirty years ago I opened a bluish-gray metal door and entered a strange laboratory that no longer exists. Inside was a tattered fabric couch, which faced what can only be described as the biggest pachinko machine you’ve ever seen, as large as a giant flat-screen TV. Behind a transparent Plexiglas front was an array of wooden pegs. At the top were hundreds of black rubber balls, held back by a central gate. At the bottom were vertical slots.

A young guy—like me, a college student—sat on the couch in a sweatshirt and jeans. He was staring intently at the machine. So intently that I just froze, not wanting to get in the way of his staring contest with the giant pinball machine.

He leaned in. Then the balls started releasing from the top at a measured pace and they chaotically bounced around and down the wall, hitting peg after peg until they dropped into one of the columns at the bottom. A few minutes later, those hundreds of rubber balls had formed a perfectly symmetrical bell curve in the columns.

The guy punched the couch and looked dispirited.

I unfroze and asked him the only phrase I could summon: “Uh, what’s going on?”

“I was trying to get the balls to shift to the left.”

“With what?”

“With my mind.”

*          *          *

This was my first encounter with the Princeton Engineering Anomalies Research program, or PEAR. PEAR’s stated mission was to pursue an “experimental agenda of studying the interaction of human consciousness with sensitive physical devices, systems, and processes,” but that prosaic academic verbiage cloaked a far cooler description: PEAR was on the hunt for the Force.

This was clearly bananas, and also totally enthralling for a nerdy kid who grew up on Star Wars. I needed to know more. Fortunately that opportunity presented itself through a new course at the university: “Human-Computer Interaction.” I’m not sure I fully understood what it was about before I signed up for it.

The course was team-taught by prominent faculty in computer science, psychology, and engineering. One of the professors was George Miller, a founder of cognitive psychology, who was the first to note that the human mind was only capable of storing seven-digit numbers (plus or minus two digits). And it included engineering professor Robert Jahn, who had founded PEAR and had rather different notions of our mental capacity.

*          *          *

One of the perks of being a student in Human-Computer Interaction was that you were not only welcome to stop by the PEAR lab, but you could also engage in the experiments yourself. You would just sign up for a slot and head to the basement of the engineering quad, where you would eventually find the bluish-gray door.

By the late 1980s, PEAR had naturally started to focus on whether our minds could alter the behavior of a specific, increasingly ubiquitous machine in our lives: the computer. Jahn and PEAR’s co-founder, Brenda Dunne, set up several rooms with computers and shoebox-sized machines with computer chips in them that generated random numbers on old-school red LED screens. Out of the box snaked a cord with a button at the end.

You would book your room, take a seat, turn on the random-number generator, and flip on the PC sitting next to it. Once the PC booted up, you would type in a code—as part of the study, no proper names were used—to log each experiment. Then the shoebox would start showing numbers ranging from 0.00 to 2.00 so quickly that the red LED became a blur. You would click on the button to stop the digits, and then that number was recorded by the computer.

The goal was to try to stop the rapidly rotating numbers on a number over 1.00, to push the average up as far as possible. Over dozens of turns the computer’s monitor showed how far that average diverged from 1.00.

That’s a clinical description of the experiment. In practice, it was a half-hour of tense facial expressions and sweating, a strange feeling of brow-beating a shoebox with an LED, and some cursing when you got several sub-1.00 numbers in a row. It was human-computer interaction at its most emotional.

Jahn and Dunne kept the master log of the codes and the graphs. There were rumors that some of the codes—some of the people those codes represented—had discernable, repeatable effects on the random numbers. Over many experiments, they were able to make the average rise, ever so slightly but enough to be statistically significant.

In other words, there were Jedis in our midst.

Unfortunately, over several experiments—and a sore thumb from clicking on the button with increasing pressure and frustration—I had no luck affecting the random numbers. I stared at the graph without blinking, hoping to shift the trend line upwards with each additional stop. But I ended up right in the middle, as if I had flipped a coin a thousand times and gotten 500 heads and 500 tails. Average.

*          *          *

Jahn and Dunne unsurprisingly faced sustained criticism and even some heckling, on campus and beyond. When PEAR closed in 2007, all the post-mortems dutifully mentioned the editor of a journal who said he could accept a paper from the lab “if you can telepathically communicate it to me.” It’s a good line, and it’s tempting to make even more fun of PEAR these many years later.

The same year that PEAR closed its doors, the iPhone was released, and with it a new way of holding and touching and communicating with a computer. We now stare intently at these devices for hours a day, and much of that interaction is—let’s admit it—not entirely rational.

We see those three gray dots in a speech bubble and deeply yearn for a good response. We open the stocks app and, in the few seconds it takes to update, pray for green rather than red numbers. We go to the New York Times on election eve and see that meter showing live results, and more than anything we want to shift it to the left with our minds.

When asked by what mechanism the mind might be able to affect a computer, Jahn and Dunne hypothesized that perhaps there was something like an invisible Venn diagram, whereby the ghost in the machine and the ghost in ourselves overlapped ever so slightly. A permeability between silicon and carbon. An occult interface through which we could ever so slightly change the processes of the machine itself and what it displays to us seconds later.

A silly hypothesis, perhaps. But we often act like it is all too real.

at IIPC / Harvard Library Innovation Lab

At IIPC last week, Jack Cushman (LIL developer) and Ilya Kreymer (former LIL summer fellow) shared their work on security considerations for web archives, including, a sandbox for developers interested in exploring web archive security.

Slides: repo:

David Rosenthal of Stanford also has a great write-up on the presentation:

Megan Ozeran wins 2017 LITA / Ex Libris Student Writing Award / LITA

Megan Ozeran has been selected as the winner of the 2017 Student Writing Award sponsored by Ex Libris Group and the Library and Information Technology Association (LITA) for her paper titled “Managing Metadata for Philatelic Materials.” Ozeran is an MLIS candidate at the San Jose State University School of Information.

“Megan Ozeran’s paper was selected as the winner because it takes a scholarly look at an information technology topic that is new and unresolved. Ms. Ozeran’s discussion offers a thorough examination of the current state of cataloging stamps and issues related to their discoverability,” said Rebecca Rose, the Chair of this year’s selection committee.

The LITA/Ex Libris Student Writing Award recognizes outstanding writing on a topic in the area of libraries and information technology by a student or students enrolled in an ALA-accredited library and information studies graduate program. The winning manuscript will be published in Information Technology and Libraries (ITAL), LITA’s open access, peer reviewed journal, and the winner will receive $1,000 and a certificate of merit.

The Award will be presented during the LITA Awards Ceremony & President’s Program at the ALA Annual Conference in Chicago (IL), on Sunday, June 25, 2017.

The members of the 2017 LITA/Ex Libris Student Writing Award Committee are: Rebecca Rose (Chair), Ping Fu, and Mary Vasudeva.

Thank you to Ex Libris for sponsoring this award.

WAC2017: Security Issues for Web Archives / David Rosenthal

Jack Cushman and Ilya Kreymer's Web Archiving Conference talk Thinking like a hacker: Security Considerations for High-Fidelity Web Archives is very important. They discuss 7 different security threats specific to Web archives:
  1. Archiving local server files
  2. Hacking the headless browser
  3. Stealing user secrets during capture
  4. Cross site scripting to steal archive logins
  5. Live web leakage on playback
  6. Show different page contents when archived
  7. Banner spoofing
Below the fold, a brief summary of each to encourage you to do two things:
  1. First, view the slides.
  2. Second, visit, which is a sandbox with a local version of Webrecorder that has not been patched to fix known exploits, and a number of challenges for you to learn how they might apply to web archives in general.

Archiving local server files

A page being archived might have links that, when interpreted in the context of the crawler, point to local resources that should not end up in public Web archives. Examples include:
  • http://localhost:8080/
  • file:///etc/passwd
It is necessary to implement restrictions in the crawler to prevent it collecting from local addresses or from protocols other than http(s). It is also a good idea to run the crawler in an isolated container or VM to maintain control over the set of resources local to the crawler.
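
As a rough illustration of the kind of restriction described above, here is a minimal Python sketch of a pre-fetch URL check. It is not taken from the talk or from any particular crawler; the function name `is_safe_to_crawl` and the hostname list are assumptions made for this example. It only inspects the URL text, so a real crawler would also need to check the addresses a DNS name resolves to and repeat the check after every redirect.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption for this sketch: only these schemes may be collected.
ALLOWED_SCHEMES = {"http", "https"}
# Assumption for this sketch: hostnames that always mean "this machine".
LOCAL_HOSTNAMES = {"localhost", "localhost.localdomain"}


def is_safe_to_crawl(url: str) -> bool:
    """Return True only for http(s) URLs that do not obviously point at local resources."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URL (for example a broken IPv6 literal): refuse it.
        return False
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False
    # hostname is lower-cased and has IPv6 brackets stripped; drop a trailing dot too.
    host = (parts.hostname or "").rstrip(".")
    if not host or host in LOCAL_HOSTNAMES:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so a DNS name. This sketch lets it through; the
        # resolved addresses still have to be checked before connecting.
        return True
    # Reject loopback, private, link-local and other non-public addresses.
    return address.is_global
```

With this check, both of the example URLs listed above are rejected, while an ordinary public URL such as https://example.org/page is accepted.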

Hacking the headless browser

Nowadays collecting many Web sites requires executing the content in a headless browser such as PhantomJS. They all have vulnerabilities, only some of which are known at any given time. The same is true of the virtualization infrastructure. Isolating the crawler in a VM or a container does add another layer of complexity for the attacker, who now needs exploits not just for the headless browser but also for the virtualization infrastructure. But it requires that both be kept up to date. This isn't a panacea, just risk reduction.

Stealing user secrets during capture

User-driven Web recorders place user data at risk, because they typically hand URLs to be captured to the recording process as suffixes to a URL for the Web recorder, thus vitiating the normal cross-domain protections. Everything (login pages, third-party ads, etc.) is regarded as part of the Web recorder domain.

Mitigating this risk is complex, potentially including rewriting cookies, intercepting Javascript's access to cookies, and manipulating sessions.

Cross site scripting to steal archive logins

Similarly, the URLs used to replay content must be carefully chosen to avoid the risk of cross-site scripting attacks on the archive. When replaying preserved content, the archive must serve all preserved content from a different top-level domain than the one users log in to and from which the archive serves the parts of a replay page (e.g. the Wayback Machine's timeline) that are not preserved content. The preserved content should be isolated in an iframe. For example:
  • Archive domain:
  • Content domain:

Live web leakage on playback

Especially with Javascript in archived pages, it is hard to make sure that all resources in a replayed page come from the archive, not from the live Web. If live Web Javascript is executed, all sorts of bad things can happen. Malicious Javascript could exfiltrate information from the archive, track users, or modify the content displayed.

Injecting the Content-Security-Policy (CSP) header into replayed content can mitigate these risks by preventing compliant browsers from loading resources except from the specified domain(s), which would be the archive's replay domain(s).
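As a minimal sketch of that idea, the Python function below builds a Content-Security-Policy header that limits resource loading to the archive's replay domain(s). The origin `https://content.archive.example` is a hypothetical placeholder, and allowing inline scripts, `eval`, `data:` and `blob:` sources is an assumption made here because archived pages commonly rely on them; a real archive would tune the directive list to its own replay system.

```python
def build_csp_header(replay_origins):
    """Return a (name, value) pair for a CSP header restricted to replay origins.

    replay_origins: list of origins the archive serves preserved content from
    (hypothetical values in this sketch).
    """
    if not replay_origins:
        raise ValueError("at least one replay origin is required")
    # Assumption: archived pages often contain inline scripts and data: URLs,
    # so those are allowed; everything else must come from the replay origins.
    sources = ["'unsafe-inline'", "'unsafe-eval'", "data:", "blob:", *replay_origins]
    return ("Content-Security-Policy", "default-src " + " ".join(sources))


if __name__ == "__main__":
    name, value = build_csp_header(["https://content.archive.example"])
    print(f"{name}: {value}")
```

A compliant browser receiving this header with a replayed page would refuse to fetch scripts, images or frames from live Web hosts that are not in the list.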

Show different page contents when archived

I wrote previously about the fact that these days the content of almost all web pages depends not just on the browser, but also the user, the time, the state of the advertising network and other things. Thus it is possible for an attacker to create pages that detect when they are being archived, so that the archive's content will be unrepresentative and possibly hostile. Alternatively, the page can detect that it is being replayed, and display different content or attack the replayer.

This is another reason why both the crawler and the replayer should be run in isolated containers or VMs. The bigger question of how crawlers can be configured to obtain representative content from personalized, geolocated, advert-supported web-sites is unresolved, but out of scope for Cushman and Kreymer's talk.

Banner spoofing

When replayed, malicious pages can overwrite the archive's banner, misleading the reader about the provenance of the page.

Introducing FileStores / LibreCat/Catmandu blog

Catmandu is always our tool of choice when working with structured data. Using the Elasticsearch or MongoDB Catmandu::Store-s it is quite trivial to store and retrieve metadata records. Storing and retrieving YAML, JSON (and by extension XML, MARC, CSV, …) files can be as easy as the commands below:

$ catmandu import YAML to database < input.yml
$ catmandu import JSON to database < input.json
$ catmandu import MARC to database <
$ catmandu export database to YAML > output.yml

A catmandu.yml configuration file is required, with the connection parameters to the database:

$ cat catmandu.yml
---
store:
  database:
    package: ElasticSearch
    options:
      client: '1_0::Direct'
      index_name: catmandu

Given these tools to import and export and even transform structured data, can this be extended to unstructured data? In institutional repositories like LibreCat we would like to manage metadata records and binary content (for example PDF files related to the metadata).  Catmandu 1.06 introduces the Catmandu::FileStore as an extension to the already existing Catmandu::Store to manage binary content.

A Catmandu::FileStore is a Catmandu::Store where each Catmandu::Bag acts as a “container” or a “folder” that can contain zero or more records describing File content. The file records themselves contain pointers to a backend storage implementation capable of serialising and streaming binary files. Out of the box, one Catmandu::FileStore implementation is available: Catmandu::Store::File::Simple, or File::Simple for short, which stores files in a directory.

Some examples. To add a file to a FileStore, the stream command needs to be executed:

$ catmandu stream /tmp/myfile.pdf to File::Simple --root /data --bag 1234 --id myfile.pdf

In the command above, /tmp/myfile.pdf is the file to be uploaded to the File::Store. File::Simple is the name of the File::Store implementation, which requires one mandatory parameter, --root /data, the root directory where all files are stored. The --bag 1234 is the “container” or “folder” which contains the uploaded files (with a numeric identifier 1234). And the --id myfile.pdf is the identifier for the newly created file record.

To download the file from the File::Store, the stream command needs to be executed in opposite order:

$ catmandu stream File::Simple --root /data --bag 1234 --id myfile.pdf to /tmp/file.pdf

or:

$ catmandu stream File::Simple --root /data --bag 1234 --id myfile.pdf > /tmp/file.pdf

On the file system the files are stored in a deeply nested directory structure, so that the File::Store can be spread out over many disks.

A listing of all “containers” can be retrieved by requesting an export of the default (index) bag of the File::Store:

$ catmandu export File::Simple --root /data to YAML
_id: 1234

A listing of all files in the container “1234” can be done by adding the bag name to the export command:

$ catmandu export File::Simple --root /data --bag 1234 to YAML
_id: myfile.pdf
_stream: !!perl/code '{ "DUMMY" }'
content_type: application/pdf
created: 1498125394
md5: ''
modified: 1498125394
size: 883202

Each File::Store implementation supports at least the fields presented above:

  • _id: the name of the file
  • _stream: a callback function to retrieve the content of the file (requires an IO::Handle as input)
  • content_type: the MIME-Type of the file
  • created: a timestamp when the file was created
  • modified: a timestamp when the file was last modified
  • size: the byte length of the file
  • md5: optionally, an MD5 checksum
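
To make the record layout concrete, here is a small Python sketch that derives the same fields for a local file. It is only an illustration of the data shape, not how Catmandu (which is written in Perl) implements it; the function name `describe_file` is hypothetical, the `_stream` callback is omitted, and `created` uses the operating system's change time, which is only an approximation of a creation timestamp on many systems.

```python
import hashlib
import mimetypes
import os


def describe_file(path):
    """Build a dict with the FileStore-style fields for a local file."""
    stat = os.stat(path)
    with open(path, "rb") as handle:
        digest = hashlib.md5(handle.read()).hexdigest()
    # Guess the MIME type from the file extension; fall back to a generic type.
    content_type, _ = mimetypes.guess_type(path)
    return {
        "_id": os.path.basename(path),
        "content_type": content_type or "application/octet-stream",
        "created": int(stat.st_ctime),   # assumption: ctime stands in for creation time
        "modified": int(stat.st_mtime),
        "size": stat.st_size,
        "md5": digest,
    }


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(describe_file(name))
```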

We envision in Catmandu that many implementations of FileStores can be created to be able to store files in GitHub, BagIts, Fedora Commons and more backends.

Using the Catmandu::Plugin::SideCar, Catmandu::FileStore-s and Catmandu::Store-s can be combined as one endpoint. Using Catmandu::Store::Multi and Catmandu::Store::File::Multi, many different implementations of Stores and FileStores can be combined.

This is a short introduction, but I hope you will experiment a bit with the new functionality and provide feedback to our project.

OKI Agile: Scrum and sprints in open data / Open Knowledge Foundation

This is the third in a series of blog posts on how we are using the Agile methodology at Open Knowledge International. Originating from software development, the Agile manifesto describes a set of principles that prioritise agility in work processes: for example through continuous development, self-organised teams with frequent interactions and quick responses to change. In this blogging series we go into the different ways Agile can be used to work better in teams and to deliver projects more efficiently. The first posts dealt with user stories and methodologies: this time we go into using scrum and sprints to manage delivery of projects.

Throughout my time as a project manager of open data projects in The Public Knowledge Workshop in Israel and in Open Knowledge International, I have used various tools and methods to manage delivery of software and content development. I have used Trello, Asana and even a Google spreadsheet, but at the end of the day I always go back to GitHub to run all of the project tasks, assisted by Waffle.

Many people that I have spoken to are afraid of using GitHub for project management. To be fair, I am still afraid of Git, but GitHub is a different concept: it is not a programming language, it is a site for hosting repositories, and it has really good features and a very friendly user interface. So do not fear the Octocat!

Why GitHub?

  • As an open source organisation building community-facing products, our code is always managed on GitHub. Adding another platform to deal with non-code tasks just adds more complications and syncing.
  • It is open to the community to contribute and see the progress, and does not need permissions management (unlike Trello).
  • Contrary to what people think, it is really easy to learn how to use the GitHub web version, and its labels and milestones features are helpful for delivery.

Why Waffle?

  • It syncs with GitHub and can show the tasks as a Kanban board.
  • It lets us record estimates of the hours of work for each task.

So far, working on GitHub for the project has shown the following benefits:

  1. Better cooperation between different streams of work
    Having one platform helps the team to understand what each function in the project is doing. I believe that the coder should understand the content strategy and the community lead should understand the technical constraints while working on a project. This leads to better feedback and ideas for improving the product.
  2. Better documentation
    Having everything in one place makes it possible to create better documentation for the future.

So what did we do for GODI (the Global Open Data Index) 2016?

  • Firstly, I gathered all the tasks from Trello and moved them to GitHub.
  • I created labels that differentiate between different types of tasks: content, design, code and community.
  • I added milestones and sorted all tasks into their respective milestones of the project. I also created a “backlog” for all tasks that are not prioritised for the project but need to be done one day in the future. Each milestone got a deadline that corresponds to the general project deadlines.
  • I made sure that all the team members are part of the repository.
  • I organised Waffle to create columns – we use the default Waffle ones: Backlog, Ready, In Progress and Done.

Using one system and changing the work culture means that I needed to be strict about how the team communicates. It is sometimes unpleasant and required me to be the “bad cop”, but it is a crucial part of the process of enforcing a new way of working. It means repeated reminders to document issues on the issue tracker, ignoring issues that are not on GitHub, and commenting on GitHub when issues are not well documented.

Now, after all is in one system, we can move to the daily management of tasks.


  • Before the sprint call:
    • Make sure all issues are clear – Before each sprint, the scrum master (SM; in this case, also the project manager) makes sure that all issues are clear and not vague. The SM will also add tasks that they think are needed for this sprint.
    • Organise issues – At this stage, prior to the sprint call, use Waffle to move tasks to the columns that represent where you as a project manager think they currently are.
  • During the sprint call:
    • Explain to the team the main details about the sprint:
      • Length of the milestone, or how many weeks this milestone will take
      • Length of the sprint
      • Team members – who are they? Are they working part time or not?
      • Objectives for the sprint – these derive from the milestone
      • Potential risks and mitigation
    • Go through the issues – Yes, you did it before, but going through the issues with the team helps you as PM or SM to understand where the team is and what blocks them, and creates a true representation of the tasks for the delivery team.
    • Give time estimates – Waffle allows rough time estimates of between 1 and 100 hours. Use them to forecast work for the project.
    • Create new tasks – Speaking together gets the creative juices going. This will lead to the creation of new issues. This is a good thing. Make sure they are labeled correctly.
    • Make sure that everyone understands their tasks – In the last 10 minutes of the sprint call, repeat the division of work and who is doing what.
  • After the sprint call and during the sprint:
    • Make sure to have regular stand ups – I have 30 minute stand ups, to allow the team more time to share issues. However, make sure not to go over 30 minutes. If an issue demands more time to discuss, it needs its own dedicated call to untangle it, so set up a call with the relevant team members for that issue.
    • Create issues as they arise – Don’t wait for the stand up or sprint kick-off call to create issues. Encourage the team and the community to create issues as well.
    • Always have a look at the issue tracker – Making sure all issues are there is a key action in agile work. I start every day by checking the issues to make sure that I don’t miss critical work.
    • Hyper communicate – Since we are a remote team, it is better to repeat a message than not to say it at all. I use Slack to make sure that the team knows when a new issue arises or if there is an outside blocker. I will repeat it in the team stand ups to make sure all team members are up to date.

    How do you manage your sprints and projects? Leave us a comment below!


    New open energy data portal set to spark innovation in energy efficiency solutions / Open Knowledge Foundation

    Viderum was spun off as a company from Open Knowledge International in 2016 with the aim of providing services and products to further expand the reach of open data around the world. Last week they took a great step in this direction by powering the launch of the Energy Data Service portal, which will make Denmark’s energy data available to everyone. This press release has been reposted from Viderum’s website.

    Image credit: Jürgen Sandesneben, Flickr CC BY

    A revolutionary new online portal, which gives open access to Denmark’s energy data, is set to spark innovation in smart, data-led solutions for energy efficiency. The Energy Data Service, launched on 17 June 2017 by the CEO of Denmark’s state-owned gas and electricity provider Energinet, and the Minister for Energy, Utilities and Climate, will share near real-time aggregated energy consumption data for all Danish municipalities, as well as data on CO2 emissions, energy production and the electricity market.

    Developers, entrepreneurs and companies will be able to access and use the data to create apps and other smart data services that empower consumers to use energy more efficiently and flexibly, saving them money and cutting their carbon footprint.

    Viderum is the technology partner behind the Energy Data Service. It developed the portal using CKAN, the leading data management platform for open data, originally developed by non-profit organisation Open Knowledge International.

    Sebastian Moleski, CEO of Viderum said: “Viderum is excited to be working with Energinet at the forefront of the open data revolution to make Denmark’s energy data available to everyone via the Energy Data Service portal. The portal makes a huge amount of complex data easily accessible, and we look forward to developing its capabilities further in the future, eventually providing real-time energy and CO2 emissions data.”

    Energinet hopes that the Energy Data Service will be a catalyst for the digitalisation of the energy sector and for green innovation and economic growth, both in Denmark and beyond.

    “As we transition to a low carbon future, we need to empower consumers to be smarter with how they use energy. The Energy Data Service will enable the development of innovative data based solutions to make this possible. For example, an electric car that knows when there is spare capacity on the electricity grid, making it a good time to charge itself. Or an app that helps local authorities understand energy consumption patterns in social housing, so they can make improvements that will save money and cut carbon”, said Peder Ø. Andreasen, CEO of Energinet.

    The current version of the Energy Data Service includes the following features:

    • API (Application Programming Interface) access to all raw data, which makes it easy to use in data applications and services
    • Downloadable data sets in regular formats (CSV and Excel)
    • Helpful user guides
    • Contextual information and descriptions of data sets
    • Online discussion forum for questions and knowledge sharing
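
    Since the portal is built on CKAN, API access of this kind might look like the following Python sketch. It assumes the portal exposes CKAN's standard DataStore action `datastore_search`; the portal address `https://portal.example` and the resource identifier `example-resource` are hypothetical placeholders, not values from the press release.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def build_datastore_search_url(portal_url, resource_id, limit=5):
    """Build a CKAN DataStore search URL for one resource (data set table)."""
    query = urlencode({"resource_id": resource_id, "limit": limit})
    return urljoin(portal_url, "/api/3/action/datastore_search") + "?" + query


def fetch_records(url):
    """Fetch the URL and return the list of records from CKAN's JSON reply."""
    with urlopen(url) as response:
        payload = json.load(response)
    # CKAN wraps results as {"success": ..., "result": {"records": [...]}}
    return payload["result"]["records"]


if __name__ == "__main__":
    # Hypothetical portal and resource id; replace with real values to run a query.
    print(build_datastore_search_url("https://portal.example", "example-resource"))
```

    Calling `fetch_records` on the resulting URL would return the first few rows of the data set as Python dictionaries, ready for use in an app or analysis script.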

    What makes an anti-librarian? / Galen Charlton

    Assuming the order gets made and shipped in time (update 2017-06-22: it did), I’ll be arriving in Chicago for ALA Annual carrying a few tens of badge ribbons like this one:

    Am I hoping that the librarians made of anti-matter will wear these ribbons to identify themselves, thereby avoiding unpleasant explosions and gamma ray bursts? Not really. Besides, there’s an obvious problem with this strategy, were anti-matter librarians a real constituency at conferences.

    No, in a roundabout way, I’m mocking this behavior by Jeffrey Beall: “This is fake news from an anti-librarian. Budget cuts affect library journal licensing much more than price hikes. #OA #FakeNews”

    Seriously, dude?

    I suggest reading Rachel Walden’s tweets for more background, but suffice it to say that even if you were to discount Walden’s experience as a medical library director (which I do not), Beall’s response to her is extreme. (And for even more background, John Dupuis has an excellent compilation of links on recent discussions about Open Access and “predatory” journals.)

    But I’d like to unpack Beall’s choice of the expression “anti-librarian”. What exactly makes for an anti-librarian?

    We already have plenty of names for folks who oppose libraries and librarians. Book-burners. Censors. Austeritarians. The closed-minded. The tax-cutters-above-all-else. The drowners of governments in bathtubs. The fearful. We could have a whole taxonomy, in fact, were the catalogers to find a few spare moments.

    “Anti-librarian” as an epithet doesn’t fit most of these folks. Instead, as applied to a librarian, it has some nasty connotations: a traitor. Somebody who wears the mantle of the profession but opposes its very existence. Alternatively: a faker. A purveyor of fake news. One who is unfit to participate in the professional discourse.

    There may be some librarians who deserve to have that title, but it would take a lot more than being mistaken, or even woefully misguided, to earn it.

    So let me also protest Beall’s response to Walden explicitly:

    It is not OK.

    It is not cool.

    It is not acceptable.

    Evergreen 2.11.6 and 2.12.3 released / Evergreen ILS

    The Evergreen community is pleased to announce two maintenance releases of Evergreen: 2.11.6 and 2.12.3.

    Evergreen 2.12.3 includes the following bugfixes:

    • Web staff client fixes
      • The receipt on payment checkbox now prints a receipt at time of payment.
      • The Items Out count in the patron screen now includes long overdue items.
      • A fix was added to prevent values from a previously-edited patron from appearing in the edit form of a subsequent patron.
      • User notification preferences now save correctly in the patron registration and edit forms.
      • The UPDATE_MARC permission is no longer requested when performing a search from the staff catalog.
      • Non-cataloged circulations now display in the Items Out screen without requiring a refresh.
      • Required statistical categories are now required to be entered in the copy editor. (A similar bug for the patron editor was fixed in the 2.12.1 release).
      • Voiding bills now requires confirmation.
      • Staff can no longer use the copy editor to put items into or out of the following statuses: checked out, lost, in transit, on holds shelf, long overdue, and canceled transit.
      • The contrast is improved for alert text showing the amount a patron owes in bills.
      • Circ modifiers now sort alphabetically in the copy editor.
    • Other bugfixes
      • Code to prevent a hold already on the Holds Shelf from being transferred to another title.
      • A fix to a bug that prevented users from scheduling reports with a relative month if the report template used a date that applied the Year + Month transform with the On or After (>=) operator.
      • A fix to a bug where the max fines threshold was reached prematurely due to the presence of account adjustments.
      • A check that prevents an SMS message from attempting to send when the SMS carrier is null.
      • For systems that provide search format as a filter on the advanced search page, a fix so that the format selected in the search bar when launching a new search from the results page overrides any previously-set formats.
      • The addition of an optional new Apache/mod_perl configuration variable for defining the port Apache listens on for HTTP traffic. This resolves an issue where added content lookups attempting HTTP requests on the local Apache instance on port 80 failed because Apache was using non-standard ports.
      • A fix to the public catalog’s My List page responsive design so that it now displays properly on mobile devices and allows users to place holds from My List.
      • A fix to a bug where the second (and subsequent) pages of search results in the public catalog (when group formats and editions is in effect) did not correctly generate links to hits that are not part of a multi-bib metarecord.

    Evergreen 2.11.6 includes the following fixes:

    • Code to prevent a hold already on the Holds Shelf from being transferred to another title.
    • A fix to a bug that prevented users from scheduling reports with a relative month if the report template used a date that applied the Year + Month transform with the On or After (>=) operator.
    • A fix to a bug where the max fines threshold was reached prematurely due to the presence of account adjustments.
    • A check that prevents an SMS message from sending if the SMS carrier is null.

    Please visit the downloads page to view the release notes and retrieve the server software and staff clients.

    THE Research Networking Event–Register for the 2017 VIVO Conference by June 30 and SAVE $100 / DuraSpace News

    From the organizers of the 2017 VIVO Conference

    The 2017 VIVO Conference is all about research networking! If this topic and creating an integrated record of the scholarly work of your organization are of interest, then the 2017 VIVO Conference is the place to be Aug 2-4 in New York City. Institutions with production VIVOs as well as those who are considering implementing VIVO will be in attendance, presenting their work and/or offering workshops.

    DuraSpace Launches New Web Site / DuraSpace News

    DuraSpace has a lot to celebrate in 2017. Our community-supported open source technologies continue to contribute to advancing the access and preservation goals of our member organizations and beyond. The DuraSpace hosted services team is onboarding new customers, while at the same time contributing to efforts to offer new technologies that provide full, hosted access to, control of, and protection for your content.

    House expected to approve CTE reauthorization / District Dispatch

    Perkins CTE Program helps library patrons thrive in 21st Century Economy

    Libraries play numerous roles in communities across the country, working generally to meet the needs of their patrons at every life stage. Whether providing high-speed broadband access to rural and urban communities alike, running youth reading sessions and book clubs, teaching computer literacy to patrons seeking to learn new skills or aiding small businesses, libraries serve as learning centers helping patrons along their career paths.

    Libraries also play a valuable and specific role in supporting and working to improve secondary and postsecondary Career and Technical Education (CTE) programs funded by the Carl D. Perkins Career and Technical Education Act (“Perkins Act”), the federal bill which governs the more than $1 billion in federal funding for career and technical education activities across the country. Such programs help equip youth and adults with the academic, technical and employability skills and knowledge needed to secure employment in today’s high-growth industries. In so doing, libraries help close the “skills gap” and expand economic opportunity to more communities across the nation. Some libraries work directly with their state labor and employment offices to implement CTE programs which receive Federal funding.

    Libraries and certified librarians also provide valuable CTE resources, equipment, technology, instructional aids, and publications designed to strengthen and foster academic and technical skills achievement. In many communities, libraries play a significant role in career and technical development. Often the library is the only place where patrons can access the high-speed broadband vital to those working to apply for jobs, research careers, and pursue enhanced certification and training.

    As early as this week, the House of Representatives is expected to pass legislation reauthorizing the Perkins Act, which was originally adopted in 1984. ALA recently submitted a letter to the House Committee on Education and Workforce supporting this bi-partisan legislation: the Career and Technical Education for the 21st Century Act (H.R. 2353), which was approved by the Committee on June 6.

    The House timed the vote on the reauthorization to occur during the National Week of Making spearheaded by the Congressional Maker Caucus. The week highlights the growing maker movement across the country.

    We’ve been here before, however, as the House passed similar legislation in 2016 only to see reauthorization of the Perkins program stall in the Senate, where a companion bill has yet to be introduced. Unfortunately, the President’s budget seeks to cut $168.1 million from the Perkins CTE State Grant program, which had previously received $1.118 billion in funding for FY15, FY16 and FY17. ALA will continue work to support robust funding for CTE programs and, if the House acts favorably, to urge the Senate to follow its lead and promptly reauthorize the Perkins Act.

    Jobs in Information Technology: June 21, 2017 / LITA

    New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

    New This Week

    Purchase College – State University of New York, Client Systems Administrator, Purchase, NY

    EBSCO Information Services, Product Owner, FOLIO, Ipswich, MA

    Pacific States University, University Librarian, Los Angeles, CA

    Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

    Mashcat at ALA Annual 2017 + shared notes / Galen Charlton

    I’m leaving for Chicago tomorrow to attend ALA Annual 2017 (and to eat some real pizza), and while going over the schedule I found some programs that may be of interest to Mashcat folk:

    As a little experiment, I’ve started a Google Doc for shared notes about events and other goings-on at the conference. There will of course be a lot of coverage on social media about the conference, but the shared notes doc might be a way for Mashcatters to identify common themes.

    Ellyssa Kroski Receives 2017 LITA/Library Hi Tech Award / LITA

    Ellyssa Kroski has been named the winner of the 2017 LITA/Library Hi Tech Award for Outstanding Communication in Library and Information Technology.

    The Library and Information Technology Association (LITA) and Emerald Group Publishing sponsor the Award, which recognizes outstanding individuals or institutions for their long-term contributions in the area of Library and Information Science technology and its application. The winner receives a citation and a $1,000 stipend.

    The Award Committee selected Kroski because of her passion for teaching and technology, as well as her impact in public libraries, academic libraries, and with library school students. As a technology expert, an editor, contributor, and compiler of books, she has helped make technology and makerspaces accessible for many different types of institutions.

    Kroski is the Director of Information Technology at the New York Law Institute as well as an award-winning editor and author of 36 books including Law Librarianship in the Digital Age for which she won the AALL’s 2014 Joseph L. Andrews Legal Literature Award. Her ten-book technology series, The Tech Set, won the ALA’s Best Book in Library Literature Award in 2011. She is a librarian, an adjunct faculty member at Drexel and San Jose State University, and an international conference speaker.

    When notified she had won the Award, Kroski said, “I am incredibly honored and very pleased to receive this award from LITA. I have been fortunate enough to collaborate with and be inspired by many amazingly talented librarians throughout my professional development activities and to all of those colleagues I extend my thanks and share this honor as well.”

    Members of the 2017 LITA/Library Hi-Tech Award Committee are: Vanessa L. Ames (Chair) and Robert Wilson.

    Thank you to Emerald Publishing and Library Hi Tech for sponsoring this award.

    ALA celebrates World Wi-Fi Day / District Dispatch

    Among all their other functions in our communities, libraries are critical spaces for people to access the internet, and they are increasingly doing so wirelessly via Wi-Fi.

    Virtually all public libraries in the U.S. provide Wi-Fi to patrons. By doing so, libraries serve as community technology hubs that enable digital opportunity and full participation in the nation’s economy. Wi-Fi is a critical part of how libraries are transforming our programs and services in the digital age.

    June 20th was World Wi-Fi Day, a global initiative helping to bridge the digital divide as well as recognizing and celebrating the role of Wi-Fi in cities and communities around the world. In Washington, D.C., the WifiForward Coalition—of which ALA is a founding member—held a kick off celebration at the Consumer Technology Association’s Innovation House off of Capitol Hill. Congressman Darrell Issa (R-CA) and Federal Communications Commissioner Michael O’Rielly were on hand to expound on the wonders of Wi-Fi and to voice their support for policies that would help its growth and success.

    ALA added the following statement to materials for World Wi-Fi Day:

    “With Wi-Fi, our nation’s 120,000 libraries are able to dramatically increase our capacity to connect people of all incomes and backgrounds to the Internet beyond our public desktop computers. Wi-Fi allows us to serve more people anywhere in the library, as well as enabling mobile technology training labs, roving reference, access to diverse digital collections and pop-up library programs and services. Library wi-fi is essential to support The E’s of Libraries®—Education, Employment, Entrepreneurship, Empowerment and Engagement—on campuses and in communities nationwide. The American Library Association is proud to be a supporter of World Wi-Fi Day.”

    Opening Keynote – Dr. Kimberly Christen / Access Conference

    We are excited to announce that Dr. Kimberly Christen will be the opening keynote speaker for Access 2017. Join us for her aptly titled talk “The Trouble with Access.”

    Dr. Kimberly Christen is the Director of the Digital Technology and Culture Program and the co-Director of the Center for Digital Scholarship and Curation at Washington State University.

    She is the founder of Mukurtu CMS, an open source community archive platform designed to meet the needs of Indigenous communities; the co-Director of the Sustainable Heritage Network, a global community providing educational resources for stewarding digital heritage; and co-Director of the Local Contexts initiative, an educational platform to support the management of intellectual property, specifically using Traditional Knowledge Labels.

    More of her work can be found at her website, and you can follow her on Twitter @mukurtu.

    Always Already Computational Reflections / Open Knowledge Foundation

    Always Already Computational is a project bringing together a variety of different perspectives to develop “a strategic approach to developing, describing, providing access to, and encouraging reuse of collections that support computationally-driven research and teaching” in subject areas relating to library and museum collections.  This post is adapted from my Position Statement for the initial workshop.  You can find out more on the project website.

    Earlier this year, I spent two and a half days at the beautiful University of California, Santa Barbara, at a workshop speaking with librarians, developers, and museum and library collection managers about data.  Attendees at this workshop represented a variety of respected cultural institutions including the New York Public Library, the British Library, the Internet Archive, and others.

    Our task was to build a collective sense of what it means to treat library and museum “collections”, meaning the (increasingly) digitized catalogs of their holdings, as data for analysis, art, research, and other forms of re-use.  We gathered use cases and user stories in order to start the conversation on how to best publish collections for these purposes.  Look for further outputs on the project website.  For the moment, here are my thoughts on the experience and how it relates to work at Open Knowledge International, specifically, Frictionless Data.

    Always Already Computational

    Open Access to (meta)Data

    The event organizers—Thomas Padilla (University of California Santa Barbara), Laurie Allen (University of Pennsylvania), Stewart Varner (University of Pennsylvania), Sarah Potvin (Texas A&M University), Elizabeth Russey Roke (Emory University), Hannah Frost (Stanford University)—took an expansive view of who should attend.  I was honored and excited to join, but decidedly new to Digital Humanities (DH) and related fields.  The event served as an excellent introduction, and I now understand DH to be a set of approaches toward interrogating recorded history and culture with the power of our current tools for data analysis, visualization, and machine learning.  As part of the Frictionless Data project at Open Knowledge International, we are building apps, libraries, and specifications that support the basic transport and description of datasets to aid in this kind of data-driven discovery.  We are trialling this approach across a variety of fields, and are interested to determine the extent to which it can improve research using library and museum collection data.

    What is library and museum collection data?  Libraries and museums hold physical objects which are often (although not always) shared for public view on the stacks or during exhibits.  Access to information (metadata) about these objects—and the sort of cultural and historical research dependent on such access—has naturally been somewhat technologically, geographically, and temporally restricted.  Digitizing the detailed catalogues of the objects libraries and museums hold surely lowered the overhead of day-to-day administration of these objects, but also provided a secondary public benefit: sharing this same metadata on the web with a permissive license allows a greater variety of users in the public—researchers, students of history, and others—to freely interrogate our cultural heritage in a manner they choose.  

    There are many different ways to share data on the web, of course, but they are not all equal.  A low impact, open, standards-based set of approaches to sharing collections data that incorporates a diversity of potential use cases is necessary.  To answer this need, many museums are currently publishing their collection data online, with permissive licensing, through GitHub: The Tate Galleries in the UK, Cooper Hewitt, Smithsonian Design Museum and The Metropolitan Museum of Art in New York have all released their collection data in CSV (and JSON) format on this popular platform normally used for sharing code.  See A Nerd’s Guide To The 2,229 Paintings At MoMA and An Excavation Of One Of The World’s Greatest Art Collections both published by FiveThirtyEight for examples of the kind of exploratory research enabled by sharing museum collection data in bulk, in a straightforward, user-friendly way.  What exactly did they do, and what else may be needed?

    Packages of Museum Data

    Our current funding from the Sloan Foundation enables us to focus on this researcher use case for consuming data.  Across fields, the research process is often messy, and researchers, even if they are asking the right questions, possess a varying level of skill in working with datasets to answer them.  As I wrote in my position statement:

    Such data, released on the Internet under open licenses, can provide an opportunity for researchers to create a new lens onto our cultural and artistic history by sparking imaginative re-use and analysis.  For organizations like museums and libraries that serve the public interest, it is important that data are provided in ways that enable the maximum number of users to easily process it.  Unfortunately, there are not always clear standards for publishing such data, and the diversity of publishing options can cause unnecessary overhead when researchers are not trained in data access/cleaning techniques.

    My experience at this event, and some research beforehand, suggested that there is a spectrum of data release approaches ranging from a basic data “dump” as conducted by the museums referenced above to more advanced, though higher investment, approaches such as publishing data as an online service with a public “API” (Application Programming Interface).  A public API can provide a consistent interface to collection metadata, as well as an ability to request only the needed records, but comes at the cost of having the nature of the analysis somewhat preordained by its design.  In contrast, in the data dump approach, an entire dataset, or a coherent chunk of it, can be easier for some users to access and load directly into a tool like R without needing advanced programming (see this UK Government Digital Service post on the challenges of each approach).  As a format for this bulk download, CSV is the best choice, as MoMA reflected when releasing their collection data online:

    CSV is not just the easiest way to start but probably the most accessible format for a broad audience of researchers, artists, and designers.  

    This, of course, comes at the cost of a less consistent interface for the data, especially in the case of the notoriously underspecified CSV format.  The README file will typically go into some narrative detail about how to best use the dataset and some expected “gotchas” (e.g. “this UTF-8 file may not work well with Excel on a Mac”).  It might also list the columns in a tabular data file stored in the dataset, along with expected types and formats for values in each column (e.g. the date_acquired column should, hopefully, contain dates in one or another international format).  This information is critical for actually using the data, and the automated export process that generates the public collection dataset from the museum’s internal database may try to ensure that the data matches expectations, but bugs exist, and small errors may go unnoticed in the process.

    The Data Package descriptor (described in detail on our specifications site), used in conjunction with Data Package-aware tooling, is meant to somewhat restore the consistent interface provided by an API by embedding this “schema” information with the data.  This allows the user or the publisher to check that the data conforms to expectations without requiring modification of the data itself: a “packaged” CSV can still be loaded into Excel as-is (though without the benefit of type checking enabled by the Data Package descriptor).  The Carnegie Museum of Art, in its release of its collection data, follows the examples set by the Tate, the Met, MoMA, and Cooper Hewitt as described above, but opted to also include a Data Package descriptor file to help facilitate online validation of the dataset through tools such as Good Tables.  As tools come online for editing, validating, and transforming Data Packages, users of this dataset should be able to benefit from those, too.
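
    The idea of shipping “schema” information alongside a CSV can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the Data Package tooling itself: the descriptor below is a hypothetical, hand-written example loosely shaped like a Data Package descriptor, and the names check_rows, CASTERS, and SAMPLE_CSV are invented for this sketch.  The data stays untouched; only the check reads the schema.

```python
import csv
import io
from datetime import date

# Hypothetical, minimal descriptor: one CSV resource plus a schema
# listing each column and the type its values are expected to have.
DESCRIPTOR = {
    "name": "example-collection",
    "resources": [
        {
            "path": "artworks.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "title", "type": "string"},
                    {"name": "date_acquired", "type": "date"},
                ]
            },
        }
    ],
}

# Casting functions for the few types this sketch understands.
CASTERS = {
    "integer": int,
    "string": str,
    "date": date.fromisoformat,  # expects YYYY-MM-DD
}


def check_rows(csv_text, schema):
    """Return (row_number, column, value) for every cell that fails to cast."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for field in schema["fields"]:
            value = row.get(field["name"])
            try:
                CASTERS[field["type"]](value)
            except (TypeError, ValueError):
                problems.append((row_number, field["name"], value))
    return problems


# Made-up sample data: the second record uses a non-ISO date.
SAMPLE_CSV = "id,title,date_acquired\n1,Untitled,1998-04-12\n2,Study,12/04/1998\n"
schema = DESCRIPTOR["resources"][0]["schema"]
print(check_rows(SAMPLE_CSV, schema))
```

    A check like this reports the offending row and column rather than altering the file, which mirrors the point above: the CSV remains loadable as-is in any tool, and the descriptor adds an optional layer of verification.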

    We are a partner in the Always Already Computational: Collections as Data project, and as part of this work, we are working with Carnegie Museum of Art to provide a more detailed look at the process that went into the creation of the CMOA dataset, as well as sketching potential ways in which the Data Package might help enable re-use of this data.  In the meantime, check out our other case studies on the use of Data Package in fields as diverse as ecology, cell migration, and energy data.

    Also, pay your local museum or library a visit.

    Software contributions reduce our debt / Library Tech Talk (U of Michigan)

    Contributing to software projects can be harder and more time consuming than coding customized solutions. But over the long term, writing generalized solutions that can be used and contributed to by developers from around the world reduces our dependence on ourselves and our organizational resources, thus drastically reducing our technical debt.

    WATCH New Hyku, Hyrax, and the Hydra-in-a-Box Project Demo / DuraSpace News

    From Michael J. Giarlo, Technical Manager, Hydra-in-a-Box Project, Software Architect, on behalf of the Hyku tech team

    Stanford, CA  Here's the latest demo of advances made on Hyku, Hyrax, and the Hydra-in-a-Box project.

    Visit / DuraSpace News

    DuraSpace is pleased to announce that the new HykuDirect web site is up and running, and ready to field inquiries about the exciting new hosted service currently in development:

    • The site features Hyku background information, a complete list of key features, a timeline that lays out the steps towards availability of a full-production service, and a contact form.

    Timothy Cole Wins 2017 LITA/OCLC Kilgour Research Award / LITA

    Timothy Cole, Head of the Mathematics Library and Professor of Library and Information Science at the University of Illinois Urbana-Champaign, has been selected as the recipient of the 2017 Frederick G. Kilgour Award for Research in Library and Information Technology, sponsored by OCLC and the Library and Information Technology Association (LITA). Professor Cole also holds appointments in the Center for Informatics Research in Science and Scholarship (CIRSS) and the University Library.

    The Kilgour Award is given for research relevant to the development of information technologies, especially work which shows promise of having a positive and substantive impact on any aspect(s) of the publication, storage, retrieval and dissemination of information, or the processes by which information and data is manipulated and managed. The winner receives $2,000, a citation, and travel expenses to attend the LITA Awards Ceremony & President’s Program at the 2017 ALA Annual Conference in Chicago (IL).

    Over the past 20 years, Professor Cole’s research in digital libraries, metadata design and sharing, and interoperable linked data frameworks has significantly enhanced discovery of and access to scholarly content, which embodies the spirit of this prestigious Award. His extensive publication record includes research papers, books, and conference publications, and he has earned more than $11 million in research grants during his career.

    The Award Committee also noted Professor Cole’s significant contributions to major professional organizations including the World Wide Web Consortium (W3C), Digital Library Federation, and Open Archives Initiative, all of which help set the standards in metadata and linked data practices that influence everyday processes in libraries. We believe his continuing work on Linked Open Data will further improve how information is discovered and accessed. With all of Professor Cole’s research and service contributions, the Committee unanimously found him to be the ideal candidate to receive the 2017 Frederick G. Kilgour Award.

    When notified he had been selected, Professor Cole said, “I am honored and very pleased to accept this Award. Fred Kilgour’s recognition more than 50 years ago of the ways that computers and computer networks could improve both library services and workflow efficiencies was remarkably prescient, and his longevity and consistent success in this dynamic field was truly amazing. Many talented librarians have built on his legacy, and over the course of my career, I have found the opportunity to meet, learn from, and work with many of these individuals, including several prior Kilgour awardees, truly rewarding. I have been especially fortunate in my opportunities and colleagues at Illinois — notably (to name but three) Bill Mischo, Myung-Ja Han, and Muriel Foulonneau — as well as in my collaborations with other colleagues across the globe. It is these collaborations that account in large measure for the modest successes I have enjoyed. I am humbled by and most appreciative of the Award Committee for giving me this opportunity to join the ranks of Kilgour awardees.”

    Members of the 2017 Kilgour Award Committee are: Tabatha Farney (Chair), Ellen Bahr, Matthew Carruthers, Zebulin Evelhoch, Bohyun Kim, Colby Riggs, and Roy Tennant (OCLC Liaison).

    Thank you to OCLC for sponsoring this award.

    Hack-to-Learn at the Library of Congress / Library of Congress: The Signal

    When hosting workshops, such as Software Carpentry, or events, such as Collections As Data, our National Digital Initiatives team made a discovery: there is an appetite among librarians for hands-on computational experience. That’s why we created an inclusive hackathon, or a “hack-to-learn,” taking advantage of the skills librarians already have and pairing them with programmers to mine digital collections.

    Hack-to-Learn took place on May 16-17 in partnership with George Mason and George Washington University Libraries. Over the two days, 61 attendees used low or no-cost computational tools to explore four library collections as data sets. You can see the full schedule here.

    Day two of the workshop took place at George Washington University Libraries. Here, George Oberle III, History Librarian at George Mason University, gives a Carto tutorial. Photo by Justin Littman, event organizer.

    The Data Sets

    The meat of this event was our ability to provide library collections as data to explore, and with concerted effort we were able to make a diverse set available and accessible.

    In the spring, the Library of Congress released 25 million of its MARC records for free bulk download. Some have already been working with the data – Ben Schmidt was able to join us on day one to present his visual hacking history of MARC cataloging and Matt Miller made a list of 9 million unique titles. We thought these cataloging records would also be a great collection for hack-to-learn attendees because the format is well-structured and familiar for librarians.

    The Eleanor Roosevelt Papers Project at George Washington University shared its “My Day” collection – Roosevelt’s daily syndicated newspaper column and the closest thing we have to her diary. George Washington University Libraries contributed their Tumblr End of Term Archive: text and metadata from 72 federal Tumblr blogs harvested as part of the End of Term Archive project.

    Topic modelling in MALLET with the Eleanor Roosevelt “My Day” collection. MALLET generates a list of topics from a corpus and keywords composing those topics. An attendee suggested it would be a useful method for generating research topics for students (and we agree!).

    As excitement for hack-to-learn grew, the Smithsonian joined the fun by providing their Phyllis Diller Gag file. Donated to the Smithsonian American History Museum, the gag file is a physical card catalog containing 52,000 typewritten joke cards the comedian organized by subject. The Smithsonian Transcription Center put these joke cards online, and they were transcribed by the public in just a few weeks. Our event was the first time these transcriptions were used.

    Gephi network analysis visualization of the Phyllis Diller Gag file. The circles (or nodes) represent joke authors and their relationship to each other based on their joke subjects.

    To encourage immediate access to the data and tools, we spent a significant amount of time readying these four data sets so ready-to-load versions were available. For the MARC records to be amenable for the mapping tool Carto, for example, Wendy Mann, Head of George Mason University Data Services, had to reduce the size of the set, then convert the 1,000 row files to CSV using MarcEdit, map the MARC fields as column headings, create load files for MARC fields in each file, and then mass edit column names in OpenRefine so that each field name began with a letter as opposed to a number (a Carto requirement).
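
    The last step of that workflow, renaming columns so none starts with a digit, was done in OpenRefine at the event. For readers who prefer a script, the same idea can be sketched in Python. This is a hypothetical alternative, not the workflow that was actually used: the function names, the "f" prefix, and the sample column names are all assumptions made for illustration.

```python
import csv
import io


def letter_first_headers(headers, prefix="f"):
    """Prefix any header that starts with a digit so it begins with a letter."""
    return [prefix + h if h[:1].isdigit() else h for h in headers]


def rewrite_csv(csv_text, prefix="f"):
    """Return the CSV text with digit-leading column names renamed."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = letter_first_headers(rows[0], prefix)  # only the header row changes
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up sample: MARC tags used directly as column names.
SAMPLE = "245a,260c,title_note\nMoby Dick,1851,first edition\n"
print(rewrite_csv(SAMPLE))
```

    Only the header row is rewritten; the data rows pass through unchanged, so the renamed file can be compared against the original to confirm nothing else moved.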

    We also wanted to be transparent about this work so attendees could re-create these workflows after hack-to-learn. We bundled the data sets in their multiple versions of readiness, README files, a list of resources, a list of brainstorming ideas of what possible questions to ask of the data, and install directions for the different tools all in a folder that was available for attendees a week before the event. We invited attendees to join a Slack channel to ask questions or report errors before and during the event, and opened day one with a series of lightning talks about the data sets from content and technical experts.

    What Was Learned

    Participants were largely librarians, faculty or students from our three partner organizations. 12 seats were opened to the public and quickly filled by librarians, faculty or students from universities or cultural heritage institutions. Based on our registration survey, the majority of participants trended towards little or no experience. Almost half reported experience with OpenRefine, while 44.8% reported having never used any of the tools before. 49.3% wanted to learn about “all” methodologies (data cleaning, text mining, network analysis, etc.), and 46.3% reported interest in specifically text mining.

    31.3% of hack-to-learn registrants were curious about computational research and wanted an introduction, and 28.4% were familiar with some tools but not all. 14.9% thought it sounded fun!

    Twenty-one attendees responded to our post-event survey. Participants confirmed that collections as data work felt less “intimidating” and the tools more “approachable.” Respondents reported a recognition of untapped potential in their data sets and requested more events of this kind.

    “I was able to get results using all the tools, so in a sense everything worked well. Pretty sure my ‘success’ was related to the scale of task I set for myself; I viewed the work time as time for exploring the tools, rather than finishing something.”

    Many appreciated the event’s diversity: the diversity of data sets and tools, the mixture of subject matter and technical experts, and the mix between instructional and problem-solving time.

    “The tools and datasets were all well-selected and gave a good overview of how they can be used. It was the right mix of easy to difficult. Easy enough to give us confidence and challenging enough to push our skills.”

    The Phyllis Diller team works with OpenRefine at Hack-to-Learn, May 17, 2017. Photo by Shawn Miller.

    When asked what could be improved, many felt that identifying what task to do or question to ask of the data set was difficult, and attendees often underestimated the data preparation step. We received suggestions such as adding guided exercises with the tools before independent work and more time for digging deeper into a particular methodology or research question.

    “It was at first overwhelming but ultimately hugely beneficial to have multiple tools and multiple data sets to choose from. All this complexity allowed me to think more broadly about how I might use the tools, and having data sets with different characteristics allowed for more experimentation.”

    Most importantly, attendees identified what still needed to be learned. Insights from the event related to the limitations of the tools. For example, attendees recognized GUI interfaces were accessible and useful for surface-level investigation of a data set, but command-line knowledge was needed for deeper investigation or in some cases, working with a larger data set. Several participants in the post-event survey showed interest in learning Python as a result.

    Recognizing what they didn’t know was not discouraging. In fact, one point we heard from multiple attendees was the desire for more hack-to-learn events.

    “If someone were to host occasional half-day or drop-in hack-a-thons with these or other data sets, I would like to try again. I especially appreciate that you were welcoming of people like me without a lot of programming experience … Your explicit invitation to people with *all* levels of experience was the difference between me actually doing this and not doing it.”

    We’d like to send a big thank you again to our partners at George Washington and George Mason University Libraries, and to the Smithsonian American History Museum and Smithsonian Transcription Center for your time and resources to make Hack-to-Learn a success! We encourage anyone reading this to consider doing one at your library, and if you do, let us know so we can share it on The Signal!



    Learn about Contextual Inquiry, after ALA Annual / LITA

    Sign up today for 

    Contextual Inquiry: Using Ethnographic Research to Impact your Library UX

    This new LITA web course begins July 6, 2017, shortly after ALA Annual. Use the excitement generated by the conference to further explore new avenues to increase your user engagement. The contextual inquiry research methodology helps to better understand the intents and motivations behind user behavior. The approach involves in-depth, participant-led sessions where users take on the role of educator, teaching the researcher by walking them through tasks in the physical environment in which they typically perform them.

    Instructors: Rachel Vacek, Head of Design & Discovery, University of Michigan Library; and Deirdre Costello, Director, UX Research, EBSCO Information Services
    July 6 – August 10, 2017
    Register here; courses are listed by date, and you need to log in.

    In this session, learn what’s needed to conduct a Contextual Inquiry and how to analyze the ethnographic data once collected. We’ll talk about getting stakeholders on board, the Institutional Review Board (IRB) process, and scalability for different-sized library teams. We’ll cover how to synthesize and visualize your findings as sequence models and affinity diagrams that directly inform the development of personas and common task flows. Finally, learn how this process can help guide your design and content strategy efforts while constructing a rich picture of the user experience.

    View details and Register here.

    This is a blended format web course

    The course will be delivered as separate live webinar lectures, one per week. You do not have to attend the live lectures in order to participate. The webinars will be recorded for later viewing.

    Check the LITA Online Learning web page for additional upcoming LITA continuing education offerings.

    Questions or Comments?

    For all other questions or comments related to the course, contact LITA at (312) 280-4268 or Mark Beatty,

    Pray for Peace / Karen Coyle

    This is a piece I wrote on March 22, 2003, two days after the beginning of the second Gulf war. I just found it in an old folder, and sadly have to say that things have gotten worse than I feared. I also note an unfortunate use of terms like "peasant" and "primitive" but I leave those as a recognition of my state of mind/information. Pray for peace.

    Saturday, March 22, 2003

    Gulf War II

    The propaganda machine is in high gear, at war against the truth. The bombardments are constant and calculated. This has been planned carefully over time.

    The propaganda box sits in every home showing footage that it claims is of a distant war. We citizens, of course, have no way to independently verify that, but then most citizens are quite happy to accept it at face value.

    We see peaceful streets by day in a lovely, prosperous and modern city. The night shots show explosions happening at a safe distance. What is the magical spot from which all of this is being observed?

    Later we see pictures of damaged buildings, but they are all empty, as are the streets. There are no people involved, and no blood. It is the USA vs. architecture, as if the city of Baghdad itself is our enemy.

    The numbers of casualties, all of them ours, all of them military, are so small that each one has an individual name. We see photos of them in dress uniform. The families state that they are proud. For each one of these there is the story from home: the heavily made-up wife who just gave birth to twins and is trying to smile for the camera, the child who has graduated from school, the community that has rallied to help re-paint a home or repair a fence.

    More people are dying on the highways across the USA each day than in this war, according to our news. Of course, even more are dying around the world of AIDS or lung cancer, and we aren't seeing their pictures or helping their families. At least not according to the television news.

    The programming is designed like a curriculum with problems and solutions. As we begin bombing, the networks show a segment in which experts explain the difference between the previous Gulf War's bombs and those used today. Although we were assured during the previous war that our bombs were all accurately hitting their targets, word got out afterward that in fact the accuracy had been dismally low. Today's experts explain that the bombs being used today are far superior to those used previously, and that when we are told this time that they are hitting their targets it is true, because today's bombs really are accurate.

    As we enter and capture the first impoverished, primitive village, a famous reporter is shown interviewing Iraqi women living in the USA who enthusiastically assure us that the Iraqi people will welcome the American liberators with open arms. The newspapers report Iraqis running into the streets shouting "Peace to all." No one suggests that the phrase might be a plea for mercy by an unarmed peasant facing a soldier wearing enough weaponry to raze the entire village in an eye blink.

    Reporters riding with US troops are able to phone home over satellite connections and show us grainy pictures of heavily laden convoys in the Iraqi desert. Like the proverbial beasts of burden, the trucks are barely visible under their packages of goods, food and shelter. What they are bringing to the trade table is different from the silks and spices that once traveled these roads, but they are carrying luxury goods beyond the ken of many of Iraq's people: high tech sensor devices, protective clothing against all kinds of dangers, vital medical supplies and, perhaps even more important, enough food and water to feed an army. In a country that feeds itself only because of international aid -- aid that has been withdrawn as the US troops arrive -- the trucks are like self-contained units of American wealth motoring past.

    I feel sullied watching any of this, or reading newspapers. It's an insult to be treated like a mindless human unit being prepared for the post-war political fall-out. I can't even think about the fact that many people in this country are believing every word of it. I can't let myself think that the propaganda war machine will win.

    Pray for peace.

    Analysis of Sci-Hub Downloads / David Rosenthal

    Bastian Greshake has a post at the LSE's Impact of Social Sciences blog based on his F1000Research paper Looking into Pandora's Box. In them he reports on an analysis combining two datasets released by Alexandra Elbakyan:
    • A 2016 dataset of 28M downloads from Sci-Hub between September 2015 and February 2016.
    • A 2017 dataset of 62M DOIs to whose content Sci-Hub claims to be able to provide access.
    Below the fold, some extracts and commentary.

    Greshake's procedure was as follows:
    • Obtain the bibliographic metadata (publisher, journal, year) for each of the 62M Sci-Hub DOIs from CrossRef. About 76% of the 62M could be resolved.
    • Match the DOI of each of the 28M downloads with the corresponding metadata. Metadata for about 77% of the downloads could be obtained.
    • Count the number of downloads for each of the ~47M/62M DOIs that had metadata.
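
    The matching and counting steps above can be sketched in Python. This is a toy illustration, not Greshake's actual code: the DOIs, publishers, and function names below are made up, and the real analysis worked with tens of millions of records resolved through CrossRef.

```python
from collections import Counter

# Hypothetical stand-ins for the two datasets:
# resolved metadata keyed by DOI, and a raw download log of DOIs.
METADATA = {
    "10.1000/a1": {"publisher": "Publisher A"},
    "10.1000/a2": {"publisher": "Publisher A"},
    "10.2000/b1": {"publisher": "Publisher B"},
    "10.3000/c1": {"publisher": "Publisher C"},
}
DOWNLOADS = ["10.1000/a1", "10.1000/a1", "10.1000/a2", "10.2000/b1", "10.9999/unresolved"]


def downloads_per_doi(downloads, metadata):
    """Count downloads, keeping only DOIs for which metadata was resolved."""
    return Counter(doi for doi in downloads if doi in metadata)


def downloads_per_publisher(doi_counts, metadata):
    """Aggregate per-DOI counts up to the publisher level."""
    totals = Counter()
    for doi, n in doi_counts.items():
        totals[metadata[doi]["publisher"]] += n
    return totals


def top_share(totals, k):
    """Fraction of all counted downloads going to the k most-downloaded publishers."""
    top = sum(n for _, n in totals.most_common(k))
    return top / sum(totals.values())


doi_counts = downloads_per_doi(DOWNLOADS, METADATA)
publisher_totals = downloads_per_publisher(doi_counts, METADATA)
print(publisher_totals, top_share(publisher_totals, 1))
```

    Downloads whose DOI has no metadata are dropped before counting, which corresponds to the roughly 77% match rate described in the procedure; a share computed this way is the kind of figure behind the skew observations.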
    From the download counts and the metadata, Greshake was able to draw many interesting graphs.
    Greshake makes some interesting observations. The distributions are heavily skewed towards articles from major publishers:
    Looking at the data on a publisher level, there are ~1,700 different publishers, with ~1,000 having at least a single paper downloaded. Both corpus and downloaded publications are heavily skewed towards a set of few publishers, with the 9 most abundant publishers having published ~70% of the complete corpus and ~80% of all downloads respectively.
    And from a small number of major journals:
    The complete released database contains ~177,000 journals, with ~60% of these having at least a single paper downloaded. ... <10% of the journals being responsible for >50% of the total content in Sci-Hub. The skew for the downloaded content is even more extreme, with <1% of all journals getting over 50% of all downloads.
    The download skew towards the major publishers is not simply caused by their greater representation in the corpus:
    982 publishers differed significantly from the expected download numbers, with 201 publishers having more downloads than expected and 781 being underrepresented. Interestingly, while some big publishers like Elsevier and Springer Nature come in amongst the overly downloaded publishers, many of the large publishers, like Wiley-Blackwell and the Institute of Electrical and Electronics Engineers (IEEE) are being downloaded less than expected given their portfolio.
    There may be significant use from industry as opposed to academia:
    12 of the 20 most downloaded journals can broadly be classified as being within the subject area of chemistry. This is an effect that has also been seen in a prior study looking at the downloads done from Sci-Hub in the United States. In addition, publishers with a focus on chemistry and engineering are also amongst the most highly accessed and overrepresented. ... it's noteworthy that both disciplines have a traditionally high number of graduates who go into industry.
    But Greshake's implication seems to be contradicted by the data. Among the least downloaded publishers are ACM, SPIE, AMA, IOP, APS, BMJ, IEEE and Wiley-Blackwell, all of whom cover fields that are oriented more toward practitioners than academics. The disparity may be due more to the practitioner-oriented fields being less dominated by publishers hostile to open access. For example, computing (ACM, IEEE) and physics (IOP, APS) are heavy users of, eliminating the need for Sci-Hub. Whereas chemistry is dominated by ACS and RSC, both notably unenthusiastic about open access, and the third and fourth most downloaded publishers.

    The corpus and even more the downloads are heavily skewed towards recent articles:
    over 95% of the publications listed in Sci-Hub were published after 1950 ... Over 95% of all downloads fall into publications done after 1982, with ~35% of the downloaded publications being less than 2 years old at the time they are being accessed
    There is a:
    bleak picture when it came to the diversity of actors in the academic publishing space, with around 1/3 of all articles downloaded being published through Elsevier. The analysis presented here puts this into perspective with the whole space of academic publishing available through Sci-Hub, in which Elsevier is also the dominant force with ~24% of the whole corpus. The general picture of a few publishers dominating the market, with around 50% of all publications being published through only 3 companies, is even more pronounced at the usage level compared to the complete corpus, ... Only 11% of all publishers, amongst them already dominating companies, are downloaded more often than expected, while publications of 45% of all publishers are significantly less downloaded.
    Greshake concludes:
    the Sci-Hub data shows that the academic publishing field is even more of an oligopoly in terms of actual usage when compared to the amount of literature published.
    This is not unexpected; the major publishers' dominance is based on bundling large numbers of low-quality (and thus low-usage) journals with a small number of must-have (and thus high-usage) journals in "big deals".
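    As a rough illustration of what "downloaded more often than expected" means, the two Elsevier shares quoted above can be compared directly. This is only a simple ratio of the approximate figures in the quotation, not the statistical test used in Greshake's analysis, and the function name is made up for the example.

```python
# Approximate shares taken from the quotation above.
download_share = 1 / 3   # "around 1/3 of all articles downloaded"
corpus_share = 0.24      # "~24% of the whole corpus"

def overrepresentation(download_share, corpus_share):
    """Ratio > 1 means more downloads than corpus size alone would predict."""
    return download_share / corpus_share

ratio = overrepresentation(download_share, corpus_share)
print(f"Downloaded about {ratio:.2f}x as often as the corpus share suggests")
# prints: Downloaded about 1.39x as often as the corpus share suggests
```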

    Testing HTTP calls in Python / Brown University Library Digital Technologies Projects

    Many applications make calls to external services, or to other services that are part of the application. Testing those HTTP calls can be challenging, but Python offers several options.


    One option for testing your HTTP calls is to mock out your function that makes the HTTP call. This way, your function doesn’t make the HTTP call, since it’s replaced by a mock function that just returns whatever you want it to.

    Here’s an example of mocking out your HTTP call:

    import requests
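    class SomeClass:
      def __init__(self):
        # fetch once at construction time and keep the parsed JSON
        self.data = self._fetch_data()
      def _fetch_data(self):
        # the service URL is omitted here
        r = requests.get('')
        return r.json()
      def get_collection_ids(self):
        return [c['id'] for c in self.data['collections']]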
    from unittest.mock import patch
    MOCK_DATA = {'collections': [{'id': 1}, {'id': 2}]}
    with patch.object(SomeClass, '_fetch_data', return_value=MOCK_DATA) as mock_method:
      thing = SomeClass()
      assert thing.get_collection_ids() == [1, 2]

    Another mocking option is the responses package. Responses mocks out the requests library specifically, so if you’re using requests, you can tell the responses package what you want each requests call to return.

    Here’s an example using the responses package (SomeClass is defined the same way as in the first example):

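    import responses
    import json
    MOCK_JSON_DATA = json.dumps({'collections': [{'id': 1}, {'id': 2}]})
    # activate the mock for the duration of the test
    @responses.activate
    def test_some_class():
      # register the canned reply; the URL (omitted here) must match the one SomeClass requests
      responses.add(responses.GET, '',
                    body=MOCK_JSON_DATA,
                    status=200,
                    content_type='application/json')
      thing = SomeClass()
      assert thing.get_collection_ids() == [1, 2]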

    Record & Replay Data

    A different type of solution is to use a package to record the responses from your HTTP calls, and then replay those responses automatically for you.

    • VCR.py – a Python version of the Ruby VCR library that supports various HTTP clients, including requests.

    Here’s a VCR.py example, again using SomeClass from the first example:

    import vcr 
    IDS = [674, 278, 280, 282, 719, 300, 715, 659, 468, 720, 716, 687, 286, 288, 290, 296, 298, 671, 733, 672, 334, 328, 622, 318, 330, 332, 625, 740, 626, 336, 340, 338, 725, 724, 342, 549, 284, 457, 344, 346, 370, 350, 656, 352, 354, 356, 358, 406, 663, 710, 624, 362, 721, 700, 661, 364, 660, 718, 744, 702, 688, 366, 667]
    with vcr.use_cassette('vcr_cassettes/cassette.yaml'):
      thing = SomeClass()
      fetched_ids = thing.get_collection_ids()
      assert sorted(fetched_ids) == sorted(IDS)
    • betamax – From the documentation: “Betamax is a VCR imitation for requests.” Note that it is more limited than VCR.py, since it only works with the requests package.

    Here’s a betamax example (note: I modified the code in order to test it – maybe there’s a way to test the code with betamax without modifying it?):

    import requests
    class SomeClass:
        def __init__(self, session=None):
            self.data = self._fetch_data(session)
        def _fetch_data(self, session=None):
            # use the injected session (recorded by betamax) when one is given
            if session:
                r = session.get('')
            else:
                r = requests.get('')
            return r.json()
        def get_collection_ids(self):
            return [c['id'] for c in self.data['collections']]
    import betamax
    CASSETTE_LIBRARY_DIR = 'betamax_cassettes'
    IDS = [674, 278, 280, 282, 719, 300, 715, 659, 468, 720, 716, 687, 286, 288, 290, 296, 298, 671, 733, 672, 334, 328, 622, 318, 330, 332, 625, 740, 626, 336, 340, 338, 725, 724, 342, 549, 284, 457, 344, 346, 370, 350, 656, 352, 354, 356, 358, 406, 663, 710, 624, 362, 721, 700, 661, 364, 660, 718, 744, 702, 688, 366, 667]
    session = requests.Session()
    recorder = betamax.Betamax(
        session, cassette_library_dir=CASSETTE_LIBRARY_DIR
    )
    with recorder.use_cassette('our-first-recorded-session', record='none'):
        thing = SomeClass(session)
        fetched_ids = thing.get_collection_ids()
        assert sorted(fetched_ids) == sorted(IDS)
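
    To make the record-and-replay idea more concrete, here is a toy sketch using only the standard library. It is not how VCR.py or betamax are implemented; ToyCassette, fake_fetch and the example URL are invented for illustration. The first call goes to the "real" fetch function and is saved to a file, and later calls are answered from that file.

```python
import json
import os
import tempfile

class ToyCassette:
    """Toy record/replay store keyed by URL (illustration only)."""

    def __init__(self, path, fetch):
        self.path = path      # file where recorded responses are kept
        self.fetch = fetch    # callable that performs the real HTTP call

    def _load(self):
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        return {}

    def get(self, url):
        recorded = self._load()
        if url not in recorded:
            # first run: make the real call and record the result
            recorded[url] = self.fetch(url)
            with open(self.path, "w") as f:
                json.dump(recorded, f)
        # later runs: replay from the file without calling fetch again
        return recorded[url]

# Stand-in for a real HTTP call that remembers how often it was used.
real_calls = []
def fake_fetch(url):
    real_calls.append(url)
    return {"collections": [{"id": 1}, {"id": 2}]}

cassette_path = os.path.join(tempfile.mkdtemp(), "cassette.json")
cassette = ToyCassette(cassette_path, fake_fetch)
first = cassette.get("https://example.org/collections")   # recorded
second = cassette.get("https://example.org/collections")  # replayed
```

    The real packages do considerably more, such as matching on method, headers and body, but the trade-off is the same: tests run against realistic data, and the recording has to be refreshed when the service changes.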

    Integration Test

    Note that with all the solutions I listed above, it’s probably safest to cover the HTTP calls with an integration test that interacts with the real service, in addition to whatever you do in your unit tests.
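
    One way to keep such an integration test out of the normal unit test run is to skip it unless a service URL is supplied. The sketch below is only an assumption about how that might look: the COLLECTIONS_SERVICE_URL environment variable and the test class name are made up, and it uses urllib from the standard library rather than requests so that it has no extra dependencies.

```python
import json
import os
import unittest
from urllib.request import urlopen

# Hypothetical variable name: set it to the real service URL to enable the test.
SERVICE_URL = os.environ.get("COLLECTIONS_SERVICE_URL")

@unittest.skipUnless(SERVICE_URL, "set COLLECTIONS_SERVICE_URL to run integration tests")
class CollectionsIntegrationTest(unittest.TestCase):
    def test_real_service_returns_collections(self):
        # talk to the real service and check only the shape of the reply,
        # since the actual ids can change over time
        with urlopen(SERVICE_URL, timeout=10) as resp:
            data = json.load(resp)
        self.assertIn("collections", data)
        for collection in data["collections"]:
            self.assertIn("id", collection)
```

    Run with python -m unittest, this is reported as skipped on machines where the variable is not set, so the mocked unit tests stay fast and offline.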

    Another possible solution is to test as much as possible with unit tests without testing the HTTP call, and then just rely on the integration test(s) to test the HTTP call. If you’ve constructed your application so that the HTTP call is only a small, isolated part of the code, this may be a reasonable option.

    Here’s an example where the class fetches the data if needed, but the data can easily be put into the class for testing the rest of the functionality (without any mocking or external packages):

    import requests
    class SomeClass:
        def __init__(self):
            self._data = None
        @property
        def data(self):
            # fetch lazily, and only if no data has been supplied
            if not self._data:
                r = requests.get('')
                self._data = r.json()
            return self._data
        def get_collection_ids(self):
            return [c['id'] for c in self.data['collections']]
    import json
    MOCK_DATA = {'collections': [{'id': 1}, {'id': 2}]}
    def test_some_class():
        thing = SomeClass()
        thing._data = MOCK_DATA
        assert thing.get_collection_ids() == [1, 2]

    Islandora at Open Repositories 2017 / Islandora

    Next week the Open Repositories conference in Brisbane, Australia will begin. Islandora will be there, with an exhibit table where you can stop by to chat and pick up the latest and greatest in laptop stickers, an electronic poster in the Cube, and several sessions that may be of interest to our community:

    For the full schedule and more information about the conference, check out the Open Repositories 2017 website.

    Librarian speaks with Rep. Eshoo at net neutrality roundtable / District Dispatch

    “There’s nothing broken about existing net neutrality rules that needs to be fixed,” opined Congresswoman Anna Eshoo (D-CA-18) at a roundtable she convened in her district to discuss the impacts of the policy and the consequences of gutting it.

    Director of the Redwood City Public Library Derek Wolfgram joined Chris Riley, director of Public Policy at Mozilla; Gigi Sohn, former counselor to FCC Chairman Tom Wheeler; Andrew Scheuermann, CEO and co-founder of Arch Systems; Evan Engstrom, executive director of Engine; Vlad Pavlov, CEO and co-founder of rollApp; Nicola Boyd, co-founder of VersaMe; and Vishy Venugopalan, vice president of Citi Ventures in the discussion.

    Mozilla hosted Congresswoman Anna Eshoo for a discussion about net neutrality.

    On May 18, FCC Chairman Ajit Pai began the process of overturning critical net neutrality rules—which ensure internet service providers must treat all internet traffic the same—a move which the assembled panelists agreed would hurt businesses and consumers. The Congresswoman singled out anchor institutions—libraries and schools in particular—as important voices in the current discussion because libraries are “there for everyone.”

    “I was honored to have the opportunity to contribute a library perspective to Congresswoman Eshoo’s roundtable discussion on net neutrality,” said Wolfgram. “The Congresswoman clearly understands the value of libraries as ‘anchor institutions’ in this country’s educational infrastructure and recognizes the potential consequences of the erosion of equitable access to information if net neutrality were to be eliminated. Having seen the impacts of the digital divide in my own community, I felt that it was very important to highlight the value of net neutrality in breaking down barriers, rather than creating new ones, for families and small businesses to connect with educational resources, employment access and opportunities for innovation.”

    In his comments to the roundtable, Wolfgram identified two reasons strong, enforceable net neutrality rules are core to libraries’ public missions: preserving intellectual freedom and promoting equitable access to information. The Redwood City Library connects patrons to all manner of content served by the internet and many of these content providers, he fears, would not have the financial resources to compete against corporate content providers. Without net neutrality, high-quality educational resources could be relegated to second tier status.

    Like so many libraries across the country, the Redwood City Library provides low-cost access to the internet for members of the community who otherwise couldn’t connect. Even in the heart of Silicon Valley, students sit in cars outside the library and depend on its WiFi to get their work done, said Wolfgram. Redwood City Library has recently started loaning internet hot spots, focusing on school-age children and families in an effort to bridge this gap.

    “I would hate to see this big step forward, then the students get second-class access or don’t have a full connection to the resources they need,” said Wolfgram. “The internet should contribute to the empowerment of all.”

    Congresswoman Eshoo agreed, calling the current net neutrality rules, “a celebration of the First Amendment.”

    Former FCC official Sohn indicated the stakes are even higher. At issue, she said, is whether the FCC will have any role in overseeing the dominant communications network of our lifetimes. The FCC’s current proposal puts at risk subsidies for providing broadband to rural residents and people with low incomes through the Lifeline program. It is, as one panelist commented, like “replacing the real rules with no rules.”

    The panel concluded with a call to action and a reminder of how public comment matters: the FCC has to follow a rulemaking process and future legal challenges will depend on the robust record developed now. “It’s essential to build a record to win,” Sohn said.

    And she’s right. On June 9, we published guidance on how you can comment at the FCC on why net neutrality matters to your library. You can also blog, tweet, post and talk to your community about the importance of net neutrality and show the overwhelming support for this shared public platform.

    Rep. Eshoo’s office reached out to ALA to identify a librarian to participate in her roundtable. ALA, on behalf of the library community, deeply appreciates the invitation and her continuing support of libraries and the public interest.

    The post Librarian speaks with Rep. Eshoo at net neutrality roundtable appeared first on District Dispatch.

    Securing the Collective Print Books Collection: Progress to Date / HangingTogether

    In 2012, Sustainable Collection Services (SCS) and the Michigan Shared Print Initiative (MI-SPI) undertook one of the first shared print monographs projects in the US. Seven libraries came together under the auspices of Midwest Collaborative for Library Services (MCLS) to identify and retain 736,000 monograph holdings for an initial period of 15 years. This work laid the cornerstone of a secure North American (and ultimately international) collective print book collection.

    Since then, ten other groups have quietly continued this important work, with the help of SCS (part of OCLC since 2015) and the GreenGlass Model Builder. The results speak for themselves:

    • 11 Shared Print Programs (some with multiple projects)
    • 143 Institutions participating (almost all below the research level)
    • 7.6 million distinct editions identified for long-term retention
    • 19.7 million title-holdings now under long-term retention commitment

    Models and retention criteria vary according to local and regional priorities, but most of the committed titles are secured under formal Memoranda of Understanding (MOU) for 15 years, often with review every five years. In some respects, these are grass-roots activities, designed to address local needs, but it seems clear that these programs can contribute significantly to a federated national or international solution, such as that envisioned by the MLA’s Working Group on The Future of the Print Record.

    Organizations at the forefront of shared print monographs retention to date include:

    In addition, the HathiTrust Shared Print Program has made excellent progress, with 50 libraries proposing 16 million monograph volumes for 25-year retention. That work continues, and will ultimately secure multiple holdings of all 7.8 million distinct monograph titles in the HathiTrust digital archive. OCLC/SCS has additional group projects underway in Maryland and Nova Scotia, and both EAST and SCELC are about to bring additional libraries into their shared print programs. As shown in the maps below, construction of the secure collective monographs collection is well underway.

    US Print Monograph Retentions by State (June 2017) OCLC Sustainable Collection Services


    Canadian Print Monograph Retentions by Province (June 2017) OCLC SCS

    In subsequent posts, I’ll examine patterns of overlap and geographic distribution of retention commitments, as well as registration of those commitments in WorldCat. I’ll also share some thoughts about managing the collective collection holistically. For now, congratulations and thanks to the many librarians and consortial staff whose hard work has brought the community so far so quickly.

    [Special thanks to my SCS colleague Andy Breeding, who compiled the data and maps.]

    A Discovery Opportunity for Archives? / Richard Wallis

    A theme of my last few years has been enabling the increased discoverability of Cultural Heritage resources by making the metadata about them more open and consumable.

    Much of this work has been at the libraries end of the sector, but I have always had an eye on the broad Libraries, Archives, and Museums world, not forgetting Galleries of course.

    Two years ago at the LODLAM Summit 2015 I ran a session to explore whether it would be possible to replicate, this time for archives (physical, digital and web), the efforts of the Schema Bib Extend W3C Community Group, which proposed and introduced extensions and enhancements to the Schema.org vocabulary to improve its capability for describing bibliographic resources.

    Interest was sufficient for me to set up and chair a new W3C Community Group, Schema Architypes. The main activity of the group has been the creation and discussion of a straw-man proposal for adding new types to the Schema.org vocabulary.

    Not least the discussion has been focused on how the concepts from the world of archives (collections, fonds, etc.) can be represented by taking advantage of the many modelling patterns and terms that are already established in that generic vocabulary, and what few things would need to be added to expose archive metadata to aid discovery.

    Coming up next week is LODLAM Summit 2017, where I have proposed a session to further the discussion on the proposal.

    So why am I now suggesting that there may be an opportunity for the discovery of archives and their resources?

    In web terms, for something to be discoverable it, or a description of it, needs to be visible on a web page somewhere. To take advantage of the current structured web data revolution, driven by the search engines and the knowledge graphs they are building, those pages should contain structured metadata in the form of Schema.org markup.
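
    As a minimal sketch of what such markup could look like, the snippet below builds a JSON-LD description and wraps it in a script element for a web page. It uses only general-purpose Schema.org terms (Collection, CreativeWork, hasPart) rather than any types from the Schema Architypes proposal, and the function names, collection name and URL are all invented for illustration.

```python
import json

def collection_jsonld(name, description, url, item_names):
    """Build a Schema.org description of an archival collection as a dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Collection",
        "name": name,
        "description": description,
        "url": url,
        # each held item is described as a generic CreativeWork
        "hasPart": [{"@type": "CreativeWork", "name": n} for n in item_names],
    }

def as_script_tag(doc):
    """Serialise the dict as a JSON-LD script element for embedding in a page."""
    # escape "</" so the JSON cannot close the script element early
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"

# Hypothetical collection, for illustration only.
doc = collection_jsonld(
    name="Example College Archives: Student Life Collection",
    description="Photographs and ephemera documenting student life.",
    url="https://archives.example.edu/collections/student-life",
    item_names=["Commencement photographs", "Student newspaper clippings"],
)
print(as_script_tag(doc))
```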

    Through initiatives such as ArchivesSpace and their application, and ArcLight it is clear that many in the world of archives have been focused on web based management, search, and delivery views of archives and the resources and references they hold and describe. As these are maturing it is clear that the need for visibility on the web is starting to be addressed.

    So archives are now in a great place to grab the opportunity and take advantage of the benefits of Schema.org to aid discovery of their archives and what they contain. At least with these projects, they have the pages on which to embed that structured web data, once a consensus around the proposals from the Schema Architypes Group has formed.

    I call out to those involved with the practical application of systems for the management, searching, and delivery of archives to at least take a look at the work of the group and possibly engage on a practical basis, exploring the potential and challenges of implementing Schema.org.

    So if you want to understand more behind this opportunity, and how you might get involved, either join the W3C Group or contact me direct.


    *Image acknowledgement to The Oberlin College Archives

    Evergreen 3.0 development update #10 / Evergreen ILS

    Ducks and geese. Photo courtesy Andrea Neiman

    Since the previous update, another 8 patches have been committed to the master branch.

    As it happened, all of the patches in question are concerned with fixing various bugs with the web staff client. Evergreen 3.0 will round out the functionality available in the web staff client by adding support for serials and offline circulation, but a significant amount of the effort will also include dealing with bugs now that some libraries are starting to use the web staff client in limited production as of version 2.12.

    Launchpad is, of course, used to keep track of the bugs, and I would like to highlight some of the tags used:

    • webstaffclient, which is a general catch-all tag for all bugs and feature requests related to the web staff client.
    • webstaffprodcirc, which is for bugs that significantly affect the use of the web staff client’s circulation module.
    • fixedinwebby, which is for bugs that are fixed in the web staff client but which have a XUL client side that will likely not be fixed. As a reminder, the plan is to deprecate the XUL staff client with the release of 3.0 and remove it entirely with the Fall 2018 release.

    Duck trivia

    Even ducks were not immune to the disco craze of the 70s.


    Updates on the progress to Evergreen 3.0 will be published every Friday until general release of 3.0.0. If you have material to contribute to the updates, please get them to Galen Charlton by Thursday morning.

    Atmire Acquires Open Repository / DuraSpace News

    From Bram Luyten, @mire

    Atmire NV has entered into an agreement to acquire Open Repository, BioMed Central's repository service for academic institutions, charities, NGOs and research organisations.

    Under the agreement, Atmire will take over management and support of all Open Repository customers effective from July 28th. The acquisition adds to Atmire's client base of institutions using DSpace (an open source repository software package typically used for creating open access repositories) and allows BioMed Central to focus on its core business concerns.

    IIPC 2017 – Day Three / Harvard Library Innovation Lab

    On day three of IIPC 2017 (day 1, day 2), we heard more about what I see as the two main themes of the conference: archives users and metadata for provenance.

    On the user front, I’ll point out Sumitra Duncan’s talk on NYARC Discovery; like WALK, presented yesterday, this project aggregates search across multiple archives, improving access for users. Peter Webster of Webster Research & Consulting and Chris Fryer from the Parliamentary Archives spoke about their study of the archive’s users: the questions of what users want and need, and how they actually use the archive, are fundamental. How we think archives should or could be used may not be as pertinent as we imagine….

    On the metadata front, Emily Maemura and Nicholas Worby from the University of Toronto spoke about the ways in which documentation and curatorial process affect users’ experience of and access to archives – the staffing history of a collecting organization, for example, could be an important part of understanding why a web archive contains what it does. Jackie Dooley (OCLC Research), Alexis Antracoli (Princeton University), and Karen Stoll Farrell (Frick Art Reference Library) presented their work on developing web archiving metadata best practices to meet user needs – and it becomes clear that my two main themes could really be seen as one. OCLC Research will issue their reports in July.

    I’ll also point out Nicholas Taylor’s excellent talk on the legal use cases for archives, and, of course, LIL’s Anastasia Aizman and Matt Phillips, who gave a super talk on their ongoing work on comparing web archives. Thanks again, and hope to see you all next year!

    IIPC 2017 – Day Two / Harvard Library Innovation Lab

    Most of us attended the technical track on day two of IIPC 2017. (See also Matt’s post about the first day.) Andrew Jackson of the British Library expanded on his talk the previous day about workflows for ingesting and processing web archives. Nick Ruest and Ian Milligan described WALK, or Web Archiving for Longitudinal Knowledge, a system for aggregating Canadian web archives, generating derivative products, and making them accessible via search and visualizations. Gregory Wiedeman from University at Albany, SUNY, described his process for automating the creation of web archive records in ArchivesSpace and adding descriptive metadata using Archive-It APIs according to DACS (Describing Archives: A Content Standard).

    After the break, the Internet Archive’s Jefferson Bailey roared through a presentation of IA’s new tools, including systems for analysis, search, capture (Brozzler!), and availability. Mat Kelly from Old Dominion University described three tools for enabling non-technical users to create, index, and view web archives: WARCreate, WAIL, and Mink. Lozana Rossenova and Ilya Kreymer of Rhizome demonstrated the use of containerized browsers for playback of web content that is no longer usable in modern browsers (think Java applets), as well as some upcoming features in Webrecorder for patching content into incomplete captures.

    Following lunch, Fernando Melo and João Nobre from described their new APIs for search and temporal analysis of Portuguese web archives. Nicholas Taylor of Stanford University Libraries talked about the ongoing rearchitecture of LOCKSS (Lots of Copies Keep Stuff Safe), expanding its role from a focus on the archiving of electronic journals to a tool for preserving web archives and other digital objects more generally. (In the Q&A, LOCKSS founder David Rosenthal mentioned the article “Familiarity breeds contempt: the honeymoon effect and the role of legacy code in zero-day vulnerabilities”.) Jefferson Bailey returned, along with Naomi Dushay, also from the Internet Archive, to talk about WASAPI (the Web Archiving Systems API) for transfer of data between archives.

    After another break, LIL’s own Jack Cushman took the stage with Ilya Kreymer for a fantastic presentation of, a tool for exploring security issues in web archives: serving a captured web page is very much akin to hosting attacker-supplied content, and provides a series of challenges for trying out different kinds of attacks against a simplified local web archive. Mat Kelly then returned with David Dias of Protocol Labs to discuss InterPlanetary Wayback, which stores web archive files in IPFS, the InterPlanetary File System. Finally, Andrew Jackson wrapped up the session by leading a discussion of planning for an IIPC hackathon or other mechanism for gathering to code.

    Thanks, all, for another excellent day!

    The Pattern Language of the Library / Mita Williams

    I am an olds.

    When I first started working at the University of Windsor in July of 1999, the first floor of the Leddy Library was largely taken up by stacks of reference books. The largest collection of the library’s private and semi-private study carrels were on the second floor.

    Keeping in mind that reference materials are ideally close at hand when one is writing, why would our library actively separate reading and writing activities through its architecture?

    I think there must have been a variety of reasons behind why it was decided to place the study carrels on the second floor with the most obvious being that the library was designed to keep activities requiring concentration away from the distraction of people walking into the library and through its space.

    But there’s another reason why such a separation existed, suggested by the fact that you can find an electrical outlet in every single study carrel on the second floor, even though the building came to be decades before laptops were available.

    The answer is typewriters. Noisy, clattering typewriters.

    I didn’t make this connection myself. That insight came from this Twitter conversation from 2014.

    While there is a rich conversation to be had about how some information literacy practices that treat research and writing as separate processes may be vestiges of avoiding typewriter noise, I’m more interested in exploring what the affordances of laptops might mean for the shape of the spaces of and within the library today.

    The book did not kill the building.

    The laptop will change our chairs.

    Our print reference collection is now in the basement of the West Building of the Leddy Library. Much of the space on the first floor of the Main Building is filled with banks of computer workstations that we used to call our Learning Commons.

    But the perceived need for banks of workstations has waned in libraries. You don’t see as many seas of desktops in newly constructed library buildings. Now the entire library is perceived as a Learning Commons.

    The image above, which references the Learning Commons concept, is from Steelcase, which designs furniture for offices and other spaces, including libraries such as GVSU’s Mary Idema Pew Library:

    I was recently looking through the Steelcase product catalogue and I was taken by the way that the company makes very clear how the form of their furniture is tightly associated with function.

    (If you are a subscriber to my newsletter: in the above video there’s a reference to that theory that I wrote about which suggests that the most comfortable seating is one when you feel protected from the back.)

    When I read about their turnstone Campfire suite of products, it reminded me of a book I read some time ago called make space: How to Set the Stage for Creative Collaboration. I found the book on our shelves, took it down, leafed through it and found this:


    While make space makes no specific allusion to A Pattern Language by Christopher Alexander et al. I feel it’s almost impossible not to conclude that it must have provided some inspiration.

    An excerpt from A Pattern Language: 251 Different Chairs

    From A Pattern Language:

    People are different sizes; they sit in different ways. And yet there is a tendency in modern times to make all the chairs alike.

    From Twitter:

    From A Pattern Language

    Of course, this tendency to make all chairs alike is fueled by the demands of prefabrication and the supposed economies of scale. Designers have for years been creating “perfect chairs” — chairs that can be manufactured cheaply on mass. These chairs are made to be comfortable for the average person. And the institutions that buy chairs have been persuaded that buying these chairs in bulk meets all their needs.

    I particularly like this excerpt from A Pattern Language because I know an example of this very tension. In 2014 I sat in on the 2014 Library Interior Design Award Winners presentation at ALA Annual. There the interior designer being celebrated publicly lamented the fact that the NCSU Library opted for a wide variety of chairs including many that did not match the larger aesthetic of the space. Then the librarian spoke and told us that said chairs were so loved by students that some of them made a Tumblr of them in their honor.

    I think we fundamentally underestimate how much of a difference a variety of chairs can make in the experience of a place.

    For example: this is a science classroom.

    Kids come early to get the best seats.

    Here’s another example. This picture is of a community bench that the neighbour of Dave Meslin made available for others.

    My neighbours cut ten feet off their shrub, and replaced it with a community bench! ❤️

    A post shared by dave meslin (@davemeslin) on

    A community bench is what I would consider an example of tactical urbanism – a phrase that I like to think I first heard from People from Public Spaces. I am looking forward to reading Karen Munro’s Tactical Urbanism for Librarians: Quick, Low-Cost Ways to Make Big Changes.

    I should also say that I’m not the first librarian to try to bring in Pattern Language thinking to how we design our spaces. In 2009 William Denton and Stacey Allison-Cassin explained their “vision of the One Big Library and how Christopher Alexander’s pattern language idea will help us build it.”

    In reviewing their talk for this blog post I re-read from their slides this quotation from A Pattern Language:

    This is a fundamental view of the world. It says that when you build a thing, you cannot merely build that thing in isolation, but must also repair the world around it, and within it, so that the larger world at one place becomes more coherent, and more whole; and the thing which you make takes its place in the web of nature, as you make it.

    I had forgotten that I read that particular phrase – repair the world – in that text.

    About three months ago I started a special interest group at Hackforge called Repair the World which is “a monthly meet-up of those who want to learn more about the technologies and policies we can employ in Windsor-Essex to lead us towards a carbon-neutral future and to help our community cope with the effects of global warming”.

    For our first meeting, I didn’t do much other than set a time and place, give one suggested reading for potential discussion, and help set up the chairs in the space in a circle for our discussion.

    In The Chairs Are Where the People Go, Sheila Heti transcribed and edited this advice from Misha Glouberman:

    There’s a thoughtlessness in how people consider their audience that’s reflected in how they set up chairs. You can see that thoughtlessness immediately…

    … At a conference, if you want to create a discussion group, you can set up the chairs in a circle, and you don’t need a table…

    … Setting up chairs takes a lot of time, but anyone can do it. If you’re running a project and you want to get people involved, ask them to set up chairs. People like to set up chairs, and it’s easy work to delegate. It’s even easier to get people to put chairs away.

    Everyone should know these things.




    IIPC 2017 – Day One / Harvard Library Innovation Lab



    It's exciting to be back at IIPC this year to chat and web archives!


    The conference kicked off on Wednesday, June 14, at 9:00 with coffee, snacks, and familiar faces from all parts of the world. Web archives bring us together physically!



    So many people to meet. So many collaborators to greet!


    Jane Winters and Nic Taylor welcomed us. It’s wonderful to converse and share in this space: grand, human, bold, warm, strong. Love the Senate House at University of London. Thank you so much for hosting us!

    Leah Lievrouw, UCLA
    Web history and the landscape of communication/media research

    Leah told us that computers are viewed today as a medium — as human communication devices. This view is common now, but it hasn’t been for very long. The idea of computers as a medium was very fringe even in the early 80s.

    We walked through a history of communications to gain more understanding of computers as human communication devices and started with some history of information organization and sharing.

    Paul Otlet pushed efforts forward to organize all of the world’s information in late 19th century Belgium and France.

    The Coldwar Intellectuals by J Light describes how networked information moved from the government and the military to the public.

    And, how that network information became interesting when it was push and pull — send an email and receive a response, or send a message on a UNIX terminal to another user and chat. Computers are social machines, not just calculating machines.

    Leah took us through how the internet and early patterns of the web were formed by the time and the culture — in this case, the incredible activity of Stanford and Berkeley. The milieu of the Bay Area — bits and boolean logic through psychedelics. Fred Turner’s From Counterculture to Cyberculture is a fantastic read on this scene.

    Stewart Brand, Ted Nelson, the WELL online community, and so on.

    We’re still talking about way before the web here. The idea of networked information was there, but we didn’t have a protocol (http) or a language (html) being used (web browser) at large scale (the web). Wired Cities by Dutton, Blumler, Kraemer sounds like a fantastic read to understand how mass wiring/communication made a massive internet/web a possibility!

    The Computer as a Communication Device, described by J.C.R. Licklider and Bob Taylor, was a clear vision of the future — we’re still not at a place where computers understand us as humans; we’re still fairly rigid, with defined request and response patterns.

    The web was designed to access and create docs, that’s it. Early search engines and browsers exchanged discrete documents — we thought about the web as discrete, linked documents.

    Then, user generated content came along — wikis, blogs, tagging, social network sites. Now it’s easy for lots of folks to create content, and the network is even more powerful as a communication tool for many people!

    The next big phase came with mobile — around the mid-2000s. More and more and more people!

    Data subject (data cloud or data footprint) is an approach that has felt interesting recently at UCLA. Maybe it’s real-time “flows” rather than “stacks” of docs or content.

    Technology as cultural material and material culture.



    University of London is a fantastic space!


    Jefferson Bailey, Internet Archive
    Advancing access and interface for research use of web archives

    Internet Archive is a massive archive! 32 Petabytes (with duplications)

    And, they have search APIs!!

    Holy smokes!!! Broad access to wayback without a URL!!!!!!!

    IA has been working on a format called WAT. It’s about 20-25% the size of a WARC and contains just about everything (including title, headers, link) except the content. And, it’s a JSON format!
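
    To get a feel for why a metadata-only JSON format is handy, here's a minimal sketch of pulling the title and links out of a single WAT-style record. The sample payload is hand-written for illustration (not real Internet Archive data), and the nested field names (Envelope, Payload-Metadata, HTML-Metadata, and so on) are assumptions based on the layout commonly seen in WAT files, so check them against real records before relying on them.

```python
import json

# Hand-written, illustrative payload. The key names are an assumption
# about the usual WAT layout, not something stated in the talk.
SAMPLE_WAT_PAYLOAD = json.dumps({
    "Envelope": {
        "WARC-Header-Metadata": {"WARC-Target-URI": "http://example.com/"},
        "Payload-Metadata": {
            "HTTP-Response-Metadata": {
                "Headers": {"Content-Type": "text/html"},
                "HTML-Metadata": {
                    "Head": {"Title": "Example page"},
                    "Links": [
                        {"path": "A@/href", "url": "http://example.com/about"},
                        {"path": "IMG@/src", "url": "/logo.png"},
                    ],
                },
            }
        },
    }
})


def summarize_wat_record(payload_text):
    """Return the target URI, page title, and link URLs from one WAT-style record."""
    envelope = json.loads(payload_text).get("Envelope", {})
    target_uri = envelope.get("WARC-Header-Metadata", {}).get("WARC-Target-URI")
    html_metadata = (
        envelope.get("Payload-Metadata", {})
        .get("HTTP-Response-Metadata", {})
        .get("HTML-Metadata", {})
    )
    title = html_metadata.get("Head", {}).get("Title")
    # Keep only link entries that actually carry a URL.
    links = [link["url"] for link in html_metadata.get("Links", []) if "url" in link]
    return {"uri": target_uri, "title": title, "links": links}


summary = summarize_wat_record(SAMPLE_WAT_PAYLOAD)
print(summary["title"], len(summary["links"]))
```

    Everything needed for a link graph or a title index is there without ever touching the page content, which is the appeal of the smaller format.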

    Fun experiments when you have tons of web archives!!! and US Military powerpoints are two gems!


    Digital Desolation
    Tatjana Seitz

    A story about a homepage can be generated using its layout elements — (tables, fonts, and so on). Maybe the web counter and the alert box mark the page in time and can be used to understand the page!

    Analysis of data capture cannot be purely technical, has to be socio-technical.

    Digital desolation is a term that describes abandoned sites on the web. Sites that haven’t been restyled. Sites age over time. (Their wrinkles are frames and tables !!?? lol)

    Old sites might not bubble to the top in today’s search engines — they’re likely at the long tail of what is returned. You have to work to find good old pages.



    The team grabbing some morning coffee


    Ralph Schroeder, Oxford Internet Institute
    Web Archives and theories of the web

    Ralph is looking at how information is used and pursued.

    How do you seek information? Not many people ask this core question. Some interesting researcher (anyone know?) in Finland does, though. He sits down with folks and asks “how do you think about getting information when you’re just sitting in your house? How does your mind seek information?”

    Googlearchy — a few sites exist that dominate!

    You can look down globally at which websites dominate the attention space. The idea that we’d all come together in one global culture, that hasn’t happened yet — instead, there’s been a slow crystallization of different clusters.

    It used to be an anglo-ization of the web, now things may have moved to the south asian – Angela Wu talks about this.

    Some measurements show that Americans and Chinese devote their attention to about the same size bubble of websites — it might be that Americans are no more outward looking than are Chinese.

    We need a combined quantitative and qualitative study of web attention — we don’t access the web by typing in a URL (unless you’re in the Internet Archive); we go to Google.

    It’s hard to know about internet as a human right
    Maybe having reliable information about health could be construed as civil rights
    And unreliable, false information goes against human rights

    London is a delightful host for post-conference wanderings


    Oh, dang, it’s lunch already. It’s been a fever of web archiving!

    We have coverage at this year’s IIPC! What a fantastic way to attend a conference — with the depth and breadth of much of the team!

    Anastasia Aizman, Becky Cremona, Jack Cushman, Brett Johnson, Matt Phillips, and Ben Steinberg are in attendance this year.


    Caroline Nyvang, Thomas Hvid Kromann & Eld Zierau
    Continuing the web at large


    The authors surveyed 35 master’s theses from the University of Copenhagen and found 899 web refs: 26.4 web refs on avg, 0 min, 80 max.

    About 80% of links in theses were not dated or loosely dated — urls without dates are not reliable for citations?

    Students are not consistent when they refer to web material, even if they followed well known style guides.

    The speakers studied another corpus — 10 Danish academic monographs — and found similar variation around citations. Maybe we can work toward a good reference style?

    Form of suggested reference might be something like


    Where page is the content coverage, or thing the author is citing. Fantastic!

    What if we were to put the content coverage in a fragment identifier (the stuff after the # in the address)? Maybe something like this: <timestamp>/<url>#<content coverage>
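
    A rough sketch of how that shape could be built and taken apart again. The archive base address, the function names, and the choice to percent-encode the coverage text are all assumptions made for illustration; the talk only suggested the general <timestamp>/<url>#<content coverage> form.

```python
from urllib.parse import quote, unquote

# Hypothetical archive address, used only so the example is complete.
ARCHIVE_BASE = "https://archive.example/"


def build_reference(timestamp, url, coverage):
    """Compose <timestamp>/<url>#<content coverage> under the archive base."""
    # Percent-encode the coverage so spaces and '#' cannot break the fragment.
    return f"{ARCHIVE_BASE}{timestamp}/{url}#{quote(coverage, safe='')}"


def split_reference(reference):
    """Recover (timestamp, url, coverage) from a reference built above."""
    before_fragment, _, fragment = reference.partition("#")
    remainder = before_fragment[len(ARCHIVE_BASE):]
    timestamp, _, url = remainder.partition("/")
    return timestamp, url, unquote(fragment)


reference = build_reference("20170614090000", "http://example.com/report.html", "page 12")
print(reference)
print(split_reference(reference))
```

    One limitation of this sketch: if the cited url already carries its own # fragment, the two fragments collide, so a real scheme would need a rule for that case.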



    And totally unrelated, this fridge was spotted later that day on the streets of
    London. We need a fridge in LIL. Probably not worth shipping back though.


    Some Author, some organization

    The UK Web Archive has been actively grabbing things from the web since 2004.

    Total collection of 400 TB of UK websites only, imposing a “territorial” boundary –
    .uk, .scot, .cymru, etc.

    Those TLDs are not everything though — a work also counts if it is made available from a website with a UK domain name or if the person behind it is physically based in the UK.
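
    The TLD part of that boundary is the easy bit to automate. Here's a minimal sketch; the function name is made up, the tuple only lists the three TLDs named above (the "etc." is not expanded), and the "physically based in the UK" test can't be decided from a URL at all, so it is left out.

```python
from urllib.parse import urlparse

# Only the TLDs named in the talk notes; the real list is longer.
UK_TLDS = (".uk", ".scot", ".cymru")


def in_scope_by_tld(url):
    """Return True when the URL's host ends in one of the listed UK TLDs."""
    host = (urlparse(url).hostname or "").lower()
    return host.endswith(UK_TLDS)


for candidate in ["http://www.bl.uk/collection", "https://example.com/uk/"]:
    print(candidate, in_scope_by_tld(candidate))
```

    Anything this check rejects would still need a second look against the non-TLD criteria before being ruled out.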



    Fantastic first day!! Post-conference toast (with a bday cheers!)!!


    Recap, decompress, and keep the mind active for day two of IIPC!

    The day was full of energy, ideas, and friendly folks sharing their most meaningful work. An absolute treat to be here and share our work! Two more days to soak up!


    UXLibs III: Conference Thoughts / Shelley Gullikson

    (This was a difficult post to write and ended up being quite personal. You might just want my UXLibs III conference notes.)

    I was really really looking forward to UXLibs III. I love the UXLibs conferences and this year, I was presenting with Kristin Meyer. Kristin and I wrote an article for WeaveUX last year and it was an absolutely amazing experience. We had never met and yet the partnership was so easy; we had similar ideas about deadlines and effort, and we had similar writing styles. With UXLibs III, we were able to work together again and would finally meet in person. Exciting!

    And the conference was great. The content was absolutely and completely what I’m interested in. Meeting Kristin in person and presenting together was fabulous. The other people I met were really great. The people I already knew were completely lovely. Plus, there was ceilidh dancing!

    And yet… coming home, I don’t feel as fired up as I have in previous years. Is my love affair with UXLibs over?

    During the conference, I had a great conversation with Bernadette Carter from Birmingham City University about the Team Challenge. She was struck by how most of us wanted to fix all the problems identified in the research documents, even the things that weren’t our responsibility—like broken plugs. She loved that we all cared so much that we wanted to fix ALL THE THINGS. But we also talked about how, back in our own libraries, it can be incredibly frustrating when we can’t fix things outside of the library’s control.

    I wonder if the implicit promise of the first UXLibs was that we were learning how to fix everything for our users. We just needed to observe behaviour, ask the right questions, read a number of love letters and break-up letters and we would understand what our users needed. Then it would just be a matter of designing solutions, taking them through a few iterations and voilà! Fixed!

    But we can’t fix everything for our users—for any number of reasons—and that’s hard. But UXLibs is also now a community where we can talk about it being hard.

    In Andy’s opening address (partly replicated here), he talked about his struggles with having UX work either ignored or undermined at his previous workplace. I didn’t take any notes during Andy’s talk, and I think that was because I was busy thinking about how similar his themes were to what I was about to say in the presentation Kristin and I were doing immediately after Andy’s talk.

    In that presentation, I talked about a UX project that didn’t go well, mostly because of the organizational culture in my library. When I look at Andy’s model of UX adoption (below), I think my library would rate even worse than his—all in the red. On top of our not-great org culture, we are going through a tremendous amount of change. I don’t see (yet) how the UX work I want to do fits. I don’t see how I fit.


    This year has been difficult for me professionally. I’ve felt uninspired. I’ve felt useless. I still feel a bit adrift. UXLibs was a shining beacon in my calendar that pulled me through the winter. It was supposed to save me, I think; to help me feel inspired and useful and full of purpose again.

    Having been pretty open about challenges in my library on the first morning of the conference, many of the conversations I had during the rest of the conference were related to that. So I guess it’s not surprising that, post-conference, I’m not feeling fired up with inspiration. It was incredibly helpful to share feelings of struggle, but it hasn’t created momentum for what I might do next.

    Thinking about the conference keynotes, my takeaways weren’t so much ideas for doing things, but rather cautions to be more careful with, and more thoughtful about, the things I do. This is not at all a negative; I think it’s a sign of maturity.

    In the Question Time panel on the last day, one of the questions was whether UX was a fad. I thought it was a bit of a silly question at the time and of course none of the panelists agreed that UX is a fad. But thinking about it a bit more deeply now, I think for me UX was not a fad but a new lens—a shiny one! It extended my long-held interest in human-computer interaction and usability to the larger library: physical and virtual space and services. My intro to UX coincided with a change of job, and with that change, I had newfound freedom to pursue UX work. It wasn’t a fad, but it was a new love—a great and glorious infatuation. The love isn’t gone, but I’m starting to notice the snoring and farting, and really couldn’t someone else cook dinner once in a goddamned while?

    UXLibs has matured in three years, and most relationships do lose a bit of fire after the first while. My more muted reaction to the conference this year is not a reflection of anything that’s wrong with UXLibs. I’ve just got my own stuff to work out. But I’m in this for the long haul. I’ll be back next year, as excited to attend as ever. These are my people. This is my place.

    Emulation: Windows10 on ARM / David Rosenthal

    At last December's WinHEC conference, Qualcomm and Microsoft made an announcement to which I should have paid more attention:
    Qualcomm ... announced that they are collaborating with Microsoft Corp. to enable Windows 10 on mobile computing devices powered by next-generation Qualcomm® Snapdragon™ processors, enabling mobile, power efficient, always-connected cellular PC devices. Supporting full compatibility with the Windows 10 ecosystem, the Snapdragon processor is designed to enable Windows hardware developers to create next generation device form factors, providing mobility to cloud computing.
    The part I didn't think about was:
    New Windows 10 PCs powered by Snapdragon can be designed to support x86 Win32 and universal Windows apps, including Adobe Photoshop, Microsoft Office and Windows 10 gaming titles.
    How do they do that? The answer is obvious: emulation! Below the fold, some thoughts.

    Because of the ubiquity of the x86 instruction set, much of the work described as emulation is more correctly described as virtualization. As discussed in my report on emulation, virtualization and emulation are end-points of a spectrum; the parts that the hardware you're running does implement are virtualized and the parts it doesn't are emulated. Because ARM and x86 are completely different instruction sets, Qualcomm is at the emulation end of the spectrum. More than two decades ago, Apple used emulation to migrate from the Motorola 68000 to the PowerPC instruction set; this isn't anything new or surprising.
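
    To make the emulation end of that spectrum concrete, here is a toy sketch of what an emulator does: it fetches each guest instruction, decodes it, and carries out its effect in software, because the host hardware cannot execute it directly. The four-instruction set below is invented for illustration; it is not x86 or ARM, and it says nothing about how Qualcomm and Microsoft actually implement their emulation.

```python
def emulate(program, registers=None):
    """Interpret a list of made-up guest instructions and return the registers."""
    regs = dict(registers or {})
    pc = 0  # guest program counter
    while pc < len(program):
        op, *args = program[pc]          # fetch and decode
        if op == "LOAD":                 # LOAD reg, constant
            regs[args[0]] = args[1]
        elif op == "ADD":                # ADD dst, src  (dst = dst + src)
            regs[args[0]] = regs[args[0]] + regs[args[1]]
        elif op == "JNZ":                # JNZ reg, target (jump if reg != 0)
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return regs


# Guest program: add 3 + 2 + 1 into register "total" using a countdown loop.
program = [
    ("LOAD", "total", 0),
    ("LOAD", "n", 3),
    ("LOAD", "minus_one", -1),
    ("ADD", "total", "n"),       # index 3: loop body starts here
    ("ADD", "n", "minus_one"),
    ("JNZ", "n", 3),
    ("HALT",),
]
result = emulate(program)
print(result["total"])
```

    Every guest instruction costs many host instructions in a loop like this, which is why practical emulators translate blocks of guest code rather than interpreting one instruction at a time.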

    It is obviously in everyone's interest, except Intel's, to have more effective competition in the market for chips to run Windows than AMD has been able to provide. This is especially true given the way PC and mobile technologies are merging. Intel's consistent failure to deliver performance competitive with ARM in the mobile market and Qualcomm's ability to integrate 5G connectivity are significant.

    Now, MojoKid at /. points me to Brandon Hill's Intel Fires Warning Shot At Qualcomm And Microsoft Over Windows 10 ARM Emulation In X86 Birthday Blog Post. The Intel blog post is authored by Steven Rogers, EVP and General Counsel for Intel, and Richard Uhlig, Intel Labs Fellow and Director of Systems and Software Research, and it clearly is a warning shot:
    There have been reports that some companies may try to emulate Intel’s proprietary x86 ISA without Intel’s authorization. Emulation is not a new technology, and Transmeta was notably the last company to claim to have produced a compatible x86 processor using emulation (“code morphing”) techniques. Intel enforced patents relating to SIMD instruction set enhancements against Transmeta’s x86 implementation even though it used emulation.
    Transmeta vs. Intel was an unequal battle, and Transmeta lost (my emphasis):
    On October 24, 2007, Transmeta announced an agreement to settle its lawsuit against Intel Corporation. Intel agreed to pay $150 million upfront and $20 million per year for five years to Transmeta in addition to dropping its counterclaims against Transmeta. Transmeta also agreed to license several of its patents and assign a small portfolio of patents to Intel as part of the deal. Transmeta also agreed to never manufacture x86 compatible processors again.
    But Microsoft+Qualcomm vs. Intel is a battle of equals, especially given Intel and Microsoft's co-dependent quasi-monopoly. It is likely to go down to the wire. If it ends up in court, it is likely to clarify the legalities of using emulation significantly.

    Unfortunately, the interests of preservation won't figure in any such court battle. Clearly, these interests would favor Qualcomm+Microsoft, but the favor wouldn't be returned. Their interests would have a much closer time horizon. The way this conflict plays out will have a big effect on the PC business, and on the future of emulation as a preservation strategy.