Jon Ippolito, Professor of New Media at the University of Maine
As artists have embraced a range of new media and forms in the last century, the work of collecting, conserving and exhibiting these works has become increasingly complex and challenging. In this space, Richard Rinehart and Jon Ippolito have been working to develop and understand approaches to ensure long-term access to digital works. In this installment of our Insights Interview Series I discuss Richard and Jon’s new book, “Re-collection: Art, New Media, and Social Memory.” The book offers an articulation of their variable media approach to thinking about works of art. I am excited to take this opportunity to explore the issues the book raises about digital art in particular and a perspective on digital preservation and social memory more broadly.
Trevor: The book takes a rather broad view of “new media”; everything from works made of rubber, to CDs, art installations made of branches, arrangements of lighting, commercial video games and hacked variations of video games. For those unfamiliar with your work more broadly, could you tell us a bit about your perspective on how these hang together as new media? Further, given that the focus of our audience is digital preservation, could you give us a bit of context for what value thinking about various forms of non-digital variable new media art offers us for understanding digital works?
Richard Rinehart, Director of the Samek Art Museum at Bucknell University.
Richard: Our book does focus on the more precise and readily-understood definition of new media art as artworks that rely on digital electronic computation as essential and inextricable. The way we frame it is that these works are at the center of our discussion, but we also discuss works that exist at the periphery of this definition. For instance, many digital artworks are hybrid digital/physical works (e.g., robotic works) and so the discussion cannot be entirely contained in the bitstream.
We also discuss other non-traditional art forms–performance art, installation art–that are not as new as “new media” but are also not that old in the history of museum collecting. It is important to put digital art preservation in an historical context, but also some of the preservation challenges presented by these works are shared with and provide precedents for digital art. These precedents allow us to tap into previous solutions or at least a history of discussion around them that could inform or aid in preserving digital art. And, vice versa, solutions for preserving digital art may aid in preserving these other forms (not least of which is shifting museum practices). Lastly, we bring non-digital (but still non-traditional) art forms into the discussion because some of the preservation issues are technological and media-based (in which case digital is distinct) but some issues are also artistic and theoretical, and these issues are not necessarily limited to digital works.
Jon: Yeah, we felt digital preservation needed a broader lens. The recorded culture of the 20th century–celluloid, vinyl LPs, slides–is a historical anomaly that’s a misleading precedent for preserving digital artifacts. Computer scientist Jeff Rothenberg argues that even JPEGs and PDF documents are best thought of as applications that must be “run” to be accessed and shared. We should be looking at paradigms that are more contingent than static files if we want to forecast the needs of 21st-century heritage.
Casting a wider net can also help preservationists jettison our culture’s implicit metaphor of stony durability in favor of one of fluid adaptability. Think of a human record that has endured and most of us picture a chiseled slab of granite in the British Museum–even though oral histories in the Amazon and elsewhere have endured far longer. Indeed, Dragan Espenschied has pointed out cases in which clay tablets have survived longer than stone because of their adaptability: they were baked as is into new buildings, while the original carvings on stones were chiseled off to accommodate new inscriptions. So Richard and I believe digital preservationists can learn from media that thrive by reinterpretation and reuse.
Trevor: The book presents technology, institutions and law as three sources of problems for the conservation of variable media art and potentially as three sources of possible solutions. Briefly, what do you see as the most significant challenges and opportunities in these three areas? Further, are there any other areas you considered incorporating but ended up leaving out?
Jon: From technology, the biggest threat is how the feverish marketing of our techno-utopia masks the industry’s planned obsolescence. We can combat this by assigning every file on our hard drives and gadget on our shelves a presumptive lifespan, and leaving room in our budgets to replace them once their expiration date has passed.
From institutions, the biggest threat is that their fear of losing authenticity gets in the way of harnessing less controllable forms of cultural perseverance such as proliferative preservation. Instead of concentrating on the end products of culture, they should be nurturing the communities where it is birthed and finds meaning.
From the law, the threat is DRM, the DMCA, and other mechanisms that cut access to copyrighted works–for unlike analog artifacts, bits must be accessed frequently and openly to survive. Lawyers and rights holders should be looking beyond the simplistic dichotomy of copyright lockdown versus “information wants to be free” and toward models in which information requires care, as is the case for sacred knowledge in many indigenous cultures.
Other areas? Any in which innovative strategies of social memory are dismissed because of the desire to control–either out of greed (“we can make a buck off this!”) or fear (“culture will evaporate without priests to guard it!”).
Trevor: One of the central concepts early in the book is “social memory”; in fact, the term makes its way into the title of the book. Given its centrality, could you briefly explain the concept and discuss some of how this framework for thinking about the past changes or upsets other theoretical perspectives on history and memory that underpin work in preservation and conservation?
Richard: Social memory is the long-term memory of societies. It’s how civilizations persist from year to year or century to century. It’s one of the core functions of museums and libraries and the purpose of preservation. It might alternately be called “cultural heritage,” patrimony, etc. But the specific concept of social memory is useful for the purpose of our book because there is a body of literature around it and because it positions this function as an active social dynamic rather than a passive state (cultural heritage, for instance, sounds pretty frozen). It was important to understand social memory as a series of actions that take place in the real world every day as that then helps us to make museum and preservation practices tangible and tractable.
The reason to bring up social memory in the first place is to gain a bit of distance on the problem of preserving digital art. Digital preservation is so urgent that most discussions (perhaps rightfully) leap right to technical issues and problem-solving. But, in order to effect the necessary large-scale and long-term changes in, say, museum practices, standards and policies we need to understand the larger context and historic assumptions behind current practices. Museums (and every cultural heritage institution) are not just stubborn; they do things a certain way for a reason. To convince them to change, we cannot just point at ad-hoc cases and technical problematics; we have to tie it to their core mission: social memory. The other reason to frame it this way is that new media really are challenging the functions of social memory; not just in museums, but across the board and here’s one level in which we can relate and share solutions.
These are some ways in which the concept of social memory allows us to approach preservation differently in the book, but here’s another, more specific one. We propose that social memory takes two forms: formal/canonical/institutional memory and informal/folkloric/personal memory (and every shade in between). We then suggest how the preservation of digital art may be aided by BOTH social memory functions.
Trevor: Many of the examples in the book focus on boundary-breaking installation art, like Flavin’s work with lighting, and conceptual art, like Nam June Paik’s work with televisions and signals, or Cory Arcangel’s interventions on Nintendo cartridges. Given that these works push the boundaries of their mediums, or focus in depth on some of the technical and physical properties of their mediums, do you feel like lessons learned from them apply directly to seemingly more standardized and conventional works in new media? For instance, mass-produced game cartridges or Flash animations and videos? To what extent are lessons learned about works largely intended to be exhibited as art in galleries and museums applicable to more everyday mass-produced and consumed works?
Richard: That’s a very interesting question and it speaks to our premise that preserving digital art is but one form of social memory and that lessons learned therein may benefit other areas. I often feel that preserving digital art is useful for other preservation efforts because it provides an extreme case. Artists (and the art world) ensure that their media creations are about as complex as you’ll likely find; not necessarily technically (although some are technically complex and there are other complexities introduced in their non-standard use of technologies) but because what artists do is to complicate the work at every level–conceptually, phenomenologically, socially, technically; they think very specifically about the relationship between media and meaning and then they manifest those ideas in the digital object.
I fully understand that preserving artworks does not mean trying to capture or preserve the meaning of those objects (an impossible task) but these considerations must come into play when preserving art even at a material level; especially in fungible digital media. So, for just one example, preserving digital artworks will tell us a lot about HCI considerations that attend preserving other types of interactive digital objects.
Jon: Working in digital preservation also means being a bit of a futurist, especially in an age when the procession from medium to medium is so rapid and inexorable. And precisely because they play with the technical possibilities of media, today’s artists are often society’s earliest adopters. My 2006 book with Joline Blais, “At the Edge of Art,” is full of examples, whether how Google Earth came from Art+Com, Wikileaks from Antoni Muntadas, or gestural interfaces from Ben Fry and Casey Reas. Whether your metaphor for art is antennae (Ezra Pound) or antibodies (Blais), if you pay attention to artists you’ll get a sneak peek over the horizon.
Trevor: Richard suggests that variability, not fixity, is the defining feature of digital media, and further that conservators should move away from “outdated notions of fixity.” Given the importance of the concept of fixity in digital preservation circles, could you unpack this a bit for us? While digital objects do indeed execute and perform, the fact that I can run a fixity check and confirm that this copy of the digital object is identical to what it was before seems to be an incredibly powerful and useful component of ensuring long-term access to them. Given that, based on the nature of digital objects, we can actually ensure fixity in a way we never could with analog artifacts, this idea of distancing ourselves from fixity seemed strange.
Richard: You hit the nail on the head with that last sentence; and we’re hitting a little bit of a semantic wall here as well–fixity as used in computer science and certain digital preservation circles does not quite have the same meaning as when used in lay text or in the context of traditional object-based museum preservation. I was using fixity in the latter sense (as the first book on this topic, we wrote for a lay audience and across professional fields as much as possible.) Your last thought compares the uses of “fixity” as checks between analog media (electronic, reproducible; film, tape, or vinyl) compared to digital media, but in the book I was comparing fixity as applied to a different class of analog objects (physical; marble, bronze, paint) compared to digital objects.
If we step back from the professional jargon for a moment, I would characterize the traditional museological preservation approach for oil painting and bronze sculptures to be one based on fixity. The kind of digital authentication that you are talking about is more like the scientific concept of repeatability; a concept based on consistency and reproduction–the opposite of fixity! I think the approach we outline in the book is in opposition to fixity of the marble-bust variety (as inappropriate for digital media) but very much in-line with fixity as digital authentication (as one tool for guiding and balancing a certain level of change with a certain level of integrity.) Jon may disagree here–in fact we built in these dynamics of agreement/disagreement into our book too.
Jon: I’d like to be as open-minded as Richard. But I can’t, because I pull my hair out every time I hear another minion of cultural heritage fixated on fixity. Sure, it’s nifty that each digital file has a unique cryptographic signature we can confirm after each migration. The best thing about checksums is that they are straightforward, and many preservation tools (and even some operating systems) already incorporate such checks by default. But this seems to me a tiny sliver of a far bigger digital preservation problem, and to blow it out of proportion is to perpetuate the myth that mathematical replication is cultural preservation.
Two files with different passages of 1s and 0s automatically have different checksums but may still offer the same experience; for example, two copies of a digitized film may differ by a few frames but look identical to the human eye. The point of digitizing a Stanley Kubrick film isn’t to create a new mathematical artifact with its own unchanging properties, but to capture for future generations the experience us old timers had of watching his cinematic genius in celluloid. As a custodian of culture, my job isn’t to ensure my DVD of A Clockwork Orange is faithful to some technician’s choices when digitizing the film; it’s to ensure it’s faithful to Kubrick’s choices as a filmmaker.
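To make the checksum point concrete, here is a minimal Python sketch of a fixity check. It is not taken from the book or from any particular preservation tool; the byte strings standing in for two digitizations of a film, and the helper names `sha256_hex` and `fixity_ok`, are hypothetical and chosen only for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def fixity_ok(data: bytes, recorded_digest: str) -> bool:
    """True if data still matches the digest recorded earlier."""
    return sha256_hex(data) == recorded_digest


# Hypothetical stand-ins for two digitizations of the same film.
master = b"frame-0001" * 1000
rescan = master[:-1] + b"2"  # differs from master in a single byte

# Digest recorded when the master copy was first stored.
recorded = sha256_hex(master)

print(fixity_ok(master, recorded))  # the unchanged copy passes
print(fixity_ok(rescan, recorded))  # the one-byte variant fails
```

The check reports only whether the bytes are identical. It says nothing about whether the two copies would look the same to a viewer, which is the gap Jon describes.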
Furthermore, there’s no guarantee that born-digital files with impeccable checksums will bear any relationship to the experience of an actual user. Engineer and preservationist Bruno Bachimont gives the example of an archivist who sets a Web spider loose on a website, only to have the website’s owners update it in the middle of the crawling process. (This happens more often than you might think.) Monthly checksums will give the archivist confidence that she’s archived that website, but in fact her WARC files do not correspond to any digital artifact that has ever existed in the real world. Her chimera is a perversion caused by the capturing process–like those smartphone panoramas of a dinner where the same waiter appears at both ends of the table.
As in nearly all storage-based solutions, fixity does little to help capture context. We can run checksums on the Riverside “King Lear” till the cows come home, and it still won’t tell us that boys played women’s parts, or that Elizabethan actors spoke with rounded vowels that sound more like a contemporary American accent than the King’s English, or how each generation of performers has drawn on the previous for inspiration. Even on a manuscript level, a checksum will only validate one of many variations of a text that was in reality constantly mutating and evolving.
The context for software is a bit more cut-and-dried, and the professionals I know who use emulators like to have checksums to go with their disk images. But checksums don’t help us decide what resolution or pace they should run at, or what to do with past traces of previous interactions, or what other contemporaneous software currently taken for granted will need to be stored or emulated for a work to run in the future.
Finally, even emulation will only capture part of the behaviors necessary to reconstruct digital creations in the networked age, which can depend on custom interfaces, environmental data or networks. You can’t just go around checksumming wearable hardware or GPS receivers or Twitter networks; the software will have to mutate to accommodate future versions of those environments.
So for a curator to run regular tests on a movie’s fixity is like a zookeeper running regular tests on a tiger’s DNA. Just because the DNA tests the same doesn’t guarantee the tiger is healthy, and if you want the species to persist in the long term, you have to accept that the DNA of individuals is certainly going to change.
We need a more balanced approach. You want to fix a butterfly? Pin it to a wall. If you want to preserve a butterfly, support an ecosystem where it can live and evolve.
Trevor: The process of getting our ideas out on the page can often play a role in pushing them in new directions. Are there any things that you brought into working on the book that changed in the process of putting it together?
Richard: A book is certainly slow media; purposefully so. I think the main change I noticed was the ability to put our ideas around preservation practice into a larger context of institutional history and social memory functions. Our previous expressions in journal articles or conference presentations simply did not allow us time to do that and, as stated earlier, I feel that both are important in the full consideration of preservation.
Jon: When Richard first approached me about writing this book, I thought, well it’s gonna be pretty tedious because it seemed we would be writing mostly about our own projects. At the time I was only aware of a single emulation testbed in a museum, one software package for documenting opinions on future states of works, and no more conferences and cross-institutional initiatives on variable media preservation than I could count on one hand.
Fortunately, it took us long enough to get around to writing the book (I’ll take the blame for that) that we were able to discover and incorporate like-minded efforts cropping up across the institutional spectrum, from DOCAM and ZKM to Preserving Virtual Worlds and JSMESS. Even just learning how many art museums now incorporate something as straightforward as an artist’s questionnaire into their acquisition process! That was gratifying and led me to think we are all riding the crest of a wave that might bear the digital flotsam of today’s culture into the future.
Trevor: The book covers a lot of ground, focusing on a range of issues and offering myriad suggestions for how various stakeholders could play a role in ensuring access to variable media works into the future. In all of that, is there one message or issue in the work that you think is the most critical or central?
Richard: After expanding our ideas in a book, it’s difficult to come back to tweet format, but I’ll try…
Change will happen. Don’t resist it; use it, guide it. Let art breathe; it will tell you what it needs.
Samuel Moore, Rosie Graves and Peter Kraker are the 2013-2014 Open Knowledge Panton Fellows – tasked with experimenting, exploring and promoting open practises through their research over the last twelve months. They just posted their final reports so we’d like to heartily congratulate them on an excellent job and summarise their highlights for the Open Knowledge community.
Over the last two years the Panton Fellowships have supported five early career researchers to further the aims of the Panton Principles for Open Data in Science alongside their day to day research. The provision of additional funding goes some way towards this aim, but a key benefit of the programme is boosting the visibility of the Fellows’ work within the open community and introducing them to like-minded researchers and others within the Open Knowledge network.
On stage at the Open Science Panel Vienna (Photo by FWF/APA-Fotoservice/Thomas Preiss)
Peter Kraker (full report) is a postdoctoral researcher at the Know-Centre in Graz and focused his fellowship work on two facets: open and transparent altmetrics and the promotion of open science in Austria and beyond. During his Fellowship Peter released the open source visualization Head Start, which gives scholars an overview of a research field based on relational information derived from altmetrics. Head Start continues to grow in functionality, has been incorporated into Open Knowledge Labs and is soon to be made available on a dedicated website funded by the fellowship.
Peter’s ultimate goal is to have an environment where everybody can create their own maps based on open knowledge and share them with the world. You are encouraged to contribute! In addition Peter has been highly active promoting open science, open access, altmetrics and reproducibility in Austria and beyond through events, presentations and prolific blogging, resulting in some great discussions generated on social media. He has also produced a German summary of open science activities every month and is currently involved in kick-starting a German-speaking open science group through the Austrian and German Open Knowledge local groups.
Rosie with an air quality monitor
Rosie Graves (full report) is a postdoctoral researcher at the University of Leicester and used her fellowship to develop an air quality sensing project in a primary school. This wasn’t always an easy ride: the sensor was successfully installed and an enthusiastic set of schoolchildren were on board, but a technical issue meant that data collection was cut short, so Rosie plans to resume in the New Year. Further collaborations on crowdsourcing and school involvement in atmospheric science were even more successful, including a pilot rain gauge measurement project and development of a cheap, open source air quality sensor which is sure to be of interest to other scientists around the Open Knowledge network and beyond. Rosie has enjoyed her Panton Fellowship year and was grateful for the support to pursue outreach and educational work:
“This fellowship has been a great opportunity for me to kick start a citizen science project … It also allowed me to attend conferences to discuss open data in air quality which received positive feedback from many colleagues.”
Samuel Moore (full report) is a doctoral researcher in the Centre for e-Research at King’s College London and successfully commissioned, crowdfunded and (nearly) published an open access book on open research data during his Panton Year: Issues in Open Research Data. The book is still in production but publication is due during November and we encourage everyone to take a look. This was a step towards addressing Sam’s assessment of the nascent state of open data in the humanities:
“The crucial thing now is to continue to reach out to the average researcher, highlighting the benefits that open data offers and ensuring that there is a stock of accessible resources offering practical advice to researchers on how to share their data.”
Another initiative Sam began during the fellowship was establishing the forthcoming Journal of Open Humanities Data with Ubiquity Press, which aims to incentivise data sharing through publication credit, which in turn makes data citable through usual academic paper citation practices. Ultimately the journal will help researchers share their data, recommending repositories and best practices in the field, and will also help them track the impact of their data through citations and altmetrics.
We believe it is vital to provide early career researchers with support to try new open approaches to scholarship and hope other organisations will take similar concrete steps to demonstrate the benefits and challenges of open science through positive action.
Finally, we’d like to thank the Computer and Communications Industry Association (CCIA) for their generosity in funding the 2013-14 Panton Fellowships.
This blog post is a cross-post from the Open Science blog, see the original here.
We are pleased to announce the release of Sufia 4.2.0.
This release of Sufia includes the ability to cache usage statistics in the application database, an accessibility fix, and a number of bug fixes. Thanks to Carolyn Cole, Michael Tribone, Adam Wead, Justin Coyne, and Mike Giarlo for their work on this release.
Almost every American owns a cell phone. More than half use a smartphone and sleep with it next to the bed. How many do you think visit their library website on their phone, and what do they do there? Heads up: this one’s totally America-centric.
Who uses library mobile websites?
Almost one in five (18%) Americans ages 16-29 have used a mobile device to visit a public library’s website or access library resources in the past 12 months, compared with 12% of those ages 30 and older. (Younger Americans’ Library Habits and Expectations, 2013)
If that seems anticlimactic, consider that just about every adult in the U.S. owns a cell phone, and almost every millennial in the country is using a smartphone. This is the demographic using library mobile websites, more than half of whom already have a library card.
This 2013 Pew report makes the point that while digital natives still really like print materials and the library as a physical space, a non-trivial number of them said that libraries should definitely move most library services online. Future-of-the-library blather is often painted in black and white, but it is naive to think physical–or even traditional–services are going away any time soon. Rather, there is already demand for complementary or analogous online services.
Literally. When asked, 45% of Americans ages 16 – 29 wanted “apps that would let them locate library materials within the library.” They also wanted a library-branded Redbox (44%), and an “app to access library services” (42%) – by app I am sure they mean a mobile-first, responsive web site. That’s what we mean here at #libux.
48% were interested in events and programs – especially old people
44% did research
30% sought readers’ advisory (book reviews or recommendations)
30% paid fines (yikes)
27% signed-up for library programs and events
6% reserved a room
Still, young Americans are way more invested in libraries coordinating more closely with schools, offering literacy programs, and being more comfortable ( chart ). They want libraries to continue to be present in the community, do good, and have hipster decor – coffee helps.
Webbification is broadly expected, but it isn’t exactly a kudos subject. Offering comparable online services is necessary, like it is necessary that MS Word lets you save work. A library that doesn’t offer complementary or analogous online services isn’t buggy so much as it is just incomplete.
Take this away
The emphasis on the library as a physical space shouldn’t be shocking. The opportunity for the library as a hyper-locale specifically reflecting its community’s temperament isn’t one to overlook, especially for as long as libraries tally success by circulation numbers and foot traffic. The whole library-without-walls cliche that went hand-in-hand with all that Web 2.0 stuff tried to show off the library as it could be in the cloud, but “the library as physical space” isn’t the same as “the library as disconnected space.” The tangibility of the library is a feature to be exploited both for atmosphere and web services. “Getting lost in the stacks” can and should be relegated to just something people say rather than something that actually happens.
The main reason for library web traffic has been and continues to be to find content (82%) and how to get it (72%).
Mobile first: The library catalog, as well as basic information about the library, must be optimized for mobile
Streamline transactions: placing and removing holds, checking out, paying fines. There is a lot of opportunity here. Basic optimization of the OPAC and cart can go a long way, but you can even enable self checkout, library card registration using something like Facebook login, or payment through Apple Pay.
Be online: [duh] Offer every basic service available in person online
Improve in-house wayfinding through the web: think Google Indoor Maps
Exploit smartphone native services to anticipate context: location, as well as time-of-day, weather, etc., can be used to personalize service or contextually guess at the question the patron needs answered. “It’s 7 a.m. and cold outside, have a coffee on us.” – or even a simple “Yep. We’re open” on the front page.
Market the good the library provides to the community to win support (or donations)
A free and open-source software project launched in 2011, PressForward enables teams of researchers to aggregate, filter, and disseminate relevant scholarship using the popular WordPress web publishing platform. Just about anything available on the open web is fair game: traditional journal articles, conference papers, white papers, reports, scholarly blogs, and digital projects.
Join us for CopyTalk, our copyright webinar, on December 4 at 2pm Eastern Time. This installment of CopyTalk is entitled, “Introducing the Statement of Best Practices in Fair Use of Collections Containing Orphan Works for Libraries, Archives, and Other Memory Institutions”.
Peter Jaszi (American University, Washington College of Law) and David Hansen (UC Berkeley and UNC Chapel Hill) will introduce the “Statement of Best Practices in Fair Use of Collections Containing Orphan Works for Libraries, Archives, and Other Memory Institutions.” This Statement, the most recent community-developed best practices in fair use, is the result of intense discussion group meetings with over 150 librarians, archivists, and other memory institution professionals from around the United States to document and express their ideas about how to apply fair use to collections that contain orphan works, especially as memory institutions seek to digitize those collections and make them available online. The Statement outlines the fair use rationale for use of collections containing orphan works by memory institutions and identifies best practices for making assertions of fair use in preservation and access to those collections.
Did you know that over 2,400 items related to Thanksgiving reside at the DPLA? They range from Thanksgiving menus from hotels and restaurants across this great land to Thanksgiving postcards to images of the fortunate and less fortunate taking part in Thanksgiving day festivities.
Here’s just a taste of Thanksgiving at the Digital Public Library of America.
To follow up on the October 27th webinar “$2.2 Billion Reasons to Pay Attention to WIOA,” the American Library Association (ALA) today releases a list of resources and tools that provide more information about the Workforce Innovation and Opportunity Act (WIOA). The Workforce Innovation and Opportunity Act allows public libraries to be considered additional One-Stop partners, prohibits federal supervision or control over selection of library resources and authorizes adult education and literacy activities provided by public libraries as an allowable statewide employment and training activity.
In-person, 3-day Advanced DSpace Course in Austin March 17-19, 2015. The total cost of the course is being underwritten with generous support from the Texas Digital Library and DuraSpace. As a result, the registration fee for the course for DuraSpace Members is only $250 and $500 for Non-Members (meals and lodging not included). Seating will be limited to 20 participants.
The discussions between libraries and major publishers about subscriptions have only rarely been actual negotiations. In almost all cases the libraries have been unwilling to walk away and the publishers have known this. This may be starting to change; Dutch libraries have walked away from the table with Elsevier. Below the fold, the details. VSNU, the association representing the 14 Dutch research universities, negotiates on their behalf with journal publishers. Earlier this month they announced that their current negotiations with Elsevier are at an impasse, on the issues of costs and the Dutch government's Open Access mandate:
Negotiations between the Dutch universities and publishing company Elsevier on subscription fees and Open Access have ground to a halt. In line with the policy pursued by the Ministry of Education, Culture and Science, the universities want academic publications to be freely accessible. To that end, agreements will have to be made with the publishers. The proposal presented by Elsevier last week totally fails to address this inevitable change.
During several round[s] of talks, no offer was made which would have led to a real, and much-needed, transition to open access. Moreover, Elsevier has failed to deliver an offer that would have kept the rising costs of library subscriptions at an acceptable level. ... In the meantime, universities will prepare for the possible consequences of an expiration of journal subscriptions. In case this happens researchers will still be able to publish in Elsevier journals. They will also have access to back issues of these journals. New issues of Elsevier journals as of 1-1-2015 will not be accessible anymore.
I assume that this means that post-cancellation access will be provided by Elsevier directly, rather than by an archiving service. The government and the Dutch research funder have expressed support for VSNU's position.
This stand by the Dutch is commendable; the outcome will be very interesting. In a related development, if my marginal French is not misleading me, a new law in Germany allows authors of publicly funded research to make their accepted manuscripts freely available 1 year after initial publication. Both stand in direct contrast to the French "negotiation" with Elsevier:
France may not have any money left for its universities but it does have money for academic publishers. While university presidents learn that their funding is to be reduced by EUR 400 million, the Ministry of Research has decided, under great secrecy, to pay EUR 172 million to the world leader in scientific publishing Elsevier.
We’re all awash in technological innovation. It can be a challenge to know what new tools are likely to have staying power — and what that might mean for libraries. The recently published Top Technologies Every Librarian Needs to Know highlights a selected set of technologies that are just starting to emerge and describes how libraries might adapt them in the next few years.
In this webinar, join the authors of three chapters from the book as they talk about their technologies and what they mean for libraries.
Hands-Free Augmented Reality: Impacting the Library Future Presenters: Brigitte Bell & Terry Cottrell
Based on the recent surge of interest in head-mounted augmented reality devices such as the 3D gaming console Oculus Rift and Google’s Glass project, it seems reasonable to expect that the implementation of hands-free augmented reality technology will become common practice in libraries within the next 3-5 years.
The Future of Cloud-Based Library Systems Presenters: Elliot Polak & Steven Bowers
In libraries, cloud computing technology can reduce the costs and human capital associated with maintaining a 24/7 Integrated Library System while facilitating an up-time that is costly to attain in-house. Cloud-Based Integrated Library Systems can leverage a shared system environment, allowing libraries to share metadata records and other system resources while maintaining independent local information, which reduces redundant workflows and yields efficiencies for cataloging/metadata and acquisitions departments.
Library Discovery: From Ponds to Streams Presenter: Ken Varnum
Rather than exploring focused ponds of specialized databases, researchers now swim in oceans of information. What is needed is neither ponds (too small in our interdisciplinary world) nor oceans (too broad and deep for most needs), but streams: dynamic, context-aware subsets of the whole, tailored to the researcher’s short- or long-term interests.
Webinar Fees are:
LITA Member: $39
Register Online now to join us for what is sure to be an excellent and informative webinar.
Open Knowledge and Code for Africa are pleased to announce the launch of our pilot Open Government Fellowship programme. The six month programme seeks to empower the next generation of leaders in the field of open government.
We are looking for candidates that fit the following profile:
Currently engaged in the open government and/or related communities. We are looking to support individuals already actively participating in the open government community
Understands the role of civil society and citizen based organisations in bringing about positive change through advocacy and campaigning
Understands the role and importance of monitoring government commitments on open data as well as on other open government policy related issues
Has facilitation skills and enjoys community-building (both online and offline).
Is eager to learn from and be connected with an international community of open government experts, advocates and campaigners
Currently living and working in Africa. Due to limited resources and our desire to develop a focused and impactful pilot programme, we are limiting applications to those currently living and working in Africa. We hope to expand the programme to the rest of the world starting in 2015.
The primary objective of the Open Government Fellowship programme is to identify, train and support the next generation of open government advocates and community builders. As you will see in the selection criteria, the most heavily weighted item is current engagement in the open government movement at the local, national and/or international level.
Selected candidates will be part of a six-month fellowship pilot programme where we expect you to work with us for an average of six days a month, including attending online and offline trainings, organising events, and being an active member of the Open Knowledge and Code for Africa communities.
Fellows will be expected to produce tangible outcomes during their fellowship, but what these outcomes are will be up to the fellows to determine. In the application, we ask fellows to describe their vision for their fellowship or, to put it another way, to lay out what they would like to accomplish. We could imagine fellows working with a specific government department or agency to make a key dataset available, used and useful by the community or organising a series of events addressing a specific topic or challenge citizens are currently facing. We do not wish to be prescriptive; there are countless possibilities for outcomes for the fellowship, but successful candidates will demonstrate a vision that has clear, tangible outcomes.
To support fellows in achieving these outcomes, all fellows will receive a stipend of $1,000 per month in addition to a project grant of $3,000 to spend over the course of their fellowship. Finally, a travel stipend is available for each fellow for national and/or international travel related to furthering the objective of their fellowship.
There are up to 3 fellowship positions open for the February to July 2015 pilot programme. Due to resourcing, we will only be accepting fellowship applications from individuals living and working in Africa. Furthermore, in order to ensure that we are able to provide fellows with strong local support during the pilot phase, we are targeting applicants from the following countries where Code for Africa and/or Open Knowledge already have existing networks: Angola, Burkina Faso, Cameroon, Ghana, Kenya, Morocco, Mozambique, Mauritius, Namibia, Nigeria, Rwanda, South Africa, Senegal, Tunisia, Tanzania, and Uganda. We are hoping to roll out the programme in other regions in autumn 2015. If you are interested in the fellowship but not currently located in one of the target countries, please get in touch.
Do you have questions? See more about the Fellowship Programme here and have a look at this Frequently Asked Questions (FAQ) page. If this doesn’t answer your question, email us at Katelyn[dot]Rogers[at]okfn.org
PeerLibrary’s groups and collections functionality is especially suited towards educators running classes that involve reading and discussing various academic publications. This week we would like to highlight one such collection, created for a graduate level computer science class taught by Professor John Kubiatowicz at UC Berkeley. The course, Advanced Topics in Computer Systems, requires weekly readings which are handily stored on the PeerLibrary platform for students to read, discuss, and collaborate outside of the typical classroom setting. Articles within the collection come from a variety of sources, such as the publicly available “Key Range Locking Strategies” and the closed access “ARIES: A Transaction Recovery Method”. Even closed access articles, which hide the article from unauthorized users, allow users to view the comments and annotations!
Gates Foundation to require immediate free access for journal articles
By Jocelyn Kaiser 21 November 2014 1:30 pm
Breaking new ground for the open-access movement, the Bill & Melinda Gates Foundation, a major funder of global health research, plans to require that the researchers it funds publish only in immediate open-access journals.
The policy doesn’t kick in until January 2017; until then, grantees can publish in subscription-based journals as long as their paper is freely available within 12 months. But after that, the journal must be open access, meaning papers are free for anyone to read immediately upon publication. Articles must also be published with a license that allows anyone to freely reuse and distribute the material. And the underlying data must be freely available.
Is this going to work? Will researchers be able to comply with these requirements without harm to their careers? Does the Gates Foundation fund enough research that new open access venues will open up to publish this research (and if so how will their operation be funded?), or do sufficient venues already exist? Will Gates Foundation grants include funding for “gold” open access fees?
I am interested to find out. I hope this article is accurate about what they're doing, and am glad they are doing it if so.
The Gates Foundation’s own announcement appears to be here, and their policy, which doesn’t answer very many questions but does seem to be bold and without wiggle-room, is here.
I note that the policy mentions “including any underlying data sets.” Do they really mean to be saying that underlying data sets used for all publications “funded, in whole or in part, by the foundation” must be published? I hope so. Requiring “underlying data sets” to be available at all is in some ways just as big as, or bigger than, requiring them to be available open access.
Join BitCurator users from around the globe for a hands-on day focused on current use and future development of the BitCurator digital software environment. Hosted by the BitCurator Consortium (BCC), this event will be grounded in the practical, boots-on-the-ground experiences of digital archivists and curators. Come wrestle with current challenges—engage in disc image format debates, investigate emerging BitCurator integrations and workflows, and discuss the “now what” of handling your digital forensics outputs.
Slate recently published a series of maps illustrating the languages other than English spoken in each of the fifty US states. In nearly every state, the most commonly spoken non-English language was Spanish. But when Spanish is excluded as well as English, a much more diverse – and sometimes surprising – landscape of languages is revealed, including Tagalog in California, Vietnamese in Oklahoma, and Portuguese in Massachusetts.
Public library collections often reflect the attributes and interests of the communities in which they are embedded. So we might expect that public library collections in a given state will include relatively high quantities of materials published in the languages most commonly spoken by residents of the state. We can put this hypothesis to the test by examining data from WorldCat, the world’s largest bibliographic database.
WorldCat contains bibliographic data on more than 300 million titles held by thousands of libraries worldwide. For our purposes, we can filter WorldCat down to the materials held by US public libraries, which can then be divided into fifty “buckets” representing the materials held by public libraries in each state. By examining the contents of each bucket, we can determine the most common language other than English found within the collections of public libraries in each state:
MAP 1: Most common language other than English found in public library collections, by state
As with the Slate findings regarding spoken languages, we find that in nearly every state, the most common non-English language in public library collections is Spanish. There are exceptions: French is the most common non-English language in public library collections in Massachusetts, Maine, Rhode Island, and Vermont, while German prevails in Ohio. The results for Maine and Vermont complement Slate’s finding that French is the most commonly spoken non-English language in those states – probably a consequence of Maine and Vermont’s shared borders with French-speaking Canada. The prominence of German-language materials in Ohio public libraries correlates with the fact that Ohio’s largest ancestry group is German, accounting for more than a quarter of the state’s population.
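The bucketing step described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the sample holdings, and the function name are assumptions made for the example, and do not reflect WorldCat's data format or the tooling used for this analysis.

```python
from collections import Counter, defaultdict

# Hypothetical sample holdings: one (state, language) pair per title held.
holdings = [
    ("ME", "English"), ("ME", "French"), ("ME", "French"), ("ME", "Spanish"),
    ("OH", "English"), ("OH", "German"), ("OH", "German"), ("OH", "Spanish"),
    ("TX", "English"), ("TX", "Spanish"), ("TX", "Spanish"), ("TX", "Vietnamese"),
]


def top_language_by_state(records, exclude=("English",)):
    """Return {state: most common language}, ignoring excluded languages."""
    buckets = defaultdict(Counter)
    for state, language in records:
        if language not in exclude:
            buckets[state][language] += 1
    return {state: counts.most_common(1)[0][0] for state, counts in buckets.items()}


# Most common language other than English in each state's bucket.
result = top_language_by_state(holdings)
```

Passing `exclude=("English", "Spanish")` to the same function gives the second view, in which Spanish is set aside as well.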
Following Slate’s example, we can look for more diverse language patterns by identifying the most common language other than English and Spanish in each state’s public library collections:
MAP 2: Most common language other than English and Spanish found in public library collections, by state
Excluding both English- and Spanish-language materials reveals a more diverse distribution of languages across the states. But only a bit more diverse: French now predominates, representing the most common language other than English and Spanish in public library collections in 32 of the 50 states. Moreover, we find only limited correlation with Slate’s findings regarding spoken languages. In some states, the most common non-English, non-Spanish spoken language does match the most common non-English, non-Spanish language in public library collections – for example, Polish in Illinois, Chinese in New York, and German in Wisconsin. But only about a quarter of the states (12) match in this way; the majority do not. Why is this so? Perhaps materials published in certain languages have low availability in the US, are costly to acquire, or both. Maybe other priorities drive collecting activity in non-English materials – for example, a need to collect materials in languages that are commonly taught in primary, secondary, and post-secondary education, such as French, Spanish, or German.
Or perhaps a ranking of languages by simple counts of materials is not the right metric. Another way to assess if a state’s public libraries tailor their collections to the languages commonly spoken by state residents is to compare collections across states. If a language is commonly spoken among residents of a particular state, we might expect that public libraries in that state will collect more materials in that language compared to other states, even if the sum total of that collecting activity is not sufficient to rank the language among the state’s most commonly collected languages (for reasons such as those mentioned above). And indeed, for a handful of states, this metric works well: for example, the most commonly spoken language in Florida after English and Spanish is French Creole, which ranks as the 38th most common language collected by public libraries in the state. But Florida ranks first among all states in the total number of French Creole-language materials held by public libraries.
But here we run into another problem: the great disparity in size, population, and ultimately, number of public libraries, across the states. While a state’s public libraries may collect heavily in a particular language relative to other languages, this may not be enough to earn a high national ranking in terms of the raw number of materials collected in that language. A large, populous state, by sheer weight of numbers, may eclipse a small state’s collecting activity in a particular language, even if the large state’s holdings in the language are proportionately less compared to the smaller state. For example, California – the largest state in the US by population – ranks first in total public library holdings of Tagalog-language materials; Tagalog is California’s most commonly spoken language after English and Spanish. But surveying the languages appearing in Map 2 (that is, those that are the most commonly spoken language other than English and Spanish in at least one state), it turns out that California also ranks first in total public library holdings for Arabic, Chinese, Dakota, French, Italian, Korean, Portuguese, Russian, and Vietnamese.
To control for this “large state problem”, we can abandon absolute totals as a benchmark, and instead compare the ranking of a particular language in the collections of a state’s public libraries to the average ranking for that language across all states (more specifically, those states that have public library holdings in that language). We would expect that states with a significant population speaking the language in question would have a state-wide ranking for that language that exceeds the national average. For example, Vietnamese is the most commonly spoken language in Texas other than English and Spanish. Vietnamese ranks fourth (by total number of materials) among all languages appearing in Texas public library collections; the average ranking for Vietnamese across all states that have collected materials in that language is thirteen. As we noted above, California has the most Vietnamese-language materials in its public library collections, but Vietnamese ranks only eighth in that state.
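A minimal sketch of this comparison follows, assuming per-state counts of materials by language are already in hand. The counts, the state selection, and the function names are invented for illustration and are not the figures behind the analysis.

```python
from collections import Counter

# Hypothetical counts of materials per language in each state's public libraries.
state_counts = {
    "TX": Counter({"English": 900, "Spanish": 300, "French": 50, "Vietnamese": 40, "German": 30}),
    "CA": Counter({"English": 2000, "Spanish": 800, "Chinese": 200, "French": 150, "Vietnamese": 100}),
    "OH": Counter({"English": 700, "German": 90, "Spanish": 80, "French": 60}),
}


def language_rank(counts, language):
    """1-based rank of a language by count within one state, or None if not held."""
    ordered = [lang for lang, _ in counts.most_common()]
    return ordered.index(language) + 1 if language in ordered else None


def average_rank(all_counts, language):
    """Mean rank across only those states that hold materials in the language."""
    ranks = [language_rank(c, language) for c in all_counts.values()]
    ranks = [r for r in ranks if r is not None]
    return sum(ranks) / len(ranks)


tx_rank = language_rank(state_counts["TX"], "Vietnamese")
national = average_rank(state_counts, "Vietnamese")
exceeds = tx_rank < national  # a smaller rank number means more prominent
```

In this toy data California holds more Vietnamese-language materials than Texas in absolute terms, yet Vietnamese ranks higher in Texas, which is the distinction the ranking comparison is meant to surface.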
Map 3 shows the comparison of the state-wide ranking with the national average for the most commonly spoken language other than English and Spanish in each state:
MAP 3: Comparison of state-wide ranking with national average for most commonly spoken language other than English and Spanish
Now it appears we have stronger evidence that public libraries tend to collect heavily in languages commonly spoken by state residents. In thirty-eight states (colored green), the state-wide ranking of the most commonly spoken language other than English and Spanish in public library collections exceeds – often substantially – the average ranking for that language across all states. For example, the most commonly spoken non-English, non-Spanish language in Alaska – Yupik – is only the 10th most common language found in the collections of Alaska’s public libraries. However, this ranking is well above the national average for Yupik (182nd). In other words, Yupik is considerably more prominent in the materials held by Alaskan public libraries than in the nation at large – in the same way that Yupik is relatively more common as a spoken language in Alaska than elsewhere.
As Map 3 shows, six states (colored orange) exhibit a ranking equal to the national average; in all of these cases the language in question is French or German, languages that tend to be highly collected everywhere (the average ranking for French is four, and for German, five). Five states (colored red) exhibit a ranking that is below the national average; in four of the five cases, the state ranking is only one notch below the national average.
The high correlation between languages commonly spoken in a state, and the languages commonly found within that state’s public library collections suggests that public libraries are not homogenous, but in many ways reflect the characteristics and interests of local communities. It also highlights the important service public libraries provide in facilitating information access to community members who may not speak or read English fluently. Finally, public libraries’ collecting activity across a wide range of non-English language materials suggests the importance of these collections in the context of the broader system-wide library resource. Some non-English language materials in public library collections – perhaps the French Creole-language materials in Florida’s public libraries, or the Yupik-language materials in Alaska’s public libraries – could be rare and potentially valuable items that are not readily available in other parts of the country.
Visit your local public library … you may find some unexpected languages on the shelf.
Acknowledgement: Thanks to OCLC Research colleague JD Shipengrover for creating the maps.
Note on data: Data used in this analysis represent public library collections as they are cataloged in WorldCat. Data is current as of July 2013. Reported results may be impacted by WorldCat’s coverage of public libraries in a particular state.
Multi-Entity Models of Resource Description in the Semantic Web: A comparison of FRBR, RDA, and BIBFRAME by Tom Baker, Karen Coyle, Sean Petiya Published in: Library Hi Tech, v. 32, n. 4, 2014 pp 562-582 DOI:10.1108/LHT-08-2014-0081 Open Access Preprint
The above article was just published in Library Hi Tech. However, because the article is a bit dense, as journal articles tend to be, here is a short description of the topic covered, plus a chance to reply to the article.
We now have a number of multi-level views of bibliographic data. There is the traditional "unit card" view, reflected in MARC, that treats all bibliographic data as a single unit. There is the FRBR four-level model that describes a single "real" item, and three levels of abstraction: manifestation, expression, and work. This is also the view taken by RDA, although employing a different set of properties to define instances of the FRBR classes. Then there is the BIBFRAME model, which has two bibliographic levels, work and instance, with the physical item as an annotation on the instance.
In support of these views we have three RDF-based vocabularies:
The vocabularies use a varying degree of specification. FRBRer is the most detailed and strict, using OWL to define cardinality, domains and ranges, and disjointness between classes and between properties. There are, however, no sub-classes or sub-properties. BIBFRAME properties all are defined in terms of domains (classes), and there are some sub-class and sub-property relationships. RDA has a single set of classes that are derived from the FRBR entities, and each property has the domain of a single class. RDA also has a parallel vocabulary that defines no class relationships; thus, no properties in that vocabulary result in a class entailment. 
As I talked about in the previous blog post on classes, the meaning of classes in RDF is often misunderstood, and that is just the beginning of the confusion that surrounds these new technologies. Recently, Bernard Vatant, who is a creator of the Linked Open Vocabularies site that does a statistical analysis of the existing linked open data vocabularies and how they relate to each other, said this on the LOV Google+ group:
"...it seems that many vocabularies in LOV are either built or used (or both) as constraint and validation vocabularies in closed worlds. Which means often in radical contradiction with their declared semantics."
What Vatant is saying here is that many vocabularies that he observes use RDF in the "wrong way." One of the common "wrong ways" is to interpret the axioms that you can define in RDFS or OWL the same way you would interpret them in, say, XSD, or in a relational database design. In fact, the action of the OWL rules (originally called "constraints," which seems to have contributed to the confusion, now called "axioms") can be entirely counter-intuitive to anyone whose view of data is not formed by something called "description logic (DL)."
A simple demonstration of this, which we use in the article, is the OWL axiom for "maximum cardinality." In a non-DL programming world, you often state that a certain element in your data is limited to the number of times it can be used, such as saying that in a MARC record you can have only one 100 (main author) field. The maximum cardinality of that field is therefore "1". In your non-DL environment, a data creation application will not let you create more than one 100 field; if an application receiving data encounters a record with more than one 100 field, it will signal an error.
The semantic web, in its DL mode, draws an entirely different conclusion. The semantic web has two key principles: open world, and non-unique name. Open world means that whatever the state of the data on the web today, it may be incomplete; there can be unknowns. Therefore, you may say that you MUST have a title for every book, but if a look at your data reveals a book without a title, then your book still has a title, it is just an unknown title. That's pretty startling, but what about that 100 field? You've said that there can only be one, so what happens if there are 2 or 3 or more of them for a book? That's no problem, says OWL: the rule is that there is only one, but the non-unique name rule says that for any "thing" there can be more than one name for it. So when an OWL program encounters multiple author 100 fields, it concludes that these are all different names for the same one thing, as defined by the combination of the non-unique name assumption and the maximum cardinality rule: "There can only be one, so these three must really be different names for that one." It's a bit like Alice in Wonderland, but there's science behind it.
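The contrast between the two readings of "maximum cardinality 1" can be made concrete with a toy sketch in Python. It is not a reasoner and uses no OWL library; the function names and the author labels are hypothetical, and the strings stand in for resource identifiers rather than literal values.

```python
def closed_world_check(author_fields, max_cardinality=1):
    """Validation view: too many values is an error in the record."""
    if len(author_fields) > max_cardinality:
        raise ValueError(
            "record has %d main author fields, at most %d allowed"
            % (len(author_fields), max_cardinality)
        )
    return author_fields


def open_world_infer(author_fields):
    """Inference view for a maximum cardinality of 1: multiple values are not
    rejected; they are taken to be different names for the same one thing."""
    names = sorted(set(author_fields))
    # Every pair of distinct names is inferred to denote the same individual.
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]


fields = ["Twain, Mark", "Clemens, Samuel", "Clemens, S. L."]

try:
    closed_world_check(fields)
    closed_result = "accepted"
except ValueError:
    closed_result = "rejected"

# Pairs of names that the open-world reading treats as the same thing.
same_as = open_world_infer(fields)
```

The closed-world check rejects the record, while the open-world function returns three "same as" pairs and raises nothing. A real reasoner would report a problem only if the data also stated that those values are different individuals.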
What you have in your database today is a closed world, where you define what is right and wrong; where you can enforce the rule that required elements absolutely HAVE TO be there; where the forbidden is not allowed to happen. The semantic web standards are designed for the open world of the web where no one has that kind of control. Think of it this way: what if you put a document onto the open web for anyone to read, but wanted to prevent anyone from linking to it? You can't. The links that others create are beyond your control. The semantic web was developed around the idea of a web (aka a giant graph) of data. You can put your data up there or not, but once it's there it is subject to the open functionality of the web. And the standards of RDFS and OWL, which are the current standards that one uses to define semantic web data, are designed specifically for that rather chaotic information ecosystem, where, as the third main principle of the semantic web states, "anyone can say anything about anything."
I have a lot of thoughts about this conflict between the open world of the semantic web and the needs for closed world controls over data; in particular whether it really makes sense to use the same technology for both, since there is such a strong incompatibility in underlying logic of these two premises. As Vatant implies, many people creating RDF data are doing so with their minds firmly set in closed world rules, such that the actual result of applying the axioms of OWL and RDF on this data on the open web will not yield the expected closed world results.
This is what Baker, Petiya and I address in our paper, as we create examples from FRBRer, RDA in RDF, and BIBFRAME. Some of the results there will probably surprise you. If you doubt our conclusions, visit the site http://lod-lam.slis.kent.edu/wemi-rdf/ that gives more information about the tests, the data and the test results.
 "Entailment" means that the property does not carry with it any "classness" that would thus indicate that the resource is an instance of that class.
 Programs that interpret the OWL axioms are called "reasoners". There are a number of different reasoner programs available that you can call from your software, such as Pellet, Hermit, and others built into software packages like TopBraid.
If you are interested in an update about where/how to get the data after reading this see here.
Much has been written about the significance of Twitter as the recent events in Ferguson echoed round the Web, the country, and the world. I happened to be at the Society of American Archivists meeting 5 days after Michael Brown was killed. During our panel discussion someone asked about the role that archivists should play in documenting the event.
There was wide agreement that Ferguson was a painful reminder of the type of event that archivists working to “interrogate the role of power, ethics, and regulation in information systems” should be documenting. But what to do? Unfortunately we didn’t have time to really discuss exactly how this agreement translated into action.
Fortunately the very next day the Archive-It service run by the Internet Archive announced that they were collecting seed URLs for a Web archive related to Ferguson. It was only then, after also having finally read Zeynep Tufekci’s terrific Medium post, that I slapped myself on the forehead … of course, we should try to archive the tweets. Ideally there would be a “we” but the reality was it was just “me”. Still, it seemed worth seeing how much I could get done.
I had some previous experience archiving tweets related to Aaron Swartz using Twitter’s search API. (Full disclosure: I also worked on the Twitter archiving project at the Library of Congress, but did not use any of that code or data then, or now.) I wrote a small Python command line program named twarc (a portmanteau for Twitter Archive), to help manage the archiving.
You give twarc a search query term, and it will plod through the search results, in reverse chronological order (the order that they are returned in), while handling quota limits, and writing out line-oriented-json, where each line is a complete tweet. It worked quite well to collect 630,000 tweets mentioning “aaronsw”, but I was starting late out of the gate, 6 days after the events in Ferguson began. One downside to twarc is it is completely dependent on Twitter’s search API, which only returns results for the past week or so. You can search back further in Twitter’s Web app, but that seems to be a privileged client. I can’t seem to convince the API to keep going back in time past a week or so.
So time was of the essence. I started up twarc searching for all tweets that mention ferguson, but quickly realized that the volume of tweets, and the order of the search results meant that I wouldn’t be able to retrieve the earliest tweets. So I tried to guesstimate a Twitter ID far enough back in time to use with twarc’s --max_id parameter to limit the initial query to tweets before that point in time. Doing this I was able to get back to 2014-08-10 22:44:43, but most of August 9th and 10th had slipped out of the window. I used a similar technique of guessing an ID further in the future in combination with the --since_id parameter to start collecting from where that snapshot left off. This resulted in a bit of a fragmented record, which you can see visualized (sort of) below:
In the end I collected 13,480,000 tweets (63G of JSON) between August 10th and August 27th. There were some gaps because of mismanagement of twarc, and the data just moving too fast for me to recover from them: most of August 13th is missing, as well as part of August 22nd. I’ll know better next time how to manage this higher volume collection.
Apart from the data, a nice side effect of this work is that I fixed a socket timeout error in twarc that I hadn’t noticed before. I also refactored it a bit so I could use it programmatically like a library instead of only as a command line tool. This allowed me to write a program to archive the tweets, incrementing the max_id and since_id values automatically. The longer continuous crawls near the end are the result of using twarc more as a library from another program.
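For the curious, the automatic stepping of max_id looks roughly like the sketch below. This is not twarc's actual code: search is a hypothetical callable standing in for one search API request, and fake_search is an in-memory stand-in so the loop can be tried without credentials. It assumes the usual search semantics, where max_id is inclusive and since_id is exclusive.

```python
def walk_back(search, since_id=None, max_id=None):
    """Yield tweets newest-first, paging backwards until a page comes back empty.

    `search` is a hypothetical callable taking since_id and max_id keyword
    arguments and returning a list of tweet dicts, newest first.
    """
    while True:
        page = search(since_id=since_id, max_id=max_id)
        if not page:
            return
        for tweet in page:
            yield tweet
        # max_id is inclusive, so the next request starts just below
        # the oldest tweet seen so far
        max_id = min(t["id"] for t in page) - 1


def fake_search(since_id=None, max_id=None, page_size=3):
    """In-memory stand-in for a search endpoint holding tweets with ids 1..10."""
    hits = [i for i in range(10, 0, -1)
            if (max_id is None or i <= max_id)
            and (since_id is None or i > since_id)]
    return [{"id": i} for i in hits[:page_size]]


if __name__ == "__main__":
    print([t["id"] for t in walk_back(fake_search, since_id=4, max_id=8)])
```

A real crawler also has to sleep when the rate limit is hit and retry on network errors, which this sketch leaves out.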
Bag of Tweets
To try to arrange/package the data a bit I decided to order all the tweets by tweet id, and split them up into gzipped files of 1 million tweets each. Sorting 13 million tweets was pretty easy using leveldb. I first loaded all 16 million tweets into the db, using the tweet id as the key, and the JSON string as the value.
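import fileinput, json, leveldb

db = leveldb.LevelDB('./tweets.db')
# key each tweet by its id so leveldb keeps the tweets in id order
for line in fileinput.input():
    tweet = json.loads(line)
    db.Put(tweet['id_str'], line)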
This took almost 2 hours on a medium ec2 instance. Then I walked the leveldb index, writing out the JSON as I went, which took 35 minutes:
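import sys
import leveldb

db = leveldb.LevelDB('./tweets.db')
# keys come back in sorted order, so the output is ordered by tweet id
for k, v in db.RangeIter(None, include_value=True):
    sys.stdout.write(v)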
After splitting them up into 1 million line files with split and gzipping them, I put them in a Bag and uploaded it to s3 (8.5G).
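The splitting and gzipping step can also be done in one pass from Python. This is only a sketch of the idea, not the commands I used; the function name and the tweets-NNNN.json.gz file naming are made up for illustration.

```python
import gzip
import itertools
import os


def write_chunks(lines, out_dir, chunk_size=1000000):
    """Write an iterable of newline-terminated JSON lines into numbered
    gzip files holding at most chunk_size lines each; return the paths."""
    paths = []
    it = iter(lines)
    for n in itertools.count(1):
        chunk = itertools.islice(it, chunk_size)
        first = next(chunk, None)
        if first is None:  # input exhausted, nothing left to write
            break
        path = os.path.join(out_dir, "tweets-%04d.json.gz" % n)
        with gzip.open(path, "wt", encoding="utf-8") as out:
            out.write(first)
            out.writelines(chunk)  # streams the rest of this chunk
        paths.append(path)
    return paths
```

Because each chunk is streamed straight to disk, memory use stays flat regardless of how large the sorted input is.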
I am planning on trying to extract URLs from the tweets to try to come up with a list of seed URLs for the Archive-It crawl. If you have ideas of how to use it definitely get in touch. I haven’t decided yet if/where to host the data publicly. If you have ideas please get in touch about that too!
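A first pass at the URL extraction might look like the sketch below. It assumes each line is a tweet in the search API's JSON shape, where links appear under entities.urls with an expanded_url field; the function name is hypothetical, and shortened links would still need to be unshortened before they make good crawl seeds.

```python
import json


def seed_urls(lines):
    """Yield each distinct URL found in the tweets, in first-seen order."""
    seen = set()
    for line in lines:
        tweet = json.loads(line)
        for entry in tweet.get("entities", {}).get("urls", []):
            # prefer the expanded form, fall back to the t.co link
            url = entry.get("expanded_url") or entry.get("url")
            if url and url not in seen:
                seen.add(url)
                yield url
```

Fed with fileinput.input(), it could read the same line-oriented JSON files that twarc writes.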
What technology are you watching on the horizon? Have you seen brilliant ideas that need exposing? Do you really like sharing with your LITA colleagues?
The LITA Top Tech Trends Committee is trying a new process this year and issuing a Call for Panelists. Answer the short questionnaire by 12/10 to be considered. Fresh faces and diverse panelists are especially encouraged to respond. Past presentations can be viewed at http://www.ala.org/lita/ttt.
Help preserve our shared heritage, increase funding for conservation, and strengthen collections care by completing the Heritage Health Information (HHI) 2014 National Collections Care Survey. The HHI 2014 is a national survey on the condition of collections held by archives, libraries, historical societies, museums, scientific research collections, and archaeological repositories. It is the only comprehensive survey to collect data on the condition and preservation needs of our nation’s collections.
The deadline for the Heritage Health Information 2014: A National Collections Care Survey is December 19, 2014. In October, the Heritage Health Information sent invitations to the directors of over 14,000 collecting institutions across the country to participate in the survey. These invitations included personalized login information, which may be entered at hhi2014.com.
Questions about the survey may be directed to hhi2014survey [at] heritagepreservation [dot] org or 202-233-0824.
The session offers information on laws, legal resources and legal reference practices. Participants will learn how to handle a law reference interview, including where to draw the line between information and advice, key legal vocabulary and citation formats. During the webinar, leaders offer tips on how to assess and choose legal resources for patrons.
Catherine McGuire is the head of Reference and Outreach at the Maryland State Law Library. McGuire currently plans and presents educational programs to Judiciary staff, local attorneys, public library staff and members of the public on subjects related to legal research and reference. She serves as Vice Chair of the Conference of Maryland Court Law Library Directors and the co-chair of the Education Committee of the Legal Information Services to the Public Special Interest Section (LISP-SIS) of the American Association of Law Libraries (AALL).
It was a foregone conclusion that once we launched this series, we would be featuring FJM sooner rather than later, but it happens that we're visiting them just as they have launched a new collection: La saga Fernández-Shaw y el teatro lírico, containing three archives of a family of Spanish playwrights. This collection is also a great example of why we love this site: innovative browsing tools such as a timeline viewer, carefully curated collections spanning a wide variety of object types living side-by-side (the Knowledge Portal approach really makes this work), and seamless multi-language support.
FJM was also highlighted by D-LIB Magazine this month, as their Featured Digital Collection, a well-deserved honour that explores their collections and past projects in greater depth.
Curious about the code behind this repo? FJM has been kind enough to share the details of a number of their initial collections on GitHub. Since they take the approach of using .NET for the web interface instead of using Drupal, the FJM .Net Library may also prove useful to anyone exploring alternate front-ends for their own collections.
Our Show and tell interview was completed by Luis Martínez Uribe, who will be joining us at Islandora Camp in Madrid as an instructor in the Admin Track in May 2015.
What is the primary purpose of your repository? Who is the intended audience?
We have always said that more than a technical system, the FJM digital repository tries to bring in a new working culture. Since the Islandora deployment, the repository has been instrumental in transforming the way in which data is generated and looked after across the organization. Thus the main purpose behind our repository philosophy is to take an active approach to ensure that our organizational data is managed using appropriate standards, made available via knowledge portals and preserved for future access.
The contents are highly heterogeneous with materials from the departments of Art, Music, Conferences, a Library of Spanish Music and Theatre as well as various outputs from scientific centres and scholarships. Therefore the audience ranges from the general public interested in particular art exhibitions, concerts or lectures to highly specialised researchers in fields such as theatre, sociology or biology.
Why did you choose Islandora?
Back in 2010 the FJM was looking for a robust and flexible repository framework to manage an increasing volume of interrelated digital materials. With preservation in mind, the other most important aspect was the capacity to create complex models to accommodate relations between diverse types of content from multiple sources such as databases, the library catalogue, etc. Islandora provided the flexibility of Fedora plus easy customization powered by Drupal. Furthermore, discoverygarden could kick start us with their services and having Mark Leggott leading the project provided us with the confidence that our library needs and setting would be well understood.
Which modules or solution packs are most important to your repository?
In our latest collections we mostly use Drupal for prototyping. For this reason modules such as the Islandora Solr Client, the PDF Solution Pack or the Book Module are rather useful components to help us test and correct our collections once ingested and before the web layer is deployed.
What feature of your repository are you most proud of?
We like to be able to present the information through easy to grasp visualizations and have used timelines and maps in the past. In addition to this, we have started exploring the use of recommendation systems that, once an object is selected, suggest other materials of interest. This has been used in production in “All our art catalogues since 1973”.
Who built/developed/designed your repository (i.e., who was on the team?)
After that, the Library and IT Services undertook the development of a small and simple collection of essays to then move into a more complex product like the Personal Library of Cortazar that required more advanced work from web programmers and designers.
In the last year, we have developed a .NET library that allows us to interact with the Islandora components such as Fedora, Solr or RISearch. Since then we have undertaken more complex interdepartmental ventures like the collection “All our art catalogues since 1973” where Library, IT and the web team have worked with colleagues in other departments such as digitisation, art and design.
Do you have plans to expand your site in the future?
The knowledge portals developed using Islandora have been well received both internally and externally with many visitors. We plan to expand the collections with many more materials as well as using the repository to host the authority index and the thesaurus collections for the FJM. This will continue our work to ensure that the FJM digital materials are managed, connected and preserved.
What is your favourite object in your collection to show off?
This is a hard one, but if we have to choose our favourite object we would probably choose a resource like The Avant-Garde Applied (1890-1950) art catalogue. The catalogue is presented with different photos of the spine and back cover, with other editions and related catalogues, with a responsive web design and a multi-device progressive loading viewer.
Our thanks to Luis and to FJM for agreeing to this feature. To learn more about their approach to Islandora, you can query the source by attending Islandora Camp EU2.
In honor of Thanksgiving, I’d like to give thanks for 5 tech tools that make life as a librarian much easier.
On any given day I work on at least 6 different computers and tablets. That means I need instant access to my documents wherever I go and without cloud storage I’d be lost. While there are plenty of other free file hosting services, I like Drive the most because it offers 15GB of free storage and it’s incredibly easy to use. When I’m working with patrons who already have a Gmail account, setting up Drive is just a click away.
I dabbled in Goodreads for a bit, but I must say, Libib has won me over. Libib lets you catalog your personal library and share your favorite media with others. While it doesn’t handle images quite as well as Goodreads, I much prefer Libib’s sleek and modern interface. Instead of cataloging books that I own, I’m currently using Libib to create a list of my favorite children’s books to recommend to patrons.
Hopscotch is my favorite iOS app right now. With Hopscotch, you can learn the fundamentals of coding through play. The app is marketed towards kids, but I think the bubbly characters and lighthearted nature appeal to adults too. I’m using Hopscotch in an upcoming adult program at the library to show that coding can be quirky and fun. If you want to use Hopscotch at your library, check out their resources for teachers. They’ve got fantastic ready-made lesson plans for the taking.
My love affair with Photoshop started many years ago, but as I’ve gotten older, Illustrator and I have become a much better match. I use Illustrator to create flyers, posters, and templates for computer class handouts. The best thing about Illustrator is that it’s designed for working with vector graphics. That means I can easily translate a design for a 6-inch bookmark into a 6-foot poster without losing image quality.
Twitter is hands-down my social network of choice. My account is purely for library-related stuff and I know I can count on Twitter to pick me up and get me inspired when I’m running out of steam. Thanks to all the libraries and librarians who keep me going!
What tech tools are you thankful for? Please share in the comments!
When Boston Public Library first designed its statewide digitization service plan as an LSTA-funded grant project in 2010, we offered free imaging to any institution that agreed to make their digitized collections available through the Digital Commonwealth repository and portal system. We hoped and suggested that money not spent by our partners on scanning might then be invested in the other side of any good digital object – descriptive metadata. We envisioned a resurgence of special collections cataloging in libraries, archives, and historical societies across Massachusetts.
After a couple of years, reality set in. Most of our partners did not have the resources to generate good descriptive records structured well enough to fit into our MODS application profile without major oversight and intervention on our part. What we did find, however, were some very dedicated and knowledgeable local historians, librarians, and archivists who maintained a variety of documentation that could be best described as “pre-metadata.” Their local landscapes included inventories, spreadsheets, caption files, finding aids, catalog cards, sleeve inscriptions, dusty three-ring binders – the rich soil from which good metadata grows.
We understood it was now our job to cultivate and harvest metadata from these local sources. And thus the “Metadata Mob” was born. It is a fun and creative type of mob — less roughneck and more spontaneous dance routine. Except, instead of wildly cavorting to Do-Re-Mi in train stations, we cut-and-paste, we transcribe, we script, we spell check, we authorize, we regularize, we refine, we edit, and we enhance. It is a highly customized, hands-on process that differs slightly (or significantly) from collection to collection, institution to institution.
In many ways, the work Boston Public Library does has come to resemble the locally-sourced food movement in that we focus on how each community understands and represents their collections in their own unique way. Free-range metadata, so to speak, that we unearth after plowing through the annals of our partners.
Randall Harrow, 1870-1900. Boston Public Library via Digital Commonwealth.
We don’t impose our structures or processes on anyone beyond offering advice on some standard information science principles – the three major “food groups” of metadata as it were – well defined schema, authority control, and content standard compliance. We encourage our partners to maintain their local practices.
We then carefully nurture their information into healthy, juicy, and delicious metadata records that we can ingest into the Digital Commonwealth repository. We have all encountered online resources with weak and frail frames — malnourished with a few inconsistently used Dublin Core fields and factory-farmed values imported blindly from collection records or poorly conceived legacy projects. Our mob members eschew this technique. They are craftsmen, artisans, information viticulturists. If digital library systems are nourished by the metadata they ingest, then ours will be kept vigorous and healthy with the rich diet they have produced.
Thanks to SEMAP for use of their logo in the header image. Check out SEMAP’s very informative website at semaponline.org. Buy Fresh, Buy Local! Photo credit: Lori De Santis.
With the DSpace 5 release coming up, we wanted to make it easier for aspiring developers to get up and running with DSpace development. In our experience, starting off on the right foot with a proven set of tools and practices can reduce someone’s learning curve and help in quickly getting to initial results. IntelliJ IDEA 13, the integrated development environment by JetBrains, can make a developer’s life a lot easier thanks to a truckload of features that are not included in your run-of-the-mill text editor.
By Michele Mennielli, International Relations, Cineca
Bologna, Italy: During the recent euroCRIS Strategic Membership Meeting held in Amsterdam November 11-13, Cineca had the opportunity to present a new version of DSpace-CRIS with DSpace 4.2. This version of DSpace-CRIS will be released in the next few days.
Before you read this post, be aware that this web page is sharing your usage with Google, Facebook, StatCounter.com, unglue.it and Harlequin.com. Google because this is Blogger. Facebook because there's a "Like" button, StatCounter because I use it to measure usage, and Harlequin because I embedded the cover for Rebecca Avery's Maid to Crave directly from Harlequin's website. Harlequin's web server has been sent the address of this page along with your IP address as part of the HTTP transaction that fetches the image, which, to be clear, is not a picture of me.
I'm pretty sure that having read the first paragraph, you're now able to give informed consent if I try to sell you a book (see unglue.it embed -->) and constitute myself as a book service for the purposes of a New Jersey "Reader Privacy Act", currently awaiting Governor Christie's signature. (Update Nov 22: Gov. Christie has conditionally vetoed the bill.) That act would make it unlawful to share information about your book use (borrowing, downloading, buying, reading, etc.) with a third party, in the absence of a court order to do so. That's good for your reading privacy, but a real problem for almost anyone running a commercial "book service".
Let's use Maid to Crave as an example. When you click on the link, your browser first sends a request to Harlequin.com. Using the instructions in the returned HTML, it then sends requests to a bunch of web servers to build the web page, complete with images, reviews and buy links. Here's the list of hosts contacted as my browser builds that page:
seal.verisign.com (A security company)
www.goodreads.com (The review comes from GoodReads. They're owned by Amazon.)
stats.g.doubleclick.net (Doubleclick is an advertising network owned by Google)
cdn.gigya.com (Gigya’s Consumer Identity Management platform helps businesses identify consumers across any device, achieve a single customer view by collecting and consolidating profile and activity data, and tap into first-party data to reach customers with more personalized marketing messaging.)
www.facebook.com (I'm told this is a social network)
fbstatic-a.akamaihd.net (Akamai is here helping to distribute facebook content)
platform.twitter.com (yet another social network)
edge.quantserve.com (QuantCast is an "audience research and behavioural advertising company")
All of these servers are given my IP address and the URL of the Harlequin page that I'm viewing. All of these companies except Verisign, Norton and Akamai also set tracking cookies that enable them to connect my browsing of the Harlequin site with my activity all over the web. The Guardian has a nice overview of these companies that track your use of the web. Most of them exist to better target ads at you. So don't be surprised if, once you've visited Harlequin, Amazon tries to sell you romance novels.
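If you want to try this on a page yourself, here's a rough sketch of how a list like the one above can be approximated from the returned HTML. It is not how I built my list (I watched what my browser actually requested), and a static parse like this misses anything that scripts load later. The class and function names are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceHosts(HTMLParser):
    """Collect the hostnames of resources a page tells the browser to fetch."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script", "iframe"):
            url = attrs.get("src")
        elif tag == "link":
            url = attrs.get("href")
        else:
            url = None  # plain <a> links are not fetched automatically
        if url:
            # resolve relative and protocol-relative URLs against the page
            host = urlparse(urljoin(self.page_url, url)).hostname
            if host:
                self.hosts.add(host)


def third_party_hosts(page_url, html):
    """Return the sorted hostnames, other than the page's own, found in html."""
    parser = ResourceHosts(page_url)
    parser.feed(html)
    first_party = urlparse(page_url).hostname
    return sorted(h for h in parser.hosts if h != first_party)
```

Every host this returns gets your IP address and, usually via the Referer header, the address of the page you were reading.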
Certainly Harlequin qualifies as a commercial book service under the New Jersey law. And certainly Harlequin is giving personal information (IP addresses are personal information under the law) to a bunch of private entities without a court order. And most certainly it is doing so without informed consent. So its website is doing things that will be unlawful under the New Jersey law.
But it's not alone. Almost any online bookseller uses services like those used by Harlequin. Even Amazon, which is pretty much self contained, has to send your personal information to Ingram to fulfill many of the book orders sent to it. Under the New Jersey law, it appears that Amazon will need to get your informed consent to have Ingram send you a book. And really, do I care? Does this improve my reading privacy?
The companies that can ignore this law are Apple, Target, Walmart and the like. Book services are exempt if they derive less than 2% of their US consumer revenue from books. So yay Apple.
Lord knows we need some basic rules about privacy of our reading behavior. But I think the New Jersey law does a lousy job of dealing with the realities of today's internet. I wonder if we'll ever start a real discussion about what and when things should be private on the web.
Nate Hoffelder over at The Digital Reader highlighted the passage of a new "Reader Privacy Act" passed by the New Jersey State Legislature. If signed by Governor Chris Christie it would take effect immediately. It was sponsored by my state senator, Nia Gill.
In light of my writing about privacy on library websites, this poorly drafted bill, though well intentioned, would turn my library's website into a law-breaker, subject to a $500 civil fine for every user. (It would also require us to make some minor changes at Unglue.it.)
It defines "personal information" as "(1) any information that identifies, relates to, describes, or is associated with a particular user's use of a book service; (2) a unique identifier or Internet Protocol address, when that identifier or address is used to identify, relate to, describe, or be associated with a particular user, as related to the user’s use of a book service, or book, in whole or in partial form; (3) any information that relates to, or is capable of being associated with, a particular book service user’s access to a book service."
“Provider” means any commercial entity offering a book service to the public.
A provider shall only disclose the personal information of a book service user [...] to a person or private entity pursuant to a court order in a pending action brought by [...] by the person or private entity.
Any book service user aggrieved by a violation of this act may recover, in a civil action, $500 per violation and the costs of the action together with reasonable attorneys’ fees.
My library, Montclair Public Library, uses a web catalog run by Polaris, a division of Innovative Interfaces, a private entity, for BCCLS, a consortium serving northern New Jersey. Whenever I browse a catalog entry in this catalog, a cookie is set by AddThis (and probably other companies) identifying me and the web page I'm looking at. In other words, personal information as defined by the act is sent to a private entity, without a court order.
And so every user of the catalog could sue Innovative for $500 each, plus legal fees.
Existing library privacy laws in NJ have reasonable exceptions for "proper operations of the library". This law does not have a similar exemption.
I urge Governor Christie to veto the bill and send it back to the legislature for improvements that take account of the realities of library websites and make it easier for internet bookstores and libraries to operate legally in the Garden State.
You can contact Gov. Christie's office using this form.
Update: Just talked to one of Nia Gill's staff; they're looking into it. Also updated to include the 2nd set of amendments.
Update 2: A close reading of the California law on which the NJ statute was based reveals that poor wording in section 4 is the source of the problem. In the California law, it's clear that it pertains only to the situation where a private entity is seeking discovery in a legal action, not when the private entity is somehow involved in providing the service.
Where the NJ law reads
A provider shall only disclose the personal information of a book service user to a government entity, other than a law enforcement entity, or to a person or private entity pursuant to a court order in a pending action brought by the government entity or by the person or private entity.
it's meant to read
In a pending action brought by the government entity other than a law enforcement entity, or by a person or by a private entity, a provider shall only disclose the personal information of a book service user to such entity or person pursuant to a court order.
According to New Jersey Governor Chris Christie's conditional veto statement, "Citizens of this State should be permitted to read what they choose without unnecessary government intrusion." It's hard to argue with that! Personally, I think we should also be permitted to read what we choose without corporate surveillance.
As previously reported in The Digital Reader, the bill passed in September by wide margins in both houses of the New Jersey State Legislature and would have codified the right to read ebooks without letting the government and everybody else know about it.
I wrote about some problems I saw with the bill. Based on a California law focused on law enforcement, the proposed NJ law added civil penalties on booksellers who disclosed the personal information of users without a court order. As I understood it, the bill could have prevented online booksellers from participating in ad networks (they all do!).
Governor Christie's veto statement pointed out more problems. The proposed law didn't explicitly prevent the government from asking for personal reading data, it just made it against the law for a bookseller to comply. So, for example, a local sheriff could still ask Amazon for a list of people in his town reading an incriminating book. If Amazon answered, somehow the reader would have to:
find out that Amazon had provided the information
sue Amazon for $500.
Another problem identified by Christie was that the proposed law imposed privacy burdens on booksellers stronger than those on libraries. Under another law, library records in New Jersey are subject to subpoena, but bookseller records wouldn't be. That's just bizarre.
In New Jersey, a governor can issue a "Conditional Veto". In doing so, the governor outlines changes in a bill that would allow it to become law. Christie's revisions to the Reader Privacy Act make the following changes:
The civil penalties are stripped out of the bill. This allows Gov. Christie to position himself and NJ as "business-friendly".
A requirement is added preventing the government from asking for reader information without a court order or subpoena. Christie gets to be on the side of liberty. Yay!
It's made clear that the law applies only to government snooping, and not to promiscuous data sharing with ad networks. Christie avoids the ire of rich ad network moguls.
Child porn is carved out of the definition of "books". Being tough on child pornography is one of those politically courageous positions that all politicians love.
The resulting bill, which was quickly reintroduced in the State Assembly, is stronger but narrower. It wouldn't apply in situations like the recent Adobe Digital Editions privacy breach, but it should be more effective at stopping "unnecessary government intrusion". I expect it will quickly pass the Legislature and be signed into law. A law that properly addresses the surveillance of ebook reading by private companies will be much more complicated and difficult to achieve.
I'm not a fan of his by any means, but Chris Christie's version of the Reader Privacy Act is a solid step in the right direction and would be an excellent model for other states. We could use a law like it on the national level as well.
As some of you already know, Marlene and I are moving from Seattle to Atlanta in December. We’ve moved many (too many?) times before, so we’ve got most of the logistics down pat. Movers: hired! New house: rented! Mail forwarding: set up! Physical books: still too dang many!
We could do it in our sleep! (And the scary thing is, perhaps we have in the past.)
One thing that is different this time is that we’ll be driving across the country, visiting friends along the way. 3,650 miles, one car, two drivers, one Keurig, two suitcases, two sets of electronic paraphernalia, and three cats.
Who wants to lay odds on how many miles it will take each day for the cats to lose their voices?
Fortunately Sophia is already testing the cats’ accommodations:
I will miss the friends we made in Seattle, the summer weather, the great restaurants, being able to walk down to the water, and decent public transportation. I will also miss the drives up to Vancouver for conferences with a great bunch of librarians; I’m looking forward to attending Code4Lib BC next week, but I’m sorry that our personal tradition of American Thanksgiving in British Columbia is coming to an end.
As far as Atlanta is concerned, I am looking forward to being back in MPOW’s office, having better access to a variety of good barbecue, the winter weather, and living in an area with less de facto segregation.
It’s been a good two years in the Pacific Northwest, but much to my surprise, I’ve found that the prospect of moving back to Atlanta feels a bit like a homecoming. So, onward!
As the Ebola outbreak continues, the public must sort through all of the information being disseminated via the news media and social media. In this rapidly evolving environment, librarians are providing valuable services to their communities as they assist their users in finding credible information sources on Ebola, as well as other infectious diseases.
On Tuesday, December 12, 2014, library leaders from the U.S. National Library of Medicine will host the free webinar “Ebola and Other Infectious Diseases: The Latest Information from the National Library of Medicine.” As a follow-up to the webinar they presented in October, librarians from the U.S. National Library of Medicine will be discussing how to provide effective services in this environment, as well as providing an update on information sources that can be of assistance to librarians.
Siobhan Champ-Blackwell is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center. Champ-Blackwell selects material to be added to the NLM disaster medicine grey literature data base and is responsible for the Center’s social media efforts. Champ-Blackwell has over 10 years of experience in providing training on NLM products and resources.
Elizabeth Norton is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center where she has been working to improve online access to disaster health information for the disaster medicine and public health workforce. Norton has presented on this topic at national and international association meetings and has provided training on disaster health information resources to first responders, educators, and librarians working with the disaster response and public health preparedness communities.
As faculty and students delve into digital scholarly works, they are tripping over the kinds of challenges that libraries specialize in overcoming, such as questions regarding digital project planning, improving discovery or using quality metadata. Indeed, nobody is better suited to helping scholars with their decisions regarding how to organize and deliver their digital works than librarians.
At my institution, we have not marketed our expertise in any meaningful way (yet), but we receive regular requests for help from faculty and campus organizations who are struggling with publishing digital scholarship. For example, a few years ago a team of librarians at my library helped researchers from the University of Ireland at Galway to migrate and restructure their online collection of annotations from the Vatican Archive to a more stable home on Omeka.net. Our expertise in metadata standards, OAI harvesting, digital collection platforms and digital project planning turned out to be invaluable to saving their dying collection and giving it a stable, long-term home. You can read more in my Saved by the Cloud post.
These kinds of requests have continued since. In recognition of this growing need, we are poised to launch a digital consultancy service on our campus.
Digital Project Planning
A core component of our jobs is planning digital projects. Over the past year, in fact, we’ve developed a standard project planning template that we apply to each digital project that comes our way. This has done wonders at keeping us all up to date on what stage each project is in and who is up next in terms of the workflow.
Researchers are often experts at planning out their papers, but they don’t normally have much experience with planning a digital project. For example, because metadata and preservation are things that normally don’t come up for them, they overlook planning around these aspects. And more generally, I’ve found that just having a template to work with can help them understand how the experts do digital projects and give them a sense of the issues they need to consider when planning their own projects, whether that’s building an online exhibit or organizing their selected works in ways that will reap the biggest bang for the buck.
We intend to begin formally offering project planning help to faculty very soon.
It’s also our job to keep abreast of the various technologies available for distributing digital content, whether that is harvesting protocols, web content management systems, new plugins for WordPress or digital humanities exhibit platforms. Sometimes researchers know about some of these, but in my experience, their first choice is not necessarily the best for what they want to do.
It is fairly common for me to meet with campus partners who have an existing collection online, but one that has been published on a platform ill-suited to what they are trying to accomplish. Currently, we have many departments moving old content based in SQL databases to plain HTML pages with no database behind them whatsoever. When I show them some of the other options, such as our Digital Commons-based institutional repository or Omeka.net, they often state they had no idea that such options existed and are very excited to work with us.
I think people in general are becoming more aware of metadata, but there are still lots of technical considerations that your typical researcher may not be aware of. At our library, we have helped out with all aspects of metadata. We have helped them clean up their data to conform to authorized terms and standard vocabularies. We have explained Dublin Core. We have helped re-encode their data so that diacritics display online. We have done crosswalking and harvesting. It’s a deep area of knowledge and one that few people outside of libraries know on a suitably deep level.
One recommendation for any budding metadata consultants that I would share is that you really need to be the Carl Sagan of metadata. This is pretty technical stuff and most people don’t need all the details. Stick to discussing the final outcome and not the technical details and your help will be far more understood and appreciated. For example, I once presented to a room of researchers on all the technical fixes to a database that we made to enhance and standardize the metadata, but this went over terribly. People later came up to me and joked that whatever it was we did, they’re sure it was important and thanked us for being there. I guess that was a good outcome since they acknowledged our contribution. But it would have been better had they understood the practical benefits for the collection and users of that content.
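As a rough illustration of what crosswalking involves, the sketch below maps a record from a made-up local schema onto Dublin Core element names. The local field names and the mapping table are assumptions invented for this example, not the schema of any collection mentioned here; only the Dublin Core element names are standard. It also normalizes values to Unicode NFC, which is one common step (not the only one) toward getting diacritics to display consistently.

```python
import unicodedata

# Hypothetical mapping from a local schema to Dublin Core elements.
# The left-hand field names are invented for illustration.
CROSSWALK = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Keywords": "dc:subject",
    "Date Created": "dc:date",
    "Summary": "dc:description",
}


def crosswalk(record):
    """Return a Dublin Core version of a local record.

    Fields with no mapping are dropped. Values are trimmed and
    normalized to Unicode NFC so that composed and decomposed forms
    of accented characters compare and display the same way.
    """
    out = {}
    for field, value in record.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue  # unmapped local field
        cleaned = unicodedata.normalize("NFC", value.strip())
        out.setdefault(element, []).append(cleaned)
    return out


local = {
    "Title": "  Annotations on a manuscript ",
    "Author": "Ni\u0301 Bhriain, Ma\u0301ire",  # decomposed accents
    "Shelf": "B-12",  # no Dublin Core equivalent in this mapping
}
dc = crosswalk(local)
print(dc)
```

A real crosswalk would also have to decide what to do with unmapped fields (here they are silently dropped) and with repeatable values, which is where most of the consulting conversation tends to happen.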
Search Engine Optimization is not hard, but it is likely that few people outside of the online marketing and web design world know what it is. I often find people can understand it very quickly if you simply define it as “helping Google understand your content so it can help people find you.” Simple SEO tricks like defining and then using keywords in your headers will do wonders for your collection’s visibility in the major search engines. But you can go deep with this stuff too, so I like to gauge my audience’s appetite for this stuff and then provide them with as much detail as I think they have an appetite for.
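For readers who do have the appetite, here is a minimal sketch of the "keywords in your headers" check, using only Python's standard `html.parser` module. The sample page and keyword list are made up, and the helper name `missing_keywords` is hypothetical. It says nothing about how any search engine actually ranks pages; it only reports which of your chosen keywords appear in no heading.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text inside <h1>, <h2> and <h3> elements."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self._open = 0  # number of heading tags currently open
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._open += 1
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._open:
            self._open -= 1

    def handle_data(self, data):
        # Text inside inline tags such as <em> still counts as heading text.
        if self._open:
            self.headings[-1] += data


def missing_keywords(html, keywords):
    """Return the keywords that appear in no heading (case-insensitive)."""
    parser = HeadingCollector()
    parser.feed(html)
    text = " ".join(parser.headings).lower()
    return [k for k in keywords if k.lower() not in text]


page = """
<h1>Vatican Archive Annotations</h1>
<p>An online collection of manuscript notes.</p>
<h2>Browse the <em>collection</em></h2>
"""
print(missing_keywords(page, ["annotations", "collection", "manuscript"]))
```

Here "manuscript" appears only in body text, so it is the one keyword reported as missing from the headings.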
It’s a sad statement on the state of libraries, but the real discovery game is in the major search engines…not in our siloed, boutique search interfaces. Most people begin their searches (whether academic or not) in Google and this is really bad news for our digital collections since by and large, library collections are indexed in the deep web, beyond the reach of the search robots.
I recently tried a search for the title of a digital image in one of our collections in Google.com and found it. Yeah! Now I tried the same search in Google Images. No dice.
More librarians are coming to terms with this discovery problem now and we need to share this with digital scholars as they begin considering their own online collections so that they don’t make the mistakes libraries made (and continue to make…sigh) with our own collections.
We had one department at my institution that was sitting on a print journal that they were considering putting online. Behind this was a desire to bring the publication back to life since they had been told by one researcher in Europe that she thought the journal had been discontinued years ago. In fact, it was still being published; it just wasn’t being indexed in Google. We offered our repository as an excellent place to do so, especially because it would increase their visibility worldwide. Unfortunately, they opted for a very small, non-profit online publisher whose content we demonstrated was not surfacing in Google or Google Scholar. Well, you can lead a horse to water…
Still, I think this kind of understanding of the discovery universe does resonate with many. Going back to our somewhat invisible digital images, we will be pushing many to social media like Flickr with the expectation that this will boost visibility in the image search engines (and social networks) and drive more traffic to our digital collections.
This one is a tough one because people often come with pre-conceived notions of how they want their content organized or the site designed. For this reason, sometimes usability advice does not go over well. But for those instances when our experiences with user studies and information architecture can influence a digital scholarship project, it’s time well spent. In fact, I often hear people remark that they “never thought of it that way” and they’re willing to try some of the expert advice that we have to share.
Such advice includes things like:
Best practices for writing for the web
Principles of information architecture
User Experience design
It’s fitting to end on marketing. This is usually the final step in any digital project and one that often gets dropped. And yet, why do all the work of creating a digital collection only to let it go unnoticed? As digital project experts, librarians are familiar with the various channels available to promote collections and build followers, with tools like social networking sites, blogs and the like.
With our own digital projects, we discuss marketing at the very beginning so we are sure all the hooks, timing and planning considerations are understood by everyone. In fact, marketing strategy will impact some of the features of your exhibit, your choice of keywords used to help SEO, the ultimate deadlines that you set for completion and the staffing time you know you’ll need post launch to keep the buzz buzzing.
Most importantly, though, marketing plans can greatly influence the decision for which platform to use. For example, one of the benefits of Omeka.net (rather than self-hosted Omeka) is that any collection hosted with them becomes part of a network of other digital collections, boosting the potential for serendipitous discovery. I often urge faculty to opt for our Digital Commons repository over, say, their personal website, because anything they place in DC gets aggregated into the larger DC universe and has built-in marketing tools like email subscriptions and RSS feeds.
The bottom line here is that marketing is an area where librarians can shine. Online marketing of digital collections really pulls together all of the other forms of expertise that we can offer (our understanding of metadata, web technology and social networks) to fulfill the aim of every digital project: to reach other people and teach them something.
Steve Hetzler of IBM gave a talk at the recent Storage Valley Supper Club on a new, scale-free metric for evaluating storage performance that he calls "Touch Rate". He defines this as the proportion of the store's total content that can be accessed per unit time. This leads to some very illuminating graphs that I discuss below the fold.
Steve's basic graph is a log-log plot with performance increasing up and to the right. Response time for accessing an object (think latency) decreases to the right on the X-axis and the touch rate, the proportion of the total capacity that can be accessed by random reads in a year (think bandwidth) increases on the Y-axis. For example, a touch rate of 100/yr means that random reads could access the entire contents 100 times a year. He divides the graph into regions suited to different applications, with minimum requirements for response time and touch rate. So, for example, transaction processing requires response times below 10ms and touch rates above 100 (the average object is accessed about once every 3 days).
The touch rate depends on the size of the objects being accessed. If you take a specific storage medium, you can use its specifications to draw a curve on the graph as the size varies. Here Steve uses "capacity disk" (i.e. commodity 3.5" SATA drives) to show the typical curve, which varies from being bandwidth limited (for large objects on the left, horizontal side) to being response limited (for small objects on the right, vertical side).
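To make the shape of that curve concrete, here is a minimal sketch based on my reading of the definition above: response time is a fixed positioning delay plus transfer time, and touch rate is the fraction of capacity that back-to-back random reads of a given size could cover in a year. The formula is an assumption drawn from the description, not Steve's own model, and the drive figures are round numbers chosen for illustration rather than values from any data sheet.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def response_time(object_bytes, access_s, bandwidth_bps):
    """Seconds to fetch one object: positioning delay plus transfer time."""
    return access_s + object_bytes / bandwidth_bps


def touch_rate(object_bytes, capacity_bytes, access_s, bandwidth_bps):
    """Fraction of total capacity readable per year by back-to-back random reads."""
    reads_per_year = SECONDS_PER_YEAR / response_time(
        object_bytes, access_s, bandwidth_bps
    )
    return reads_per_year * object_bytes / capacity_bytes


# Assumed round numbers for a "capacity disk"; not taken from any data sheet.
CAPACITY = 4e12    # 4 TB
ACCESS = 0.012     # 12 ms seek plus rotational latency
BANDWIDTH = 150e6  # 150 MB/s sustained transfer

for size in (4e3, 1e6, 1e9, 1e12):
    rt = response_time(size, ACCESS, BANDWIDTH)
    tr = touch_rate(size, CAPACITY, ACCESS, BANDWIDTH)
    print(f"{size:10.0e} B  response {rt:10.3f} s  touch rate {tr:10.2f}/yr")
```

Under these assumptions, small objects sit near the fixed access time (the vertical, response-limited side), while large objects approach a ceiling of bandwidth times seconds per year divided by capacity (the horizontal, bandwidth-limited side).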
As an example of the use of these graphs, Steve analyzed the idea of MAID (Massive Array of Idle Drives). He used HGST MegaScale DC 4000.B SATA drives, and assumed that at any time 10% of them would be spun-up and the rest would be in standby. With random accesses to data objects, 9 out of 10 of them will encounter a 15sec spin-up delay, which sets the response time limit. Fully powering-down the drives as Facebook's cold storage does would save more power but increase the spin-up time to 20s. The system provides only (actually somewhat less than) 10% of the bandwidth per unit content, which sets the touch rate limit.
Then Steve looked at the fine print of the drive specifications. He found two significant restrictions:
The drives have a life-time limit of 50K start/stop cycles.
For reasons that are totally opaque, the drives are limited to a total transfer of 180TB/yr.
Applying these gives this modified graph. The 180TB/yr limit is the horizontal line, reducing the touch rate for large objects. If the drives have a 4-year life, we would need 8M start/stop cycles to achieve a 15sec response time. But we only have 50K. To stay within this limit, the response time has to increase by a factor of 8M/50K, or 160, which is the vertical line. So in fact a traditional MAID system is effective only in the region below the horizontal line and left of the vertical line, much smaller than expected.
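The start/stop arithmetic can be checked with a few lines of Python. This is a sketch under one assumption of mine: that in the worst case a drive is spun up and down once per response-time interval for its whole service life. The 4-year life, 15-second spin-up and 50K cycle limit come from the text above.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

drive_life_years = 4
spin_up_s = 15        # spin-up delay quoted in the text
cycle_limit = 50_000  # lifetime start/stop cycles from the drive specification

# Assumed worst case: one start/stop cycle per response-time interval
# over the whole service life of the drive.
cycles_needed = drive_life_years * SECONDS_PER_YEAR / spin_up_s

# Factor by which the response time must grow to stay within the limit.
stretch = cycles_needed / cycle_limit
effective_response_s = spin_up_s * stretch

print(f"cycles needed:  {cycles_needed:,.0f}")
print(f"stretch factor: {stretch:.0f}")
print(f"response time:  {effective_response_s / 60:.0f} minutes")
```

Unrounded, this gives about 8.4 million cycles and a factor of roughly 168, which the text rounds to 8M and 160. Either way, the effective response time moves from seconds to something on the order of 40 minutes.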
This analysis suggests that traditional MAID is not significantly better than tapes in a robot. Here, for example, Steve examines configurations varying from one tape drive for 1600 LTO6 tapes, or 4PB per drive, to a quite unrealistically expensive 1 drive per 10 tapes, or 60TB per drive. Tape drives have a 120K lifetime load/unload cycle limit, and the tapes can withstand at most 260 full-file passes, so tape has a similar pair of horizontal and vertical lines.
The reason that Facebook's disk-based cold storage doesn't suffer from the same limits as traditional MAID is that it isn't doing random I/O. Facebook's system schedules I/Os so that it uses the full bandwidth of the disk array, raising the touch rate limit to that of the drives, and reducing the number of start-stop cycles. Admittedly, the response time for a random data object is now a worst-case 7 times the time for which a group of drives is active, but this is not a critical parameter for Facebook's application.
Steve's metric seems to be a major contribution to the analysis of storage systems.
I presented a version of this talk at the 2014 Futurebook Conference in London, England. They also kindly featured me in the program. Thank you to The Bookseller for a wonderful conference filled with innovation and intelligent people!
After the Reformation (when all the books in Oxford were burned), Sir Thomas Bodley decided to create a place where people could go and access all the world’s information at their fingertips, for free.
“What does that sound like?” she asked. “…the Internet?”
While this is a lovely conceit, the part of the story that resonated with me for this talk is the other big change that Bodley made, which was to work with publishers, who were largely a monopoly at that point, to fill his library for free by turning the library into a copyright library. While this seemed antithetical to the ways that publishers worked, in giving a copy of their very expensive books away, they left an indelible and permanent mark on the face of human knowledge. It was not only preservation, but self-preservation.
Bodley was what people nowadays would probably call “an innovator” and maybe even in the parlance of my field, a “community manager.”
By thinking outside of the scheme of how publishing works, he joined together with a group of skeptics and created one of the greatest knowledge repositories in the world, one that still exists more than 400 years later. This speaks to a few issues:
Sharing economies, community, and publishing should and do go hand in hand and have since the birth of libraries. By stepping outside of traditional models, you are creating a world filled with limitless knowledge and crafting it in new and unexpected ways.
The bound manuscript is one of the most enduring technologies. This story remains relevant because books are still books and people are still reading them.
At the same time, things are definitely changing. For the most part, books and manuscripts were pretty much identifiable as books and manuscripts for the past 1000 years.
But what if I were to give Google Maps to a 16th Century Map Maker? Or what if I were to show Joseph Pulitzer Medium? Or what if I were to hand Gutenberg a Kindle? Or Project Gutenberg for that matter? What if I were to explain to Thomas Bodley how I shared the new Lena Dunham book with a friend by sending her the file instead of actually handing her the physical book? What if I were to try to explain Lena Dunham?
These innovations have all taken place within the last twenty years, and I would argue that we haven’t even scratched the surface in terms of the innovations that are to come.
We need to accept that the future of the printed word may vary from words on paper to an ereader or computer in 500 years, but I want to emphasize that in the 500 years to come, it will more likely vary from the ereader to a giant question mark.
International literacy rates have risen rapidly over the past 100 years and companies are scrambling to be the first to reach what they call “developing markets” in terms of connectivity. In the vein of Mark Surman’s talk at the Mozilla Festival this year, I will instead call these economies post-colonial economies.
Because we (as people of the book) are fundamentally idealists who believe that the printed word can change lives, we need to be engaged with rethinking the printed word in a way that recognizes power structures and does not settle for the limited choices that the corporate Internet provides (think Facebook vs WhatsApp). This is not a panacea for the world’s ills.
In the Atlantic last year, Phil Nichols wrote an excellent piece that paralleled Web literacy and early 20th century literacy movements. The dualities between “connected” and “non-connected,” he writes, impose the same kinds of binaries and blind cure-all for social ills that the “literacy” movement imposed in the early 20th century. In equating “connectedness” with opportunity, we are “hiding an ideology that is rooted in social control.”
Surman, who is director of the Mozilla Foundation, claims that the Web, which had so much potential to become a free and open virtual meeting place for communities, has started to resemble a shopping mall. While I can go there and meet with my friends, it’s still controlled by cameras that are watching my every move and its sole motive is to get me to buy things.
How do you envision a fully connected world? How do you envision a fully literate world? How can we empower a new generation of connected communities to become learners rather than consumers?
I’m not one of these technology nuts who’s going to argue that books are going to somehow leave their containers and become networked floating apparatuses, and I’m not going to argue that the ereader is a significantly different vessel than the physical book.
I’m also not going to argue that we’re going to have a world of people who are only Web literate and not reading books in twenty years. To make any kind of future prediction would be a false prophecy, elitist, and perhaps dangerous.
Although I don’t know what the printed word will look like in the next 500 years,
I want to take a moment to think outside the book,
to think outside traditional publishing models, and to embrace the instantaneousness, randomness, and spontaneity of the Internet as it could be, not as it is now.
One way I want you to embrace the wonderful wide Web is to try to at least partially decouple your social media followers from your community.
Twitter and other forms of social media are certainly a delightful and fun way for communities to communicate and get involved, but your viral campaign, if you have it, is not your community.
True communities of practice are groups of people who come together to think beyond traditional models and innovate within a domain. For a touchstone, a community of practice is something like the Penguin Labs internal innovation center that Tom Weldon spoke about this morning and not like Penguin’s 600,000 followers on Twitter. How can we bring people together to allow for innovation, communication, and creation?
The Internet provides new and unlimited opportunities for community and innovation, but we have to start managing communities and embracing the people we touch as makers rather than simply followers or consumers.
The maker economy is here— participatory content creation has become the norm rather than the exception. You have the potential to reach and mobilize 2.1 billion people and let them tell you what they want, but you have to identify leaders and early adopters and you have to empower them.
How do you recognize the people who create content for you? I don’t mean authors, but instead the ambassadors who want to get involved and stay involved with your brand.
I want to ask you, in the spirit of innovation from the edges:
What is your next platform for radical participation? How are you enabling your community to bring you to the next level? How can you differentiate your brand and make every single person you touch psyched to read your content, together? How can you create a community of practice?
Community is conversation. Your users are not your community.
Ask yourself the question Rachel Fershleiser asked when building a community on Tumblr: Are you reaching out to the people who want to hear from you and encouraging them or are you just letting your community be unplanned and organic?
There comes a point where we reach the limit of unplanned organic growth. Know when you reach this limit.
Target, plan, be upbeat, and encourage people to talk to one another without your help and stretch the creativity of your work to the upper limit.
Does this model look different from when you started working in publishing? Good.
As the story of the Bodleian Library illustrated, sometimes a totally crazy idea can be the beginning of an enduring institution.
To repeat, the book is one of the most durable technologies and publishing is one of the most durable industries in history. Its durability has been put to the test more than once, and it will surely be put to the test again. Think of your current concerns as a minor stumbling block in a history filled with success, a history that has documented and shaped the world.
Don’t be afraid of the person who calls you up and says, “I have this crazy idea that may just change the way you work…” While the industry may shift, the printed word will always prevail.
Publishing has been around in some shape or form for 1000 years. Here’s hoping that it’s around for another 1000 more.
(Left to right) ALA Washington Office Executive Director Emily Sheketoff, Jonathan Band, Brandon Butler and Mary Rasenberger.
On Tuesday, November 18th, the American Library Association (ALA) held a panel discussion on recent judicial interpretations of the doctrine of fair use. The discussion, entitled “Too Good to be True: Are the Courts Revolutionizing Fair Use for Education, Research and Libraries?” is the first in a series of information policy discussions to help us chart the way forward as the ongoing digital revolution fundamentally changes the way we access, process and disseminate information. This event took place at Arent Fox, a major Washington, D.C. law firm that generously provided the facility for our use.
These events are part of the ALA Office for Information Technology Policy’s broader Policy Revolution! initiative—an ongoing effort to establish and maintain a national public policy agenda that will amplify the voice of the library community in the policymaking process and position libraries to best serve their patrons in the years ahead.
Tuesday’s event convened three copyright experts to discuss and debate recent developments in digital fair use. The experts (ALA legislative counsel Jonathan Band; American University practitioner-in-residence Brandon Butler; and Authors Guild executive director Mary Rasenberger) engaged in a lively discussion that highlighted some points of agreement and disagreement between librarians and authors.
The library community is a strong proponent of fair use, a flexible copyright exception that enables use of copyrighted works without prior authorization from the rights holder. Fair use can be determined by the consideration of four factors. A number of court decisions issued over the last three years, such as Authors Guild v. HathiTrust, have affirmed the use of copyrighted works by libraries as fair, including the mass digitization of books housed in some research libraries.
Band and Butler disagreed with Rasenberger on several points concerning recent judicial fair use interpretations. Band and Butler described judicial rulings on fair use in disputes like the Google Books case and the HathiTrust case as on-point, and rejected arguments that the reproductions of content at issue in these cases could result in economic injury to authors. Rasenberger, on the other hand, argued that repositories like HathiTrust and Google Books can in fact lead to negative market impacts for authors, and therefore do not represent a fair use.
Rasenberger believes that licensing arrangements should be made between authors and members of the library, academic and research communities who want to reproduce content to which those authors hold the rights. She takes specific issue with judicial interpretations of market harm that require authors to demonstrate proof of a loss of profits, suggesting that such harm can be established by showing that future injury is likely to befall an author as a result of the reproduction of his or her work.
Despite their differences of opinion, the panelists provided those in attendance at Tuesday’s event with some meaningful food for thought, and offered a thorough overview of the ongoing judicial debates over fair use. We were pleased that the Washington Internet Daily published an article “Georgia State Case Highlights Fair Use Disagreement Among Copyright Experts,” on November 20, 2014, about our session. ALA continues to fight for public access to information as these debates play out.
Stay tuned for the next event, planned for early 2015!
Last year, we reached a milestone at Cherry Hill when we moved all of our projects into a managed deployment system. We have talked about Jenkins, one of the tools that we use to manage our workflow, and there has been continued interest in what our "recipe" consists of. Because we are using open source tools, and we think of ourselves as part of the (larger than Drupal) open source community, I want to share a bit more of what we use and how it is stitched together. Our hope is that this helps to spark a larger discussion of the tools others are using, so we can all learn from each other.
Git is a distributed code revision control system. While we could use any revision control system, such as CVS or Subversion (and even though this is a given with most agencies, we strongly suggest you use *some* system over nothing at all), git is fairly easy to use, has great...
In a continuation of our weekly facial hair inspiration (check out last week’s list of Civil War mustached men), we recognize that the “Movember” challenge isn’t easy. Growing an impressive beard or mustache, even for a good cause, can be a struggle. Let us help!
This week: A collection of historic mustache must-haves.
This week we did a guerrilla-style test to see how (or if) people find our subject guides, particularly if they are not in our main listing. We asked “Pretend that someone has told you there is a really great subject guide on the library website about [subject]. What would you do to find it?” We cycled through three different subjects not listed on our main subject guide page: Canadian History, Ottawa, and Homelessness.
Our subject guides use a template created in-house (not LibGuides) and we use Drupal Views and Taxonomy to create our lists. The main subject guide page has an A-Z list, an autocomplete search box, a list of broad subjects (e.g. Arts and Social Sciences) and a list of narrower subjects (e.g. Sociology). The list of every subject guide is on another page. Subject specialists were not sure if users would find guides that didn’t correspond to the narrower subjects (e.g. Sociology of Sport).
The 21 students we saw did all kinds of things to find subject guides. We purposely used the same vocabulary as what is on the site because it wasn’t supposed to be a test about the label “subject guide.” However, less than 30% clicked on the Subject Guides link; the majority used some sort of search.
Here you can see the places people went to on our home page most (highlighted in red), next frequently (in orange) and just once (yellow).
When people used our site search, they had little problem finding the guide (although a typo stymied one person). However, a lot of participants used our Summon search. I think there are a couple of reasons for this:
Students didn’t know what a subject guide was and so looked for guides the way they look for articles, books, etc.
Students think the Summon search box is for everything
Of the 6 students who did click on the Subject Guides link:
2 used broad subjects (and neither was successful with this strategy)
2 used narrow subjects (both were successful)
1 used the A-Z list (with success)
1 used the autocomplete search (with success)
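The counts above are small enough to tally by hand, but a short Python sketch makes the proportions explicit. The figures are the ones reported in this post; the dictionary layout is just one convenient way to hold them.

```python
participants = 21

# Strategy used by each student who clicked the Subject Guides link,
# mapped to (number of students, number who found the guide).
link_strategies = {
    "broad subjects": (2, 0),
    "narrow subjects": (2, 2),
    "A-Z list": (1, 1),
    "autocomplete search": (1, 1),
}

clicked = sum(n for n, _ in link_strategies.values())
found = sum(ok for _, ok in link_strategies.values())

print(f"clicked the link: {clicked}/{participants} = {clicked / participants:.1%}")
print(f"found the guide:  {found}/{clicked} = {found / clicked:.1%}")
```

That works out to 6 of 21 students (28.6%) clicking the link, and 4 of those 6 finding the guide, with both failures coming from the broad subjects list.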
One person thought that she couldn’t possibly find the Ottawa guide under “Subject Guides” because she thought those were only for courses. I found this very interesting because a number of our subject guides do not map directly to courses.
The poor performance of the broad subjects on the subject guide page is an issue and Web Committee will look at how we might address that. Making our site search more forgiving of typos is also going to move up the to-do list. But I think the biggest takeaway is that we really have to figure out how to get our guides indexed in Summon.
Today, the American Library Association (ALA) and its Digital Content Working Group (DCWG) welcomed Simon & Schuster’s announcement that it will allow libraries to opt into the “Buy It Now” program. The publisher began offering all of its ebook titles for library lending nationwide in June 2014, with required participation in the “Buy It Now” merchandising program, which enables library users to directly purchase a title rather than check it out from the library. Simon & Schuster ebooks are available for lending for one year from the date of purchase.
In an ALA statement, ALA President Courtney Young applauded the move:
From the beginning, the ALA has advocated for the broadest and most affordable library access to e-titles, as well as licensing terms that give libraries flexibility to best meet their community needs.
We appreciate that Simon & Schuster is modifying its library ebook program to provide libraries a choice in whether or not to participate in Buy It Now. Providing options like these allow libraries to enable digital access while also respecting local norms or policies. This change also speaks to the importance of sustaining conversations among librarians, publishers, distributors and authors to continue advancing our shared goals of connecting writers and readers.
DCWG Co-Chairs Carolyn Anthony and Erika Linke also commented on the Simon & Schuster announcement:
“We are still in the early days of this digital publishing revolution, and we hope we can co-create solutions that expand access, increase readership and improve exposure for diverse and emerging voices,” said Anthony and Linke. “Many challenges remain including high prices, privacy concerns, and other terms under which ebooks are offered to libraries. We are continuing our discussions with publishers.”
For more library ebook lending news, visit the American Libraries magazine E-Content blog.
In Brief: We’re creating a nonprofit, Library Pipeline, that will operate independently from In the Library with the Lead Pipe, but will have similar and complementary aims: increasing and diversifying professional development; improving strategies and collaboration; fostering more innovation and start-ups, and encouraging LIS-related publishing and publications. In the Library with the Lead Pipe is a platform for ideas; Library Pipeline is a platform for projects.
At In the Library with the Lead Pipe, our goal has been to change libraries, and the world, for the better. It’s on our About page: We improve libraries, professional organizations, and their communities of practice by exploring new ideas, starting conversations, documenting our concerns, and arguing for solutions. Those ideas, conversations, concerns, and solutions are meant to extend beyond libraries and into the societies that libraries serve.
What we want to see is innovation–new ideas and new projects and collaborations. Innovative libraries create better educated citizens and communities with stronger social ties.
Unfortunately, libraries’ current funding structures and the limited professional development options available to librarians make it difficult to introduce innovation at scale. As we started talking about a couple of years ago, in our reader survey and in a subsequent editorial marking our fourth anniversary, we need to extend into other areas, besides publication, in order to achieve our goals. So we’re creating a nonprofit, Library Pipeline, that will operate independently from In the Library with the Lead Pipe, but will have similar and complementary aims.
Library Pipeline is dedicated to supporting structural changes by providing opportunities, funding, and services that improve the library as an institution and librarianship as a profession. In the Library with the Lead Pipe, the journal we started in 2008, is a platform for ideas; Library Pipeline is a platform for projects. Although our mission is provisional until our founding advisory board completes its planning process, we have identified four areas in which modest funding, paired with guidance and collaboration, should lead to significant improvements.
Professional Development
A few initiatives, notably the American Library Association’s Emerging Leaders and Spectrum Scholars programs, increase diversity and provide development opportunities for younger librarians. We intend to expand on these programs by offering scholarships, fellowships, and travel assistance that enable librarians to participate in projects that shift the trajectory of their careers and the libraries where they work.
Collaboration
Organized, diverse groups can solve problems that appear intractable if participants have insufficient time, resources, perspective, or influence. We would support collaborations that last a day, following the hack or camp model, or a year or two, like task forces or working groups.
Start-Ups
We are inspired by incubators and accelerators, primarily Y Combinator and SXSW’s Accelerator. The library and information market, though mostly dormant, could support several dozen for-profit and nonprofit start-ups. The catalyst will be mitigating founders’ downside risk by funding six months of development, getting them quick feedback from representative users, and helping them gain customers or donors.
Publishing
Librarianship will be stronger when its practitioners have as much interest in documenting and serving our own field as we have in supporting the other disciplines and communities we serve. For that to happen, our professional literature must become more compelling, more substantive, and easier to access. We would support existing open access journals as well as restricted journals that wish to become open access, and help promising writers and editors create new publications.
These four areas overlap by design. For example, we envision an incubator for for-profit and nonprofit companies that want to serve libraries. In this example, we would provide funding for a diverse group of library students, professionals, and their partners who want to incorporate, and bring this cohort to a site where they can meet with seasoned librarians and entrepreneurs. After a period of time, perhaps six months, the start-ups would reconvene for a demo day attended by potential investors, partners, donors, and customers.
Founding Advisory Board
We were inspired by the Constellation Model for our formation process, as adapted by the Digital Public Library of America and the National Digital Preservation Alliance (see: “Using Emergence to Take Social Innovation to Scale”). Our first step was identifying a founding advisory board, whose members have agreed to serve a two-year term (July 2014-June 2016), at the end of which the board will be dissolved and replaced with a permanent governing board. During this period, the advisory board will formalize and ratify Library Pipeline’s governance and structure, establish its culture and business model, promote its mission, and define the organizational units that will succeed the advisory board, such as a permanent board of trustees and paid staff.
The members of our founding advisory board are:
Brett Bonfield (co-chair), Director, Collingswood (NJ) Public Library;
Lauren Pressley (co-chair), Director of Learning Environments at Virginia Tech University Libraries;
Mary Abler, Innovation Leadership Resident, Los Angeles Public Library;
Nicole Cooke, Assistant Professor at GSLIS, The University of Illinois;
The board will coordinate activity among, and serve as liaisons to, the volunteers on what we anticipate will eventually be six subcommittees (similar to DPLA’s workstreams). This is going to be a shared effort; the job is too big for ten people. Those six subcommittees and their provisional charges are:
Professional Development within LIS (corresponding to our “Professional Development” area). Provide professional development funding, in the form of scholarships, fellowships, or travel assistance, for librarians or others who are working on behalf of libraries or library organizations, with an emphasis on participation in cross-disciplinary projects or conferences that extend the field of librarianship in new directions and contribute to increased diversity among practitioners and the population we serve.
Strategies for LIS (corresponding to “Collaboration”). Bring together librarians and others who are committed to supporting libraries or library-focused organizations. These gatherings could be in-person or online, could last a day or could take a year, and could be as basic as brainstorming solutions to a timely, significant issue or as directed as developing solutions to a specific problem.
Innovation within LIS (corresponding to “Start-Ups”). Fund and advise library-related for-profit or nonprofit startups that have the potential to help libraries better serve their communities and constituents. We believe this area will be our primary focus, at least initially.
LIS Publications (corresponding to “Publishing”). Fund and advise LIS publications, including In the Library with the Lead Pipe. We could support existing open access journals or restricted journals that wish to become open access, and help promising writers and editors create new publications.
Governance. This may not need to be a permanent subcommittee, though in our formative stages it would be useful to work with people who understand how to create governance structures that provide a foundation that promotes stability and growth.
Sustainability. This would include fundraising, but it also seems to be the logical committee for creating the assessment metrics we need to have in place to ensure that we are fulfilling our commitment to libraries and the people who depend on them.
How Can You Help?
We’re looking for ideas, volunteers, and partners. Contact Brett or Lauren if you want to get involved, or want to share a great idea with us.
For nearly a year and a half, the FCC has been working to update the E-rate program for the digital age. The American Library Association (ALA) has been actively engaged in this effort, submitting comments and writing letters to the FCC and holding meetings with FCC staff and other key E-rate stakeholders.
Our work on E-rate modernization has drawn the attention of several media outlets over the past week, as the FCC prepares to consider an order that we expect will help libraries, from the most populated cities to the most rural areas, meet their broadband capacity and Wi-Fi needs.
ALA was also mentioned in articles from CQ Roll Call and PoliticoPro on Monday.
The new E-rate order is the second in the E-rate modernization proceeding. The FCC approved a first order on July 11th, which focuses on Wi-Fi and internal connections. ALA applauds the FCC for listening to our recommendations throughout the proceeding. Its work reflects an appreciation for all that libraries do to serve community needs related to Education, Employment, Entrepreneurship, Empowerment, and Engagement—the E’s of Libraries.
Open Library will be down from 4:30PM to approximately 5:00PM (PST, UTC/GMT -8 hours) on Thursday, November 20, 2014, due to scheduled hardware maintenance. We’ll post updates here and on Twitter at @openlibrary. Thank you for your cooperation.