Planet Code4Lib

Volunteer to join the LITA AV Club / LITA

You know you always wanted to be part of the cool gang. Well, now is your big chance. Be a part of creating the LITA AV club. Help make videos of important LITA conference presentations like the Top Tech Trends panel and LITA Forum Keynotes. Create the recordings to share these exciting and informative presentations with your LITA colleagues who weren’t able to attend. Earn the undying gratitude of all LITA members.

Sound right up your alley? We’ll need a couple of chief wranglers plus a bunch of hands-on folks. The group can organize via email now, and meet up in San Francisco, say sometime Friday or early Saturday, June 26th and 27th. Details galore to be worked on by all. If you have enough fun you can always turn the Club into a LITA Interest Group and achieve immortality, fame and fortune, or more likely the admiration of your fellow LITAns.

To get started email Mark Beatty at:
I’ll gather names and contacts and create a “space” for you all to play.

Thanks. We can tell you are even cooler now than you were before you read this post.

International Cataloguing Principles, 2015 / Karen Coyle

IFLA is revising the International Cataloguing Principles and asked for input. Although I doubt that it will have an effect, I did write up my comments and send them in. Here's my view of the principles, including their history.

The original ICP dates from 1961 and read like a very condensed set of cataloging rules. [Note: As T Berger points out, this document was entitled "Paris Principles", not ICP.] It was limited to choice and form of entries (personal and corporate authors, titles). It also stated clearly that it applied to alphabetically sequenced catalogs:
The principles here stated apply only to the choice and form of headings and entry words -- i.e. to the principal elements determining the order of entries -- in catalogues of printed books in which entries under authors' names and, where these are inappropriate or insufficient, under the titles of works are combined in one alphabetical sequence.
The basic statement of principles was not particularly different from those stated by Charles Ammi Cutter in 1875.


ICP 1961

Note that the ICP does not include subject access, which was included in Cutter's objectives for the catalog. Somewhere between 1875 and 1961, cataloging became descriptive cataloging only. Cutter's rules did include a fair amount of detail about subject cataloging (in 13 pages, as compared to 23 pages on authors).

The next version of the principles was issued in 2009. This version is intended to be "applicable to online catalogs and beyond." This is a post-FRBR set of principles, and the objectives of the catalog are given in points with headings find, identify, select, obtain and navigate. Of course, the first four are the FRBR user tasks. The fifth one, navigate, as I recall was suggested by Elaine Svenonius and obviously was looked on favorably even though it hasn't been added to the FRBR document, as far as I know.

The statement of functions of the catalog in this 2009 draft is rather long, but the "find" function gives an idea of how the goals of the catalog have changed:

ICP 2009

It's worth pointing out a couple of key changes. The first is the statement "as the result of a search..." The 1961 principles were designed for an alphabetically arranged catalog; this set of principles recognizes that there are searches and search results in online catalogs, and it never mentions alphabetical arrangement. The second is that there is specific reference to relationships, and that these are expected to be searchable along with attributes of the resource. The third is that there is something called "secondary limiting of a search result." This latter appears to reflect the use of facets in search interfaces.

The differences between the 2015 draft of the ICP and this 2009 version are relatively minor. The big jump in thinking takes place between the 1961 version and the 2009 version. My comments (pdf) to the committee are as much about the 2009 version as the 2015 one. I make three points:
    1.  The catalog is a technology, and cataloging is therefore in a close relation to that technology
    Although the ICP talks about "find," etc., it doesn't relate those actions to the form of the "authorized access points." There is no recognition that searching today is primarily on keyword, not on left-anchored strings.

    2. Some catalog functions are provided by the catalog but not by cataloging
    The 2015 ICP includes among its principles that of accessibility of the catalog for all users. Accessibility, however, is primarily a function of the catalog technology, not the content of the catalog data. It also recommends (to my great pleasure) that the catalog data be made available for open access. This is another principle that is not content-based. Equally important is the idea, which is expressed in the 2015 principles under "navigate" as: "... beyond the catalogue, to other catalogues and in non-library contexts." This is clearly a function of the catalog, with the support of the catalog data, but what data serves this function is not mentioned.

    3. Authority control must be extended to all elements that have recognized value for retrieval
    This mainly refers to the inclusion of the elements that serve as limiting facets on retrieved sets. None of the elements listed here are included in the ICP's instructions on "authorized access points," yet these are, indeed, access points. Uncontrolled forms of dates, places, content, carrier, etc., are simply not usable as limits. Yet nowhere in the document is the form of these access points addressed.
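
Points 1 and 3 can be illustrated with a small sketch. The Python below is not drawn from the ICP or from any catalog system. The heading strings, the date values, and the function names `left_anchored`, `keyword_match`, and `normalize_year` are all invented for illustration. It shows, under those assumptions, how a left-anchored match differs from a keyword match, and why uncontrolled date strings split into separate facet values while a controlled form groups them.

```python
from collections import Counter

# Hypothetical heading strings, invented for this illustration.
headings = [
    "Twain, Mark, 1835-1910",
    "Cutter, Charles A. (Charles Ammi), 1837-1903",
    "Svenonius, Elaine",
]


def left_anchored(query, entries):
    """Return entries that begin with the query, as when browsing an alphabetical file."""
    q = query.lower()
    return [e for e in entries if e.lower().startswith(q)]


def keyword_match(query, entries):
    """Return entries that contain every query word somewhere in the string."""
    words = query.lower().split()
    return [e for e in entries if all(w in e.lower() for w in words)]


# Hypothetical uncontrolled date strings as they might be transcribed.
raw_dates = ["1961", "c1961", "[1961]", "1961.", "2009"]


def normalize_year(value):
    """Reduce a transcribed date to a four-digit year, or None if no year is present."""
    digits = "".join(ch for ch in value if ch.isdigit())
    return digits[:4] if len(digits) >= 4 else None


# Facet counts: each uncontrolled form becomes its own facet value,
# while the normalized form groups the four 1961 variants together.
uncontrolled_facets = Counter(raw_dates)
controlled_facets = Counter(normalize_year(d) for d in raw_dates)

print(left_anchored("mark twain", headings))
print(keyword_match("mark twain", headings))
print(uncontrolled_facets)
print(controlled_facets)
```

A user typing "mark twain" finds nothing with the left-anchored match, because the heading is in inverted order, but finds the heading with the keyword match. The facet counts show the same gap for limits: four spellings of one year are four separate limits until a single form is imposed.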

    There is undoubtedly much more that could be said about the principles, but this is what seemed to me to be appropriate to the request for comment on this draft.

      Setting the Right Environment: Remote Staff, Service Provider Participants, and Big-Tent Open Source Communities / Peter Murray

      NOTE! I was asked recently to prepare a 15 minute presentation on lessons learned working with a remote team hosting open source applications. The text of that presentation is below with links added to more information. Photographs are from DPLA and Flickr, and are used under Public Domain or Creative Commons derivatives-okay licenses. Photographs link to their sources.

      Thank you for the opportunity to talk with you today. This is a description of a long-running project at LYRASIS to host open source software on behalf of our members and others in the cultural heritage community. The genesis of this project is member research done at the formation of LYRASIS from SOLINET, PALINET and NELINET. Our membership told us that they wanted the advantages of open source software but did not have the resources within their organization to host it themselves. Our goals were — and still are — to create sustainable technical infrastructure for open source hosting, to provide top-notch support for clients adopting that hosted open source, and to be a conduit through which clients engage in the open source community.

      In the past couple of years, this work has focused on three software packages: the Islandora digital asset system, the ArchivesSpace system for archival finding aids, and most recently the CollectionSpace system for museum objects. Each of these, in sequence, involved new learning and new skills. First was Islandora. For those who are not familiar with Islandora, it is a digital asset system built atop Drupal and the Fedora Commons repository system. It is a powerful stack with a lot of moving parts, and that makes it difficult for organizations to set up. One needs experience in PHP and Drupal, Java servlet engines, SOLR, and Fedora Commons among other components. In our internal team those skills were distributed among several staff members, and we are spread out all over the country: I’m in central Ohio, there is a developer in California, a data specialist in Baltimore, a sysadmin in Buffalo, two support and training staff in Atlanta, and servers in the cloud all over North America.

      Importance of Internal Communication

      Picture: “Zooming along the terminal” by Peter Thoeny

      That first goal I mentioned earlier — creating a sustainable technical architecture — took a lot of work and experimentation for us. All of us had worked in library IT in earlier jobs. Except for our sysadmin, though, none of us had built a hosted service to scale. It was a fast-moving time, with lots of small successes and failures, swapping of infrastructure components, and on-the-fly procedures. It was hard to keep up. We took a page from the Scrum practice and instituted a daily standup meeting. The meeting started at 10:30am eastern time, which got our west coast person up just a little early, and — since we were spread out all over the country — used a group Skype video conference.

      Picture: “the first standup” by Karthik Chandrasekariah

      The morning standups usually took longer than the typical 15 minutes. In addition to everyone’s reports, we shared information about activities with the broader LYRASIS organization as well as informal things about our personal lives — what our kids were doing, our vacation plans, or laughing about the latest internet meme. It was the sort of sharing that would happen naturally when meeting someone at the building entrance or popping a head over a cubicle wall, and that helped cement our social bonds. We kept the Skype window open throughout the day and used the text chat function to post status updates, ask questions of each other, and share links to funny cat pictures. Our use of this internal communication channel has evolved over the years. We no longer have the synchronous video call every morning for our standup; we post our morning reports as chat messages. If we were to hire a new team member, I would make a suggestion to the team that we restart the video calls at least for a brief period to acclimate the new person to the group. We’ve also moved from Skype to Slack — a better tool for capturing, organizing, searching, and integrating our activity chatter. What started out as a suggestion by one of our team members to switch to Slack for the seven of us has grown organically to include about a third of the LYRASIS staff.

      Cover art from "Remote: Office Not Required"

      In their book “Remote: Office Not Required” the founders of 37Signals describe the “virtual water cooler”. They say that the idea is to have a single, permanent chat room where everyone hangs out all day to shoot the breeze, post funny pictures, and generally goof around. They acknowledge that it can also be used to answer questions about work, but its primary function is to provide social cohesion. With a distributed team, initiating communication with someone is an intentional act. It doesn’t happen serendipitously by meeting at a physical water cooler. The solution is to lower the barrier of initiating that communication while still respecting the boundaries people need to get work done.

      How does your core team communicate among itself? How aware are they of what each other are doing? Do they know each other’s strengths and feel comfortable enough to call on each other for help? Do the members share a sense of forward accomplishment with the project as a whole?

      Clear Demarcation of Responsibilities between Hosting Company and Organizational Home

      Picture: “Community” by niallkennedy

      One of the unique things about the open source hosting activity at LYRASIS is that for two of the projects we are also closely paired with organizational homes. Both ArchivesSpace and CollectionSpace have separate staff within LYRASIS that report to their own community boards and have their own financial structure. LYRASIS provides human resource and fiscal administrative services to the organizational homes, and we share resources and expertise among the groups. From the perspective of a client to our services, though, it can seem like the hosting group and the organizational home are one entity. We run into confusion about why we in the hosting group cannot add new features or address bugs in the software. We gently remind our clients that the open source software is bigger than our hosting of it — that there is an organizational home that is advancing the software for all users and not just our hosted clients.

      Picture: “Sony RX1, A User Report” by Justin Kern

      Roles between open source organizations and hosting companies should be clearly defined as well, and the open source organization must help hosting providers make this distinction clear to the provider’s clients as well as self-hosting institutions. For instance, registered service provider agreements could include details for how questions about software functionality are handed off between the hosting provider and the organizational home. I would also include a statement from the registered service provider about the default expectations for when code and documentation will be contributed back to the community’s effort. This would be done in such a way as to give a service provider an avenue to distinguish itself from others while also strengthening the core community values of the project. While there is significant overlap, there are members of ArchivesSpace that are not hosted by LYRASIS and there are clients hosted by LYRASIS that are not members of ArchivesSpace.

      How does your project divide responsibilities between the community and the commercial affiliates? What are the expectations that hosted adopters should have about the roles of support, functionality enhancement, and project governance?

      Empowering the Community

      Picture: "Raise your hand if you're a geek! Keep them up if you don't care!" by colorblindPICASO

      Lastly, one of the clear benefits of developing software as open source is the shared goals of the community participants. Whether someone is a project developer, a self-hosted user of the software, a service provider, or a client hosted by a service provider, everyone wants to see the software thrive and grow. While the LYRASIS hosting service does provide a way for clients to use the functionality of open source software, what we are really aiming to offer is a path for clients to get engaged in the project’s community by removing the technology barriers to hosting. We are selling a service, but sometimes I think the service that we are selling is not necessarily the one that the client is initially looking for. What clients come to us seeking is a way to make use of the functions that they see in the open source software. What we want them to know is how adopting open source software is different. As early as the first inquiry about hosting, we let clients know that the organizational home exists and offer to make an introduction to the project’s community organizer. When a project announces important information, we reflect that information on to our client mailing list. When a client files an issue in the LYRASIS hosting ticket system for an enhancement request, we forward that request to the project but we also urge the client to send the description of their use case through the community channels.

      Picture: “I Want You for U.S. Army” by James Montgomery Flagg

      Maintaining good client support while also gently directing the client into the community’s established channels is a tough balancing act. Some clients get it right away, and become active participants in the community. Others are unable or unwilling to take that leap to participation in the project’s greater community. As a hosting provider we’ve learned to be flexible and supportive wherever the client is on its journey in adopting open source. Open source communities need to be looking for ways a hosted client’s staff — no matter what the level of technical expertise — can participate in the work of the community.

      Do you have low barriers of entry for comments, corrections, and enhancements to the documentation? Is there a pipeline in place for triaging issue reports and comments that both help the initiator and funnel good information into the project’s teams? And is that triaging work valued on par with code contributions? Can you develop a mentoring program that aids new adopters into the project’s mainstream activities?


      Picture: “Open here” by Nick Sherman

      As you can probably tell, I’m a big believer in the open source method of developing the systems and services that our patrons and staff need. I have worked on the adopter side of open source — using DSpace and Fedora Commons and other library-oriented software…to say nothing of more mainstream open source projects. I have worked on the service provider side of open source — making Islandora, ArchivesSpace and CollectionSpace available to organizations that cannot host the software themselves and empowering them to join the community. Through this experience I’ve learned a great deal about how many software projects think, what adopters look for in projects, and what other service providers need to be successful community participants. Balancing the needs of the project, the needs of self-hosted adopters, and the needs of service providers is delicate — but the results are worth it.

      I also believe that by using new technologies and strategies, distributed professionals can build a hosting service that is attractive to clients. We may not be able to stand around a water cooler or conference table, but we can replicate the essence of those environments with tools, policies, and a collaborative attitude. In doing so we have more freedom to hire the staff that make the right fit for our organization, no matter where they are located.

      What I’m Glad I Didn’t Know Upon Graduating / Roy Tennant

      While writing my last post about what I wish I had known upon graduating (from library school), I decided that I wanted to write a companion piece about what I was glad I didn’t know. Perhaps the reason for this will soon become clear. So here we go:

      • You know nothing. No, seriously, you don’t. You know all that time you just spent learning how to connect to Dialog database via a 300-baud acoustic coupler modem? That’s worth 3-5 years, tops. Then it’s toast. What comes next YOU HAVE NO IDEA. So just stop with the anguish, and meet it like a woman.
      • I mean, seriously, YOU KNOW NOTHING. All that stuff you wrote papers about? Gone, too late. Something else is on the horizon, about to mow you down like so much new grass.
      • If you can’t learn constantly, like ALL THE TIME, then you are toast. BITNET, UUNET, bulletin boards, WAIS, Gopher, Hytelnet, Veronica, ALL of these would come and go within a small number of years. You will learn them, forget them, and bury them all within a decade. Have a nice life.
      • You should have taken more cataloging courses. I was more of a public services kind of person, but frankly, public services was completely transformed during the period when we still had the MARC record. So more cataloging courses regarding MARC would have had more staying power than how to give bibliographic instruction talks. I’m sorry to say that I actually would wheel in a book truck with the Reader’s Guide, Biography Index, and other such print volumes to pitch to students who within a year would be lining up to search INFOTRAC instead. Talk about futility.
      • You are being disintermediated. None of us at the time could possibly have predicted Google. I mean not even close. We’re talking about a period before AltaVista, which I would argue was the first web search engine that actually worked well. I lived through the era where we switched from people having to come to us to us meeting people where they are (via mobile, out in the community, etc.). This isn’t a bad thing, but I can assure you it wasn’t what we expected when we graduated in 1986.
      • There are fewer jobs than you think there are. I graduated into a job situation where there were plenty of professionals who graduated ahead of me who already had jobs and weren’t going to give them up for decades. We still have this issue. Only in the last few years have we begun to see the beginning of the tidal wave of Baby Boomer retirements that will open up positions for new librarians.
      • The future jobs will be more technical or more community oriented than you may expect. If you look at today’s jobs, which really are only a trend that started back close to when I graduated, you will see a distinct shift toward two directions: technical positions, whether it be a software developer or a digital archivist, or toward engaging a community like a university-based “embedded librarian” or someone who serves teens at a public library. The point is that either you are public-facing or you are highly skilled in new technical requirements.
      • Personal connections are much more important than a degree. When you obtain your degree you may think that you are done, that you have punched your card and you are good. In reality, it is only just the beginning. What you really need are connections to others. You need to know people who also know you. That is why I am so focused on mentoring young librarians. Young librarians need to focus on building networks of both peers and potential mentors. Who can help you be successful? Who can give you a needed recommendation? Seek these people out and make a connection.

      Finally, on a personal note, there is one last thing that I am glad I didn’t know upon graduation. But first I must explain. To get through library school I did the following:

      • Worked 30 hours a week as an Evening/Weekend Circulation Assistant as UC Berkeley staff.
      • Drove 5 1/2 hours each way nearly every weekend to visit my wife working in Northern California (Arcata).
      • Took a one-calendar year full course load MLIS program at UC Berkeley.

      Doing this meant that I would leave on Friday for Arcata with an aching back and barely able to stay awake, having gotten perhaps 5 hours of sleep for five nights straight.

      As it turns out, this was all simply good training for having twins. It’s really remarkable how life turns out sometimes.

      Access YYZ Keynote Speaker: Amy Buckland! / Access Conference

      The Access 2015 Organizing Committee is thrilled to announce that our keynote speaker is Amy Buckland!

      Amy recently moved to the University of Chicago where she is the Institutional Repository Manager. Previously she was Coordinator of Scholarly Communications at McGill University Library, where she was responsible for open access initiatives, publishing support, copyright, digital humanities, and research data services. She has a bachelor of arts degree from Concordia University (Montreal) where she studied political science and women’s studies, and an MLIS from McGill University. Prior to joining the library world, she worked in publishing for 14 years, and thinks academic libraryland is ripe for a revolution.

      Given Amy’s longstanding participation at and with Access over the years and her longstanding involvement in Canadian – and now American – library tech communities, the Organizing Committee was unanimous in its opinion that Amy would be a perfect choice to anchor the conference. For more information, check out her website or connect with her on Twitter at @jambina.

      Keep an eye out for the announcement of our 2015 David Binkley Memorial Lecture speaker, coming soon!

      It’s time that we had a digital Constitutional Convention / District Dispatch

      Illustration of Washington Constitutional Convention 1787

      Is it time to organize a digital constitutional convention on the future of the internet? In a thought-provoking op-ed published in The Hill, Alan S. Inouye, director of the American Library Association’s (ALA) Office for Information Technology Policy (OITP) calls on the nation’s leaders in government, philanthropy, and the not-for-profit sector to convene a digital constitutional convention for the future of the internet.

      Today we stand at the crossroads of establishing digital society for generations to come. By now, it is clear to everyone—not just network engineers and policy wonks—that the Internet is at the same time a huge mechanism for opportunity and for control. Though the advent of the Internet is propelling a true revolution in society, we’re not ready for it. Not even close.

      For one thing, we are so politically polarized at the national level. The latest evidence: the net neutrality debate. Except it wasn’t. For the most part, it was characterized by those who favor assertive regulatory change for net neutrality stating their position, restating their position, then yelling out their position. Those arguing for the status quo policy did likewise. As the battle lines were drawn, there was little room to pragmatically consider a compromise advanced by some stakeholders.

      The current state of digital privacy seems along these lines, as well. With copyright, it is even worse, as a decades-long “debate” has those favoring the strongest copyright protection possible dominating the discourse.

      Another problem, as the preceding discussion suggests, is that issues clearly related to each other—such as telecommunications, privacy, and copyright—are debated mostly in their own silos. We need a radically different approach to address these foundational concerns that will have ramifications for decades to come. We need something akin to the Constitutional Convention that took place 228 years ago. We need today’s equivalents of George Washington, James Madison, and Benjamin Franklin to come together and apply their intellectual and political smarts—and work together for the good of all, to lay out the framework for many years to follow.

      Most people in the country are in the middle of the political spectrum. They (We) are reasonable people. We want to do as we please, but realize that we don’t have the right to impinge on others’ freedom unduly. We’re Main Street USA.

      This sounds so simple and matter-of-fact, but we inside-the-beltway people understand how hard this is to achieve in national policy. We just look outside our windows and see the U.S. Capitol and White House to remind ourselves of the challenge of achieving common-sense compromise in a harsh political climate.

      Of course this is not easy. Some of us think of the challenge we need to address as access to information in the digital society, but really we’re talking about the allocation of power—so the stakes are even higher than some may think.

      In a number of respects, power is more distributed in digital society. Obviously, laws, regulation, and related public policy remain important. Large traditional telecommunications and media companies remain influential. But now the national information industry includes Google, Apple, Facebook, Microsoft, and other major corporate players who also effectively make “public policy” through product decisions. Similarly, the continuing de facto devolvement from copyright law to a licensing regime (with the rapid growth of ebooks as the latest major casualty) also is shifting power from government to corporations. In some respects, individuals also have more power thanks to the proliferation of digital information and the internet that enable capabilities that previously only organizations could muster (e.g., publishing, national advocacy).

      Read more of Inouye’s op-ed

      The post It’s time that we had a digital Constitutional Convention appeared first on District Dispatch.

      Expert to discuss copyright trends at the 2015 ALA Conference / District Dispatch

      It’s been an exciting year in copyright law, with important precedents set on fair use and mass digitization, search engines, and what is protected by copyright. Join copyright leaders at the 2015 American Library Association (ALA) Annual Conference session “Copyright Litigation: The Year in Review (and What’s Coming Next).” The interactive session takes place from 10:30 to 11:30 a.m. on Monday, June 29, 2015, in San Francisco. The session will be held at the Moscone Convention Center in room 2018 of the West building.

      Corynne McSherry

      Corynne McSherry, legal director for the Electronic Frontier Foundation, will discuss the ways that recent court decisions—Georgia State, HathiTrust, Google Books—have interpreted fair use and the implications for education, research and equitable access. Additionally, McSherry will explore the potential implications of recent leading copyright decisions for libraries and their patrons, and what you need to know about upcoming copyright issues.

      View all ALA Washington Office conference sessions

      The post Expert to discuss copyright trends at the 2015 ALA Conference appeared first on District Dispatch.

      Digital Archiving Programming at Four Liberal Arts Colleges / Library of Congress: The Signal

      Clockwise from top left: Vassar, Bryn Mawr, Wheaton and Amherst Colleges. Photo of Thompson Library at Vassar from Wikimedia by Jim Mills.

      The following guest post is a collaboration from Joanna DiPasquale (Vassar College), Amy Bocko (Wheaton College), Rachel Appel (Bryn Mawr College) and Sarah Walden (Amherst College) based on their panel presentation at the recent Personal Digital Archiving 2015 conference. I will write a detailed post about the conference — which the Library of Congress helped organize — in a few weeks.

      When is the personal the professional? For faculty and students, spending countless hours researching, writing, and developing new ideas, the answer (only sometimes tongue-in-cheek) is “always:” digital archiving of their personal materials quickly turns into the creation of collections that can span multiple years, formats, subjects, and versions. In the library, we know well that “save everything” and “curate everything” are very different. What role, then, could the liberal arts college library play in helping our faculty and students curate their digital research materials and the scholarly communication objects that they create with an eye towards sustainability?

      At Vassar, Wheaton, Bryn Mawr, and Amherst Colleges, we designed Personal Digital Archiving Days (PDAD) events to push the boundaries of outreach and archiving, learn more about our communities’ needs, and connect users to the right services needed to achieve their archiving goals. In Fall 2014, we held sessions across each of our campuses (some for the first time, some as part of an ongoing PDAD series), using the Library of Congress personal digital archiving resources as a model for our programming. Though our audiences and outcomes varied, we shared common goals: to provide outreach for the work we do, make the campus community aware of the services available to them, and impart best practices to attendees that will have lasting effects for their digital information management.

      Joanna DiPasquale. Photo courtesy of Rachel Appel.

      Joanna DiPasquale, digital initiatives librarian at Vassar, learned about personal digital archiving days from the Library of Congress’ resources and how they worked for public or community libraries. She saw these resources as an opportunity to communicate to campus about the library’s new Digital Initiatives Group and how each part of the group complemented other services on campus (such as media services, computing and preservation). Her workshop was geared toward faculty and faculty-developed digital projects and scholarship. Vassar began the workshops in 2012, and faculty continued to request them each year. By 2014, the event featured a case study from a faculty member (and past attendee) about the new strategies he employed for his own work.

      Amy Bocko. Photo courtesy of Rachel Appel.

      Amy Bocko, Digital Asset Curator at Wheaton, saw PDAD’s success during her time as a Vassar employee. Now at Wheaton, Amy wanted to publicize her brand-new position on campus and ability to offer new digitally-focused services in Library and Information Services, and her Personal Digital Archiving Day brought together a diverse group of faculty members to work on common issues. The reactions were favorable and the attendees were grateful for the help they needed to manage their digital scholarship.

      Approaching everything as a whole could have been overwhelming, so Amy boiled it down to “what step could you take today that would improve your digital collection?”, which led to iterative, more effective results. Common responses included “investing in an external hard drive”, “adhering to a naming structure for digital files” and “taking inventory of what I have”. Amy made herself available after her workshop to address the specific concerns of faculty members in relation to their materials. She spoke at length with a printmaking professor who had an extensive collection of both analog slides and digital images with little metadata. They discussed starting small, creating a naming schema that would help her take steps towards becoming organized. The faculty member remarked how just a brief conversation, and knowing that the library was taking steps to help faculty manage their digital scholarship, put her mind at ease.

      Rachel Appel. Photo courtesy of Rachel Appel.

      Rachel Appel, digital collections librarian at Bryn Mawr, wanted to focus on student life. Rachel worked directly with Bryn Mawr’s Self-Government Association to reach student clubs, raise awareness of their records, help them get organized and think ahead to filling archival silences in the College Archives. As at the other institutions, PDAD provided a great avenue to introduce her work to campus. The students were also very interested in the concept of institutional memory and creating documented legacies between each generation of students. Rachel was able to hold the workshop again for different groups of attendees and focus on basic personal digital file management.

      Sarah Walden. Photo courtesy of Rachel Appel.

      Sarah Walden, digital projects librarian at Amherst, focused on student thesis writers for PDAD. Sarah worked with Criss Guy, a post-bac at Amherst, and they developed the workshop together. Their goal was to expose students to immediate preservation concerns surrounding a large research project like a thesis (backups, organization, versioning), as well as to give them some exposure to the idea of longer-term preservation. They offered two versions of their workshop. In the fall, they gave an overview of file identification, prioritization, organization, and backup. The second version of the workshop in January added a hands-on activity in which the students organized a set of sample files using the organizing-software program, Hazel.

      Although our workshops had varying audiences and goals, they empowered attendees to become more aware of their digital data management and the records continuum. They also provided an outreach opportunity for the digital library to address issues of sustainability in digital scholarship.

      This benefits both the scholar and the library. The potential for sustainable digital scholarship (whether sustained by the library, the scholar or both) increases when we can bring our own best practices to our constituents. We believe that PDAD events like ours provide an opportunity for college libraries to meet our scholars in multiple project phases:

      • While they are potentially worried about their past digital materials
      • While they are actively creating (and curating) their current materials
      • When they move beyond our campus services (particularly for students).

      While we dispense good advice, we also raise awareness of our digital-preservation skills, our services and our best practices, and we only see that need growing as digital scholarship flourishes. On the college campus, the personal heavily overlaps with the professional. We anticipate that we will be holding more targeted workshops for specific groups of attendees and would like to hear experiences from other institutions on how their PDADs evolved.

      Time for another IoT rant / David Rosenthal

      I haven't posted on the looming disaster that is the Internet of Things You Don't Own since last October, although I have been keeping track of developments in brief comments to that post. The great Charlie Stross just weighed in with a brilliant, must-read examination of the potential the IoT brings for innovations in rent-seeking, which convinced me that it was time for an update. Below the fold, I discuss the Stross business model and other developments in the last 8 months.

      Back in February, Stephen Balkam's Guardian article What will happen when the internet of things becomes artificially intelligent? sparked some discussion on Dave Farber's IP list, including this wonderfully apposite Philip K. Dick citation from Ian Stedman via David Pollak. It roused Mike O'Dell to respond with Internet of Obnoxious Things, a really important insight into the fundamental problems underlying the Internet of Things. Just go read it. Mike starts:
      The PKDick excerpt cited about a shakedown by a door lock is, I fear, more prescient than it first appears.

      I very much doubt that any "Internet of Things" will become Artificially Impudent because long before that happens, all the devices will be co-opted by The Bad Guys who will proceed to pursue shakedowns, extortion, and "protection" rackets on a coherent global scale.

      Whether it is even possible to "secure" such a collection of devices empowered with such direct control over physical reality is a profound and, I believe, completely open theoretical question. (We don't even have a strong definition of what that would mean.)

      Even if it is theoretically possible, it has been demonstrated in the most compelling possible terms that it will not be done for a host of reasons. The most benign fall under the rubric of "Never ascribe to malice what is adequately explained by stupidity" while others will be aggressively malicious. ...

      A close second, however, is a definition of "security" that reads, approximately, "Do what I should have meant." Eg, the rate of technology churn cannot be reduced just because we haven't figured out what we need it to do (or not do) - we'll just "iterate" every time Something Bad(tm) happens.
      Charlie goes further, and follows Philip K. Dick more closely, by pointing out that the causes of Something Bad(tm) are not just stupidity and malice, but also greed:
      The evil business plan of evil (and misery) posits the existence of smart municipality-provided household recycling bins. ... The bin has a PV powered microcontroller that can talk to a base station in the nearest wifi-enabled street lamp, and thence to the city government's waste department. The householder sorts their waste into the various recycling bins, and when the bins are full they're added to a pickup list for the waste truck on the nearest routing—so that rather than being collected at a set interval, they're only collected when they're full.

      But that's not all.

      Householders are lazy or otherwise noncompliant and sometimes dump stuff in the wrong bin, just as drivers sometimes disobey the speed limit.

      The overt value proposition for the municipality (who we are selling these bins and their support infrastructure to) is that the bins can sense the presence of the wrong kind of waste. This increases management costs by requiring hand-sorting, so the individual homeowner can be surcharged (or fined). More reasonably, households can be charged a high annual waste recycling and sorting fee, and given a discount for pre-sorting everything properly, before collection—which they forefeit if they screw up too often.

      The covert value proposition ... local town governments are under increasing pressure to cut their operating budgets. But by implementing increasingly elaborate waste-sorting requirements and imposing direct fines on households for non-compliance, they can turn the smart recycling bins into a new revenue enhancement channel, ... Churn the recycling criteria just a little bit and rely on tired and over-engaged citizens to accidentally toss a piece of plastic in the metal bin, or some food waste in the packaging bin: it'll make a fine contribution to your city's revenue!
      Charlie sets out the basic requirements for business models like this:
      Some aspects of modern life look like necessary evils at first, until you realize that some asshole has managed to (a) make it compulsory, and (b) use it for rent-seeking. The goal of this business is to identify a niche that is already mandatory, and where a supply chain exists (that is: someone provides goods or service, and as many people as possible have to use them), then figure out a way to colonize it as a monopolistic intermediary with rent-raising power and the force of law behind it.
      and goes on to use speed cameras as an example. What he doesn't go into is what the IoT brings to this class of business models: reduced cost of detection, reduced possibility of contest, reduced cost of punishment. A trifecta that means profit! But Charlie brilliantly goes on to incorporate:
      the innovative business model that Yves Smith has dubbed "crapification". A business that can reduce customer choice sufficiently then has a profit opportunity; it can make its product so awful that customers will pay for a slightly less awful version.
      He suggests:
      Sell householders a deluxe bin with multiple compartments and a sorter in the top: they can put their rubbish in, and the bin itself will sort which section it belongs in. Over a year or three the householder will save themselves the price of the deluxe bin in avoided fines—but we don't care, we're not the municipal waste authority, we're the speed camera/radar detector vendor!
      Cory Doctorow just weighed in, again, on the looming IoT disaster. This time he points out that although it is a problem that Roomba's limited on-board intelligence means poor obstacle avoidance, solving the problem by equipping them with cameras and an Internet connection to an obstacle-recognition service is an awesomely bad idea:
      Roombas are pretty useful devices. I own two of them. They do have real trouble with obstacles, though. Putting a camera on them so that they can use the smarts of the network to navigate our homes and offices is a plausible solution to this problem.
      But a camera-equipped networked robot that free-ranges around your home is a fucking disaster if it isn't secure. It's a gift to everyone who wants to use cameras to attack you, from voyeur sextortionist creeps to burglars to foreign spies and dirty cops.
      Looking back through the notes on my October post, we see that Google is no longer patching known vulnerabilities in Android before 4.4. There are only about 930 million devices running such software. More details on why nearly a billion users are being left to the mercy of the bad guys are here.

      The Internet of Things With Wheels That Kill People has featured extensively. First, Progressive Insurance's gizmo that tracks its customers' driving habits has a few security issues:
      "The firmware running on the dongle is minimal and insecure," Thuen told Forbes.

      "It does no validation or signing of firmware updates, no secure boot, no cellular authentication, no secure communications or encryption, no data execution prevention or attack mitigation technologies ... basically it uses no security technologies whatsoever."

      What's the worst that can happen? The device gives access to the CAN bus.

      "The CAN bus had been the target of much previous hacking research. The latest dongle similar to the SnapShot device to be hacked was the Zubie device which examined for mechanical problems and allowed drivers to observe and share their habits."

      "Argus Cyber Security researchers Ron Ofir and Ofer Kapota went further and gained control of acceleration, braking and steering through an exploit."  
      Second, a vulnerability in BMWs, Minis and Rolls-Royces:
      "BMW has plugged a hole that could allow remote attackers to open windows and doors for 2.2 million cars."
      "Attackers could set up fake wireless networks to intercept and transmit the clear-text data to the cars but could not have impacted vehicle acceleration or braking systems."
      "BMW's patch also updated its patch distribution system to use HTTPS."
      What were they thinking?

      Third, Senator Ed Markey has been asking auto makers questions and the answers are not reassuring. No wonder he was asking questions. At an industry-sponsored hackathon last July a 14-year-old with $15 in parts from Radio Shack showed how easy it was:
      "Windshield wipers turned on and off. Doors locked and unlocked. The remote start feature engaged. The student even got the car's lights to flash on and off, set to the beat from songs on his iPhone."
      Key to an Internet of Things that we could live with is, as Vint Cerf pointed out, a secure firmware update mechanism. The consequences of not having one can be seen in Kaspersky's revelations of the "Equation group" compromising hard drive firmware. Here's an example of how easy it can be. To be fair, Seagate at least has deployed a secure firmware update mechanism, initially to self-encrypting drives but now I'm told to all their current drives.

      Cooper Quintin at the EFF's DeepLinks blog weighed in with a typically clear overview of the issue entitled Are Your Devices Hardwired For Betrayal?. The three principles:
      • Firmware must be properly audited.
      • Firmware updates must be signed.
      • We need a mechanism for verifying installed firmware.
      would greatly reduce the problem, except that they would make firmware companies targets for Gemalto-like key exfiltration. I agree with Quintin that:
      "None of these things are inherently difficult from a technological standpoint. The hard problems to overcome will be inertia, complacency, politics, incentives, and costs on the part of the hardware companies."
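The third principle, verifying installed firmware, can be illustrated with a minimal sketch. It assumes the vendor publishes a trusted SHA-256 digest for each firmware image and that the installed image can be read back as bytes; both are assumptions for illustration, and the function name is hypothetical. It does not cover the second principle: checking a signature would need public-key cryptography, which is not shown here.

```python
import hashlib
import hmac


def firmware_matches(image_bytes, expected_sha256_hex):
    """Return True if the image hashes to the published digest.

    Assumption: expected_sha256_hex comes from a trusted channel.
    """
    actual = hashlib.sha256(image_bytes).hexdigest()
    # compare_digest avoids leaking where the two strings first differ
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


# Hypothetical usage with a stand-in image
image = b"firmware-v1"
published = hashlib.sha256(image).hexdigest()
print(firmware_matches(image, published))         # unmodified image
print(firmware_matches(image + b"x", published))  # tampered image
```

A digest check like this only helps if the digest itself is obtained from a source the attacker cannot alter, which is why it complements signing rather than replacing it.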
      Among the Things in the Internet are computers with vulnerable BIOSes:
      "Though there's been long suspicion that spy agencies have exotic means of remotely compromising computer BIOS, these remote exploits were considered rare and difficult to attain.

      Legbacore founders Corey Kallenberg and Xeno Kovah's Cansecwest presentation ... automates the process of discovering these vulnerabilities. Kallenberg and Kovah are confident that they can find many more BIOS vulnerabilities; they will also demonstrate many new BIOS attacks that require physical access."
      GCHQ has the legal authority to exploit these BIOS vulnerabilities, and any others it can find, against computers, phones and any other Things on the Internet wherever they are. It's likely that most security services have similar authority.

      Useful reports appeared, including this two-part report from Xipiter, this from Veracode on insecurities, this from DDOS-protection company Incapsula on the now multiple botnets running on home routers, and this from the SEC Consult Vulnerability Lab about yet another catastrophic vulnerability in home routers. This last report, unlike the industry happy-talk, understands the economics of IoT devices:
      "the (consumer) embedded systems industry is always keen on keeping development costs as low as possible and is therefore using vulnerability-ridden code provided by chipset manufacturers (e.g. Realtek CVE-2014-8361 - detailed summary by HP, Broadcom) or outdated versions of included open-source software (e.g. libupnp, MiniUPnPd) in their products."
      And just as I was finishing this rant, Ars Technica posted details of yet another botnet running on home routers, this one called Linux/Moose. It collects social network credentials.

      That's all until the next rant. Have fun with your Internet-enabled gizmos!

      Amazon Crawl: part eu / Open Library Data Additions

      Part eu of Amazon crawl..

      This item belongs to: data/ol_data.

      This item has files of the following types: Data, Data, Metadata, Text

      Effects of subject normalization on DPLA Hubs / Mark E. Phillips

      In the previous post I walked through some of the different ways that we could normalize a subject string and took a look at what effects these normalizations had on the subjects in the entire DPLA metadata dataset that I have been using.

      In this post I want to continue along those lines and take a look at what happens when you apply these normalizations to the subjects in the dataset, but this time focus on the Hub level instead of working with the whole dataset.

      I applied the normalizations mentioned in the previous post to the subjects from each of the Hubs in the DPLA dataset. The counts included total values, unique but un-normalized values, case folded, lowercased, NACO, Porter stemmed, and fingerprint. I applied each normalization to the output of the previous one, as a series. Here is what the normalization chain looked like for each.

      total > unique
      total > unique > case folded
      total > unique > case folded > lowercased
      total > unique > case folded > lowercased > NACO
      total > unique > case folded > lowercased > NACO > Porter
      total > unique > case folded > lowercased > NACO > Porter > fingerprint
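The chaining itself, where each normalization runs on the output of the previous one and the distinct values are counted after every step, can be sketched as follows. This is a simplified illustration, not the code used for the post: `fold` strips diacritics (only a guess at what the folding step did), `strip_punctuation` is a crude stand-in for NACO normalization, and Porter stemming and fingerprinting are left out to keep the sketch short and dependency-free. All function names are hypothetical.

```python
import string
import unicodedata


def fold(value):
    # Assumption: "folding" is treated here as removing diacritics
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def lowercase(value):
    return value.lower()


def strip_punctuation(value):
    # Crude stand-in for NACO: punctuation to spaces, collapse whitespace
    cleaned = "".join(" " if c in string.punctuation else c for c in value)
    return " ".join(cleaned.split())


STEPS = [("fold", fold), ("lowercase", lowercase), ("punctuation", strip_punctuation)]


def chain_counts(subjects, steps=STEPS):
    """Count distinct values after each step, feeding each step's output to the next."""
    counts = {"total": len(subjects)}
    current = set(subjects)
    counts["unique"] = len(current)
    for name, step in steps:
        current = {step(value) for value in current}
        counts[name] = len(current)
    return counts


sample = ["Dogs", "dogs", "Dogs.", "Caf\u00e9", "Cafe", "Dogs"]
print(chain_counts(sample))
```

Because every step works on the previous step's set, the counts can only stay level or shrink as the chain proceeds.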

      The number of subjects after each normalization is presented in the first table below.

      Hub Name | Total Subjects | Unique Subjects | Folded | Lowercase | NACO | Porter | Fingerprint
      ARTstor | 194,883 | 9,560 | 9,559 | 9,514 | 9,483 | 8,319 | 8,278
      Biodiversity_Heritage_Library | 451,999 | 22,004 | 22,003 | 22,002 | 21,865 | 21,482 | 21,384
      David_Rumsey | 22,976 | 123 | 123 | 122 | 121 | 121 | 121
      Digital_Commonwealth | 295,778 | 41,704 | 41,694 | 41,419 | 40,998 | 40,095 | 39,950
      Digital_Library_of_Georgia | 1,151,351 | 132,160 | 132,157 | 131,656 | 131,171 | 130,289 | 129,724
      Harvard_Library | 26,641 | 9,257 | 9,251 | 9,248 | 9,236 | 9,229 | 9,059
      HathiTrust | 2,608,567 | 685,733 | 682,188 | 676,739 | 671,203 | 667,025 | 653,973
      Internet_Archive | 363,634 | 56,910 | 56,815 | 56,291 | 55,954 | 55,401 | 54,700
      J_Paul_Getty_Trust | 32,949 | 2,777 | 2,774 | 2,760 | 2,741 | 2,710 | 2,640
      Kentucky_Digital_Library | 26,008 | 1,972 | 1,972 | 1,959 | 1,900 | 1,898 | 1,892
      Minnesota_Digital_Library | 202,456 | 24,472 | 24,470 | 23,834 | 23,680 | 22,453 | 22,282
      Missouri_Hub | 97,111 | 6,893 | 6,893 | 6,850 | 6,792 | 6,724 | 6,696
      Mountain_West_Digital_Library | 2,636,219 | 227,755 | 227,705 | 223,500 | 220,784 | 214,197 | 210,771
      National_Archives_and_Records_Administration | 231,513 | 7,086 | 7,086 | 7,085 | 7,085 | 7,050 | 7,045
      North_Carolina_Digital_Heritage_Center | 866,697 | 99,258 | 99,254 | 99,020 | 98,486 | 97,993 | 97,297
      Smithsonian_Institution | 5,689,135 | 348,302 | 348,043 | 347,595 | 346,499 | 344,018 | 337,209
      South_Carolina_Digital_Library | 231,267 | 23,842 | 23,838 | 23,656 | 23,291 | 23,101 | 22,993
      The_New_York_Public_Library | 1,995,817 | 69,210 | 69,185 | 69,165 | 69,091 | 68,767 | 68,566
      The_Portal_to_Texas_History | 5,255,588 | 104,566 | 104,526 | 103,208 | 102,195 | 98,591 | 97,589
      United_States_Government_Printing_Office_(GPO) | 456,363 | 174,067 | 174,063 | 173,554 | 173,353 | 172,761 | 170,103
      University_of_Illinois_at_Urbana-Champaign | 67,954 | 6,183 | 6,182 | 6,150 | 6,134 | 6,026 | 6,010
      University_of_Southern_California_Libraries | 859,868 | 65,958 | 65,882 | 65,470 | 64,714 | 62,092 | 61,553
      University_of_Virginia_Library | 93,378 | 3,736 | 3,736 | 3,672 | 3,660 | 3,625 | 3,618

      Here is a table that shows the cumulative percentage reduction, relative to the unique subject count, after each normalization in the chain is applied. The percent reduction makes the numbers a little easier to interpret.

      Hub Name | Folded Normalization | Lowercase Normalization | NACO Normalization | Porter Normalization | Fingerprint Normalization
      ARTstor | 0.0% | 0.5% | 0.8% | 13.0% | 13.4%
      Biodiversity_Heritage_Library | 0.0% | 0.0% | 0.6% | 2.4% | 2.8%
      David_Rumsey | 0.0% | 0.8% | 1.6% | 1.6% | 1.6%
      Digital_Commonwealth | 0.0% | 0.7% | 1.7% | 3.9% | 4.2%
      Digital_Library_of_Georgia | 0.0% | 0.4% | 0.7% | 1.4% | 1.8%
      Harvard_Library | 0.1% | 0.1% | 0.2% | 0.3% | 2.1%
      HathiTrust | 0.5% | 1.3% | 2.1% | 2.7% | 4.6%
      Internet_Archive | 0.2% | 1.1% | 1.7% | 2.7% | 3.9%
      J_Paul_Getty_Trust | 0.1% | 0.6% | 1.3% | 2.4% | 4.9%
      Kentucky_Digital_Library | 0.0% | 0.7% | 3.7% | 3.8% | 4.1%
      Minnesota_Digital_Library | 0.0% | 2.6% | 3.2% | 8.3% | 8.9%
      Missouri_Hub | 0.0% | 0.6% | 1.5% | 2.5% | 2.9%
      Mountain_West_Digital_Library | 0.0% | 1.9% | 3.1% | 6.0% | 7.5%
      National_Archives_and_Records_Administration | 0.0% | 0.0% | 0.0% | 0.5% | 0.6%
      North_Carolina_Digital_Heritage_Center | 0.0% | 0.2% | 0.8% | 1.3% | 2.0%
      Smithsonian_Institution | 0.1% | 0.2% | 0.5% | 1.2% | 3.2%
      South_Carolina_Digital_Library | 0.0% | 0.8% | 2.3% | 3.1% | 3.6%
      The_New_York_Public_Library | 0.0% | 0.1% | 0.2% | 0.6% | 0.9%
      The_Portal_to_Texas_History | 0.0% | 1.3% | 2.3% | 5.7% | 6.7%
      United_States_Government_Printing_Office_(GPO) | 0.0% | 0.3% | 0.4% | 0.8% | 2.3%
      University_of_Illinois_at_Urbana-Champaign | 0.0% | 0.5% | 0.8% | 2.5% | 2.8%
      University_of_Southern_California_Libraries | 0.1% | 0.7% | 1.9% | 5.9% | 6.7%
      University_of_Virginia_Library | 0.0% | 1.7% | 2.0% | 3.0% | 3.2%
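The percentages in this table are consistent with a reduction measured against each hub's unique subject count, rounded to one decimal place. A small sketch of that arithmetic, using the ARTstor counts from the first table (the function name is hypothetical):

```python
def percent_reduction(unique_count, normalized_count):
    """Percentage of unique values removed by normalization, to one decimal."""
    return round((unique_count - normalized_count) / unique_count * 100, 1)


# ARTstor: 9,560 unique subjects
artstor_unique = 9560
for label, count in [("Folded", 9559), ("Lowercase", 9514), ("NACO", 9483),
                     ("Porter", 8319), ("Fingerprint", 8278)]:
    print(label, percent_reduction(artstor_unique, count))
```

For ARTstor this gives 0.0, 0.5, 0.8, 13.0 and 13.4, matching the row above.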

      Here is that data presented as a graph, which I think shows the data even better.

      Reduction Percent after Normalization

      You can see that for many of the Hubs the biggest reductions happen when applying the Porter and Fingerprint normalizations. A hub of note is ARTstor, which had the highest percentage of reduction of all the hubs. This was primarily caused by the Porter normalization, which means that a large percentage of subjects stemmed to the same stem; often these are plural and singular versions of the same subject. This may be completely valid given how ARTstor chose to create metadata, but it is still interesting.

      Another hub I found interesting was Harvard, where the biggest reduction happened with the Fingerprint Normalization. This might suggest that there are a number of values that are the same but with their words in a different order, for example names that occur in both inverted and non-inverted form.
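To illustrate why a fingerprint collapses inverted and non-inverted names, here is a minimal sketch of a fingerprint key in the style popularized by OpenRefine: lowercase, drop punctuation, split into tokens, de-duplicate, sort, and rejoin. It is a simplified assumption about the approach, not the exact implementation used for these numbers, and the example names are made up.

```python
import string


def fingerprint(value):
    """Order-insensitive key: lowercase, strip punctuation, sort unique tokens."""
    lowered = value.strip().lower()
    no_punct = "".join(c for c in lowered if c not in string.punctuation)
    tokens = sorted(set(no_punct.split()))
    return " ".join(tokens)


print(fingerprint("Smith, John"))  # john smith
print(fingerprint("John Smith"))   # john smith
```

Because the tokens are sorted, word order no longer matters, so both forms of the name produce the same key and count as one value.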

      In the end I’m not sure how helpful this is as an indicator of quality within a field. There are fields that would benefit from this sort of normalization more than others.  For example subjects, creator, contributor, publisher will normalize very differently than a field like title or description.

      Let me know what you think via Twitter if you have questions or comments.

      Hydra Connect 2015 – 21-24 September, Minneapolis: Request for Program Suggestions / Hydra Project

      We’re delighted to be able to tell you that detailed planning is now underway for an exciting program at Hydra Connect 2015 in Minneapolis this Fall.  The program committee would love to hear from those of you who have suggestions for items that should be included.  These might be workshops or demonstrations for the Monday, or they might be for 5, 10 or 20 minute presentations, discussion groups or another format you’d like to suggest during the conference proper.  It may be that you will offer to facilitate or present the item yourself or it may be that you’d like the committee to commission the slot from someone else – you could maybe suggest a name.  As in the past, we shall be attempting to serve the needs of attendees from a wide range of experience and background (potential adopters, new adopters, “old hands”; developers, managers, sysops etc) and, if it isn’t obvious, it may be helpful if you tell us who would be the target audience. Those of you going to Open Repositories 2015 might take the opportunity to talk to others about the possibility of joint workshops, presentations, etc.?

      Please let us have your ideas, preferably before Monday 15th June, at or by adding them to the page at HC2015 suggestions for the program.

      Advance warning that, as in past years, we shall ask all attendees who are working with Hydra to bring a poster for the popular “poster show and tell” session.  This is a great opportunity to share with colleague Hydranauts what your institution is doing and to forge connections around the work.  Details later…

      FYI: we plan on opening booking in the next ten days or so and we hope to see you in Minneapolis for what promises to be another great Hydra Connect meeting!


      Peter Binkley, Matt Critchlow, Karen Estlund, Erin Fahy and Anna Headley (the Hydra Connect 2015 Program Committee)

      Amazon Echo Update / LITA

      I wrote about Amazon Echo a few months back. At the time, I did not have it, but was looking forward to using it. Now that I have had Echo for a while, I have a better idea of its strengths and weaknesses.

      It doesn’t pick up every word I say, but its voice recognition is much better than I anticipated.  The app works nicely on my phone and iPad and I found it easy to link Pandora, my music, and to indicate what news channels I want to hear from. I enjoy getting the weather report, listening to a flash news briefing, adding items to my shopping list, listening to music, and being informed of the best route to work to avoid traffic.

      My favorite feature is that it is hands-free. I’m constantly running around my house juggling a lot of things. Often I need an answer to a question, I need to add something to a shopping list as I’m cooking, or I want to hear a certain song as I’m elbow-deep in a project. Having the ability to just “say the words” is wonderful. Now if it just worked for everything…

      I hope updates will come soon, though, as I’d like to see it answer more questions and provide traffic information for locations other than the one I can program into the app. I also want to be able to make calls and send text messages using Echo.

      In my first post about Amazon Echo, I stated I was really interested in the device as an information retrieval tool. Currently, Echo doesn’t work as well as I was expecting for retrieving information, but with software updates I still see it (and similar tools) having an impact on our research.

      Overall, I see it as a device that has amazing potential, but it is still in its infancy.

      Has anyone else used Echo? I’d love to hear your thoughts on the device.

      Thursday Threads: Man Photocopies Ebook, Google AutoAwesomes Photos, Librarians Called to HTTPS / Peter Murray

      In this week’s threads: a protest — or maybe just an art project — by a reader who saves his e-book copy of Orwell’s 1984 by photocopying each page from his Kindle, the “AutoAwesome” nature of artificial intelligence, and a call to action for libraries to implement encryption on their websites.

      Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

      Use Your Photocopier to Back Up Your E-book

      Picture of the hardback book of scanned Kindle page images.

      E-book backup is a physical, tangible, human readable copy of an electronically stored novel. The purchased contents of an e-book reader were easily photocopied and clip-bound to create a shelf-stable backup for the benefit of me, the book consumer. I can keep it on my bookshelf without worry of remote recall. A second hardcover backup has been made with the help of an online self-publishing house.

      E-book backup, Jesse England, circa 2012

      This project is from around 2012, but it first caught my eye this month. The author, pointing to when “some Amazon Kindle users found their copy of George Orwell’s 1984 and Animal Farm had been removed from their Kindles without their prior knowledge or consent,” decided to photocopy each page of his copy of 1984 as it appeared on a Kindle screen and create a bound paper version. The result is as you see in the image to the right.

      Eight days ago, someone took the images from Mr. England’s page and uploaded the sequence to imgur. The project again circulated around the ‘net. There is a digital preservation joke in here, but I might not be able to find it unless the original creator took the text of 1984 and printed it out as QR Codes so the resulting book could be read back into a computer.

      How Awesome is Artificial Intelligence?

      The other day I created a Google+ album of photos from our holiday in France. Google’s AutoAwesome algorithms applied some nice Instagram-like filters to some of them, and sent me emails to let me have a look at the results. But there was one AutoAwesome that I found peculiar. It was this one, labeled with the word “Smile!” in the corner, surrounded by little sparkle symbols.
      It’s a nice picture, a sweet moment with my wife, taken by my father-in-law, in a Normandy bistro. There’s only one problem with it. This moment never happened.

      It’s Official: A.I.s are Now Re-Writing History, Robert Elliott Smith, 7-Oct-2014

      Follow the link above to see the pictures — the two source pictures and the combination that Google’s algorithms created. The differences are subtle. I loaded both of the source images into Gimp and performed a difference operation between the two layers. The result is the image below.

      The difference between the two pictures that Google combined in its “AutoAwesome” way.

      Black means the pixel color values were identical, so you can see the changes of hand position clearly. (I assume the other artifacts are differences caused by the JPEG compression in the original source pictures.)
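For readers who have not used a difference blend, the operation can be sketched in a few lines. This is a plain-Python illustration that assumes each image is a grid of (R, G, B) tuples of the same size; it is not how Gimp is implemented, and the tiny sample layers are invented.

```python
def difference(layer_a, layer_b):
    """Per-channel absolute difference of two same-sized RGB pixel grids."""
    return [
        [tuple(abs(a - b) for a, b in zip(pixel_a, pixel_b))
         for pixel_a, pixel_b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# Two 1x2 "images": the first pixel is identical, the second differs
first = [[(10, 20, 30), (0, 0, 0)]]
second = [[(10, 20, 30), (5, 0, 255)]]
print(difference(first, second))  # [[(0, 0, 0), (5, 0, 255)]]
```

Identical pixels come out as (0, 0, 0), which is black, while any change between the two layers shows up as a non-black value.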

      This reminds me of the trick of taking multiple pictures of the same shot and using a tool like Photoshop to remove the people. Except in this case it is an algorithm deciding what are the best parts from a multitude of pictures and putting together what its programmers deem to be the “best” combination.

      Call to Librarians To Implement HTTPS

      Librarians have long understood that to provide access to knowledge it is crucial to protect their patrons’ privacy. Books can provide information that is deeply unpopular. As a result, local communities and governments sometimes try to ban the most objectionable ones. Librarians rightly see it as their duty to preserve access to books, especially banned ones. In the US this defense of expression is an integral part of our First Amendment rights.

      Access isn’t just about having material on the shelves, though. If a book is perceived as “dangerous,” patrons may avoid checking it out, for fear that authorities will use their borrowing records against them. This is why librarians have fought long and hard for their patrons’ privacy. In recent years, that include Library Connection’s fight against the unconstitutional gag authority of National Security Letters and, at many libraries, choosing not to keep checkout records after materials are returned.

      However, simply protecting patron records is no longer enough. Library patrons frequently access catalogs and other services over the Internet. We have learned in the last two years that the NSA is unconstitutionally hoovering up and retaining massive amounts of Internet traffic. That means that before a patron even checks out a book, their search for that book in an online catalog may already have been recorded. And the NSA is not the only threat. Other patrons, using off-the-shelf tools, can intercept queries and login data merely by virtue of being on the same network as their target.

      Fortunately, there is a solution, and it’s getting easier to deploy every day.

      What Every Librarian Needs to Know About HTTPS, by Jacob Hoffman-Andrews, Electronic Frontier Foundation, 6-May-2015

      That is the beginning of an article that explains what HTTPS means, why it is important, and how libraries can effectively deploy it. This is something that has come up in the NISO Patron Privacy in Digital Library and Information Systems working group that has been holding virtual meetings this month and will culminate in a two-day in-person meeting after the ALA Annual convention in San Francisco next month. As you look at this article, keep an eye out for announcements about the Let’s Encrypt initiative, due to kick off some time this summer; it will give websites free server encryption certificates and provide a mechanism to keep them up-to-date.

      Firefox privacy extensions / William Denton

      I noticed yesterday that the RequestPolicy Firefox extension wasn’t working because it’s not being developed any more. There’s a replacement in the works but it didn’t look done enough, so I didn’t install it. I did install a couple of other extensions, which I organized in alphabetical order on the right-hand side of the location bar:

      Firefox privacy extension icons ABP, BP, CM, D, HE, L, PB

      They are, in order:

      Is there anything else I should use?

      I’m still being tracked a lot, even though I deny all third-party cookies and most site-specific cookies. With Lightbeam I can block everything from and and other places that do nothing useful for me.

      With good sites, nothing suffers, or when something breaks I don’t care about it. With some sites I need to fire up another browser and allow everything just to achieve some minor goal like buying a ticket. I suffer that now, but maybe I’ll change my mind.

      I’m trying to use Tor more often for browsing sites where I don’t have an account.

      Sometimes I look at how other people use the web, and I’m appalled at how awful the experience is, with everything filled with ads (which they can see) and cookies and tracking (which they can’t). On the other hand, there’s how Richard Stallman does things:

      I am careful in how I use the Internet.

      I generally do not connect to web sites from my own machine, aside from a few sites I have some special relationship with. I usually fetch web pages from other sites by sending mail to a program (see git:// that fetches them, much like wget, and then mails them back to me. Then I look at them using a web browser, unless it is easy to see the text in the HTML page directly. I usually try lynx first, then a graphical browser if the page needs it (using konqueror, which won’t fetch from other sites in such a situation).

      I occasionally also browse using IceCat via Tor. I think that is enough to prevent my browsing from being connected with me, since I don’t identify myself to the sites I visit.

      I’m somewhere in the wide middle.

      Amazon Crawl: part ge / Open Library Data Additions

      Part ge of Amazon crawl.

      This item belongs to: data/ol_data.

      This item has files of the following types: Data, Data, Metadata, Text

      VIVO Keynote announced: Dr. James Onken / DuraSpace News

      We are delighted to welcome Dr. James Onken, Senior Advisor to the NIH Deputy Director for Extramural Research, to deliver a keynote talk at the 2015 VIVO Conference.

      Dr. James Onken is leading a new NIH initiative to develop a semantic NIH Portfolio Analysis and Reporting Data Infrastructure (PARDI) that leverages community data and requirements, including those from the VIVO community.

      Late-breaking Call for Posters Open through this Saturday / DuraSpace News

      Working on a new project? Interested in sharing local research profiling or analysis efforts with attendees of #vivo15? We want to hear from you! Authors are invited to submit abstracts for poster presentations for the Fifth Annual VIVO Conference in August. For details on the Late-breaking Call for Posters, please click here. All submissions are due by midnight PST on Saturday, May 30th.

      ArchivesDirect Webinar Series Recordings Available / DuraSpace News

      Earlier this year Artefactual Systems and the DuraSpace organization launched ArchivesDirect, a complete hosted solution for preserving valuable institutional collections and all types of digital resources. This month Artefactual Systems’ Sarah Romkey and Courtney Mumma curated and presented a Hot Topics: The DuraSpace Community Webinar Series entitled "Digital Preservation with ArchivesDirect: Ready, Set, Go!"

      Now Available: Author Statistics for DSpace / DuraSpace News

      From Bram Luyten, @mire

      @mire released a new version of its Content and Usage Analysis module. The module’s main goal is to visualize DSpace statistics which are otherwise difficult and time-consuming to interpret. By offering a layer on top of those data, DSpace administrators are able to display usage statistics, content statistics, workflow statistics, search statistics and storage reports.

      Longsight Contributes New DSpace Storage Options / DuraSpace News

      Independence, Ohio: Longsight has developed a refactored version of bitstream storage with a Pluggable Assetstore. This reworked version of bitstream storage allows new storage options to be implemented easily in DSpace, including Amazon S3.

      Data meeting / William Denton

      At work I wanted to get access to enrolment numbers by course, so we could have a better idea of how effective the library’s presence is in the university’s course management system.

      A few weeks ago I met with A, who works in an administrative office that manages data like this.

      He said I should talk to B, who runs the systems where the data lives. A would join the meeting.

      Later I ran into C, B’s boss’s boss, who said he’d be there too, because B’s boss was too busy.

      Today I met with A, B and C. After some discussion they decided they couldn’t give me the data, but I should talk to D in the registrar’s office.

      While we were talking, C messaged D, who said that A should give me the numbers.

      A, somewhat surprised by this, said he’d talk to his boss.

      All that sounds pretty ridiculous, and it is, but not only am I going to get the data, during the course of the meeting when I explained why I wanted the data B and C said there was some other data that would solve another problem I had, B showed me a network profiling tool they’re using to find bottlenecks that would be useful for my colleagues V and W, A said he’d pull me into some other meetings about different kinds of data, B told me about a Moodle usage database I’ll get access to so I can pull out data I had no idea was being tracked, and I told all of them about some library data we can share with them.

      Academia can work very slowly, but in the large private companies I worked at, A wouldn’t have even met with me in the first place, and in a meeting like this, B and C would have been defending their turf, not opening up other data to me I didn’t even know existed. Don’t give me the story about private enterprise always being more efficient.

      Jobs in Information Technology: May 27, 2015 / LITA

      New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

      New This Week

      Head of Technical Services, University of Arkansas, Fayetteville, AR

      Digital Content Specialist, The Morton Arboretum, Lisle, IL

      Digital Humanities Specialist, Purdue University Libraries, West Lafayette, IN


      Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

      Libraries on the offense in the digital revolution / District Dispatch

      ALA released Digital Futures, a new supplement.

      Today, American Libraries magazine launched Digital Futures, a new digital supplement that features articles on how libraries are innovating and leading, as well as on paths ahead for taking the initiative. Digital Futures is the fifth American Libraries magazine supplement on ebooks and digital content.

      “I’m so pleased to see story after story about librarians being proactive related to the opportunities and challenges presented by the digital revolution,” said American Library Association (ALA) President Courtney Young in a press statement. “For example, the National Digital Platform proposed by the Institute of Museum and Library Services (IMLS) will accelerate the necessary trend of increased sharing of technology tools and services across libraries, as discussed in an article by Maura Marx, IMLS acting director, and Trevor Owens, IMLS senior program officer.”

      In the report, two articles focus on particular innovative projects, and a trio of articles homes in on future directions for libraries and ebooks.

      “This report includes eight other articles at the intersection of the publishing and library communities,” wrote Alan S. Inouye, guest editor of the supplement and director of ALA Office for Information Technology Policy, in a blog post for Digital Book World.

      Want to participate in discussions about the burgeoning library ebook lending market? Join the digital content discussion at the 2015 ALA Annual Conference in San Francisco. At the session “Making Progress in Digital Content,” Digital Content Working Group (DCWG) co-chairs Carolyn Anthony and Erika Linke discuss the latest trends and then moderate a panel with Yoav Lorch (TotalBooX) and Monica Sendze (Odilo). The session takes place Sunday, June 28, from 10:30 to 11:30 a.m. in the Moscone Convention Center, West Building, Room 2018. Print copies of this supplement will be available at the session.

      The post Libraries on the offense in the digital revolution appeared first on District Dispatch.

      What I Wish I Had Known Upon Graduating / Roy Tennant

      One of my daughters graduated from college last week (see pic). Call me a proud Dad, as she graduated with top honors (Summa cum laude) from Tulane University in New Orleans. This, while holding down two jobs in her last semester. So like many people who have college, high school, middle school, or whatever graduations in this season of graduations, my thoughts turn to what I may have wished to have known when I was graduating.

      In my case, I’m going to look back at my graduation from library school, which was a Master’s degree from UC Berkeley in 1986. Yes, I really am that old. But let’s not dwell on that.

      Here is what I wish I had known back then:

      • Don’t ever expect to get anything handed to you. So many of the things that ended up making a difference in my career I had to actively pursue or initiate. Frankly, needing to make sure I could support two children (twins) spurred me to go after things simply for the money. However, they also helped me build a career that I wouldn’t have had otherwise.
      • If something does get handed to you, run with it. My best career break came from someone who saw something in me and gave me a chance to prove myself. I ran with it, and never looked back. You should too.
      • Don’t let success, should you be lucky enough to experience it, go to your head. My lucky break turned out to be the chance of a lifetime, and for a while I flirted with the idea of quitting my day job and going out on my own as a speaker/consultant. At least for me, that would have been a disaster, as the opportunities started drying up and the recession killed whatever was left. I had a family to support, and a paycheck you can count on is worth all kinds of consulting opportunities upon which you can’t necessarily count.
      • Know and be true to yourself. This absurdly general statement is meant to signify knowing who you are willing to work for. As a newly-minted librarian, I flirted with the idea of working for a commercial vendor. But after interviewing, I realized that it really wasn’t for me. Others enjoy it and that is perfectly fine. The point is to know yourself enough to know what is right for you.
      • Expect the unexpected. Again, an absurdly general statement that in this case is meant to signify that whatever you learned in library school will likely be not just out of date in 3-5 years, but perhaps even wrong. I would even say the phrase should be welcome the unexpected, as those who do will inherit the future.
      • Pursue connections with others. As someone who benefited greatly from mentors, I have turned, in my later career, to mentoring others. So you could say that I’ve seen both sides of making connections and I can tell you that they are more meaningful and helpful than you can even imagine. Perhaps I am an extreme case, as I had one mentor who truly launched my career. Unfortunately, I know that I have not had the same effect on those I mentor. But a major part of what I try to do is to bring together young librarians of like mind to help form peer networks that will take them forward long after I have left the scene. You, as a young professional, can pursue these kinds of situations. Look for a seasoned professional who can introduce you to people you should know. Suggest a mentor/mentee relationship. I doubt you will be disappointed.
      • Be good to others. At the end of the day, you need to be able to sleep at night. So whatever life throws at you, try to handle it with grace and be good to your fellow travelers. Besides, you never know when you will need them to lend you a hand.
      • Make bridges, don’t burn them. A corollary to the last point is to be good to the organizations you serve. Do your best work, and if they disdain you, then move on. But don’t make a big deal out of it. You never know what the future may bring and it just might be important that you didn’t disrespect your former employer.
      • Have fun. I’ve often said in many of the speeches I’ve made over the years that if you’re not having fun you aren’t doing it right. I realize that sounds flip, and assumes that everyone can find a job they enjoy, but I happen to think you are worth it. If you aren’t happy doing what you are doing then you should seek out that which makes you happy. Seriously, it’s worth the extra effort. If you find yourself dragging yourself out of bed in the morning, loathing the day you face, then that’s a pretty good sign you need to find something else. Don’t settle without a fight. You owe yourself at least that much.

      I realize that advice is all too easy to give and much more difficult to take to heart. I don’t expect anyone to change their life based on this post. But it makes me feel better to get this down on “paper,” and to be able to point people to it should I ever run into someone who seems like they could use the advice.

      But you’re right, I doubt I would have listened back then either. I needed to learn it on my own, one bloody, painful step at a time. I suppose in the end all we ever need is the ability to make good decisions, given the particular realities that face us at any one point in our lives. And that is perhaps the best possible graduation speech: how to make good decisions, as that is what life tends to throw at you — the need to make good decisions, time and time again.


      A Day in the Life of the Metadata Librarian for the Mountain West Digital Library / DPLA

      Anna Neatrour is the Digital Metadata Librarian at the Mountain West Digital Library. In that capacity she works with libraries across the western states to support description and discovery of digital collections.

      In this post, Anna describes one of her typical days as a metadata librarian aggregating data on a regional level and as a Service Hub with DPLA.

      What does a Metadata Librarian do? The over ten million records in the Digital Public Library of America represent the work of countless people collecting, digitizing, and describing unique cultural heritage items. Mountain West Digital Library provides access to over 900,000 records, or about 10% of DPLA’s total collection. So, what does it take to be a metadata services librarian at a large DPLA service hub? Let’s find out.

      8:30-10:30. Evaluate New Collections

      I evaluate new collections from partners throughout Utah, Idaho, Montana, Arizona, and Nevada, and harvest their metadata into the Mountain West Digital Library. The MWDL has a well-established Metadata Application Profile, and I check new collections for conformance with the MWDL community’s shared expectations for descriptive metadata. Sometimes there are adjustments a local collection manager will need to make to field mappings, or values in the metadata that need to be revised or added. MWDL runs on ExLibris’ Primo discovery system, and we harvest collections through OAI-PMH. This means that I spend time checking OAI streams prior to harvesting a new collection. For a new repository I’ll send the collection manager a detailed report with information about what to fix. For long-term, established partners of MWDL, I’ll fire off e-mails with quick suggestions.
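
      As a rough illustration of what checking an OAI stream can involve, here is a minimal sketch that parses an OAI-PMH ListRecords response carrying simple Dublin Core and reports records with missing or empty fields. The list of required fields, the sample identifiers, and the function name are hypothetical; they do not represent the actual MWDL Metadata Application Profile or my real workflow.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical required fields, chosen only for illustration.
REQUIRED_FIELDS = ("title", "date", "rights")


def find_missing_fields(oai_xml, required=REQUIRED_FIELDS):
    """Map each record identifier to the required DC fields it lacks."""
    root = ET.fromstring(oai_xml)
    problems = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier",
            default="(no identifier)",
            namespaces=NS,
        )
        # A field counts as present only if some element has non-blank text.
        missing = [
            field
            for field in required
            if not any(
                (element.text or "").strip()
                for element in record.iterfind(f".//dc:{field}", NS)
            )
        ]
        if missing:
            problems[identifier] = missing
    return problems


# A made-up two-record response standing in for a real OAI stream.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Football Program</dc:title>
          <dc:date>1957-09-28</dc:date>
          <dc:rights>Example rights statement</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Untitled photograph</dc:title>
          <dc:date> </dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

report = find_missing_fields(SAMPLE)
```

      With the sample above, the report flags only the second record, for its blank date and absent rights statement. A list like this is the kind of thing that could go into a report for a collection manager.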

      10:30-12:00. MWDL Staff Meeting

      Once a week, our team checks in about current projects, technical troubleshooting, and the status of new collections we are adding.

      12:30-1:30. Web Page Updates for New Collections

      I’ve been working recently on harvesting new collections from the University of Idaho Digital Library, which has a wonderfully eclectic collection of materials that covers a variety of topics including jazz history, forestry, and much more.

      There’s some great graphic design in the Vandal Football Program Covers Collection, like this one which proclaims “Mashed Idahoes Comin’ Up!”

      Football Program. Idaho - Arizona State, 09/28/1957, Goodwin Stadium, Phoenix (Arizona). Courtesy of University of Idaho Library via Mountain West Digital Library.


      The International Jazz Collections at the University of Idaho are a unique resource, and many of the digitized materials from those collections are available in the DPLA, like this photo of Joe Williams and Count Basie from the Leonard Feather Jazz Collection.

      Joe Williams and Count Basie, 1960. Courtesy of the University of Idaho Library via Mountain West Digital Library.


      We’ve also added great collections from the Arizona Memory Project, including the Petrified Forest Historic Photographs collection that adds to our existing materials on national parks and recreation in the region. My favorite item in this collection is a photograph of Albert Einstein touring the park, a detail of which can be seen above in the header image for this post.

      One of the things I enjoy the most about harvesting new collections into MWDL is seeing how the information available on a particular topic gets augmented and expanded as more items are digitized. For example, many MWDL partners have photos and documents that tell the story of the Saltair Resort on the shores of the Great Salt Lake.

      We have many older photos documenting the history of the resort, but we recently added a selection of color photos from 1965, during the time period after the resort was abandoned, but before it was later destroyed by arson.

      Saltair Pavilion, 1965. Bolam, Harry. Courtesy of Utah Valley University Library via the Mountain West Digital Library.


      All of these collections from MWDL then combine to help researchers find even more resources on these topics in DPLA.

      2:00-3:00. Virtual Meeting or Training Support

      I enjoy working with librarians from different institutions across our multi-state region, which means meeting online. The meetings might center on the activities of a MWDL Task Force or time with a librarian needing support.

      3:00-4:00. Technical Troubleshooting

      I check harvested collections after they are imported/ingested into Primo and troubleshoot any issues when necessary. This means checking the PNX (Primo Normalized XML) records in our discovery system to make sure that the harvested metadata will display correctly, and also be available for DPLA to harvest.

      4:00-5:30. PLPP Partner Support

      MWDL is one of the four service hubs working on the Public Libraries Partnerships Project, and while we support all our partners, we are spending extra time helping public librarians who are new to digitization get their first collections online!

      Sharing the digital collections regionally through MWDL and nationally through DPLA is extremely rewarding. The next time you find a cool digital item in DPLA, thank your local metadata librarian!

      Featured image: Detail of Dr. and Mrs. Albert Einstein visit Rainbow Forest, date unknown. Courtesy of the National Park Service (AZ) via the Arizona Memory Project and Mountain West Digital Library. 

      All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

      Call for applications for Data Journalism Philippines 2015 / Open Knowledge Foundation

      Open Knowledge in partnership with the Philippine Center for Investigative Journalism is pleased to announce the launch of Data Journalism Ph 2015. Supported by the World Bank, the program will train journalists and citizen media in producing high-quality, data-driven stories.

      In recent years, government and multilateral agencies in the Philippines have published large amounts of data such as the government’s recently launched Open Data platform. These were accompanied by other platforms that track the implementation and expenditure of flagship programs such as Bottom-Up-Budgeting via, Infrastructure via and reconstruction platforms including the Foreign Aid Transparency Hub. The training aims to encourage more journalists to use these and other online resources to produce compelling investigative stories.

      Data Journalism Ph 2015 will train journalists on the tools and techniques required to gain and communicate insight from public data, including web scraping, database analysis and interactive visualization. The program will support journalists in using data to back their stories, which will be published by their media organization over a period of five months.

      Participating teams will benefit from the following:

      • A 3-day data journalism training workshop by Open Knowledge and PCIJ in July 2015 in Manila
      • A series of online tutorials on a variety of topics from digital security to online mapping
      • Technical support in developing interactive visual content to accompany their published stories

      Apply now!

      Teams of up to three members working with the same print, TV, or online media agencies in the Philippines are invited to submit an application here.

      Participants will be selected on the basis of the data story projects they pitch focused on key datasets including infrastructure, reconstruction, participatory budgeting, procurement and customs. Through Data Journalism Ph 2015 and its trainers, these projects will be developed into data stories to be published by the participants’ media organizations.

      Join the launch

      Open Knowledge and PCIJ will host a half-day public event for those interested in the program in July in Quezon City. If you would like to receive full details about the event, please sign up here.

      To follow the programme as it progresses go to the Data Journalism Ph 2015 project website.

      Mathematics and sculpture / William Denton

      I’m reading The Best Writing on Mathematics 2011, edited by Mircea Pitici, catching up on an older instalment of an excellent series. Once I discovered it I made it an annual purchase. Every year there’s a wonderful range of excellent writing about many different aspects of mathematics.

      The 2011 book has a chapter by Helaman Ferguson and Claire Ferguson, a married couple. The chapter doesn’t explain how they work together, but Helaman Ferguson is a mathematician and sculptor. One of his major math achievements is the PSLQ algorithm. His gallery shows his sculptures, including Umbilic Torus NC (1987), which he talks about in the chapter, and related pieces.

      I’ve never read anything by someone working at such levels of mathematics and art before. Some quotes:

      Stone is one of my favorite media. Maybe I choose stone because I was raised by a stone mason who saw beauty in common field stones. My aesthetic choices include geological age, provenance, and subtraction. We learn addition and then we learn subtraction. Subtraction is harder, isn’t it?

      Mathematicians are notorious for wanting to do things themselves, prove their own theorems, or prove other people’s theorems without looking at the known proofs. Sculptors tend to the opposite. Most stone-carving today is like a glamorous rock-music recording production; artists with enough money job it out—outsource. The question for me is “job out what?” How do I job out C functions? Negative Gaussian curvature? More important, having done so, what have I learned?

      If someone digs up my theorems in stone in a few thousand years, I expect that the excavator can decode what I have encoded and continue celebrating mathematics.

      My Fibonacci Fountain contains over 45 tons of billion-year-old Texas granite. It stands 18 feet above the water, supported underwater by concrete and steel to a depth of 14 feet, which is supported in turn by 28 pilings in 40 feet of mud. When the test cores were drilled no bedrock was found.

      To do architectural-size sculpture, I find friends with huge cranes.

      I used a seventy-ton crane to lift my block off the truck and ease it down into my studio, where I could finally sink my diamonds into it.

      Carving uses many tools, my diamond chainsaw principal among them.

      My current sculpture studio is in an industrial park in Baltimore, Maryland. My studio volume is 45,500 cubic feet. My “tool box” is a shipping container, which when filled with hand tools weighs 14,000 pounds. As I sit here, I feel in my mind my thirteen-ton block of beautiful billion-year-old Texas red granite, and my fingers sweat. This raw granite block compels me to think of the right timeless theorems. The time is now.

      Bookmarks for May 26, 2015 / Nicole Engard

      Today I found the following resources and bookmarked them on Delicious.

      • Open Hub, the open source network
        Discover, Track and Compare Open Source
      • Arches: Heritage Inventory & Management System
        Arches is an innovative open source software system that incorporates international standards and is built to inventory and help manage all types of immovable cultural heritage. It brings together a growing worldwide community of heritage professionals and IT specialists. Arches is freely available to download, customize, and independently implement.

      Digest powered by RSS Digest

      The post Bookmarks for May 26, 2015 appeared first on What I Learned Today....

      Learn to Teach Coding – Webinar Recording / LITA

      Tuesday May 26, 2015.

      Today we had a lively half-hour free webinar presentation by Kimberly Bryant and Lake Raymond from Black Girls CODE about their latest efforts and the exciting LITA preconference they will be giving at ALA Annual in San Francisco. Here’s the link to the recording from today’s session:

      LITA Learn to Teach Coding Free information webinar recording, May 26, 2015

      For more information check out the previous LITA Blog entry:

      Did you attend the webinar, or view the recording?  Give us your feedback by taking the Evaluation Survey.

      Learn to Teach Coding and Mentor Technology Newbies – in Your Library or Anywhere!

      Then register for and attend the LITA preconference at ALA Annual. This opportunity is following up on the 2014 LITA President’s Program at ALA Annual where then LITA President Cindi Trainor Blyberg welcomed Kimberly Bryant, founder of Black Girls Code.


      The Black Girls CODE vision is to increase the number of women of color in the digital space by empowering girls of color ages 7 to 17 to become innovators in STEM fields, leaders in their communities, and builders of their own futures through exposure to computer science and technology.

      Presidents and Their Libraries / DPLA

      “To bring together the records of the past and to house them in buildings where they will be preserved for the use of men and women in the future, a Nation must believe in three things.

      It must believe in the past.

      It must believe in the future.

      It must, above all, believe in the capacity of its own people so to learn from the past that they can gain in judgement in creating their own future.”

      – Franklin Roosevelt, at the dedication of his library on June 30, 1941

      Earlier this month it was announced that President Barack Obama’s Presidential Library will be built on the south side of Chicago. It will be our 14th Presidential Library.

      The idea originated with FDR, who in his second term “on the advice of noted historians and scholars, established a public repository to preserve the evidence of the Presidency for future generations.”

      Then in 1955, Congress passed the Presidential Libraries Act, establishing a system of privately erected and federally maintained libraries.

      Here’s a sampling of images from the Digital Public Library of America related to our presidents and their libraries. Enjoy!

      JFK Library and Museum in Boston. Courtesy of the University of Illinois at Urbana-Champaign University Library.

      FDR laying the cornerstone of his presidential library. Courtesy of the Franklin D. Roosevelt Presidential Library and Museum via the Empire State Digital Network.

      Herbert Hoover Presidential Library, West Branch, Iowa. Courtesy of the University of Illinois at Urbana-Champaign University Library.

      Jimmy Carter Library and Museum. Courtesy of the Jimmy Carter Library via the Digital Library of Georgia

      Presidential Room at The Eisenhower Presidential Library, Abilene, Kansas. Courtesy of the University of Illinois at Urbana-Champaign University Library.

      Richard Nixon Library & Birthplace Site model, 1971. Photo by Julius Schulman. Courtesy of the J. Paul Getty Trust.

      Inside the Harry S. Truman Presidential Library. Courtesy of the University of Illinois at Urbana-Champaign University Library.

      Former President Gerald R. Ford and his Cabinet officers at the dedication of the Gerald R. Ford Library in Ann Arbor, Michigan, April 27-28, 1981. Courtesy of the Georgia State University Libraries Special Collections via the Digital Library of Georgia

      Mourners pay their final respects to former US President Ronald Reagan as his body lay in repose inside a flag draped coffin at the Ronald Reagan Presidential Library. Courtesy of the National Archives and Records Administration.

      HathiTrust Research Center Workset Browser / Eric Lease Morgan

      In my copious spare time I have hacked together a thing I’m calling the HathiTrust Research Center Workset Browser, a (fledgling) tool for doing “distant reading” against corpora from the HathiTrust. [1]

      The idea is to: 1) create, refine, or identify a HathiTrust Research Center workset of interest (your corpus), 2) feed the workset’s rsync file to the Browser, 3) have the Browser download, index, and analyze the corpus, and 4) enable the reader to search, browse, and interact with the result of the analysis. With varying success, I have done this with a number of worksets on topics ranging from literature and philosophy to Rome and cookery. The best working examples are the ones from Thoreau and Austen. [2, 3] The others are still buggy.

      As a further example, the Browser can/will create reports describing the corpus as a whole. This analysis includes the size of a corpus measured in pages as well as words, date ranges, word frequencies, and selected items of interest based on pre-set “themes”: usage of color words, names of “great” authors, and a set of timeless ideas. [4] This report is based on more fundamental reports such as frequency tables, a “catalog”, and lists of unique words. [5, 6, 7, 8]


      The whole thing is written in a combination of shell and Python scripts. It should run on just about any out-of-the-box Linux or Macintosh computer. Take a look at the code. [9] No special libraries needed. (“Famous last words.”) In its current state, it is very Unix-y. Everything is done from the command line. Lots of plain text files and the exploitation of STDIN and STDOUT. Like a Renaissance cartoon, the Browser, in its current state, is only a sketch. Only later will a more full-bodied, Web-based interface be created.

      The next steps are numerous and listed in no priority order: putting the whole thing on GitHub, outputting the reports in generic formats so other things can easily read them, improving the terminal-based search interface, implementing a Web-based search interface, writing advanced programs in R that chart and graph analysis, providing a means for comparing & contrasting two or more items from a corpus, indexing the corpus with a (real) indexer such as Solr, writing a “cookbook” describing how to use the browser to do “kewl” things, making the metadata of corpora available as Linked Data, etc.

      Want to give it a try? For a limited period of time, go to the HathiTrust Research Center Portal, create (refine or identify) a collection of personal interest, use the Algorithms tool to export the collection’s rsync file, and send the file to me. I will feed the rsync file to the Browser, and then send you the URL pointing to the results. [10] Let’s see what happens.

      Fun with public domain content, text mining, and the definition of librarianship.


      1. HTRC Workset Browser –
      2. Thoreau –
      3. Austen –
      4. Thoreau report –
      5. Thoreau dictionary (frequency list) –
      6. usage of color words in Thoreau —
      7. unique words in the corpus –
      8. Thoreau “catalog” —
      9. source code –
      10. HathiTrust Research Center –

      Bad incentives in peer-reviewed science / David Rosenthal

      The inability of the peer-review process to detect fraud and error in scientific publications is getting some mainstream attention. Adam Marcus and Ivan Oransky, the founders of Retraction Watch, had an op-ed in the New York Times entitled What's Behind Big Science Frauds?, in which they neatly summed up the situation:
      Economists like to say there are no bad people, just bad incentives. The incentives to publish today are corrupting the scientific literature and the media that covers it. Until those incentives change, we’ll all get fooled again.
      Earlier this year I saw Tom Stoppard's play The Hard Problem at the Royal National Theatre, which deals with the same issue. The tragedy is driven by the characters being entranced by the prospect of publishing an attention-grabbing result. Below the fold, more on the problem of bad incentives in science.

      Back in April, after a Wellcome Trust symposium on the reproducibility and reliability of biomedical science, Richard Horton, editor of The Lancet, wrote an editorial entitled What is medicine’s 5 sigma? that is well worth a read. His focus is also on incentives for scientists:
      In their quest for telling a compelling story, scientists too often sculpt data to fit their preferred theory. Or they retrofit hypotheses to fit their data.
      and journal editors:
      Our acquiescence to the impact factor fuels an unhealthy competition to win a place in a select few journals. Our love of "significance" pollutes the literature with many a statistical fairy-tale. We reject important confirmations.
      and Universities:
      in a perpetual struggle for money and talent, endpoints that foster reductive metrics, such as high-impact publication. National assessment procedures, such as the Research Excellence Framework, incentivise bad practices.
      Horton points out that:
      Part of the problem is that no-one is incentivised to be right. Instead, scientists are incentivised to be productive and innovative.
      He concludes:
      The good news is that science is beginning to take some of its worst failings very seriously. The bad news is that nobody is ready to take the first step to clean up the system.
      Six years ago Marcia Angell, the long-time editor of a competitor to The Lancet, wrote in a review of three books pointing out the corrupt incentives that drug companies provide researchers and Universities:
      It is simply no longer possible to believe much of the clinical research that is published, or to rely on the judgment of trusted physicians or authoritative medical guidelines. I take no pleasure in this conclusion, which I reached slowly and reluctantly over my two decades as an editor of The New England Journal of Medicine.
      In most fields, little has changed since then. Horton points to an exception:
      Following several high-profile errors, the particle physics community now invests great effort into intensive checking and re-checking of data prior to publication. By filtering results through independent working groups, physicists are encouraged to criticise. Good criticism is rewarded. The goal is a reliable result, and the incentives for scientists are aligned around this goal.
      Unfortunately, particle physics is an exception. The cost of finding the Higgs Boson was around $13.25B, but no-one stood to make a profit from it. A single particle physics paper can have over 5,000 authors. The resources needed for "intensive checking and re-checking of data prior to publication" are trivial by comparison. In other fields, the incentives for all actors are against devoting resources which would represent a significant part of the total for the research to such checking.

      Fixing these problems of science is a collective action problem; it requires all actors to take actions that are against their immediate interests roughly simultaneously. So nothing happens, and the long-term result is, as Arthur Caplan (of the Division of Medical Ethics at NYU's Langone Medical Center) pointed out, a total loss of science's credibility:
      The time for a serious, sustained international effort to halt publication pollution is now. Otherwise scientists and physicians will not have to argue about any issue—no one will believe them anyway.
      (see also John Michael Greer). I am not optimistic, based on the fact that the problem has been obvious for many years, and that this is but one aspect of society's inability to deal with long-term problems.

      Metadata normalization as an indicator of quality? / Mark E. Phillips

      Metadata quality and assessment is a concept that has been around for decades in the library community.  Recently it has been getting more interest as new aggregations of metadata become available in open and freely reusable ways such as the Digital Public Library of America (DPLA) and Europeana.  Both of these groups make available their metadata so that others can remix and reuse the data in new ways.

      I’ve had an interest in analyzing the metadata in the DPLA for a while and have spent some time working on the subject fields. This post will continue along those lines in trying to figure out which metrics we can calculate with the DPLA dataset and use to define “quality”. Ideally we will be able to turn these assessments and notions of quality into concrete recommendations for how to improve metadata records in the originating repositories.

      This post will focus on normalization of subject strings, and how those normalizations might be useful as a way of assessing quality of a set of records.

      One of the powerful features of OpenRefine is the ability to cluster a set of data and combine these clusters into a single entry. Often this will significantly reduce the number of values that occur in a dataset in a quick and easy manner.

      OpenRefine Cluster and Edit Screen Capture

      OpenRefine has a number of different algorithms that can be used for this work, documented in their Clustering in Depth documentation. Depending on one’s data, one approach may perform better than others for this kind of clustering.


      Case normalization is probably the easiest kind of normalization to understand. If you have two strings, say “Mark” and “marK”, and you convert each of them to lowercase, you end up with a single value of “mark”. Many more complicated normalizations assume this as a start because it reduces the number of subjects without drastically transforming the original string values.

      Case folding is another kind of transformation that is fairly common in the world of libraries.  This is the process of taking a string like “José” and converting it to “Jose”.  While this can introduce issues if a string is meant to have a diacritic and that diacritic makes the word or phrase different than the one without the diacritic, often times it can help to normalize inconsistently notated versions of the same string.
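The two transformations described so far (lowercasing, and what this post calls case folding, meaning the removal of diacritics) can be sketched with Python's standard library. This is only an illustration and not the script used for the analysis: the function names are made up for this example, and it assumes that Unicode NFKD decomposition followed by dropping combining marks is an acceptable way to remove diacritics.

```python
import unicodedata


def lowercase(text):
    """Case normalization: "Mark" and "marK" both become "mark"."""
    return text.lower()


def strip_diacritics(text):
    """Fold accented letters to their base letters, e.g. "José" -> "Jose".

    Assumption: decompose with NFKD, then drop the combining marks.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


names = ["Mark", "marK", "José", "Jose"]
normalized = {lowercase(strip_diacritics(name)) for name in names}
print(sorted(normalized))  # four input strings collapse to two values
```

Letters that have no Unicode decomposition, such as “ø”, pass through this sketch unchanged, so a fuller version would probably need an explicit mapping table for them.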

      In addition to case folding and lowercasing, libraries have been normalizing data for a long time, and there have been efforts in the past to formalize algorithms for normalizing subject strings so that they can be matched. Often referred to as the NACO normalization rules, they are formally the Authority File Comparison Rules. I’ve always found this work intriguing and have a preference for the simplified algorithm that was developed at OCLC for their NACO Normalization Service. In fact, we’ve taken the sample Python implementation there and created a stand-alone repository and project called pynaco on GitHub for the code so that we could add tests and then work to port it to Python 3 in the near future.

      Another common type of normalization that is performed on strings in library land is stemming. This is often done within search applications so that if you search one of the phrases run, runs, running you would get documents that contain each of these.

      What I’ve been playing around with is if we could use the reduction in unique terms for a field in a metadata repository as an indicator of quality.

      Here is an example.

      If we have the following sets of subjects:

       Musical Instruments
       Musical Instruments.
       Musical instrument
       Musical instruments
       Musical instruments,
       Musical instruments.

      If you applied the simplified NACO normalization from pynaco you would end up with the following strings:

      musical instruments
      musical instruments
      musical instrument
      musical instruments
      musical instruments
      musical instruments

      If you then applied the porter stemming algorithm to the new set of subjects you would end up with the following:

      music instrument
      music instrument
      music instrument
      music instrument
      music instrument
      music instrument

      So in effect you have normalized the original set of six unique subjects down to one unique subject string with a NACO transformation followed by a normalization with the Porter Stemming algorithm.
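To make the first step of that example concrete, here is a rough stand-in for a case-and-punctuation normalization. It is not the pynaco implementation and does not follow the full NACO rules: `simple_normalize` is a hypothetical helper that only lowercases, replaces ASCII punctuation with spaces, and collapses whitespace. The stemming step is left out because it needs a third-party library.

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def simple_normalize(subject):
    """Lowercase, turn punctuation into spaces, and collapse whitespace.

    A simplified stand-in for illustration only, not the NACO rules.
    """
    lowered = subject.lower()
    no_punct = lowered.translate(_PUNCT_TO_SPACE)
    return " ".join(no_punct.split())


subjects = [
    "Musical Instruments",
    "Musical Instruments.",
    "Musical instrument",
    "Musical instruments",
    "Musical instruments,",
    "Musical instruments.",
]

before = len(set(subjects))
after = len({simple_normalize(s) for s in subjects})
print(before, "unique strings before,", after, "after")
```

Even this crude version takes the six strings down to two, “musical instruments” and “musical instrument”; it is the stemming step that merges the singular and plural forms.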


      In some past posts here, here, here, and here, I discussed some of the aspects of the subject fields present in Digital Public Library of America dataset.  I dusted that dataset off and extracted all of the subjects from the dataset so that I could work with them by themselves.

      I ended up with a set of text files that were 23,858,236 lines long that held the item identifier and a subject value for each subject of each item in the DPLA dataset. Here is a short snippet of what that looks like.

      d8f192def7107b4975cf15e422dc7cf1 Hoggson Brothers
      d8f192def7107b4975cf15e422dc7cf1 Bank buildings--United States
      d8f192def7107b4975cf15e422dc7cf1 Vaults (Strong rooms)
      4aea3f45d6533dc8405a4ef2ff23e324 Public works--Illinois--Chicago
      4aea3f45d6533dc8405a4ef2ff23e324 City planning--Illinois--Chicago
      4aea3f45d6533dc8405a4ef2ff23e324 Art, Municipal--Illinois--Chicago
      63f068904de7d669ad34edb885925931 Building laws--New York (State)--New York
      63f068904de7d669ad34edb885925931 Tenement houses--New York (State)--New York
      1f9a312ffe872f8419619478cc1f0401 Benedictine nuns--France--Beauvais

      Once I had the data in this format I could experiment with different normalizations to see what kind of effect they had on the dataset.

      Total vs Unique

      The first thing I did was to reduce the 23,858,236-line text file to only its unique values. I did this with the tried and true method of using unix sort and uniq.

      sort subjects_all.txt | uniq > subjects_uniq.txt

      After about eight minutes of waiting I ended up with a new text file subjects_uniq.txt that contains the unique subject strings in the dataset. There are a total of 1,871,882 unique subject strings in this file.
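For readers who would rather stay in Python, a rough equivalent of the sort and uniq step might look like the sketch below. It assumes each line holds an item identifier, whitespace, and then the subject string, as in the snippet shown earlier; `unique_subjects` is a name invented for this example, and the last sample line is a made-up duplicate added only to show the effect.

```python
def unique_subjects(lines):
    """Collect the distinct subject strings from "identifier subject" lines."""
    subjects = set()
    for line in lines:
        # Split on the first run of whitespace: identifier, then subject.
        parts = line.rstrip("\n").split(None, 1)
        if len(parts) == 2:  # skip blank lines or lines with no subject
            subjects.add(parts[1])
    return subjects


sample = [
    "d8f192def7107b4975cf15e422dc7cf1 Hoggson Brothers\n",
    "d8f192def7107b4975cf15e422dc7cf1 Bank buildings--United States\n",
    # Hypothetical identifier, added for illustration only.
    "00000000000000000000000000000000 Bank buildings--United States\n",
]
print(len(sample), "lines,", len(unique_subjects(sample)), "unique subjects")
```

The same function accepts an open file handle in place of the list. Note that it keeps every unique string in memory, which may or may not be comfortable for a file of this size; sort and uniq remain the low-effort option.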

      Case folding

      Using a Python script to perform case folding on each of the unique subjects, I was able to see whether that causes a reduction in the number of unique subjects.

      I started out with 1,871,882 unique subjects and after case folding ended up with 1,867,129 unique subjects.  That is a difference of 4,753 or a 0.25% reduction in the number of unique subjects.  So nothing huge.


      The next normalization tested was lowercasing of the values.  I chose to do this on the set of subjects that were already case folded to take advantage of the previous reduction in the dataset.

      By converting the subject strings to lowercase I reduced the number of unique case folded subjects from 1,867,129 to 1,849,682, which is a cumulative reduction of 22,200, or 1.2%, from the original 1,871,882 unique subjects.

      NACO Normalization

      Next we look at the simple NACO normalization from pynaco.  I applied this to the unique lower cased subjects from the previous step.

      With the NACO normalization,  I end up with 1,826,523 unique subject strings from the 1,849,682 that I started with from the lowercased subjects.  This is a difference of 45,359 or a 2.4% reduction from the original 1,871,882 unique subjects.

      Porter stemming

      Moving along, the next normalization I looked at for this work was applying the Porter Stemming algorithm to the output of the NACO normalized subjects from the previous step. I used the Porter implementation from the Natural Language Tool Kit (NLTK) for Python.

      With the Porter stemmer applied, I ended up with 1,801,114 unique subject strings from the 1,826,523 that I started with from the NACO normalized subjects. This is a difference of 70,768 or a 3.8% reduction from the original 1,871,882 unique subjects.


      Finally I used a Python port of the fingerprint algorithm that OpenRefine uses for its clustering feature. This will help to normalize strings like “phillips mark” and “mark phillips” into a single value of “mark phillips”. I used the output of the previous Porter stemming step as the input for this normalization.

      With the fingerprint algorithm applied, I ended up with 1,766,489 unique fingerprint normalized subject strings. This is a difference of 105,393 or a 5.6% reduction from the original 1,871,882 unique subjects.
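As an illustration of the idea, here is a minimal sketch of a fingerprint-style key. It is not the port used for the numbers above; it assumes the steps are roughly the ones OpenRefine documents for its fingerprint method (trim, lowercase, remove punctuation, fold to ASCII, then sort and de-duplicate the whitespace-separated tokens), and the details of each step are my own simplification.

```python
import string
import unicodedata

# Translation table that deletes ASCII punctuation.
_DROP_PUNCT = str.maketrans("", "", string.punctuation)


def fingerprint(value):
    """Build an order-insensitive key for a string (simplified sketch)."""
    text = value.strip().lower()
    text = text.translate(_DROP_PUNCT)
    # Fold accented letters to their base letters.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Sort and de-duplicate tokens so that word order no longer matters.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


for name in ["phillips mark", "mark phillips", "Phillips, Mark"]:
    print(repr(name), "->", repr(fingerprint(name)))
```

All three inputs produce the same key, which is what lets the clustering step treat them as one value.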


      Normalization   Reduction   Occurrences   Percent Reduction
      Unique                  0     1,871,882                  0%
      Case Folded         4,753     1,867,129                0.3%
      Lowercase          22,200     1,849,682                1.2%
      NACO               45,359     1,826,523                2.4%
      Porter             70,768     1,801,114                3.8%
      Fingerprint       105,393     1,766,489                5.6%
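The reduction and percentage columns are simple arithmetic on the unique counts reported in each step. The short sketch below recomputes them from those counts; the variable and function names are only for this example.

```python
ORIGINAL = 1_871_882  # unique subject strings before any normalization

# Unique subject strings remaining after each cumulative step.
remaining_after = [
    ("Case Folded", 1_867_129),
    ("Lowercase", 1_849_682),
    ("NACO", 1_826_523),
    ("Porter", 1_801_114),
    ("Fingerprint", 1_766_489),
]


def reduction(original, remaining):
    """Return (absolute reduction, percent reduction rounded to 0.1)."""
    diff = original - remaining
    return diff, round(100 * diff / original, 1)


for name, remaining in remaining_after:
    diff, percent = reduction(ORIGINAL, remaining)
    print(f"{name:<12}{diff:>9,}{remaining:>12,}{percent:>6}%")
```

Each percentage is measured against the original 1,871,882 strings rather than against the previous step, which is why the figures accumulate down the table.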


      I think that it might be interesting to apply this analysis to the various Hubs in the whole DPLA dataset to see if there is anything interesting to be seen across the various types of content providers.

      I’m also curious if there are other kinds of normalizations that would be logical to apply to the subjects that I’m blanking on. One that I would probably want to apply at some point is the normalization for LCSH that splits a subject into its parts if it has the double hyphen (--) in the string. I wrote about the effect on the subjects for the DPLA dataset in a previous post.

      As always feel free to contact me via Twitter if you have questions or comments.

      Advancing Patron Privacy on Vendor Systems with a Shared Understanding / Peter Murray

      Last week I had the pleasure of presenting a short talk at the second virtual meeting of the NISO effort to reach a Consensus Framework to Support Patron Privacy in Digital Library and Information Systems. The slides from the presentation are below and on SlideShare, followed by a cleaned-up transcript of my remarks.

      It looks like in the agenda that I’m batting in the clean-up role, and my message might be pithily summarized as “Can’t we all get along?” A core tenet of librarianship — perhaps dating back to the 13th and 14th century when this manuscript was illuminated — is to protect the activity trails of patrons from unwarranted and unnecessary disclosure.

      This is embedded in the ethos of librarianship. As Todd pointed out in the introduction, the third principle of the American Library Association’s Code of Ethics states: “We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.” Librarians have performed this duty across time and technology, and as both have progressed the profession has sought new ways to protect the privacy of patrons.

      For instance, there was once a time when books had a pocket in the back that held a card showing who had checked out the book and when it was due. Upon checkout the card was taken out, had the patron’s name embossed or written on it, and was stored in a date-sorted file so that the library knew when it was due and who had it checked out. When the book was returned, the name was scratched through before putting the card in the pocket and the book on the shelf. Sometimes, as a process shortcut, the name was left “in the clear” on the card, and anyone that picked the book off the shelf could look on the card to see who had checked it out.

      When libraries automated their circulation management with barcodes and database records, the card in the back of the book and the information it disclosed were no longer necessary. This was hailed as one of the advantages of moving to a computerized circulation system. While doing away with circulation cards eliminated one sort of privacy leakage (patrons being able to see what each other had checked out), it enabled another: systematic collection of patron activity in a searchable database. Many automation systems put in features that automatically removed the link between patron and item after it was checked in. Or, if that information was stored for a period of time, it was password protected so only approved staff could view the information. Some, however, did not, and this became a concern with the passage of the USA PATRIOT Act by the United States Congress.

      We are now in an age where patron activity is scattered across web server log files, search histories, and usage analytics of dozens of systems, some of which are under the direct control of the library while others are in the hands of second and third party service providers. Librarians that are trying to do their due diligence in living up to the third principle of the Code of Ethics have a more difficult time accounting for all of the places where patron activity is collected. It has also become more difficult for patrons to make informed choices about what information is collected about their library activity and how it is used.

      In the mid-2000s, libraries and content providers had a similar problem: the constant one-off negotiation of license terms was a burden to all parties involved. In order to gain new efficiencies in the process of acquiring and selling licensed content, representatives from the library and publisher communities came together under a NISO umbrella to reach a shared understanding of what the terms of an agreement would be and a registry of organizations that subscribed to those terms. Quoting from the foreword of the 2012 edition: “The Shared Electronic Resource Understanding (SERU) Recommended Practice offers a mechanism that can be used as an alternative to a license agreement. The SERU statement expresses commonly shared understandings of the content provider, the subscribing institution and authorized users; the nature of the content; use of materials and inappropriate uses; privacy and confidentiality; online performance and service provision; and archiving and perpetual access. Widespread adoption of the SERU model for many electronic resource transactions offers substantial benefits both to publishers and libraries by removing the overhead of bilateral license negotiation.”

      One of SERU’s best qualities is its brevity, and that is likely a significant factor in its success. For instance, the “Confidentiality and Privacy” section consists, in its entirety, of these two sentences: “The acquiring institution and the provider respect the privacy of the users of the content and will not disclose or distribute personal information about the user to any third party without the user’s consent unless required to do so by law. The provider should develop and post its privacy policy on its website.” As the complexity of the online information landscape has increased, this two-sentence paragraph is no longer sufficient to describe an understanding between library and information provider. Here are some examples of this complexity.

      One of the features of the HTTP protocol — the mechanism used by web browsers to get content from web servers — is for the browser to tell the server how it knew to ask for the web page or image file or JavaScript file on that server. This is called the “Referer” header. Does your library catalog include a link to add a book to an Amazon wishlist? Does your library catalog page load a book cover image from Syndetic Solutions? If so, the address of the catalog page is included in those HTTP transactions with Amazon and Syndetic Solutions as the “Referer” header. What is in that library catalog URL? Are the patron’s search terms in that link? Is there personally identifiable information?
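To see what a third party could learn from a “Referer” value, it is enough to parse the catalog URL it carries. The sketch below uses only Python's standard library; the catalog address and its `q` and `patron_id` parameters are entirely hypothetical and are not taken from any real system.

```python
from urllib.parse import parse_qs, urlsplit


def referer_query_terms(referer_url):
    """Return the query-string parameters carried in a Referer URL."""
    return parse_qs(urlsplit(referer_url).query)


# Hypothetical catalog page URL that a browser might send as the Referer
# header when it loads a cover image or widget from another site.
referer = "https://catalog.example.org/search?q=bankruptcy+law&patron_id=12345"

for key, values in referer_query_terms(referer).items():
    print(key, "=", values)
```

If a catalog puts search terms or patron identifiers in its page URLs, every embedded third-party resource on that page can receive them this way, with no action by the patron.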

      Today’s web services are filled with social sharing widgets (Facebook, Twitter, and the like), web analytics tools (Google Analytics), and content from advertising syndicates. While these tools provide useful services to patrons, libraries and service providers, they also become centralized points of data gathering that can aggregate a user’s activity across the web. Does your library catalog page include a Facebook “Like” button? Whether or not the patron clicks on that button, Facebook knows that the user has browsed to that web page and can glean details of user behavior from that. Does your service use Google Analytics to understand user behavior and demographics? Google Analytics tracks user behavior across an estimated one half of the sites on the internet. Your user’s activity as a patron of your services is commingled with their activity as a general user.

      A “filter bubble” is a phrase coined by Eli Pariser to describe a system that adapts its output based on what it knows about a user: location, past searches, click activity, and other signals. The system is using these signals to deliver what it deems to be more relevant information to the user. In order to do this, the system must gather, store and analyze this information from patrons. However, a patron may not want his or her past search history to affect their search results. Or, even worse, when activity is aggregated from a shared terminal, the results can be wildly skewed.

      Simply using a library-subscribed service can transmit patron activity and intention to dozens of parties, all of it invisible to the user. To uphold that third principle in the ALA Code of Ethics, librarians need to examine the patron activity capturing practices of their information suppliers, and that can be as unwieldy as negotiating bilateral license agreements between each library and supplier. If we start from the premise that libraries, publishers and service providers want to serve the patron’s information needs while respecting their desire to do so privately, what is needed is a shared understanding of how patron activity is captured, used, and discarded. A new gathering of librarians and providers could accomplish for patron activity what they did for electronic licensing terms a decade ago. One could imagine discussions around these topics:

      What Information is Collected From the Patron: When is personally identifiable information captured in the process of using the provider’s service? How is activity tagged to a particular patron, both before and after the patron identifies himself or herself? Are search histories stored? Is the patron activity encrypted, both in transit on the network and at rest on the server?

      What Activity Can Be Gleaned by Other Parties: If a patron follows a link to another website, how much of the context of the patron’s activity is transferred to the new website? Are search terms included in the URL? Is personally identifiable information in the URL? Does the service provider employ social sharing tools or third party web analytics that can gather information about the patron’s activity? Such activity could include IP address (and therefore rough geolocation), content of the web page, cross-site web cookies, and so forth.

      How does patron activity influence service delivery: Is relevancy ranking altered based on the past activity of the user? Can the patron modify the search history to remove unwanted entries or segregate research activities from each other?

      What is the disposition of patron activity data: Is patron activity data anonymized and co-mingled with others? How is that information used and to whom is it disclosed? How long does the system keep patron activity data? Under what conditions would a provider release information to third parties?

      It is arguably the responsibility of libraries to protect patron activity data from unwarranted collection and distribution. Service providers, too, want clear guidance from libraries so they can efficiently expend their efforts to develop systems that librarians feel comfortable promoting. To have each library and service provider audit this activity for each bilateral relationship would be inefficient and cumbersome. By coming to a shared understanding of how patron activity data is collected, used, and disclosed, libraries and service providers can advance their educational roles and offer tools to patrons to manage the disclosure of their activity.

      MarcEdit 6 update / Terry Reese

      I’ve been working hard on making a few changes to a couple of the MarcEdit internal components to improve the porting work.  To that end, I’ve posted an update that targets improvements to the Deduping and the Merging tools.


      • Update: Dedup tool — improves the handling of qualified data in the 020, 022, and 035.
      • Update: Merge Records Tool — improves the handling of qualified data in the 020, 022, and 035.

      Downloads can be picked up using the automated update tool or by going to:


      Ramping up negotiation skills to advance library agenda / District Dispatch

      SF from Marin Highlands

      From the boardroom to City Hall, powerful negotiation skills make a big difference in advancing library goals. Power up your ability to persuade at the 2015 American Library Association (ALA) Annual Conference interactive session “The Policy Revolution! Negotiating to Advocacy Success!” 1:00 to 2:30 p.m. on Saturday, June 27, 2015. The session will be held at the Moscone Convention Center in room 2016 of the West building.

      American Library Association Senior Policy Counsel Alan Fishel will bring nearly 30 years of legal practice and teaching effective and creative negotiation to the program. Bill & Melinda Gates Foundation Senior Program Officer Chris Jowaisas will share his experience advocating for and advancing U.S. and global library services. From securing new funding to negotiating licenses to establishing mutually beneficial partnerships, today’s librarians at all levels of service are brokering support for the programs, policies and services needed to meet diverse community demands. The session will jump off from a new national public policy agenda for U.S. libraries to deliver new tools you can use immediately at the local, state, national and international levels.

      The Policy Revolution! initiative aims to advance national policy for libraries and our communities and campuses. The grant-funded effort focuses on establishing a proactive policy agenda, engaging national decision makers and influencers, and upgrading ALA policy capacity.

      Speakers include Larra Clark, deputy director, ALA Office for Information Technology Policy; Alan G. Fishel, partner, Arent Fox; and Chris Jowaisas, senior program officer, Bill and Melinda Gates Foundation.

      View all ALA Washington Office conference sessions

      The post Ramping up negotiation skills to advance library agenda appeared first on District Dispatch.

      Islandora Ontology / Islandora

      While working on the migration mappings for fcrepo3->fcrepo4 properties, I documented all known RELS-EXT and RELS-INT predicates in the Islandora 7.x-1.x code base. The predicates came from two namespaces: fedora and islandora.
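
As a rough illustration of that kind of inventory, the sketch below groups the predicates in a RELS-EXT datastream by namespace using only the Python standard library. The sample datastream, the `example:` PIDs, and the `predicates_by_namespace` helper are made up for this example and are not taken from the Islandora code base.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative RELS-EXT datastream; the object PIDs are hypothetical.
SAMPLE_RELS_EXT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
    xmlns:islandora="http://islandora.ca/ontology/relsext#">
  <rdf:Description rdf:about="info:fedora/example:2">
    <fedora:isMemberOf rdf:resource="info:fedora/example:1"/>
    <islandora:isPageOf rdf:resource="info:fedora/example:1"/>
    <islandora:isSequenceNumber>2</islandora:isSequenceNumber>
  </rdf:Description>
</rdf:RDF>"""


def predicates_by_namespace(rels_xml):
    """Map each namespace URI to the sorted predicate names used under it."""
    root = ET.fromstring(rels_xml)
    found = defaultdict(set)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        for child in description:
            # ElementTree writes qualified names as "{namespace}local".
            namespace, _, local = child.tag[1:].partition("}")
            found[namespace].add(local)
    return {ns: sorted(names) for ns, names in found.items()}


if __name__ == "__main__":
    for ns, names in predicates_by_namespace(SAMPLE_RELS_EXT).items():
        print(ns, names)
```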

      The fedora namespace has a published ontology that we use -- relations-external -- and that can be referenced. However, the islandora namespace did not have any published ontologies associated with it.

      That said, I have worked over the last couple of weeks with some very helpful folks on drafting initial versions of the Islandora RELS-EXT and RELS-INT ontologies, and the Islandora Roadmap Committee voted that they should be published. The published version of the RELS-EXT ontology can be viewed here, and the published version of the RELS-INT ontology can be viewed here. In addition, the ontologies were drafted in rdfs, and include a handy rdf2html.xsl to quickly create a publishable html version. This is available on GitHub.

      What does this all mean?

      We have now documented what we have been doing for the last number of years, and we have a referenceable version of our ontologies. In addition, this is extremely helpful for referencing and documenting predicates that will be a part of an fcrepo3-fcrepo4 migration.

      What's next?

      The initial versions of each ontology have proposed rdfs comments, ranges, and skos *matches for a number of predicates. However, this is by no means complete, and I would love to see some community input/feedback on rdfs comments, ranges, additional skos *matches, or anything else that you think should be included in the RELS-EXT ontology.
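
One way to spot where feedback is most needed is to list the properties that still lack an rdfs comment or range. The sketch below assumes the ontology is serialized as RDF/XML with `rdf:Property` elements. The sample document, its `example.org` URIs, and the `missing_annotations` helper are hypothetical and do not reflect the published ontology files.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Illustrative ontology fragment, not the published Islandora ontology.
SAMPLE_ONTOLOGY = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Property rdf:about="http://example.org/ontology#isPageOf">
    <rdfs:label>isPageOf</rdfs:label>
    <rdfs:comment>Links a page to its parent object.</rdfs:comment>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  </rdf:Property>
  <rdf:Property rdf:about="http://example.org/ontology#isSequenceNumber">
    <rdfs:label>isSequenceNumber</rdfs:label>
  </rdf:Property>
</rdf:RDF>"""


def missing_annotations(ontology_xml, required=("comment", "range")):
    """Return {property URI: [missing rdfs annotation names]}."""
    root = ET.fromstring(ontology_xml)
    gaps = {}
    for prop in root.findall(f"{{{RDF}}}Property"):
        about = prop.get(f"{{{RDF}}}about")
        # An annotation counts as missing when no matching child element exists.
        absent = [name for name in required
                  if prop.find(f"{{{RDFS}}}{name}") is None]
        if absent:
            gaps[about] = absent
    return gaps


if __name__ == "__main__":
    for uri, absent in missing_annotations(SAMPLE_ONTOLOGY).items():
        print(uri, "is missing:", ", ".join(absent))
```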

      How to provide feedback?

      I'd like to have everything handled through 'issues' on the GitHub repo. If you are comfortable with forking and creating pull requests, by all means do so. If you're more comfortable with replying here, that works as well. All contributions are welcome! The key thing -- for me at least -- is to have community consensus around our understanding of these documented predicates :-)


      I have not licensed the repository yet. I had planned on using the Apache 2.0 License as is done with PCDM, but I'd like your thoughts/opinions on proceeding before I make a LICENSE commit.


      I hope I have covered it all. But, if you have any questions, don't hesitate to ask.