Cybersecurity is an interesting and important topic, one closely connected to those of online privacy and digital surveillance. Many of us know that it is difficult to keep things private on the Internet. The Internet was invented to share things with others quickly, and it excels at that job. Businesses that process transactions with customers and store the information online are responsible for keeping that information private. No one wants social security numbers, credit card information, medical history, or personal e-mails shared with the world. We expect and trust banks, online stores, and our doctor’s offices to keep our information secure and safe.
Image from Flickr – https://www.flickr.com/photos/topgold/4978430615
Cybersecurity vs. Usability
To prevent such a data breach, institutional IT staff are trained to protect their systems against vulnerabilities and intrusion attempts. Employees and end users are educated to be careful about dealing with institutional or customers’ data. There are systematic measures that organizations can implement such as two-factor authentication, stringent password requirements, and locking accounts after a certain number of failed login attempts.
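As a rough illustration of the last of these measures, here is a minimal sketch of account lockout after repeated failed logins. The class name, the threshold of five attempts, and the fifteen-minute window are all assumptions chosen for the example, not values taken from any particular system.

```python
import time
from dataclasses import dataclass, field

MAX_FAILED_ATTEMPTS = 5    # hypothetical threshold, chosen for illustration
LOCKOUT_SECONDS = 15 * 60  # hypothetical lockout window, chosen for illustration


@dataclass
class LoginGuard:
    """Tracks consecutive failed logins per user and locks accounts temporarily."""
    failures: dict = field(default_factory=dict)      # username -> consecutive failures
    locked_until: dict = field(default_factory=dict)  # username -> unlock timestamp

    def is_locked(self, user, now=None):
        now = time.time() if now is None else now
        return self.locked_until.get(user, 0.0) > now

    def record_attempt(self, user, success, now=None):
        """Record one login attempt and return True if the login is accepted."""
        now = time.time() if now is None else now
        if self.is_locked(user, now):
            return False  # refuse even a correct password while the account is locked
        if success:
            self.failures.pop(user, None)  # a successful login resets the counter
            return True
        count = self.failures.get(user, 0) + 1
        self.failures[user] = count
        if count >= MAX_FAILED_ATTEMPTS:
            self.locked_until[user] = now + LOCKOUT_SECONDS
            self.failures.pop(user, None)
        return False
```

In this sketch the lock expires on its own after the window passes, so a forgetful user is delayed rather than shut out for good.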
While these measures strengthen an institution’s defense against cyberattacks, they may negatively affect the usability of the system, lowering users’ productivity. As a simple example, security measures like CAPTCHAs can cause an accessibility issue. As another example, the USPS website does not provide a way for a user who forgot the password to reset it at all. I do not know if this is for a security reason, but let’s assume for a second that it is the case. Clearly, a system that does not allow a password reset would be more secure than one that does, since it makes it impossible for anyone to pretend to be someone else without knowing the password. But needless to say, this security measure creates a huge usability issue for average users, who often forget their own passwords and are locked out of the system permanently as a result.
Imagine that a university IT office is concerned about the data security of cloud services and requires all faculty, students, and staff to only use cloud services that are SOC 2 Type II certified. SOC stands for “Service Organization Controls.” They are a series of standards that measure how well a given service organization keeps its information secure. For a business to be SOC 2 certified, it must demonstrate that it has sufficient policies and strategies that will satisfactorily protect its clients’ data in five areas known as “Trust Services Principles,” which include the security of the service provider’s system, the processing integrity of this system, the availability of this system, the privacy of personal information that the service provider collects, retains, uses, discloses and disposes of for user entities, and the confidentiality of the information that the service provider’s system processes or maintains for user entities. The SOC 2 Type II certification means that the business has maintained relevant security policies and procedures over a period of at least six months and therefore will keep its clients’ sensitive data secure. The Dropbox for Business product is SOC 2 certified, but it costs money. The regular Dropbox is not as secure, but many faculty, students, and staff in academia use this cloud service frequently. If a university IT department bans people from using Dropbox and does not offer an alternative that is as easy to use as Dropbox, people will undoubtedly suffer.
Or suppose that your organization requires you to reset, every week, the passwords to your computer and to all the various systems you log in to. Your PC, the network that it is connected to, and those other systems may be more secure as a result. But it will be a nightmare to manage and reset all those passwords every week. Most likely, people will start using less complicated passwords, or may even use one password across all the different services and, if the system does not prevent it, stick to the same password every time they are required to reset it.
Security is important, but users also want to be able to do their job without being bogged down by unwieldy cybersecurity measures. The more user-friendly and the simpler the cybersecurity guidelines are to follow, the more users will observe them, thereby resulting in a more secure system. Users who encounter cumbersome and complicated security measures may ignore or try to bypass them, increasing security risks.
The invasion of privacy and the lack of transparency in these network monitoring programs have caused great controversy. Such wide and indiscriminate monitoring programs must have a very good justification and offer clear answers to vital questions regarding what exactly will be collected, who will have access to the information, when and how the information will be used, what controls will be put in place to prevent information from being used for unrelated purposes, and how the information will be disposed of.
We have recently seen another case in which security concerns conflicted with privacy. In February 2016, the FBI asked Apple to create a backdoor application that would bypass the security measure that is in place in iOS. This was because the FBI wanted to unlock an iPhone 5C recovered from one of the shooters in the San Bernardino shooting incident. Apple iOS secures users’ devices by permanently erasing all data when a wrong password is entered more than ten times. The FBI’s request was met with strong opposition from Apple and others. Such a backdoor application can easily be used for illegal purposes by criminals or used for unjustified privacy infringement by the government or other capable parties. Apple refused to comply with the request, and the court hearing was to take place on March 22. But the FBI withdrew the request, saying that it had found a way to hack into the phone in question without Apple’s help. Now, Apple has to find out what the vulnerability in its iOS is if it wants its encryption mechanism to be foolproof. In the meantime, iOS users know that the data on their devices is no longer as secure as they believed.
Around the same time as this FBI-Apple encryption case, a Senate draft bill titled “Compliance with Court Orders Act of 2016” proposed that people should be required to comply with any authorized court order for data, and that if the data is “unintelligible,” that is, encrypted, it must be decrypted for the court. This bill is problematic because it makes any end-to-end encryption, which we use every day from our iPhones to messaging services like WhatsApp and Signal, practically illegal.
Because security is essential to privacy, it is ironic that certain cybersecurity measures can be used to greatly invade privacy rather than protect it. Because we do not always fully understand how the technology actually works or how it can be exploited for both good and bad purposes, we need to be careful about giving blanket permission to any party to access, collect, and use our private data without clear understanding, oversight, and consent. As we share more and more information online, cyberattacks will only increase, and organizations and the government will struggle even more to balance privacy concerns with security issues.
Why Libraries Should Advocate Online Privacy
The fact that people may no longer have privacy on the Web concerns many librarians. Historically, librarians have been strong advocates of intellectual freedom, and libraries have been striving to keep patrons’ data safe and protected from unwanted eyes. The Library Freedom Project reflects this type of concern from librarians. It educates librarians and their local communities about surveillance threats, privacy rights and law, and privacy-protecting technology tools to help safeguard digital freedom, and it helped the Kilton Public Library in Lebanon, New Hampshire, become the first library to operate a Tor exit relay, which provides anonymous browsing on the Internet for library patrons.
New technologies have brought us unprecedented convenience, but they also carry with them the potential for an unparalleled level of invasion of privacy. While the majority of librarians have a very strong stance in favor of intellectual freedom and against censorship, many librarians are unsure about online privacy, particularly when it is pitted against cybersecurity. Some argue that those who have nothing to hide do not need their privacy. However, privacy is not identical to hiding a wrongdoing, nor do people keep certain things secret because they are necessarily illegal or unethical. Being watched 24/7 will drive any person crazy whether s/he is guilty of any wrongdoing or not. Privacy is an essential part of being human, not some instrument that we can do without in the face of a greater concern. Privacy allows us safe space to form our thoughts and review our actions on our own without being subject to others’ judgment.
The Electronic Frontier Foundation states that privacy means respect for individuals’ autonomy, anonymous speech, and the right to free association. If we want to remain as autonomous human beings free to speak our minds and think on our own without worrying about being observed and/or censored, we need to defend our privacy both online and offline and in all forms of technologies and technology devices, which are increasingly part of our everyday lives.
Conforguration is a basic working example of configuration management in Org. I use source code blocks and tangling to make shell scripts that get synced to a remote machine and then download, install and configure R from source.
conforguration.org (that’s a file, not a site) has all the code. It really did work for me, and it might work for you. Is this a reasonable way of doing configuration management? I don’t know, but it’s worth trying. I’ll add things as I come across them.
I don’t know anything about formal configuration management, and I’ve never done literate programming and tangling in Org before. Anyone who’s interested in having a go at conforguring something else is most welcome to do so!
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
We’re pleased to invite our extended community to attend a free two-part DPLA workshop webinar, RightsStatements.org: Why We Need It, What It Is (and Isn’t) and What Does It Mean for the DPLA Network and Beyond? Over the course of two workshops, Emily Gore, DPLA, and Greg Cram, NYPL, will discuss the recently launched RightsStatements.org project.
The goal of RightsStatements.org is to provide standardized rights statements for cultural heritage institutions and aggregators. This two-part webinar series will demonstrate and describe the need for the statements, the rationale behind the statements, and NYPL and DPLA’s implementation plans. Part I of the webinar will cover the need for the statements and the philosophy behind the statements. Part II of the webinar will cover the statements themselves along with the implementation strategy. Participants are highly encouraged to attend both parts, as they will build off each other.
The first part of the series will take place on Tuesday, May 10 at 3:30 PM Eastern (90 minutes), while the second part will take place a week later on Tuesday, May 17 at 3:30 PM Eastern (90 minutes).
DPLA Workshops are online learning opportunities highlighting subjects central to our community, such as education, metadata, technology, copyright, and more. These events are open to the public (registration required).
In May 2015, Brooklyn-based developer Dan Vanderkam launched OldNYC, providing users a new way to experience NYPL's Photographic Views of New York City collection and discover the history behind the places New Yorkers see every day. Now Orian Breaux and Christina Leuci, fans of NYC history, maps, and technology, have brought the OldNYC experience to mobile phones. I spoke with them to find out more about the development of this app, which is now available in the iTunes App Store.
Who are you, and what do you do?
Orian: We both work in NYC’s tech world. I’m a product manager, currently focused on user growth at LiveAuctioneers. I love exploring topics like data visualization, urban science, and user behavior. On the side, I mentor aspiring technologists at a tech school called General Assembly and build projects like NYC Time Machine. Before entering tech, I studied aeronautical engineering at Rensselaer Polytechnic Institute (RPI).
Christina: As for me, I'm a software developer at Cyrus Innovation, where we help companies build and scale their software products. Previously, I studied web development at Flatiron School, after attending Rutgers University for two years. When we’re not working, you can find us dancing Swing or Argentine Tango!
How did you decide to make a mobile app for OldNYC?
Orian: It was totally spontaneous. Since OldNYC launched last May, I had spent many hours admiringly exploring its photographs. In a November mailing list update, creator Dan Vanderkam asked if anyone wanted to build a mobile app for OldNYC. As someone who loves this intersection of NYC history, maps, and technology, it was an automatic “Yes!” for me. Coincidentally, Christina had been talking about creating an iPhone app, so I immediately volunteered ourselves.
Christina: It was an exciting project to start, and we’re thankful for Dan’s openness to our building off his work. After exchanging emails, we all met for coffee one winter morning, then Orian and I got to work. Admittedly, neither of us had built an iOS app before, but with our combined coding, product, and design experience, we thought we could learn quickly enough to build a solid app.
What was your process?
Orian: We started with a question: “How can we use the mobile platform to enable an experience not possible on a regular desktop website?” From the beginning, we had this vision where you could walk down a street and easily access historical photos nearest to your location. Like a self-guided historical tour, where you’re encouraged to discover “what was there” anywhere you go.
We worked backwards from that idea, breaking down the pieces of design, development, and research work we needed to do. Dan had already built some infrastructure for storing photos and related data, so we spent some time figuring out what we could use and what we had to build ourselves.
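To make the "photos nearest to your location" idea concrete, here is a minimal sketch of one way to rank geotagged photos by distance from a user, using the haversine great-circle formula. This is an illustration only, not the OldNYC app's actual code; the function names, the record layout, and the sample coordinates are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_photos(user_lat, user_lon, photos, limit=3):
    """Return up to `limit` photo records, closest to the user first."""
    return sorted(
        photos,
        key=lambda p: haversine_m(user_lat, user_lon, p["lat"], p["lon"]),
    )[:limit]


# Hypothetical photo records; the ids and coordinates are made up for the example.
sample_photos = [
    {"id": "photo-a", "lat": 40.7580, "lon": -73.9855},
    {"id": "photo-b", "lat": 40.7527, "lon": -73.9772},
    {"id": "photo-c", "lat": 40.7061, "lon": -73.9969},
]

closest = nearest_photos(40.7530, -73.9780, sample_photos, limit=2)
print([p["id"] for p in closest])
```

Sorting the full list is fine for a small set of records; a larger collection would more likely rely on a spatial index or a server-side query.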
Christina: For our version 1 release, we focused on foundational features, like map navigation, photo viewing, and allowing the user to center on their location. We did several iterations of wireframes on paper, thinking about how the user would interact with each of these features. From there, mocking up the experience in Sketch, a digital design tool, helped us refine the user interface and interactions further.
Of course, we spent most of our time coding the app. We were both new to iOS development, so for everything we built, we spent time teaching ourselves.
Orian: All throughout, we tested the app often. I made it a point to test not only while walking in the streets, but on boats, in the subway, and deep inside buildings. Since our app relies on location and downloading images from the NYPL’s website, testing helped us find and fix bugs to ensure the app runs smoothly.
What is most interesting about bringing 1900s photographs to 2016 phones?
Orian: The idea that technological progress is constantly redefining our relationship with the past. To retrieve a historical record 100 years ago meant pinpointing its specific location within a specific library or museum. Only in the last few decades have computers and the Internet allowed us to centralize, organize, and distribute that information throughout the world.
Beyond making information accessible and searchable, I think the next problem is discovery. With so much of the world’s information available online, it’s easy to find something when you generally know what you’re looking for. But how do you find information that you would love when you don’t know it exists?
This fundamental problem of discovery is why Amazon and Spotify build recommendation engines and why Pinterest tells you about trending pins. They are methods for surfacing content you’re unlikely to find by yourself. I see OldNYC’s map interface + location tracking as an extension of that idea, where a user can discover new photos highly relevant to where they are in the present moment.
It’s fun to envision what new ways future technology will help us discover the past. I’m personally waiting for the day I can take a virtual reality tour of 1600s New Amsterdam!
Christina: For me, the most interesting thing is the discussion that will arise from these resurfaced pieces of history. By exploring photos from New York City’s past, you really get a sense of the society that was present from the late 1800s into the early 1900s and how it has changed throughout the years.
People are now able to step through the city’s life and watch as it has grown and changed into the metropolis it is now. Can you imagine hearing stories from your grandmother who lived in New York City in the 1930s and being able to find the intersection she lived at and explore that photo in detail?
The stories we discover are preserved through their documented photos and text and provide a sense of identity for the city. As we expand our dataset, users will be able to trace any NYC intersection’s life from a drawing of farmland to a modern skyscraper.
What’s next for your app?
Christina: We’re just getting started! We have a full backlog of features and improvements to make the app experience more special. Our app users can give feedback from within the app, which also helps us prioritize what to build next.
For incremental improvements, we’d like to increase photo resolution, improve the photo viewing interface, and enable search by location. We also need to include Dan’s feature of allowing users to submit corrections of photograph descriptions.
Not all improvements are user-facing though; there’s a lot we can do to improve our data architecture and make the app run more efficiently.
Orian: That said, we’re most excited for the bigger initiatives, including:
Commenting on photographs, so app users can tell their stories. The OldNYC desktop site captured thousands of comments, so we’ll make these readable from the app, as well.
Enabling users to easily generate and share before-and-after photos. It’s a thrill to discover “what was there” in the areas we live and visit often. We want to help people share that moment of discovery with the world.
What other NYPL data sets or resources would you like to explore or build on?
Christina: If we could wave a magic wand, every historical record would be discoverable by location. All datasets are fair game, but the next logical step is to incorporate more photographs. Currently, OldNYC only uses photographs from the Milstein Collection’s “Photographic Views of New York City, 1870s-1990s.”
Beyond that, we’d love to include lithographs that capture scenes from the 1600s to 1870s, as well as any oral records. Imagine walking around the city with the ability to explore every facet of your current location’s history!
Orian: We’re also exploring using rectified maps from NYPL Map Warper, like the 1920s Aerial survey map. The idea is that app users can use a “timeline slider” to sift through the layers of history, to see photographs in their historical context.
How can libraries—or any holder of large open datasets—make it easier for developers to make new things with our collections?
Christina: I think the key ingredients are: 1) helping developers feel supported and 2) maintaining clean and open datasets.
As a developer, I feel supported when an organization is actively giving me the tools and instruction I need to start working with the data. A library can do this by offering tutorials/code samples and walkthroughs and by describing interesting use cases for the datasets. All this cuts down on “research time” and empowers developers to take action. The NYPL Digital Collections API is a great approach for libraries to model. It’s an open database that gives developers the chance to build applications that access NYPL data.
Over time, as developers build applications off a dataset, you can foster community by showcasing work. This encourages more developers to join and extend the work of others, like how Orian and I have done with Dan’s OldNYC.
FIDO (Format Identification for Digital Objects) is a command-line tool maintained by the OPF and used for identification of digital files based on their PRONOM signature. This webinar will provide an update on the FIDO project by outlining recent improvements to the tool. We will also discuss:
How do we measure the impact of our outreach programming? While there is a lot of information about successful outreach activities in the library literature, there is far less documentation of assessment strategies. There may be numerous barriers to conducting assessment, including a lack of time, money, staff, knowledge, and administrative support. Further, many outreach activities are not tied back to institutional missions and event goals, meaning they are disjointed activities that do not reflect particular outcomes. High attendance numbers may show that there was excellent swag and food at an event, but did the event relate back to your missions and goals? In this article, we examine the various kinds of outreach that libraries are doing, sort these activities into six broad categories, explore assorted assessment techniques, and include a small survey about people’s experience and comfort with suggested assessments. Using hypothetical outreach scenarios, we will illustrate how to identify appropriate assessment strategies to evaluate an event’s goals and measure impact. Recognizing there are numerous constraints, we suggest that all library workers engaging in outreach activities should strongly consider incorporating goals-driven assessment in their work.
During times of competing interests and demands for funding, where do libraries stand? All types of libraries are feeling the pressure to demonstrate return on investment and the value of libraries to their constituents and communities. In recent reports gathered by the American Library Association, public libraries have demonstrated that for every dollar invested, three dollars (or more) of value are generated (Carnegie Mellon University Center for Economic Development, 2006; Bureau of Business and Economic Research, Labovitz School of Business and Economics, University of Minnesota Duluth, 2011). In academic libraries, the focus has been on demonstrating correlations between library use and students’ grades, retention, and graduation rates (Oakleaf, 2010; Soria, Fransen, & Nackerud, 2013; Mezick, 2007), illustrating that student library use is positively correlated with higher grade point averages and improved second-year retention and four-year graduation rates. Most of these methods utilize quantitative measures such as gate counts, circulation statistics, and other count-based data. Although these techniques tell a portion of the story of libraries’ impact, they do not tell the whole story. Many libraries use various kinds of outreach to reach and engage with users, but numbers alone cannot capture the impact of outreach programming. Other types of measures are needed in order to assess participants’ perception and level of engagement.
In reviewing the library literature, we found many articles that explain how to do a specific type of outreach. However, these articles rarely discuss how to assess the outreach activities beyond head counts, nor do they examine how the activities were tied to particular goals. Outreach is most effective when tied to institutional goals. To measure success we must begin with a goal in mind, as this can help staff prioritize activities, budgets, and time. Zitron (2013) outlines a collage exercise that can help create an outreach vision that is mapped to the institution’s mission and specific goals. Once the foundation is laid, you can create activities that fulfill the goals, along with alternative plans to anticipate various situations. In their paper about creating video game services, Bishoff et al. (2015) demonstrate how their institutional environment had a large impact on the kinds of services that were developed. They discovered that their “just for fun” gaming events were neither well received nor tied to the libraries’ goals. After some assessment and reflection, they shifted their focus to student and faculty-driven activities, such as a showcase of gaming technology and discussions related to trends in the gaming industry and gaming-related research.
Without goal-driven activities and assessment, how are the time, money, and energy justified? Conducting assessment involves several challenges, including those related to time and budget constraints, staffing levels, and staff education. However, there are many strategies that can be employed, ranging from those that are quick and easy to those that are more time-consuming and complex. In this paper, we will outline the various categories of outreach that are prevalent in libraries, compile recommended assessment strategies, and use sample scenarios to illustrate how to apply various assessment strategies in conjunction with defined goals. Lastly, we will explore people’s comfort and familiarity with various assessment techniques through an informal survey, leaving you with a call to action to employ assessment in your daily work and outreach efforts.
There are as many definitions for library outreach as there are creative activities and strategies. While there is no single definition of outreach within the library community, there are certainly themes across the profession. Schneider (2003) identifies three factors to consider regarding outreach: “whether a need is expressed from outside the academy, whether they see their mission as an invitation to pursue an action on their own accord, or whether they construct a form of outreach in response to a specific problem or crisis.” In addition, it could be said that all outreach involves marketing library services, collections, and spaces. “Library marketing is outreach. It is making people aware of what we can do for them, in a language they can understand... [we] need to tell people we’re here, explain to them how we can help, and persuade them to come in through the doors, virtual or physical” (Potter, 2012). Emily Ford (2009) states, “Instead of integrating library promotion, advocacy, and community-specific targeted services, we have left ‘outreach’ outside of the inclusive library whole to be an afterthought, a department more likely to get cut, or work function of only a few, such as your subject librarians.” Outreach should not fall on the shoulders of just one individual. In order to be successful, a team-based approach is needed to generate and execute strategies. Gabrielle Annala (2015) affirms Ford, and reflects that outreach is too broad a term, and that we should instead describe exactly what we are doing in terms that are meaningful to our target audience. She states that outreach is actually a compilation of activities from advocacy, public relations, publicity, promotion, instruction, and marketing.
Hinchliffe and Wong (2010) discuss how the six dimensions of the “wellness wheel” (intellectual, emotional, physical, social, occupational, and spiritual) can help libraries form partnerships across departments and serve patrons by considering their whole selves when developing resources and services. In this article, we view outreach as activities and services that focus on community and relationship-building, in addition to marketing collections and services to targeted audiences.
There are numerous kinds of outreach activities that occur in libraries. In order to get an idea of the range of activities, we looked in the library literature, surveyed colleagues, and examined lists of past and upcoming events on a handful of academic and public library websites. Although there is a wide range of activities, they appear to fall under a few broad categories. The variety of outreach activities is reflected in the survey conducted by Meyers-Martin and Borchard (2015) of programming provided by academic libraries during finals. Meyers-Martin and Borchard looked at the survey respondents’ budgets, whom they partner with, their marketing, and how much time they spent coordinating the activities. Here we build on Meyers-Martin and Borchard’s groupings and propose some broad categories for outreach activities. Activities were categorized based on what we interpreted as the primary intention. We acknowledge that most outreach crosses numerous categories, so the categories cannot be viewed singularly, but rather as themes that are touched upon during marketing, public relations, instruction, and programming. Each category is accompanied by some illustrative examples. Some of the examples we use could theoretically be placed in another category (or multiple categories), based on the intention and goals of the activity.
Knowing the kinds of outreach that are being used in libraries helps us to think about the kinds of assessment measures that might be appropriate for various activities. Further, institutions can use the following groupings as a starting point to evaluate if they are employing diverse strategies or are targeted in one particular area.
Category #1: Collection-Based Outreach
Activities in this category are those that are linked to a library’s collection, or parts of a library’s collection. Even with that common thread, there is a lot of variety in the kinds of outreach that can fall within this category, ranging from library-led book clubs, to author talks, to scavenger hunts (where patrons are asked to find specific items within the library). Other examples of collection-based outreach include: summer reading programs, community read programs (where members of a particular community all read the same book at the same time), “blind date with a book” (where books are wrapped up in paper to conceal their titles and placed in a display to be checked out), exhibits (physical or online), and pop-up libraries (portable libraries, with materials from the home library’s collection). Numerous libraries have built community around “one book” programs (Evans, 2013; Hayes-Bohanan, 2011; Schwartz, 2015) and summer reading programs (Dynia, Plasta, and Justice, 2015; Brantley, 2015), along with bringing materials outside the library walls (Davis et al., 2015).
Category #2: Instruction and Services-Based Outreach
Activities in this category focus on presentations and public guidance regarding library services. Typical activities that would be placed in this category are: workshops (e.g. how to access new e-book platforms), classes (e.g. data management classes that are linked to library data management services; Carlson et al., 2015), and roving reference (where librarians go to alternate locations to answer reference questions; Nunn and Ruane, 2015; Henry, Vardeman, and Syma, 2012).
Category #3: “Whole Person” Outreach
Zettervall (2014) coined the term “whole person librarianship” and states that it “is a nascent set of principles and practices to embed social justice in every aspect of library work.” Activities in this category are primarily concerned with helping people on an individual level and helping them make personal progress in some aspect of their lives. Many of these activities revolve around health-based activities (e.g. health screenings, crisis support) and stress reduction programs (e.g. animal therapy, yoga), but many other services can fall in this category as well (including homework help, job search assistance, and tax preparation).
While many libraries have implemented animal therapy for finals stress relief (Jalongo and McDevitt, 2015) or introduced programs to develop literacy skills (e.g. K-12 students reading aloud to dogs; Inklebarger, 2014), libraries have also been on the front lines supporting their communities during a crisis, such as by providing a safe refuge for school children in Ferguson (Peet, 2015) or providing internet access, tablets, and charging stations after Hurricane Sandy (Epps and Watson, 2014).
Category #4: Just For Fun Outreach
Activities in this category are those that are typically “just for fun”. They focus on providing a friendly or welcoming experience with the library and/or redefining the library as place. Examples include: arts and crafts activities, Do-It-Yourself (DIY) festivals (Hutzel and Black, 2014; Lotts, 2015), concerts, games/puzzles (Cilauro, 2015), and coloring stations (Marcotte, 2015).
Category #5: Partnerships and Community-Focused Outreach
Activities in this category are primarily focused on creating partnerships and working with community groups. This may also include working to cross-promote services with other organizations, promoting library collections/services at community-focused events, or providing space for external groups (e.g. anime or gaming clubs) within the library. Partnering with outside organizations can help expand the library’s reach to non-library users or underserved populations, such as partnering with a disability center for digital literacy training (Wray, 2014), or improving access to health information through an academic-public library partnership (Engeszer et al., 2016).
Category #6: Multi-Pronged Themed Events and Programming
Activities in this category take place on a large scale, and usually involve numerous activities and levels of support, frequently over several days. Many of these events involve a combination of collection-based outreach, instruction, and events that are “just for fun”, among other things. Examples include: National Library Week (Chambers, 2015), Banned Books Week (Hubbard, 2009), History Day (Steman and Post, 2013), and college and university orientation events (Boss, Angell, and Tewell, 2015; Noe and Boosinger, 2010; Johnson, Buhler, and Hillman, 2010).
We hope that the above categories will help you reflect on the types of outreach that you provide in order to develop effective outreach assessment strategies. There are a variety of assessment methods that can be employed to determine if your outreach activity or event was successful in terms of your pre-determined goals. Some methods may take more time, involve more staff, or necessitate having a larger budget. Some are only appropriate under certain conditions. Further, different methods will yield different types of data, from hard numbers (quantitative data), to quotes or soundbites (qualitative data), to a combination of the above. There is no one perfect method; each has its pros and cons.
When determining which method to use, you need to keep in mind what you are hoping to learn. Are you primarily interested in the number of people who attended? Do you want to know if particular demographic groups were represented? Do you want to know how satisfied participants were with the event? Do you want to know if people associated the event with the library? Do you want to know if participants walked away knowing more about the library? Each of these questions would need different types of data to yield meaningful answers, and thus, would require thinking about data collection ahead of time. In addition, keep in mind the audience for your outreach activity and let that inform your assessment strategies (e.g. outreach services off-site or with an older population might warrant a postcard survey whereas a teen poetry slam might involve gathering social media responses).
When developing an assessment strategy, it is best not to work in a vacuum. If at all possible, you should solicit people’s feedback on whether the strategy fits the goal. Depending on the scale of the event or activity, you may need to solicit help or employ volunteers to gather data. Depending on the amount or type of data generated, you may need help analyzing and/or coding data, generating statistics, or producing charts and graphs.
We have compiled several assessment strategies here based on an Audiences London (2012) report that discusses assessment strategies related to outdoor events and festivals. Although this report was not library-focused, we believe these strategies can be utilized by libraries that conduct outreach.
Definitions of each of the strategies follow. Table 1 illustrates the kind of data that each strategy generates.
Capturing comments: Collect thoughts of motivated participants; can occur on paper, white boards, or other media (e.g. Bonnand and Hanson, 2015).
Compiling social media comments or press cuttings: Gather coverage of an event through social media, newspapers, and other media outlets (e.g. Harmon and Messina, 2013; Murphy and Meyer, 2013).
Documentation: Capture photographs and anecdotes in a document or report to paint an overall picture of an event.
Face-to-face audience surveys: Administer questionnaires to participants at the event, led by an interviewer (e.g. Wilson, 2012).
Focus groups: Interview participants in groups, following the event (e.g. Walden, 2015).
Follow up e-surveys: Collect email addresses during the event and distribute an e-survey via collected emails following the event.
Head counts: Count the number of people present at an event.
Mini interviews during the event: Conduct very short interviews during the event, led by a staff member or volunteer. The type of interview conducted can vary, whether it asks an open-ended question or has a set list of questions to be answered.
Minute papers: Ask participants to take one minute to write down an answer to a question. As the name suggests, it does only take one minute and provides rapid feedback. The time involved is in generating a meaningful question and in analyzing and coding the results (e.g. Choinski, 2006).
Observations during the event: Note how participants move through the event and how they interface and interact with the event’s content (e.g. Bedwell and Banks, 2013).
Self-addressed postcard surveys: Use a self-addressed postcard to deliver a short survey.
Vox pops: Document participants’ thoughts and feelings via short audio or video recordings.
Table 1. Types of data generated by various assessment strategies.
Another technique, not to be overlooked, is self-reflection. In her book on teaching and instructional design, Char Booth (2011) outlines three simple questions that can easily be applied to outreach activities and used in conjunction with other assessment methods. The questions are:
What was positive about the interaction? What went well?
What was negative about the interaction? What went wrong?
Describe one thing you would like to improve or follow up on.
It is easy to get into ruts with programming and advocacy. If the event was “successful” the last time, it is tempting to just repeat the same actions. If it did not go well, there may be an opportunity to change the event and make it better. By immediately following an activity with the three-question reflection exercise, you have a starting point for expansion or improvement. Sometimes it takes persistence and/or adaptation for outreach to take hold with an audience.
Using the themes and assessment methods above, we have outlined a few hypothetical examples to help illustrate how the various techniques can be used and to help you expand your repertoire of evaluation strategies. Each example describes the outreach activity, along with the type of library and staff involved, and identifies discrete goals that influenced the outreach activity and assessment strategy. Each scenario outlines the time, budget, manpower investment, type of data to collect, and possible limitations. Not every assessment strategy that could be employed in each example was discussed; rather, we have highlighted a few.
We chose these examples to address the following questions:
What is the event supposed to accomplish?
How do we know if the event was successful?
What data can be gathered to demonstrate if the goals were accomplished?
What is the best way to gather the data?
As these examples are hypothetical, we have not aligned these goals to larger organizational goals, but in a real-world scenario, that is another question that should be addressed.
Scenario #1: Whole Person Outreach
A medium-sized academic library is collaborating with their local health and wellness center to bring in a team of five animal handlers. This event will take place over the course of four hours midday during finals week inside the library in a reserved space. Two library staff will be in attendance; one person is designated to support the animal handlers while the other interacts with attendees. Based on previous events, 150 students are expected to attend. The budget for the event includes parking passes for the animal handlers, treats for the animals, signage, and fees associated with the coordinating organization.
Goal 1: To reduce student anxiety and stress during finals after participating in the pet therapy activity. Students participating will demonstrate a reduction of three points on a 10-point scale between a pre/post survey.
Face-to-face audience survey: Recruit a sample of 20 participants to complete a pre/post survey to measure their stress/anxiety levels.
Time: Moderate. It will take a fair amount of time to develop good survey questions. There are “free”, easy-to-use tools that can be employed via paper or online (e.g. Google Forms).
Manpower: Moderate. Multiple people should be involved in developing questions and testing the instrument. Depending on staff knowledge, you may need to seek assistance with analysis of the data.
Data: This survey will ask participants to rank their stress on a numerical scale; therefore, it will be quantitative data. If participants write in open responses, you will have to code qualitative data to pull out themes and make meaning of the feedback.
Limitation: Students are busy, and may not have time to complete a pre/post survey. This will require tracking participants to ensure they complete the post survey.
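As a rough illustration of how the pre/post survey data could be checked against Goal 1, the short Python sketch below compares paired stress ratings. The ratings, the function name `mean_reduction`, and the variable names are hypothetical and invented for this example; only the three-point target on a 10-point scale comes from the goal. We also assume the goal is read as an average reduction across participants rather than a reduction for every individual.

```python
from statistics import mean

# Hypothetical paired ratings (1 = calm, 10 = very stressed); invented for illustration.
pre_scores = [8, 7, 9, 6, 8]
post_scores = [5, 4, 5, 4, 4]

TARGET_REDUCTION = 3  # from Goal 1: three points on a 10-point scale


def mean_reduction(pre, post):
    """Average drop in rating across participants who completed both surveys."""
    if len(pre) != len(post):
        raise ValueError("each participant needs both a pre and a post rating")
    return mean(before - after for before, after in zip(pre, post))


reduction = mean_reduction(pre_scores, post_scores)
goal_met = reduction >= TARGET_REDUCTION
print(f"Mean reduction: {reduction:.1f} points; goal met: {goal_met}")
```

Pairing the ratings by participant is what makes the tracking mentioned in the limitation necessary: a participant who skips the post survey has to be dropped from both lists.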
Goal 2: To illustrate engagement with the mental health/wellness activity at the library. At least 150 students will attend and 50% of these attendees will spend five minutes with the animal teams.
Head counts and observations during the event: One staff member will observe for 15 minutes at the top of the hour throughout the course of the event. Data will be collected on: How many people attended? How long are people staying during the event? How many people walked away or rushed past? How many people asked questions?
Time: Moderate. Since we will only be observing for a portion of the event, this limits the time involved, but the data will also have to be compiled and analyzed.
Manpower: Minimal. Only one staff member needs to be involved based on this sample size.
Data: Quantitative since we will be measuring the number of people doing specific activities, and length of time. We would only have qualitative data to analyze if we decide to record any comments made during the event.
Limitations: The data is limited to simple numbers and you do not know people’s reasons for dwelling at the station or walking away. You must be careful to choose appropriate behaviors to observe and to not attribute false assumptions to the behaviors (e.g. if a person rushes away, they must be scared of animals). Combining this data with a qualitative strategy would answer the “why” behind people’s behaviors.
Scenario #2: Large-Scale, Multi-Pronged Themed Events and Programming
The library of a liberal arts college is hosting a first-year orientation event for the incoming class of 500 students. This event will be one day, over the course of six hours, with multiple activities available, such as meeting with subject librarians, hands-on time with special collection specimens, a photo booth, and a trivia contest. Students will also be able to learn about library resources and services through a scavenger hunt throughout the library. Since this event is so large, it involves participation from all library workers (approximately 30 people). The budget for this event is moderate: approximately $2,000 for library swag items (e.g. pencils, water bottles), marketing, and food to entice the students to attend.
Goal 1: To introduce students to library resources and services by having at least 30% of the first-year class participate in the library orientation.
Head counts: Counting all the students that come into the library to attend the event via gate counts before/after the event. This will measure what percentage of the incoming first-year students participated in the event, since it is optional.
Time: Minimal. Barely any time involved since this will be automated.
Manpower: Minimal. One person will check gate counts.
Data: Quantitative since we will only be measuring who attended the event.
Limitations: Strict head counts do not measure engagement, how long they stayed, or what information they retained. Also, if a participant stands in the way of the sensor, or a large group enters, gate count numbers may be inaccurate.
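The arithmetic behind this goal is simple enough to sketch. The Python example below is a minimal illustration, assuming the gate counter reports a cumulative number of entries; the two gate readings and the function name `participation` are hypothetical, while the class size of 500 and the 30% target come from the scenario.

```python
CLASS_SIZE = 500      # incoming first-year class, from the scenario
TARGET_PERCENT = 30   # Goal 1: at least 30% of the class participates


def participation(gate_before, gate_after, class_size=CLASS_SIZE):
    """Return (entries, percent of class) from two cumulative gate readings."""
    entries = gate_after - gate_before
    return entries, 100 * entries / class_size


# Hypothetical cumulative gate readings taken just before and just after the event.
entries, percent_of_class = participation(gate_before=12_040, gate_after=12_215)

# Integer comparison avoids any floating-point rounding at the threshold.
goal_met = entries * 100 >= TARGET_PERCENT * CLASS_SIZE

print(f"{entries} entries, {percent_of_class:.1f}% of the class; goal met: {goal_met}")
```

With these made-up readings the target works out to 150 students. Because the gate cannot tell first-year students from anyone else entering the building, the result is best treated as an upper bound on participation.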
Goal 2: To increase undergraduate student engagement with the library.
Observations during the event: There will be two people assigned to each activity station; one will record general observations throughout the event while the other engages with the students. Data will be collected on: How many people participated in the activity? What questions did they ask? What comments did they make? Did they ask follow-up questions about library services?
Time: Intensive. There are numerous stations and the data will have to be compiled and analyzed.
Manpower: Extensive. In this case, 15 people will collect data.
Data: Qualitative and quantitative since we will be measuring the number of people doing specific activities and questions and comments made.
Limitations: It is important to not make assumptions about people’s motivations and experiences. You must be careful of the conclusions that you draw from observations alone. However, evaluating the data behind what questions were asked or what comments were made will provide richer information.
Compiling social media comments and documentation: Gathering information about the event (including photos) via Twitter, Facebook, Instagram, and student newspaper sources related to the event by searching on the library’s name and designated event hashtag.
Time: Moderate. Staff will have to actively seek out this information, as hashtags are not consistently used.
Manpower: Moderate. One person will gather social media comments.
Data: Qualitative since text and images from posts and articles will be gathered and coded based on participants’ impressions.
Limitations: There are questions of access; if comments are not public, you cannot find them. In addition, if participants post on alternate platforms (e.g. Snapchat, YikYak) that you do not check, they will not be discovered.
Goal 3: Students will become aware of at least two library resources and services that can help them with their upcoming research projects.
Vox pops: Using a video recorder, willing participants will be asked what library resources and services were new to them, and how they might use them over the coming year. This will take place in an area offset from the main library doors, so comments can be captured before people leave the event.
Time: Intensive. It will take a fair amount of time to set up equipment, find and interview willing subjects, and process and edit video afterwards. There will also be a substantial amount of time involved in analyzing the content of the interviews.
Manpower: Moderate. One to two people. This activity may require assistance with lighting, recording, interviewing and recruiting subjects, and processing video.
Data: Qualitative since we will only be capturing people’s comments and reactions.
Limitations: This measures short-term awareness of resources just mentioned. This does not measure willingness to use such resources during point-of-need.
Focus groups: A week after the orientation event, gather a sample of ten first-year students and break them into two focus groups, lasting 45 minutes. Students will be asked: what library services do they remember, what services have they used or do they plan to use, and what services would help them in their research. (Depending on the time allotted for the focus group, you could prepare questions to address multiple goals; we will not do that in this example). Students will receive a free pizza lunch for participation.
Time: Intensive. Recruiting volunteers, creating good questions to ask, observing and recording the focus groups, and coding responses will all take a lot of time.
Manpower: Moderate. Two people minimum will be needed to run the focus group and analyze data.
Data: Qualitative since we will be gathering people’s comments.
Limitations: Funds to provide lunch to participants might not be feasible for all libraries. Further, although focus groups generate rich, qualitative data, the sessions may need to be audio recorded and transcribed (which may cost money).
Goal 4: Determine if the layout of activities is efficient, and that high quality customer service is provided by library staff.
Mystery shopper: Recruit a few (3-4) upperclass students who are not heavy library users (so they will be unknown to library staff) and have them go through the event as participants while recording their observations. They will use a checklist and observation form that asks questions about each activity, such as: Did the library staff greet you at the activity? Were there bottlenecks at the activity, and if so, where? Where could signage be used to improve traffic flow? Students serving as mystery shoppers will receive a bookstore gift card for their time.
Time: Intensive. It will take a lot of time to recruit and train the volunteers, create a checklist or form to record observations, debrief after the event, and code responses.
Manpower: Extensive. It will involve library staff for training, debriefing, coding and analyzing the data and student volunteers to carry out the activity.
Data: Both qualitative and quantitative (yes/no) depending on the feedback form for the student volunteers.
Limitations: Funds to provide gift certificates to evaluators might not be feasible for all libraries. Recruitment of mystery shoppers that are unfamiliar with the library’s orientation activities may be difficult, especially since recruitment will be prior to the semester beginning.
Focus groups: A week after the orientation event, gather a sample of ten first-year students and break them into two focus groups. During the debrief, students will be asked: which activities were most engaging and/or fun, what library services do they remember, and suggestions for improvement of the event. Students will receive a bookstore gift card for participation.
Time: Intensive. Recruiting volunteers, creating good questions to ask, observing and recording the focus groups, and coding responses will all take a lot of time.
Manpower: Moderate. Two people minimum will be needed to run the focus group and analyze data.
Data: Focus groups generate rich, qualitative data.
Limitations: Funds to provide gift certificates to participants might not be feasible for all libraries. The focus group sessions may need to be audio recorded and transcribed (which may cost money).
Scenario #3: Community-organization Outreach
A medium-sized public library is putting on a free two-hour workshop for adults to learn how to make and bind a book. Library materials related to the topic will be displayed. The library will be partnering with a local book arts organization to put on the event. Funding for the workshop is part of a larger community engagement grant. The library hopes to draw fifteen citizens for this hands-on activity.
Goal 1: To have patrons use or circulate three items related to bookbinding and paper arts from the library’s collection during, or immediately following, the event.
Circulation/use counts: Determine how many items were removed from the selection displayed at the event.
Time: Minimal. Selecting and creating an informal book display will require very little time.
Manpower: Minimal. Only one person will be needed to count remaining items from the display or look at circulation statistics.
Data: Quantitative since we will only be measuring materials used.
Limitations: This does not gather what participants learned or how engaged they were with the event.
Goal 2: To develop a relationship with the book arts organization and co-create future programming.
Mini-interview: Interview the activity instructor from the arts organization following the event and ask them how they felt the event went, how the event could be improved, and if they believe that the partnership aligns with their organization’s mission.
Time: Minimal. Since we will only be measuring the opinion of one person, preparation (generating questions) and analyzing and coding the results will be fairly simple and straightforward.
Manpower: Minimal. Only one staff person will interview the instructor.
Data: Qualitative, as the questions will be open-ended.
Limitation: This assesses the opinion of the instructor, which might not be the opinion of the administration of the organization, which directs funds and time allotment.
White board comments: Ask participants: Which of the following activities would you be interested in (pop-up card class, artist’s book talk, papermaking class)? Have participants indicate interest with a checkmark. This will gather data on what kinds of programs could be created as a follow-up to the book-binding workshop.
Time: Minimal. No time at all to set up. A count-based white board survey will require little preparation and post-collection time.
Manpower: Minimal. One person will be able to post and record survey responses.
Data: Quantitative since we will be using checkmarks to indicate preferences.
Limitations: Questions must be short and require simple responses. If you use white boards at a large event, someone will need to save data (such as by taking a photo) before clearing the board to gather new responses.
Goal 3: To provide an engaging event where 75% of attendees rate the activity satisfactory or higher.
Tear-off Postcard: Give out a self-addressed stamped tear-off postcard to workshop participants at the conclusion of the event. One side of the postcard will feature dual marketing for the library and the book arts organization, while the tear-off side will have a brief survey. Survey questions include: “Rate your satisfaction with the workshop (from extremely dissatisfied to extremely satisfied, on a 5-point scale)”, “Rate the length of the workshop (from too long to too short, on a 3-point scale)”, and “What suggestions do you have for improvement?”
Time: Moderate. Since this is a partnership, final approval will be needed from both organizations, meaning advance planning is required.
Manpower: Moderate. Someone will need to design the postcard, develop survey questions for the mailing, and compile and distribute data to both organizations.
Data: Quantitative and qualitative, as two questions will have numerical responses and one will be open-ended.
Limitations: Postcards may not be suitable depending on the survey audience. It costs money to print and provide postage for the postcards. Finally, unless people complete these on the spot, they may not be returned.
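To show how the returned postcards could be tallied against Goal 3, here is a minimal Python sketch. The ratings are hypothetical, and we assume that a 4 on the 5-point scale means “satisfied”, so that “satisfactory or higher” corresponds to a rating of 4 or 5; a library might reasonably draw that line elsewhere.

```python
# Hypothetical ratings from returned postcards
# (1 = extremely dissatisfied ... 5 = extremely satisfied); invented for illustration.
ratings = [5, 4, 3, 2, 4, 5, 4, 4]

SATISFIED_THRESHOLD = 4  # assumption: 4 and 5 count as "satisfactory or higher"
TARGET_PERCENT = 75      # from Goal 3

satisfied = sum(1 for rating in ratings if rating >= SATISFIED_THRESHOLD)
percent_satisfied = 100 * satisfied / len(ratings)

# Integer comparison avoids any floating-point rounding at the threshold.
goal_met = satisfied * 100 >= TARGET_PERCENT * len(ratings)

print(f"{satisfied} of {len(ratings)} returned postcards satisfied "
      f"({percent_satisfied:.0f}%); goal met: {goal_met}")
```

Note that the percentage is calculated from returned postcards only, so a low return rate can make the result unrepresentative of everyone who attended.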
A Survey and A Call to Action
We hope that the introduction to various assessment techniques drives people toward new ways of thinking about their outreach work. We know, based on the literature, that assessment is often an afterthought. However, we suspected this may have to do with people’s comfort level with the various assessment strategies. To test our suspicion, we conducted an informal online survey of librarians about their experience with outreach activities and various assessment methods. The survey was first distributed in conjunction with a poster presentation that we gave at the 2015 Minnesota Library Association annual conference. The survey asked respondents: what kinds of outreach they have done, what kinds of outreach they would like to attempt, how they have assessed their outreach, how comfortable they would be administering suggested assessment methods, and what kind of library they work at.
We had 39 responses to our survey, primarily from those who worked in academic libraries (see Figure 1).
Figure 1. Type of institutions where survey respondents work, measured by number who responded and percentage of the total.
Survey participants illustrated a range of experience with the various assessment methods that we listed (see Figure 2). The only assessment method listed that was used by everyone was head counts. However, the majority of respondents had also tried taking observations during the event (76.9%). No other assessment method was used by the majority of survey respondents. The next most frequently used methods were, in descending order: compiling social media comments (41%), documentation (38.5%), face-to-face audience surveys (30.8%), follow up e-surveys (25.6%), and white board comments (25.6%). Focus groups, minute papers, and vox pops were used very minimally. Only two people had used focus groups, only two had used minute papers, and only one person had used vox pops. No one indicated that they had used either mystery shoppers or self-addressed postcard surveys. Unfortunately, we do not know how frequently these various methods have been employed, as that question was not asked. A positive response could have been triggered by an event that happened several years ago.
Figure 2. Number of survey participants who have utilized different methods for assessing outreach activities (out of 39 total responses).
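For readers who want to reproduce this kind of summary from their own survey, the following Python sketch converts raw counts into percentages of the total. The counts shown are only those stated explicitly in this article (39 total responses; everyone had used head counts; 30 had used observations; 16 had compiled social media comments; two each for focus groups and minute papers; one for vox pops). The function and variable names are our own, for illustration.

```python
TOTAL_RESPONSES = 39

# Counts of respondents who reported using each method, as stated in the text.
used_counts = {
    "head counts": 39,
    "observations during the event": 30,
    "compiling social media comments": 16,
    "focus groups": 2,
    "minute papers": 2,
    "vox pops": 1,
}


def percent(count, total=TOTAL_RESPONSES):
    """Share of all respondents, as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


for method, count in used_counts.items():
    print(f"{method}: {count} of {TOTAL_RESPONSES} ({percent(count)}%)")
```

Reporting the count alongside the percentage, as the loop does, helps readers judge small samples: with 39 responses, a single respondent shifts a percentage by more than two and a half points.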
Survey participants ranged in their level of comfort with the various assessment methods. Figure 3 compiles our data on participants’ level of comfort, ranging from “Not At All Comfortable” to “Very Comfortable”. Everyone stated that they were very comfortable using head counts. The vast majority of respondents were comfortable with compiling social media comments and observations (95% each).
Figure 3. Survey participants’ level of comfort with using various methods for assessing outreach activities. Values are measured as percentages, with positive percentages indicating comfort and negative percentages indicating discomfort.
Table 2 compares participants’ use of each of these assessment methods to their comfort levels. For the purpose of this table, the “Somewhat Comfortable” and “Very Comfortable” responses were combined to define the “% Comfortable” and the “Not Very Comfortable” and “Not At All Comfortable” responses were combined to define the “% Uncomfortable.” Interestingly, not everyone who indicated a high level of comfort has employed these methods. For example, only 16 of 37 people who are comfortable compiling social media comments have ever done so and only 30 of 37 people who are comfortable conducting observations have ever done so. The data clearly show that people are at least somewhat comfortable with a majority of the methods listed. A majority of people (over 50%) claim to be very or somewhat comfortable with 10 of the 13 methods. Only three methods (minute papers, mystery shoppers, and vox pops) had relatively high levels of discomfort among participants (28%, 33%, and 41%, respectively). However, very few people stated that they had used these assessment methods. We fully realize that the results could have been different if we had more representation from public and school library staff.
Table 2. Survey participants’ use of, and level of comfort with using, various methods for assessing outreach activities. Percentages for “% Comfortable” were calculated by combining “Very Comfortable” and “Somewhat Comfortable” values. Percentages for “% Uncomfortable” were calculated by combining “Not Very Comfortable” and “Not At All Comfortable”.
In general, this illustrated to us that even if people are aware of these methods, for some reason they are not putting them to use when assessing outreach events. We had a suspicion, based on our experiences, conversations, and the library literature, that this was the case. This brief survey confirmed our suspicion. However, the survey did not delve into any of the reasons why people are not using a variety of assessment methods and we can only guess what their reasons may include: lack of time, budget constraints, lack of education, and lack of administrative support. We have illustrated that many of the methods do not require much time or money as long as the assessment planning goes hand-in-hand with setting event goals and activity planning. Lack of staff education is a real concern and those needing information about quantitative and qualitative methods may need to seek out classes, workshops, and professional development outside of their institution. That being said, there has been an increase in freely available or low cost web courses (e.g. Coursera’s “Methods and Statistics in Social Sciences Specialization”) and there are numerous books and articles published on the topic (e.g. the National Library of Medicine’s “Planning and Evaluating Health Information Outreach Projects”). Research methods have not traditionally been a required course in library schools, and therefore, it is unreasonable to expect that library workers have extensive experience in the area (O’Connor and Park, 2002; Powell et al., 2002; Partridge et al., 2014). However, due to external and internal pressure to show value in our libraries and activities, it is of utmost importance that we make assessment a priority.
Libraries have determined the need to participate in outreach activities for numerous reasons including: to connect with current and potential users, to stay relevant, to build goodwill, and to gain support. Demonstrating the value of libraries has generated lively discussion in the last few years, much of which is focused on collection metrics and head counts, such as number of visitors to the library, number of items checked out, number of reference questions asked, or number of workshop attendees. Although the library literature is filled with examples of various kinds of outreach activities and how libraries are connecting with communities, there is a distinct lack of discussion about how outreach is assessed. Assessment strategies are needed to demonstrate a return on investment for our constituents, and to improve our marketing, public relations, advocacy and ultimately library patronship.
As we have illustrated, there are a variety of qualitative and quantitative methods available that can be used to assess virtually any kind of outreach. The technique(s) that you choose will depend on various factors, including the goals associated with the programming, target audience for the activity, and the type of outreach. There are pros and cons to each assessment method, with some involving a larger budget, more staffing and time, or familiarity with research methods. However, it is not always necessary to use the most complicated, expensive, and time-intensive method to collect valuable data. We cannot leave assessment to library administrators and those with assessment in their job titles. There are many opportunities to test assessment techniques and gather data in our daily work. We want to encourage those in libraries who are doing outreach work to incorporate goal-driven assessment when sharing the results of their work. Not only does it help illustrate impact, it helps others think critically about their own work and determine which kinds of outreach are most appropriate for their institutions and communities. By only focusing on head counts we undermine our ability to accurately understand the qualitative and quantitative relevance of the assessments made when evaluating library outreach objectives and goals.
We would like to thank Erin Dorney, our internal editor, for shepherding us throughout the writing process, and providing good feedback. In addition, our gratitude goes to Adrienne Lai, our external reviewer, who posed excellent questions and comments to consider that strengthened this article. We would also like to thank our colleagues at the University of Minnesota for providing feedback on our proposed outreach categories.
Bedwell, L. L., & Banks, C. c. (2013). Seeing through the eyes of students: Participant observation in an academic library. Partnership: The Canadian Journal Of Library & Information Practice & Research, 8(1), 1-17.
Carlson, J., Nelson, M. S., Johnston, L. R., & Koshoffer, A. (2015). Developing data literacy programs: Working with faculty, graduate students and undergraduates. Bulletin of the American Society for Information Science and Technology, 41(6), 14–17. http://doi.org/10.1002/bult.2015.1720410608
Dynia, J. D., Piasta, S. P., & Justice, L. J. (2015). Impact of library-based summer reading Clubs on primary-grade children’s literacy activities and achievement. Library Quarterly, 85(4), 386-405.
Engeszer, R. J., Olmstadt, W., Daley, J., Norfolk, M., Krekeler, K., Rogers, M., … & McDonald, B. (2016). Evolution of an academic–public library partnership. Journal of the Medical Library Association: JMLA, 104(1), 62.
Epps, L., & Watson, K. (2014). EMERGENCY! How queens library came to patrons’ rescue after Hurricane Sandy. Computers In Libraries, 34(10), 3-30.
Evans, C. (2013). One book, one school, one great impact!. Library Media Connection, 32(1), 18-19.
Hinchliffe, L. J., & Wong, M. A. (2010). From services-centered to student-centered: A “wellness wheel” approach to developing the library as an integrative learning commons. College & Undergraduate Libraries, 17(2-3), 213-224.
Jalongo, M. R., & McDevitt, T. (2015). Therapy dogs in academic libraries: A way to foster student engagement and mitigate self-reported stress during finals. Public Services Quarterly, 11(4), 254–269. http://doi.org/10.1080/15228959.2015.1084904
Peet, L. (2015). Ferguson Library: A Community’s Refuge. Library Journal, 140(1), 12.
Potter, N. (2012). The library marketing toolkit. Facet Publishing.
Powell, R., Baker, L. M., & Mika, J. (2002). Library and information science practitioners and research. Library & Information Science Research, 24(1), 49-72. doi:10.1016/S0740-8188(01)00104-9
Schneider, T. (2003). Outreach: Why, how and who? Academic libraries and their involvement in the community. The Reference Librarian, 39(82), 199–213. http://doi.org/10.1300/J120v39n82_13
Schwartz, M. (2014). DIY one book at Sacramento PL. Library Journal, 139(4), 30.
Soria, K., Fransen, J., & Nackerud, S. (2013). Library use and undergraduate student outcomes: New evidence for students’ retention and academic success. Portal: Libraries and the Academy. Retrieved from http://conservancy.umn.edu/handle/11299//143312
I’m sure those of you who are still reading have noticed that I haven’t been updating this site much in the past few years. I was sharing my links with you all but now Delicious has started adding ads to that. I’m going to rethink how I can use this site effectively going forward. For now you can read my regular content on Opensource.com at https://opensource.com/users/nengard.
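# Assumed steps, not shown in the original: the cd, VERSION and ./configure lines (3.3.0 is inferred from the text below).
cd /usr/local/src/R
VERSION=3.3.0
curl -O http://cran.utstat.utoronto.ca/src/base/R-3/R-$VERSION.tar.gz
tar xzvf R-$VERSION.tar.gz
cd R-$VERSION
./configure
make && make check
cd ..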
rm R Rscript
ln -s R-$VERSION/bin/R R
ln -s R-$VERSION/bin/Rscript Rscript
PACKAGE_LIST="dplyr readr ggplot2 devtools lubridate shiny knitr ggvis seriation igraph arules arulesViz tm wordcloud cluster fpc topicmodels"
for PKG in $PACKAGE_LIST; do ./Rscript --vanilla -e "install.packages('$PKG', repos=c('https://cran.hafro.is/'))"; done
./Rscript --vanilla -e "devtools::install_github('rstudio/shinyapps')"
When 3.3.1 comes out, just change VERSION, rerun, and there you go. There’s nothing to catch errors, but I’m pretty sure everything will always work, and if there’s some horrible accident and it doesn’t, the previous version of R is still there and it’s just a matter of changing symlinks.
The aim of the symlinks is to always be able to refer to /usr/local/src/R/R and /usr/local/src/R/Rscript in a stable way, so this addition to my $PATH in .bashrc always works:
If you have that set, and you can write to /usr/local/src/, then you can paste in those shell commands and it should all just work (assuming you’ve already installed the necessary packages for building from source generally, and topicmodels requires the GNU Scientific Library).
I was talking to someone the other day who uses Ansible and explained how he uses it for keeping all his machines in sync and set up the way he likes. It looks very powerful, but right now it’s not for me. I’ll keep the block above in an Org file and copy and paste as needed, and I’ll do something similar with other packages. I could even run them remotely from Org.
Austin, TX: The DSpace community is currently running a Testathon until May 6th. As a community that benefits from free, open source software, we share a responsibility to at least try to reach the highest level of quality possible for our new release.
Way back in early December of last year District Dispatch reported that the Email Privacy Act (H.R. 699), legislation to plug an enormous hole in Americans’ constitutional privacy rights by rewriting the 1986 Electronic Communications Privacy Act (ECPA), had finally got a hearing in the House Judiciary Committee. “Strong further advocacy by librarians, in harness with our many coalition partners,” we wrote, “may well be what it takes to ‘spring’ HR. 699 from the Committee in which it’s been mired for years but from which, this week, it may just have begun to emerge. Stay tuned!” As it happens, that’s exactly what it took over the past six months not only to have the Email Privacy Act clear the Committee, but also for the House of Representatives to pass it unanimously last week by the unheard-of vote (for any bill not naming a post office) of 419 – 0.
Source: Architect of the Capitol
Librarians and library supporters were among the thousands of Americans to contact their Representatives to push for passage of H.R. 699, and ALA consistently was among scores of national organizations keeping the pressure on key legislators to abandon their efforts to weaken the bill with an exception for civil agencies like the Securities and Exchange Commission (SEC) and IRS as late as the day before the vote. Eventually, they did, paving the way for this week’s hugely positive result.
The successful joint effort produced a bill that will amend ECPA so that, for the first time, any authority seeking the actual content of an individual’s emails, texts, tweets, online photos, files stored in the cloud and other electronic communications will first have to get a real search warrant from a real judge. Previously, for anachronistic reasons, the full content of most such communications and documents was simply available by subpoena once they were more than six months old. (This ACLU infographic lays it all out very well.)
BUT, ECPA reform’s not a done deal yet. The Senate now needs to act, either by passing H.R. 699 as written (as ALA and its coalition partners will be urging it to do), or by taking another path such as considering the Senate’s own similar bill, S. 356. In either event, ECPA reform’s fate rests with Senate Judiciary Committee Chairman Charles Grassley, who’s reportedly open to moving a bill but may be flirting with embracing the “civil agency carve-out” clause ultimately rejected by the House but still being actively sought publicly by the SEC. If he does, that might prove the end of ECPA reform for this Congress, as most bill-backers, ALA among them, view that proposal as an exception that will swallow the rule and, therefore, unacceptable.
Sooooo . . . absolutely do take a (half) bow for the House’s action at this historic half-way point in the march to ECPA reform but, as we said six months ago, “stay tuned!” The fight’s not nearly over yet.
OCLC has just published the report from the 14-member OCLC Research task group on Representing Organizations in ISNI: Addressing the Challenges with Organizational Identifiers and ISNI. This work originated from discussions with OCLC Research Library Partners on a previous effort by another OCLC Research Library Partners task group on researcher identifiers, Registering Researchers in Authority Files, published in 2014. That report noted that accurately associating researchers with their institutional affiliations is a key way to distinguish researchers with identical names.
A variety of stakeholders need to identify organizations accurately and define relationships among their sub-units and with other organizations. Academic institutions want to aggregate their researchers’ output as comprehensively as possible, as such aggregations affect their reputations which in turn can influence their success in obtaining funding and attracting or retaining faculty. Organizational identifiers provide the means to do that. The International Standard Name Identifier (ISNI), with a database that already includes over 500,000 institutional identifiers derived from registries of agencies with business needs for identifying institutions, can be used to disambiguate organizations.
OCLC Research Library Partner metadata managers questioned how their institutions were represented in the ISNI database, which led to forming the task group. The report documents:
The special challenges represented by organizations
New modeling of organizations that others can adapt for their own uses
Twelve use-cases and scenarios
Examples of how ISNI can meet the needs identified by the twelve use cases
23 recommendations for improving ISNI
Issues for which there are no easy or immediate answers
The report also includes an outreach document targeted to academic administrators presenting the reasons why organizational identifiers are important and the benefits of ISNI membership.
The report will be of interest to academic administrators eager to more accurately aggregate the scholarly output of their institutions; to linked data implementers who need to represent relationships between and among organizational entities; and to all librarians who have had to associate a work’s creator with an institutional affiliation.
Five of the task group members will be presenting highlights from this report on 9 May 2016 in a free webinar open to all interested (register here). We welcome your feedback—post comments to this blog entry or tweet your questions or comments using the hashtag #orgidreport.
SHLB Director John Windhausen (upper far right), SHLB members and allies launch the #Grow2Gig campaign.
As many of you know, the ALA is a founding member of the Schools, Health and Libraries Broadband (SHLB) Coalition, which was created in 2009 to advocate for community anchor institutions as part of the Broadband Technology Opportunities Program. I also am proud to serve on the SHLB board of directors and to share the new “Connecting Anchor Institutions: A Vision of our Future” report and Grow2Gig+ campaign.
While public library broadband speeds have steadily improved over the last five years, less than 5 percent of libraries have yet reached the 2020 gigabit goal outlined for community anchor institutions in the National Broadband Plan. With growth in digital media labs that enable creation and sharing of content, video teleconferencing that collapses geographical distances, and the proliferation of mobile devices streaming more and more digital content from our virtual shelves, libraries need to Grow2Gig+.
The new vision paper includes a section dedicated to library broadband that features insights from ALA President Sari Feldman, Digital Public Library of America (DPLA) Executive Director Dan Cohen and Bibliotech author John Palfrey (among others) talking about emerging roles and demands to meet the library mission and community needs.
“Libraries are transforming into community hubs for digital content creation and collaboration,” Feldman said. “Having a high-speed and resilient Internet connection at each and every library is essential to ensuring a full range of services related to education, employment, entrepreneurship, empowerment and community engagement for all.”
Importantly, this paper is only the first of many that will make up a broadband action plan for connecting community anchor institutions. Ten papers will follow that make recommendations for federal, state, and local policy changes on issues ranging from wireless networking to broadband funding to digital equity.
Ensuring and leveraging affordable, high-capacity broadband connections for libraries and our communities is central to meeting the ALA and library mission of advancing access to online information and resources for everyone. A commitment to digital equity and opportunity drives our advocacy on E-rate, Lifeline, network neutrality and the need for unlicensed spectrum – as well as the larger Policy Revolution! initiative.
SHLB’s new #Grow2Gig campaign and “Connecting Anchor Institutions: A Broadband Action Plan” provide new and needed resources to make the case for the robust broadband capacity needed to best serve our communities. The ALA will use these documents as a foundation for our work preparing for the upcoming Presidential transition, and we look forward to our continued collaboration with SHLB and its diverse members around the nation to achieve our common goals and vision.
National Small Business Week: May 1-7, 2016. Image from Joe the Goat Farmer, via Flickr.
This first week of May 2016 marks the 53rd annual National Small Business Week. When President Kennedy recognized the first National Small Business Week, the entrepreneurship ecosystem hadn’t yet been animated by digital “making” equipment or the worldwide web, but it was then and is today the beating heart of our national economy. Small businesses make up 99.7 percent of all U.S. businesses and employ 48 percent of U.S. employees, according to the Small Business Administration.
Every beating heart needs lifeblood to drive its cadence – and no network of institutions represents richer lifeblood for America’s vast cohort of innovators and small business owners than the library community.
Last year, in recognition of National Start-Up Day Across America, I wrote about various ways in which libraries propel entrepreneurship. I highlighted the growth of co-working areas – dedicated space for conducting day-to-day business – within libraries, and described how MapStory, an interactive platform for mapping change over time, has used D.C. Public Library’s co-working space to grow and make community connections. I also highlighted the business plan assistance and prototyping equipment – i.e. 3D printers, laser cutters and computer numerical control (CNC) routers – libraries make available to innovators looking to bring an idea for a new product to fruition.
Since then, I’ve done a lot more research on the role libraries play in assisting entrepreneurs. It’s been fun and inspiring work. Among my many discoveries, I’ve learned that libraries provide copious amounts of information on patent and trademark issues through a network of libraries known as Patent and Trademark Resource Centers. I’ve also learned about library production facilities – like Chicago Public Library’s YouMedia Lab – that prepare young people for careers as arts and engineering entrepreneurs. More than 5,000 public libraries provide small business development programs and resources, according to the Digital Inclusion Survey.
I assume that this information is not new to many District Dispatch readers. If you’re a working library professional, you probably know that libraries play an important role in the innovation economy. However, this post is meant to serve not as an exposition of facts, but as a call to action.
Given all libraries do to support entrepreneurs, we must seek opportunities to work with other actors in the entrepreneurship ecosystem to create stronger opportunities for all innovators – including those who lack the basic resources necessary to convert an idea into a venture. Libraries are ubiquitous; they have a footprint in low-income and rural areas that are generally underserved by other business incubators and accelerators. The library community must increase awareness of the role we play as a natural starting point for innovators and seek opportunities to bolster the assistance we offer through partnerships with government agencies, non-profits and private firms.
In short, during this Small Business Week, library professionals should celebrate our contributions to the entrepreneurship ecosystem, but also realize the potential of our community to do a great deal more to advance this ecosystem through effective advocacy and relationship building. Sharing what your library offers with the #DreamSmallBiz hashtag this week is one way to join the online conversation, and we invite you to share a new video featuring small business owner John Fuduric.
ALA is already working to put libraries on the radar of leaders in the entrepreneurship space through engagement with government officials and small business leaders. You can read about a panel of entrepreneurship experts that OITP Deputy Director Larra Clark recently moderated at ALA’s first-ever National Policy Convening here.
Privacy tools are a hot topic in libraries, as librarians all over the country have begun using and teaching privacy-enhancing technologies, and considering the privacy and security implications of library websites, databases, and services. Attend the LITA webinars and pre-conference on up-to-the-minute privacy concerns, featuring experts in the field on these important and timely topics.
Email is neither secure nor private, and the process of fixing its problems can be mystifying, even for technical folks. In this one hour webinar, Nima and Alison from Library Freedom Project will help shine some light on email issues and the tools you can use to make this communication more confidential. They will cover the issues with email, and teach about how to use GPG to encrypt emails and keep them safe.
Heard about the Tor network but not sure how it applies to your library? Join Alison and Nima from the Library Freedom Project in this one hour webinar to learn about the Tor network, running the Tor browser and a Relay, and other basic services to help your patrons have enhanced browsing privacy in the library and beyond.
Learn strategies on how to make you, your librarians and your patrons more secure & private in a world of ubiquitous digital surveillance and criminal hacking. Jessamyn and Blake will teach tools that keep your data safe inside of the library and out – how to secure your library network environment, website, and public PCs, as well as tools and tips you can teach to patrons in computer classes and one-on-one tech sessions. We’ll tackle security myths, passwords, tracking, malware, and more, covering a range of tools from basic to advanced, making this session ideal for any library staff.
I gave a talk at Seagate as part of a meeting to prepare myself for an upcoming workshop on The Future of Storage. It pulls together ideas from many previous posts. Below the fold, a text of the talk with links to the sources that has been edited to reflect some of what I learnt from the discussions. I met Dave Anderson years ago at the Library of Congress' Storage Architecture workshop. When Dave heard I'd been invited to a workshop on The Future of Storage, he invited me here to preview my position paper for it. What you'll hear is an expanded version of the talk I've been asked to give at the workshop. In it, I'd like to suggest answers to five questions:
How far into the future should we be looking?
What do the economics of storing data for that long look like?
How long should the media last?
How reliable do the media need to be?
What should the architecture of a future storage system look like?
And for you I'd like to discuss a question Dave asked me to address that I don't have an answer for:
How much storage will be consumed in 2020?
How far into the future?
Iain Emsley's talk at PASIG2016 on planning the storage requirements of the 1PB/day Square Kilometre Array mentioned that the data was expected to be used for 50 years. How hard a problem is planning with this long a horizon? Let's go back 50 years and see.
In 1966, as I was writing my first program, the state of the art in disk storage was the IBM 2314. Each removable disk pack stored 29MB on 11 platters with a 310KB/s data transfer rate, roughly equivalent to 60MB/rack. The SKA would have needed to add nearly 17M racks/day. But, writing in parallel, there would have been no bandwidth problem. The 2314 could write an entire disk in about a minute and a half.
The state of the art in tape storage was the IBM 2401, the first nine-track tape drive, storing 45MB per tape with a 320KB/s maximum transfer rate. Roughly equivalent to 45MB/rack of accessible data.
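The rack arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1PB = 10^15 bytes, 1MB = 10^6 bytes, 1KB = 10^3 bytes), the variable names are illustrative, and the tape figure is computed here rather than taken from the talk.

```python
# Back-of-the-envelope check of the 1966 figures quoted above.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6, 1 KB = 10**3).
PB, MB, KB = 10**15, 10**6, 10**3

ska_bytes_per_day = 1 * PB       # SKA ingest rate quoted in the talk
disk_bytes_per_rack = 60 * MB    # IBM 2314, roughly 60MB/rack
tape_bytes_per_rack = 45 * MB    # IBM 2401, roughly 45MB/rack of accessible data

disk_racks_per_day = ska_bytes_per_day / disk_bytes_per_rack
tape_racks_per_day = ska_bytes_per_day / tape_bytes_per_rack

# Time for a 2314 to write one full 29MB pack at 310KB/s.
pack_write_seconds = (29 * MB) / (310 * KB)

print(f"2314 racks needed per day: {disk_racks_per_day / 1e6:.1f}M")
print(f"2401 racks needed per day: {tape_racks_per_day / 1e6:.1f}M")
print(f"Seconds to fill one 2314 pack: {pack_write_seconds:.0f}")
```

Under these assumptions the disk figure comes out at about 16.7M racks/day and a full pack takes a little over 90 seconds to write, consistent with "nearly 17M" and "about a minute and a half".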
Your 1966 alter-ego's data management plan would be correct in predicting that 50 years later the dominant media would be "disk" and "tape", and that disk's lower latency would carry a higher cost per byte. But it's hard to believe that any more detailed predictions about the technology would be correct. The extraordinary 30-year exponential cost per byte decrease had yet to start. The idea that ordinary citizens would carry tens of gigabytes in their pockets would have seemed ludicrous.
Thus a 50-year time horizon for a workshop on the Future of Storage may seem too long to be useful. But a 10-year time horizon is definitely too short to be useful. Storage is not just a technology, but also a multi-billion dollar manufacturing industry dominated by a few huge businesses, with long, hard-to-predict lead times.
To illustrate the lead times, here is a Seagate roadmap slide from 2008 predicting that perpendicular magnetic recording (PMR) would be replaced in 2009 by heat-assisted magnetic recording (HAMR), which would in turn be replaced in 2013 by bit-patterned media (BPM).
Seagate plans to begin shipping HAMR HDDs next year.
ASTC 2016 roadmap
Here is a recent roadmap from ASTC showing HAMR starting in 2017 and BPM in 2021. So in 8 years HAMR has gone from next year to next year, and BPM has gone from 5 years out to 5 years out. The reason for this real-time schedule slip is that as technologies get closer and closer to the physical limits, the difficulty and above all cost of getting from lab demonstration to shipping in volume increases exponentially.
The report suggests we could see 14TB PMR drives in 2017 and 18TB SMR drives as early as 2018, with 20TB SMR drives arriving by 2020.
I believe this is mostly achieved by using helium-filled drives to add platters, and thus cost, not by increasing density above current levels.
Historically, tape was the medium of choice for long-term storage. Its basic recording technology is around 8 years behind hard disk, so it has a much more credible technology road-map than disk. But its importance is fading rapidly. There are several reasons:
Just under 20 million LTO cartridges were sent to customers last year. As a comparison let's note that WD and Seagate combined shipped more than 350 million disk drives in 2015; the tape cartridge market is less than 5.7 per cent of the disk drive market in unit terms.
In effect there is now a single media supplier, raising fears of price gouging and supply vulnerability. The disk market has consolidated too, but there are still two very viable suppliers.
The advent of data-mining and web-based access to archives makes the long access latency of tape less tolerable.
To maximize the value of the limited number of slots in the robots it is necessary to migrate data to new, higher-capacity cartridges as soon as they appear. This has two effects. First, it makes the long data life of tape media less important. Second, it consumes a substantial fraction of the available bandwidth, up to a quarter in some cases.
Eric Brewer's fascinating keynote at this year's FAST conference started from the assertion that the only feasible medium for bulk data storage in the cloud was spinning disk. Flash, despite its many advantages, is and will remain too expensive. Because:
factories to build 3D NAND are vastly more expensive than plants that produce planar NAND or HDDs -- a single plant can cost $10 billion
no-one is going to invest the roughly $80B needed to displace hard disks because the investment would not earn a viable return. Flash is a better medium than hard disk so even if there were no supply constraints it would command a higher price; price would be set by value not by cost.
Here is a graph from a model of the economics of long-term storage I built back in 2012 using data from Backblaze and the San Diego Supercomputer Center. It plots the net present value of all the expenditures incurred in storing a fixed-size dataset for 100 years against the Kryder rate, the rate at which the cost per byte drops. As you can see, at the 30-40%/yr rates that prevailed until 2010, the cost is low and doesn't depend much on the precise Kryder rate. Below 20%, the cost rises rapidly and depends strongly on the precise Kryder rate.
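To make the shape of that curve concrete, here is a deliberately simplified sketch. It is not the 2012 model: it assumes the yearly cost of storing the dataset simply falls at the Kryder rate and is discounted at a fixed interest rate. The function name `endowment`, the 2% discount rate and the unit first-year cost are all hypothetical choices for illustration.

```python
def endowment(kryder_rate, years=100, initial_cost=1.0, discount_rate=0.02):
    """Net present value of storing a fixed-size dataset for `years` years.

    Simplifying assumption: the cost of storing the data in a given year is
    the first-year cost reduced by the Kryder rate once per elapsed year.
    """
    total = 0.0
    for year in range(years):
        cost_this_year = initial_cost * (1.0 - kryder_rate) ** year
        total += cost_this_year / (1.0 + discount_rate) ** year
    return total


for rate in (0.40, 0.30, 0.20, 0.10, 0.05):
    print(f"Kryder rate {rate:.0%}: NPV = {endowment(rate):.2f} x first-year cost")
```

Even in this toy version, the result barely moves between 40% and 30%, but climbs steeply as the rate falls below 20%, which is the qualitative behaviour the graph describes.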
2014 cost/byte projection
As it turned out, we were already well below 20%. Here is a 2014 graph from Preeti Gupta, a Ph.D. student at UC Santa Cruz, plotting $/GB against time. The red lines are projections at the industry roadmap's 20% and my less optimistic 10%. It shows three things:
The slowing started in 2010, before the floods hit Thailand.
Disk storage costs in 2014, two and a half years after the floods, were more than 7 times higher than they would have been had Kryder's Law continued at its usual pace from 2010, as shown by the green line.
If the industry projections pan out, as shown by the red lines, by 2020 disk costs will be between 130 and 300 times higher than they would have been had Kryder's Law continued.
The funds required to deliver on a commitment to store a chunk of data for the long term depend strongly on the Kryder rate, especially in the first decade or two. Industry projections of the rate have a history of optimism, and are vulnerable to natural disasters, industry consolidation, and so on. We aren't going to know the cost, and the probability is that it is going to be a lot more expensive than we expect.
Every few months there is another press release announcing that some new, quasi-immortal medium such as 5D quartz or stone DVDs has solved the problem of long-term storage. But the problem stays resolutely unsolved. Why is this? Very long-lived media are inherently more expensive, and are a niche market, so they lack economies of scale. Seagate could easily make disks with archival life, but they did a study of the market for them, and discovered that no-one would pay the relatively small additional cost. The drives currently marketed for "archival" use have a shorter warranty and a shorter MTBF than the enterprise drives, so they're not expected to have long service lives.
The fundamental problem is that long-lived media only make sense at very low Kryder rates. Even if the rate is only 10%/yr, after 10 years you could store the same data in 1/3 the space. Since space in the data center racks or even at Iron Mountain isn't free, this is a powerful incentive to move old media out. If you believe that Kryder rates will get back to 30%/yr, after a decade you could store 30 times as much data in the same space.
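The compounding behind those figures can be sketched as follows, assuming that a Kryder rate of k per year means the space needed for the same data shrinks by a factor of (1 - k) each year. The function name is illustrative.

```python
def space_fraction(kryder_rate, years):
    """Fraction of the original space needed for the same data after `years`.

    Assumption: space per byte shrinks by the Kryder rate once per year.
    """
    return (1.0 - kryder_rate) ** years


for rate in (0.10, 0.30):
    fraction = space_fraction(rate, 10)
    print(f"{rate:.0%}/yr for 10 years: {fraction:.3f} of the space, "
          f"or {1 / fraction:.0f}x the data in the same space")
```

Under this assumption 10%/yr gives roughly 0.35 of the space after a decade, close to the 1/3 quoted, and 30%/yr gives a factor in the thirties, the same order as the 30 times quoted.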
The reason why disks are engineered to have a 5-year service life is that, at 30-40% Kryder rates, they were going to be replaced within 5 years simply for economic reasons. But, if Kryder rates are going to be much lower going forward, the incentives to replace drives early will be much less, so a somewhat longer service life would make economic sense for the customer. From the disk vendor's point of view, a longer service life means they would sell fewer drives.
Additional reasons for skepticism include:
The research we have been doing in the economics of long-term preservation demonstrates the enormous barrier to adoption that accounting techniques pose for media that have high purchase but low running costs, such as these long-lived media.
The big problem in digital preservation is not keeping bits safe for the long term, it is paying for keeping bits safe for the long term. So an expensive solution to a sub-problem can actually make the overall problem worse, not better.
These long-lived media are always off-line media. In most cases, the only way to justify keeping bits for the long haul is to provide access to them (see Blue Ribbon Task Force). The access latency scholars (and general Web users) will tolerate rules out off-line media for at least one copy. As Rob Pike said "if it isn't on-line no-one cares any more".
So at best these media can be off-line backups. But the long access latency for off-line backups has led the backup industry to switch to on-line backup with de-duplication and compression. So even in the backup space long-lived media will be a niche product.
Off-line media need a reader. Good luck finding a reader for a niche medium a few decades after it faded from the market - one of the points Jeff Rothenberg got right two decades ago.
The reason that the idea of long-lived media is so attractive is that it suggests that you can be lazy and design a system that ignores the possibility of failures. But current media are many orders of magnitude too unreliable for the task ahead, so you can't:
Media failures are only one of many, many threats to stored data, but they are the only one long-lived media address.
Long media life does not imply that the media are more reliable, only that their reliability decreases with time more slowly.
Double the reliability is only worth 1/10th of 1 percent cost increase. ...
Replacing one drive takes about 15 minutes of work. If we have 30,000 drives and 2 percent fail, it takes 150 hours to replace those. In other words, one employee for one month of 8 hour days. Getting the failure rate down to 1 percent means you save 2 weeks of employee salary - maybe $5,000 total? The 30,000 drives costs you $4m.
The $5k/$4m means the Hitachis are worth 1/10th of 1 per cent higher cost to us. ACTUALLY we pay even more than that for them, but not more than a few dollars per drive (maybe 2 or 3 percent more). Moral of the story: design for failure and buy the cheapest components you can. :-)
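The arithmetic in that quote is easy to reproduce. The sketch below uses only the numbers quoted above; the variable and function names are illustrative.

```python
# Figures taken from the quote above.
drives = 30_000
minutes_per_replacement = 15
fleet_cost_dollars = 4_000_000
labor_saving_dollars = 5_000  # the "maybe $5,000" estimate


def replacement_hours(failure_percent):
    """Hours of work needed to replace the drives that fail."""
    failed_drives = drives * failure_percent / 100
    return failed_drives * minutes_per_replacement / 60


hours_at_2_percent = replacement_hours(2)
hours_at_1_percent = replacement_hours(1)
premium_worth_paying = labor_saving_dollars / fleet_cost_dollars

print(f"Hours at 2% failures: {hours_at_2_percent:.0f}")
print(f"Hours at 1% failures: {hours_at_1_percent:.0f}")
print(f"Saving as a share of fleet cost: {premium_worth_paying:.3%}")
```

Halving the failure rate saves 75 hours of work, about two working weeks, and $5k against $4m is 0.125%, which is where "1/10th of 1 per cent" comes from.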
Eric Brewer made the same point in his 2016 FAST keynote. Because cloud operators need geographic diversity for availability and resilience against disasters, they already have replicas from which to recover. So spending more to increase media reliability makes no sense; the media are already reliable enough. This is because the systems that surround the drives have been engineered to deliver adequate reliability despite the current unreliability of the drives, thus engineering out the value of more reliable drives.
Future Storage System Architecture?
What do we want from a future bulk storage system?
An object storage fabric.
With low power usage and rapid response to queries.
That maintains high availability and durability by detecting and responding to media failures without human intervention.
And whose reliability is externally auditable.
At the 2009 SOSP David Anderson and co-authors from C-MU presented FAWN, the Fast Array of Wimpy Nodes. It inspired me to suggest, in my 2010 JCDL keynote, that the cost savings FAWN realized without performance penalty by distributing computation across a very large number of very low-power nodes might also apply to storage.
Two subsequent developments suggest we were on the right track. First, Seagate's announcement of its Kinetic architecture and Western Digital's subsequent announcement of drives that ran Linux both exploited the processing power available from the computers in the drives, which perform command processing, internal maintenance operations, and signal processing. Both used that power to delegate computation from servers to the storage media, and to get IP communication all the way to the media, as DAWN (our Durable Array of Wimpy Nodes proposal) suggested. IP to the drive is a great way to future-proof the drive interface.
Second, although flash remains more expensive than hard disk, since 2011 the gap has narrowed from a factor of about 12 to about 6. Pure Storage recently announced FlashBlade, an object storage fabric composed of large numbers of blades, each equipped with:
Compute: 8-core Xeon system-on-a-chip, and Elastic Fabric Connector for external, off-blade, 40GbitE networking,
Storage: NAND storage with 8TB or 52TB of raw capacity and on-board NV-RAM with a super-capacitor-backed write buffer plus a pair of ARM CPU cores and an FPGA,
On-blade networking: PCIe card to link compute and storage cards via a proprietary protocol.
FlashBlade clearly isn't DAWN. Each blade is much bigger, much more powerful and much more expensive than a DAWN node. No-one could call a node with an 8-core Xeon, 2 ARMs, and 52TB of flash "wimpy", and it'll clearly be too expensive for long-term bulk storage. But it is a big step in the direction of the DAWN architecture.
DAWN exploits two separate sets of synergies:
Like FlashBlade, DAWN moves the computation to where the data is, rather than moving the data to where the computation is, reducing both latency and power consumption. The further data moves on wires from the storage medium, the more power and time it takes. This is why Berkeley's Aspire project's architecture is based on optical interconnect technology, which when it becomes mainstream will be both faster and lower-power than wires. In the meantime, we have to use wires.
Unlike FlashBlade, DAWN divides the object storage fabric into a much larger number of much smaller nodes, implemented using the very low-power ARM chips used in cellphones. Because the power a CPU needs tends to grow faster than linearly with performance, the additional parallelism provides comparable performance at lower power.
So FlashBlade currently exploits only one of the two sets of synergies. But once Pure Storage has deployed this architecture in its current relatively high-cost and high-power technology, re-implementing it in lower-cost, lower-power technology should be easy and non-disruptive. They have done the harder of the two parts.
Storage systems are extremely reliable, but at scale nowhere near reliable enough to mean data loss can be ignored. Internal auditing, in which the system detects and reports its own losses, for example by hashing the stored data and comparing the result with a stored hash, is important but is not enough. The system's internal audit function will itself have bugs, which are likely to be related to the bugs in the underlying functionality causing data loss. Having the system report "I think everything is fine" is not as reassuring as one would like.
Auditing a system by extracting its entire contents for integrity checking does not scale, and is likely itself to cause errors. Asking a storage system for the hash of an object is not adequate, because the system could have remembered the object's hash instead of computing it afresh. Although we don't yet have a perfect solution to the external audit problem, it is clear that part of the solution is the ability to supply a nonce that is prepended to the object's data before hashing. Because the result is different every time, the system cannot simply remember it.
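To make the nonce idea concrete, here is a minimal sketch in Python. The function names are hypothetical, SHA-256 is an arbitrary choice of hash, and the auditor is assumed to hold a reference copy of the object, which is precisely the part that does not scale. The sketch only illustrates why a remembered hash cannot answer a fresh challenge; it is not a solution to the external audit problem.

```python
import hashlib
import os


def make_nonce(size: int = 32) -> bytes:
    """Auditor side: a fresh random challenge for each audit."""
    return os.urandom(size)


def audit_response(nonce: bytes, data: bytes) -> str:
    """Storage side: hash the nonce prepended to the object's data."""
    return hashlib.sha256(nonce + data).hexdigest()


def verify(nonce: bytes, reference_copy: bytes, response: str) -> bool:
    """Auditor side: recompute from a reference copy (an assumption of this sketch)."""
    return audit_response(nonce, reference_copy) == response


if __name__ == "__main__":
    obj = b"example object bytes"
    nonce = make_nonce()
    # A system that really reads the object can answer the challenge.
    print(verify(nonce, obj, audit_response(nonce, obj)))
    # A system that only remembered the plain hash cannot.
    print(verify(nonce, obj, hashlib.sha256(obj).hexdigest()))
```

Because each audit uses a new nonce, the response has to be computed from the stored bytes at the time of the challenge.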
2020 Storage Consumption
How much storage will be consumed in 2020? I'm not an economist or a supply chain expert, and I don't have access to the relevant data, so it's not clear why Dave thinks my answer would be interesting.
My simplistic answer is "however much the vendors decide to manufacture". The market will clear at some price; the vendors aren't going to be left with piles of unsellable drives. So this is a question about demand elasticity, capacity planning, and financing. I think what you really want is a graph whose X axis is investment and whose Y axis is margin dollars. It's going to be some kind of bell-shaped curve, low return at low investment and at high investment, with a sweet spot some place in the middle.
I can't locate the sweet spot for you, but here are some things I would be looking at in your shoes, specifically for the hard disk market, which is what we care about for long-term bulk storage:
By 2020 the only large growing market left for hard drives will be cloud storage.
That market has a fairly small number of large customers, which tends to depress margins.
But it has only two suppliers, which tends to increase margins.
Two-supplier markets with large customers tend to be stable; the customers don't want to end up with only one supplier, so they buy from both. (See Nvidia vs. ATI after about 1998, Boeing vs. Airbus, ...).
It seems that in this market segment unit shipments are about stable, so the increase in unit capacity is roughly matching the increase in demand.
Overall margins have not increased significantly, which would be a sign that the vendors were not satisfying demand.
Let's consider a service whose data grows 25%/yr. Oversimplifying, let's suppose that the Kryder rate is 15% and disks have a 5-year service life. The total media cost after 10 years is about 7 times the capital cost of the media at the start of the decade. Unless the data grows more slowly than the Kryder rate the cost of "keeping everything for ever" grows rapidly. At some point the cost of keeping the really old data will outweigh any value that can be extracted from it. At that point growth of "cloud storage" media will slow, but I think that point is well beyond 2020.
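For readers who want to play with the interaction between data growth, the Kryder rate and service life, here is a rough sketch in Python. The model is an assumption of this sketch, not the one behind the figure quoted above: capacity is bought once a year, exactly matches the data, and is replaced at the then-current price when it reaches the end of its service life. The function name and parameters are hypothetical, and a model this crude should not be expected to reproduce the "about 7 times" number, which depends on details not given here.

```python
def total_media_cost(growth, kryder, service_life, years):
    """Total media spend over `years` years, as a multiple of the year-0 purchase.

    Assumptions of this sketch: capacity is bought once a year, exactly
    matches the data size, and is replaced when it reaches `service_life`
    years old, at that year's price.
    """
    purchases = []  # capacity bought in each year, in units of year-0 data
    total = 0.0
    for year in range(years):
        data_now = (1 + growth) ** year
        data_before = (1 + growth) ** (year - 1) if year > 0 else 0.0
        new_capacity = data_now - data_before
        # Drives bought `service_life` years ago must be replaced this year.
        replaced = purchases[year - service_life] if year >= service_life else 0.0
        bought = new_capacity + replaced
        purchases.append(bought)
        price_per_unit = (1 - kryder) ** year  # price falls at the Kryder rate
        total += bought * price_per_unit
    return total


if __name__ == "__main__":
    # Data growing faster than the Kryder rate versus slower than it.
    print(total_media_cost(growth=0.25, kryder=0.15, service_life=5, years=10))
    print(total_media_cost(growth=0.10, kryder=0.15, service_life=5, years=10))
```

Comparing the two runs shows the qualitative point: when data grows faster than the Kryder rate, spending keeps climbing year after year rather than tapering off.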
Both graphs' "enterprise" lines include both performance and capacity (nearline) drives. According to Stifel:
High-capacity (nearline) enterprise HDD shipments are now estimated to grow from 37 million units in 2015 to 48 million units by 2020.
The graphs suggest about a 60M/year ship rate for the sector in 2015. By 2020 one would expect the performance market to be almost all flash. If Stifel is right, this means that nearline is the only growing part of the market.
It foresees demand for about 400 million units in 2016, ... It foresees 376 million disk drive shipments in 2017 and then 357m, 343m, and 333m units shipped in 2018, 2019, and 2020.
The reason disk is so cheap is that building drives is a high-volume business, with strong economies of scale. As volumes decrease, economies of scale go into reverse. This effect is amplified if the volume decrease is at the high end. Technology improvements are introduced first at the high end, where higher prices can generate a better return on the investment in making them. Then they migrate down the range. But flash is removing the market for the most expensive drives. Thus it will be harder and harder for the manufacturers to keep driving prices down.
Although disk $/GB has decreased somewhat in the last year or so, my guess is that going forward the Kryder rate for disk will be in the single digits. The effects include:
Further reduction in the cost per byte advantage of disk over flash, and thus increased erosion of disk's share in the total storage market. Maybe even a disk death spiral similar to tape's.
The need for some understanding between cloud storage customers and the disk vendors as to how the improvements Eric Brewer wants can be financed. In the near future, the nearline part of the market doesn't have the volumes needed to justify the investment, especially since some of the changes, such as a different form factor, are not relevant to the other parts of the market.
Following a successful DPLAfest 2016 in Washington, DC, we’re looking for next year’s location for another great interactive, productive, and exciting fest. DPLAfest is an annual event that brings together hundreds of people from the cultural, education, and technology communities to celebrate the Digital Public Library of America, our many partners across the country, and our large and growing community of practitioners and members of the public who contribute to, and benefit from, DPLA.
DPLAfest 2016 was co-hosted by the Library of Congress, the National Archives, and the Smithsonian Institution. Those great institutions were proud to host over 450 attendees from across the world for two days of discussions, workshops, hands-on activities, and fun events.
DPLAfest host organizations are essential contributors to one of the most prominent gatherings in the country involving librarians, archivists, and museum professionals, developers and technologists, publishers and authors, teachers and students, and many others who work together to further the mission of providing maximal access to our shared cultural heritage. For colleges and universities, DPLAfest is the perfect opportunity to directly engage your students, educators, archivists, librarians and other information professionals in the work of a diverse national community of information and technology leaders. For public libraries, hosting DPLAfest brings the excitement and enthusiasm of our community right to your hometown, enriching your patrons’ understanding of library services through free and open workshops, conversations, and more. For museums, archives, and other cultural heritage institutions, it’s a great way to promote your collections and spotlight innovative work taking place at your organization. Hosting DPLAfest also affords the chance to promote your institution nationally and internationally, given the widespread media coverage of DPLAfest and the energy around the event.
If this opportunity sounds right for you and your organization, let us know! We are calling on universities and colleges, public libraries, archives, museums, historical societies, and others to submit expressions of interest to serve as hosts or co-hosts for DPLAfest 2017, which will take place in mid-April 2017.
To apply, review the information below and submit an expression of interest on behalf of your organization via the form at the bottom of this page. The deadline to apply is Tuesday, June 7, 2016. We will follow up with the most promising proposals shortly following the deadline.
Collaborative applications (such as between a university and a nearby public library) are encouraged. Preference will be given to applicants who can provide venue spaces which are located in the same building complex or campus. Please note that some host partners can contribute staffing or other day-of support in lieu of venue space.
Requirements of a DPLAfest 2017 Host Site
Willingness to make local arrangements and coordinate with DPLA staff and any/all staff at host institution.
An auditorium or similar space suitable for a keynote presentation (minimum 300 people).
10 or more smaller rooms for “breakout” sessions (30 – 50 people).
Preference will be given to hosts that can provide breakout rooms equipped with projection/display capabilities.
Co-location of proposed event spaces (i.e., enough session spaces in the same building or same campus).
Availability of wireless network for all attendees, potentially in excess of 350 simultaneous clients, for free or via conference sponsorship.
An organizational commitment to donate use of all venue spaces. (As a small nonprofit with limited funds, as well as a strong desire to keep DPLAfest maximally open to the public, we’re unable to pursue host proposals that are unable to offer free or deeply-discounted use of venue spaces).
Ability to provide at least one staff person for every session room to help with day-of AV support, logistical support, etc.
Commitment to diversity, inclusion, and openness to all.
Additional Desirable Qualities
Proximity to a major airport and hotels.
Location outside of the Northeast corridor and the Midwest (we’re rotating the location of DPLAfest each year; we celebrated DPLAfest 2013 in Boston, DPLAfest 2015 in Indianapolis, and DPLAfest 2016 in Washington, DC).
David Wilcox, Fedora product manager, will offer a workshop entitled "Publishing Assets as Linked Data with Fedora 4" at the Library Publishing Forum (LPForum 2016), to be held at the University of North Texas Libraries, Denton, Texas on May 18 from 1:00 PM-3:30 PM. All LPForum 2016 attendees are welcome; there is no need to pre-register for this introductory-level workshop.
In our third episode of Begin Transmission, we’re lucky enough to sit down with none other than Cinthya Ippoliti. Cinthya is a LITA Blogger and Associate Dean for Research and Learning Services at Oklahoma State University. Enjoy her library tech wisdom and perspectives in this short interview.
This is the last post in a series of posts related to the Description field found in the Digital Public Library of America. I’ve been working with a collection of 11,654,800 metadata records for which I’ve created a dataset of 17,884,946 description fields.
This past Christmas I received a copy of Thing Explainer by Randall Munroe. If you aren’t familiar with this book, Randall uses only the most used ten hundred words (thousand isn’t one of them) to describe very complicated concepts and technologies.
After seeing this book I started to wonder how much of the metadata we create for our digital objects uses just the 1,000 most frequent words. Frequently used words, as well as less complex words (words with fewer syllables), are often used in the calculation of the reading level of various texts, so that also got me thinking about the reading level required to understand some of our metadata records.
Along that train of thought, one of the things that we hear from aggregations of cultural heritage materials is that K-12 users are one of our target audiences and that many of the resources we digitize are digitized with them in mind. With that being said, how often do we take them into account when we create our descriptive metadata?
When I was indexing the description fields I calculated three metrics related to this.
What percentage of the tokens are in the 1,000 most frequently used English words
What percentage of the tokens are in the 5,000 most frequently used English words
What percentage of the tokens are words in a standard English dictionary.
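As a rough illustration of how these three metrics can be computed, here is a small Python sketch. The tokenizer, the function names and the tiny word list are stand-ins chosen for this example; the actual word lists and tokenization used during indexing are not shown in this post.

```python
import re


def tokenize(description):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z']+", description.lower())


def percent_in_vocabulary(description, vocabulary):
    """Percentage of tokens in `description` that appear in `vocabulary`."""
    tokens = tokenize(description)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    # A toy stand-in for the 1,000 most frequent English words.
    top_words = {"the", "of", "a", "and", "in", "house", "picture"}
    print(percent_in_vocabulary("Picture of a house in Denton", top_words))
```

The same function serves all three metrics; only the vocabulary passed in changes (the 1,000-word list, the 5,000-word list, or a full dictionary).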
From there I was curious about how the different providers compared to each other.
Average for 1,000, 5,000 and English Dictionary
1,000 most Frequent English Words
The first thing we will look at is the average amount of a description composed of words from the list of the 1,000 most frequently used English words.
Average percentage of description consisting of 1000 most frequent English words.
For me the providers/hubs that I notice are of course bhl that has very little usage of the 1,000 word vocabulary. This is followed by smithsonian, gpo, hathitrust and uiuc. On the other end of the scale is virginia that has an average of 70%.
5,000 most Frequent English Words
Next up is the average percentage of the descriptions that consist of words from the 5,000 most frequently used English words.
Average percentage of description consisting of 5000 most frequent English words.
This graph ends up looking very much like the 1,000 words graph, just a bit higher percentage wise. This is due to the fact of course that the 5,000 word list includes the 1,000 word list. You do see a few changes in the ordering though, for example gpo switches places with hathitrust in this graph over the 1,000 words graph above.
English Dictionary Words
Next is the average percentage of descriptions that consist of words from a standard English dictionary. Again this includes the 1,000 and 5,000 words in that dictionary so it will be even higher.
Average percentage of description consisting of English dictionary words.
You see that the virginia hub has almost 100% of their descriptions consisting of English dictionary words. The hubs that are the lowest in their use of English words for descriptions are bhl, smithsonian, and nypl.
The graph below has 1,000, 5,000, and English Dictionary words grouped together for each provider/hub so you can see at a glance how they stack up.
1,000, 5,000 most frequent English words and English dictionary words by Provider
Stacked Percent 1,000, 5,000, English Dictionary
Next we will look at the percentages per provider/hub if we group the percentage utilization into 25% buckets. This gives a more granular view of the data than just the averages presented above.
Percentage of descriptions by provider that use 1,000 most frequent English words.
Percentage of descriptions by provider that use 5,000 most frequent English words.
Percentage of descriptions by provider that use English dictionary words.
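The bucketing behind these graphs can be sketched as follows. The bucket labels and the function names are assumptions made for this example, and the input is simply a list of per-description percentages for a single provider.

```python
from collections import Counter

BUCKETS = ["0-24%", "25-49%", "50-74%", "75-100%"]


def bucket(percent):
    """Map a percentage (0-100) to one of four 25% buckets."""
    index = min(int(percent // 25), 3)  # 100% belongs in the top bucket
    return BUCKETS[index]


def bucket_shares(percents):
    """Share of descriptions (as a percentage) that falls into each bucket."""
    if not percents:
        return {label: 0.0 for label in BUCKETS}
    counts = Counter(bucket(p) for p in percents)
    return {label: 100.0 * counts[label] / len(percents) for label in BUCKETS}


if __name__ == "__main__":
    print(bucket_shares([10.0, 30.0, 30.0, 80.0]))
```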
I don’t think it is that much of a stretch to draw parallels between the language used in our descriptions and the intended audience of our metadata records. How often are we writing metadata records for ourselves instead of our users? A great example that comes to mind is “recto” and “verso”, which we often use for the “front” and “back” of items. In the dataset I’ve been using there are 56,640 descriptions with the term “verso” and 5,938 with the term “recto”.
I think we should be taking into account our various audiences when we are creating metadata records. I know this sounds like a very obvious suggestion but I don’t think we really do that when we are creating our descriptive metadata records. Is there a target reading level for metadata records? Should there be?
Looking at the description fields in the DPLA dataset has been interesting. The kind of analysis that I’ve done so far can be seen as kind of a distant reading of these fields. Big round numbers that are pretty squishy and only show the general shape of the field. To dive in and do a close reading of the metadata records is probably needed to better understand what is going on in these records.
Based on experience of mapping descriptive metadata into the Dublin Core metadata fields, I have a feeling that the description field is generally a dumping ground for information that many of us might not consider “description”. I sometimes wonder if we would do our users a greater service by adding a true “note” field to our metadata models so that we have a proper location to dump “notes and other stuff” instead of muddying up a field that should have an obvious purpose.
That’s about it for this work with descriptions, or at least it is until I find some interest in really diving deeper into the data.
If you have questions or comments about this post, please let me know via Twitter.
The Harvard Digital Repository Service provides long-term preservation and access to materials from over fifty libraries, archives and museums at Harvard. It’s been in production for about fifteen years. The next generation of the DRS, with increased preservation capabilities, was recently launched, so this is an ideal time to evaluate the DRS and consider how it might be improved in the future. I hope to identify areas needing new policies and/or documentation and, in doing so, help the DRS improve its services. The DRS staff also hope to eventually seek certification as a trusted digital repository and this project will prepare them.
When I started the project, my first step was to become familiar with the ISO16363 standard. I read through it several times and tried to parse out the meaning of the metrics. Sometimes this was straightforward and I found the metric easy to understand. For others, I had to read through a few times before I fully understood what the metric was asking for. I also found it helpful to write down notes about what they meant and put it in my own words. I read about other people’s experiences performing audits, which was very helpful and gave me some ideas about how to go about the process. In particular, I found David Rosenthal’s blog posts about the CLOCKSS self-audit helpful, as they used the same standard, ISO16363.
By Julie Seifert
Inspired by the CLOCKSS audit, I created a Wiki with a different page for each metric. On these pages, I copied the text from the standard and included space for my notes. I also created an Excel sheet to help track my findings. In the Excel sheet, I gave each metric its own row and, in that row, a column about documentation and a column that linked to the Wiki. (I blogged more about the organization process.)
I reviewed the DRS documentation, interviewed staff members about metrics and asked them to point me to relevant documentation. I realized that many of the actions required by the metric were being performed at Harvard but these actions and policies weren’t documented. Everyone in the organization knew that they happened but sometimes no one had written them down. In my notes, I indicated when something was being done but not documented versus when something was not being done at all. I used a Green, Yellow, Red color scheme in the Excel sheet for the different metrics, with yellow indicating things that were done but not documented.
The assessment was the most time-consuming part. In thinking about how to best summarize and report on my findings, I am looking for commonalities among the gap areas. It’s possible that many of the gaps are similar and several gaps could be filled with a single piece of documentation. For example, many of the “yellow” areas have to do with ingest workflows, so perhaps a single document about this workflow could fill all these gaps at once. I hope that finding the commonalities among the gaps can help the DRS fill these gaps most effectively and efficiently.
Looking to the future, the next big step will be for the very concept of the “device” to fade away. Over time, the computer itself—whatever its form factor—will be an intelligent assistant helping you through your day. We will move from mobile first to an AI first world.
My Library recently finalized a Vision Document for our virtual library presence. Happily, our vision was aligned with the long-term direction of technology as understood by movers and shakers like Google.
In its place, a new mode of information retrieval and creation will move us away from the paper-based metaphor of web pages. Information will be more ubiquitous. It will be more free-form, more adaptable, more contextualized, more interactive.
Part of this is already underway. For example, people are becoming a data set. And other apps are learning about you and changing how they work based on who you are. Your personal data set contains location data, patterns in speech and movement around the world, consumer history, keywords particular to your interests, associations based on your social networks, etc.
All of this information makes it possible for emerging AI systems like Siri and Cortana to better serve you. Soon, it will allow AI to control the flow of information based on your mood and other factors to help you be more productive. And like a good friend that knows you very, very well, AI will even be able to alert you to serendipitous events or inconveniences so that you can navigate life more happily.
People’s expectations are already being set for this kind of experience. Perhaps you’ve noticed yourself getting annoyed when your personal assistant just fetches a Wikipedia article when you ask it something. You’re left wanting. What we want is that kernel of gold we asked about. But what we get right now, is something too general to be useful.
But soon, that will all change. Nascent AI will soon be able to provide exactly the piece of information that you really want rather than a generalized web page. This is what Google means when they make statements like “AI First” or “the Web will die.” They’re talking about a world where information is not only presented as article-like web pages, but broken down into actual kernels of information that are both discrete and yet interconnected.
AI First in the Library
Library discussions often focus on building better web pages or navigation menus or providing responsive websites. But the conversation we need to have is about pulling our data out of siloed systems and websites and making it available to all modes like AI, apps and basic data harvesters.
You hear this conversation in bits and pieces. The ongoing linked data project is part of this long-term strategy. So too with next-gen OPACs. But on the ground, in our local strategy meetings, we need to tie every big project we do to this emerging reality where web browsers are increasingly no longer relevant.
Sign up for this fun, informative, and hands-on ALA Annual pre-conference
Technology Tools and Transforming Librarianship
Friday June 24, 2016, 1:00 – 4:00 pm
Presenters: Lola Bradley, Reference Librarian, Upstate University; Breanne Kirsch, Coordinator of Emerging Technologies, Upstate University; Jonathan Kirsch, Librarian, Spartanburg County Public Library; Rod Franco, Librarian, Richland Library; Thomas Lide, Learning Engagement Librarian, Richland Library
Technology envelops every aspect of librarianship, so it is important to keep up with new technology tools and find ways to use them to improve services and better help patrons. This hands-on, interactive preconference will teach six to eight technology tools in detail and show attendees the resources to find out about 50 free technology tools that can be used in all libraries. There will be plenty of time for exploration of the tools, so please BYOD! You may also want to bring headphones or earbuds.
Lola Bradley is a Public Services Librarian at the University of South Carolina Upstate Library. Her professional interests include instructional design, educational technology, and information literacy for all ages.
Breanne Kirsch is a Public Services Librarian at the University of South Carolina Upstate Library. She is the Coordinator of Emerging Technologies at Upstate and the founder and current Co-Chair of LITA’s Game Making Interest Group.
Jonathan Kirsch is the Head Librarian at the Pacolet Library Branch of the Spartanburg County Public Libraries. His professional interests include emerging technology, digital collections, e-books, publishing, and programming for libraries.
Rod Franco is a Librarian at Richland Library, Columbia, South Carolina. Technology has always been at the forefront of any of his library-related endeavors.
Thomas Lide is the Learning Engagement Librarian at Richland Library, Columbia, South Carolina. He helps to pave a parallel path of learning for community members and colleagues.
The American Library Association (ALA) joined the Harry Potter Alliance in launching “Spark,” an eight-part video series developed to support and guide first-time advocates who are interested in advocating at the federal level for issues that matter to them. The series, targeted to viewers aged 13–22, will be hosted on the YouTube page of the Harry Potter Alliance, while librarians and educators are encouraged to use the videos to engage young people or first time advocates. The video series was launched today during the 42nd annual National Library Legislative Day in Washington, D.C.
The video series provides supporting information for inexperienced grassroots advocates, covering everything from setting up in-person legislator meetings to the process of constructing a campaign. By breaking down oft-intimidating “inside the Beltway” language, Spark provides an accessible set of tools that can activate and motivate young advocates for the rest of their lives. The video series also includes information on writing press releases, staging social media campaigns, using library resources for research or holding events, and best practices for contacting elected officials.
“We are pleased to launch Spark, a series of interactive advocacy videos. We hope that young or new advocates will be inspired to start their own campaigns, and that librarians and educators will be able to use the series to engage young people and get them involved in advocacy efforts,” said Emily Sheketoff, executive director of the American Library Association’s Washington Office.
Janae Phillips, Chapters Director for the Harry Potter Alliance, added, “I’ve worked with youth for many years now, and I’ve never met a young person who just really didn’t want to get involved – they just weren’t sure how! I think this is true for adults who have never been involved in civic engagement before, too. I hope that Spark will be a resource to people who have heard a lot about getting engaged in the political process but have never been sure where to start, and hopefully, dare I say, spark some new ideas and action.”
This post will try and pull together some of the data from the different fields listed above and present them in a way that we will hopefully be able to use to derive some meaning from.
More Description Length Discussion
In the previous posts I’ve primarily focused on the length of the description fields. There are two other fields that I’ve indexed that are related to the length of the description fields. These two fields include the number of tokens in a description and the average token length of fields.
I’ve included those values below. I’ve included two mean values, one for all of the descriptions in the dataset (17,884,946 descriptions) and the other for the descriptions that are 1 character in length or more (13,771,105 descriptions).
Mean – Total
Mean – 1+ length
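For reference, the two token-based values can be computed along these lines. Splitting on whitespace is an assumption of this sketch, as is the function name; the tokenizer actually used during indexing may differ.

```python
def token_stats(description):
    """Return (token count, average token length) for one description.

    Tokens are obtained by splitting on whitespace, which is an assumption
    of this sketch.
    """
    tokens = description.split()
    if not tokens:
        return 0, 0.0
    average_length = sum(len(token) for token in tokens) / len(tokens)
    return len(tokens), average_length


if __name__ == "__main__":
    print(token_stats("Photograph of a street scene in Denton, Texas"))
```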
The graphs below are based on the numbers for just the descriptions that are 1 or more characters in length.
This first graph is being reused from a previous post that shows the average length of description by Provider/Hub. David Rumsey and the Getty are the two that average over 250 characters per description.
Average Description Length by Hub
It shouldn’t surprise you that David Rumsey and the Getty are two of the Providers/Hubs that have the highest average token counts, with longer descriptions generally creating more tokens. There are a few differences that don’t match this though: USC, which has an average description length of just over 50 characters, comes in as the third highest in average token count at over 40 tokens per description. There are a few other providers/hubs that look a bit different than their average description length would suggest.
Average Token Count by Provider
Below is a graph of the average token lengths by providers. The lower the number is the lower average length of a token. The mean for the entire DPLA dataset for descriptions of length 1+ is just over 5 characters.
Average Token Length by Provider
That’s all I have to say about the various statistics related to length for this post. I swear! Next we move on to some of the other metrics that I calculated when indexing things.
Other Metrics for the Description Field
Throughout this analysis I had a question of when to take into account that there were millions of records in the dataset that had no description present. I couldn’t just throw away that fact in the analysis but I didn’t know exactly what to do with them. So below I present statistics for the average of many of the fields I indexed as both the mean of all of the descriptions and then the mean of just the descriptions that are one or more characters in length. The graphs that follow the table below are all based on the subset of descriptions that are greater than or equal to one character in length.
Mean – Total
Mean – 1+ length
Stopwords are words that occur very commonly in natural language. I used a list of 127 stopwords for this work to help understand what percentage of a description (based on tokens) is made up of stopwords. While stopwords generally carry little meaning for natural language, they are a good indicator of natural language, so providers/hubs that have a higher percentage of stopwords would probably have more descriptions that resemble natural language.
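Here is a small Python sketch of the stopword percentage. The handful of stopwords below is purely illustrative and is not the 127-word list used for this work; the whitespace tokenization and the function name are also assumptions of the sketch.

```python
# Illustrative only: a tiny stand-in for the real 127-word stopword list.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def percent_stopwords(description, stopwords=STOPWORDS):
    """Percentage of whitespace-separated tokens that are stopwords."""
    tokens = description.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in stopwords)
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    print(percent_stopwords("The history of the county"))
    print(percent_stopwords("1 photograph : b&w ; 8 x 10 in."))
```

A sentence-like description scores noticeably higher than a catalog-style string, which is the behavior the metric relies on.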
Percent Stopwords by Provider
I was curious about how much punctuation was present in a description on average. I used the following characters as my set of “punctuation characters”
I found the number of characters in a description that were made up of these characters vs other characters and then divided the number of punctuation characters by the total description length to get the percentage of the description that is punctuation.
Percent Punctuation by Provider
Punctuation is common in natural language but it occurs relatively infrequently. For example, that last sentence was eighty characters long and only one of them was punctuation (the period at the end of the sentence). That comes to a percent_punctuation of only 1.25%. In the graph above you will see that the bhl provider/hub has over 50% of its descriptions with 25-49% punctuation. That’s very high when compared to the other hubs and to the average of about 5% overall for the DPLA dataset. Digital Commonwealth has a percentage of descriptions that are from 50-74% punctuation, which is pretty interesting as well.
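The arithmetic for that example sentence can be checked with a short sketch. The exact set of punctuation characters used in the analysis is not reproduced here, so this assumes Python's `string.punctuation` (the ASCII punctuation marks) as a stand-in; the function name mirrors the percent_punctuation metric but is otherwise illustrative.

```python
import string

# Assumption: ASCII punctuation as a stand-in for the post's own character set.
PUNCTUATION = set(string.punctuation)


def percent_punctuation(description: str) -> float:
    """Punctuation characters divided by total description length, as a percentage."""
    if not description:
        return 0.0
    count = sum(1 for ch in description if ch in PUNCTUATION)
    return 100.0 * count / len(description)


sentence = (
    "Punctuation is common in natural language "
    "but it occurs relatively infrequently."
)
print(len(sentence), percent_punctuation(sentence))  # prints: 80 1.25
```

Note that whitespace counts toward the total length in this sketch, which matches the eighty-character figure quoted for the sentence.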
Next up in our list of things to look at is the percentage of the description field that consists of integers. For review, integers are digits, like the following.
I used the same process for the percent integer as I did for the percent punctuation mentioned above.
Percent Integer by Provider
You can see that there are several providers/hubs that have quite a high percentage integer for their descriptions. These providers/hubs are the bhl and the smithsonian. The smithsonian has over 70% of its descriptions with percent integers of over 70%.
Once we’ve looked at punctuation and integers, that leaves really just letters of the alphabet to make up the rest of a description field.
That’s exactly what we will look at next. For this I used the following characters to define letters.
I didn’t perform any case folding so letters with diacritics wouldn’t be counted as letters in this analysis, but we will look at those a little bit later.
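The integer and letter percentages follow the same pattern as the punctuation one, so both can be sketched with one helper. This assumes the digit set is 0-9 and the letter set is the unaccented ASCII letters a-z and A-Z; the actual character lists are not reproduced here, and `percent_in_set` is a name made up for the example.

```python
import string

DIGITS = set(string.digits)          # "0123456789"
LETTERS = set(string.ascii_letters)  # assumption: a-z and A-Z only, no diacritics


def percent_in_set(description: str, charset: set) -> float:
    """Percentage of the characters in a description that belong to charset."""
    if not description:
        return 0.0
    count = sum(1 for ch in description if ch in charset)
    return 100.0 * count / len(description)


# With no folding or normalization, the accented letter is not counted as a letter:
# only "C", "a" and "f" match LETTERS, and the four digits match DIGITS.
sample = "Caf\u00e9 1900"
print(percent_in_set(sample, LETTERS))
print(percent_in_set(sample, DIGITS))
```

Letters with diacritics fall outside both sets in this sketch, so they would only show up in a separate count of characters outside the basic sets.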
Percent Letter by Provider
For percent letters you would expect a very high percentage of the descriptions to consist mostly of letters. Generally this appears to be true, but there are some odd providers/hubs again, mainly bhl and the smithsonian, though nypl, kdl and gpo also seem to have a different distribution of letters than others in the dataset.
The next thing to look at was the percentage of “special characters” used in a description. For this I used the following definition of “special character”. If a character is not present in the following list of characters (which also includes whitespace characters) then it is considered to be a “special character”
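A minimal sketch of that "not in the list" test follows. The actual list of ordinary characters is not reproduced here, so this assumes it is the ASCII letters, digits, punctuation and whitespace; `ORDINARY` and `percent_special` are illustrative names.

```python
import string

# Assumption: the "ordinary" characters are ASCII letters, digits,
# punctuation and whitespace. Anything else counts as special.
ORDINARY = set(
    string.ascii_letters + string.digits + string.punctuation + string.whitespace
)


def percent_special(description: str) -> float:
    """Percentage of characters in a description that are not in ORDINARY."""
    if not description:
        return 0.0
    special = sum(1 for ch in description if ch not in ORDINARY)
    return 100.0 * special / len(description)


print(percent_special("plain ASCII text"))            # no special characters
print(percent_special("\u5716\u66f8\u9928 library"))  # three CJK characters plus ASCII
```

Under this definition, a description written in a non-Latin script would score very high.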
A note on reading the graph above: keep in mind that the y-axis only covers 95-100%, so while USC looks different here, that difference represents only 3% of its descriptions having 50-100% of the description being special characters. This is most likely a set of descriptions whose metadata was created in a non-English language.
The final graph I want to look at in this post is the percentage of descriptions for a provider/hub that has a URL present in its description. I used the presence of either http:// or https:// in the description to define if it does or doesn’t have a URL present.
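The URL test described here is a plain substring check, which can be sketched as follows. The function names and the sample descriptions are made up for the illustration.

```python
def has_url(description: str) -> bool:
    """True if the description contains either http:// or https://."""
    return "http://" in description or "https://" in description


def percent_with_url(descriptions) -> float:
    """Percentage of descriptions that contain a URL by the test above."""
    descriptions = list(descriptions)
    if not descriptions:
        return 0.0
    return 100.0 * sum(has_url(d) for d in descriptions) / len(descriptions)


# Hypothetical sample descriptions, not taken from the DPLA dataset.
sample = [
    "Finding aid available at http://example.org/aid",
    "Black and white photograph of a barn.",
    "See www.example.org for details.",
    "Letter, 1902.",
]
print(percent_with_url(sample))
```

Because the check looks only for the scheme prefix, a bare "www.example.org" with no http:// or https:// is not counted as a URL.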
Percent URL by Provider
The majority of providers/hubs don’t have URLs in their descriptions, with a few obvious exceptions. The providers/hubs washington, mwdl, harvard, gpo and david_ramsey do have a reasonable number of descriptions with URLs, with washington leading at almost 20% of its descriptions having a URL present.
Again, this analysis is just looking at what high-level information about the descriptions can tell us. The only metric we’ve looked at that actually goes into the content of the description field to pull out a little bit of meaning is the percent stopwords. I have one more post in this series before we wrap things up and then we will leave descriptions in the DPLA alone for a bit.
If you have questions or comments about this post, please let me know via Twitter.
Let’s face it. When it comes to relevant open data and transparency in European decision-making, we have a lot to do. Despite growing open data portals and an aggregating European data portal, making sense of European decision-making and public finance takes a lot of effort.
Dieter Schalk / Open State Foundation
The time is ripe. With the Dutch referendum on the EU-Ukraine Association Agreement and Brexit, debates around immigration and refugees, new bailout talks between the EU and Greece, decisions by the EU affect millions of citizens living and working within its member states and people around the world. As everyone has the right to information, people need to know how these decisions are taken, who participates in preparing them, who receives funding, how you can make your views known, and what information is held or produced to develop and adopt those decisions.
In the wake of the Panama Papers, renewed calls for open company registers and registers of beneficial ownership, along with the need for open spending, contracting and tenders data, require us to come together, join efforts and help make the EU more transparent.
TransparencyCamp Europe comes at the right moment. This unconference on open government and open data, to be held on June 1 in Amsterdam, will bring together developers, journalists, open data experts, NGOs, policymakers, and activists. In the run-up, an online Europe-wide open data App Competition (deadline for submissions May 1) and a number of local events or diplohacks are being organized. This will all come together at TransparencyCamp Europe, where, apart from numerous sessions organized by participants themselves, developers will present their open data apps to a jury.
Dieter Schalk / Open State Foundation
EU decision-making is quite complex, involving national governments and parliaments, the European Commission and the European Parliament, the European Council and the many EU institutions and agencies. Still, there is already quite some open data available, differing in quality and ease of use. Definitely, you want to know more about the EU’s institutions, who works there and how you can contact them. Although the information is available at the EU Whoiswho website, the data is not easily reusable. That is why we scraped it and made it available to you on GitHub as CSV and JSON. And if you’re crawling through information on EU budgets, finances, funds, contracts and beneficiaries, you’ll notice there is much room for improvement.
Do you hear an echo? That’s the sound of thousands of library advocates speaking up all over the country. Starting today, May 2nd, almost 400 librarians converge in Washington, DC for National Library Legislative Day (NLLD). They’ve come from all over the nation to tell Members of Congress and their staffs what librarians’ legislative priorities are, but they need library supporters everywhere to help amplify the messages they’ll be delivering in person by participating in Virtual Library Legislative Day (VLLD).
Photo by mikael altemark
This week, while hundreds of librarians and library supporters are hitting the Hill, visit ALA’s Legislative Action Center to back them up. You’ll find everything you’ll need to call, email and/or tweet at your Representative and Senators.
The more messages Congressional offices get about ALA’s top 2016 Legislative Day Priorities all this week (May 3-6), the better!
Ask them to:
CONFIRM Dr. Carla Hayden as the next Librarian of Congress #Hayden4LOC
SUPPORT LSTA and Innovative Approaches to Literacy (IAL) Funding
PASS Electronic Communications Privacy Act Reform
RATIFY the Marrakesh Treaty for the print-disabled ASAP
For more background information about these issues, take a look at the one-page issue briefs that NLLD participants will receive when they get to Washington and that they’ll be sharing with Congressional offices.
This year, ALA has also partnered with the Harry Potter Alliance (HPA) – a group of incredible, library-loving young people who’ve already made a huge impact with their advocacy work worldwide. So far, their members have pledged over 500 calls, emails or tweets to Congress for VLLD 2016. Let’s show those wizard punks how it’s really done and send 1,000 messages to Congress of our own!
Together, we can make the Capitol echo all this week with the voices and messages of library advocates in Washington and online.
Join us for Virtual Library Legislative Day 2016! (And don’t forget to follow along on social media #nlld16.)
This weekend, I posted a new MarcEdit update. This is one of the biggest changes that I’ve made in a while. While the actual changelog is brief, these changes represent ~17k lines of code on the Windows side (~10K not related to UI work) and ~15.5k lines of code on the OSX side (~9K not related to UI work).
Specific changes added to MarcEdit:
Enhancement: UNIMARC Tools: Provides a lite-weight tool to convert data to MARC21 from UNIMARC and to UNIMARC from MARC21.
Enhancement: Replace Function: Option to support External search/replace criteria.
Enhancement: MARCEngine COM Object Updates
Enhancement: UNIMARC Tools: Provides a lite-weight tool to convert data to MARC21 from UNIMARC and to UNIMARC from MARC21.
Enhancement: Replace Function: Option to support External search/replace criteria.
Update: Installation has been changed to better support keeping configuration information sync’d between updates.
Bug Fix: Add/Delete Function — Add field if not a duplicate: Option wasn’t always working. This has been corrected.
I’ve created some videos to demonstrate how these two elements work, and then a third video showing how to use the Add Field if not a duplicate (added in the previous update). You can find these videos here:
Austin, TX – On April 29, 2016, Robert Miller, CEO of LYRASIS, and Debra Hanken Kurtz, CEO of DuraSpace, presented the third in a series of online Town Hall Meetings. They reviewed how their organizations came together to investigate a merger in order to build a more robust, inclusive, and truly global community with multiple benefits for members and users. They also unveiled a draft mission statement for the merged organization and provided updates on the status of the proposed merger.
I offer up two tendentious lists. First, some problems in the domain of library software that are natural to work on, and in the hopeful future, solve:
Helping people find stuff. On the one hand, this surely comes off as simplistic; on the other hand, it is the core problem we face, and has been the core problem of library technology from the very moment that a library’s catalog grew too large to stay in the head of one librarian. There are of course a number of interesting sub-problems under this heading:
Helping people produce and maintain useful metadata.
Usefully aggregating metadata.
Helping robots find stuff (presumably with the ultimate purpose of helping people to find stuff).
Artificial intelligence. By this I’m not suggesting that library coders should be aiming to have an ILS kick off the Singularity, but there’s plenty of room for (e.g.) natural language processing to assist in the overall task of helping people find stuff.
Helping people evaluate stuff. “Too much information, little knowledge, less wisdom” is one way of describing the glut of bits infesting the Information Age. Libraries can help and should help—even though pitfalls abound.
Helping people navigate software and information resources. This includes UX for library software, but also a lot of other software that librarians, like it or not, find themselves helping patrons use. There are some areas of software engineering where the programmer can assume that the user is expert in the task that the software assists with; library software isn’t one of them.
Sharing stuff. What is Evergreen if not a decade-long project in figuring out ways to better share library materials among more users? Sharing stuff is not a solved problem even for digital stuff.
Keeping stuff around. This is an increasingly difficult problem. Time was, you could leave a pile of books sitting around and reasonably expect that at least a few would still exist five hundred years hence. Digital stuff never rewards that sort of carelessness.
Protecting patron privacy. This nearly ended up in the unnatural list—a problem can be unnatural but nonetheless crucial to work on. However, since there’s no reason to expect that people will stop being nosy about what other people are reading—and for that nosiness to sometimes turn into persecution—here we are.
Authentication. If the library keeps any transaction information on behalf of a patron so that they can get to it later, the software had better be trying to make sure that only the correct patron can see it. Of course, one could argue that library software should never store such information in the first place (after, say, a loan is returned), but I think there can be an honest conflict with patrons’ desires to keep track of what they used in the past.
Second, some distinctly unnatural problems that library technologists all too often must work on:
Digital rights management. If Ambrose Bierce were alive, I would like to think that he might define DRM in a library context thus: “Something that is ineffective in its stated purpose, and cannot possibly be effective, but which serves to compromise libraries’ commitment to patron privacy in the pursuit of a misunderstanding about what will keep libraries relevant.”
Walled garden maintenance. Consider EZproxy. It takes the back of a very small envelope to realize that hundreds of thousands of person-hours have been expended fiddling with EZproxy configuration files for the sake of bolstering the balance sheets of Big Journal. Is this characterization unfair? Perhaps. Then consider this alternative formulation: the opportunity cost imposed by time spent maintaining or working around barriers to the free exchange of academic publications is huge—and unlike DRM for public library ebooks, there isn’t even a case (good, bad, or indifferent) to be made that the effort results in any concrete financial compensation to the academics who wrote the journal articles that are being so carefully protected.
Authorization. It’s one thing to authenticate a patron so that they can get at whatever information the library is storing on their behalf. It’s another thing to spend time coding authentication and authorization systems as part of maintaining the walled gardens.
The common element among the problems I’m calling unnatural? Copyright; in the particular, the current copyright regime that enforces the erection of barriers to sharing—and which we can imagine, if perhaps wistfully, changing to the point where DRM and walled garden maintenance need not occupy the attention of the library programmer, who then might find more time to work on some of the natural problems.
Why is this on my mind? I would like to give a shout-out to (and blow a raspberry at) an anonymous publisher who had this to say in a recent article about Sci-Hub:
And for all the researchers at Western universities who use Sci-Hub instead, the anonymous publisher lays the blame on librarians for not making their online systems easier to use and educating their researchers. “I don’t think the issue is access—it’s the perception that access is difficult,” he says.
I know lots of library technologists who would love to have more time to make library software easier to use. Want to help, Dear Anonymous Publisher? Tell your bosses to stop building walls.
Looking at the #panamapapers capture I've been doing, we have 1,424,682 embedded image URLs from 3,569,960 tweets. I'm downloading the 1,424,682 images now, and hope to do something similar to what I did with the #elxn42 images. While we're waiting for the images to download, here are the 10 most tweeted embedded image URLs:
With DPLAfest 2016 larger than ever, we reached out to a few attendees ahead of the event to help us capture the (many) diverse experiences of fest participants. These ‘special correspondents’ have graciously volunteered to share their personal perspectives on the fest. In this first guest post by our special correspondents, Sara Stephenson, Kerry Dunne, and Emily Pfotenhauer reflect on their fest experiences from the perspectives of their fields and interests: ebooks, education, and the growing DPLA hub network.
Ebooks and Access at DPLAfest
By Sara Stephenson, Virtual Services Coordinator, St Mary’s County Library
The Library of Congress Jefferson Building, host location for DPLAfest Day 1
DPLAfest 2016 was an informative and exciting conference in a fantastic environment. The Library of Congress and the National Archives are great locations for a conference focusing on archival collections, ebooks, and access. And lunch in the Great Hall of the Library of Congress’ Jefferson Building was certainly a highlight! I did not make it to any of the sessions in the Smithsonian, but I expect it was an equally ideal location.
I am a librarian in a small public library in Maryland, but I also came to DPLAfest as a representative of ReadersFirst, an organization made up of nearly 300 libraries that is working to ensure access to free and easy-to-use library ebooks. The conversations surrounding ebooks at DPLAfest were engaging and provided new information and ideas. For example, during a panel session on ebook research and advocacy in which ReadersFirst was participating, I learned about the Charlotte Initiative, a group working to research various aspects of ebooks in academic libraries in an effort to start discussions about best practices for publishers, librarians, and educators. Meeting and talking with others who work with ebooks in libraries was enlightening– so many of us are working toward the same goals and we could accomplish even more if we work together. DPLAfest facilitated this kind of communication that I believe will continue beyond the conference through the Ebook Working Group.
Though I spent most of DPLAfest thinking and talking about ebooks, I also attended a few sessions relating to archival collections and DPLA service and content hubs. It was fascinating to hear about the difficulties of connecting collections in the widespread western part of the country, and to see some of the many digital collections and exhibits using Omeka as a platform. Ultimately, I left DPLAfest 2016 with a better understanding of the ebook landscape in libraries and within DPLA, as well as a great deal of information relating to digital collections in general. I’m excited to continue the ebook conversation within my own library, within my state, and nationally.
Loving DC, and Thinking About Educational Applications for Digital Resources at DPLAFest
By Kerry Dunne, Director of History and Social Studies for Boston Public Schools.
Franky Abbott from the DPLA invited several members of the Education Advisory Committee to serve on a panel at DPLAfest discussing the Primary Source Sets project. It was great to be able to share our work with an enthusiastic audience! And, as I attended other sessions at DPLAFest, it brought home the point that our university, library, and public/private institutional archives, now largely digitized, are too rarely utilized by K-12 teachers.
At DPLAFest, I attended several showcase sessions, and was particularly intrigued by a short presentation from a team curating TV footage via the American Archive of Public Broadcasting. Finding short news clips of historical events is a labor-intensive endeavor for history teachers, but adds tremendous value to student learning experiences.
One cannot go to DC in April without sneaking out to the tidal basin to see cherry blossoms in bloom.
I see incredible potential for developing educational applications for digital archives such as this one. For many institutions, cataloguing and digitizing media and print collections is step one, but I would love to see the development of educational resources and training to help teachers and students access and use the collections for educational purposes become step two.
The DPLA has itself provided a model of this process with the creation of its Education Advisory Committee, which it commissioned to assist with the production of user-friendly primary source sets for educators, saving teachers the work of sifting through thousands of items by identifying 10-15 “gems” on a range of topics and providing questions for student analysis of these items.
I would be happy to work with organizations attending DPLAFest to consider how their collections can best be made accessible and useful to educators, an endeavor that often involves more marketing than work product. Please reach out to discuss further!
Building DPLA Hubs Across the Country: No Two are the Same
By Emily Pfotenhauer, Recollection Wisconsin Program Manager, WiLS
DPLAFest 2016 was my first Fest experience, and the first time Wisconsin was represented at the Fest as an official Service Hub. The Recollection Wisconsin Service Hub came on board with DPLA this past summer, and we just recently handed off our metadata feed for our first ingest (going live very soon!).
As part of a newbie Hub, I spent much of my time at DPLAfest attending presentations from fellow Hubs. It was fascinating to see such a broad range of approaches the Service Hubs are undertaking to get to the same core functions of bringing together metadata and passing it to DPLA. No two Hubs make it work in exactly the same way, but each one is built on an essential foundation of collaboration across multiple institutions in their state or region.
Kerri Willette introduces the Empire State Digital Network Service Hub
In “A Look at New York’s DPLA Service Hub from the Ground Up,” Empire State Digital Network Manager Kerri Willette and three regional liaisons – Susan D’Entremont, Laura Osterhout and Jennifer Palmentiero – shared how ESDN has leveraged existing regional collaboratives to create a robust, distributed network. The “Wide Open Spaces: Bringing the Rest of the West into DPLA” session highlighted the challenges of collaboration in western states, where populations and resources tend to be spread more thinly. Sandra McIntyre of Mountain West Digital Library and Adrian Turner of the California Digital Library described their work to sustain and expand existing initiatives, while Jodi Allison-Bunnell of Orbis Cascade Alliance outlined efforts to form a new Service Hub from scratch in the Pacific Northwest.
A pre-fest workshop for Service Hubs provided an in-depth look at the newly launched RightsStatements.org, DPLA’s groundbreaking work with Europeana to develop simplified, standardized labels for the copyright status of materials in digital collections. Presenters Melissa Levine of the University of Michigan, Greg Cram of New York Public Library and Dave Hansen of UNC School of Law shared their extensive expertise in copyright law and its impact on open access to cultural heritage (ultimately, it’s less of a barrier than we often assume). This workshop marked the beginnings of a conversation among the Service Hubs about how we can help our contributing partners adopt these rights statements in their digital collections.
For me, one of the most rewarding experiences of DPLAFest was the chance to meet in person some of the amazing people I’ve previously connected with over email, phone calls or Twitter. I now have many faces I can put with names from across the country. These face-to-face connections are especially important for me in a telecommuting position, when many workdays are just me and my laptop. To that end, I’ll close by quoting a tweet from Dan Cohen, referencing an observation by author Virginia Heffernan:
Deep truth by @page88 at #DPLAfest that I really felt at this yr’s fest—living much of your life online makes meeting in person very special
All three releases are bugfix releases. With these releases, support for Debian Squeeze is dropped, as that release of Debian is no longer supported or available. Also, 2.8.8 is the last scheduled release in the 2.8.x series, although future security releases may be made if warranted.
Please visit the downloads page to retrieve the server software and staff clients.
The LITA Forum is a highly regarded annual event for those involved in new and leading-edge technologies in the library and information technology field. Please send your proposal submissions here by May 13, 2016, and join your colleagues in Fort Worth, Texas.
The Forum Committee welcomes proposals for full-day pre-conferences, concurrent sessions, or poster sessions related to all types of libraries: public, school, academic, government, special, and corporate. Collaborative and interactive concurrent sessions, such as panel discussions or short talks followed by open moderated discussions, are especially welcomed. We deliberately seek and strongly encourage submissions from underrepresented groups, such as women, people of color, the LGBT community and people with disabilities.
Proposals could relate to, but are not restricted to, any of the following topics:
Discovery, navigation, and search
Practical applications of linked data
Library spaces (virtual or physical)
Cybersecurity and privacy
Open content, software, and technologies
Hacking the library
Scalability and sustainability of library services and tools
Consortial resource and system sharing
“Big Data” — work in discovery, preservation, or documentation
Library I.T. competencies
Proposals may cover projects, plans, ideas, or recent discoveries. We accept proposals on any aspect of library and information technology. The committee particularly invites submissions from first time presenters, library school students, and individuals from diverse backgrounds.
Vendors wishing to submit a proposal should partner with a library representative who is testing/using the product.
Presenters will submit final presentation slides and/or electronic content (video, audio, etc.) to be made available on the web site following the event. Presenters are expected to register and participate in the Forum as attendees; a discounted registration rate will be offered.
If you have any questions, contact Tammy Allgood Wolf, Forum Planning Committee Chair, at firstname.lastname@example.org.
On the afternoon of Friday, June 24, from 1 to 4pm, Stephen Perkins and I will be delivering another half-day Islandora for Managers: Open Source Digital Repository Training session at the Library and Information Technology Association (LITA) American Library Association (ALA) conference in Orlando, Florida. There is a registration fee for this workshop. You can purchase a ticket during the ALA conference registration process.