Planet Code4Lib

Hyrax / FOSS4Lib Updated Packages

Last updated May 23, 2017. Created by Peter Murray on May 23, 2017.

Hyrax is a front-end based on the robust Hydra framework, providing a user interface for common repository features. Hyrax offers the ability to create repository object types on demand, to deposit content via multiple configurable workflows, and to describe content with flexible metadata. Numerous optional features may be turned on in the administrative dashboard or added through plugins. It is implemented as a Rails engine, so it may be the base of, or added to, a Rails application. Hyrax is the consolidation of Sufia and the CurationConcerns gems and behaves in much the same way.


A Research and Learning Agenda for Archives, Special, and Distinctive Collections / HangingTogether

[This post is contributed by our Practitioner Researcher-in-Residence, Chela Scott Weber]

Network rectangle board rings | from Pixabay

Some of you may have seen the recent announcement that I’m working with the good folks here at OCLC Research through the end of June, to help shape a research and learning agenda around issues related to archives, special, and distinctive collections in research libraries. In this and a series of upcoming blog posts, I’ll be sharing details about that work. Today I’ll talk a little bit about goals and process, and in later posts I’ll talk more about content.

The goals for this work are twofold. First, to build a guiding agenda for the OCLC Research work in this space over the next several years that is truly aligned with current and emerging needs in the OCLC Research Library Partnership community. Second, to engage in a transparent and iterative approach to building the agenda, with significant input from the RLP. While I’m leading the effort, I’m certainly not doing it alone. Merrilee Proffitt and Jackie Dooley are active collaborators, as is an advisory board whose members have generously offered their time and expertise to meet with and advise me regularly throughout the process. The group comprises Athena Jackson (Penn State), Erin O’Meara (University of Arizona), Michelle Light (UNLV), Sharon Farb (UCLA), and Tom Hyry (Harvard/Houghton Library).

I’m now more than a month into the project, which I started with a review of the last 8-10 years of work OCLC Research has done in this space – reading papers, watching webinar recordings, and revisiting conference proceedings. I did this to get an overview of the work, and to identify trajectories that might not yet be complete. I also wanted to get a sense of the full range of the outputs and activities it has undertaken, to inform what approaches might best suit future research needs.

I am currently having conversations with colleagues throughout the profession in order to identify major areas of challenge and opportunity, and then to drill down to better define the problem spaces and think about what kinds of activities and outputs might help address them. I’ve been talking to people in leadership roles at RLP institutions, as well as specialists with expertise in specific areas like audio/visual collections and born-digital records.

My hope has been to get a well-rounded sense of how issues play out at different levels of the enterprise, from the overarching view of an administrator to the on-the-ground perspective of the librarians, archivists, and conservators working closely with collections and researchers.

Over the next few weeks, I’ll be working on shaping what I’ve learned through my reading and conversations into a draft research agenda, and will be sharing that draft for feedback in a number of ways. We will be hosting an invitational working meeting in June at the RBMS Conference in Iowa City, where we’ll be convening a small group of leaders from RLP institutions to react to an early stage draft of the agenda, and inform where further work is needed. We’ll then host another similar event in July at the Society of American Archivists Annual Meeting in Portland, asking invited colleagues to engage with the next iteration of the agenda. I’ll also be sharing drafts with colleagues for written feedback throughout. The finalized agenda will be rolled out at the OCLC Research Library Partnership Meeting in November.

Stay tuned for further updates about work on the agenda, and opportunities to give feedback.

Interim Executive Director Announced / DPLA

As part of DPLA’s ongoing transition to a new executive director, the Board of Directors has named Michele Kimpton the interim executive director until a new executive director is hired.

Kimpton is currently the Business Development Director and Senior Strategist at DPLA, and has been working on sustainability models, an ebook pilot, technical services, and strengthening DPLA’s member network. She brings to the interim role considerable experience running similar organizations. Prior to joining DPLA, she worked as Chief Strategist for LYRASIS and CEO of DuraSpace.

“DPLA is fortunate to have a strong senior leadership team in place during this transitionary period, and the board looks forward to working with Michele, Director for Content Emily Gore, Director of Technology Michael Della Bitta, and the rest of the staff to continue to further DPLA’s mission in the months ahead,” said the board president, Amy Ryan.

Kimpton will take on the interim role following the departure of founding executive director Dan Cohen on June 1.

ALA President Responds to the Administration’s 2018 budget proposal / District Dispatch

This morning, ALA issued a statement about the budget proposal released today. You can read it on ALA.org or below.

Julie Todaro’s response to the Trump budget proposal: “The Administration’s budget is using the wrong math when it comes to libraries.”

WASHINGTON, DC — In response to the Trump Administration’s 2018 budget proposal released today, American Library Association (ALA) President Julie Todaro issued the following statement:

“The Administration’s budget is using the wrong math when it comes to libraries.

“To those who say that the nation cannot afford federal library funding, the American Library Association, American businesses and millions of Americans say emphatically we cannot afford to be without it.

“America’s more than 120,000 public, school, academic and special libraries are visited more than 1.4 billion times a year by hundreds of millions of Americans in every corner of the nation. In 2013, 94 percent of Americans said that having a public library improves the quality of life in a community and the same percentage of parents said that libraries are important for their children.

“Over 80 major companies and trade associations from multiple sectors of the economy called libraries ‘critical national infrastructure’ in a letter to all Senators asking them to support the very agency and programs that the Administration has just proposed to effectively eliminate.

“We and those we serve will collaborate with our stakeholders, business allies and the more than one-third of all Members of Congress who have already pledged their support in writing to preserve critical library funding for FY 2018 through the Institute of Museum and Library Services and to save the agency itself, as well as other vital programs in other agencies that help millions of Americans.”

The post ALA President Responds to the Administration’s 2018 budget proposal appeared first on District Dispatch.

Islandora CLAW FAQ / Islandora

Last week was Islandoracon, our community's biggest gathering. We had a great week (and there will be more on that in another post), and a chance to unveil an early alpha version of the Islandora CLAW Minimum Viable Product. This first look at the product also kicked off a lot of questions, so we decided to gather them together with some answers:

When will Islandora CLAW be done?

Islandora CLAW won’t be done until it is deprecated in favor of whatever comes after it in the distant future. Islandora is an active community that constantly builds new tools and improves existing ones.

The Islandora CLAW MVP is scheduled for beta release at the end of June, 2017. The timeline for a full release will depend on community engagement and what features we map out together as necessary for the next phase.

The Islandora CLAW MVP does not do [thing that we really really need]. Are we going to be left behind?

The Islandora CLAW Minimum Viable Product is just a jumping-off point. Since we recognize that it can be challenging to review and comment meaningfully on a concept or a technical spec, the MVP version of CLAW is intended to give the Islandora community a tangible product to work with so that you can engage with the project and help to make sure your use cases are a part of the software as development continues.

Completing the MVP is a beginning for more community-driven development, with a very basic start on a product that the community can now test out and respond to.

How do I join in?

A good place to start is the CONTRIBUTING.md file included on all Islandora CLAW modules. It outlines how to submit a use case, feature request, improvement, or bug report. It also has details about our weekly meetings (‘CLAW Calls’), which are open for anyone to join.

While the meetings may seem very technical, we really mean it when we say anyone is welcome to add items to the agenda. If we seem to spend most of our calls discussing very technical issues, that’s because we fall back on tickets and issues when no one has given us something more general to dig into. If you have questions or concerns, putting them on the agenda ensures that there is time and attention reserved for what you need to discuss.

You are also welcome to join the call and not say a thing. We take attendance, but that’s all the participation that’s required. If you would like to just listen to the discussion and get a feel for how things are going, lurking is a popular option, and a way that some very active contributors got their start.

You can also learn more about Islandora CLAW from these introductory pages:

Details of the MVP are here.

What is the MODSPOCALYPSE? Are we losing MODS in CLAW?

The term “MODSPOCALYPSE” is an exaggeration made in jest about the fact that Islandora CLAW will have to deal with legacy MODS XML in a linked data/RDF world. While CLAW handles RDF as its native language (like Fedora 4), MODS is doable if we put in the work. The challenge is in mapping MODS to RDF, and that’s something we need to do as a community. If we can come together and agree on a standard mapping, the technical implementation will be relatively easy.

Because this is not just an issue for Islandora, a lot of work has already been done by the MODS and RDF Descriptive Metadata Subgroup in the Hydra community. To help achieve this vital mapping, please join the Islandora Metadata Interest Group, which is taking the lead on community discussions for Islandora.

Instead of a MODSPOCALYPSE, let’s consider this our “RDFnaissance.”

Will we have XML Form Builder in Islandora CLAW?

XML Form Builder is an amazing tool that plays an important role in Islandora 7.x. It is also an extremely complex tool that carries a significant maintenance burden that is challenging to meet even in the 7.x stack. Reproducing it in Islandora CLAW is unlikely to happen unless an institution or group in the community adopts it as a project and donates the work to the Islandora Foundation.

Editable metadata forms are definitely going to continue to be a part of Islandora CLAW. They are being handled in Drupal, which should be a more sustainable and accessible approach for both developers and end-users.

How long will Islandora 7.x be supported?

Islandora 7.x will be supported as long as the Islandora community needs it to be supported. The goal of developing CLAW is not to push adoption, but to prepare for it when the majority of the Islandora community wants to move. As with other major upgrades we’ve been through, we will likely see a few institutions lead the way with early adoption, with a gradual migration of other sites as more tools are built and the path to migrate is mapped out by those trailblazers. The time to officially end support for Islandora 7.x will be when most of the Islandora community is done with it, just as we did with 6.x.

It’s also important to note that “ending support” does not mean it cannot still be used. We will (eventually, well down the road) end active development of new features and improvements, and then bug fixes on a longer timeline, but there are still many Islandora 6.x sites out in the world more than three years after we officially ended its support. Fedora 3 is itself no longer supported by its community, but it remains a stable platform even though it is no longer actively improved.

Deployment of ORCID Integration Services / DuraSpace News

From Emilio Lorenzo, Arvo Consulting: With the development of tight-integration functionalities between DSpace and orcid.org, repositories can obtain numerous advantages by improving data consistency in key information systems. These developments help repositories lower the barriers to ORCID integration and take-up.

Fine-tuning a Python wrapper for the hypothes.is web API and other #ianno17 followup / Raymond Yee

In anticipation of #ianno17 Hack Day, I wrote about my plans for the event, one of which was to revisit my own Python wrapper for the nascent hypothes.is web API.

Instead of spending much time on my own wrapper, I spent most of the day working with Jon Udell's wrapper for the API. I've been working on my own revisions of the library but haven't yet incorporated Jon's latest changes.

One nice little piece of the puzzle is that I learned how to introduce retries and exponential backoff into the library, thanks to a hint from Nick Stenning and a nice answer on Stack Overflow.
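
For anyone who wants to do the same, here is a minimal sketch of one way to layer retries with exponential backoff onto hypothes.is API calls using the requests library and urllib3's Retry helper. This is not the code of either wrapper mentioned above, and the search endpoint URL and query parameters are assumptions for illustration only.

    # A minimal sketch (not the code of either wrapper discussed above):
    # retrying hypothes.is API calls with exponential backoff, using the
    # requests library and urllib3's Retry helper. The endpoint URL and
    # query parameters below are assumptions for illustration only.
    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    def make_session(total_retries=5, backoff_factor=1.0):
        """Return a requests.Session that retries transient failures.

        urllib3 sleeps for exponentially growing intervals between
        successive retries, scaled by backoff_factor.
        """
        retry = Retry(
            total=total_retries,
            backoff_factor=backoff_factor,
            status_forcelist=[429, 500, 502, 503, 504],  # rate limits / server errors
        )
        session = requests.Session()
        session.mount("https://", HTTPAdapter(max_retries=retry))
        return session

    # Example: search for annotations tagged "ianno17" (parameters assumed).
    session = make_session()
    resp = session.get("https://api.hypothes.is/api/search",
                       params={"limit": 20, "tag": "ianno17"}, timeout=30)
    resp.raise_for_status()
    print(resp.json().get("total"))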

Other matters

In addition to the Python wrapper, there are other pieces of follow-up for me. I hope to write more extensively on these matters down the road, but for the moment I will simply note the topics.

Videos from the conference

I might start by watching videos from the #ianno17 conference: I Annotate 2017 – YouTube. Because I didn't attend the conference per se, I might glean insight into two particular topics of interest to me (the role of the page owner in annotations and the intermingling of annotations in ebooks).

An extension for embedding selectors in the URL

I will study and try Treora/precise-links: Browser extension to support Web Annotation Selectors in URIs. I've noticed that the same annotation is shown in two related forms:

Does the precise-links extension let me write the selectors into the URL?

How to Price 3D Printing Service Fees / Bohyun Kim

This post was originally published in ACRL TechConnect on May 22, 2017.

Many libraries today provide a 3D printing service, but not all of them can afford to do so for free. While free 3D printing may be ideal, it can jeopardize the sustainability of the service over time. Even so, many libraries worry about charging service fees.

In this post, I will outline how I determined the pricing scheme for our library’s new 3D printing service, in the hope that more libraries will consider offering 3D printing if having to charge a fee is the factor stopping them. But let me begin with libraries’ general aversion to fees.

A 3D printer in action at the Health Sciences and Human Services Library (HS/HSL), Univ. of Maryland, Baltimore

Service Fees Are Not Your Enemy

Charging fees for a library service is not something librarians should regard as taboo. We live in times in which libraries are being asked to create and provide more and more new and innovative services to help users successfully navigate the fast-changing information landscape. A makerspace and 3D printing are certainly among those new and innovative services. But at many libraries, the operating budget is shrinking rather than increasing. So, the most obvious choice in this situation is to aim for cost recovery.

Keep in mind that even when a library aims for cost recovery, it will only be partial cost recovery, because a lot of staff time and expertise is spent on planning and operating such new services. Libraries should not be afraid to introduce new services requiring service fees, because users will still benefit from those services, often far more than from a commercial equivalent (if one exists). Think of service fees as your friend. Without them, you won’t be able to introduce and continue to provide a service that your users need. They are an expected business cost, and libraries will not make a profit from them (even if they try).

Still bothered? Almost every library charges for regular (paper) printing. Should a library rather not provide printing service because it cannot be offered for free? Library users certainly wouldn’t want that.

Determining Your Service Fees

What do you need in order to create a pricing scheme for your library’s 3D printing service?

(a) First, you need to list all cost-incurring factors. Those include (i) the equipment cost and wear and tear, (ii) electricity, (iii) staff time & expertise for support and maintenance, and (iv) any consumables such as 3D print filament and painter’s tape. Remember that your new 3D printer will not last forever and will need to be replaced in 3-5 years.

Also, some of these cost-incurring factors, such as staff time and expertise for support, are fixed per 3D print job. Others, such as 3D print filament, increase in proportion to the size and density of the model printed. That is, the larger and denser a 3D print model is, the more filament will be used, incurring more cost.

(b) Second, make sure that your pricing scheme is readily understood by users. Does it quickly give users a rough idea of the cost before their 3D print job begins? An obscure pricing scheme can confuse users and may deter them from trying out a new service. That would be a bad user experience.

Also, consider whether you will charge for a failed print. Perhaps you will; perhaps you won’t. Maybe you want to charge a fee lower than for a successful print. Whichever you decide, have it covered, since failed prints will certainly happen.

(c) Lastly, the pricing scheme should be easily handled by the library staff. The more library staff will be involved in the entire process of a library patron using the 3D printing service from beginning to end, the more important this becomes. If the pricing scheme is difficult for the staff to work with when they need to charge for and process each 3D print job, the new 3D printing service will increase their workload significantly.

Which staff will be responsible for which step of the new service? What would be the exact tasks that the staff will need to do? For example, it may be that several staff at the circulation desk need to learn and handle new tasks involving the 3D printing service, such as labeling and putting away completed 3D models, processing the payment transaction, delivering the model, and marking the job status for the paid 3D print job as ‘completed’ in the 3D Printing Staff Admin Portal if there is such a system in place. Below is the screenshot of the HS/HSL 3D Printing Staff Admin Portal developed in-house by the library IT team.

The HS/HSL 3D Printing Staff Admin Portal, University of Maryland, Baltimore

Examples – 3D Printing Service Fees

It’s always helpful to see how other libraries are doing when you need to determine your own pricing scheme. Here are some examples showing how ten libraries’ 3D printing pricing schemes have changed over the past three years.

  • UNR DeLaMare Library
    • https://guides.library.unr.edu/3dprinting
    • 2014 – $7.20 per cubic inch of modeling material (raised to $8.45 starting July, 2014).
    • 2017 – uPrint – Model Material: $4.95 per cubic inch (=16.38 gm=0.036 lb)
    • 2017 – uPrint – Support Materials: $7.75 per cubic inch
  • NCSU Hunt Library
    • https://www.lib.ncsu.edu/do/3d-printing
    • 2014-  uPrint 3D Printer: $10 per cubic inch of material (ABS), with a $5 minimum
    • 2014 – MakerBot 3D Printer: $0.35 per gram of material (PLA), with a $5 minimum
    • 2017 – uPrint – $10 per cubic inch of material, $5 minimum
    • 2017 – F306 – $0.35 per gram of material, $5 minimum
  • Southern Illinois University Library
    • http://libguides.siue.edu/3D/request
    • 2014 – Originally $2 per hour of printing time; Reduced to $1 as the demand grew.
    • 2017 – Lulzbot TAZ 5, Lulzbot Mini – $2.00 per hour of printing time.
  • BYU Library
  • University of Michigan Library
    • The Cube 3D printer checkout is no longer offered.
    • 2017 – Cost for professional 3d printing service; Open access 3d printing is free.
  • GVSU Library
  • University of Tennessee, Chattanooga Library
  • Port Washington Public library
  • Miami University
    • 2014 – $0.20 per gram of the finished print; 2017 – ?
  • UCLA Library, Dalhousie University Library (2014)
    • Free

Types of 3D Printing Service Fees

From the examples above, you will notice that many 3D printing service fee schemes are based upon the weight of a 3D-printed model. This is because these libraries are trying to recover the cost of the 3D filament, and the amount of filament used is most accurately reflected in the weight of the resulting 3D-printed model.

However, there are a few problems with the weight-based 3D printing pricing scheme. First, it is not readily calculable by a user before the print job, because to do so, the user will have to weigh a model that s/he won’t have until it is 3D-printed. Also, once 3D-printed, the staff will have to weigh each model and calculate the cost. This is time-consuming and not very efficient.

For this reason, my library considered an alternative pricing scheme based on the size of a 3D model. The idea was that we would have three sizes of empty box – small, medium, and large – with three different prices assigned. Whichever box a user’s 3D-printed object fit into would determine how much the user paid for the model. This seemed like a great idea because, compared to the weight-based pricing scheme, it makes it easy for both users and library staff to determine how much a model will cost to print.

Unfortunately, this size-based pricing scheme has a few significant flaws. First, a smaller model may use more filament than a larger model if it is denser (that is, has a higher infill ratio). Second, depending on its shape, a model that fits in a large box may use much less filament than one that fits in a small box. Think of a large tree model with thin branches, then compare it with a 100%-filled compact baseball model that fits into a smaller box than the tree does. Third, the resolution, which determines the layer height, may change the amount of filament used even when the same model is printed.

Different infill ratios – Image from https://www.packtpub.com/sites/default/files/Article-Images/9888OS_02_22.png

Charging Based upon the 3D Printing Time

So we couldn’t go with the size-based pricing scheme. But we did not like the problems of the weight-based pricing scheme, either. As an alternative, we decided to go with a time-based pricing scheme, because printing time is proportional to how much filament is used but does not require that the staff weigh each model. 3D printing software gives an estimate of the printing time, and most 3D printers also display the actual printing time for each model printed.

First, we wanted to confirm the hypothesis that 3D printing time and the weight of the resulting model are proportionate to each other. I tested this by translating the weight-based cost to the time-based cost based upon the estimated printing time and the estimated weight of several cube models. Here is the result I got using the Makerbot Replicator 2X.

  • 9.10 gm/36 min= 0.25 gm per min.
  • 17.48 gm/67 min= 0.26 gm per min.
  • 30.80 gm/117 min= 0.26 gm per min.
  • 50.75 gm/186 min=0.27 gm per min.
  • 87.53 gm/316 min= 0.28 gm per min.
  • 194.18 gm/674 min= 0.29 gm per min.

There is some variance, but the hypothesis holds up. Based upon this, now let’s calculate the 3D printing cost by time.

3D plastic filament from MakerBot costs $48 for ABS/PLA and $65 for the dissolvable material per 0.90 kg (2.00 lb) spool. That means the filament cost is about $0.05 per gram for ABS/PLA and $0.07 per gram for the dissolvable material, or roughly 6 cents per gram on average.

Finalizing the Service Fee for 3D Printing

For an hour of 3D printing time, the amount of filament used would be 15.6 gm (=0.26 x 60 min). This gives us the filament cost of 94 cents per hour of 3D printing (=15.6 gm x 6 cents). So, for the cost-recovery of filament only, I get roughly $1 per hour of 3D printing time.
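
For readers who want to double-check the arithmetic, here is a small, purely illustrative Python sketch that reproduces the figures above from the numbers quoted in this post (the Replicator 2X test prints and the MakerBot filament prices).

    # Reproduce the arithmetic above from the measurements and prices quoted
    # in this post (MakerBot Replicator 2X test prints, MakerBot filament prices).
    test_prints = [  # (grams, minutes)
        (9.10, 36), (17.48, 67), (30.80, 117),
        (50.75, 186), (87.53, 316), (194.18, 674),
    ]
    grams_per_min = sum(g / m for g, m in test_prints) / len(test_prints)
    print(f"average filament use: {grams_per_min:.2f} g per minute")        # ~0.27

    spool_grams = 900                                  # 0.90 kg (2.00 lb) spool
    cost_per_gram = (48 / spool_grams + 65 / spool_grams) / 2
    print(f"average filament cost: ${cost_per_gram:.3f} per gram")          # ~$0.06

    filament_per_hour = grams_per_min * 60 * cost_per_gram
    print(f"filament cost per hour of printing: ${filament_per_hour:.2f}")  # ~$1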

Earlier, I mentioned that filament is only one of the cost-incurring factors for the 3D printing service. It’s time to bring in those other factors, such as hardware wear and tear, staff time, electricity, maintenance, etc., plus the “no charge for failed prints” policy adopted at our library. Those other factors add an additional amount per 3D print job, and at my library this came out to be about $2. (I will not go into details about how these were determined because they will differ at each library.) So, the final service fee for our new 3D printing service was set at $3 for up to 1 hour of 3D printing, plus $1 per additional hour. The $3 breaks down into $1 per hour of 3D printing, which accounts for the filament cost, and a $2 fixed cost for every 3D print job.

To help our users quickly get an idea of how much their 3D print job will cost, we have added a feature to the HS/HSL 3D Print Job Submission Form online. This feature automatically calculates and displays the final cost based upon the printing time estimate that a user enters.
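
As an illustration of the kind of calculation such a form can perform, here is a minimal Python sketch of the time-based fee described above ($3 for the first hour, $1 for each additional hour). The actual HS/HSL form is a web application built in-house, so the function name and the rounding-by-whole-hours rule here are assumptions for illustration, not its real implementation.

    import math

    # Sketch of the time-based fee: $3.00 covers the first hour ($2.00 fixed
    # cost per job plus ~$1.00/hour for filament); each additional hour, or
    # part of one, adds $1.00. Rounding to whole hours is an assumption.
    def estimate_fee(print_minutes: float) -> float:
        hours = max(1, math.ceil(print_minutes / 60))
        return 3.00 + 1.00 * (hours - 1)

    for minutes in (30, 60, 90, 240):
        print(f"{minutes} min -> ${estimate_fee(minutes):.2f}")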

 

The HS/HSL 3D Print Job Submission form, University of Maryland, Baltimore

Don’t Be Afraid of Service Fees

I would like to emphasize that libraries should not be afraid to set service fees for new services. As long as they are easy to understand and the staff can explain the reasons behind those service fees, they should not be a deterrent to a library trying to introduce and provide a new innovative service.

There are clear benefits to running through all cost-incurring factors and communicating how the final pricing scheme was determined (including the verification of the hypothesis that 3D printing time and the weight of the resulting model are proportional to each other) to all library staff who will be involved in the new 3D printing service. If any library user inquires about or challenges the service fee, the staff will be able to provide a reasonable explanation on the spot.

I implemented this pricing scheme at the same time as the launch of my library’s makerspace (the HS/HSL Innovation Space at the University of Maryland, Baltimore – http://www.hshsl.umaryland.edu/services/ispace/) back in April 2015. We have been providing 3D printing service and charging for it for more than two years. I am happy to report that during that entire time, we have not received a single complaint about the service fee. No library user expected our new 3D printing service to be free, and all comments that we received regarding the service fee were positive. Many expressed surprise at how inexpensive our 3D printing service is and thanked us for it.

To summarize, libraries should be willing to explore and offer new and innovative services even when they require charging service fees. And if you do so, make sure that the resulting pricing scheme for the new service is (a) sustainable and accountable, (b) readily graspable by users, and (c) easily handled by the library staff who will handle the payment transaction. Good luck and happy 3D printing at your library!

An example model with the 3D printing cost and the filament info displayed at the HS/HSL, University of Maryland, Baltimore


OKI Agile: Picking/Designing a Methodology / Open Knowledge Foundation

This is the second in a series of blog posts on how we are using the Agile methodology at Open Knowledge International. Originating from software development, the Agile manifesto describes a set of principles that prioritise agility in work processes: for example through continuous development, self-organised teams with frequent interactions and quick responses to change (http://agilemanifesto.org). In this blogging series we go into the different ways Agile can be used to work better in teams and to create more efficiency in how to deliver projects. The first post dealt with user stories: this time we go into methodologies.

More efficiency in project delivery is the name of the game. Working together in teams or with other people requires us to put in place methods and methodologies to accomplish that: a shared set of methods that allows team members to walk into a project and start delivering as soon as possible.

Glossary

  • Method – A systematic procedure.
  • Methodology – A series of related methods or techniques.
  • Methodology size – Number of methods/techniques used in the methodology.
  • Methodology density – The amount of precision and the number of checkpoints needed. More density means a higher-ceremony (more formal) methodology.
  • Criticality – The nature of damage of undetected defects (impact if we forget something). Higher criticality means worse impact.

What is a methodology

A methodology consists of 10 elements, one of which, the team values, permeates all the others. Let’s first put them in a pretty picture and then list them out with descriptions:

  • Team values – What the team strives for, how they’d like to communicate and work together. The values affect each element of the methodology so different values create different methodologies.
  • Roles – It’s best to think of these as the job descriptions we’d put into ads when we need to hire more staff.
  • Skills – The skills needed to fill those roles.
  • Team – This is the group of people that will tackle a project and what roles they have.
  • Tools – The tools people use either within a technique/method or to produce a deliverable according to the standard.
  • Techniques – The methods used to get stuff done (generate work products). These can be anything from work-shaping techniques like breaking work into sprints (time blocks), to games played to achieve a specific output like planning poker (for estimates), to a simple description of what is done to make things happen, like “write a blog post”.
  • Activities – The meetings, reviews, milestones and other things people do or attend. The most obvious activity is “create deliverable” or something like that, but the more interesting activities are the events that take place.
  • Standards – Description of what is permitted and not permitted in the work product. These can be standards such as what programming language or security measures to use, how management is handled or how decisions get made (e.g. RASCI) and other project conventions.
  • Work Products – Not only the final product but also the internal products, i.e. what each person or team hands over to another person or team, such as user stories or mockups.
  • Quality – Often not considered explicitly, these are the rules and concerns that need to be tracked for each deliverable (work product). This could be part of activities, but it’s so important that it’s better to split it out.

Principles for picking/designing methodologies

There is no one-size-fits-all methodology. There are methods one reuses between methodologies (the shared set of methods/techniques people know about) and probably a default set one uses in the absence of something more fitting. However, in general one needs to think about and pick/design methodologies based on two things:

  1. The project itself
  2. The number of people on the project team

The nature of the project calls for different methodologies. Some projects can get away with a very lightweight methodology, where it won’t be the end of the world if something is forgotten, while others call for more publicly visible correctness, where bad things happen if things are forgotten. For example, paying salaries requires a methodology with more visible checkpoints and correctness than responding to tweets. People might lose their houses if they don’t get paid, but nobody will be out on the street over a missed retweet.

A lot changes based on the number of people involved. Communication becomes harder and the effectiveness of individuals decreases; as a result, the methodology must get bigger to tackle all of this.

Picking/designing a methodology has to be based on these four principles (they have to be kept in mind because they aren’t always all achievable, notably number 4 in our case):

  1. Bigger teams call for a bigger methodology
  2. More critical projects call for more methodology density (publicly visible correctness)
  3. Cost comes with weight (a small increase in methodology adds a large amount of cost to the project)
  4. The most effective communication is face to face and interactive (everyone participates instead of just doing a broadcast).

Agile development

We have agreed on adopting the agile values (slightly adapted from the agile software values) as our team values where we can. That means we value:

  • Individuals and interactions over processes and tools
  • Working stuff over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

We value everything mentioned above; we just value the things on the left (in bold) more than the things on the right. There are also 12 principles we try to follow as much as we can:

  1. Our highest priority is to satisfy the customer through early and continuous delivery of valuable things.
  2. Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.
  3. Deliver working stuff frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
  4. Business people and implementers must work together daily throughout the project.
  5. Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
  6. The most efficient and effective method of conveying information to and within a team is face-to-face conversation.
  7. Working stuff is the primary measure of progress.
  8. Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
  9. Continuous attention to excellence and good design enhances agility.
  10. Simplicity–the art of maximizing the amount of work not done–is essential.
  11. The best architectures, requirements, and designs emerge from self-organizing teams.
  12. At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

As encouragement, and because you were just bombarded with a long list of things and a discussion about planning how we plan work, here’s something from XKCD to keep in mind as we dive into methodologies:

Wikidata documentation on the 2017 Hackathon in Vienna / Jakob Voss

At Wikimedia Hackathon 2017, a couple of volunteers sat together to work on the help pages of Wikidata. As part of that Wikidata documentation sprint, Ziko and I took a look at the Wikidata glossary. We identified several shortcomings and made a list of rules for how the glossary should look. The result is the glossary guidelines. Where the old glossary partly replicated Wikidata:Introduction, the new version aims to allow quick lookup of concepts. We have already rewritten some entries of the glossary according to these guidelines, but several entries are outdated and still need to be improved. We changed the structure of the glossary into a sortable table so it can be displayed as an alphabetical list in all languages. The entries can still be translated with the translation system (it took some time to get familiar with this feature).

We also created some missing help pages such as Help:Wikimedia and Help:Wikibase to explain general concepts with regard to Wikidata. Some of these concepts are already explained elsewhere but Wikidata needs at least short introductions especially written for Wikidata users.

Image taken by Andrew Lih (CC-BY-SA)

Small-Batch Memories of The Unready / Hugh Rundle

Small-Batch Memories of The Unready

In 1016, Æthelred The Unready died, to be replaced as King of England by the invading Cnut the Great. In 1017, Cnut married Æthelred’s widow, Emma of Normandy, and divided the kingdom into the four Earldoms of Wessex, Mercia, East Anglia and Northumbria. A thousand years later, names like Æthelred and Cnut sound distinctly un-English, and the world is utterly changed. The English may still fear foreigners arriving from across the Channel, but the Vikings are long gone.

3017

Will anyone remember our stories in the year 3017, and will they too feel that we are like aliens, far removed from the reality in which they now live? Like Edward Shaddow, my mind takes a turn towards the post-apocalyptic when thinking about what life might be like a thousand years hence. I’m not optimistic about the future of humans, given our long history of dealing poorly with environmental degradation, and our utter failure to make any meaningful progress towards stopping or even slowing catastrophic climate change. But yesterday’s newCardigan Cardi Party with Cory Doctorow encouraged me to think that, perhaps, there might be some hope after all.

Cory talked about John Maynard Keynes’ prediction of a fifteen-hour working week by (* checks watch *) about now. His point was that Keynes was actually right - he simply didn’t predict that our desires would increase such that we are no longer satisfied with the lifestyle of a well-to-do 1930s European or American. So whilst I tend to imagine a kind of Tank Girl future of repression and chronic water wars, perhaps this is wide of the mark. Maybe the legacy of our times will be a warning about where hubris and individualism can take humanity, and our much wiser descendants will live full and satisfying lives, mostly in harmony. Just without jetpacks or flying cars. The recent news that the Svalbard seed bank has already flooded as the warming climate causes permafrost to melt is yet another warning that for all our technological skills, there’s no guarantee that anything from our cultural and scientific storehouses will survive in 3017. And yet, cultural and scientific knowledge has passed down generations over long stretches of human time. Whether it’s Australian Aboriginal Star Maps, Japanese Tsunami Stones or simply family heirlooms, doing GLAM in a small and decentralised manner can be surprisingly effective.

Ubiquity and Abundance

The projects I find most exciting and intriguing - LOCKSS, IPFS, DIY squat archives, The Enspiral Network, the Internet itself - are all based on principles of decentralised networks, autonomy with connection, and local control. If I was going to wake up in a post-apocalyptic future, I’d want to be in the Enspiral kibbutz. In yesterday’s interview, Cory and Tom riffed on the idea of abundance, and the future Cory imagined in Walkaway where machines are always the best they can be, as opposed to today’s experience where “everyone has the eleventh-best drill in the world”. I like this idea, but it still feels a little bit too much like what we have today - not so much abundance, but rather ubiquity. A few years ago, every technology, business and economics writer was talking about the rise of personalisation. The idea was that with more advanced manufacturing techniques, the near future would be one where we would all order our own customised products and the mass-production system would somehow manage to personalise each item to an individual consumer’s tastes. To the small extent that has come to pass, it’s a hollow sort of personalisation. What has been much more evident, at least in my First World hipster bubble, has been the rise of a sort of anti-mass production movement. From coffee to gin, vegetables to soap, “small batch” and “hand crafted” are the thing. More interestingly, whilst personalised mass production assumed that people were focussed on themselves, increasingly people are moving in the opposite direction - wanting to know who grew their coffee beans and sewed their shirt, or what kind of life the cow had before it turned into steak. It’s a sort of “shopping literacy” I suppose.

When I think of abundance, I go back to what Keynes wrote about - what he imagined as a three-hour workday but I prefer to imagine as a two-day working week. A world full of tinkerers, artists, storytellers, obsessives and bullshit artists. Sure, we’ll still need surgeons, electricians, and science laboratories, but most people could live as dilettantes - spending the majority of their time working out how to grow the most exquisite orchid, build a faster bicycle, or paint the perfect sunset. Or perhaps they will create the most thorough index, the most detailed catalogue, or simply the greatest gin ever distilled.

Small batches and long notes

Is the future of GLAM one of small-batch culture and long notes about the creator of each artefact? Will there be songlines to guide travellers between the archives? The question of what galleries, libraries, archives, and museums will look like a thousand years from now perhaps shouldn’t make us think of crystal storage that requires supercomputers to actually read, or 3D-printed Roman ruins. The important thing isn’t really the technology used or even the physical artefacts that survive - it’s the stories and lessons that are passed on. Apart from anything else, most of our current institutions are likely to be under the warm, acidic sea in a thousand years. All that will be left is stories of the people of 2017. We probably should get cracking on the world being imagined into being by groups like Enspiral, Unmonastery and Open Source Ecology because if we don't, I have a feeling I know what our descendants might call us.

The Unready.

"Privacy is dead, get over it" [updated] / David Rosenthal

I believe it was in 1999 that Scott McNealy famously said "privacy is dead, get over it". It is a whole lot deader now than it was then. A month ago in Researcher Privacy I discussed Sam Kome's CNI talk about the surveillance abilities of institutional network technology such as central wireless and access proxies. There's so much more to report on privacy that below the fold there can't be more than some suggested recent readings, as an update to my 6-month old post Open Access and Surveillance. [See a major update at the end]

There are four main types of entity motivated to violate your privacy:
  • Companies: who can monetize this information directly by selling it and indirectly by exploiting it in their internal business. Tim Wu's The Attention Merchants: The Epic Scramble to Get Inside Our Heads is a valuable overview of this process, as is Maciej Cegłowski's What Happens Next Will Amaze You.
  • Governments: both democratic and authoritarian governments at all levels from nations to cities are addicted to violating the privacy of citizens and non-citizens alike, ostensibly in order to "keep us safe", but in practice more to avoid loss of power. Parts of Wu's book cover this too, but at least since Snowden's revelations it has rarely been far from the headlines.
  • Criminals: can be even more effective at monetizing your private information than companies.
  • Users: you are motivated to give up your privacy for trivial rewards:
    More than 70% of people would reveal their computer password in exchange for a bar of chocolate, a survey has found.

Companies

Cliff Lynch has a long paper up at First Monday entitled The rise of reading analytics and the emerging calculus of reader privacy in the digital world:
It discusses what data is being collected, to whom it is available, and how it might be used by various interested parties (including authors). I explore means of tracking what’s being read, who is doing the reading, and how readers discover what they read.
Many months ago Cliff asked me to review a draft, but the final version differs significantly from the draft I reviewed. Cliff divides the paper into four sections:
  1. Introduction: Who’s reading what, and who knows what you’re reading?
  2. Collecting data
  3. Exploiting data
  4. Some closing thoughts
You should read the whole thing, but here are a few tastes.

Cliff agrees in less dramatic language with Maciej Cegłowski's Haunted by Data, and his analogy between stored data and nuclear waste:
Those trying to protect reader privacy gradually realized that the best guarantee of such privacy was to collect as little data as possible, and to retain what had to be collected as briefly as possible. The hard won lesson: if it exists, it will ultimately be subpoenaed or seized, and used against readers in steadily less measured and discriminating ways over time.
Cliff notices, as did Sam Kome, that readers are now tracked at the page level:
One of the byproducts of this transformation is a major restructuring of ideas and assumptions about reader privacy in light of the availability of information about what is being read, who is reading it, and (a genuinely new development) exactly how it is being read, including the end to frustrating reliance upon purchase, borrowing, or downloading as surrogate indicators for actually reading the work in question. ... one might wish for more than sparse anecdote on the ways and extents to which very detailed data on how a given book is (or is not) read, and by whom, actually benefits the various interested parties: authors, publishers, retailers, platform providers, and even readers.
Cliff points out an important shift in the rhetoric about privacy:
Historically, most of the language has been about competing values and how they should be prioritized and balanced, using charged and emotional phrases: “reader privacy,” “intellectual freedom,” “national security,” “surveillance,” “accountability,” “protecting potential victims” ... These conversations are being supplanted by a sterile and anodyne, value-free discussion of “analytics:” reader analytics, learning analytics, etc. These are presented as tools that smart and responsible modern organizations are expected to employ; indeed, not doing analytics is presented as suggesting some kind of management failure or incompetence in many quarters. The operation of analytics systems, ... tends to shift discussions from whether data should be collected to what we can do with it, and further suggests that if we can do something with it, we should.
Privacy is among the reasons readers have for using ad-blockers; the majority of the bytes they eliminate are not showing you ads but implementing trackers. The Future of Ad Blocking: An Analytical Framework and New Techniques by Grant Storey, Dillon Reisman, Jonathan Mayer and Arvind Narayanan reports on several new ad-blocking technologies, including one based on laws against misleading advertising:
ads must be recognizable by humans due to legal requirements imposed on online advertising. Thus we propose perceptual ad blocking which works radically differently from current ad blockers. It deliberately ignores useful information in markup and limits itself to visually salient information, mimicking how a human user would recognize ads. We use lightweight computer vision techniques to implement such a tool and show that it defeats attempts to obfuscate the presence of ads.
They are optimistic that ad-blockers will win out:
Our second key observation is that even though publishers increasingly deploy scripts to detect and disable ad blocking, ad blockers run at a higher privilege level than such scripts, and hence have the upper hand in this arms race. We borrow ideas from rootkits to build a stealthy adblocker that evades detection. Our approach to hiding the presence and purpose of a browser extension is general and might be of independent interest.
I don't agree. The advent of DRM for the Web requires that the DRM implementation run at a higher privilege level than the ad-blocker, and that it prevent less-privileged code from observing the rendered content (lest it be copied). It is naive to think that advertisers will not notice and exploit this capability.

Governments

As usual, Maciej Cegłowski describes the situation aptly:
We're used to talking about the private and public sector in the real economy, but in the surveillance economy this boundary doesn't exist. Much of the day-to-day work of surveillance is done by telecommunications firms, which have a close relationship with government. The techniques and software of surveillance are freely shared between practitioners on both sides. All of the major players in the surveillance economy cooperate with their own country's intelligence agencies, and are spied on (very effectively) by all the others.
Steven Bellovin, Matt Blaze, Susan Landau and Stephanie Pell have a 101-page review of the problems caused by the legacy model of communication underlying surveillance law in the Harvard Journal of Law and Technology entitled It's Too Complicated: How the Internet Upends Katz, Smith and Electronic Surveillance Law. It's clearly important, but I'm only a short way into it; I may have more to say about it later.

And this, of course, assumes that the government abides by the law. Marcy Wheeler disposes of that idea:
All of which is to say that the authority that the government has been pointing to for years to show how great Title VII is is really a dumpster fire of compliance problems.

And still, we know very little about how this authority is used.
and also:
one reason NSA analysts were collecting upstream data is because over three years after DOJ and ODNI had figured out analysts were breaking the rules because they forgot to exclude upstream from their search, they were still doing so. Overseers noted this back in 2013!

Criminals

The boundaries between government entities such as intelligence agencies and law enforcement and criminals have always been somewhat fluid. The difficulty of attributing activity on the Internet (also here) to specific actors has made them even more fluid:
Who did it? Attribution is fundamental. Human lives and the security of the state may depend on ascribing agency to an agent. In the context of computer network intrusions, attribution is commonly seen as one of the most intractable technical problems, as either solvable or not solvable, and as dependent mainly on the available forensic evidence. But is it? Is this a productive understanding of attribution? — This article argues that attribution is what states make of it.
The most important things to keep private are your passwords and PINs. They're the primary target for the bad guys, who can use them to drain your bank accounts. Dan Goodin at Ars Technica has an example of how incredibly hard it is to keep them secret. In Meet PINLogger, the drive-by exploit that steals smartphone PINs, he reports on Stealing PINs via mobile sensors: actual risk versus user perception by Maryam Mehrnezhad, Ehsan Toreini, Siamak F. Shahandashti and Feng Hao. Goodin writes:
The demonstrated keylogging attacks are most useful at guessing digits in four-digit PINs, with a 74-percent accuracy the first time it's entered and a 94-percent chance of success on the third try. ... The attacks require only that a user open a malicious webpage and enter the characters before closing it. The attack doesn't require the installation of any malicious apps.
Malvertising, using ad servers to deliver malware, is a standard technique for the bad guys, and this attack can use it:
Malicious webpages—or depending on the browser, legitimate sites serving malicious ads or malicious content through HTML-based iframe tags—can mount the attack by using standard JavaScript code that accesses motion and orientation sensors built into virtually all iOS and Android devices. To demonstrate how the attack would work, researchers from Newcastle University in the UK wrote attack code dubbed PINLogger.js. Without any warning or outward sign of what was happening, the JavaScript was able to accurately infer characters being entered into the devices.

"That means whenever you are typing private data on a webpage [with] some advert banners ... the advert provider as part of the page can 'listen in' and find out what you type in that page," ... "Or with some browsers as we found, if you open a page A and then another page B without closing page A (which most people do) page A in the background can listen in on what you type in page B."
The authors are pessimistic about blocking attacks using sensor data:
Access to mobile sensor data via JavaScript is limited to only a few sensors at the moment. This will probably expand in the future, specially with the rapid development of sensor-enabled devices in the Internet of things (IoT). ... Many of the suggested academic solutions either have not been applied by the industry as a practical solution, or have failed. Given the results in our user studies, designing a practical solution for this problem does not seem to be straightforward. ... After all, it seems that an extensive study is required towards designing a permission framework which is usable and secure at the same time. Such research is a very important usable security and privacy topic to be explored further in the future.
The point is not to focus on this particular channel, but to observe that it is essentially impossible to enumerate and block all the channels by which private information can leak from any computer connected to the Internet.

Users

Because it is effectively impossible for you to know what privacy risks you are running, you are probably the main violator of your privacy on the Internet, for two main reasons:
  • You have explicitly and implicitly agreed to Terms of Service (and here) that give up your privacy rights in return for access to content. Since the content probably isn't that important to you, your privacy can't be that important either.
    • You have not taken the simple precautions necessary to maintain privacy by being anonymous when using the Web. Techniques such as cookie syncing and browser fingerprinting mean that even using Tor isn't enough. Even though Tor obscures your IP address, if you're using the same browser as you did without Tor or when you logged in to a site, the site will know it's you. Fortunately, there is a very simple way to avoid these problems. Tails (The Amnesic Incognito Live System) can be run from a USB flash drive or in a VM. Every time it starts up it is in a clean state. The browser looks the same to a Web site as every other Tails browser. Use it any time privacy is an issue, from watching pr0n to searching for medical information.
It is very sad that the responsibility for maintaining privacy rests on the shoulders of the individual, with essentially no support from the law, but everyone else finds your lack of privacy so useful and profitable that this situation isn't going to change. After all, The Panopticon Is Good For You.

Update:

At The Atlantic, Arvind Narayanan and Dillon Reisman's The Thinning Line Between Commercial and Government Surveillance reports:
As part of the Princeton Web Transparency and Accountability Project, we’ve been studying who tracks you online and how they do it. Here’s why we think the fight over browsing histories is vital to civil liberties and to a functioning democracy.

Privacy doesn’t merely benefit individuals; it fundamentally shapes how society functions. It is crucial for marginalized communities and for social movements, such as the fight for marriage equality and other once-stigmatized views. Privacy enables these groups to network, organize, and develop their ideas and platforms before challenging the status quo. But when people know they’re being tracked and surveilled, they change their behavior. This chilling effect hurts our intellectual freedoms and our capacity for social progress.
They stress the effectiveness of the tracking techniques I mentioned above:
Web tracking today is breathtaking in its scope and sophistication. There are hundreds of entities in the business of following you from site to site, and popular websites embed about 50 trackers on average that enable such tracking. We’ve also found that just about every new feature that’s introduced in web browsers gets abused in creative ways to “fingerprint” your computer or mobile device. Even identical looking devices tend to behave in subtly different ways, such as by supporting different sets of fonts. It’s as if each device has its own personality. This means that even if you clear your cookies or log out of a website, your device fingerprint can still give away who you are.
And that, even if used by companies, governments (and ISPs) can piggy-back on them:
Worse, the distinction between commercial tracking and government surveillance is thin and getting thinner. The satirical website The Onion once ran a story with this headline: “CIA's ‘Facebook’ Program Dramatically Cut Agency's Costs.” Reality isn’t far off. The Snowden leaks revealed that the NSA piggybacks on advertising cookies, and in a technical paper we showed that this can be devastatingly effective. Hacks and data breaches of commercial systems have also become a major part of the strategies of nation-state actors.
Ironically, The Atlantic's website is adding tracking information to their article's URL (note the 524952):
https://www.theatlantic.com/technology/archive/2017/05/the-thinning-line-between-commercial-and-government-surveillance/524952/
and to the attributes of the links in it:
data-omni-click="r'article',r'link',r'6',r'524952'"
At Gizmodo, Kashmir Hill's Uber Doesn’t Want You to See This Document About Its Vast Data Surveillance System is a deep dive into the incredibly detailed information Uber's database maintains about each and every Uber user. It is based on information briefly revealed in a wrongful termination lawsuit, before Uber's lawyers got it sealed.
For two days in October, before Uber convinced the court to seal the material, one of Spangenberg’s filings that was publicly visible online included a spreadsheet listing more than 500 pieces of information that Uber tracks for each of its users. ...

For example, users give Uber access to their location and payment information; Uber then slices and dices that information in myriad ways. The company holds files on the GPS points for the trips you most frequently take; how much you’ve paid for a ride; how you’ve paid for a ride; how much you’ve paid over the past week; when you last canceled a trip; how many times you’ve cancelled in the last five minutes, 10 minutes, 30 minutes, and 300 minutes; how many times you’ve changed your credit card; what email address you signed up with; whether you’ve ever changed your email address.
Both articles are must-reads.

Introduction to Phabricator at Wikimedia Hackathon / Jakob Voss

This weekend I am participating in the Wikimedia Hackathon in Vienna. I am mostly contributing to Wikidata-related events and practicing the phrase "long time no see", but I am also looking into some introductory talks.

In the late afternoon of day one I attended an introduction to the Phabricator project management tool, given by André Klapper. Phabricator was introduced at the Wikimedia Foundation about three years ago to replace and unify Bugzilla and several other management tools.

Phabricator is much more than an issue tracker for software projects (although it is mainly used for this purpose by Wikimedia developers). In summary, there are tasks, projects, and teams. Tasks can be tagged, assigned, followed, discussed, and organized with milestones and workboards. The latter are Kanban boards like those I know from Trello, Waffle, and GitHub project boards.

Phabricator is Open Source, so you can self-host it and add your own user management without having to pay for each new user and feature (I am looking at you, JIRA). Internally I would like to use Phabricator, but for fully open projects I don’t see enough benefit compared to using GitHub.

P.S.: Wikimedia Hackathon is also organized with Phabricator. There is also a task for blogging about the event.

Evergreen 3.0 development update #6: feedback fest results / Evergreen ILS

Image from page 668 of “The American farmer. A complete agricultural library, with useful facts for the household, devoted to farming in all its departments and details” (1882). Image digitized by NCSU Libraries.

Since the previous update, another 30 patches have been committed to the master branch.

This was the week of the first feedback fest in the 3.0 release cycle. A total of 57 bugs were identified last week as having an active pull request but no signoff; an additional bug was added to the wiki page today. Of those 58 bugs, 43 received substantive feedback, and 17 of them had their patches merged.

I would like to acknowledge the following people who left feedback for fest bugs:

  • Galen Charlton
  • Jeff Davis
  • Bill Erickson
  • Jason Etheridge
  • Rogan Hamby
  • Kathy Lussier
  • Mike Rylander
  • Ben Shum
  • Jason Stephenson
  • Dan Wells

A special shout-out also goes to Andrea Neiman, who helped keep the fest’s wiki page up to date this week.

There was also a fair amount of activity outside of the feedback fest, including a number of bug reports filed by folks testing the web staff client.

Duck trivia

Ducks occasionally need to be rescued from Library of Congress buildings. To date, no duck has been known to request a book whose call number starts with QL696.A52.

Submissions

Updates on the progress to Evergreen 3.0 will be published every Friday until general release of 3.0.0. If you have material to contribute to the updates, please get them to Galen Charlton by Thursday morning.

LIL Talks: Seltzer! / Harvard Library Innovation Lab

In this week’s LIL talk, Matt Phillips gave us an effervescent presentation on Seltzer, followed by a tasting.

We tasted

  • Perrier – minerally, slightly salty, big bubbles with medium intensity
  • Saratoga – varied bubble size, clean… Paul says that this reminds him of typical German seltzers
  • Poland Springs – soft, smooth, sweet and clean
  • Gerolsteiner – Minerally with low carbonation
  • Borjomi – Graphite, very minerally, small bubbles, funk

Of course, throughout the conversation, we discussed the potential for the bottles affecting our opinions. We agreed that for a truly objective comparison, we’d transfer the samples to generic containers.

Though our tech and law talks are always educational and fun, our carbonated water talk was a refreshing change.

Evaluating Databases: Prioritizing on a Shoestring / LITA

Libraries have limited resources, and the portion afforded to electronic resources requires some savvy prioritization to meet patrons’ needs while sticking to budgets. Allocation of spending is a key issue for many libraries, and database subscriptions can cost thousands, even tens of thousands, of dollars. For smaller libraries, it’s possible to spend the electronic resources budget on a single amazing all-purpose database or piece together a collection from low-cost alternatives. What’s a librarian to do?

It’s important to note that there’s no right/wrong dichotomy in deciding which electronic resources are “best”; it’s always a matter of “best for the community”, i.e., the librarian’s rule of thumb: know thy service population. Does your library serve a population with a high unemployment rate? You may need to prioritize electronic resources focused on job training, skill-building, and resume writing. Are you situated in an area where students hang out after the school day? Consider electronic resources like educational games, homework helpers, and web-based tutoring. Are you nestled in the heart of an emerging tech boomtown? You might include resources on programming languages (reference sources, learning programs, etc).

Over the years, I’ve explored various sources – from my MLIS textbooks to library websites to blog posts – and here’s a list of preliminaries that I consider when I’m tasked with evaluating electronic resources for selection to serve my library’s community.

Content

In the same way I’d evaluate a print source, I consider the content of an electronic resource. Is it relevant to my community? What about scope – is the information comprehensive, or, if not, is it a good fit to fill gaps in depth on a topic of special interest to patrons in the community? Is it updated often with timely and reliable information? Does a database include full text content, abstracts, citations? Is there a print resource that’s more useful?

Functionality

The how is as important as the content of a resource (the what). I ask myself: how simple is it for a layperson to use? Is the interface user-friendly? Is the indexing accurate and thorough? What about search – how does the database handle truncation, search types, filters, alternate spellings, and search history? Is there a FAQ or tutorial to help users solve issues if they get stuck? Can they export and download materials? I’ve learned that these questions can be important in how valuable patrons find the resource to be. A database may contain the deepest, most broad content possible, but if users can’t find it, it’s not much use to them. Like the question of a tree making sound when it falls in an empty forest, we can’t answer the question of whether the content is useful if no one is there to witness it.

Technical Bits

Before digging deeper into authentication and content format, I have a list of technical odds and ends that I consider in the preliminary evaluation. Does the vendor provide IT support for when technical issues inevitably arise? What about staff training or tutorials so librarians can learn how best to assist patrons in using the resource, or teach classes on the database’s functionality? How do patrons access the database – some vendors may allow in-library access only, some may provide limited content in their licensed versions, and some may not be optimized for mobile; in my evaluation, the resource will need to be stellar in other ways if these limitations exist. There’s also the biggie: cost. I weigh the expected value against the cost of the resource in electronic versus print format, e.g., is the electronic version more timely, cheaper per use, or vastly easier to use?

Once an electronic resource is in use, I add a parameter or two in the annual evaluation process – such as whether a database generates enough use to warrant the expense; any patron feedback staff has received; how much librarian-patron interaction is required for users to engage with the resource effectively; and how often the resource crashes, as well as how the vendor’s IT staff assists in resolving those inevitable issues that crop up. In the preliminary stages of electronic resource selection, I use content, function, and basic technical elements as the litmus test. If a resource passes all of these tests, then a library can dig a level deeper to finalize its decision. I’ll discuss this next month in a follow-up post.

Do you have any pro-tips? What has been your experience in implementing databases at your library?

Matching authors against VIAF identities / LibreCat/Catmandu blog

At Ghent University Library we enrich catalog records with VIAF identities to enhance the search experience in the catalog. When searching for all the books about ‘Chekov’ we want to match all name variants of this author. Consult VIAF http://viaf.org/viaf/95216565/#Chekhov,_Anton_Pavlovich,_1860-1904 and you will see many of them.

  • Chekhov
  • Čehov
  • Tsjechof
  • Txékhov
  • etc

Any of these name variants can be present in the catalog data if authority control is not in place (or not maintained). Searching for any of these names should return results for all the variants. In the past it was a labor-intensive, manual job for catalogers to maintain an authority file. Using results from Linked Data Fragments research by Ruben Verborgh (iMinds) and the Catmandu-RDF tools created by Jakob Voss (GBV) and RDF-LDF by Patrick Hochstenbach, Ghent University started an experiment to automatically enrich authors with VIAF identities. In this blog post we will report on the setup and results of this experiment, which will also be reported at ELAG2015.

Context

Three ingredients are needed to create a web of data:

  1. A scalable way to produce data.
  2. The infrastructure to publish data.
  3. Clients accessing the data and reusing them in new contexts.

On the production side, there doesn’t seem to be any problem for libraries to create huge datasets. Any transformation of library data to linked data will quickly generate an enormous number of RDF triples. We see this in the size of publicly available datasets.

Also for accessing data, from a consumer’s perspective the “easy” part seems to be covered. Instead of thousands of APIs and many document formats for every dataset, SPARQL and RDF provide the programmer with a single protocol and document model.

The claim of the Linked Data Fragments researchers is that on the publication side, reliable queryable access to public Linked Data datasets largely remains problematic due to the low availability percentages of public SPARQL endpoints [Ref]. This is confirmed by a 2013 study by researchers from the Pontificia Universidad Católica in Chile and the National University of Ireland, in which more than half of the public SPARQL endpoints were found to be offline 1.5 days per month. This gives an availability rate of less than 95% [Ref].

The source of this high rate of unavailability can be traced back to the service model of Linked Data, where two extremes exist for publishing data (see image below).

On one side, data dumps (or dereferencing of URLs) can be made available, which requires a simple HTTP server and lots of processing power on the client side. On the other side, an open SPARQL endpoint can be provided, which requires a lot of processing power (hence, hardware investment) on the server side. With SPARQL endpoints, clients can demand the execution of arbitrarily complicated queries. Furthermore, since each client requests unique, highly specific queries, regular caching mechanisms are ineffective, since they can only be optimized for repeated identical requests.

This situation can be compared to providing end users with either a database SQL dump or an open database connection on which any possible SQL statement can be executed. To a lesser extent, libraries are well aware of the different modes of operation of OAI-PMH services versus Z39.50/SRU services.

Linked Data Fragments researchers provide a third way, Triple Pattern Fragments, to publish data, which tries to provide the best of both worlds: access to a full dump of the dataset while providing a queryable and cacheable interface. For more information on the scalability of this solution, I refer to the report presented at the 5th International USEWOD Workshop.

The experiment

VIAF doesn’t provide a public SPARQL endpoint, but a complete dump of the data is available at http://viaf.org/viaf/data/. In our experiments we used the VIAF (Virtual International Authority File) dump, which is made available under the ODC Attribution License. From this dump we created an HDT database. HDT provides a very efficient format to compress RDF data while maintaining browse and search functionality. Using command-line tools, RDF/XML, Turtle and NTriples can be compressed into an HDT file with an index. This standalone file can be used to query huge datasets without the need for a database. A VIAF conversion to HDT results in a 7 GB file and a 4 GB index.

Using the Linked Data Fragments server by Ruben Verborgh, available at https://github.com/LinkedDataFragments/Server.js, this HDT file can be published as a NodeJS application.

For a demonstration of this server visit the iMinds experimental setup at: http://data.linkeddatafragments.org/viaf
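As a rough sketch of how such a setup can be built, the commands below show the conversion and publication steps. The file names are ours, and the exact rdf2hdt flags and Server.js configuration keys may differ between versions, so treat this as an illustration rather than a recipe:


$ rdf2hdt viaf.nt viaf.hdt

$ cat > config.json <<'EOF'
{
  "title": "VIAF Linked Data Fragments server",
  "datasources": {
    "viaf": {
      "title": "VIAF",
      "type": "HdtDatasource",
      "settings": { "file": "viaf.hdt" }
    }
  }
}
EOF

$ ldf-server config.json 5000 4

Once the server is running (here on port 5000 with 4 worker processes), it can be queried with the Triple Pattern Fragments requests shown below.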

Using Triple Pattern Fragments a simple REST protocol is available to query this dataset. For instance it is possible to download the complete dataset using this query:


$ curl -H "Accept: text/turtle" http://data.linkeddatafragments.org/viaf

If we only want the triples concerning Chekhov (http://viaf.org/viaf/95216565) we can provide a query parameter:


$ curl -H "Accept: text/turtle" http://data.linkeddatafragments.org/viaf?subject=http://viaf.org/viaf/95216565

Likewise, using the predicate and object query parameters, any combination of triples can be requested from the server.


$ curl -H "Accept: text/turtle" http://data.linkeddatafragments.org/viaf?object="Chekhov"

The memory requirements of this server are small enough to run a copy of the VIAF database on a MacBook Air laptop with 8GB RAM.

Using specialised Triple Pattern Fragments clients, SPARQL queries can be executed against this server. For the Catmandu project we created a Perl client RDF::LDF which is integrated into Catmandu-RDF.

To request all triples from the endpoint use:


$ catmandu convert RDF --url http://data.linkeddatafragments.org/viaf --sparql 'SELECT * {?s ?p ?o}'

Or, only those triples that are about “Chekhov”:


$ catmandu convert RDF --url http://data.linkeddatafragments.org/viaf --sparql 'SELECT * {?s ?p "Chekhov"}'
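The same lookup can also be done from Perl code. The sketch below is hypothetical and simplified: it assumes, as is usual in Catmandu, that the CLI options --url and --sparql map directly onto constructor arguments of the RDF importer, and that each item returned is a hash of SPARQL variable bindings.


use strict;
use warnings;
use Catmandu;

# Query the VIAF Linked Data Fragments endpoint with a SPARQL query
# (assumed to mirror the command-line options used above).
my $importer = Catmandu->importer(
    'RDF',
    url    => 'http://data.linkeddatafragments.org/viaf',
    sparql => 'SELECT * {?s ?p "Chekhov"}',
);

# Print each binding; we assume the keys follow the SPARQL variable names.
$importer->each(sub {
    my $binding = shift;
    print join("\t", $binding->{s} // '', $binding->{p} // ''), "\n";
});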

In the Ghent University experiment a more direct approach was taken to match authors to VIAF. First, as input, a MARC dump from the catalog is streamed into a Perl program using a Catmandu iterator. Then we extract the 100 and 700 fields which contain both $a (name) and $d (date) subfields. These two subfields are combined into a search query, as if we would search:


Chekhov, Anton Pavlovich, 1860-1904

If there is exactly one hit in our local VIAF copy, then the result is reported. A complete script to process MARC files this way is available at a GitHub gist. To run the program against a MARC dump execute the import_viaf.pl command:


$ ./import_viaf.pl --type USMARC file.mrc
000000089-2 7001  L $$aEdwards, Everett Eugene,$$d1900- http://viaf.org/viaf/110156902
000000122-8 1001  L $$aClelland, Marjorie Bolton,$$d1912-   http://viaf.org/viaf/24253418
000000124-4 7001  L $$aSchein, Edgar H.
000000124-4 7001  L $$aKilbridge, Maurice D.,$$d1920-   http://viaf.org/viaf/29125668
000000124-4 7001  L $$aWiseman, Frederick.
000000221-6 1001  L $$aMiller, Wilhelm,$$d1869- http://viaf.org/viaf/104464511
000000256-9 1001  L $$aHazlett, Thomas C.,$$d1928-  http://viaf.org/viaf/65541341

[edit: 2017-05-18 an updated version of the code is available as a Git project https://github.com/LibreCat/MARC2RDF ]
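For illustration only, the core of the extraction step can be sketched as follows. This is a simplified, hypothetical version of the real script linked above; the VIAF lookup itself is represented by a commented-out, made-up look_up_viaf() helper.


#!/usr/bin/env perl
use strict;
use warnings;
use Catmandu;

# Stream a MARC dump through the Catmandu MARC importer.
my $importer = Catmandu->importer('MARC', file => 'file.mrc', type => 'USMARC');

$importer->each(sub {
    my $record = shift;
    # Catmandu stores each field as [ tag, ind1, ind2, code, value, code, value, ... ]
    for my $field (@{ $record->{record} }) {
        my ($tag, $ind1, $ind2, @subfields) = @$field;
        next unless $tag eq '100' || $tag eq '700';
        my %sf = @subfields;              # subfield code => value
        next unless $sf{a} && $sf{d};     # only authors with both a name and a date
        my $query = "$sf{a} $sf{d}";      # e.g. "Chekhov, Anton Pavlovich, 1860-1904"
        # look_up_viaf() is a hypothetical stand-in for the Triple Pattern
        # Fragments search against the local VIAF copy; report the identity
        # only when there is exactly one hit.
        # my @hits = look_up_viaf($query);
        # print join("\t", $record->{_id}, $query, $hits[0]), "\n" if @hits == 1;
        print join("\t", $record->{_id}, $query), "\n";
    }
});

Combining the name and date subfields in a single query keeps the match strict, which is why only records with exactly one VIAF hit are accepted.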

All the authors in the MARC dump will be exported. If there is exactly one match against VIAF, it will be added to the author field. We ran this command for one night in a single thread against 338,426 authors containing a date and found 135,257 exact matches in VIAF (about 40%).

In a quite recent follow-up to our experiments, we investigated how LDF clients can be used in a federated setup. By combining, in the LDF algorithm, the triple results from many LDF servers, one SPARQL query can be run over many machines. These results are demonstrated at the iMinds demo site, where a single SPARQL query can be executed over the combined VIAF and DBpedia datasets. A Perl implementation of this federated search is available in the latest version of RDF-LDF on GitHub.

We strongly believe in the success of this setup and the scalability of this solution, as demonstrated by Ruben Verborgh at the USEWOD Workshop. Using Linked Data Fragments, a range of solutions is available to publish data on the web. From simple data dumps to a full SPARQL endpoint, any service level can be provided given the available resources. For more than half a year DBpedia has been running an LDF server with 99.9994% availability on an 8-CPU, 15 GB RAM Amazon server handling 4.5 million requests. Scaling out, services such as the LOD Laundromat clean 650,000 datasets and provide access to them using a single fat LDF server (256 GB RAM).

For more information on federated searches with Linked Data Fragments, visit Ruben Verborgh’s blog post at: http://ruben.verborgh.org/blog/2015/06/09/federated-sparql-queries-in-your-browser/


Learning Objects: Teach Me Goodness, Discipline and Knowledge / Mita Williams

Last week, a tweet pointing to this article “A Stanford researcher’s 15-minute study hack lifts B+ students into the As” caught my attention.

The article describes how Stanford researcher Patricia Chen improved her class’s performance on a test by sending out a 15-minute pre-survey designed to get them thinking about how they were going to prepare. Chen was applying a metacognition intervention in her teaching practice.

According to the Education Endowment Foundation (EEF), which performs studies to try to close achievement gaps, metacognition is one of the two most effective educational interventions it has tested. (Feedback is the other.) Students involved in programs designed to improve how they think about thinking accelerated their learning by an average of eight months’ worth of academic progress. The effect was greatest for low-achieving and older pupils.

This article reminded me that I had unfinished work to do.

Some months ago I quietly launched a project that I designed as a librarian’s “intervention” to help students think about their thinking. It is a box of objects and zines that was made available at the Leddy Library’s Course Reserves Desk called ‘Learning Objects’.

The west building of the Leddy Library features a cornerstone that bears the motto of the University of Windsor: TEACH ME GOODNESS DISCIPLINE AND KNOWLEDGE.

Learning Objects is a box of objects that you can borrow from the Leddy Library. Each object is accompanied by a booklet that lets you know how these things can teach you GOODNESS, DISCIPLINE, and KNOWLEDGE.

And yet I had not yet properly explained the thinking behind my thinking behind this project. I had meant to write up my work but I found I kept putting it off. This is particularly ironic because one of the objects in the box was specifically chosen by me to help students deal with procrastination.

Panic Monster and Box

So let me turn the dial of my tomato timer and do the work.

You can re-create this zine by using my production template [.docx] and following this helpful zine making guide.

The text of "The Pomodoro Technique" zine can be found here.

If I had to define what was the starting point of my Learning Objects project, I think it would have to be this tweet from McMaster University’s Andrew Colgoni:

In response, I tweeted back:

 

There are passages in Clive Thompson’s Smarter than you think: How Technology Is Changing Our Minds for the Better that still sit with me since I read the book in 2013.

For example, his work was the first I had ever come across to suggest that the core driver behind the reader’s need for convenience is the desire for a ‘speed of recall’ as close as possible to one’s speed of thought – a drive that puts library resources (whether in print but across campus, or online but behind a tiresome authentication process) at a permanent disadvantage to any text closer to hand, even when readers say they appreciate the experience of browsing items on library shelves.

As with Drexel, Dewey, and Otlet before him, [Vannevar] Bush argued that speed of recall was key. Without it, one’s external store of facts would be useless. When he called his invention “an enlarged intimate supplement” to memory, the crucial word wasn’t so much “enlarged” or “supplement”; books had long enlarged and supplemented our minds. No, it was “intimate”—the idea that the memex would be physically and cognitively proximal, in a nearly sensual fashion. That was a key to its power. Indeed, Bush suspected the comparative difficulties of using libraries is what had prevented them from being of widespread use to the public. “Even the modern great library,” he wrote, “is not generally consulted; it is nibbled at by a few.” To truly harness our external knowledge, we needed to bring it closer to our minds.

But the passage that came to mind, prompted by Andrew’s tweet, was the book’s introduction to the science behind the learning technique of “spaced repetition”, which is based on the Ebbinghaus curve of forgetting:

Machines can also remind us of facts precisely when we need reminding. If you’ll recall the Ebbinghaus curve of forgetting from the second chapter, Ebbinghaus found that we forget things in a predictable pattern: More than half our facts are gone in an hour, about two thirds are gone within a day, and within a month we’re down to about 20 percent. Ebbinghaus and his followers theorized that this process could work in reverse. If you reviewed a fact one day after you first encountered it, you’d fight the curve of loss. This process is called “spaced repetition,” and experiments and anecdotes suggest it can work. It explains why students who cram for a test never retain much; the material dissolves because they never repeat it. But though spaced repetition is clever and effective, it has never caught on widely, because ironically, the technique relies on our frail human memories. How would you remember to review something the next day? Then a few days later, a week, and three months?

Machines, however, are superb at following these rote schedules. In the last decade, software programmers began selling tools intended to let you feed in facts, which the computer then reminds you to review on a reverse Ebbinghaus curve. Use of this software has remained a subculture, mostly by people seeking to learn a foreign language, though devout adherents use it to retain everything from recipes to poetry…

screenshot of kindle daily review

As librarians, we don’t concern ourselves with the memory work of our readers. Our focus is on the research process of scholarship and not on the learning and recall of said scholarship. And yet arguably more student time is spent studying in the library than researching within it.

For many of our students, much of their time is spent learning material. And despite the fact that some of our most prestigious students need to memorize content (there is a good chance that your doctor, as a medical student, used flash cards or memory palaces to learn the biomedical foundation of their care), educators and librarians frequently choose to focus their teaching on ‘higher-level learning’ instead.

Appealing though it might be to offload the responsibility for teaching our students basic knowledge to their elementary school teachers or to Google, the research of cognitive psychologists who study learning and the basic study habits of most students suggest that we cannot do this. One of our first and most important tasks as teachers is to help students develop a rich body of knowledge in our content areas – without doing so, we handicap considerably their ability to engage in cognitive activities like thinking and evaluating and creating. As cognitive psychologist Daniel Willingham argued, you can’t think creatively about information unless you have information in your head that you can think about. “Research from cognitive science has shown,” he explained, “that the sorts of skills that teachers want for their students — such as the ability to analyze and think critically — require extensive factual knowledge” (Willingham 2009, p. 25). We have to know things, in other words, to think critically about them. Without any information readily available to us in our brains, we tend to see new facts (from our Google searches) in isolated, noncontextual ways that lead to shallow thinking. Facts are related to other facts, and the more of those relationships we can see, the more we will prove capable of critical analysis and creative thinking. Students who don’t bother to memorize anything will never get much beyond skating the surface of a topic.

The above passage comes from James M. Lang, the author of Small Teaching: Everyday Lessons from the Science of Learning, which I found an extraordinarily helpful book. I included a passage from his “Small Teaching” in the “Teach me knowledge: Why study the facts” zine that I included in the Learning Objects box.

You can re-create this zine by using my production template [.docx] and following this helpful zine making guide. 

The text of the "Why study the facts" zine can be found here.

I also included a separate zine dedicated specifically to the topic of spaced repetition. To accompany the zine, I included a small box of index cards explaining how to create a ‘Leitner Flashcard Game’ for one’s own learning goals.

Leitner Flashcard Game

You can re-create this zine by using my production template [.docx] and following this helpful zine making guide. 

The text of the "Spaced Repetition" zine can be found here.

(Did you know that I’m into index cards? I’m really into index cards.)


It’s one of my theories that when people give you advice, they’re really just talking to themselves in the past. ~Mark Epstein

The zines that accompany the Rubber Duck in the Learning Objects box are really for my past self.

rubber duck and zines

Do you study by reading and re-reading your notes to yourself silently? Stop! I know it feels good, in a monkish, masochistic, pain equals progress sort of way to beat your brains against a book hour after hour, but it’s also a terribly inefficient way to review. Instead, lecture to an imaginary class, out-loud, about the main topics, without reading off your notes. If you can state an idea once, in complete sentences, out-loud, it will stick. You don’t need to re-read it a dozen times. If you can’t capture it out-loud then you don’t understand it yet. Go back. Review. Then try again.

That excerpt is from Cal Newport’s Monday Master Class: 5 Bad Study Habits You Should Resolve to Avoid in 2008. It can also be found in the zine, “Teach Me Knowledge: Rubber Duck: Reciting”:

You can re-create this zine by using my production template [.docx] and following this helpful zine making guide.

The text of the "Rubber Duck: Reciting" zine can be found here.

I’m particularly pleased that I found and was able to share an example of why you might want to use a rubber duck to improve both one’s computer debugging…

You can re-create this zine by using my production template [.docx] and following this helpful zine making guide. 

The text of the "Rubber Duck Debugging" zine can be found here.

… as well as why you might want to talk to a rubber duck to improve one’s engineering practice.

You can re-create this zine by using my production template [.docx] and following this helpful zine making guide.

The text of the "Rubber Duck: Problem Solving: Engineering Design" zine can be found here.

tarot card: 3 of wands

If you were asked to fill a box of objects to give to a student to help them in their journey, what would you give to inspire DISCIPLINE, KNOWLEDGE, and GOODNESS?

It’s a bit of a cop-out but I chose two books for the objects by which I wanted to carry goodness. Well, two books and a deck of cards.

In the box of Learning Objects you can find a deck of Rider Waite tarot cards and Jessa Crispin’s The Creative Tarot

book cover: the creative tarot

You can re-create this zine by using my production template [.docx] and following this helpful zine making guide.

The text of the "Jessa Crispin's The Creative Tarot" zine can be found here.

… and Ursula Le Guin’s Tao Te Ching.

book cover: Tao Te Ching

You can re-create this zine by using my production template [.docx] and following this helpful zine making guide.

The text of the "Ursua K. Le Guin's Tau Te Ching" zine can be found here.

Now all I have to do is figure out how to get the students to borrow the box 🙂


Notes for BC SirsiDynix Users Group Meeting 2017 / Cynthia Ng

We got a bunch of presentations from both SD and a couple of presentations from libraries. COSUGI Updates Rick Branham, VP, Pre-Sales Solutions; Steve Donoghue, Senior Library Relations Manager; Tom Walker, Executive Accounts Manager Striving to listen to what customers want, and branding/marketing focus on customers and libraries. Popular “add-on” products: Enterprise, MobileCirc, Visibility, eResource …

LIL Talks: A Small Study of Epic Proportions / Harvard Library Innovation Lab

(This is a guest post by John Bowers, a student at Harvard College who is collaborating with us on the Entropy Project. John will be a Berktern here this Summer.)

In last week’s LIL talk, team member and graduating senior Yunhan Xu shared some key findings from her prize-winning thesis “A Small Study of Epic Proportions: Toward a Statistical Reading of the Aeneid.” As an impressive entry into the evolving “digital humanities” literature, Yunhan’s thesis blended the empirical rigor of statistical analysis with storytelling and interpretive methods drawn from the study of classics.

The presentation dealt with four analytical methodologies applied in the thesis. For each, Yunhan offered a detailed overview of tools and key findings.

  1. Syntactic Analysis. Yunhan analyzed the relative frequencies with which different verb tenses and parts of speech occur across the Aeneid’s 12 books. Her results lent insight into the “shape” of the epic’s narrative, as well as its stylistic character in relation to other works.
  2. Sentiment Analysis. Yunhan used sentiment analysis tools to examine the Aeneid’s emotional arc, analyze the normative descriptive treatment of its heroes and villains, and differentiate—following more conventional classics scholarship—the tonality of its books.
  3. Topic Modeling. Here, Yunhan subjected existing bipartite and tripartite “partitionings” of the Aeneid to statistical inquiry. By applying sophisticated topic modelling techniques including Latent Dirichlet Allocation and Non-Negative Matrix Factorization, she made a compelling case for the tripartite interpretation. In doing so, she added a novel voice to a noteworthy debate in the classics community.
  4. Network Analysis. By leveraging statistical tools to analyze the coincidence of and interactions between the Aeneid’s many characters, Yunhan generated a number of compelling visualizations mapping narrative progression between books in terms of relationships.

 

In the closing minutes of her presentation, Yunhan reflected on the broader implications of the digital humanities for the study of classics. While some scholars remain skeptical of the digital humanities, Yunhan sees enormous potential for collaboration and coevolution between the new way and the old.

2017 LITA Forum – Call for Proposals, Deadline Extended / LITA

The LITA Forum is a highly regarded annual event for those involved in new and leading edge technologies in the library and information technology field. Please send your proposal submissions here by June 2, 2017, and join your colleagues in Denver, Colorado.

The 2017 LITA Forum Committee seeks proposals for the 20th Annual Forum of the Library and Information Technology Association in Denver, Colorado, November 9-12, 2017 at the Embassy Suites by Hilton Denver Downtown Convention Center.

Submit your proposal at this site

The Forum Committee welcomes proposals for concurrent sessions, workshops, poster sessions, or full-day preconferences related to all types of libraries: public, school, academic, government, special, and corporate. Collaborative, hands-on, and interactive concurrent sessions, such as panel discussions, hands-on practical workshops, or short talks followed by open moderated discussions, are especially welcomed. Proposals may cover projects, plans, ideas, or recent discoveries. We accept proposals on any aspect of library and information technology. The committee particularly invites submissions from first-time presenters and library school students. We deliberately seek and strongly encourage submissions from underrepresented groups, such as women, people of color, the LGBTQA+ community and people with disabilities.

The New Submission deadline is Friday June 2, 2017. 

Presenters will submit final presentation slides and/or electronic content (video, audio, etc.) to be made available online following the event. Presenters are expected to register and participate in the Forum as attendees; a discounted registration rate will be offered.

If you have any questions, contact Vincci Kwong, Forum Planning Committee Chair, at vkwong@iusb.edu.

Additional details are at the Submission site

More information about LITA is available from the LITA website, Facebook and Twitter.

Questions or Comments?

Contact LITA at (312) 280-4268 or Mark Beatty, mbeatty@ala.org

 

On open source, consensus, vision, and scope / Jonathan Rochkind

Around minute 27 of Building Rails ActionDispatch::SystemTestCase Framework from Eileen Uchitelle.

What is unique to open source is that the stakeholders you are trying to find consensus with have varying levels of investment in the end result…

…but I wasn’t prepared for all the other people who would care. Of course caring is good, I got a lot of productive and honest feedback from community members, but it’s still really overwhelming to feel like I needed to debate — everyone.

Rails’ ideologies of simplicity differ a lot from Capybara’s ideology of lots of features. And all the individuals who were interested in the feature had differing opinions as well… I struggled with how to respect everyone’s opinions while building system tests, but also maintaining my sense of ownership.

I knew that if I tried to please all groups and build systems tests by consensus, then I would end up pleasing no one. Everyone would end up unhappy because consensus is the enemy of vision. Sure, you end up adding everything everyone wants, but the feature will lose focus, and the code will lose style, and I will lose everything that I felt like was important.

I needed to figure out a way to respect everyone’s opinions without making systems tests a hodgepodge of ideologies or feeling like I threw out everything I cared about. I had to remind ourselves that we all had one goal: to integrate systems testing into Rails. Even if we disagreed about the implementation, this was our common ground.

With this in mind, there are a few ways you can keep your sanity when dealing with multiple ideologies in the open source world. One of the biggest things is to manage expectations. In open source there are no contracts, you can’t hold anyone else accountable (except for yourself) and nobody else is going to hold you accountable either… You are the person who has to own the scope, and you are the person who has to say ‘no’. There were a ton of extra features suggested for systems tests that I would love to see, but if I had implemented all of them it still wouldn’t be in Rails today. I had to manage the scope and the expectations of everyone involved to keep the project in budget…

…When you are building open source features, you are building something for others. If you are open to suggestions the feature might change for the better. Even if you don’t agree, you have to be open to listening to the other side of things. It’s really easy to get cagey about the code that you’ve worked so hard to write. I still have to fight the urge to be really protective of systems test code… but I also have to remember that it’s no longer mine, and never was mine, it now belongs to everyone that uses Rails….

I knew that if I tried to please all groups and build systems tests by consensus, then I would end up pleasing no one. Everyone would end up unhappy because consensus is the enemy of vision. Sure, you end up adding everything everyone wants, but the feature will lose focus, and the code will lose style…



Label Printing & July 9 WMS API Install / OCLC Dev Network

As OCLC first notified the community in April, WMS APIs will be upgraded on July 9, 2017 to add security enhancements, which will affect libraries that rely on the label printing application from the University of New Mexico. The changes to the APIs are not backward-compatible with the current UNM application and will break this functionality.  OCLC and UNM have worked together to update the code and ensure its compatibility with the scheduled changes to the WMS APIs.

What is the difference between budget, spending and procurement data? / Open Knowledge Foundation

Fiscal data is a complex topic. It comes in many different kinds of formats and languages, its availability cannot be taken for granted, and its complexity means special skills and knowledge are needed to unlock and fully understand it. The Global Open Data Index (GODI) assesses three fiscal areas of national government: budgets, spending, and procurement.

Our team repeatedly receives questions about why some countries rank low in budgets, public procurement, or spending even though fiscal data is openly communicated. The quick answer: we often find information that is related to this data but does not exactly describe it in accordance with the GODI data requirements. It appears to us that a clarification of the different types of fiscal data is needed. This blog post is dedicated to shedding light on some of these questions.

As part of our public dialogue phase, we also want to address our experts in the community. How should we continue to measure the status of these three key datasets in the future? Your input counts! Should we set the bar lower for GODI and avoid measuring transactional spending data at all? Is our assessment of transactional spending useful for you? You can leave us your feedback or join the discussion on this topic in our forum.

The different types of fiscal data

A government budget year produces different fiscal data types.

Budgeting is the process where a government body sets its priorities as to how it intends to spend an amount of money over a specific time period (usually annually or semi-annually). Throughout the budgeting cycle (the process of defining the budget), an initial budget can undergo revisions, resulting in a revised budget.

Spending is the process of giving away money. This means the money might be given as a subsidy, a contract payment, a refundable tax credit, a pension, or a salary.

Procurement is the process of selecting services from the supplier who best fits the need. That might involve selecting vendors, establishing payment terms, and running a strategic tender or other vetting mechanism meant to prevent corruption.

Not only are the processes linked to each other, the data describing these processes can be linked too (e.g. in cases where identifiers exist linking spending to government budgets and public contracts). For laypersons, it might be difficult to tell the difference when they are confronted with a spending or procurement dataset: Is the money I see in a dataset spending, or part of contracting? The following paragraphs explain the differences.

Budget

As mentioned above, budgeting is the process whereby a government body decides how to spend money over a certain time period. The total amount is broken into smaller amounts (budget items), which can be classified as follows:

  • Administrative (which government sub-unit gets the money)
  • Functional (what the money is going to be used for)
  • Economic (how the money is going to be used, e.g., procurement, subsidies, salaries etc.)
  • Financing source (where the money should come from).

After the budget period ends, we know how much money was actually spent on each item – in theory. The Global Open Data Index assesses budget information at the highest administrative level (e.g. national government, federal government), broken down by one of these classifications. Here is an example of some fully open budget data from Argentina’s national government.

Example of Argentina’s national government budget 2017 (table shortened and cleaned)

The image shows the government entity, and expenditures split by economic classification (how the money is used). At the far right, we can see a column describing the total amount of money effectively spent on a planned budget expenditure. It basically compares allocated and paid money. This column must not be confused with spending information at the transactional level (which displays each single transaction from a government unit to a recipient).

Spending

The Spending Data Handbook describes spending as “data relating to the specific expenditure of funds from the government”. Money might be given as a subsidy, as payment for a provided service, a salary (although salaries will seldom be published on a transactional level), a pension fund payment, a contract or a loan, to name just a few.
GODI focuses on transactions of service payments (often resulting from a prior procurement process). Monetary transactions are our baseline for spending data. GODI assesses the following information (a minimal sketch of such a transaction record follows the list):

  • The amount that was transferred
  • The recipient (an entity external to the government unit)
  • When the transaction took place
  • Government office paying the transaction
  • Data split into individual transactions
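To make these fields concrete, here is a minimal sketch (not GODI’s actual tooling; the field names are illustrative assumptions) of how a single transaction-level spending record could be represented and checked in Python:

```python
# Illustrative sketch of a transaction-level spending record covering the
# fields listed above. Field names are assumptions, not a GODI specification.
from dataclasses import dataclass
from datetime import date


@dataclass
class SpendingTransaction:
    amount: float        # the amount that was transferred
    recipient: str       # an entity external to the government unit
    paid_on: date        # when the transaction took place
    paying_office: str   # government office paying the transaction
    transaction_id: str  # identifies the individual transaction


def is_transaction_level(record: SpendingTransaction) -> bool:
    """Rough check that a record describes one dated payment to a named recipient."""
    return (bool(record.transaction_id) and bool(record.recipient)
            and record.paid_on is not None and record.amount > 0)


# Example: a single (made-up) payment from a government office to an external supplier.
payment = SpendingTransaction(42500.00, "Acme Facilities Ltd", date(2017, 3, 14),
                              "Cabinet Office", "TX-0001")
print(is_transaction_level(payment))  # True
```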

GODI exclusively looks for single payment transfers. The reason we look at this type of data is that spending patterns can be detected, and fraud or corruption uncovered. Some of the questions one might be able to address include: Who received what amount of money? Could government get its services from a cheaper service provider? Is government contracting to a cluster of related companies (supporting cartels)?

GODI’s definition of spending data, even though ambitious in scope, does not consider the entire spectrum of transactional spending data. Being produced by many agencies, spending data is scattered  across different places online. We usually pick samples of specific spending data such as government payments to external suppliers (e.g. the single payments through a procurement process). Other types of payment, such as grants, loans or subsidies are then left aside.

Our assessment is also ‘generous’ because we accept spending data that is only published above a certain threshold. The British Cabinet Office, a forerunner in disclosing spending data, only publishes data above £25,000. GODI accepts this as valid, even though we are aware that spending data below this amount remains opaque. There are also many more ways to expand GODI’s definition of spending data. For instance, we could ask if each transaction can be linked to a budget item or procurement contract so that we understand the spending context better.

Example image of British Spending data (Cabinet Office spending over £25,000)

Above is an example image of Great Britain’s Cabinet Office spending. You can see the date and the amount paid by the government entity. Using the supplier name, we can track how much money was paid to the supplier. However, this data provides no contract ID or contract name that would allow one to fully understand which contracts these payments were made under.

Procurement

When purchasing goods and services from an external source, government units require a certain process for choosing the supplier who best fits the need. This process is called procurement and includes planning, tendering, awarding, contracting, and implementation. The goals are to enable fair competition among service providers and to prevent corruption.

Many data traces can shed light on each procurement stage. For example, one might want to understand from which budget a service will be paid, or what amount of money has been awarded (with some negotiation possible) or finally contracted to a supplier. This blogpost by the Open Contracting Partnership illustrates how each of the procurement stages can be understood through different data.

GODI focuses on two essential stages that are considered a good proxy for understanding procurement. These, however, do not capture all information.

Tender phase

  • Tenders per government office
  • Tender name
  • Tender description
  • Tender status

Award phase

  • Awards per government office
  • Award title
  • Award description
  • Value of the award
  • Supplier’s name

Any payment resulting from government contracts with external suppliers (sometimes a single payment, sometimes several) should be captured in government spending. For example, there might be a construction contractor that is paid by milestone, or an office supplies dealer chosen as a supplier. Each spending transaction is then for a specific item purchased through a procurement process.
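As an illustration of why shared identifiers matter here, the sketch below (with made-up data and field names) groups transaction-level spending under the awards it follows from by matching supplier names; a contract ID shared between the two datasets would make the join far more reliable than name matching:

```python
# Illustrative only: connect spending transactions to procurement awards by
# supplier name. Data and field names are invented for this example.
from collections import defaultdict

awards = [
    {"award_id": "AW-17-001", "supplier": "Acme Construction", "value": 1200000},
    {"award_id": "AW-17-002", "supplier": "Paper & Pens Ltd", "value": 45000},
]

spending = [
    {"supplier": "Acme Construction", "amount": 300000, "note": "milestone 1"},
    {"supplier": "Acme Construction", "amount": 300000, "note": "milestone 2"},
    {"supplier": "Paper & Pens Ltd", "amount": 4500, "note": "office supplies, March"},
]

# Sum individual payments per supplier, then compare against the awarded value.
paid_by_supplier = defaultdict(float)
for tx in spending:
    paid_by_supplier[tx["supplier"]] += tx["amount"]

for award in awards:
    paid = paid_by_supplier.get(award["supplier"], 0.0)
    print(f"{award['award_id']}: awarded {award['value']:,} / paid so far {paid:,.0f}")
```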

Below you can see a procurement database from Thailand. It displays procurement phases, but does not display the individual transactions that follow from them. This particular database does not represent actual spending data (monetary transactions), but the preceding stages of the contracting process. Despite this, the platform is misleadingly called “Thailand Government Spending”.

Procurement database in Thailand

Another example is a procurement database indicating how much money has been spent on a contract:

Example for the procurement website ‘Cuánto y a quién se contrató’ (Colombia)

The road ahead – how to measure spending data in the future

Overall, there is slow but steady progress around the openness of fiscal data. Increasingly, budget and procurement data is provided in machine-readable formats or openly licensed, sometimes presented on interactive government portals or as raw data (see, for example, the most recent blogpost of the Open Contracting Partnership on open procurement data).

Yet there is a long way to go for transactional spending data. Governments are taking laudable first steps by creating budget or procurement websites which show how much money will be or has been spent in total. These may be confusingly named ‘spending’ portals when in fact they reflect other government processes such as budgeting (how much money should be spent) or procurement (how much money it has been agreed to pay for an external service). The actual spending, in the form of single monetary transactions, is missing. And to date there is no coherent standard or specification that would make it easier to document transactional spending.

We want to address our experts in the community. How should we continue to measure the status of these three key datasets in the future? Your input counts!  You can leave us your feedback and discuss this topic in our forum.

 

This blog was jointly written by Danny Lämmerhirt, Diana Krebs (Project Manager for Fiscal Projects at Open Knowledge International), and Adam Kariv (OpenSpending Technical Lead and Engineering Lead at Open Knowledge International).

Open Data Day events, MyData Japan 2017 and other OK Japan updates / Open Knowledge Foundation

This blog post is part of our on-going Network series featuring updates from chapters across the Open Knowledge Network and was written by the Open Knowledge Japan team.

International Open Data Day

We had a lot of localities joining International Open Data Day (IODD) – the international website for IODD shows 42 localities in Japan, but our listing shows 65. OK Japan members helped promote the event via a pre-event, social media, and the Japanese website.

We saw a lot of discussions, hackathons, and some mapping parties, among others. Many ‘Code For’ groups were involved in hosting events.

Open Knowledge Japan Award at VLED

Annually, OK Japan joins a group of other organisations celebrating major and noteworthy achievements in open data in Japan, by issuing unsolicited awards to whoever we think deserves them. We are happy to share that this year OK Japan awarded the digitisation project of classic Japanese materials by the National Institute of Japanese Literature and the Center for Open Data in the Humanities. Their dataset includes some cookbooks from the Edo Period, and some recipes have been adapted into modern Japanese and released in the Edo period recipe section of Cookpad, the largest recipe sharing site in Japan.

This year’s awardees (listed in Japanese) include the legislators who worked on the basic law for promoting government and private sector data use, which now provides a legal basis for open data (see below) and received the top award; health-related open data by the Ministry of Health, Labor, and Welfare; a one-stop search across meeting minutes and transcripts of prefectural and major city legislatures by Bitlet and Yasuo Oda; and many more.

Basic law to promote data use, including open data

The Japanese Parliament passed a law on data use in early December 2016. Under the law, the Japanese government creates a new high-level committee to promote data usage. National and prefectural governments are required under this law to develop their plans to disseminate easily usable data online. Municipal governments and private sector businesses are also expected to make efforts to help the cause. The goal is to gain economic benefits.

MyData Japan 2017

Inspired by the event hosted by OK Finland, MyData 2016, some attendees and others interested in the proper and active use of personal data have decided to hold MyData Japan 2017. The OK Japan Chapter will serve as the host and organiser of this whole-day event, which takes place on 19 May 2017 in Tokyo.

Contact Tomoaki Watanabe [watanabe@glocom.ac.jp], the coordinator of Open Knowledge Japan, for more information regarding their events and activities.

Infradata / Ed Summers


Screenshot of images of Arpanet Maps from Google Images

If you enjoy reading about the relationship between data and metadata and/or the history of the Internet you’ll definitely want to check out Bradley Fidler and Amelia Acker’s excellent paper Metadata, infrastructure, and computer-mediated communication in historical perspective (Fidler & Acker, 2016). If you need a copy drop me an email.

In this paper Acker and Fidler develop the idea of infradata, a specific kind of metadata that is required to maintain infrastructure. They use this idea to examine the evolution of the Arpanet by taking a close look at the development of the Host/Host Protocol that allowed computers to connect to each other. The source for this history is found in the IETF RFCs, many of which were originally circulated in hard copy but have been digitized and made available online.

In my work with metadata as a software developer I’ve always been biased to metadata that is immediately useful. I’ve found focusing on use helps ground and scope discussions about what metadata standards should be. For example during my work on the National Digital Newspaper Project I had to work with a data specification that was quite complicated, and (I felt) raised the bar unreasonably high for data producers (awardees). I worked on a team that built a website that provided access to the historical newspaper content. In building that application we only used a small fraction of the metadata that awardees were required to produce. The specification struck me as unnecessarily complicated at the time, and perhaps it still is.

But maybe I was overly focused on the infradata, or the data that was required for me to build an access system, and not seeing the larger picture that includes the (sometimes unknown) requirements of digital preservation. Future scenarios when knowing what camera was used to image the microfilm frame was actually important are easy to talk about, but they can also expand to fill up all available space.

At the time I comforted myself with the idea that “digital preservation is access in the future” (Summers, 2013) and therefore focusing on access in the present was the best way to ensure preservation. But now I’m not so sure. This idea of infradata highlights that while some metadata is actively used to maintain the system it is a part of, not all of it is, or necessarily should be.

Acker and Fidler got me thinking about the difficulties of studying these systems historically. The IETF has done such a great job of publishing RFCs over its history. But I wonder how easy it is to get at the stories around these specifications, and their various versions?

Since I’m actively engaged in a project to think about the preservation of social media, I began thinking about how the metadata in a tweet has changed over the years. Projects like Hitch make it possible to look at how APIs like Twitter’s change over time. Documentation is sometimes available in the Internet Archive where it can be used to bring historical snapshots of documentation back to life. I thought it could be useful to create a bot that watches the Twitter sample stream and notices any new or changed metadata in the JSON for a Tweet. If you are interested you can follow it at (???).

Here’s how it works. The bot watches the Twitter sample stream, and for each tweet it creates a blueprint of the data. It then compares this blueprint against a master blueprint, and announces any new or changed data properties on Twitter. The master blueprint is really just a snapshot of all the previous data fields the bot has seen, which is currently 1229 fields that you can see here.

The blueprint uses a jq-like syntax to represent each path in the JSON data. It’s a bit more difficult to notice when fields are removed because not all tweets contain all fields. Just because a given field isn’t present in a tweet doesn’t mean it has been removed. I guess the bot could keep some kind of timestamp associated with each field and then if it grows really stale (like months?) it could assume that it has been removed? That was a bit more adventurous for the time I had available to test this idea out.
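The post doesn’t include the bot’s code, but a minimal sketch of the core idea might look like the following: flatten each tweet’s JSON into jq-style paths, then report any paths the master blueprint hasn’t seen. The function and variable names here are my own, not the bot’s.

```python
# Sketch of the blueprint idea: flatten a tweet's JSON into jq-like paths and
# report any paths not yet in the master set. Names are illustrative only.
import json


def blueprint(obj, prefix=""):
    """Yield jq-like paths (e.g. .user.screen_name, .entities.urls[]) for a JSON value."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from blueprint(value, f"{prefix}.{key}")
    elif isinstance(obj, list):
        for item in obj:
            yield from blueprint(item, f"{prefix}[]")
    else:
        yield prefix or "."


master = {".id_str", ".text", ".user.screen_name"}  # paths seen so far

tweet = json.loads('{"id_str": "1", "text": "hi", '
                   '"user": {"screen_name": "edsu", "verified": false}}')
new_paths = set(blueprint(tweet)) - master
print(sorted(new_paths))  # ['.user.verified'] -- a field the master blueprint hasn't seen
```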

Anyway, I thought I’d write this up briefly here as a small example of how research can be generative for software projects. I hope that it can work the other way sometimes too.

References

Fidler, B., & Acker, A. (2016). Metadata, infrastructure, and computer-mediated communication in historical perspective. Journal of the Association for Information Science and Technology. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/asi.23660/abstract

Summers, E. (2013). The web as a preservation medium. Retrieved from https://inkdroid.org/2013/11/26/the-web-as-a-preservation-medium/

Recording Available–Powering Linked Data and Hosted Solutions with Fedora / DuraSpace News

On May 16, 2017 Fedora community members presented a webinar entitled "Powering Linked Data and Hosted Solutions with Fedora." David Wilcox, Fedora Product Manager with DuraSpace, provided an overview of Fedora with a focus on its native linked data capabilities. Hannah Frost, Manager, Digital Library Product and Service Management with Stanford University, presented Hyku, the Hydra-in-a-box repository product, which provides a hosted option for Fedora-based repositories.

Jobs in Information Technology: May 17, 2017 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

City of Logan, Library/Community Center Director, Logan, UT

Wesleyan University, Associate University Librarian for Technical and Digital Services, Middletown, CT

Center for Digital Humanities / Princeton University Library, Digital Humanities Project Manager, Princeton, NJ

New York University, Division of Libraries, Digital Scholarship Librarian, New York, NY

Princeton University Library, Metadata Librarian, Spanish/Portuguese Specialty, Princeton, NJ

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Internet Association debunks claims that strong Net Neutrality protections hurt internet investment / District Dispatch

Some opponents of the FCC’s 2015 Open Internet Order claim the order created a regulatory environment that kept Internet Service Providers from investing and building better broadband. Today, the Internet Association (IA)’s Chief Economist responded, finding that ISPs continue to invest and innovate at similar or greater levels in the current regulatory environment, including after Title II reclassification of internet services. IA will release its full research paper on internet service provider (ISP) investment in the coming months. Using multiple sources, IA demonstrates that strong net neutrality protections have NOT harmed investment or innovation in our broadband networks. Some key findings include:

  • SEC filings show a 5.3% or $7.3 billion increase in telecom investment among publicly traded companies from 2013-14 to 2015-16;
  • OECD and U.S. Telecom data show a 5.1% or $4.7 billion increase in telecom investment in 2014 to 2015; and
  • SNL Kagan and NCTA: the Internet and Television Association data show a 56% or $89.9 Billion increase in cable investment from 2009 to 2016.

The Internet Association represents many of the largest and most rapidly growing internet companies. Find IA’s Net Neutrality fact sheet here.

Tomorrow, the FCC will vote on a proposed rulemaking that would begin to undo strong net neutrality protections. The ALA has and will continue to advocate for strong, enforceable net neutrality protections. You can watch the FCC’s Open Meeting live here beginning at 10:30 a.m. EDT.

The post Internet Association debunks claims that strong Net Neutrality protections hurt internet investment appeared first on District Dispatch.

Two FRBRs, Many Relationships / Karen Coyle

There is tension in the library community between those who favor remaining with the MARC21 standard for bibliographic records, and others who are promoting a small number of RDF-based solutions. This is the perceived conflict, but in fact both camps are looking at the wrong end of the problem - that is, they are looking at the technology solution without having identified the underlying requirements that a solution must address. I contend that the key element that must be taken into account is the role of FRBR in cataloging and catalogs.

Some background:  FRBR is stated to be a mental model of the bibliographic universe, although it also has inherent in it an adherence to a particular technology: entity-relation analysis for relational database design. This is stated fairly clearly in the  introduction to the FRBR report, which says:

The methodology used in this study is based on an entity analysis technique that is used in the development of conceptual models for relational database systems. Although the study is not intended to serve directly as a basis for the design of bibliographic databases, the technique was chosen as the basis for the methodology because it provides a structured approach to the analysis of data requirements that facilitates the processes of definition and delineation that were set out in the terms of reference for the study. 

The use of an entity-relation model was what led to the now ubiquitous diagrams that show separate entities for works, expressions, manifestations and items. This is often read as a proposed structure for bibliographic data, where a single work description is linked to multiple expression descriptions, each of which in turn link to one or more manifestation descriptions. Other entities like the primary creator link to the appropriate bibliographic entity rather than to a bibliographic description as a whole. In relational database terms, this would create an efficiency in which each work is described only once regardless of the number of expressions or manifestations in the database rather than having information about the work in multiple bibliographic descriptions. This is seen by some as a potential efficiency also for the cataloging workflow as information about a work does not need to be recreated in the description of each manifestation of the work.

Two FRBRs


What this means is that we have (at least) two FRBRs: the mental model of the bibliographic universe, which I'll refer to as FRBR-MM; and the bibliographic data model based on an entity-relation structure, which I'll refer to as FRBR-DM. These are not clearly separated in the FRBR final report and there is some ambiguity in statements from members of the FRBR working group about whether both models are intended outcomes of the report. Confusion arises in many discussions of FRBR when we do not distinguish which of these functions is being addressed.

FRBR-Mental Model


FRBR-MM is the thinking behind the RDA cataloging rules, and the conceptual entities define the structure of the RDA documentation and workflow. It instructs catalogers to analyze each item they catalog as being an item or manifestation that carries the expression of a creative work. There is no specific data model associated with the RDA rules, which is why it is possible to use the mental model to produce cataloging that is entered into the form provided by the MARC21 record; a structure that approximates the catalog entry described in AACR2.

In FRBR-MM, some entities can be implicit rather than explicit. FRBR-MM does not require that a cataloguer produce a separate and visible work entity. In the RDA cataloging coded in MARC, the primary creator and the subjects are associated with the overall bibliographic description without there being a separate work identity. Even when there is a work title created, the creator and subjects are directly associated with the bibliographic description of the manifestation or item. This doesn't mean that the cataloguer has not thought about the work and the expression in their bibliographic analysis, but the rules do not require those to be called out separately in the description. In the mental model you can view FRBR as providing a checklist of key aspects of the bibliographic description that must be addressed.

The FRBR report defines bibliographic relationships more strongly than previous cataloging rules. For her PhD work, Barbara Tillett (a principal on both the FRBR and RDA work groups) painstakingly viewed thousands of bibliographic records to tease out the types of bibliographic relationships that were noted. Most of these were implicit in free-form cataloguer-supplied notes and in added entries in the catalog records. Previous cataloging rules said little about bibliographic relationships, while RDA, using the work of Tillett which was furthered in the FRBR final report, has five chapters on bibliographic relationships. In the FRBR-MM encoded in MARC21,  these continue to be cataloguer notes ("Adapted from …"), subject headings ("--adaptations"), and added entry fields. These notes and headings are human-readable but do not provide machine-actionable links between bibliographic descriptions. This means that you cannot have a system function that retrieves all of the adaptations of a work, nor are systems likely to provide searches based on relationship type, as these are buried in text. Also, whether relationships are between works or expressions or manifestations is not explicit in the recorded data. In essence, FRBR-MM in MARC21 ignores the separate description of the FRBR-defined Group 1 entities (WEMI), flattening the record into a single bibliographic description that is very similar to that produced with AACR2.

FRBR-Data Model


FRBR-DM adheres to the model of separate identified entities and the relationships between them. These are seen in the diagrams provided in the FRBR report, and in the section on bibliographic relationships from that report. The first thing that needs to be said is that the FRBR report based its model on an analysis that is used for database design. There is no analysis provided for a record design. This is significant because databases and records used for information exchange can have significantly different structures. In a database there could be one work description linked to any number of expressions, but when exchanging information about a single  manifestation presumably the expression and work entities would need to be included. That probably means that if you have more than one manifestation for a work being transmitted, that work information is included for each manifestation, and each bibliographic description is neatly contained in a single package. The FRBR report does not define an actual database design nor a record exchange format, even though the entities and relations in the report could provide a first step in determining those technologies.

FRBR-DM uses the same mental model as FRBR-MM, but adds considerable functionality that comes from the entity-relationship model. FRBR-DM implements the concepts in FRBR in a way that FRBR-MM does not. It defines separate entities for work, expression, manifestation and item, where MARC21 has only a single entity. FRBR-DM also defines relationships that can be created between specific entities. Without actual entities some relationships between the entities may be implicit in the catalog data, but only in a very vague way. A main entry author field in a MARC21 record has no explicit relationship to the work concept inherent in the bibliographic description, but many people's mental model would associate the title and the author as being a kind of statement about the work being described. Added entries may describe related works but they do not link to those works.
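A toy example may make the contrast concrete. The sketch below uses rdflib with a made-up namespace and property names (not the actual RDA Registry or BIBFRAME vocabularies) to give a work, expression, and manifestation their own identifiers and to state an explicit, machine-actionable relationship between two works:

```python
# Toy sketch of the entity-relation approach: separately identified WEMI entities
# with explicit relationships. The namespace and property names are invented for
# illustration; they are not the RDA Registry or BIBFRAME vocabularies.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/bib/")
g = Graph()

work = EX["work/moby-dick"]
expression = EX["expression/moby-dick-english-text"]
manifestation = EX["manifestation/moby-dick-1851-harper"]
adaptation = EX["work/moby-dick-1956-film"]

g.add((work, RDF.type, EX.Work))
g.add((work, EX.creator, EX["person/melville-herman"]))   # creator attaches to the work
g.add((expression, RDF.type, EX.Expression))
g.add((expression, EX.realizes, work))                    # expression -> work
g.add((manifestation, RDF.type, EX.Manifestation))
g.add((manifestation, EX.embodies, expression))           # manifestation -> expression
g.add((adaptation, EX.adaptationOf, work))                # explicit work-to-work relationship

print(g.serialize(format="turtle"))
```

In a MARC21 record, by contrast, the adaptation relationship would typically live in a free-text note or added entry, with no identifier a system could follow.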

The FRBR-DM model was not imposed on the RDA rules, which were intended to be neutral as to the data formats that would carry the bibliographic description. However, RDA was designed to support the FRBR-DM by allowing for individual entity descriptions with their own identifiers and for there to be identified relationships between those entities. FRBR-DM proposes the creation of a work entity that can be shared throughout the bibliographic universe where that work is referenced. The same is true for all of the FRBR entities. Because each entity has an identified existence, it is possible to create relationships between entities; the same relationships that are defined in the FRBR report, and more if desired. FRBR-DM, however, is not supported by the MARC21 model because MARC21 does not have a structure that would permit the creation of separately identified entities for the FRBR entities. FRBR-DM does have an expression as a data model in the RDA Registry. In the registry, RDA is defined as an RDF vocabulary in parallel with the named elements in the RDA rule set, with each element associated with the FRBR entity that defines it in the RDA text. This expression, however, so far has only one experimental system implementation in RIMMF. As far as I know, no libraries are yet using this as a cataloging system.

The replacement proposed by the Library of Congress for the MARC21 record, BIBFRAME, makes use of entities and relations similar to those defined in FRBR, but does not follow FRBR to the letter. The extent to which it was informed by FRBR is unclear but FRBR was in existence when BIBFRAME was developed. Many of the entities defined by FRBR are obvious, however, and would be arrived at by any independent analysis of bibliographic data: persons, corporate bodies, physical descriptions, subjects. How BIBFRAME fits into the FRBR-MM or the FRBR-DM isn't clear to me and I won't attempt to find a place for it in this current analysis. I will say that using an entity-relation model and promoting relationships between those entities is a mainstream approach to data, and would most likely be the model in any modern bibliographic data design.


MARC v RDF? 


The decision we are facing in terms of bibliographic data is often couched in terms of "MARC vs. RDF"; however, that is not the actual question that underlies that decision. Instead, the question should be couched as: entities and relations, or not? If you want to share entities like works and persons, and if you want to create actual relationships between bibliographic entities, something other than MARC21 is required. What that "something" is should be an open question, but it will not be a "unit record" like MARC21.

For those who embrace the entity-relation model, the perceived "rush to RDF" is not entirely illogical; RDF is the current technology that supports entity-relation models. RDF is supported by a growing number of open source tools, including database management and indexing. It is a World Wide Web Consortium (W3C) standard, and is quickly becoming a mainstream technology used by communities like banking, medicine, and academic and government data providers. It also has its down sides: there is no obvious support in the current version of RDF for units of data that could be called "records" - RDF only recognizes open graphs; RDF is bad at retaining the order of data elements, something that bibliographic data often relies upon. These "faults" and others are well known to the W3C groups that continue to develop the standard and some are currently under development as additions to the standard.

At the same time, leaping directly to a particular solution is bad form. Data development usually begins with a gathering of use cases and requirements, and technology is developed to meet the gathered requirements. If it is desired to take advantage of some or all of the entity-relation capabilities of FRBR, the decision about the appropriate replacement for MARC21 should be based on a needs analysis. I recall seeing some use cases in the early BIBFRAME work, but I also recall that they seemed inadequate. What needs to be addressed is the extent to which we expect library catalogs to make use of bibliographic relationships, and whether those relationships must be bound to specific entities.

What we could gain by developing use cases would be a shared set of expectations that could be weighed against proposed solutions. Some of the aspects of what catalogers like about MARC may feed into those requirements, as well as what we wish for in the design of the future catalog. Once the set of requirements is reasonably complete, we have a set of criteria against which to measure whether the technology development is meeting the needs of everyone involved with library data.

Conclusion: It's the Relationships


The disruptive aspect of FRBR is not primarily that it creates a multi-level bibliographic model between works, expressions, manifestations, and items. The disruption is in the definition of relationships between and among those entities, which requires those entities to be separately identified. Even the desire to share work and expression descriptions separately can most likely be met by identifying the pertinent data elements within a unit record. But the bibliographic relationships defined in FRBR and RDA, if they are to be actionable, require a new data structure.

The relationships are included in RDA but are not implemented in RDA in MARC21, basically because they cannot be implemented in a "unit record" data format. The key question is whether those relationships (or others) are intended to be included in future library catalogs. If they are, then a data format other than MARC21 must be developed. That data format may or may not implement FRBR-defined bibliographic relationships; FRBR was a first attempt to redefine a long-standing bibliographic model and should be considered the first, not the last, word in bibliographic relationships.

If we couch the question in terms of bibliographic relationships, not warring data formats, we begin to have a way to go beyond emotional attachments and do a reasoned analysis of our needs.

Hospital Waiting List – Open Knowledge Ireland Workshop #1 / Open Knowledge Foundation

This blog post is part of our on-going Network series featuring updates from chapters across the Open Knowledge Network and was written by the Open Knowledge Ireland team.

This post was first published on 12th April 2017 by Flora Fleischer on OK Ireland’s website: https://openknowledge.ie/hwl1/

On the sunny Saturday of March 25th, Open Knowledge Ireland held a workshop powered by citizens which focused on discovering how Open Data can help the ever present Hospital Waiting List problem. With the workshop, we created a space to build engagement around open data and hospital waiting lists and offered participants a practical way to get involved. The workshop was possible because, in December 2016, the National Treatment Purchase Fund (NTPF) published Hospital Waiting List Data on data.gov.ie as machine readable data for the first time. Hospital Waiting List data can now be found here, here, and here.

Hospital Waiting List Workshop #1 focused on identifying and discovering the patient journey, the data that is available, an operating model for use case creation using open data, and a long list of potential use cases for prioritisation at Hospital Waiting List Citizen Workshop #2.

The workshop benefited from having experienced professionals from a range of new and disruptive industries and fields of expertise. On the day OK Ireland facilitated Data Analysts, Customer Experience SMEs, Technology Solution Consultants, Digital Natives, Students, and Coders. We also provided Open Data insights from Ireland and abroad and framed the topic for the day – ways of using open data to address the growing Hospital Waiting Lists in Ireland.

Here is an account from Piush Vaish, a citizen participant at the 1st Hospital Waiting List workshop, about how the day went. The post first appeared on his LinkedIn page.

Ways to Improve Hospital Waiting List Using Open Data

Ireland has one of the worst hospital waiting lists among developed countries. We all have experienced, or know someone who has experienced, the uncertainty about how long the wait will be before seeing a specialist. We constantly wonder about our health while we wait, which affects not only our physical but also our mental health. For instance, I had to wait overnight to be seen by a specialist at Beaumont hospital.

Therefore, when an opportunity came to tackle the problem of hospital waiting lists using data, I had to do something. That chance came through a workshop/hackathon organised by Open Knowledge Ireland on 25th March 2017. It was the first in a series of hospital waiting list focused workshops held at the Guinness Enterprise Center. Open Knowledge Ireland is a part of Open Knowledge International with the goal of opening all essential public interest information. It is a non-profit organisation dedicated to promoting open data and open content in all forms to create insights that drive change and benefit the public at large.

When I arrived at the venue there was a short session where we got to know the other participants over a cup of tea and biscuits. The participants came from different backgrounds with various skill sets and industry experience. Some of them were UX designers, web/application developers, statisticians, past participants, and data scientists. However, we all had one reason to be at the workshop.

The motivation was to tackle a very real social problem as a group of experts and for our citizens by using public data about hospital waiting lists to make that information easily accessible for everybody.

Afterwards, we moved to a specially set-up meeting room to learn about the work of Open Knowledge Ireland, what open data is, and the reasons why we should be interested in the hospital waiting list data.

Open Knowledge Ireland explained their mission, vision, and values. The hospital waiting list datasets are produced by the NTPF. Since July 2012, the NTPF has been responsible for the publication of outpatient and inpatient waiting lists. However, they originally published this data in PDF format, which is not considered an ‘open’ data format and limits the usability of the data.

Hence, Open Knowledge Ireland has worked over the last two years to create examples of how the Out-Patient Waiting List and Inpatient/Day Case Waiting List can be published in an easily accessible format. They also worked together with the NTPF and the Department of Public Expenditure and Reform to get this data published in machine readable format. In December 2016 hospital waiting list data was for the first time made available in machine readable format on data.gov.ie. This now enables anyone to download the datasets and do any sort of analysis on them.

The format of the workshop was an unconference, or open space conference. It was my first time attending such a conference. We were given a problem statement but we were free to tackle it in any way the group thought most useful to better understand the problem. The agenda was driven by the participants and their expertise in technology, user experience design, digital, and analytics, as well as their backgrounds in various industries.

There were no narrow topics pre-determined, no keynote speakers invited and no panel had been arranged – so the workshop was very interactive and very much driven by the participants themselves. The topics to be discussed were refined, through the participation of the attendees, into problem statements that could be tackled and looked at in one day. If a session did not inspire an attendee or they were not contributing, they were free to get up and find a different group. This enabled everyone to leverage and play to their strengths, do research, and contribute to understanding the problem statement based on their own experience.

We convened at the individual breakout sessions to discuss the progress of each working group and share learnings between the working groups. In my opinion, this process helped to apply ideas and empowered participants to share their abilities. It offered an opportunity for an unfiltered exchange of creative ideas.

My first working group was mapping the patient’s journey, from first getting a symptom to being diagnosed by a specialist. The aim was to document the end-to-end experience of the patient from their perspective, understand how patients interact with their general practitioner or hospital, find pain points, identify areas for improvement, and improve the experience moving forward.

mapping a patient’s journey: from getting a symptom to being diagnosed by a specialist

The visualisation inspired us to seek value-driven decisions based on a patient’s experience model of performance.

Another group mapped a patient’s journey from A&E, how this journey is currently tracked, and how the data is collated by one specific hospital. This was to understand the pain points that hospitals may come across when gathering and providing the data.

Later, we swapped our findings to create a complete picture of the patient’s journey.

I then swapped from the journey mapping group to another group that was working on data validation. It was essential for the long-term success of the project that the data be open, correct, and useful.

We checked that the data gathered by the NTPF followed data/statistical standards. While I was engaging with different groups, the other participants were engaged in data analysis, creating an API, and researching the problem in other countries. The figure below shows an early view of the type of insights that can be generated using the hospital waiting list data that is available on data.gov.ie today.

We also had a short video presentation by Bob Harper from Detail Data, who created the Hospital Waiting List Dashboards that are available for Northern Ireland. He explained how he is using the data provided by the NHS on his website to present information in a way that is more easily accessible to and understandable by the public in Northern Ireland.

At the end of the day, we all presented our findings to the group and decided what we’ll focus on during the next workshop.

Some of the points we aim to discuss in the next workshop are:

  • Understand existing Hospital Wait Time data publicly available in the Republic of Ireland
  • Understand and highlight data gaps
  • Recommend additional data points required to build tools useful to citizens (suggest via data.gov.ie)
  • Identify quick-win use cases and begin prototyping
  • Identify more complex use cases and next steps

If you are inspired by what we have achieved and interested in continuing the journey to empower the public, please register your interest by attending the next workshop: Hospital Waiting List Citizen Workshop #2.

Contact: flora.fleischer@openknowledge.ie

Preservation in Practice: A Survey of New York City Digital Humanities Researchers / In the Library, With the Lead Pipe

In Brief

Digital Humanities (DH) describes the emerging practice of interpreting humanities content through computing methods to enhance data gathering, analysis, and visualization. Due to factors including scale, complexity, and uniqueness, the products of DH research present unique challenges in the area of preservation. This study collected data with a survey and targeted interviews given to New York City metro area DH researchers intended to sketch a picture of the methods and philosophies that govern the preservation efforts of these researchers and their institutions. Due to their familiarity with evolving preservation principles and practices, librarians are poised to offer expertise in supporting the preservation efforts of digital humanists. The data and interviews described in this report help explore some of the current practices in this area of preservation, and suggest inroads for librarians as preservation experts.

By Malina Thiede (with significant contributions from Allison Piazza, Hannah Silverman, and Nik Dragovic)

Introduction

If you want a definition of Digital Humanities (DH), there are hundreds to choose from. In fact, Jason Heppler’s whatisdigitalhumanities.com alone offers 817 rotating definitions of the digital humanities, pulled from participants from the Day of DH between 2009-2014. A few of these definitions are listed below:

Digital Humanities is the application of computer technology to make intellectual inquiries in the humanities that either could not be made using traditional methods or are made significantly faster and easier with computer technology. It can include both using digital tools to make these inquiries or developing these tools for others to use. –Matthew Zimmerman

DH is the study, exploration, and preservation of, as well as education about human cultures, events, languages, people, and material production in the past and present in a digital environment through the creation and use of dynamic tools to visualize and analyze data, share and annotate primary sources, discuss and publish findings, collaborate on research and teaching, for scholars, students, and the general public. –Ashley Sanders

For the purposes of this article, digital humanities will be defined as an emerging, cross-disciplinary field in academic research that combines traditional humanities content with technology focused methods of display and interpretation. Most DH projects are collaborative in nature with researchers from a variety of disciplines working together to bring these complex works to fruition. DH projects can range from fairly traditional research papers enhanced with computing techniques, such as text mining, to large scale digital archives of content that include specialized software and functionality.

Due to the range of complexity in this field and the challenges of maintaining certain types of digital content, long-term preservation of DH projects has become a major concern of scholars, institutions, and libraries in recent years. While in the sciences, large scale collaborative projects are the norm and can expect to be well funded, DH projects are comparatively lacking in established channels for financial and institutional support over the long term, which can add another layer of difficulty for researchers. As librarians at academic institutions take on responsibility for preserving digital materials, they certainly have a role in ensuring that these DH projects are maintained and not lost.

For the purposes of this paper, a digital humanities project will be broadly defined as cross-disciplinary collaboration that manifests itself online (i.e. via a website) as both scholarly research and pedagogical resource using digital method(s). Methods can include, but are not limited to, digital mapping, data mining, text analysis, visualization, network analysis, and modeling.

Literature Review

The Library of Congress’s (n.d.) catchall definition of digital preservation is “the active management of digital content over time to ensure ongoing access.” Hedstrom (1998) offers a more specific definition of digital preservation as “the planning, resource allocation, and application of preservation methods and technologies necessary to ensure that digital information of continuing value remains accessible and usable.”
Digital preservation is a complex undertaking under the most favorable conditions, requiring administrative support, funding, personnel, and often specialized software and technology expertise.

Kretzschmar and Potter (2010) note that digital preservation, and, in particular, digital humanities preservation, faces a “stand-still-and-die problem” because it is necessary to “continually…change media and operating environments just to keep our information alive and accessible.” This is true of preserving most digital objects, but the complex, multi-faceted nature of many DH projects adds additional layers of complexity to the already challenging digital preservation process. Zorich (2008) lists other components of the “digital ecosystem” that must be preserved in addition to the actual content itself: “software functionality, data structures, access guidelines, metadata, and other…components to the resource.”

Kretzschmar and Potter (2010) lay out three seemingly simple questions about preserving digital projects: “How will we deal with changing media and operating environments? Who will pay for it? And who will do the work?” whose answers are often difficult to pin down. When working with DH projects, ‘what exactly are we preserving?’ may also be an important question because as Smith (2004) notes that “there are…nagging issues about persistence that scholars and researchers need to resolve, such as…deciding which iteration of a dynamic and changing resource should be captured and curated for preservation.” In 2009, Digital Humanities Quarterly published a cluster of articles dedicated to the question of “doneness” in DH projects. Kirschenbaum (2009) notes in the introduction to the cluster that “digital humanities…[is] used to deriving considerable rhetorical mileage and the occasional moral high-ground by contrasting [its] radical flexibility and mutability with the glacial nature of scholarly communication in the fixed and frozen world of print-based publication.” Unlike some digital assets that undergo preservation, DH projects and the components thereof are often in a state of flux and, indeed, may never truly be finished. This feature of DH projects makes their preservation a moving target. Kretzschmar (2009) detailed the preservation process for the Linguistic Atlas Project, a large scale DH project that spanned decades, explaining “we need to make new editions all the time, since our idea of how to make the best edition changes as trends in scholarship change, especially now in the digital age when new technical possibilities keep emerging.” Another example of a DH project that has undergone and continues to undergo significant revisions is described in Profile #5 below.

In addition to the particular technological challenges of preserving often iterative and ever-evolving DH projects, there are structural and administrative difficulties in supporting their preservation as well. Maron and Pickle (2014) identified preservation as a particular risk factor for DH projects with faculty naming a wide range of entities on campus as being responsible for supporting their projects’ preservation needs, which suggested “that what preservation entails may not be clear.” Bryson, Posner, St. Pierre, and Varner (2011) also note that “The general lack of policies, protocols, and procedures has resulted in a slow and, at times, frustrating experience for both library staff and scholars.” Established workflows and procedures are still not easily found in the field of DH preservation, leading scholars, librarians, and other support staff to often attempt to reinvent the wheel with each new project. Other difficult to avoid problems noted across the literature are those of staff attrition and siloing.

Although rife with challenges, the preservation of DH projects is far from a lost cause, and libraries have a crucial role to play in ensuring that, to some degree, projects are successfully maintained. The data and interviews summarized in this paper reveal how some of these projects are being preserved as well as their particular difficulties. There are certainly opportunities for librarians to step in and offer their preservation expertise to help scholars formulate and achieve their preservation goals.

Methodology

The methodology for this project was influenced by time frame and logistics. Initially the project was slated to be completed within five months, but the deadline was later extended to nine months. Because it would have been difficult to interview multiple individuals across New York City within the original time frame, we decided on a two phase approach to conducting the survey, similar to Zorich’s methodology, where an information gathering phase was followed by interviews (Zorich, 2008). The survey involved (1) conducting an online survey of NYC faculty members engaged in digital humanities, and (2) performing in-person or phone interviews with those who agreed to additional questioning. The survey provided a broad, big picture overview of the practices of our target group, and the interviews supplemented that data with anecdotes about specific projects and their preservation challenges. The interviews also provided more detailed insight into the thoughts of some DH scholars about the preservation of their projects and digital preservation in general.

The subjects of our survey and interviews were self-selected faculty members and PhD candidates engaged in digital humanities research and affiliated with an academic institution within the New York City area. This population of academics was specifically targeted to reach members of the DH community who had access to an institutional library and its resources. We limited our scope to New York City for geographic convenience.

We targeted survey respondents using the NYC Digital Humanities website as a starting point. As of October 2015, when the selection process for this project was underway, there were 383 members listed in the NYC Digital Humanities online directory. An initial message was sent to the NYCDH listserv on June 3, 2015, and individual emails were sent to a subset of members on June 15, 2015. We approached additional potential survey respondents that we knew fit our criteria via email and Twitter.

Figure 1: NYC Digital Humanities Logo

Survey

The survey tool was a 34-item online Qualtrics questionnaire asking multiple choice and short answer questions about the researchers’ work and their preservation strategies and efforts to date. The survey questions were developed around 5 specific areas: background information about the projects and their settings, tools used, staff/management of preservation efforts, future goals, and a query about their availability for follow up interviews. As all DH projects are unique, respondents were asked to answer the questions as they pertain to one particular project for which they were the Principal Investigator (PI).

Interviews

Interviewees were located for the second phase of the research by asking survey respondents to indicate if they were willing to participate in a more in-depth interview about their work. Interested parties were contacted to set up in-person or conference call interviews. The interviews were less formal and standardized than the survey, allowing for interviewees to elaborate on the particular issues related to the preservation of their projects. Each interview was recorded but not fully transcribed. Team members reviewed the recordings and took detailed notes for the purpose of comparing and analyzing the results.

Limitations

Although the scope of this project was limited to a particular geographic area with a large population base, the sample size of the survey respondents was fairly small. The institutions of all but three respondents are classified as moderate to high research activity institutions according to the Carnegie Classifications. These types of institutions are by no means the only ones involved in DH work, but the high concentration of respondents from research institutions may indicate that there is greater support for DH projects at these types of institutions. As a result, this paper does not provide much discussion of DH preservation practices at smaller baccalaureate or masters institutions with a stronger emphasis on undergraduate education.

A Note about Confidentiality

Individuals who participated in the online survey were asked to provide their names and contact information so we could follow-up with them if they chose to participate in the interview. Individuals who took part in the interviews were guaranteed confidentiality to encourage open discussion. All findings are reported here anonymously.

Survey Results

The survey was live from June 3, 2015 to July 10, 2015. In total, 18 respondents completed the survey.

Demographics of the Faculty Engaged in Digital Humanities

Our survey respondents represented 10 New York City academic institutions, with the most responses coming from Columbia University. Department affiliations and professional titles are listed below (figure 2).

Institutional Affiliation # of respondents
Columbia University 5
CUNY Graduate Center 3
New York University 2
Bard Graduate Center 1
Hofstra University 1
Jozef Pilsudski Institute of America 1
New York City College of Technology 1
Queensborough Community College 1
St. John’s University 1
The New School 1
Department Affiliation # of respondents
Library/Digital Scholarship Lab 7
English 4
History 3
Art History 2
Linguistics 1
Unreported 1
Academic Titles # of respondents
Professor 4
Assistant Professor 3
Associate Professor 2
Adjunct/Lecturer 2
Digital Scholarship Coordinator or Specialist 2
PhD Candidate 2
Director 2
Chief Librarian 1

Figure 2: Survey respondent demographics (n=18)

We asked respondents where they received funding for their projects (figure 3). Responses were split, with some respondents utilizing two funding sources.

Funding Source # of respondents
Institutional funding 28%
Grant funding 22%
Personal funds 17%
Institutional and grant funding 17%
No funding 11%
Institutional and personal funds 6%

Figure 3: Funding Source

DH Project Characteristics

As previously mentioned, respondents were asked to choose one digital humanities project in which to answer the survey questions. Questions were asked to determine the number of people collaborating on the project and the techniques and software used. The majority of respondents (88%) were working collaboratively with one or more colleagues (figure 4).

# of collaborators # of respondents
2-3 collaborators 33%
6+ collaborators 33%
0 collaborators 22%
4-5 collaborators 11%

Figure 4: Collaborators involved in DH project (n=18)

The techniques utilized are listed in figure 5, with 61% of projects utilizing more than one of these techniques.

Technique # of projects
Data Visualizations 39%
Other* 32%
Data Mining and Text Analysis 28%
Geospatial Information Systems (GIS) 22%
Network Analysis 17%
Text Encoding 11%
3-D Modeling 6%

*maps, interactive digital museum exhibition, audio (2), software code analysis, data analysis tools, OHMS (Oral History Metadata Synchronizer)

Figure 5: Techniques used in DH project (n=18)

The techniques mentioned above are created with software or code, which can be proprietary, open-source, or custom. Respondents utilized a mix of these software types: 33% reported using proprietary software in their projects, 89% reported using open-source software, and 33% used custom software. A list of software examples can be found in figure 6.

Proprietary Software: Adobe Photoshop (2), Adobe Dreamweaver, Adobe Lightroom, Google Maps, TextLab, SketchUp, Weebly
Open-Source Software: WordPress (6), Omeka (3), Python (2), MySQL (2), Timeline.js (2), QGIS (2), DSpace

Figure 6: Software utilized by respondents

Knowledge of Preservation

33% of respondents reported that they had formal training in digital preservation, by which the authors meant academic coursework or continuing education credit. Informally, respondents have consulted numerous resources to inform the preservation of their projects (figure 7).

Source Percent
Published scholarly research 72%
Colleagues or informal community resources 66%
Digital Humanities Center, library/librarian, archivist 50%
Grey literature 44%
Professional or scholarly association sponsored events 22%
Conferences 33%
Campus workshops or events 11%
None 6%

Figure 7: Sources consulted to inform preservation

Project Preservation Considerations

Preservation of their DH project was considered by the majority (72%) of respondents. When asked who first mentioned preservation of their project, 93% of those who had considered preservation said either they or one of their collaborators brought up the issue. In only one instance did a librarian first suggest preservation, and there were no first mentions by either a funder or a host department.

The majority of initial preservation discussions (53%) took place during the project, with 39% taking place before the project began, and 8% after project completion.

When asked to consider how many years into the future they see their project being usable and accessible, the majority (56%) said 5+ years, followed by 3-4 years (22%), and 17% were unsure. One respondent noted they were not interested in preservation of the project.

Preservation Strategy

Version control, migration, metadata creation, emulation, durable persistent media, and bit stream preservation are just a few strategies for preserving digital materials. We asked respondents to rate each strategy by importance (figure 8).

Figure 8: Preservation strategies by importance

All respondents reported that they back up their work in some capacity. Most respondents (78%) use cloud services. Half report using institutional servers, and 44% use home computers. GitHub was mentioned by two respondents as a safe storage solution for their projects. The majority of respondents (66%) use more than one method of backing up their work.

Interview Findings

Through follow-up interviews with five respondents, we delved into several of these projects in greater detail. Interviewees gave us more information about their projects and their partnerships, processes, and policies for preserving the work.

Profile #1: DH Coordinator

Interview conducted and summarized by Nik Dragovic

Respondent 1 was a coordinator in a Digital Humanities Center at their institution and had undertaken the work in collaboration with librarian colleagues because the library works closely with researchers on DH projects at this particular institution.

This initiative was unique in that no preservation measures were being undertaken, a strategy that resulted from discussion during the conception of the project. The expected life span of the project, a geography-focused, map-intensive historical resource incorporating additional digital content, was three to four years. The de-emphasis of preservation stemmed from a shared impression that the complexity of preservation planning acts as a barrier to initiating a project. Given the intention to produce a library-produced exemplar work rather than a traditional faculty portfolio piece, the initiative was well suited to this approach. The technical infrastructure of the project included a PHP stack used to dynamically render the contents of a MySQL database. The general strategy incorporated elements of custom software and open-source technologies including Neatline and Omeka.

The unique perspective of the respondent as an institutional DH liaison as well as a practitioner made the interview more amenable to a general discussion of the issues facing a broad set of digital humanists and their interaction with library services. The overriding sentiment of the respondent echoed, to a large extent, existing literature’s assertion that DH preservation is nascent and widely variable.

Specifically, the interviewee opined that no one framework, process, or solution exists for those seeking to preserve DH outputs, and that every project must have its own unique elements taken into account. This requires an individual consultation with any project stakeholder concerned with the persistence of their work. A primary element of such conversations is expectation management. In the respondent’s experience, many practitioners have the intention of preserving a fully functional interface in perpetuity. In most cases, the time, cost, and effort required to undertake such preservation measures is untenable.

The variegated and transformative code stack environments currently underpinning DH projects are a leading issue in the permanent maintenance of the original environment of a DH project. As a result, the respondent advocated for a “minimal computing” approach to preservation, in which more stable formats such as HTML are used to render project elements in a static format, predicated on a data store instead of a database, with languages like JavaScript as a method for coordinating the front-end presentation. This technique not only allows for a simpler and more stable preservation format, but also enables storage on GitHub or Apache servers, which are generally within institutional resources.
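
To make the minimal computing idea concrete, the sketch below shows one way a database-backed project could be flattened into static pages: records are exported to a flat JSON data store and a short script renders one HTML page per record. This is an illustration of the general approach, not the respondent’s actual toolchain; the file names and record fields (items.json, id, title, description) are hypothetical.

```python
# Minimal-computing sketch: render one static HTML page per record
# from a flat JSON data store instead of querying a live database.
# File names and record fields below are hypothetical examples.
import json
from pathlib import Path
from string import Template

PAGE = Template("""<!doctype html>
<html><head><meta charset="utf-8"><title>$title</title></head>
<body><h1>$title</h1><p>$description</p></body></html>""")

def build_site(data_store="items.json", out_dir="site"):
    # items.json is assumed to be a list of objects with id, title, description
    items = json.loads(Path(data_store).read_text(encoding="utf-8"))
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for item in items:
        page = PAGE.substitute(title=item["title"], description=item["description"])
        (out / f"{item['id']}.html").write_text(page, encoding="utf-8")

if __name__ == "__main__":
    build_site()
```

The resulting directory of static files could then be served from GitHub Pages or a plain Apache server, as the respondent suggests.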

Another preservation solution the respondent described was the dismantling of a DH project into media components. Instead of migrating the system into a static representation, one leverages an institutional repository to store elements such as text, images, sound, video, and data tables separately. The resulting elements would then require a manifest to be created, perhaps in the format of a TAR file, to explain the technology stack and how the elements can be reassembled. An Internet Archive snapshot is also a wise addition to help depict the user interface and further contextualize the assets.
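
As a rough illustration of this “dismantle and document” approach, the sketch below bundles a directory of project components and a checksummed manifest (including a free-text note about the technology stack) into a single TAR file for repository deposit. The paths, field names, and stack note are invented for the example and do not describe the respondent’s workflow.

```python
# Sketch: bundle project components plus a manifest into one TAR file
# suitable for deposit in an institutional repository.
# Directory names, archive name, and stack note are hypothetical.
import hashlib
import json
import tarfile
from pathlib import Path

def make_package(components_dir="components",
                 archive="project_package.tar.gz",
                 stack_note="PHP front end over MySQL; see README for reassembly"):
    files = [p for p in Path(components_dir).rglob("*") if p.is_file()]
    manifest = {
        "technology_stack": stack_note,
        "files": [
            {"path": str(p), "sha256": hashlib.sha256(p.read_bytes()).hexdigest()}
            for p in files
        ],
    }
    # Write the manifest alongside the components, then pack both together.
    manifest_path = Path("manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(components_dir)
        tar.add(manifest_path)

if __name__ == "__main__":
    make_package()
```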

In the experience of the respondent, helping digital humanists understand strategic and scaled approaches to preservation is one of the greatest challenges of acting as a library services liaison. Students and faculty have an astute understanding of the techniques underpinning the basic functionality of their work, but not of the landscape of current preservation methodologies. Not only is the learning curve steep for these more library-oriented topics, but the ambitions of the library and the practitioner often diverge. Whereas the scholar’s ambition is often to generate and maintain a body of their own work, the library focuses more on standardization and interoperability. This creates a potential point of contention between library staff and those they attempt to counsel. Often the liaison must exercise sensitivity in their approach to users, who are themselves experts in their field of inquiry.

The broader picture also includes emerging funding considerations for national grants. When asked about the intentions of the National Endowment for the Humanities to incorporate preservation and reusability into funding requirements, the respondent expressed skepticism of the agency’s conceptualization of preservation, stating that a reconsideration and reworking of the term’s definition was in order.

To apply too exhaustive a standard would encourage a reductive focus on the resource-intensive preservation methods that the respondent generally avoids. Like most facets of the DH preservation question, this warrants further inquiry from practical and administrative standpoints. In a general sense, realistic expectations and practical measures ruled the overall logic of the respondent, as opposed to adherence to any given emerging standard presently available.

Profile #2: Library Director

The impetus behind respondent 2’s project was not to advance scholarship in a particular subject, so the preservation strategy and goals differed from projects that had a more explicitly scholarly purpose. The idea was hatched by a team of librarians as a means to help librarians learn and develop new skills in working with digital research with the ultimate goal of enhancing their ability to collaborate and consult with researchers on their projects. The learning and training focus of this project informed the team’s preservation strategy.

A number of tools were used to plan, document, and build out this project. Some layers of the production were designed to be preserved, while others were intended to be built out and then left alone rather than migrated as updates become available. The process was documented on a WordPress blog, and the final product was built on Omeka. The team did preservation and versioning of code on GitHub, but they do not intend to update the code, even if that means the website will ultimately become unusable.

What was very important to this team was to preserve the “intellectual work” and the research that went into the project. To accomplish that, they decided to use software, such as Microsoft Word and Excel, that creates easy to preserve files, and they are looking into ways to bundle the research files together and upload them to the institution’s repository. Respondent 2 expressed that an early problem they had with the technology team was that they “wanted everything to be as well thought out as our bigger digital library projects, and we said that DH is a space for learning, and sometimes I could imagine faculty projects where we don’t keep them going. We don’t keep them alive. We don’t have to preserve them because what was important was what happened in the process of working out things.”

This team encountered some challenges working with Omeka. At one point they had not updated their version of Omeka and ended up losing quite a bit of work, which was frustrating. “We need to be thinking about preservation all along the way” to guard against these kinds of data losses. Working with the IT department also posed challenges because “technology teams are about security and about control” and are not always flexible enough to support the evolving technology needs of a DH project. The project had to be developed on an outside server and then moved to the institutional server, where the code could not be changed.

Profile #3: Art Professor

Respondent 3’s institution has set up a DH center with an institutional commitment to preserving the materials for its projects in perpetuity. The center relies on an institutional server and has a broad policy of downloading files and maintaining them indefinitely on the back end. Front-end production of the project was outsourced to another institution, and the preservation of that element of the project had not been considered at the time of the interview.

This researcher’s main challenge was that although many of the artworks featured in the project are quite old and not subject to copyright, certain materials (namely photographs of 3D objects) are copyrighted and can only be licensed for a period of 10 years. The front-end developer felt that 10 years is a long time in the lifetime of a website, which would make the limitation of little concern. However, being able to license items for only a decade at a time clashes with the institutional policy of maintaining materials indefinitely on the server, and it raises questions about who will be responsible for this content over the long term if the original PI were to move on or retire.

Profile #4: Archivist

Interview conducted and summarized by Hannah Silverman

Respondent 4, who has developed a comprehensive set of open source tools for the purpose of archiving documents and resources related to a specific historical era, sees their work within the sphere of Digital Humanities. The sense that their archival work was essentially related to the Digital Humanities came about over a period of time as their technical needs required them to connect with a larger set of people, first with the librarians and archives community through the Metropolitan New York Library Council (METRO), then as a DH activity introduced at a METRO event. “I myself am writing a [DH] blog which originally was a blog by archivists and librarians…So, the way I met people who are doing similar things is at METRO. We are essentially doing DH because we are on the cross of digital technologies and archives. It is just a label, we never knew we were doing DH, but it is exactly that.”

The respondent goes on to describe the value of developing tools that can read across the archive, allowing researchers to experience a more contextual feel for a person described within the material – adding dimensionality and a vividness to the memory of that person:

What I am struggling with is essentially one major way of presenting the data and that is the library way. The libraries see everything as an object, a book is an object, and everything else is as an object. So they see objects. And if you look at the NY Public Library…you can search and you can find the objects which can be a page of an archive but it is very difficult to see the whole archive, the whole collection; it’s not working this way. If you search for an object you will find something that is much in the object but it is not conducive to see the context and the archives are the context, so what I am trying to see if we can expand this context space presentation. We spent very little money on this project product which we use to display the data. There is a software designer…who built it for us, but if we could get more funding I would work on [creating] a better view for visualizing the data. Several projects [like this] are waiting in line for funding here…We collect records, records are not people. Records are just names. We would like to put the records in such a way that all the people are listed and then give the information about this person who was in this list because he was doing something, and in this list because he was doing something else, and in this document because he traveled from here to here and so on. That would be another way of sort of putting all the soldiers and all the people involved in these three (volunteer) uprisings for which we have complete records of in part of the archive. We have complete records of all the people in such a way that you could follow a story of a person and also maybe his comrades in arms. It may be the unit in which he worked, and so on.

The respondent has addressed preservation with multiple arrays of hard drives configured with redundancy schemes and daily scrubbing programs that replace any corrupted digital bits. Copies stored on tape are also routinely managed in multiple offsite locations, and quality assurance checks occur in both analog and digital processes.
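
The scrubbing and quality assurance routine described above is, at the file level, a fixity check. The sketch below shows the general idea: record a SHA-256 checksum for each file once, then re-verify later and flag anything missing or altered. It is illustrative only, with hypothetical directory and inventory names, and is not the respondent’s storage-array-level setup.

```python
# Sketch of file-level fixity checking: record checksums once, then
# re-verify on a schedule and report files that changed or went missing.
# Directory and inventory file names are hypothetical.
import hashlib
import json
from pathlib import Path

def checksum(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def record_inventory(archive_dir="archive", inventory="inventory.json"):
    sums = {str(p): checksum(p) for p in Path(archive_dir).rglob("*") if p.is_file()}
    Path(inventory).write_text(json.dumps(sums, indent=2), encoding="utf-8")

def verify_inventory(inventory="inventory.json"):
    sums = json.loads(Path(inventory).read_text(encoding="utf-8"))
    for path, expected in sums.items():
        p = Path(path)
        if not p.exists():
            print(f"MISSING: {path}")
        elif checksum(p) != expected:
            print(f"CORRUPTED: {path}")

if __name__ == "__main__":
    record_inventory()
    verify_inventory()
```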

Profile #5: English and Digital Humanities Professor

Interview conducted by Hannah Silverman and summarized by Malina Thiede.

The project discussed in this interview began as a printed text for which an interactive, online platform was later created. The online platform includes data visualizations from user feedback (such as highlights) and a crowdsourced index, as no index was included in the original print text. The code for the project is preserved and shared on GitHub which the interviewee sees as a good thing. The visualizations of the data are not being preserved, but the data itself is. There is an intent to create and preserve new visualizations, but the preservation plan was not set at the time of the interview.

The initial project was conceived and executed in a partnership between an academic institution and a university press on a very short timeline (one year from call for submissions to a printed volume) with very rigid deadlines. Due to the rapid and inflexible timeline, preservation was not considered from the outset of the project, but a data curation specialist was brought in between the launch of the site and the first round of revisions to review the site and give advice on issues of preservation and sustainability. The institution supporting the project has strong support for digital initiatives; however, an informal report from the data curation specialist tasked with reviewing the project indicated that “precarity in the institutional support for the project could result in its sudden disappearance.”

The interviewee stated that “we are less focused on preservation than we should be” because “we’re looking towards the next iteration. Our focus has been less on preserving and curating and sustaining what we have” than on expanding the project in new directions. At the time of the interview, this project was entering a new phase in which the online platform was going to be adapted into a digital publishing platform that would support regular publications. The interviewee indicated several times that more of a focus on preservation would be ideal but that the digital elements of this project are experimental and iterative. The priority for this project is moving ahead with the next iteration rather than using resources to preserve current iterations.

Analysis & Conclusion

Through this survey of NYC librarians, scholars, and faculty, our aim was to capture a sample of the work being done in the digital humanities, paying close attention to this population’s preservation concerns, beliefs, and practices. Through this research, we offer the following observations regarding DH content creators and preservation:

1. Preservation is important to the researchers working on these projects, but it is often not their main focus.
2. Scholars working on DH projects are looking for advice and support for their projects (including their project’s preservation).
3. Librarians and archivists are already embedded in teams working on DH projects.

Preservation Challenges

We noticed through textual responses and follow-up interviews that preservation rarely came up in the earliest stages of the project – sometimes due to tight deadlines, and other times simply because preservation is not generally in the conversation during the onset of a project. Researchers are typically not accustomed to thinking about how their work will be preserved. The workflows for traditional published research leave preservation in the hands of the consumer of the research, which is often the library. However, DH and other digital projects often have less clearly defined workflows and audiences, making it less obvious who should be responsible for preservation and when the preservation process should begin. Our data indicates that most planning about preservation occurs sometime during the course of the project or after its completion, rather than at the beginning. Best practices for digital projects state that preservation should be a consideration as close to the beginning of the project as possible, but researchers may not be aware of that until they have done significant work on a project.

It is also noteworthy that just over half of our survey respondents set a goal of preserving their work for five or more years, and significant percentages (22% and 17%, respectively) set goals of three to four years or were unsure of how long they wanted their work to be preserved. This indicates that not all projects are intended to be preserved for the long term, but that does not mean that preservation planning and methods should be disregarded for such projects.

As these projects go forward, respondents who do want their projects to be available long term grapple with the difficulties that surround preservation of digital content and the added time commitment it demands.

The following survey respondent illustrates this potential for complexity:

Unlike many digital humanities projects this project exists/existed in textual book format, online, and in an exhibition space simultaneously. All utilize different aspects of digital technologies and are ideally experienced together. This poses much more complicated preservation problems since preserving a book is different from preserving an exhibition which is different from preserving an online portion of a project. What is most difficult to preserve is the unified experience (something I am well aware of being a theatre scholar who has studied similar issues of ephemerality and vestigial artifacts) and is something that we have not considered seriously up to this point. However, because books have an established preservation history, the exhibition was designed to tour and last longer than its initial five-month run, and the online component will remain available to accompany the tour and hopefully even beyond, the duration of the project as a whole has yet to be truly determined and I am sure that considerations of preservation and version migration will come up in the near future for both the physical materials and the digital instantiations of the project. It promises to provide some interesting conundrums as well as fascinating revelations.

And another survey respondent:

I feel like I should unpack the perpetuity question. Our project is text (and) images (and) data visualizations on a website. The text (and) images I’d hope would be accessible for a long time, the data (visualization) relies on specific WordPress plugins/map applications and may not be accessible for a long time. Since we’re self-administering everything we will take things forward with updates as long as we can, but…

Roles for Librarians and Archivists

As one librarian interviewee explained, preservation is a process that needs to be considered as a project is developed and built out, not a final step to be taken after a project is completed. Hedstrom noted as far back as 1997 that preservation is often only considered at a project’s conclusion or after a “sensational loss,” and this remains a common problem nearly 20 years later. Therefore, librarians and archivists should try to provide preservation support starting at the inception of a project. Considering preservation at an early stage can inform the process of selecting tools and platforms; prevent data loss as the project progresses; and help to clarify the ultimate goals and products of a project.

Nowviskie (2015) posed the question: “is [digital humanities] about preservation, conservation, and recovery—or about understanding ephemerality and embracing change?” Humanists have to grapple with this question with regard to their own work, but librarians and archivists can provide support and pragmatic advice to practitioners as they navigate these decisions. Sometimes this may mean that information professionals have to resist their natural urge to advocate for maximal preservation and instead focus on a level of preservation that will be sustainable with the resources at hand. Librarians and archivists would do well to consider this advice from Nowviskie (2015):

We need to acknowledge the imperatives of graceful degradation, so we run fewer geriatric teen-aged projects that have blithely denied their own mortality and failed to plan for altered or diminished futures. But alongside that, and particularly in libraries, we require a more robust discourse around ephemerality—in part, to license the experimental works we absolutely want and need, which never mean to live long, get serious, or grow up.

Profiles #1 and #2 exemplified the ‘graceful degradation’ approach to DH preservation by building websites that were intended to be ephemeral, with the idea that the content created for them could be packaged in stable formats and deposited in an institutional repository for permanent preservation. The project discussed in profile #5, while not explicitly designed as an ephemeral project, has a fast-moving, future-focused orientation, such that any one particular iteration of the project may not exist indefinitely, or even for very long. Of course, an ephemeral final product may not be an acceptable outcome in some cases, but advice from librarians can inform the decision-making process about what exactly will be preserved from any project and how to achieve the level of preservation desired.

Due to variations in the scale and aims of individual DH projects and the resources available in different libraries, it would be virtually impossible to dictate a single procedure that librarians should follow in order to provide preservation support for DH projects, but based on our data and interviews, librarians who want to support preservation of DH research can take the following steps:

1. Keep up with existing, new, or potential DH research projects on campus. Depending on the type of institution, those projects may range from large-scale projects like the Linguistic Atlas mentioned above to undergraduate student work.

2. Offer to meet with people doing DH on campus to talk about their projects. Begin a discussion of preservation at an early stage even if long term preservation is not a goal of the researchers. Establishing good preservation practices early can help to prevent painful data losses like the one mentioned in profile #2 as the project progresses.

3. Work with the researchers to develop preservation plans for their projects that will help them meet their goals and that will be attainable given the resources available at your institution/library.

– In developing a plan, some of the questions from our survey (see Appendix I) may be helpful, particularly questions about the nature of the project and the intended timeline for preservation.

– Also keep in mind what resources are available at your library or institution. Kretzschmar and Potter (2010) took advantage of a large, extant media archive at their library to support preservation of the Linguistic Atlas. The interviewees in profiles #1 and #2 also mentioned the institutional repository (IR) as a possible asset in preserving some of the components of their work. (While useful for providing access, IRs are not a comprehensive preservation solution, especially at institutions that use a hosting service.)

– Coordinate with other librarians/staff that may have expertise to help with preservation such as technology or intellectual property experts. As discussed in profile #3, copyright can pose some challenges for DH projects, especially those that include images. Many libraries have staff members that are knowledgeable about copyright who could help find solutions to copyright related problems.

– For doing preservation work with limited resources, the Library of Congress Digital Preservation site has a lot of information about file formats and digitization. Another good, frequently updated source from the Library of Congress is the digital preservation blog The Signal. Although created in 2013 and not updated since, the POWRR Tool Grid can still be a useful resource for learning about digital preservation software and tools.

Conclusion

DH projects are well on their way to becoming commonplace at all types of institutions and among scholars at all levels from undergraduates to full professors. The data and interviews presented here provide a snapshot of how some digital humanists are preserving their work and about their attitudes toward preservation of DH projects in general. They show that there are opportunities for librarians to help define the preservation goals of DH projects and work with researchers on developing preservation plans to ensure that those goals are met, whether the goal is long term preservation or allowing a project to fade over time.


Acknowledgements

Although this article is published under a single author’s name, the survey and interviews were created and conducted by a team of four that also included Allison Piazza, Nik Dragovic, and Hannah Silverman. Allison, Nik, Hannah, and I all worked together to write and conduct the survey, analyze the results, and present our findings in an ALA poster session and to the Metropolitan New York Library Council (METRO). Writing and conducting the interviews was likewise a group effort, and all of them contributed to writing our initial report, although it was never fully completed. The contributions of these team members were so substantial that they should really be listed as authors of this paper alongside me, but they declined when I offered.

This project was initially sponsored by the Metropolitan New York Library Council (METRO). Tom Nielsen was instrumental in shepherding this project through its early phases.

Special thanks also to the Pratt Institute School of Information for funding the poster of our initial results that was displayed at the 2015 ALA Annual Conference.

Additional thanks to Chris Alen Sula, Jennifer Vinopal, and Monica McCormick for their advice and guidance during the early stages of this research.

Finally, thanks to publishing editor Ian Beilin, and to reviewers Ryan Randall and Miriam Neptune. Their suggestions were immensely helpful in bringing this paper into its final form.


References

Bryson, T., Posner, M., St. Pierre, A., & Varner, S. (2011, November). SPEC Kit 326: Digital Humanities. Retrieved from http://www.arl.org/storage/documents/publications/spec-326-web.pdf

Carnegie Classifications | Basic Classification. (n.d.). Retrieved from http://carnegieclassifications.iu.edu/classification_descriptions/basic.php

Hedstrom, M. (1997). Digital preservation: a time bomb for digital libraries. Computers and the Humanities, 31(3), 189–202.

Kirschenbaum, M. G. (2009). Done: Finishing Projects in the Digital Humanities. Digital Humanities Quarterly, 3(2). Retrieved from http://www.digitalhumanities.org/dhq/vol/3/2/000037/000037.html

Kretzschmar, W. A. (2009). Large-Scale Humanities Computing Projects: Snakes Eating Tails, or Every End is a New Beginning? Digital Humanities Quarterly, 3(2). Retrieved from http://www.digitalhumanities.org/dhq/vol/3/2/000038/000038.html

Kretzschmar, W. A., & Potter, W. G. (2010). Library collaboration with large digital humanities projects. Literary & Linguistic Computing, 25(4), 439–445.

Library of Congress. (n.d.). About – Digital Preservation. Retrieved from http://www.digitalpreservation.gov/about/

Maron, N. L., & Pickle, S. (2014, June 18). Sustaining the Digital Humanities: host institution support beyond the start-up phase. Retrieved from http://www.sr.ithaka.org/publications/sustaining-the-digital-humanities/

Nowviskie, B. (2015). Digital Humanities in the Anthropocene. Digital Scholarship in the Humanities, 30(suppl_1), i4–i15. https://doi.org/10.1093/llc/fqv015

Smith, A. (2004). Preservation. In S. Schreibman, R. Siemens, & J. Unsworth (Eds.), A Companion to Digital Humanities. Oxford: Blackwell. Retrieved from http://www.digitalhumanities.org/companion/view?docId=blackwell/978140510313/9781405103213.xml&chunk.id=ss1-5-7&toc.depth=1&toc.id=ss1-5-7&branddefault

Walters, T., & Skinner, K. (2011, March). New roles for new times: digital curation for preservation. Retrieved from http://www.arl.org/storage/documents/publications/nrnt_digital_curation17mar11pdf

What is digital humanities? (2015, January). Retrieved from http://whatisdigitalhumanities.com/

Zorich, D. M. (2008, November). A survey of digital humanities centers in the US. Retrieved from http://f-origin.hypotheses.org/wp-content/blogs.dir/1834/files/2013/08/zorich_2008_asurveyofdigitalhumanitiescentersintheus2.pdf


Appendix: Survey

Preservation in Practice: A Survey of NYC Academics Engaged in Digital Humanities

Thanks for clicking on our survey link! We are a group of four information professionals affiliated with the Metropolitan New York Library Council (METRO) researching the digital preservation of DH projects. Contextual information is available at the myMETRO Researchers page. Our target group is New York City digital humanists working in academia (such as professors or PhD candidates) who have completed or done a significant amount of work on a DH project. If you meet these criteria, we’d appreciate your input. The survey will take less than 15 minutes. The information we gather from this survey will be presented at a METRO meeting, displayed on a poster at the annual conference of the American Library Association, and possibly included as part of a research paper. Published data and results will be de-identified unless prior approval is granted. Please note that your participation is completely voluntary. You are free to skip any question or stop at any time.

You can reach the survey administrators with any questions or comments:
Nik Dragovic, New York University, nikdragovic@gmail.com
Allison Piazza, Weill Cornell Medical College, allisonpiazza.nyc@gmail.com
Hannah Silverman, JDC Archives, hannahwillbe@gmail.com
Malina Thiede, Teachers College, Columbia University, malina.thiede@gmail.com

Is your project affiliated with a New York City-area institution or being conducted in the New York City area?
Yes
No

Title or working title of your DH project:

Does your project have an online component?
Yes (Please provide link, if available):
To be determined
No

What techniques or content types have you used or will you use in your project? Select all that apply.
Data visualizations
Data mining and text analysis
Text encoding
Network analysis
GIS (Geospatial Information Systems)
3-D modeling
Timelines

What date did you begin work on this project (MM/YY)

Approximately how many people are working on this project?
2-3
4-5
6+
I am working on this project alone

Has preservation been discussed in relation to this project?
Yes
No

Who first mentioned the preservation of your project?
Self
Librarian
DH center staff
Project member
Funder
Host department
Other:

At what stage in the project was preservation first discussed?
Before the project began
During the project
After project completion

Who is/will be responsible for preserving this project? Select up to two that best apply.
Self (PI)
Library
Host department
Another team member
Institution
Person or host to be determined
Campus IT
Another institution

How important are each of these processes to your overall preservation strategy for this project?
Bit-stream preservation or replication (making backup copies of your work)
Durable persistent media (storing data on tapes, discs, or another physical medium)
Emulation (using software and hardware to replicate an environment in which a program from a previous generation of hardware or software can run)
Metadata creation
Migration (to copy or convert data from one form to another)
Version control

Are there any other preservation strategies essential to your work that are not listed in the above question? If so, please list them here.

Do you have defined member roles/responsibilities for your project?
Yes
No
Not applicable, I am working on this project alone.

What is your main contribution to this project team? Select all that apply.
Technical ability
Subject expertise
Project management skills

Is there a specific member of your team that is responsible for preservation of the technical infrastructure and/or display of results?
Yes
No

Is there a DH center at your institution?
Yes
No

How often have you consulted with the DH center for your project?
Never
Once
A few times
Many times
DH center staff member is a collaborator on this project
My institution does not have a DH center

How is this project funded? Select all that apply
Institutional funding
Grant funding
Personal funds

Were you required to create a preservation plan for a funding application?
Yes
No

What kinds of resources have you consulted to inform the preservation of your project? Select all that apply.
Published scholarly research (such as books or journal articles)
Guides, reports, white papers and other grey literature
Professional or scholarly association sponsored events or resources (such as webinars)
Conferences
Campus workshops or events
Colleagues or informal community resources
None
DH Center, Library/librarian, archivist

Have you had any training in digital preservation?
Yes
No

How many years into the future do you see your project being usable/accessible?
1-2 years
3-4 years
5+ years
Not sure

Is your resource hosted at your own institution?
Yes
No

If no, where is it hosted?

How are you backing up your work? Select all that apply.
Cloud service
Institutional server
Home computer
DAM tools
Not currently backing up work
Other

Which of the following types of software have you used to create your project? Select all that apply.
Proprietary software (Please list examples)
Open-source software (Please list examples)
Custom software

If you would like to add any perspectives not captured by the previous questions, or clarify your answers, please use the comment box below:

Your full name

Email address

Institutional affiliation

Primary department affiliation

Academic title

If applicable, when did/will you complete your PhD?

Would you be willing to be the subject of an approximately 45-minute interview with a member of our team to talk more in-depth about your project and preservation concerns?


iCampEU Instructors Announced / Islandora

Islandora Camp is going to Delft, Netherlands from June 13 - 15. This will be our only stop in Europe in 2017, and we'll be holding our traditional three day camp, with two days of sessions bookending a day of hands-on training from experienced Islandora instructors. We are very pleased to announce that the instructors for our training workshop will be:

Rosie Le Faive started with Islandora in 2012 while creating a trilingual digital library for the Commission for Environmental Cooperation. With experience and - dare she say - wisdom gained from creating highly customized sites, she's now interested in improving the core Islandora code so that everyone can use it. Her interests are in mapping relationships between objects and intuitive UI design. She is the Digital Infrastructure and Discovery Librarian at UPEI. This is her third Islandora Camp as an instructor.

Diego Pino is an experienced Islandora developer and an official Committer. This is his second camp as an instructor and he has been helping folks learn how to get the most out of Islandora on our community listserv since he joined up. Diego started with Islandora in the context of handling biodiversity data for REUNA Chile and has transitioned over to develop and support the many Islandora sites of the Metropolitan New York Library Council.

Melissa Anez has been working with Islandora since 2012 and has been the Community and Project Manager of the Islandora Foundation since it was founded in 2013. She has been a frequent instructor in the Admin Track and developed much of the curriculum, refining it with each new Camp.

Frits van Latum worked on a range of subjects at TU Delft Library until his retirement. He has been working with Islandora at the admin and developer level since 2014. He built a considerable part of http://colonialarchitecture.eu/ and also worked on http://repository.tudelft.nl/. At the moment he is a freelance consultant and co-organiser of iCamp Europe 2017.

Our learning from the Open Data Day mini grants scheme / Open Knowledge Foundation

2017 was the third year of the OKI Open Data Day mini-grants scheme. Although we have been running it for a while, we never had the time or capacity to write up our learnings from the last two schemes. This year, we decided to take more time to learn about the project and improve it, so we looked at the data and are sharing our learnings so the Open Data Day community can use them in the future.

This year, we used some of our Hewlett grant to team up with groups all over the world who are doing Open Data Day events. We were also lucky to find more funding thanks to Hivos, Article 19, the Foreign Commonwealth Office and SPARC. Each partner organisation had their own criteria for giving the mini-grants. This blog post refers only to the OKI scheme – Open Data for Environment and Open Data for Human Rights. We did include some figures about the other grants, but we cannot write about their rationale for how to distribute the money.

How did we decide on the themes for the scheme?

In past years, we awarded the mini-grants without any clear geographical or thematic criteria. We simply selected events that looked interesting to us or that we thought could spark discussion around open data in places where it was not yet happening. We also gave priority to our network members as recipients.

This year, we decided to be more systematic and to test some assumptions. We set up a staff-wide call to discuss the scheme and how it would be built. We decided that Open Data Day is a great opportunity to see how data can be used, and we wanted to limit it to specific topics so we could see this use. Themes like education and health were floated, but we decided to focus on the environment and human rights – two fields where we saw some use of open data, but not a lot of examples. We tried to gather all that we knew in a doc, which then became a staff-wide collaborative effort.

We also set other criteria in the meeting. We wanted to see small, tangible events rather than big ideas that cannot be implemented in one day. We also wanted to see actual use or promotion of use, rather than a general presentation of open data.

After speaking to David Eaves, Open Data Day's spiritual father, we decided also to add a Newbie fund to support events in places where open data is a new thing.

See all of the details that we gathered here.

 

What themes did people apply to?

 

(Note that FCO joined the grant after the submissions phase closed, and therefore there is no dedicated track for their grant)

Who applied for the grant?

Over 2.5 weeks, we received 204 applications, the majority from the Global South. Just to compare, in the 2016 scheme we got 61 applications, the majority of them from the Global North. This means that this year we had three times more applications to deal with.

View Open Data Day 2017 Mini-Grant Applications in a full screen map

As you can see in the map (made by our talented developer advocate Serah Rono), more than half of the applications (104, to be precise) came from the African continent. Our staff members Serah Rono, David Opoku and Stephen Abbott Pugh have good networks in Africa and promoted the applications through them. We believe that their outreach, together with promotion by other individuals who champion open data in Africa, is the reason for the increase in applications from there.

In both of our tracks – human rights and environment – around 25% of the applications we got were from groups that did not work with open data or did not suggest an activity on the theme: 15 in the human rights track and 13 in the environment track.

 

How did we choose who will get the grant?

4 of our staff members – Serah, David, Oscar and Mor – gave a score to each application:

-1 – the application did not meet the criteria
0 – the submission met the criteria but did not offer anything unique
1 – the submission met the criteria and offered a new perspective on data use on the topic.

We tried to minimise bias by having a diverse committee of different genders and locations. We decided not to take into consideration where the application came from geographically, nor the gender of the applicant.

In our final list, when we had two applications from the same country, we tried to give the money only to one group.

 

What should we have paid attention to?

Gender. Our friends from SPARC checked that they distribute the grant equitably between men and women. We tried to have

We then decided to investigate the gender of the applicants further. Since we didn't ask for the applicant's gender in the application form, we determined their genders through their names and validated them through a Google search. Out of 202 applications, 140 were made by men, and only one application was submitted jointly by applicants of different genders. (See visualisation).

We don't know why more men than women apply for the grants, and it would be good to hear whether other organisations have had the same experience. If so, it is important to understand why women are not applying for these opportunities.

Who received the grant?

Unlike previous years, this year we took the time to reply to all the applicants about their status as fast as we could. However, we realised that answering back takes longer t

We also published all winners in a blog post before Open Data Day and tried to keep the process as transparent as we could. See our announcement blog post here. However, during the last couple of months, some groups could not organise their events and asked us to give the money to someone else. These groups were from Costa Rica, Morocco, Uganda, Zimbabwe and Brazil. We therefore decided to give the grant to another group, Open Knowledge Philippines, for their annual Open Data Day event.

Newbie category

Since some of the groups that applied had no experience with open data, we wanted to give the grant to two of them so we could build capacity and see how open data could become part of their work. However, since we announced the winners only a week before Open Data Day, we didn't have enough time to work with them to make the event meaningful. We are currently looking at how we can cooperate with them in the future.

 

What were the outcomes?

All of the learnings from the grant recipients are on our blog, where you can see different types of data use and the challenges that the community is facing in getting quality data to work with. Some of our recipients have started to inquire more about the OK network and how to participate and create more events. We would like to hear more from you about how to improve the next Open Data Day; write to us on the Open Data Day mailing list.

Fedora 4 in Production at Penn State ScholarSphere / DuraSpace News

Deploying Fedora 4, or Migrating from Fedora 3 to Fedora 4 is a challenge with built-in rewards. This series of articles, “Fedora 4 in Production” looks into why and how community members are working with Fedora 4 to enhance both collections and workflow at their institutions.

In this article Dan Coughlin, IT Manager, Digital Scholarship and Repository Development, Penn State Libraries, describes Fedora 4 in production at Penn State ScholarSphere.

How can journalists best handle public fiscal data to produce data-driven stories? An interview with Nicolas Kayser-Bril / Open Knowledge Foundation

Nicolas Kayser-Bril is the former CEO and co-founder of Journalism++ (J++), a group of investigative journalists that specialises in data-driven reporting. As part of OKI's own involvement in Openbudgets.eu, we had the good fortune of working with J++ on the question of how public budget and spending data can be used to tackle corruption. In this short interview, Diana Krebs (Project Manager for Fiscal Projects at OKI) asked Nicolas about his experience of how journalists today can best handle public fiscal data to produce data-driven stories.

 

Are journalists today equipped to work with fiscal data such as budget and spending data?

Different sorts of journalists use budget and spending data in different ways. Investigative outlets such as the International Consortium of Investigative Journalists (of Panama-Papers fame) or investigative lone wolves such as Dirk Laabs (who investigated privatizations in East Germany) are very much able to seek and use such data. Most other types of journalists are not able to do so.

 

Where do you see the gaps? What kind of skill sets, technical and non-technical, do journalists need to have to write data-driven stories that stick and are water-proof?

The largest gap is the lack of incentive. Very few journalists are tasked with investigating government spending and budgets.

The ones who do, either because they are interested in the topic or because they are paid investigative journalists, sometimes lack the field-specific expertise that allows for quick judgments. One can only know what’s abnormal (and therefore newsworthy) if one knows what the normal state of things is. In public budgets, few journalists know what is normal and what’s not.

 

Do you think it’s helpful for journalists to, when in doubt, work closely with experts from the public administration to enhance their fiscal data knowledge?

Journalists are trained to find experts to illustrate their articles or to provide information. It would help to have easy-to-reach experts on public funding that journalists could contact.

 

What are the ingredients for a sustainable increase of fiscal data knowledge among journalists, so that the public can be informed in a credible and informative way?

These are two different issues; it would be a mistake to believe that the information the public receives is in any way linked to the work of journalists. This was true in the last century, when journalists were de facto intermediaries between what happened and reports of what had happened. (They were de facto intermediaries because all means of communication involved a need to package information for film, radio, TV or newspapers).

For journalists to produce more content on budget and spending issues, they must be incentivised to do so by their organizations. This could mean that news organizations shift their focus towards public accountability. Organizations that have done so, such as ProPublica in the USA and Correctiv in Germany, happen to employ journalists who know how to decipher budget data.

For the public to be informed about public budget and spending, the availability of interesting and entertaining content on the issue would help. However, demand for such content could also be boosted by the administration, who could celebrate citizens who ask questions on public budgets, which is currently not the case. They could also teach the basics of how government – and government finance – works at school, which is barely done, when at all.

 

J++ has developed several projects around unlocking fiscal data, such as Cookingbudgets.com, a quite serious satire tutorial webpage for journalists and civil society activists to look for budget stories in the public administration. Their latest coup is “The Good, the Bad and the Accountant”, an interactive online application that puts users in the shoes of a manager of a big city to learn about and recognize patterns of corruption within the public administration.

OK Sweden collaborates with the Internet Foundation (.SE)…and other updates / Open Knowledge Foundation

This blog post is part of our on-going Network series featuring updates from chapters across the Open Knowledge Network and was written by the Open Knowledge Sweden team.

We have a new collaboration with the Internet Foundation (.SE) in Sweden, an independent organisation that promotes a positive development of the internet for the benefit of the public in Sweden. Open Knowledge Sweden, KTH Mentorspace and other organisations will collaborate under the umbrella of the Open Knowledge and Innovation Lab (OKINLAB), and as initial support, we will be using .SE's Co-Office in Stockholm.

We are hosting a researcher, Xiaowei Chen, who received funding from the Alexander von Humboldt Foundation in Germany to study and compare the Swedish Freedom of Information (FOI) regime to Germany's "Informationsfreiheitsgesetz" (Freedom of Information Act). He is also receiving support from Open Knowledge Foundation Germany for his research. Read more about Xiaowei's project here.

Open Knowledge Sweden's chairman, Serdar Tamiz, was invited to be a researcher panel discussant on Open Science and Open Access, organised by the Swedish National Library and Karlstad University. Jakob Harnesk, Library Director of Karlstad University, moderated the discussion, with Nadja Neumann (Fil.dr, Karlstad University) and Erika Sandlund (Docent, Karlstad University) as the other discussants. Erika Sandlund could not attend in person due to illness, so she sent over her notes/answers via email.

Open Access Meeting – Researcher Panel

In addition to other local researchers and librarians, there were two international guests:

  1. Kathleen Fitzpatrick, Associate Executive Director & Director of Scholarly Communication at the Modern Language Association, New York, USA. She is also the co-founder of the digital scholarly network MediaCommons and presented new ways of publishing.
  2. Vincent Bonnet, Director of the European Bureau of Library, Information and Documentation Associations (EBLIDA), The Hague, Netherlands. Vincent presented how libraries and librarians are changing.

It may appear awfully early, but Asmen Gul, project manager of the OKAwards, has already started work towards OKAwards 2017, which will be held close to the end of 2017. Asmen is already working with professional event manager Erika Szentmartoni on OKAwards 2017. More updates to follow soon.

As mentioned in our previous update, we are part of the pan-EU CLARITY Project. Together with the 6 other partners, we presented our findings to the EU Committee in Brussels as a first-year review. Project partners received very constructive feedback on how to improve their output and progress in the second half of the project. Project partners will have another meeting on the 10th of May in London to coordinate the second half of the project.

Fredrik Sjöberg, Executive Director of OK Sweden

In our previous update, we shared a not-so-secret piece of news with you: OK Sweden has its first Executive Director, Fredrik Sjöberg. He works at the digital agency Creuna and is into everything that's open and digital. He likes to find digital opportunities that help create a better and more open society and has created communicative solutions using open source for over 10 years. He is an avid advocate of open data and wants more people to see the benefits of sharing. Fredrik has already introduced new structures and strategies for OK Sweden, and after the initial planning period, you will hear more from our new Executive Director.

We are also about to hold a new election for the board and the chairmanship position. The meeting is scheduled for the 13th of May. Board members who have fulfilled their membership obligations will have the right to elect the new board.

Follow Open Knowledge Sweden twitter page [@OKFSE ] for more updates.

 

UMD, OITP and YALSA announce first cohort of YX Librarians / District Dispatch

ALA's Office for Information Technology Policy is pleased to support the University of Maryland's iSchool YX Graduate Certificate program as part of its Youth & Technology portfolio. We will be working closely with the YX partners to explore how the participating students' work can augment our Libraries Ready to Code (RtC) initiative. Dr. Mega Subramaniam and Linda Braun, faculty in the YX Program, are also RtC team members. Amanda Waugh is a doctoral candidate in the iSchool at the University of Maryland and contributed this post.

11 of the 14 YX cohort members with the YX logo.


We are proud to announce the first cohort of YX Librarians for 2017-2018. The Youth Experience (YX) Certificate is an innovative graduate certificate in professional studies from one of the top library and information studies programs in the nation, the University of Maryland. Working with partners, including both ALA's Office for Information Technology Policy (OITP) and the Young Adult Library Services Association (YALSA), UMD's iSchool will train this first cohort of youth services librarians to be leaders in harnessing technology, learning and assessment, and design thinking. Through the generosity of the Institute of Museum and Library Services, the 2017-2018 cohort is receiving substantial stipends to defray tuition.

This cohort includes 14 librarians from across the country; they serve babies through teens in urban and rural communities and have already shown themselves to be leaders in their field. They have received grants from the National Science Foundation, ALSC and Dollar General; been recognized as ALA Emerging Leaders; served on national awards committees like the Alex, Morris and Printz Awards; and developed innovative programming in their libraries. To learn more about the cohort, see yx.umd.edu/2017-2018-cohort.

The YX Certificate will begin on May 24-25 with an on-campus orientation and attendance at the Human-Computer Interaction Lab's Symposium, then continue for the next 12 months as the librarians take four online courses focusing on information studies and learning theory, technology and learning, design thinking and youth, and developing and sustaining community partnerships. Throughout the program, the librarians will be working in their communities to apply the knowledge they are learning in class, both through programming and through publications and presentations.

For more information about the YX Certificate, see yx.umd.edu. The iSchool gratefully acknowledges the support of the Institute of Museum and Library Services in the creation and continuation of the YX Certificate.

The post UMD, OITP and YALSA announce first cohort of YX Librarians appeared first on District Dispatch.