Planet Code4Lib

We’re only as good as our members are engaged / District Dispatch

Postcard with text that says: "Libraries are a smart investment."
This week at the Association of College and Research Libraries (ACRL) 2017 Conference, academic librarians and information professionals convened to discuss the emerging issues facing higher education as a result of federal funding cuts and new regulations.

On Thursday morning, ALA and ACRL jointly hosted a Postcards to Lawmakers town hall, during which member leaders Emily Drabinski, coordinator of library instruction at Long Island University in Brooklyn, and Clay Williams, deputy chief librarian at Hunter College in Manhattan, joined our very own Lisa Lindle, grassroots communication specialist at the ALA Washington Office, to offer advice and encouragement on how to advocate effectively for libraries in the face of drastic cuts. The panel explained how to sign up for and use ALA’s Legislative Action Center and took questions from the audience. American Libraries magazine covered their talk here.

During Friday morning’s Academic Libraries and New Federal Regulations session, Corey Williams, a federal lobbyist at the National Education Association (and formerly an ALA lobbyist in the Washington Office), again urged members to step up to the plate. Corey made two illustrative points: ALA has three lobbyists for our nearly 60,000 members, and one person is only one voice. In other words: lobbyists are only as good as our members are engaged.

Advocacy is akin to a muscle: you can flex it once by sending a tweet or an email, but we are only one mile into a marathon, and this is a muscle that needs to be exercised constantly. Both town halls offered some immediate steps you can take in the next leg of the race.

Do Your Reps
• Did you write a postcard? Great. Now tweet a picture of that postcard to your representatives with the line: “No cuts to funding for museums and libraries. #SaveIMLS”

Short Sprints
• Sign up for ALA’s Legislative Action Center.
• Then, prepare a talking point about why IMLS is important to your community and share it with a friend or patron so you can customize your emails to Congress.

Intervals
• Invite your representatives to visit your library (ProTip: Work with your organization’s government relations office to coordinate.)
• Attend a constituent coffee when your reps are home during the weeks of April 10 and April 17 (Note: this period at home also happens to be National Library Week. If that timing is not possible, other times are good too, whenever the Member is at home.)
• Think about who you can partner with or create a coalition with in your community.
• Pair your data (e.g., how much LSTA funding you receive) with anecdote (e.g., how that money made a transformative difference to your patrons).

In response to other questions that came up, here are two other helpful references:
• Here’s what the National Archives and Records Administration says about irradiated mail
• Here’s where you can look up your representative’s social media handle

The post We’re only as good as our members are engaged appeared first on District Dispatch.

Look Back, Move Forward: Library Services and Technology Act / District Dispatch

Thank you to everyone for sharing your #SaveIMLS stories. Please keep them coming – more than 7,700 tweets so far (nearly double our count since last Thursday). As we prepare for the appropriations process, here’s a look back at how ALA Council resolved to support the Library Services and Technology Act in June 1995.

As we move forward into the “Dear Appropriator” letters, be sure to sign up for our Legislative Action Center today.

The post Look Back, Move Forward: Library Services and Technology Act appeared first on District Dispatch.

Call for Proposals, LITA education webinars and web courses / LITA

What library technology topic are you passionate about? Have something to teach?

The Library Information Technology Association (LITA) Education Committee invites you to share your expertise with a national audience! For years, LITA has offered online learning programs on technology-related topics of interest to LITA members and the wider American Library Association audience. Submit a proposal by April 21st, 2017 to teach a webinar, webinar series, or online course for Summer/Fall 2017.

We seek and encourage submissions from underrepresented groups, such as women, people of color, the LGBTQ+ community, and people with disabilities.

All topics related to the intersection of technology and libraries are welcomed. Possible topics include:

  • Privacy
  • Analytics
  • Data librarianship
  • Technology Spaces
  • Visualization
  • Augmented and Virtual Reality
  • Ethics and access
  • Project management
  • Data-driven decision-making

Instructors receive a $500 honorarium for an online course or $150 for webinars, split among instructors. For more information, access the online submission form. Check out our list of current and past course offerings to see what topics have been covered recently.

Proposals will be evaluated by the LITA Education Committee and will be assigned a committee liaison. That person is responsible for contacting you no later than 30 days after your submission to provide feedback.

Please email lita@ala.org with any questions. We’re looking forward to a slate of compelling and useful online education programs this year!

Catmandu 1.04 / LibreCat/Catmandu blog

Catmandu 1.04 has been released with some nice new features. There are some new Fix routines that were requested by our community:

error

The “error” fix immediately stops the execution of the Fix script and throws an error. Use it to abort the processing of a data stream:

$ cat myfix.fix
unless exists(id)
    error("no id found?!")
end
$ catmandu convert JSON --fix myfix.fix < data.json

valid

The “valid” fix condition can be used to validate a record (or part of a record) against a JSONSchema. For instance we can select only the valid records from a stream:

$ catmandu convert JSON --fix "select valid('', JSONSchema, schema:myschema.json)" < data.json

Or, create some logging:

$ cat myfix.fix
unless valid(author, JSONSchema, schema:authors.json)
log("errors in the author field")
end
$ catmandu convert JSON --fix myfix.fix < data.json

rename

The “rename” fix can be used to recursively change the names of fields in your documents. For example, when you have this JSON input:

{
"foo.bar": "123",
"my.name": "Patrick"
}

you can transform all periods (.) in the key names to underscores with this fix:

rename('','\.','_')

The first parameter is the fields “rename” should work on (in our case it is an empty string, meaning the complete record). The second and third parameters are the regex search and replace parameters. The result of this fix is:

{
"foo_bar": "123",
"my_name": "Patrick"
}

The “rename” fix will only work on the keys of JSON paths. For example, given the following path:

my.deep.path.x.y.z

The keys are:

  • my
  • deep
  • path
  • x
  • y
  • z

The second and third arguments search and replace these separate keys. When you want to change the path as a whole, take a look at the “collapse()” and “expand()” fixes in combination with the “rename” fix:

collapse()
rename('',"my\.deep","my.very.very.deep")
expand()

Now the generated path will be:

my.very.very.deep.path.x.y.z

Of course the example above could be written more simply as “move_field(my.deep,my.very.very.deep)”, but it serves as an example that powerful renaming is possible.

import_from_string

This Fix is a generalisation of the “from_json” Fix. It can transform a serialised string field in your data into an array of data. For instance, take the following YAML record:


---
foo: '{"name":"patrick"}'
...

The field ‘foo’ contains a JSON fragment. You can transform this JSON into real data using the following fix:


import_from_string(foo,JSON)

Which creates a ‘foo’ array containing the deserialised JSON:


---
foo:
- name: patrick

The “import_from_string” fix looks very much like the “from_json” fix, but you can use any Catmandu::Importer. It always creates an array of hashes. For instance, given the following YAML record:


---
foo: "name;hobby\nnicolas;drawing\npatrick;music"

You can transform the CSV fragment in the ‘foo’ field into data by using this fix:


import_from_string(foo,CSV,sep_char:";")

Which gives as result:


---
foo:
- hobby: drawing
  name: nicolas
- hobby: music
  name: patrick
...

In the same way it can process MARC, XML, RDF, YAML or any other format supported by Catmandu.

export_to_string

The fix “export_to_string” is the opposite of “import_from_string” and is the generalisation of the “to_json” fix. Given the YAML from the previous example:


---
foo:
- hobby: drawing
  name: nicolas
- hobby: music
  name: patrick
...

You can create a CSV fragment in the ‘foo’ field with the following fix:


export_to_string(foo,CSV,sep_char:";")

Which gives as result:


---
foo: "name;hobby\nnicolas;drawing\npatrick;music"

search_in_store

The fix “search_in_store” is a generalisation of the “lookup_in_store” fix. The latter is used to query the “_id” field in a Catmandu::Store and return the first hit. The former, “search_in_store”, can query any field in a store and return all (or a subset) of the results. For instance, given the YAML record:


---
foo: "(title:ABC OR author:dave) AND NOT year:2013"
...

then the following fix will replace the ‘foo’ field with the result of the query in a Solr index:


search_in_store('foo', store:Solr, url: 'http://localhost:8983/solr/catalog')

As a result, the document will be updated like:


---
foo:
    start: 0
    limit: 0
    hits: [...]
    total: 1000
...

where

  • start: the starting index of the search result
  • limit: the number of results per page
  • hits: an array containing the data from the result page
  • total: the total number of search results

Every Catmandu::Store can have a different layout for the result page. Look at the documentation of the specific Catmandu::Store implementation for the details.

Thanks for all your support for Catmandu and keep on data converting 🙂


Evergreen - 2.12.0 / FOSS4Lib Recent Releases

Package: Evergreen
Release Date: Wednesday, March 22, 2017

Last updated March 24, 2017. Created by gmcharlt on March 24, 2017.

With this release, we strongly encourage the community to start using the new web client on a trial basis in production. All current Evergreen functionality is available in the web client with the exception of serials and offline circulation. The web client is scheduled to be available for full production use with the September 3.0 release.
Other notable new features and enhancements for 2.12 include:

Teaching Networks / Ed Summers

Yesterday I had the good fortune to speak with Miriam Posner, Scott Weingart and Thomas Padilla about their experiences teaching digital humanities students about network visualization, analysis and representation. This started as an off the cuff tweet about teaching Gephi, which led to an appointment to chat, and then to a really wonderful broader discussion about approaches to teaching networks:

Scott suggested that other folks who teach this stuff in a digital humanities context might be interested as well so we decided to record it, and share it online (see below).

The conversation includes some discussion of tools (such as Gephi, Cytoscape, NodeXL, Google Fusion Tables, DataBasic, R) but also some really neat exercises for learning about networks with yarn, balls, short stories and more.

A particularly fun part of the discussion focuses on approaches to teaching graph measurement and analytics, as well as humanistic approaches to graph visualization that emphasize discovery and generative creativity.

During the email exchange that led up to our chat Miriam, Scott and Thomas shared some of their materials which you may find useful in your own teaching/learning:

I’m going to be doing some hands-on exercises about social media, networks and big data in Matt Kirschenbaum‘s Digital Studies class this Spring – and I was really grateful for Miriam, Scott and Thomas’ willingness to share their experiences with me.

Anyhow, here’s the video! If you want to get to the good stuff, skip to 8:40, where I stop selfishly talking about the classes we’re teaching at MITH.


PS. this post was brought to you by the letter B since (as you will see) Thomas thinks that blogs are sooooo late 2000s :-) I suspect he is right, but I’m clearly still tightly clutching onto my vast media empire.

Reader Privacy for Research Journals is Getting Worse / Eric Hellman

Ever hear of Grapeshot, Eloqua, Moat, Hubspot, Krux, or Sizmek? Probably not. Maybe you've heard of Doubleclick, AppNexus, Adsense or Addthis? Certainly you've heard of Google, which owns Doubleclick and Adsense. If you read scientific journal articles on publisher websites, these companies that you've never heard of will track and log your reading habits and try to figure out how to get you to click on ads, not just at the publisher websites but also at websites like Breitbart.com and the Huffington Post.

Two years ago I surveyed the websites of 20 of the top research journals and found that 16 of the top 20 journals placed trackers from ad networks on their web sites. Only the journals from the American Physical Society (2 of the 20) supported secure (HTTPS) connections, and even now APS does not default to being secure.

I'm working on an article about advertising in online library content, so I decided to revisit the 20 journals to see if there had been any improvement. Over half the traffic on the internet now uses secure connections, so I expected to see some movement. One of the 20 journals, Quarterly Journal of Economics, now defaults to a secure connection, significantly improving privacy for its readers. Let's have a big round of applause for Oxford University Press! Yay.

So that's the good news. The bad news is that reader privacy at most of the journals I looked at got worse. Science, which could be loaded securely 2 years ago, has reverted to insecure connections. The two Annual Reviews journals I looked at, which were among the few that did not expose users to advertising network tracking, now have trackers for AddThis and Doubleclick. The New England Journal of Medicine, which deployed the most intense reader tracking of the 20, is now even more intense, with 19 trackers on a web page that had "only" 14 trackers two years ago. A page from Elsevier's Cell went from 9 to 16 trackers.

Despite the backwardness of most journal websites, there are a few signs of hope. Some of the big journal platforms have begun to implement HTTPS. Springer Link defaults to HTTPS, and Elsevier's Science Direct is delivering some of its content with secure connections. Both of them place trackers for advertising networks, so if you want to read a journal article securely and privately, your best bet is still to use Tor.

Threats to stored data / David Rosenthal

Recently there's been a lively series of exchanges on the pasig-discuss mail list, sparked by an inquiry from Jeanne Kramer-Smyth of the World Bank about whether media that encrypt or compress, such as disks, pose any additional risks. It morphed into a discussion of the "how many copies" question and related issues. Below the fold, my reflections on the discussion.

The initial question was pretty clearly based on a misunderstanding of the way self-encrypting disk drives (SED) and hardware compression in tape drives work. Quoting the Wikipedia article Hardware-based full disk encryption:
The drive except for bootup authentication operates just like any drive with no degradation in performance.
The encrypted data is never visible outside the drive, and the same is true for the compressed data on tape. So as far as systems using them are concerned, whether the drive encrypts or not is irrelevant. Unlike disk, tape capacities are quoted assuming compression is enabled. If your data is already compressed, you likely get no benefit from the drive's compression.
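As a quick illustration of that last point, here is a minimal sketch, with random bytes standing in for already-compressed data, showing that high-entropy content gains essentially nothing from a second round of compression:

$ head -c 10M /dev/urandom > already-compressed.bin   # random bytes stand in for compressed data
$ gzip -k already-compressed.bin
$ ls -l already-compressed.bin already-compressed.bin.gz   # the .gz is no smaller, often slightly larger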

SEDs have one additional failure mode over regular drives: they support a crypto erase command which renders the data inaccessible. The effect as far as the data is concerned is the same as a major head crash. Archival systems that fail if a head crashes are useless, so they must be designed to survive total loss of the data on a drive. There is thus no reason not to use self-encrypting drives, and many reasons why one might want to.

But note that their use does not mean there is no reason for the system to encrypt the data sent to the drive. Depending on your threat model, encrypting data at rest may be a good idea. Depending on the media to do it for you, and thus not knowing whether or how it is being done, may not be an adequate threat mitigation.

Then the discussion broadened but, as usual, it was confusing because it was about protecting data from loss without explicit statements about what the threats to the data were, other than bit-rot.

There was some discussion of the "how many copies do we need to be safe?" question. Several people pointed to research that constructed models to answer this question. I responded:
Models claiming to estimate loss probability from replication factor, whether true replication or erasure coding, are wildly optimistic and should be treated with great suspicion. There are three reasons:
  • The models are built on models of underlying failures. These failure models are typically (a) based on manufacturers' reliability claims, and (b) ignore failures upstream of the media. Much research shows that actual failures in the field are (a) vastly more likely than manufacturers' claims, and (b) more likely to be caused by system components other than the media.
  • The models almost always assume that the failures are un-correlated, because modeling correlated failures is much more difficult, and requires much more data than un-correlated failures. In practice it has been known for decades that failures in storage systems are significantly correlated. Correlations among failures greatly raise the probability of data loss.
  • The models ignore almost all the important threats, since they are hard to quantify and highly correlated. Examples include operator error, internal or external attack, and natural disaster.
For replicated systems, three replicas is the absolute minimum IF your threat model excludes all external or internal attacks. Otherwise four (see Byzantine Fault Tolerance).

For (k of n) erasure coded systems the absolute minimum is three sites arranged so that k shards can be obtained from any two sites. This is because shards in a single site are subject to correlated failures (e.g. earthquake).
This is a question I've blogged about in 2016 and 2011 and 2010, when I concluded:
  • The number of copies needed cannot be discussed except in the context of a specific threat model.
  • The important threats are not amenable to quantitative modeling.
  • Defense against the important threats requires many more copies than against the simple threats, to allow for the "anonymity of crowds".
In the discussion Matthew Addis of Arkivum made some excellent points, and pointed to two interesting reports:
  • A report from the PrestoPrime project. He wrote:
    There’s some examples of the effects that bit-flips and other data corruptions have on compressed AV content in a report from the PrestoPRIME project. There’s some links in there to work by Heydegger and others, e.g. impact of bit errors on JPEG2000. The report mainly covers AV, but there are some references in there about other compressed file formats, e.g. work by CERN on problems opening zips after bit-errors. See page 57 onwards.
  • A report from the EU's DAVID project. He wrote:
    This was followed up by work in the DAVID project that did a more extensive survey of how AV content gets corrupted in practice within big AV archives. Note that bit-errors from storage, a.k.a bit rot was not a significant issue, well not compared with all the other problems!
Matthew wrote the 2010 PrestoPrime report, building on, among others, Heydegger's 2008 and 2009 work on the effects of flipping bits in compressed files (both links are paywalled, but the 2008 paper is available via the Wayback Machine). The 2013 DAVID report concluded:
It was acknowledged that some rare cases or corruptions might have been explained by the occurrence of bit rot, but the importance and the risk of this phenomenon was at the present time much lower than any other possible causes of content losses.
On the other hand, they were clear that:
Human errors are a major cause of concern. It can be argued that most of the other categories may also be caused by human errors (e.g. poor code, incomplete checking...), but we will concentrate here on direct human errors. In any complex system, operators have to be in charge. They have to perform essential tasks, maintaining the system in operation, checking that resources are sufficient to face unexpected conditions, and recovering the problems that can arise. However vigilant an operator is, he will always make errors, usually without consequence, but sometimes for the worst. The list is virtually endless, but one can cite:
  • Removing more files than wanted
  • Removing files in the wrong folder
  • Pulling out from a RAID a working disk instead of the faulty one
  • Copying and editing a configuration file, not changing all the necessary parameters
  • Editing a configuration file into a bad one, having no backup
  • Corrupting a database
  • Dropping a data tape / a hard disk drive
  • Introducing an adjustment with unexpected consequences
  • Replacing a correct file or setup from a wrong backup.
Such errors have the potential for affecting durably the performances of a system, and are not always reversible. In addition, the risk of error is increased by the stress introduced by urgency, e.g. when trying to make some room on in storage facilities approaching saturation, or introducing further errors when trying to recover using backup copies.
We agree, and have been saying so since at least 2005. And the evidence keeps rolling in. For example, on January 31st Gitlab.com suffered a major data loss. Simon Sharwood at The Register wrote:
Source-code hub GitLab.com is in meltdown after experiencing data loss as a result of what it has suddenly discovered are ineffectual backups. ... Behind the scenes, a tired sysadmin, working late at night in the Netherlands, had accidentally deleted a directory on the wrong server during a frustrating database replication process: he wiped a folder containing 300GB of live production data that was due to be replicated.
Commendably, Gitlab made a Google Doc public with a lot of detail about the problem and their efforts to mitigate it:
  1. LVM snapshots are by default only taken once every 24 hours. YP happened to run one manually about 6 hours prior to the outage
  2. Regular backups seem to also only be taken once per 24 hours, though YP has not yet been able to figure out where they are stored. According to JN these don’t appear to be working, producing files only a few bytes in size.
    1. SH: It looks like pg_dump may be failing because PostgreSQL 9.2 binaries are being run instead of 9.6 binaries. This happens because omnibus only uses Pg 9.6 if data/PG_VERSION is set to 9.6, but on workers this file does not exist. As a result it defaults to 9.2, failing silently. No SQL dumps were made as a result. Fog gem may have cleaned out older backups.
  3. Disk snapshots in Azure are enabled for the NFS server, but not for the DB servers.
  4. The synchronisation process removes webhooks once it has synchronised data to staging. Unless we can pull these from a regular backup from the past 24 hours they will be lost
  5. The replication procedure is super fragile, prone to error, relies on a handful of random shell scripts, and is badly documented
    1. SH: We learned later the staging DB refresh works by taking a snapshot of the gitlab_replicator directory, prunes the replication configuration, and starts up a separate PostgreSQL server.
  6. Our backups to S3 apparently don’t work either: the bucket is empty
  7. We don’t have solid alerting/paging for when backups fails, we are seeing this in the dev host too now.
So in other words, out of 5 backup/replication techniques deployed none are working reliably or set up in the first place. => we're now restoring a backup from 6 hours ago that worked
The operator error revealed the kind of confusion and gradual decay of infrastructure processes that is common when procedures are used only to recover from failures, not as a routine. Backups that are not routinely restored are unlikely to work when you need them. The take-away is that any time you reach for the backups, you're likely already in big enough trouble that your backups can't fix it. I was taught this lesson in the 70s: the early Unix dump command failed to check the return value from the write() call. If you forgot to write-enable the tape by inserting the write ring, the dump would appear to succeed, the tape would look like it was spinning, but no data would be written to the backup tape.
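A minimal mitigation follows from the same lesson: exercise the restore path routinely, not just the backup path. A rough sketch of a nightly sanity check, with a hypothetical dump file name, size threshold and alert rather than GitLab's actual setup:

$ test "$(stat -c%s nightly.dump)" -gt 1000000 || echo "ALERT: dump suspiciously small"
$ pg_restore --list nightly.dump > /dev/null || echo "ALERT: dump unreadable"

A real test would go further and periodically restore the dump into a scratch database.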

Fault injection should be, but rarely is, practiced at all levels of the system. The results of not doing so are shown by UW Madison's work injecting faults into file systems and distributed storage. My blog posts on this topic include Injecting Faults in Distributed Storage, More bad news on storage reliability, and Forcing Frequent Failures.

Update: much as I love Kyoto, as a retiree I can't afford to attend iPRES2017. Apparently, there's a panel being proposed on the "bare minimum" for digital preservation. If I were on this panel I'd be saying something like the following.

We know the shape of the graph of loss probability against cost - it starts at one at zero cost and is an S-curve that gets to zero at infinite cost. Unfortunately, because the major threats to stored data are not amenable to quantitative modeling (see above), and technologies differ in their cost-effectiveness, we cannot actually plot the graph. So there are no hard-and-fast answers.

The real debate here is how to distinguish between "digital storage" and "digital preservation". We do have a hard-and-fast answer for this. There are three levels of certification: the Data Seal of Approval (DSA), NESTOR's DIN 31644, and TRAC/ISO 16363. If you can't even pass DSA then what you're doing can't be called digital preservation.

Especially in the current difficult funding situation, it is important NOT to give the impression that we can "preserve" digital information with ever-decreasing resources, because then what we will get is ever-decreasing resources. There will always be someone willing to claim that they can do the job cheaper, and their short-cuts won't be exposed until it's too late. That's why certification is important.

We need to be able to say "I'm sorry, but preserving this stuff costs this much. Less money, no preservation, just storage.".

How to check if your library is leaking catalog searches to Amazon / Eric Hellman

I've been writing about privacy in libraries for a while now, and I get a bit down sometimes because progress is so slow. I've come to realize that part of the problem is that the issues are sometimes really complex and  technical; people just don't believe that the web works the way it does, violating user privacy at every opportunity.

Content embedded in websites is a huge source of privacy leakage in library services. Cover images can be particularly problematic. I've written before that, without meaning to, many libraries send data to Amazon about the books a user is searching for; cover images are almost always the culprit. I've been reporting this issue to the library automation companies that enable this, but a year and a half later, nothing has changed. (I understand that "discovery" services such as Primo/Summon even include config checkboxes that make this easy to do; the companies say this is what their customers want.)

Two indications that a third-party cover image is a privacy problem are:
  1. the provider sets tracking cookies on the hostname serving the content.
  2. the provider collects personal information, for example as part of commerce. 
For example, covers served by Amazon send a bonanza of actionable intelligence to Amazon.

Here's how to tell if your library is sending Amazon your library search data.

Setup

You'll need a web browser equipped with developer tools; I use Chrome. Firefox should work, too.

Log into Amazon.com. They will give you a tracking cookie that identifies you. If you buy something, they'll have your credit card number, your physical and electronic addresses, records about the stuff you buy, and a big chunk of your web browsing history on websites that offer affiliate linking. These cookies are used to optimize the advertisements you're shown around the web.

To see your Amazon cookies, go to Preferences > Settings. Click "Show advanced settings..." (It's hiding at the bottom.)

Click the "Content settings..." button.

Now click the "All cookies and site data" button.

in the "Search cookies" box, type "amazon". Chances are, you'll see something like this.

I've got 65 cookies for "amazon.com"!

If you remove all the cookies and then go back to Amazon, you'll get 15 fresh cookies, most of them set to last for 20 years. Amazon knows who I am even if I delete all the cookies except "x-main".

Test the Library

Now it's time to find a library search box. For demonstration purposes, I'll use Harvard's "Hollis" catalog. I would get similar results at 36 different ARL libraries, but Harvard has lots of books and returns plenty of results. In the past, I've used What to expect as my search string, but just to make a point, I'll use Killing Trump, a book that Bill O'Reilly hasn't written yet.

Once you've executed your search, choose View > Developer > Developer Tools

Click on the "Sources" tab and to see the requests made of "images.amazon.com". Amazon has returned 1x1 clear pixels for three requested covers. The covers are requested by ISBN. But that's not all the information contained in the cover request.

To see the cover request, click on the "Network" tab and hit reload. You can see that the cover images were requested by a javascript called "primo_library_web" (Hollis is an instance of Ex Libris' Primo discovery service.)

Now click on the request you're interested in. Look at the request headers.


There are two of interest, the "Cookie" and the "Referer".

The "Cookie" sent to Amazon is this:
x-main="oO@WgrX2LoaTFJeRfVIWNu1Hx?a1Mt0s";
skin=noskin; session-token="bcgYhb7dksVolyQIRy4abz1kCvlXoYGNUM5gZe9z4pV75B53o/4Bs6cv1Plr4INdSFTkEPBV1pm74vGkGGd0HHLb9cMvu9bp3qekVLaboQtTr+gtC90lOFvJwXDM4Fpqi6bEbmv3lCqYC5FDhDKZQp1v8DlYr8ZdJJBP5lwEu2a+OSXbJhfVFnb3860I1i3DWntYyU1ip0s="; x-wl-uid=1OgIBsslBlOoArUsYcVdZ0IESKFUYR0iZ3fLcjTXQ1PyTMaFdjy6gB9uaILvMGaN9I+mRtJmbSFwNKfMRJWX7jg==; ubid-main=156-1472903-4100903;
session-id-time=2082787201l;
session-id=161-0692439-8899146
Note that Amazon can tell who I am from the x-main cookie alone. In the privacy biz, this is known as "PII" or personally identifiable information.

The "Referer" sent to Amazon is this:
http://hollis.harvard.edu/primo_library/libweb/action/search.do?fn=search&ct=search&initialSearch=true&mode=Basic&tab=everything&indx=1&dum=true&srt=rank&vid=HVD&frbg=&tb=t&vl%28freeText0%29=killing+trump&scp.scps=scope%3A%28HVD_FGDC%29%2Cscope%3A%28HVD%29%2Cscope%3A%28HVD_VIA%29%2Cprimo_central_multiple_fe&vl%28394521272UI1%29=all_items&vl%281UI0%29=contains&vl%2851615747UI0%29=any&vl%2851615747UI0%29=title&vl%2851615747UI0%29=any
To put this plainly, my entire search session, including my search string killing trump is sent to Amazon, alongside my personal information, whether I like it or not. I don't know what Amazon does with this information. I assume if a government actor wants my search history, they will get it from Amazon without much fuss.
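You can reproduce the same kind of request outside the browser. The curl sketch below uses an illustrative Amazon cover-image URL pattern and placeholder ISBN, Referer and cookie values, not anything captured from Hollis; with -v, curl prints the request headers it sends, so you can see exactly what a single cover request discloses:

$ curl -v -o /dev/null 'https://images.amazon.com/images/P/0123456789.01.MZZZZZZZ.jpg' \
    -H 'Referer: http://hollis.harvard.edu/primo_library/libweb/action/search.do?vl%28freeText0%29=killing+trump' \
    -H 'Cookie: x-main="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"'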

I don't like it.

Rant

[I wrote a rant; but I decided to save it for a future post if needed.] Anyone want a Cookie?

Notes 12/23/2016:


  1. As Keith Jenkins noted, users can configure Chrome and Safari to block 3rd party cookies. Firefox won't block Amazon cookies, however. And some libraries advise users not to block 3rd party cookies because doing so can cause problems with proxy authentication.
  2. If Chrome's network panel tells you "Provisional headers are shown" this means it doesn't know what request headers were really sent because another plugin is modifying headers. So if you have HTTPS Everywhere, Ghostery, Adblock, or Privacy Badger installed, you may not be able to use Chrome developer tools to see request headers. Thanks to Scott Carlson for the heads up.
  3. Cover images from Google leak similar data; as does use of Google Analytics. As do Facebook Like buttons. Et cetera.
  4. Thanks to Sarah Houghton for suggesting that I write this up.

Update 3/23/2017:

There's good news in the comments!

The Global Open Data Index – an update and the road ahead / Open Knowledge Foundation

The Global Open Data Index is a civil society collaborative effort to track the state of open government data around the world. The survey is designed to assess the openness of specific government datasets according to the Open Definition. Through this initiative, we want to provide a civil society audit of how governments actually publish data, with input and review from citizens and organisations. This post describes our future timeline for the project.

 

Here at Open Knowledge International, we see the Global Open Data Index (aka GODI) as a community effort. Without community contributions and feedback there is no index. This is why it is important for us to keep the community involved in the index as much as we can (see our active forum!). However, in the last couple of months, lots has been going on with GODI. In fact so much was happening that we neglected our duty to report back to our community. So based on your feedback, here is what is going on with GODI 2016:

 

New Project Management

Katelyn Rogers, who managed the project until January 2017, is now leading the School of Data program. I have stepped in to manage the Index until its launch this year. I am an old GODI veteran, having been its research and community lead for 2014 and 2015, so this is a natural fit for me and the project. This comes on top of my work as the International Community Coordinator and the Capacity team lead, but fear not, GODI is a priority!

 

This change in project management allowed us to take some time and modify the way we manage the project internally. We moved all of our current and past tasks (code, content and research) to the public GitHub account. You can see our progress on the project here: https://github.com/okfn/opendatasurvey/milestones

 

Project timeline

Now that the handover is done, it is easier for us to decide on the road forward for GODI (in coordination with colleagues at the World Wide Web Foundation, which publishes the Open Data Barometer). We are happy to share with you the future timeline and approach of the Index:

  • Finalising review: In the last six weeks, we have been reviewing the different index categories for 94 places. Like last year, we took the thematic reviewer approach, in which each reviewer checked all the countries under one category. We finished the review by March 20th, and we are now running quality assurance on the reviewed submissions, mainly looking for false positives among datasets that have been assessed as complying with the Open Definition.

 

  • Building the GODI site: This year we paid a lot of attention to the development of our methodology and changed the survey site to reflect it and allow easy customization (see Brook’s blog). We are now finalising the results site so it will offer an even better user experience than in past years.
  • Launch! The critical piece of information that many of you wanted! We will launch the Index on May 2nd, 2017! And what a launch it is going to be!
    Last year we gave a three-week period for government and civil society to review and suggest corrections to our assessment of the Index on the survey app, before publishing the permanent index results. This was not obvious to many, and we got many requests for corrections or clarifications after publishing the final GODI.
    This year, we will publish the index results, and data publishers and civil society will have the opportunity to contest the results publicly through our forum for 30 days. We will follow the discussions to decide whether we should change some results. The GODI team believes that if we aspire to be a tool not only for measuring open data publication but also for learning about it, we need to allow civil society and government to engage with the results in the open. We already see great engagement from some governments in the review process of GODI (see Mexico and Australia), and we would like to take this one step further, making GODI a tool that can help improve open data publication around the world.
  • Report: After the Index results are finalised, we will publish a report on our learnings from GODI 2016. This is the first time that we will write a report on the Global Open Data Index findings, and we hope that this will help us not only to create a better GODI in the future but also to promote and publish better datasets.

 

Have any questions? Want to know more about the upcoming GODI? Have ideas for improvements? Start a topic in the forum: https://discuss.okfn.org/c/open-data-index/global-open-data-index-2016

 

Open data day 2017 in Uganda: Open contracting, a key to inclusive development / Open Knowledge Foundation

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open contracting and tracking public money flows theme.

On Friday 3rd March 2017, the Anti-Corruption Coalition Uganda (ACCU) commemorated the International Open Data Day 2017 with a meetup of 37 people from Civil Society Organizations (CSOs), development partners, the private sector and the general public. The goal of this meetup was to inform Ugandan citizens, media and government agencies on the importance of open data in improving public service delivery.

Process  

The process started with an overview of open data, since the concept seemed to be new to most participants. Ms. Joy Namunoga, Advocacy Officer at ACCU, highlighted the benefits of open data, including value for money for citizens and taxpayers, knowledge of government transactions, holding leaders accountable, constructive criticism to influence policy, boosting transparency, reducing corruption and increasing social accountability.

Against this background, participants observed that in Uganda only 19% of people have access to the internet, hence the need to embrace the media as a third party to interpret data and take the information closer to citizens. Participants noted that, while Uganda has an enabling policy framework for information sharing, the Access to Information Act and its regulations require information to be paid for (namely $6), yet the majority of Ugandans live below $2 a day. This financial requirement denies a percentage of Ugandans their right to know. It was also noted that CSOs and government agencies equally do not make all their information available on their websites, which further underscores this fact.

Issues discussed

Open contracting

Mr. Agaba Marlon, Communications Manager at ACCU, took participants through the process of open contracting, as highlighted below:

Figure 1: Open contracting process

He showcased ACCU’s Open Contracting platform commonly known as USER (Uganda System for Electronic open data Records), implemented in partnership with Kampala Capital City Authority (KCCA), a government agency, and funded by the United Nations Development Programme. This platform created a lively conversation amongst the participants, and the following issues were generated to strengthen open contracting in Uganda:

  • Popularizing open data and contracting in Uganda by all stakeholders.
  • Mapping people and agencies in the open contracting space in Uganda to draw lines on how to complement each other.
  • Lobbying and convincing government institutions to embrace the open contracting data standards.
  • Stakeholders like civil society should be open before making the government open up.
  • Simplification of Uganda’s procurement laws for easier understanding by citizens.
  • Faming and shaming the best and worst contractors, as well as advocating for penalties for those who break the rules.
  • Initiating and strengthening information portals, i.e., both offline and online media.

Bringing new energy and people to the open data movement

Mr. Micheal Katagaya, an open data activist, chaired this session. Some suggestions were made that could bring new energy to the open data movement, such as renegotiating open data membership with the government, co-opting celebrities (especially musicians) to advocate for open data, simplifying data and packaging it in user-friendly formats, and linking data to problem-solving principles. Also, thematic days like International Women's Day, youth day or AIDS day could be used to spread a message about open data, and local languages could be used to localise the space so that Ugandans embrace open data. Finally, it was seen as important to understand audiences and package messages accordingly, and to identify open data champions and ambassadors.

Sharing open data with citizens who lack internet access

This session was chaired by Ms. Pheona Wamayi, an independent media personality. Participants agreed that civil society and government agencies should strengthen community interfaces between government and citizens, because these enable citizens to know about government operations. ACCU was encouraged to use its active membership in Uganda to penetrate the information flow and disseminate it to the citizens. Other suggestions included:

  • Weekly radio programs on open data and open contracting should be held. Programs should be well branded to suit the intended audiences.
  • Simplified advocacy materials, e.g. leaflets and posters, should be produced to inform community members about open data. Community notice boards could be used to disseminate information on open data.
  • Civil society and government should liaise with telecom companies to provide citizens with internet access.
  • Edutainment through music and forum theatre should be targeted to reach citizens on open data.

Way forward

Ms. Ephrance Nakiyingi, Environmental Governance Officer at ACCU, took participants through the action planning process. The following points were suggested as key steps for stakeholders to pursue:

  • Consider offline strategies like SMS to share data with citizens
  • Design  massive open data campaigns to bring new energy to the movement
  • Develop a multi-media strategy based on consumer behaviour
  • Create synergies between different open data initiatives
  • Embrace open data communication
  • Map out other actors in the open data fraternity
  • In-house efforts to share information/stakeholder openness

Twitter / pinboard

Have not read the full report but based on the abstract seems useful to those involved in the #code4lib incorporati…

MarcEdit and Alma Integration: Working with holdings data / Terry Reese

Ok Alma folks,

I’ve been thinking about a way to integrate holdings editing into the Alma integration work in MarcEdit. Alma handles holdings via MFHDs, but honestly, the process for getting to holdings data seems a little quirky to me. Let me explain. When working with bibliographic data, the workflow to extract records for editing and then update them looks like the following:

 Search/Edit

  1. Records are queried via Z39.50 or SRU
  2. Data can be extracted directly to MarcEdit for editing

 

Create/Update

  1. Data is saved, and then turned into MARCXML
  2. If the record has an ID, I have to query a specific API to retrieve specific data that will be part of the bib object
  3. Data is assembled in MARCXML, and then updated or created.

 

Essentially, an update or create takes 2 API calls.

For holdings, it’s a much different animal.

Search/Edit:

  1. Search via Z39.50/SRU
  2. Query the Bib API to retrieve the holdings link
  3. Query the holdings link api to retrieve a list of holding ids
  4. Query each holdings record API individually to retrieve a holdings object
  5. Convert the holdings object to MARCXML and then into a form editable in the MarcEditor
    1. As part of this process, I have to embed the bib_id and holding_id into the record (I’m using a 999 field) so that I can do the update

 

For Update/Create

  1. Convert the data to MARCXML
  2. Extract the ids and reassemble the records
  3. Post via the update or create API

 

Extracting the data for editing is a real pain. I’m not sure why so many calls are necessary to pull the data.
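To make the chain concrete, here is a rough sketch of the holdings retrieval calls against Alma's REST API; the gateway host, IDs and API key are placeholders, and this is my sketch of the workflow rather than MarcEdit's actual code:

# 1. list the holdings attached to a bib record (returns the holding_id values)
$ curl -s "https://api-na.hosted.exlibrisgroup.com/almaws/v1/bibs/$MMS_ID/holdings?apikey=$API_KEY"
# 2. fetch each holdings record individually as a holdings object
$ curl -s "https://api-na.hosted.exlibrisgroup.com/almaws/v1/bibs/$MMS_ID/holdings/$HOLDING_ID?apikey=$API_KEY"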

 Anyway – Let me give you an idea of the process I’m setting up.

First – you query the data:

A couple of things to note: to pull holdings, you have to click on the download all holdings link, or right-click on the item you want to download. Or select the items you want to download and then press CTRL+H.

When you select the option, the program will prompt you to ask if you want it to create a new holdings record if one doesn’t exist. 

 

The program will then either download all the associated holdings records or create a new one.

A couple of things I want you to notice about these records. There is a 999 field added, and you’ll notice that I’ve created this in MarcEdit. Here’s the problem… I need to retain the bib number to attach the holdings record to (it’s not in the holdings object), and I need the holdings record number (again, not in the holdings object). This is a required field in MarcEdit’s process. I can tell if a holdings item is new or updated by the presence or absence of the $d.

 

Anyway – this is the process that I’ve come up with… it seems to work. I’ve got a lot of debugging code to remove because I was having some trouble with the Alma API responses and needed to see what was happening underneath. If you are an Alma user, I’d be curious whether this process looks like it will work. As I say, I have some cleanup left to do before anyone can use this, but I think that I’m getting close.

 

–tr

Code for Ghana celebrates Open Data Day tracking public money flows / Open Knowledge Foundation

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open contracting and tracking public money flows theme.

This year, Code for Ghana organised their Open Data Day event at Mobile Web Ghana. The theme for the event was “Open Contracting and tracking public money flows”. Open contracting involves analysing government contract data to have a better understanding of how governments spend public funds. We had a lot of open contracting resources from Open-Contracting.org. This helped the entire team to understand the concept and its importance in increasing transparency and accountability in the governance of a country.

Florence Toffa, project coordinator of Code for Ghana, did an introductory presentation on Open Contracting. For about 98% of the attendees, open contracting was a new concept, and this was the first time they had tried their hands at datasets related to open contracting. Participants were introduced to the ‘what’ of open contracting, its benefits, and ‘why’ it should be embraced by everyone if we want to get rid of corruption in our society. Moreover, about 15 out of the 20 attendees were new to data scraping, data analysis and data visualisation.

Introduction to D3.JS

The participants were taken through a training session on D3.js by David Lartey (software developer). D3.js is a JavaScript library for manipulating documents based on data. They were taught the basics of the library and how to make some interesting visualisations.

Data Scraping

Shadrack Boadu (software developer and data enthusiast) also taught data scraping. He introduced the participants to two ways of scraping data, using Google Sheets and Tabula. He talked about the importance of cleaning the data and converting it into a usable format to facilitate accurate data analysis and representations.

Before the participants broke out into various groups, Code for Ghana provided datasets on the government budget (2015 – 2016), developmental projects procurement and Ghana Health Service data. The next task was for the participants to combine their skills to come up with relevant insights and visualisations.

The Open Data Day Projects 

The first team (Washington) presented a visualisation (a pie chart) on the procurement of the Ghana Health Service for the year 2016. Their visualisation gave insights into the procurement volumes of the Ghana Health Service. See visualisation:

The second team (Terrific) designed a series of visualisations. These visualisations included the state of developmental projects in Ghana and sources of developmental projects in Ghana. See images below:

 

Team Meck, the third team, developed a database [web platform] for all the government projects from 2002 to 2016. From the database, one could easily key in a few keywords and bring up a particular result. Unfortunately, the team was not able to complete the web platform on the day.

The fourth team, team Rex, after cleaning their data, came up with a visualisation giving an overview of developmental projects. Their project focused on government project success, sources of government funding and project allocations that are done by consultants.

The final team, team Enock, developed a web app that visualised government contracts. They focused on analysing procurement contracts from the Ghana Health Service.

After the presentations, the judges for the event, Mr Nehemiah Attigah (co-founder of Odekro) and Mr Wisdom Donkor of the National Information Technology Agency (NITA), gave their verdicts. The judges spoke about the importance of open data and the role it plays in the development of transparency and accountability in Ghanaian society. They also emphasised the need for participants to always present data in a way that paints an accurate picture and to visualise information in a form that can be easily digested by society. The best three projects were awarded prizes.

Learnings

Our takeaway from the event is that one day is usually too short to develop a sustainable project, so some of the teams are still working on their projects. For some of the youth, it was an eye-opener: they had never known the importance of data and how it shapes the future of development in the country. To these youth, the event was a success because they gained valuable skills that they can build on.

Symfony Forms / Alf Eaton, Alf

At the end of several years working with Symfony, I’m taking a moment to appreciate its strongest points. In particular, allowing users to apply mutations to objects via HTML forms.

The output of a Symfony endpoint (controller action) is usually either a data view (displaying data as HTML, rendered with Twig), or a form view (displaying a form as HTML and receiving the submitted form).

Symfony isn’t particularly RESTful, though you can use collection + resource-style URLs if you like:

  • /articles/ - GET the collection of articles
  • /articles/_create - GET/POST a form to create a new article
  • /articles/{id}/ - GET the article
  • /articles/{id}/_edit - GET/POST a form to edit the article
  • /articles/{id}/_delete - GET/POST a form to delete the article

The entity, controller, form and voter for creating and editing an article look something like this:


// ArticleBundle/Entity/Article.php

class Article
{
    /**
     * @var int
     *
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    private $id;

    /**
     * @var string
     *
     * @ORM\Column(type="string")
     * @Assert\NotBlank
     */
    private $title;

    /**
     * @var string
     *
     * @ORM\Column(type="text")
     * @Assert\NotBlank
     * @Assert\Length(min=100)
     */
    private $description;

    /**
     * @return int
     */
    public function getId()
    {
        return $this->id;
    }

    /**
     * @return string
     */
    public function getTitle()
    {
        return $this->title;
    }

    /**
     * @param string $title
     */
    public function setTitle($title)
    {
        $this->title = $title;
    }

    /**
     * @return string
     */
    public function getDescription()
    {
        return $this->description;
    }

    /**
     * @param string $description
     */
    public function setDescription($description)
    {
        $this->description = $description;
    }
}


// ArticleBundle/Controller/ArticleController.php

class ArticleController extends Controller
{
    /**
     * @Route("/articles/_create", name="create_article")
     * @Method({"GET", "POST"})
     *
     * @param Request $request
     *
     * @return Response
     */
    public function createArticleAction(Request $request)
    {
        $article = new Article();
        $this->denyAccessUnlessGranted(ArticleVoter::CREATE, $article);

        $article->setOwner($this->getUser());

        $form = $this->createForm(ArticleType::class, $article);
        $form->handleRequest($request);

        if ($form->isValid()) {
            $entityManager = $this->getDoctrine()->getManager();
            $entityManager->persist($article);
            $entityManager->flush();

            $this->addFlash('success', 'Article created');

            return $this->redirectToRoute('articles');
        }

        return $this->render('ArticleBundle/Article/create.html.twig', [
            'form' => $form->createView()
        ]);
    }

    /**
     * @Route("/articles/{id}/_edit", name="edit_article")
     * @Method({"GET", "POST"})
     *
     * @param Request $request
     * @param Article $article
     *
     * @return Response
     */
    public function editArticleAction(Request $request, Article $article)
    {
        $this->denyAccessUnlessGranted(ArticleVoter::EDIT, $article);

        $form = $this->createForm(ArticleType::class, $article);
        $form->handleRequest($request);

        if ($form->isValid()) {
            $this->getDoctrine()->getManager()->flush();

            $this->addFlash('success', 'Article updated');

            return $this->redirectToRoute('articles', [
                'id' => $article->getId()
            ]);
        }

        return $this->render('ArticleBundle/Article/edit', [
            'form' => $form->createView()
        ]);
    }
}

// ArticleBundle/Form/ArticleType.php

class ArticleType extends AbstractType
{

    /**
     * {@inheritdoc}
     */
    public function buildForm(FormBuilderInterface $builder, array $options)
    {
        $builder->add('title');

        $builder->add('description', null, [
            'attr' => ['rows' => 10]
        ]);

        $builder->add('save', SubmitType::class, [
            'attr' => ['class' => 'btn btn-primary']
        ]);
    }

    /**
     * {@inheritdoc}
     */
    public function configureOptions(OptionsResolver $resolver)
    {
        $resolver->setDefaults([
            'data_class' => Article::class,
        ]);
    }
}

// ArticleBundle/Security/ArticleVoter.php

class ArticleVoter extends Voter
{
    const CREATE = 'CREATE';
    const EDIT = 'EDIT';

    private $decisionManager;

    // the access decision manager is injected so the voter can check roles
    public function __construct(AccessDecisionManagerInterface $decisionManager)
    {
        $this->decisionManager = $decisionManager;
    }

    protected function supports($attribute, $subject)
    {
        return in_array($attribute, [self::CREATE, self::EDIT]) && $subject instanceof Article;
    }

    protected function voteOnAttribute($attribute, $article, TokenInterface $token)
    {
        $user = $token->getUser();

        if (!$user instanceof User) {
            return false;
        }

        switch ($attribute) {
            case self::CREATE:
                // only users with the author role may create articles
                return $this->decisionManager->decide($token, array('ROLE_AUTHOR'));

            case self::EDIT:
                // only the article's owner may edit it
                return $user === $article->getOwner();
        }

        return false;
    }
}

// ArticleBundle/Resources/views/Article/create.html.twig

{{ form(form) }}

// ArticleBundle/Resources/views/Article/edit.html.twig

{{ form(form) }}

The combination of Symfony’s Form, Voter and ParamConverter allows you to define who (Voter) can update which properties (Form) of a resource, and when.

The @Assert annotations on the entity allow you to define validation rules for each property, which are used in both client-side and server-side form validation.

Metadata Analysis at the Command-Line / LibreCat/Catmandu blog

Last week I was at the ELAG 2016 conference in Copenhagen and attended the excellent workshop by Christina Harlow of Cornell University on migrating digital collections metadata to RDF and Fedora4. One of the important steps required to migrate and model data to RDF is understanding what your data is about. Often, old systems need to be converted for which little or no documentation is available. Instead of manually processing large XML or MARC dumps, tools like metadata breakers can be used to find out which fields are available in the legacy system and how they are used. Mark Phillips of the University of North Texas recently wrote a very inspiring article in Code4Lib on how this can be done in Python. In this blog post I’ll demonstrate how this can be done using a new Catmandu tool: Catmandu::Breaker.

To follow the examples below, you need to have a system with Catmandu installed. The Catmandu::Breaker tools can then be installed with the command:

$ sudo cpan Catmandu::Breaker

A breaker is a command that transforms data into a line format that can be easily processed with Unix command-line tools such as grep, sort, uniq, cut and many more. If you need an introduction to Unix tools for data processing, please follow the examples Johan Rolschewski of Berlin State Library and I presented at an ELAG bootcamp.

As a simple example, let’s create a YAML file and demonstrate how this file can be analysed using Catmandu::Breaker:

$ cat test.yaml
---
name: John
colors:
 - black
 - yellow
 - red
institution:
 name: Acme
 years:
  - 1949
  - 1950
  - 1951
  - 1952

This example has a combination of simple name/value pairs, a list of colors and a deeply nested field. To transform this data into the breaker format, execute the command:

$ catmandu convert YAML to Breaker < test.yaml
1 colors[]  black
1 colors[]  yellow
1 colors[]  red
1 institution.name  Acme
1 institution.years[] 1949
1 institution.years[] 1950
1 institution.years[] 1951
1 institution.years[] 1952
1 name  John

The breaker format is a tab-delimited output with three columns (a short parsing sketch follows the list):

  1. A record identifier: read from the _id field in the input data, or a counter when no such field is present.
  2. A field name. Nested fields are separated by dots (.) and lists are indicated by square brackets ([]).
  3. A field value.
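
Because each line is just an identifier, a field name and a value separated by tabs, the output is as easy to consume from a scripting language as from the shell. As a minimal sketch (my own illustration, not part of Catmandu::Breaker; the script and variable names are made up), counting how often each field occurs could look like this in Python:

import sys
from collections import Counter

field_counts = Counter()

# Each breaker line is: <record id> TAB <field name> TAB <field value>
for line in sys.stdin:
    if not line.strip():
        continue
    record_id, field, value = line.rstrip("\n").split("\t", 2)
    field_counts[field] += 1

for field, count in field_counts.most_common():
    print(count, field, sep="\t")

Piping the output of a command such as catmandu convert YAML to Breaker < test.yaml into a script like this gives a quick overview of which fields are present.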

When you have a very large JSON or YAML file and need to find all the values of a deeply nested field, you could do something like:

$ catmandu convert YAML to Breaker < data.yaml | grep "institution.years"

Using Catmandu you can do this analysis on input formats such as JSON, YAML, XML, CSV and XLS (Excel). Just replace YAML with any of these formats and run the breaker command. Catmandu can also connect to OAI-PMH, Z39.50 or databases such as MongoDB, ElasticSearch, Solr or even relational databases such as MySQL, Postgres and Oracle. For instance, to get a breaker format for an OAI-PMH repository, issue a command like:

$ catmandu convert OAI --url http://lib.ugent.be/oai to Breaker

If your data is in a database you could issue an SQL query like:

$ catmandu convert DBI --dsn 'dbi:Oracle' --query 'SELECT * from TABLE WHERE ...' --user 'user/password' to Breaker

Some formats, such as MARC, don’t produce a great breaker format by default. In Catmandu, MARC files are parsed into a list of lists. Running the breaker on MARC input you get this:

$ catmandu convert MARC to Breaker < t/camel.usmarc  | head
fol05731351     record[][]  LDR
fol05731351     record[][]  _
fol05731351     record[][]  00755cam  22002414a 4500
fol05731351     record[][]  001
fol05731351     record[][]  _
fol05731351     record[][]  fol05731351
fol05731351     record[][]  082
fol05731351     record[][]  0
fol05731351     record[][]  0
fol05731351     record[][]  a

The MARC fields are part of the data, not part of the field name. This can be fixed by adding a special ‘marc’ handler to the breaker command:

$ catmandu convert MARC to Breaker --handler marc < t/camel.usmarc  | head
fol05731351     LDR 00755cam  22002414a 4500
fol05731351     001 fol05731351
fol05731351     003 IMchF
fol05731351     005 20000613133448.0
fol05731351     008 000107s2000    nyua          001 0 eng
fol05731351     010a       00020737
fol05731351     020a    0471383147 (paper/cd-rom : alk. paper)
fol05731351     040a    DLC
fol05731351     040c    DLC
fol05731351     040d    DLC

Now all the MARC subfields are visible in the output.

You can use this format to find, for instance, all unique values in a MARC file. Let’s try to find all unique 008 values:

$ catmandu convert MARC to Breaker --handler marc < camel.usmarc | grep "\t008" | cut -f 3 | sort -u
000107s2000 nyua 001 0 eng
000203s2000 mau 001 0 eng
000315s1999 njua 001 0 eng
000318s1999 cau b 001 0 eng
000318s1999 caua 001 0 eng
000518s2000 mau 001 0 eng
000612s2000 mau 000 0 eng
000612s2000 mau 100 0 eng
000614s2000 mau 000 0 eng
000630s2000 cau 001 0 eng
00801nam 22002778a 4500

Catmandu::Breaker doesn’t only break input data into an easy format for command-line processing, it can also do a statistical analysis on the breaker output. First, process some data into the breaker format and save the result in a file:

$ catmandu convert MARC to Breaker --handler marc < t/camel.usmarc > result.breaker

Now, use this file as input for the ‘catmandu breaker’ command:

$ catmandu breaker result.breaker
| name | count | zeros | zeros% | min | max | mean | median | mode   | variance | stdev | uniq | entropy |
|------|-------|-------|--------|-----|-----|------|--------|--------|----------|-------|------|---------|
| 001  | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |
| 003  | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 005  | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |
| 008  | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |
| 010a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |
| 020a | 9     | 1     | 10.0   | 0   | 1   | 0.9  | 1      | 1      | 0.09     | 0.3   | 9    | 3.3/3.3 |
| 040a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 040c | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 040d | 5     | 5     | 50.0   | 0   | 1   | 0.5  | 0.5    | [0, 1] | 0.25     | 0.5   | 1    | 1.0/3.3 |
| 042a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 050a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 050b | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |
| 0822 | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 082a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 3    | 0.9/3.3 |
| 100a | 9     | 1     | 10.0   | 0   | 1   | 0.9  | 1      | 1      | 0.09     | 0.3   | 8    | 3.1/3.3 |
| 100d | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 100q | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 111a | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 111c | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 111d | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 245a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 9    | 3.1/3.3 |
| 245b | 3     | 7     | 70.0   | 0   | 1   | 0.3  | 0      | 0      | 0.21     | 0.46  | 3    | 1.4/3.3 |
| 245c | 9     | 1     | 10.0   | 0   | 1   | 0.9  | 1      | 1      | 0.09     | 0.3   | 8    | 3.1/3.3 |
| 250a | 3     | 7     | 70.0   | 0   | 1   | 0.3  | 0      | 0      | 0.21     | 0.46  | 3    | 1.4/3.3 |
| 260a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 6    | 2.3/3.3 |
| 260b | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 5    | 2.0/3.3 |
| 260c | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 2    | 0.9/3.3 |
| 263a | 6     | 4     | 40.0   | 0   | 1   | 0.6  | 1      | 1      | 0.24     | 0.49  | 4    | 2.0/3.3 |
| 300a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 5    | 1.8/3.3 |
| 300b | 3     | 7     | 70.0   | 0   | 1   | 0.3  | 0      | 0      | 0.21     | 0.46  | 1    | 0.9/3.3 |
| 300c | 4     | 6     | 60.0   | 0   | 1   | 0.4  | 0      | 0      | 0.24     | 0.49  | 4    | 1.8/3.3 |
| 300e | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 500a | 2     | 8     | 80.0   | 0   | 1   | 0.2  | 0      | 0      | 0.16     | 0.4   | 2    | 0.9/3.3 |
| 504a | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 630a | 2     | 9     | 90.0   | 0   | 2   | 0.2  | 0      | 0      | 0.36     | 0.6   | 2    | 0.9/3.5 |
| 650a | 15    | 0     | 0.0    | 1   | 3   | 1.5  | 1      | 1      | 0.65     | 0.81  | 6    | 1.7/3.9 |
| 650v | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 700a | 5     | 7     | 70.0   | 0   | 2   | 0.5  | 0      | 0      | 0.65     | 0.81  | 5    | 1.9/3.6 |
| LDR  | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |

As a result you get a table listing the usage of subfields across all the input records. From this output we can learn:

  • The ‘001’ field is available in 10 records (see: count)
  • One record doesn’t contain a ‘020a’ subfield (see: zeros)
  • The ‘650a’ field is available in all records, at least once and at most 3 times (see: min, max)
  • Only 8 out of 10 ‘100a’ subfields have unique values (see: uniq)
  • The last column, ‘entropy’, provides a measure of how interesting the field is for search engines: the higher the entropy, the more unique content can be found (a small worked example follows below).
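
To give a feel for where a number like 3.3/3.3 comes from, here is a rough worked example. This is my own reading of the column as Shannon entropy in bits, with the maximum being log2 of the number of records; check the Catmandu::Breaker documentation for the exact definition. The helper name below is made up for this sketch:

import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits, of a list of field values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Ten records with ten distinct 001 values: log2(10) = 3.3 bits,
# which is also the maximum possible for ten records -- hence "3.3/3.3".
print(round(shannon_entropy(["id%d" % n for n in range(10)]), 1))  # 3.3

# Ten records that all share the same 040a value: 0.0 bits -- hence "0.0/3.3".
print(round(shannon_entropy(["DLC"] * 10), 1))                     # 0.0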

I hope these tools are of some use in your projects!


Evergreen 2.12.0 is released / Evergreen ILS

The Evergreen community is pleased to announce the release of Evergreen 2.12.  The release is available on the Evergreen downloads page.

With this release, we strongly encourage the community to start using the new web client on a trial basis in production. All current Evergreen functionality is available in the web client with the exception of serials and offline circulation. The web client is scheduled to be available for full production use with the September 3.0 release.

Other notable new features and enhancements for 2.12 include:

  • OverDrive and OneClickdigital integration. When configured, patrons will be able to see ebook availability in search results and on the record summary page. They will also see ebook checkouts and holds in My Account.
  • Improvements to metarecords that include:
    • improvements to the bibliographic fingerprint to prevent the system from grouping different parts of a work together and to better distinguish between the title and author in the fingerprint;
    • the ability to limit the “Group Formats & Editions” search by format or other limiters;
    • improvements to the retrieval of e-resources in a “Group Formats & Editions” search;
    • and the ability to jump to other formats and editions of a work directly from the record summary page.
  • The removal of advanced search limiters from the basic search box, with a new widget added to the results page where users can see and remove those limiters.
  • A change to topic, geographic and temporal subject browse indexes that will display the entire heading as a unit rather than displaying individual subject terms separately.
  • Support for right-to-left languages, such as Arabic, in the public catalog. Arabic has also become a new officially-supported language in Evergreen.
  • A new hold targeting service supporting new targeting options and runtime optimizations to speed up targeting.
  • In the web staff client, the ability to apply merge profiles in the record bucket merge and Z39.50 interfaces.
  • The ability to display copy alerts when recording in-house use.
  • The ability to ignore punctuation, such as hyphens and apostrophes, when performing patron searches.
  • Support for recognition of client time zones, particularly useful for consortia spanning time zones.

Evergreen 2.12 also requires PostgreSQL 9.3, with a recommendation that sites upgrade to PostgreSQL 9.4. It also requires the 2.5 release of OpenSRF. The full feature set for this release is available in the 2.12 Release Notes.

As with all Evergreen releases, many hands contributed to a successful release process. The release is a result of code, documentation, and translation contributions from 46 people representing 23 organizations in the community, along with financial contributions from nine Evergreen sites that commissioned development. Many thanks to everyone who helped make this release happen.

Use capistrano to run a remote rake task, with maintenance mode / Jonathan Rochkind

So the app I am now working on is still in its early stages, not even live to the public yet, but we’ve got an internal server. We periodically have to change a bunch of data in our (non-rdbms) “production” store. (First devops unhappiness: I think there should be no scheduled downtime for planned data transformation. We’re working on it. But for now it happens.)

We use capistrano to deploy. Previously/currently, the process for these scheduled-downtime maintenance windows looked like:

  • on your workstation, do a cap production maintenance:enable to start some downtime
  • ssh into the production machine, cd to the cap-installed app, and bundle exec a rake task. Which could take an hour+.
  • Remember to come back when it’s done and `cap production maintenance:disable`.

A couple more devops unhappiness points here: 1) In my opinion you should ideally never be ssh’ing to production, at least in a non-emergency situation. 2) You have to remember to come back and turn off maintenance mode — and if I start the task at 5pm to avoid disrupting internal stakeholders, I gotta come back after business hours to do that! I also think anything you have to do ‘outside business hours’ that’s not an emergency is a sign of a not-yet-finished ops environment.

So I decided to try to fix this. Since the existing maintenance mode stuff was already done through capistrano, and I wanted to avoid a manual ssh to the production machine, capistrano seemed a reasonable tool. I found a plugin to execute rake via capistrano, but it didn’t do quite what I wanted, and its implementation was so simple that I saw no reason not to copy-and-paste it and make it do just what I wanted.

I’m not gonna maintain this for the public at this point (make a gem/plugin out of it, nope), but I’ll give it to you in a gist if you want to use it. One of the tricky parts was figuring out how to get “streamed” output from cap, since my rake tasks use ruby-progressbar — it’s got decent non-TTY output already, and I wanted to see it live in my workstation console. I managed to do that! Although I never figured out how to get a cap recipe to require files from another location (I have no idea why I couldn’t make it work), so the custom class is inlined, which is ugly.

I also ordinarily want maintenance mode to be turned off even if the task fails, but still want a non-zero exit code in those cases (anticipating future further automation — really what I need is to be able to execute this all via cron/at too, so we can schedule downtime for the middle of the night without having to be up then).

Anyway here’s the gist of the cap recipe. This file goes in ./lib/capistrano/tasks in a local app, and now you’ve got these recipes. Any tips on how to organize my cap recipe better quite welcome.


Filed under: General

After calling Congress, write a letter to the editor / District Dispatch

The single most impactful action you can take to save funding for libraries right now is to contact your member of Congress directly. Once you’ve done that, there is another action you can take to significantly amplify your voice and urge public support for libraries: writing a letter to the editor of your local newspaper.

Each newspaper has its own guidelines for submitting letters to the editor. Source: pennlive.com/opinion

If you’ve never done it, don’t let myths get in the way of your advocacy:

Myth 1: My local newspaper is really small, so I don’t want to waste my time. It’s true that the larger the news outlet, the more exposure your letter gets. But it’s also true that U.S. representatives care about the opinions expressed in their own congressional district, where their voters live. For example, if you live in the 15th district of Pennsylvania, your U.S. representative cares more about the Harrisburg Patriot-News and even smaller local newspapers than he does about the Philadelphia Inquirer.

Myth 2: I have to be a state librarian to get my letter printed in the newspaper. Newspaper editorial boards value input from any readers who have specific stories to share about how policies affect real people on a daily basis. Sure, if you’re submitting a letter to the New York Times, having a title increases your chances of getting published. The larger the news outlet, the more competitive it is to get published. But don’t let your title determine the value of your voice. Furthermore, you can encourage your library patrons to write letters to the editor. Imagine the power of a letter written by a veteran in Bakersfield, CA, who received help accessing benefits through the state’s veteransconnect@thelibrary initiative – especially when their U.S. representative is on the Veterans Affairs subcommittee of the House Appropriations Committee.

Myth 3: I don’t have anything special to say in a letter. You don’t need to write a masterpiece, but you need to be authentic. Letters in response to material recently published (within a couple days) stand a better chance of getting printed. How did you feel about a story you read about, for example, the elimination of library programs in the Trump budget? Was there a missing element of the story that needs to be addressed? What new information (statistics) or unique perspective (anecdotes) can you add to what was printed? Is there a library angle that will be particularly convincing to one of your members of Congress (say, their personal interest in small business development)? Most importantly, add a call to action. For example, “We need the full support of Senators {NAME and NAME} and Representative {NAME} to preserve full federal funding for libraries so they can continue to…” Be sure to check our Legislative Action Center for current language you can use.

Ready to write? Here are a few practical tips about how to do it:

Tip 1: Keep it short – in general, maximum 200 words. Every news outlet has its own guidelines for submitting letters to the editor, which are normally published on their website. Some allow longer letters, others shorter. In any case, the more concise and to-the-point, the better.

Tip 2: When you email your letter, paste it into the body of the text and be sure to include your name, title, address and phone number so that you can be contacted if the editor wants to verify that you are the author. Do not send an attachment.

Tip 3: If your letter gets published, send a copy to your representative and senators to reinforce your message (emailing a hyperlink is best). Also, send a copy to the Washington Office (imanager@alawash.org); we can often use the evidence of media attention when we make visits on Capitol Hill.

Finally, get others involved. Recruit patrons, business leaders and other people in your community to write letters to the editor (after they have called their members of Congress, of course!). Editors won’t publish every single letter they get, but the more letters they receive on a specific topic, the more they realize that it is an issue that readers care deeply about – and that can inspire editors to further explore the impact of libraries for themselves.

The post After calling Congress, write a letter to the editor appeared first on District Dispatch.

Truncating a field by a # of words in MarcEdit / Terry Reese

This question came up on the listserv, and I thought the answer might be generically useful, since other folks might find it interesting. Here’s the question:

I’d like to limit the length of the 520 summary fields to a maximum of 100 words and adding the punctuation “…” at the end. Anyone have a good process/regex for doing this?
Example:
=520  \\$aNew York Times Bestseller Award-winning and New York Times bestselling author Laura Lippman&#x2019;s Tess Monaghan&#x2014;first introduced in the classic Baltimore Blues&#x2014;must protect an up-and-coming Hollywood actress, but when murder strikes on a TV set, the unflappable PI discovers everyone&#x2019;s got a secret. {esc}(S2{esc}(B[A] welcome addition to Tess Monaghan&#x2019;s adventures and an insightful look at the desperation that drives those grasping for a shot at fame and those who will do anything to keep it.{esc}(S3{esc}(B&#x2014;San Francisco Chronicle When private investigator Tess Monaghan literally runs into the crew of the fledgling TV series Mann of Steel while sculling, she expects sharp words and evil looks, not an assignment. But the company has been plagued by a series of disturbing incidents since its arrival on location in Baltimore: bad press, union threats, and small, costly on-set “accidents” that have wreaked havoc with its shooting schedule. As a result, Mann’s creator, Flip Tumulty, the son of a Hollywood legend, is worried for the safety of his young female lead, Selene Waites, and asks Tess to serve as her bodyguard. Tumulty’s concern may be well founded. Recently, a Baltimore man was discovered dead in his home, surrounded by photos of the beautiful&#x2014;if difficult&#x2014;aspiring star. In the past, Tess has had enough trouble guarding her own body. Keeping a spoiled movie princess under wraps may be more than she can handle since Selene is not as naive as everyone seems to think, and instead is quite devious. Once Tess gets a taste of this world of make-believe&#x2014;with their vanities, their self-serving agendas, and their remarkably skewed visions of reality&#x2014;she&#x2019;s just about ready to throw in the towel. But she&#x2019;s pulled back in when a grisly on-set murder occurs, threatening to topple the wall of secrets surrounding Mann of Steel as lives, dreams, and careers are scattered among the ruins.
So, there isn’t really a true expression that can break on a number of words, in part because how we define word boundaries varies between different languages. Likewise, the MARC formatting can cause a challenge. So the best approach is to look for good enough – and in this case, good enough is likely breaking on spaces. My suggestion is to look for 100 spaces, and then truncate.
In MarcEdit, this is easiest to do using the Replace function.  The expression would look like the following:
Find: (=520.{4})(\$a)(?<words>([^ ]*\s){100})(.*)
Replace: $1$2${words}…
Check the use regular expressions option.
So why does this work? Let’s break it down.
Find:
(=520.{4}) – this matches the field number, the two spaces related to the mnemonic format, and then the two indicator values.
(\$a) – this matches on the subfield a
(?<words>([^ ]*\s){100}) – this is where the magic happens. You’ll notice two things about this. First, I use a nested expression, and second, I name one. Why do I do that? Well, the reason is that the group numbering gets wonky once you start nesting expressions. In those cases, it’s easier to name them. So, in this case, I’ve named the group that I want to retrieve, and then created a subgroup that matches characters that aren’t a space, followed by a space. I then use the quantifier {100}, which means the subgroup must match exactly 100 times.
(.*) — match the rest of the field.
Now when we do the replace, putting the field back together is really easy. We know we want to reprint the field number, the subfield code, and then the group that captured the 100 words. Since we named that group, we can call it directly by name. Hence,
Replace:
$1 — prints out =520  \\
$2 — $a
${words} — prints 100 words
… — the literals
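If you want to try the same idea outside MarcEdit, here is a small Python sketch of the equivalent substitution (my own illustration, not part of MarcEdit; the function name is made up, and Python spells the named group (?P<words>…) where the dialog above uses (?<words>…)):

import re

# Field tag plus indicators, the $a subfield code, a named group of exactly
# 100 space-delimited words, and then whatever text is left over.
PATTERN = re.compile(r'(=520.{4})(\$a)(?P<words>([^ ]*\s){100})(.*)')

def truncate_520(field, ellipsis='…'):
    """Keep only the first 100 words of a mnemonic-format 520 field.
    Fields with fewer than 100 words do not match and pass through unchanged."""
    return PATTERN.sub(
        lambda m: m.group(1) + m.group(2) + m.group('words') + ellipsis,
        field)

The replacement mirrors the $1$2${words}… pattern in the MarcEdit Replace box.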
And that’s it.  Pretty easy if you know what you are looking for.
–tr

Jobs in Information Technology: March 22, 2017 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Columbus Metropolitan Library, IT Enterprise Applications Manager, Columbus, OH

Auburn University Libraries, Research Data Management Librarian, Auburn, AL

Valencia College, Emerging Technology Librarian, Orlando, FL

Western Carolina University/Hunter Library, Web Development Librarian, Cullowhee, NC

Computercraft Corporation, Online Content Specialist, McLean, VA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Peer reviewers needed! / Access Conference

Access 2017 is seeking peer reviewers to help select presentations for the conference. Peer reviewers will be responsible for reading Access Conference session proposals and reviewing them for selection.

Peer reviewers will be selected by the Program Committee and will ideally include a variety of individuals across disciplines including Access old-timers and newbies alike. You do not need to be attending the conference to volunteer as a peer reviewer.

The peer review process is double-blind. Those who have submitted Access Conference session proposals are welcome and encouraged to become peer reviewers as well. You won’t receive your own proposal to review. Please note that the peer reviewing activity will take place between April 6 and April 25.

To be considered as a peer reviewer, please attach your abridged CV (max. 5 pages) and provide the following information in the body of an email:

  • Name
  • Position and affiliation
  • A few sentences that explain why you want to be a peer reviewer for Access 2017

Please submit this information to accesslibcon@gmail.com by April 5, 2017.

House library champions release FY18 “Dear Appropriator” letters / District Dispatch

Your limited-time-only chance to ask for your House Member’s backing for LSTA and IAL begins now.

Where does your Representative stand on supporting FY 2018 library funding? Against the backdrop of the President’s proposal last week to eliminate the Institute of Museum and Library Services and virtually all other library funding sources, their answer this year is more important than ever before.

Every Spring, library champions in Congress ask every Member of the House to sign two, separate “Dear Appropriator” letters directed to the Appropriations Committee: one urging full funding for LSTA (which benefits every kind of library), and the second asking the same for the Innovative Approaches to Literacy program. This year, the LSTA support letter is being led by Rep. Raul Grijalva (D-AZ3). The IAL support letter is being jointly led by Reps. Eddie Bernice Johnson (D-TX30), Don Young (R-AK), and Jim McGovern (D-MA2).

The first “Dear Appropriator” letter asks the Committee to fully fund LSTA in FY 2018 and the second does the same for IAL. When large numbers of Members of Congress sign these letters, it sends a strong signal to the House Appropriations Committee to reject requests to eliminate IMLS, and to continue funding for LSTA and IAL at least at current levels.

Members of the House have only until April 3 to let our champions know that they will sign the separate LSTA and IAL “Dear Appropriator” letters now circulating, so there’s no time to lose. Use ALA’s Legislative Action Center today to ask your Member of Congress to sign both the LSTA and IAL letters. Many Members of Congress will only sign such a letter if their constituents ask them to. So it is up to you to help save LSTA and IAL from elimination or significant cuts that could dramatically affect hundreds of libraries and potentially millions of patrons.

Five minutes of your time could help preserve over $210 million in library funding now at risk.

Soon, we will also need you to ask both of your US Senators to sign similar letters not yet circulating in the Senate, but timing is key. In the meantime, today’s the day to ask your Representative in the House for their signature on both the LSTA and IAL “Dear Appropriator” letters that must be signed no later than April 3.

Whether you call, email, tweet or all of the above (which would be great), the message to the friendly office staff of your Senators and Representative is all laid out at the Legislative Action Center and it’s simple:

“Hello, I’m a constituent. Please ask Representative  ________ to sign both the FY 2018 LSTA and IAL ‘Dear Appropriator’ letters circulating for signature before April 3.”

Please, take five minutes to call, email, or Tweet at your Members of Congress  and watch this space throughout the year for more on how you can help preserve IMLS and federal library funding. We need your help this year like never before.

Supporting Documents:

IAL Approps FY18 Letter

IAL FY18 Dear Colleague

Dear Appropriator letter for LSTA FY2018

Dear Colleague letter FY2018

The post House library champions release FY18 “Dear Appropriator” letters appeared first on District Dispatch.

Open Data Day 2017: Tracking money flows on development projects in the Accra Metropolis / Open Knowledge Foundation

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open contracting and tracking public money flows theme.

Open data is becoming popular in policy discussions, media discourse and everyday conversations in Ghana, and this year TransGov Ghana had the opportunity, as one of two organisations in the country, to organise Open Data Day 2017. It was under the theme: “Following the money: Tracking money flows on development projects in the Accra Metropolis”. The objective for this year’s event was to clean up and standardise datasets on development projects for deployment to the Ghana Open Data Portal and to give the participants insights into how the government spends public funds on development projects in their local communities.

Who was at the event?

Open Data Day provided an opportunity for various stakeholders within the open data ecosystem to meet for the first time and to network. In attendance were Mohammed Awal, Research Officer from Center for Democratic Development Ghana (CDD-Ghana) , Jerry Akanyi-King, CEO of TransGov Ghana, a startup that enhances civic engagement with government by enabling citizens to monitor and track development projects in their local communities, and Paul Damalie, CEO of Inclusive, a local startup that provides a single identity verification API that connects unbanked Africans to the global economy.

Participants at the Open Data Day event

Also in attendance were Adela Gershon, Project Manager at Oil Journey, a Civil Society Organization (CSO) that uses Open Data to give Ghanaian citizens insights into how oil revenues are spent; and Joel Funu, Chief Product Officer at SynCommerce, an avid open data proponent; and many others including journalists, students from the Computer Science department of the University of Ghana, open data enthusiasts and members of the local developer community.

The state of Open Data in Ghana

The event kicked off at 10:00 am with a discussion on open data in Ghana, its application, the challenges facing the Ghana Open Data Initiative (GODI) and Civil Society Organisations (CSOs) involved in open data work, and the future of the open data ecosystem in Ghana. The discussion also sought to gather consensus on what the key stakeholders in the sector can do to facilitate the passage of Ghana’s Right to Information Bill currently before Parliament. It was an open discussion which involved all participants.  

The discussions were moderated by Paul Damalie, and panellists included Jerry Akanyi-King, CEO of TransGov Ghana, Adela Gershon, Project Manager at Oil Journey and Mohammed Awal, Research Officer at CDD-Ghana.

Guest panellists included (from right) Adela Gershon (OilJourney) and Jerry Akanyi-King (TransGov)

Mohammed Awal spoke about an open data project initiated by CDD-Ghana known as “I’m Aware”, which collects, analyses, archives, and disseminates user-friendly socio-economic data on the state of public goods and public service delivery in 216 districts, located in all the ten regions of Ghana. He agreed with the other panellists on the difficulty of getting data from government and suggested a few strategies that can help other organisations that relied on government for data.

Mr Adela Gershon also spoke about his experiences from working at Oil Journey. He observed that the Ghana Open Data Initiative (GODI) has not been helpful to their cause thus far and he called for closer collaboration between GODI and CSOs. Jerry Akanyi-King also chimed in with experiences from TransGov. Panellists and participants stressed that CSOs have tended to work in parallel in the past, often working to solve similar problems whilst being totally oblivious of each other, and should begin to collaborate more to share knowledge and to enhance open data work in Ghana.

Diving into the Datasets

After the discussions, the attendees formed two teams and were given a dataset of development projects. The teams were assigned task leaders and were introduced to the process of converting the data from PDF format, cleaning it up, and finally visualising it. The two teams were able to come up with visualisations which provided insights into how public funds were spent on development projects.

The first group were able to come up with visualisations on how much was spent on development projects in total from 2012 to 2015. They also visualised how these development projects are distributed across the Accra Metropolis.

Visualisation showing how development projects are distributed across 10 major suburbs in Accra Metropolis

The second group also visualised the number of projects across the city and which contractors got the most contracts within specific time periods. From their analysis, we also found out which development partner organisations provided the most support to the government within the period.

Visualisation displaying the distribution of contracts among top 18 contractors in the Accra Metropolis

Lessons learned 

The developer community in Ghana has not fully embraced open data because there is a yawning knowledge and skills gap. More work has to be done to educate both government and the general public about open data and its social, political and economic benefits. Furthermore, capacity building for CSOs engaged in open data work will go a long way to enhance their work and strengthen the open data ecosystem in Ghana.

What’s Next?

We’ve submitted the cleaned up dataset to the technical team at the Ghana Open Data Initiative and await deployment to the Ghana Open Data portal. We’re also providing support to the student reps from the University of Ghana Computer Science department to form Open Data Clubs at the University to help them build capacity and hopefully grow the ecosystem across other tertiary institutions in Ghana.

A big thanks to Open Knowledge International and the UK Foreign & Commonwealth Office for providing the mini-grant to make this possible.

Recruiting and Retaining LGBTQ-Identified Staff in Academic Libraries Through Ordinary Methods / In the Library, With the Lead Pipe

In Brief

While the American academic library field works hard to include all patrons and materials that represent less dominant populations, it should be more mindful of inclusivity in its own workforce. Particularly, the field does nothing to explicitly recruit or retain LGBTQ-identified librarians. The author proposes practical remedies to these problems that directly respond to workplace studies on interpersonal difficulties LGBTQ-identified librarians and others have cited as barriers to happiness in the workplace, and argues toward more inclusive LIS education and financial support. Most importantly, the author hopes to convince others to abandon the tired rhetoric that positions the library field’s “feminization” as a misunderstanding and damaging consequence to be combated, and instead replace it with feminist conversations about the gendered aspects of the field.

Introduction

The Library and Information Science (LIS) field is its own worst enemy in terms of recruitment and retention of underrepresented employees. While the field has sufficient scholarship on diversity in collections, censorship issues, and how to provide programming for patrons from various backgrounds, LIS articles rarely discuss successfully recruiting and retaining librarians who come from less dominant cultures or are underrepresented within the field. The small amount of scholarship that does exist on this topic is excellent, though limited to visible minorities. Particularly, the field lacks recruitment and retention strategies for academic librarians and staff who identify as lesbian, gay, bisexual, transgender, and queer (LGBTQ).

I argue that the way the LIS field discusses the gendered nature of library work constitutes a microaggression in itself, and that the field does not do enough to recruit and retain academic librarians who identify as LGBTQ. My argument is built on several foundations. First, “Current Research on Recruitment and Retention” reviews the scholarship that proves that comfort and safety at work are the most important factors in retaining employees in academic libraries. For those who are not members of America’s dominant culture, the issue of comfort and safety is both more urgent and pronounced; this section expands on the literature that discusses the issue for people of color. Second, “How Microaggressions Ruin Positive Workplace Culture” explains the current scholarship on microaggressions directed toward people of color in the library workplace, and includes scholarship from other fields that discuss microaggressions toward LGBTQ-identified employees. Third, “LIS Scholarship and Anti-Feminist Rhetoric” discusses the LIS field’s characterization of the “feminization” of librarianship as a detrimental act, and how this view is a sexist, aggressive behavior in itself. Finally, “Practical Methods for Improvement” describes feasible tactics both individual libraries and the global LIS field can practice in order to be more thoughtful toward these issues. I suggest that all these changes be made every day, in the quotidian aspects of our work, in order to lead to a permanent cultural shift.

Current Research on Recruitment and Retention

Scholarship on LIS management practices demonstrates that internal workplace forces are more significant than external forces in keeping librarians happily employed in the academic library field. Though much of the opinion-based scholarship deals with budget cuts, lack of respect for the field from non-librarians, and other forces outside of a library’s control, a wealth of empirical evidence suggests that the most important factor in retaining librarians is an internal factor: a positive workplace environment. Libraries spend large sums of money on the hiring process for new librarians, as Jones argues, but they rarely spend any money or time on acclimating the new librarian into the new work environment, much less acclimating the library to the new employee.1 Given Kawasaki’s argument that the initial period for an incoming librarian is the most significant time in the new-hire’s career in terms of deciding how they feel about their work, we should focus much more on this early period. Even if a successfully-retained librarian stays at a library they dislike, the librarian likely has formed an attitude that decreases workplace satisfaction and inhibits successful performance.2

Some studies that focus on worker attitude exist, though the exact number of academic librarians who quit their job after being hired is unknown. In 2001 the Association of College and Research Libraries (ACRL) formed the Ad Hoc Task Force on Recruitment and Retention Issues to address what they termed a “top issue” in the field. The Task Force surveyed librarians and asked how long one worked in the field, what type of college library they worked for, what their reason for leaving was, and so forth. But, despite the Task Force’s effort, individuals who have left the field infrequently want to participate in the research. What data the Task Force collected tellingly showed that 44.4% of librarians who left academic libraries entirely to go into other fields stated that work environment was the reason for their departure; another 27.8% cited salary, and 16.7% cited respect for the field from others. Luzius and Ard note that a poor workplace environment ultimately contributes to a lack of ability to attract new and interesting people to librarianship.3 Albanese’s survey about workplace satisfaction never directly asked a question about “workplace environment,” instead opting to bank on external reasons; but when Albanese asked “Which attributes contribute most to success?” his survey participants responded with a resounding 68% for interpersonal skills, and only 2% for budget.4

More specifically, scholarship on retaining librarians of underrepresented communities echoes the two sentiments that studies reveal: positive workplace culture and climate keeps happy and productive employees, and underrepresented groups face more varied issues related to workplace culture. Andrade writes that the problem typically begins after potential librarians decide to enter library school, even before they enter the job market. Few library schools have any classes related to diversity training, what it means to be culturally competent, or similar issues that may affect the workplace environments of new librarians.5 This deficit results in a group of people on the job market who are untrained in several significant elements of workplace professionalism.
To illustrate this point, surveys such as the ClimateQUAL at the University of Arizona cited by Andrade6 and Williams II7 indicate that low scores in “interpersonal justice and work unit conflict” prove the main reason for workplace unhappiness, and that the highest amount of unhappiness in these categories came from “individuals who did not associate themselves with the dominate culture.”8 Similarly, Love writes that library “employees in the workforce who are not part of the dominant culture have struggled with subtle demands to ‘adapt and fit in,’”9 rather than with appreciation of who they are. This is a huge factor in retention because an employee who feels unwelcome can create an adversarial environment with the administration, often leading to a less productive employee who wants to leave rather than to increased performance. As Love says, “Change has come at such an alarming pace to every aspect of work life except diversity”10; the sentiment in this quote reflects the amount of talk in the academic library field about changes in technology, roles, and instruction, and the minimal talk about adaptation to increased diversity of staff within the academic library workplace. Love cites a 1994 study that lists internal barriers to job satisfaction in a workplace as “negative attitudes, discrimination, prejudice, stereotyping, racism, and bias,”11 all of which fall within workplace climate descriptions similar to those used in Andrade’s study.

Alire describes how a lack of leadership possibilities also leads to low retention of ethnic and racial minority employees. The secondary benefit of having underrepresented populations in leadership positions is that not only is the individual growing their career, but they are more likely to retain the minority employees under them and aid in recruitment by taking on “the additional responsibility of identifying and developing emerging minority leaders.”12 Similarly, Neely and Peterson write that retaining librarians of color should involve shadowing existing leaders and nomination for awards that will assist in promotion, which does not currently occur enough.13

Damasco and Hodges echo the sentiment that workplace culture is the largest problem facing, specifically, librarians of color. They cite a study from the Association of Research Libraries (ARL) that concluded that African-American academic librarians who reported being unsatisfied with their jobs mentioned “feelings of isolation, [inadequate] library diversity programs, working conditions, [lack of] support from peers”14 and other issues of that nature. Their survey-takers complained that isolation at work led to tokenism, including frequent instances of being asked to lead diversity programing, or being given titles such as “diversity specialist,” though these types of involvement were not acceptable contributions for the tenure and promotion process. They also reported general discouragement from peers, hostility, and disparate expectations of people of different cultures in their workplace. Some scholars, such as Simmons-Welburn15 and Majekodunmi,16 suggest dialogue groups, staff trainings, and open forums to discuss issues such as these in the workplace.

How Microaggressions Ruin a Positive Workplace Culture

The singular study on microaggressions and workplace culture is Alabi’s 2015 survey about the occurrence of racial microaggressions in the academic library workplace and the consequences of these actions.17 The survey was circulated via several listservs in 2011 and completed by 139 participants of various races and ethnicities.18 The results suggest that frequent microinsults, microinvalidations, and other forms of microaggressions often left people of color in the academic library workplace feeling isolated and finding their work environment to be hostile. Even survey participants who did not identify as minorities expressed surprise at some of the comments made by co-workers. In “‘This actually happened’: An analysis of librarians’ responses to a survey about racial microaggressions,” Alabi includes comments from the survey that illustrate how some participants felt that current efforts toward retention and recruitment were insincere. Alabi writes, “Eight comments focused on issues related to recruitment and retention. One non-minority respondent said, ‘I think there needs to be a bigger push for minorities to enter library school and encourage librarianship as a career,’” and another wrote, “In my experience, attempts at ‘increasing diversity’ are still quite superficial.’”19 More poignantly, a participant stated, “‘Racism is a major issue in libraries. We’ve closed it off as a viable career path because it relies on shared cultural values and access to cultural and material capital.’”20 And finally, one participant commented that, “’The reason that many African American and Latino Librarians leave this profession is because of the constant lack of emotional intelligence that is needed in the work place today […]. Academic Libraries are very poor examples of pushing forth Diversity candidates for positions at the administrative level for Minorities’”21 (capitalization in original). Though these statements are made in regard to race, they also speak to diversity in general in the academic library field, and can be used to point to some universal issues. Because no survey on LGBTQ microaggressions has been done in librarianship, this survey can apply, as can additional research from other fields. This research reveals that microaggressions are the route through which academic librarians who live outside of the dominant culture realize they are not welcome in the profession, despite the liberal mask the profession wears.

Microaggressions are a particular type of discrimination; they are different from outright violence, and they fall within subtle or accidental statements or behaviors that reveal one’s heterosexist attitudes. Nadal, Whitman, Davis, Erazo, and Davidoff write:

Microaggressions are behaviors and statements, unconscious or unintentional, that communicate hostile or derogatory messages, particularly to members of targeted social groups… Because people in contemporary times do not engage in overtly hostile or consciously biased behavior toward marginalized groups, some people believe they neither hold biases against other groups nor participate in discriminatory behavior; in fact, many individuals may report that discrimination no longer exists.22

Or as Platt and Alexandra explained, microaggressive “discriminations stem from systemic, deeply ingrained social justice problems such as privilege, inequities in power, stereotyping, and societal biases.”23

LGBTQ microaggressions fall within many categories, but can be characterized by common themes. Nadal, Whitman, Davis, Erazo, and Davidoff write that a microinsult is something such as, “you’re too pretty to be a lesbian.”24 Invalidating reactions to daily experiences (such as suggesting someone is overreacting); applying dominant social norms to all relationships; erotizing people based on their identity; and making assumptions of sexual pathology also constitute forms of microaggressions toward LGBTQ-identified individuals. A difficult and ill-recognized microaggression can also be the denial or defensiveness of the aggressor.

Another difficult aspect of studying LGBTQ microaggression is the lack of scholarly sources in any field on this topic—LGBTQ-focused research has lagged behind other research. Seventy-three articles with key topics related to “race” and “microaggressions” show up in databases, but only five articles show up related to “transgender” and “microaggressions.”25 From the few studies conducted on academic campuses, we can extrapolate some information about academic libraries. Nadal, et al., write that when college students who identified as lesbian, gay, bisexual, or queer were surveyed, 96% reported experiencing interpersonal microaggressions and 98% reported experiencing environmental microaggressions, while only 37% reported blatant discrimination due to sexual orientation.26 Relatedly, Tetreault, Fette, Medlinger, and Hope write that in a survey conducted about perceptions of the LGBTQ climate on campuses, LGBTQ-identified people viewed environments free of overt heterosexism positively because of a lack of violence and lack of attention generally to LGBTQ issues.27 The standard for a positive environment consisted of a lack of negatives rather than of blatant cultural inclusiveness, such as expecting a campus to provide resources and actually welcome LGBTQ for who they are.

Research also suggests that microaggressions have a more dangerous impact on mental wellbeing than overt discrimination. Nadal, et al., write, “Results further indicated that microaggressions were predictors of most self-acceptance and distress, while blatant discrimination did not significantly relate to either variable.”28 This particularly matters for the workplace because, as Buddel notes, LGBTQ-identified people come into a workplace with pre-existing stress: before even meeting new coworkers LGBTQ-identified people must deal with “identity management” and “sexuality disclosure.” An employee entering a new workplace does not know what the consequences of coming out or revealing certain preferences may be. Buddel writes that Degges-White and Shoffner’s 2002 Theory of Workplace Adjustment

describes four facts that people negotiate as they transition into the workplace: 1. Satisfaction describes the ability to engage meaningfully with coworkers; 2. Person-environment correspondence refers to the degree of congruence between the person and the work environment; 3. Reinforcement value refers to the extent that the workplace fulfills a psychological need; 4. Ability refers to the degree of skill and personal trait congruence with the workplace.29

All of these categories, each necessary for a person to adjust to a new workplace, are at risk if (1) there are no other openly LGBTQ people; (2) there are no LGBTQ people in leadership; (3) microaggressions are present in the workplace; or (4) the workplace neither attempts to adapt to the new person, nor makes an effort to ease the new person’s adjustment.

LIS Scholarship and Anti-Feminist Rhetoric

Unlike much of the research already discussed, Nectoux’s book of personal narratives written by LGBTQ-identified librarians in 2011, titled Workplace Issues for LGBTQ Librarians, is less empirical and more personal, and it provides a great diversity of different identities and workplace issues to consider.

One such anecdote comes from Phillips, who shares that he pursued academic librarianship because the university he attended had adopted a non-discrimination policy that included sexual orientation (but not gender identity), and it seemed universities were growing in this way.30 But he cites having to work with only straight colleagues on LGBTQ-related scholarship, and being forced to use his social media accounts in his professional life (thus automatically outing him), as ways his job lacked awareness of sexual-identity-related issues. Ciszek writes from the perspective of a library administrator who remains closeted, fearing being out will hinder advancement in his field (Phillips, 86). He cites a supportive network, both in one’s individual library and nationally, as the most important element of being out at work.31 Roberto discusses the difficulties of transitioning (female-to-male in this case) at work in a library. He describes the double life lived between different subsets of librarians, and his worry that at conferences the two separate groups would interact. He describes the awkwardness of being on the job market while transitioning, but notes that LGBTQ allies at his particular library helped him get and keep his job.32 In these cases and others, interpersonal relationships were the crux of success in the workplace.

These anecdotes also provide evidence that many academic library environments provide neither a comfortable nor a supportive workplace for many LGBTQ-identified people. But what is of greater interest to me is that LIS scholarship that engages with notions of gender shows hostility and backward thinking by continually arguing that the field itself suffers from being associated with that which is feminine and that which is homosexual, and that this association renders the field illegitimate. This scholarship often focuses on the male librarian stereotype and the embarrassment and struggles to which male librarians are subjected for working in a “female” profession. This literature does not discuss gender equality or recruiting more diverse people into the workforce—it is heterosexist writing concerned only that people might mistake straight men for gay men (feminine ones at that), and a repeat of the antiquated idea that what is feminine is naive and shameful. This literature is written not as a critique of equating femininity and homosexuality with negativity, but rather with the expressed intention of proving that the field is not feminine, and not homosexual. Further, it is written by and for librarians, with little input from or attention to cultural forces outside of the field. As Dickinson writes, “the possibility remains that such stereotypes never really found strong footing in the public consciousness,” and that “the image of the effeminate or gay male reference librarian was more entrenched within the library profession itself than it was outside of it.”33

The study referenced by Dickinson, James Carmichael, Jr.’s “The Male Librarian and the Feminine Image: A Survey of Stereotype, Status, and Gender Perceptions,” sought to disprove that men only wanted administrative positions and to prove that men, in fact, shared with women the “negative (feminine) stereotype of the profession while being immune from it, thus profiting from its existence in terms of preferential treatment and consideration because they were men.”34 He also sought to prove that male librarians suffered from “low self-esteem”35 on account of their public image. In gathering the results, he notes that “The most prevalent stereotype is ‘effeminate (probably gay)’”36 and commenters wrote various statements about how they were presumed gay until proven straight (a scenario that mirrors the presumption of straightness, until one comes out, that gay people encounter in every facet of life). Not surprisingly, ten percent more of the gay-identified respondents showed awareness of the feminine stereotype than did straight respondents.37

When this study was repeated (with a smaller sample size) by Piper and Collamer (2001), they concluded, “The greatest puzzlement was that respondents acknowledged that there were more women in the field than men, but did not consider librarianship a women’s profession… male librarians are currently quite content with their role, with respect to gender issues, in the library world.”38 Even given such clear research results, more scholarship continues to be published that frets over the supposed abuse and shame men face in librarianship. Hickey’s (2006) study focused on how male librarians fare working in a “non-traditional work environment,” that is, a work environment where there are more women than men. His study participants reported feelings of social isolation,39 criticism for not understanding how to organize a teatime,40 and anxiety in dealing with issues related to personal identity formation. A major flaw of his study is that the anecdotes describe common workplace issues (e.g., a supervisor picking favorites) more than anything directly related to gender. There are many other examples of questionable research and assertions. Blackburn’s 2015 “Gender Stereotypes Male Librarians Face Today” worries that heterosexual male librarians must feel hurt and thus avoid the LIS profession because they risk being thought of as feminine or gay. Blackburn writes, “Men in nontraditional professions such as nursing and librarianship have become targets for stereotyping, creating a vicious cycle. Men assume the stereotypes are valid, they avoid taking the jobs, and the profession continues to see fewer males entering the workforce, creating a self-fulfilling prophecy of low employment rates.”41 Her logic suggests that what is distinctly feminine is by nature negative, and should be taken out of the profession so that men who are uncomfortable with that which could be viewed as feminine or homosexual will join the profession. Critical discussions of gendered aspects of the field would do better to critique the systems that associate what is feminine and homosexual with what is unlearned, illegitimate, and shameful. Rather than trying to become more “masculine” to solve this issue, we should become more aware and culturally competent.

Practical Methods for Improvement

One major difference between tackling recruitment issues for librarians (and potential librarians) who identify as LGBTQ, and for librarians of color, is that there are no identity-based initiatives at either the individual library level or the national LIS field level for LGBTQ-identified people, as there are for people of color. There are programs related to librarianship, like the Martin Duberman Visiting Fellowship at New York Public Library, which funds a scholar using the LGBTQ sources in their archive, or the American Library Association’s (ALA) Gay, Lesbian, Bisexual, Transgender Round Table, which discusses information needs and serving patrons who identify as LGBTQ, but neither is specifically about librarians themselves. The LIS field has taken commendable steps that result in people of color feeling more welcome in library school and in entry-level positions after graduation. According to Haipeng42 and Acree, Epps, Gilmore, and Henriques,43 part of the commitment to recruiting new graduate librarians from underrepresented populations into an academic library can involve offering them something that makes them feel wanted for who they are. Many universities have residency programs for ethnic minority librarians, such as Cornell University, Iowa State University, University of Michigan, Ohio State University, and Yale University, among others.44 These programs are good for recruitment because they offer librarians a chance to develop collections based on their interests, participate in workshops that may increase retention, and earn fellowships with high-quality benefits. The librarians already employed by these universities also have the opportunity to show their support for this type of recruitment and learn from these newly recruited librarians. The ALA’s Spectrum Initiative, which provides financial aid to students of color for three years and includes annual reports from the student, participation in a longitudinal study, and support to attend the Spectrum Institute, is described as the largest diversity initiative in the field.45 This type of opportunity has significant value because it not only helps fund a student’s education, but it also supports research related to the needs and satisfaction levels of the award winners. Symbolically, it serves as an important welcome sign to people of color interested in librarianship.

Similar residencies for LGBTQ-identified librarians would be an excellent addition to the field. These residencies would give these librarians an opportunity to take leadership roles, meet other librarians like themselves, and have special professional accolades when entering the job market. Scholarships similar to the Spectrum Initiative’s would encourage LGBTQ-identified people to be out and to pursue the field. Such scholarships could follow the Spectrum Initiative structure to ensure ongoing retention in jobs and in the field as a whole. Importantly, unlike people with disabilities, racial minorities, and members of religious groups, people who identify as lesbian, gay, bisexual, transgender, or queer (among many other identities) are not a protected class under United States federal law, though some of these categories are protected to some degree in some states, counties, or cities.46 This makes trying to find employment where one feels comfortable enough to come out a precarious undertaking. Some people cannot choose to come out or stay closeted on their own terms: a transgender person who has changed their name or sex on their birth certificate, for example, may be involuntarily outed if an employer runs a background check. Further, LGBTQ-identified people often have to break from their family of origin if that family does not support them, so a program such as the Spectrum Initiative would provide much-needed financial aid and emotional support for things like education and career training, as well as a sense of security within the field.

Another tactic that could be employed at the individual library level is producing academic library job advertisements written to recruit LGBTQ-identified librarians (as Williams II argues for doing for people of color47) by broadening the job descriptions to include enticements that might attract people outside the dominant culture. He writes, “the hiring opportunity should be looked upon as a means to move the library to the next level of excellence by creating a post that is broad in scope, flexible,”48 and allows for someone from an unconventional background to feel empowered to apply. He also writes that “when creative use of the vacancy becomes the norm in academic library recruitment programs, it establishes another norm that opportunities in our library are no longer static, but dynamic.”49 This strategy would apply to all populations outside of the dominant culture. If the workplace is truly inclusive, signals or direct statements could be placed in the job advertisement to specifically recruit underrepresented applicants and alleviate the worry that they may end up in a hostile environment.

Also on the individual library level, as Simmons-Welburn50 and Majekodunmi51 suggest, interpersonal issues within the workplace can be improved through understanding. They argue for communication and education. Dialogue groups and forums about issues affecting minorities can be useful at staff trainings, though existing research only extends to visible minorities. This type of initiative can also be extended to LIS graduate programs. Since nearly all programs require a core class introducing students to the field of librarianship, that class could include substantial discussions of cultural competency, workplace ethics and attitudes, and the field’s commitment to inclusive behaviors. Library scholarship and training emphasize accepting patrons who come to the reference desk as they are, not judging their questions, and including a variety of interests in the collection; yet the field simultaneously fails to foster these same attitudes among colleagues who work together at least 40 hours a week.

In terms of future scholarship, more quantitative studies need to be conducted on how many people in libraries identify as LGBTQ; how many people are out in the workplace; and what types of workplace behaviors and attitudes make LGBTQ-identified librarians and staff feel as if they are not included, or make them feel fearful, or hated. Identifying what microaggressions happen in academic libraries and general academic workplaces is another area of necessary research, especially since academic communities so often see themselves on the outside of—or beyond—issues related to discrimination. Because of the accidental nature of microaggressions, they are very likely to happen in the academic workplace, as the scholarship has proved.

Finally, LIS scholarship must immediately move away from publishing about the supposed shame of working in a profession associated with the feminine and the homosexual (an association largely managed and perpetuated by librarians themselves). The topic reflects antiquated, sexist, self-hating, and self-perpetuating thinking; the notion that a male librarian must be worried about being perceived as homosexual is an unacceptable, contemptuous sneer toward what it means to be homosexual. The implication is, in no uncertain terms, that being feminine (or being a woman) or being homosexual is something about which to feel bad and something from which to distance oneself. As this line of scholarship stands now, if left uncorrected, the real enemy of recruitment and retention of LGBTQ-identified librarians will be the attitudes embedded in the field itself.

Conclusion

In addition to the practical steps we can take to recruit LGBTQ-identified people into the LIS field, we must also retain them and keep them satisfied, or even thrilled, with the field that they have chosen. In order to accomplish this goal, we must become critics not only of our collections, archives, and budgets, but also of our daily behavior and our contributions to scholarship. We must be mindful of the goals of feminism when selecting the language we use to describe the gendered aspects of the field, and in the language and behavior we use around our colleagues or would-be colleagues. We must be mindful of the disparity of opportunities and safety among different types of people when we suggest new scholarships, initiate new workshops, and assist in decisions about whom to promote. These shifts are not the type that are made by showing allegiance to a particular activist group or participating in a single training; they take place every day in the mundane and quotidian aspects of our work.


I am grateful to reviewers Amy Koester and Taryn Marks, and to Publishing Editor Ian Beilin, for their time and feedback. They made huge improvements to my work, thank goodness.


References

Acree, Eric Kofi, Sharon K. Epps, Yolanda Gilmore, and Charmaine Henriques. “Using Professional Development as a Retention Tool for Underrepresented Academic Librarians.” Journal of Library Administration 31, no. 1-2 (2008): 45-61.

Alabi, Jaena. “‘This Actually Happened’: An Analysis of Librarians’ Responses to a Survey about Racial Microaggressions.” Journal of Library Administration 55, no. 3 (2015): 179-191.

Alabi, Jaena. “Racial Microaggressions in Academic Libraries: Results of a Survey of Minority and Non-Minority Librarians.” The Journal of Academic Librarianship 41, no. 1 (2015): 47-53.

Albanese, Andrew Richard. “Take this Job and Love It.” Library Journal 133, no. 2 (2008): 36-39.

Alire, Camila A. “Diversity and Leadership: The Color of Leadership.” Journal of Library Administration 32, no. 4 (2001): 99-114.

Andrade, Ricardo and Alexandra Rivera. “Developing a Diversity-Competent Workforce: the UA Libraries’ Experience.” Journal of Library Administration 51, no. 7-8 (2011): 692-727.

Blackburn, Heather. “Gender Stereotypes Male Librarians Face Today.” Library Worklife: HR E-News for Today’s Leaders (2015). http://alaapa.org/newsletter/2015/09/08/genderstereotypes-male-librarians-face-today/

Blobaum, Paul. “Gay Librarians on the Tenure Track: Following the Yellow Brick Road?” in Workplace Issues for LGBTQ Librarians, edited by Tracy Marie Nectoux, 63-67. Duluth: Library Juice Press, 2011.

Buddel, Neil. “Queering the Workplace.” Journal of Gay and Lesbian Social Services 23, no. 1 (2011): 131-146.

Carmichael, James V. “The Male Librarian and the Feminine Image: A Survey of Stereotype, Status, and Gender Perceptions.” Library and Information Science Research 14 (1992): 411-446.

Ciszek, Matthew. “Managing Outside the Closet: On Being an Openly Gay Library Administrator,” in Workplace Issues for LGBTQ Librarians, ed. Tracy Marie Nectoux, 83-90. Duluth: Library Juice Press, 2011.

Carmichael, James V. “The Gay Librarian: A Comparative Analysis of Attitudes Toward Professional Gender Issues.” Journal of Homosexuality 30, no. 2 (1996): 11-57.

Creth, Sheila D. “Academic Library Leadership: Meeting the Reality of the Twenty-First Century,” in Human Resource Management in Today’s Academic Library: Meeting Challenges and Creating Opportunities, edited by Janice Simmons-Welburn and Beth McNeil, 99-116. Westport: Libraries Unlimited, 2004.

Cook, James C. “Gay and Lesbian Librarians and the ‘Need’ for GLTB Library Organizations.” Journal of Information Ethics, Fall (2005): 32-49.

Damasco, Ione T. and Dracine Hodges. “Tenure and Promotion Experiences of Academic Librarians of Color.” College and Research Libraries 73, no. 3 (2012): 279-301.

Dickinson, Thad E. “Looking at the Male Librarians Stereotype.” The Reference Librarian 37, no. 78 (2003): 97-110.

Farkas, Meredith Gorran, Lisa Janicke Hinchliffe, and Amy Harris Houk. “Bridges and Barriers: Factors Influencing a Culture of Assessment in Academic Libraries.” College and Research Libraries 76, no. 2 (2015): 150-169.

Hall, Liz Walkley. “Changing the Workplace Culture at Flinders University Library: from Pragmatism to Professional Reflection.” Australian Academic and Research Libraries 46, no. 1 (2014): 29-38.

Hastings, Samantha Kelly. “If Diversity is a Natural State, Why Don’t Our Libraries Mirror the Populations They Serve?” The Library Quarterly: Information, Community, Policy 85, no. 2 (2015): 133-138.

Haipeng, Li. “Diversity in the Library: What Could Happen at the Institutional Level.” Journal of Library Administration 27, no. 1-2 (1999): 145-156.

Hickey, Andrew. “Cataloging Men: Charting the Male Librarian’s Experience Through the Perceptions and Position of Men in Libraries.” Journal of Academic Librarianship 32, no. 3 (2006): 286-295.

Irshad, Muhammad. “Factors Affecting Employee Retention: Evidence from Literature Review.” Abasyn Journal of Social Sciences 4, no. 1 (2011): 84-102.

Jones, Dorothy E. “’I’d Like You to Meet our New Librarian’: the Initiation and Integration of the Newly Appointed Librarian.” The Journal of Academic Librarianship 14, no. 4 (1988): 221-224.

Kawasaki, Jodee L. “Retention-After Hiring Then What?” Science and Technology Libraries 27, no. 1-2 (2006): 225-240.

“Know your rights: transgender people and the law.” American Civil Liberties Union. (2016). https://www.aclu.org/know-your-rights/transgender-people-and-law

Love, Johnnieque B. “The Assessment of Diversity Initiatives in Academic Libraries.” Journal of Library Administration 33, no. 1-2 (2001): 73-103.

Luzius, Jeff and Allyson Ard. “Leaving the Academic Library.” Journal of Academic Librarianship 32, no. 6 (2006): 593-598.

Majekodunmi, Norda. “Diversity in Libraries: The Case for the Visible Minority Librarians of Canada (VimLoC Network).” Canadian Library Association 1, no. 59 (2013): 31-32.

Martin, Judith N. and Thomas K. Nakayama. “Reconsidering Intercultural Competence in the Workplace: A Dialectical Approach.” Language and Intercultural Communication 15, no. 1 (2015): 13-28.

Millet, Michelle S. “Is This the Ninth Circle of Hell?” Library Journal 130, no. 5: 54.

Nadal, Kevin L., Chassitty N. Whitman, Linsey S. Davis, Tanya Erazo, and Kristen C. Davidoff. “Microaggressions Toward Lesbian, Gay, Bisexual, Transgender, Queer, and Genderqueer People: A Review of Literature.” The Journal of Sex Research 53, no. 4-5 (2016): 488-508.

Neely, Teresa Y. “Diversity Initiatives and Programs.” Journal of Library Administration 27, no. 8 (1999): 123-144.

Neely, Teresa Y., and Lorna Peterson. “Achieving Racial and Ethnic Diversity Among Academic and Research Librarians: the Recruitment, Retention, and Advancement of Librarians of Color, a White Paper.” College and Research Libraries 68, no. 9 (2007): 562-565.

Phillips, Jason D. “It’s Okay to be Gay: A Librarian’s Journey to Acceptance and Activism,” in Workplace Issues for LGBTQ Librarians, edited by Tracy Marie Nectoux, 33-47. Duluth: Library Juice Press, 2011.

Piper, Paul S and Barbara E. Collamer. “Male Librarians: Men in a Feminized Profession.” The Journal of Academic Librarianship 27, no. 5 (2001): 406-411.

Platt, Lisa F. and Alexandra L. Lenzen. “Sexual Orientation Microaggressions and the Experience of Sexual Minorities.” Journal of Homosexuality 60, no. 7 (2013): 1011-1034.

Ridinger, Robert B. “Out lines: an LGBT Career in Perspective.” In Workplace Issues for LGBTQ Librarians, edited by Tracy Marie Nectoux, 131-130. Duluth: Library Juice Press, 2011.

Roberto, K. R. “Pronoun Police: A Guide to Transitioning at Your Local Library,” in Workplace Issues for LGBTQ Librarians, edited by Tracy Marie Nectoux, 121-127. Duluth: Library Juice Press, 2011.

Simmons-Welburn, Janice. “Creating and Sustaining a Diverse Workplace,” in Human Resource Management in Today’s Academic Library: Meeting Challenges and Creating Opportunities, edited by Janice Simmons-Welburn and Beth McNeil, 71-81. Westport: Libraries Unlimited, 2004.

Simmons-Welburn, Janice. “Diversity Dialogue Groups.” Journal of Library Administration 27, no. 1-2 (1999): 111-121.

Stambaugh, Laine. “Recruitment and Selection in Academic Libraries,” in Human Resources Management in Today’s Academic Library: Meeting Challenges and Creating Opportunities, edited by Janice Simmons-Welburn and Beth McNeil, 27-36. Westport: Libraries Unlimited, 2004.

Tompson, Sara R. “Competencies Required!” Science & Technology Libraries 27, no. 1-2 (2006): 241-258.

Thompson, W. “On Being as If, Imagination and Gay Librarianship.” In Workplace Issues for LGBTQ Librarians, edited by Tracy Marie Nectoux, 255-266. Duluth: Library Juice Press, 2011.

Tetreault, Patricia A., Ryan Fette , Peter C. Meidlinger, and Debra Hope. “Perceptions of Campus Climate by Sexual Minorities.” Journal of Homosexuality 60, no. 7 (2013): 947-964.

Williams II, James F. “Managing Diversity.” Journal of Library Administration 27, no. 1-2 (1999): 27-48.

  1. Dorothy E. Jones, “’I’d Like You to Meet our New Librarian’: the Initiation and Integration of the Newly Appointed Librarian.” The Journal of Academic Librarianship 14, no. 4 (1988): 221-224.
  2. Jodee L. Kawasaki, “Retention-After Hiring Then What?” Science and Technology Libraries 27, no. 1-2 (2006): 225-240.
  3. Jeff Luzius and Allyson Ard, “Leaving the Academic Library.” Journal of Academic Librarianship 32, no. 6 (2006): 593-598
  4. Andrew Richard Albanese, “Take this Job and Love It.” Library Journal 133, no. 2 (2008): 36-39.
  5. Ricardo Andrade and Alexandra Rivera, “Developing a Diversity-Competent Workforce: the UA Libraries’ Experience.” Journal of Library Administration 51, no. 7-8 (2011): 693-694.
  6. Andrade and Rivera, 692-727.
  7. James F. Williams II, “Managing Diversity.” Journal of Library Administration 27, no. 1-2 (1999): 27-48.
  8. Andrade and Rivera, 696.
  9. Johnnieque B. Love, “The Assessment of Diversity Initiatives in Academic Libraries.” Journal of Library Administration 33, no. 1-2 (2001): 77.
  10. Love, 78.
  11. Love, 83.
  12. Camila A. Alire, “Diversity and Leadership: The Color of Leadership.” Journal of Library Administration 32, no. 4 (2001): 98.
  13. Teresa Y. Neely and Lorna Peterson, “Achieving Racial and Ethnic Diversity Among Academic and Research Librarians: the Recruitment, Retention, and Advancement of Librarians of Color, a White Paper.” College and Research Libraries 68, no. 9 (2007): 562-565.
  14. Ione T. Damasco and Dracine Hodges, “Tenure and Promotion Experiences of Academic Librarians of Color.” College and Research Libraries 73, no. 3 (2012): 281.
  15. Janice Simmons-Welburn, “Diversity Dialogue Groups.” Journal of Library Administration 27, no. 1-2 (1999): 111-121.
  16. Norda Majekodunmi, “Diversity in Libraries: The Case for the Visible Minority Librarians of Canada (VimLoC Network).” Canadian Library Association 1, no. 59 (2013): 31-32.
  17. Jaena Alabi. “Racial Microaggressions in Academic Libraries: Results of a Survey of Minority and Non-Minority Librarians.” The Journal of Academic Librarianship 41, no. 1 (2015): 47-53.
  18. Jaena Alabi, “‘This Actually Happened’: An Analysis of Librarians’ Responses to a Survey about Racial Microaggressions.” Journal of Library Administration 55, no. 3 (2015): 182-184.
  19. Alabi, 187.
  20. Alabi, 187.
  21. Alabi, 187.
  22. Kevin L. Nadal, Chassitty N. Whitman, Linsey S. Davis, Tanya Erazo, and Kristen C. Davidoff,“Microaggressions Toward Lesbian, Gay, Bisexual, Transgender, Queer, and Genderqueer People: A Review of Literature.” The Journal of Sex Research 53, no. 4-5 (2016): 488.
  23. Lisa F. Platt and Alexandra L. Lenzen, “Sexual Orientation Microaggressions and the Experience of Sexual Minorities.” Journal of Homosexuality 60, no. 7 (2013): 1012.
  24. Kevin L. Nadal, Chassitty N. Whitman, Linsey S. Davis, Tanya Erazo, and Kristen C. Davidoff, 490.
  25. Kevin L. Nadal, Chassitty N. Whitman, Linsey S. Davis, Tanya Erazo, and Kristen C. Davidoff, 492.
  26. Nadal, Whitman, Davis, Erazo, and Davidoff, 494.
  27. Patricia A. Tetreault, Ryan Fette, Peter C. Meidlinger, and Debra Hope. “Perceptions of Campus Climate by Sexual Minorities.” Journal of Homosexuality 60, no. 7 (2013): 950.
  28. Nadal, Whitman, Davis, Erazo, and Davidoff, 494.
  29. Neil Buddel, “Queering the Workplace.” Journal of Gay and Lesbian Social Services 23, no. 1 (2011): 139.
  30. Jason D. Phillips, “It’s Okay to be Gay: A Librarian’s Journey to Acceptance and Activism,” in Workplace Issues for LGBTQ Librarians, ed. Tracy Marie Nectoux. (Duluth: Library Juice Press, 2011), 38.
  31. Phillips, 87.
  32. K.R. Roberto, “Pronoun Police: A Guide to Transitioning at Your Local Library,” in Workplace Issues for LGBTQ Librarians, ed. Tracy Marie Nectoux. (Duluth: Library Juice Press, 2011), 121-127.
  33. Thad E. Dickinson, “Looking at the Male Librarians Stereotype.” The Reference Librarian 37, no. 78 (2003): 106.
  34. James V. Carmichael, “The Male Librarian and the Feminine Image: A Survey of Stereotype, Status, and Gender Perceptions.” Library and Information Science Research 14 (1992): 416.
  35. Carmichael, 417.
  36. Carmichael, 422.
  37. Carmichael, 423.
  38. Paul S. Piper and Barbara E. Collamer, “Male Librarians: Men in a Feminized Profession.” The Journal of Academic Librarianship 27, no. 5 (2001): 410.
  39. Andrew Hickey, “Cataloging Men: Charting the Male Librarian’s Experience Through the Perceptions and Position of Men in Libraries.” Journal of Academic Librarianship 32, no. 3 (2006): 290.
  40. Hickey, 291.
  41. Heather Blackburn, “Gender stereotypes male librarians face today” Library Worklife: HR E-News for Today’s Leaders. (2015). http://alaapa.org/newsletter/2015/09/08/genderstereotypes-male-librarians-face-today/
  42. Li Haipeng, “Diversity in the Library: What Could Happen at the Institutional Level.” Journal of Library Administration 27, no. 1-2 (1999): 145-156.
  43. Eric Acree, Sharon K. Epps, Yolanda Gilmore, and Charmaine Henriques. “Using Professional Development as a Retention Tool for Underrepresented Academic Librarians.” Journal of Library Administration 31, no. 1-2 (2008): 45-61.
  44. Haipeng, 146.
  45. Teresa Y. Neely, “Diversity Initiatives and Programs.” Journal of Library Administration 27, no. 8 (1999): 125.
  46. “Know your rights: transgender people and the law.” American Civil Liberties Union. (2016). https://www.aclu.org/know-your-rights/transgender-people-and-law
  47. James F. Williams II, “Managing Diversity.” Journal of Library Administration 27, no. 1-2 (1999): 27-48.
  48. Williams II, 44.
  49. Williams II, 44.
  50. Janice Simmons-Welburn, “Diversity Dialogue Groups.” Journal of Library Administration 27, no. 1-2 (1999): 111-121.
  51. Norda Majekodunmi, “Diversity in Libraries: The Case for the Visible Minority Librarians of Canada (VimLoC Network).” Canadian Library Association 1, no. 59 (2013): 31-32.

Know Your Hood – #ODD2017 in South Africa / Open Knowledge Foundation

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Human Rights theme.

South Africans dug into local datasets for our #KnowYourHood online challenge for Open Data Day 2017. Here’s what they found.

How well do you know your neighbourhood? In an age where communities are using Facebook and Whatsapp groups to monitor crime and share information, we’re sure you’d be able to tell us what streets to avoid, where the local bakery is and which fish & chips shop not to buy from.

But can you tell us how many – if any – child-headed households there are in your area? You’d probably be surprised at the answer. Or how many people actually have access to water, electricity and municipal services in your ward? What about how your municipality is spending your hard-earned taxes, and who to contact if they’re not?

This was the focus of Codebridge’s online event for Open Data Day 2017. Instead of the usual data quest with techies sitting at laptops, wrangling data and coming up with tech solutions – not that this isn’t super important and awesome as well – we decided to place the spotlight on the end-users this year: citizens.

The Codebridge #KnowYourHood challenge encouraged South Africans to dig through census and municipal financial data via simple-to-use online tools such as Wazimap and Municipal Money to get to know their own neighbourhoods. The challenge comprised 30 carefully selected questions, designed to leave participants with useful knowledge that they could act on later should they need or want to. Things like: who is your ward councillor; who would you contact if people don’t have access to basic sanitation; which political party had the strongest support in the 2014 local election in your ward?

There was also an educational element to it. When Municipal Money was built by Code for South Africa and National Treasury last year, they knew it wasn’t enough to just put municipalities’ financial statements online, as the average person doesn’t understand the jargon and how it works. They added a host of videos, explainers and text boxes with definitions that explain to the user what they are seeing. The challenge questions for this section were constructed to help the user navigate the site and involved watching at least one video and looking up a few definitions.

The day was rounded off by live online support via email, Facebook, Twitter and our Codebridge discussion forum – although users managed to find their way around pretty much on their own. They seemed to enjoy it too!

And as the answers came in, we were very pleased with the feedback. Our final two questions asked participants: “What is the most interesting discovery you have made about your municipality or ward?” and “Did you learn anything about your area that you find surprising?”

As expected, people were alarmed at the number of child-headed households in their communities. Two interesting facts about municipalities that emerged were “how little is spent on contractors” and “that the money spent on roads is small compared to planning”.

One participant stated on the financial performance of municipalities that she “didn’t know about most of the terms until today”. Our very favourite comment out of the lot, however, was: “That there is (sic) so much data publicly available.”

It shows not only how far we have come in the Open Data movement, but that our work is making a difference, one dataset at a time.

Hash#map ? / Jonathan Rochkind

I frequently have griped that Hash didn’t have a useful map/collect function, something allowing me to transform the hash keys or values (usually values) into another transformed hash. I even go looking for it in ActiveSupport::CoreExtensions sometimes, surely they’ve added something, everyone must want to do this… nope.

Thanks to a realization triggered by an example in BigBinary’s blog post about the new ruby 2.4 Enumerable#uniq… I realized, duh, it’s already there!

olympics = {1896 => 'Athens', 1900 => 'Paris', 1904 => 'Chicago', 1906 => 'Athens', 1908 => 'Rome'}
olympics.collect { |k, v| [k, v.upcase]}.to_h
# => {1896=>"ATHENS", 1900=>"PARIS", 1904=>"CHICAGO", 1906=>"ATHENS", 1908=>"ROME"}

Just use ordinary Enumerable#collect, with two block args — it works to get key and value. Return an array from the block, to get an array of arrays, which can be turned to a hash again easily with #to_h.
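
If you do this transformation a lot, the pattern wraps up nicely in a small helper. Here’s a minimal sketch; the module and method names are mine (hypothetical), not anything from Ruby core or ActiveSupport, and it assumes ruby 2.1+ for Array#to_h:

# Hypothetical helper wrapping the collect-then-to_h pattern.
module HashMapping
  # Returns a new hash with the same keys and block-transformed values.
  def self.map_values(hash)
    hash.collect { |k, v| [k, yield(v)] }.to_h
  end
end

olympics = {1896 => 'Athens', 1900 => 'Paris'}
HashMapping.map_values(olympics) { |city| city.upcase }
# => {1896=>"ATHENS", 1900=>"PARIS"}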

It’s a bit messy, but not really too bad. (I somehow learned to prefer collect over its synonym map, but I think maybe I’m in the minority? collect still seems more descriptive to me of what it’s doing. But this is one place where I wouldn’t have held it against Matz if he had decided to give the method only one name so we were all using the same one!)

(Did you know Array#to_h turned an array of duples into a hash?  I am not sure I did! I knew about Hash(), but I don’t think I knew about Array#to_h… ah, it looks like it was added in ruby 2.1.0.  The equivalent before that would have been more like Hash[ hash.collect {|k, v| [k, v]} ], which I think is too messy to want to use.)
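
For comparison, here’s a rough sketch of the same transformation both ways, using my own toy data rather than the BigBinary example: Array#to_h on ruby 2.1+, and the older Hash[] class-method form on earlier rubies:

olympics = {1896 => 'Athens', 1900 => 'Paris'}

# ruby 2.1 and later: turn the array of pairs into a hash with Array#to_h
olympics.collect { |k, v| [k, v.upcase] }.to_h

# earlier rubies: wrap the array of pairs with Hash[] instead
Hash[ olympics.collect { |k, v| [k, v.upcase] } ]

# both return {1896=>"ATHENS", 1900=>"PARIS"}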

I’ve been writing ruby for 10 years, and periodically thinking “damn, I wish there was something like Hash#collect” — and didn’t realize that Array#to_h was added in 2.1, and makes this pattern a lot more readable. I’ll def be using it next time I have that thought. Thanks BigBinary for using something similar in your Enumerable#uniq example that made me realize, oh, yeah.

 


Filed under: General

Collecting Digital Content at the Library of Congress / Library of Congress: The Signal

This is a guest post by Joe Puccio, the Collection Development Officer at the Library of Congress.


Joe Puccio. Photo by Beth Davis-Brown.

The Library of Congress has steadily increased its digital collecting capacity and capability over the past two decades. This has come as the product of numerous independent efforts pointed to the same goal – acquire as much selected digital content as technically possible and make that content as broadly accessible to users as possible. At present, over 12.5 petabytes of content – both acquired material and content produced by the Library itself through its digitization program – are under management.

In January, the Library adopted a set of strategic steps related to its future acquisition of digital content. Further expansion of the digital collecting program is seen as an essential part of the institution’s strategic goal to: Acquire, preserve, and provide access to a universal collection of knowledge and the record of America’s creativity.

The scope of the newly-adopted strategy is limited to actions directly involved with acquisitions and collecting. It does not cover digitization nor does it cover other actions that are critical to a successful digital collections program, including:

  • Further development of the Library’s technical infrastructure
  • Development of various access policies and procedures appropriate to different categories of digital content
  • Preservation of acquired digital content
  • Training and development of staff
  • Eventual realignment of resources to match an environment where a greater portion of the Library’s collection building program focuses on digital materials

It must also be emphasized that the strategy is aspirational since all of the resources required to accomplish it are not yet in place.

Current Status of Digital Collecting and Vision for the Future

In the past few years, much progress has been made in the Library’s digital collecting effort, and an impressive amount of content has been acquired.  As the eDeposit pilot began the complex process of obtaining digital content via the Copyright Office, additional efforts made great strides toward the goal of acquiring and making accessible other content.  Digital collecting has also been integrated into a range of special collections acquisitions.

The adopted strategy is based on a vision in which the Library’s universal collection will continue to be built by selectively acquiring materials in a wide range of formats – both tangible and digital.  Policies, workflows and an agile technical infrastructure will allow for the routine and efficient acquisition of desired digital materials. This type of collection building will be partially accomplished via collaborative relationships with other entities. The total collection will allow the Library to support the Congress in fulfilling its duties and to further the progress of knowledge and creativity for the benefit of the American people.

Assumptions and Principles

The strategy is based on a number of assumptions, most significantly that the amount of available digital content will continue to grow at a rapid rate and that the Library will be selective regarding the content it acquires. An additional primary assumption is that there will continue to be much duplication in the marketplace, with the same content being available both in tangible and digital formats.

Likewise, there are a number of principles that support the strategy, including the fact that the Library is developing one interdependent collection that contains both its traditional physical holdings and materials in digital formats. Other major principles are that the Library will ensure that the rights of those holding intellectual property will be respected and that appropriate methods will be put in place to ensure that rights-restricted digital content remains secure.

Plan for Digital Collecting

Over the next five years, the Library intends to follow a strategic framework categorized into six objectives:

Strategic Objective 1 – Maximize receipt and addition to the Library’s collections of selected digital content submitted for copyright purposes

Strategic Objective 2 – Expand digital collecting via routine modes of acquisitions (primarily purchase, exchange and gift)

Strategic Objective 3 – Focus on purchased and leased electronic resources

Strategic Objective 4 – Expand use of web archiving to acquire digital content

Strategic Objective 5 – Develop and implement an acquisitions program for openly available content

Strategic Objective 6 – Expand collecting of appropriate datasets and other large units of content

More Information

Much more detail is available in Collecting Digital Content at the Library of Congress.  Any questions or comments about this strategy or any aspect of the Library’s collection building program may be directed to me, jpuc@loc.gov.

Library Life with Universal Translators / LITA

As a lifetime science fiction watcher, I’ve been patiently waiting for current science to catch up to the futures I saw on the screen. Tiny computer in my pocket? Check. Hovercraft? All good. Commercial space flight? Almost there.

But when I saw the Indiegogo campaign for Mymanu CLIK – wireless earbud translators – I looked at them through my former public librarian’s eyes. My mediocre Spanish fluency would be replaced by effortless, instant, two-way translation, smoothing out frustrations and improving customer service.

Imagine: A library staffer wearing an earpiece and holding a smartphone (translation app installed) asks the patron to speak into the microphone, then hears the translation in their earpiece in real time. The conversation goes quickly, and the patron is more likely to get the information they need, even if the materials are still mostly in English.

Of course, there are concerns and questions. The key to most real-time translation is the computing power of servers hosted…somewhere…owned by…someone. As the person you’re listening to speaks, their words are streamed to these computers, analyzed, translated, and the translation is streamed back to you. Is that content saved, are people identifiable, what happens to patron privacy and library liability in the age of livestreamed translation? We collectively threw a fit when we discovered Adobe was sending patron information in the clear through their ebook reading service. Are we willing to ask for less in the name of better customer service?

More immediately, how accurate are the translations? Google Translate is good, and getting better all the time, but if we’re using these services to help patrons find medical or legal information, we can’t risk misunderstandings. Again, is it worse to struggle along with no translation at all, or to risk the misinformation that can come from a bad translation?

Both the CLIK and the Pilot earpiece from Waverly Labs are coming soon. What questions do we need to remember to ask before we’re mediating our interactions through these devices?

Tweak to the Bot / Tim Ribaric

Made a change to the bot.

EDIT: Yes I know typo in the image, changed it and not gonna screen cap again.


MarcEdit Update Notes / Terry Reese

MarcEdit Update: All Versions

Over the past several weeks, I’ve been working on a wide range of updates related to MarcEdit. Some of these updates have dealt with how MarcEdit handles interactions with other systems, some have dealt with integrating the new bibframe processing into the toolkit, and some have been related to adding more functionality around the program’s command-line tools and SRU support. In all, this is a significant update that required the addition of ~20k lines of code to the Windows version, and almost 3x that to the MacOS version (as I was adding SRU support). Altogether, I think the updates provide substantial benefit. The updates completed were as follows:

MacOS:

* Enhancement: SRU Support — added SRU support to the Z39.50 Client
* Enhancement: Z39.50/SRU import: Direct import from the MarcEditor
* Enhancement: Alma/Koha integration: SRU Support
* Enhancement: Alma Integration: All code needed to add Holdings editing has been completed; TODO: UI work.
* Enhancement: Validator: MacOS was using older code — updated to match Windows/Linux code (i.e., moved away from original custom code to the shared validator.dll library)
* Enhancement: MARCNext: Bibframe2 Profile added
* Enhancement: BibFrame2 conversion added to the terminal
* Enhancement: Unhandled Exception Handling: MacOS handles exceptions differently — I created a new unhandled exception handler to make it so that if there is an application error that causes a crash, you receive good information about what caused it.

Couple of specific notes about changes in the Mac Update.

Validation – the Mac program was using an older set of code that handled validation. The code wasn’t incorrect, but it was out of date. At some point, I’d consolidated the validation code into its own namespace and hadn’t updated these changes on the Mac side. This was unfortunate. Anyway, I spent time updating the process so that all versions now share the same code and will receive updates at the same pace.

SRU Support – I’m not sure how I missed adding SRU support to the Mac version, but I had. So, while I was updating ILS integrations to support SRU when available, I added SRU support to the MacOS version as well.

BibFrame2 Support – One of the things I was never able to get working in MarcEdit’s Mac version was the Bibframe XQuery code. There were some issues with how URI paths resolved in the .NET version of Saxon. Fortunately, the new bibframe2 tools don’t have this issue, so I’ve been able to add them to the application. You will find the new option under the MARCNext area or via the command-line.

Windows/Linux:

* Enhancement: Alma/Koha integration: SRU Support
* Enhancement: MARCNext: Bibframe2 Profile added
* Enhancement: Terminal: Bibframe2 conversion added to the terminal.
* Enhancement: Alma Integration: All code needed to add Holdings editing has been completed; TODO: UI work.

Windows changes were specifically related to integrations and bibframe2 support. On the integrations side, I enabled SRU support when available and wrote a good deal of code to support holdings record manipulation in Alma. I’ll be exposing this functionality through the UI shortly. On the bibframe front, I added the ability to convert data using either the bibframe2 or bibframe1 profiles. Bibframe2 is obviously the default.

With both updates, I made significant changes to the Terminal and wrote up some new documentation. You can find the documentation, and information on how to leverage the terminal versions of MarcEdit at this location: The MarcEdit Field Guide: Working with MarcEdit’s command-line tools

Downloads can be picked up through the automated updating tool or from the downloads page at: http://marcedit.reeset.net/downloads

The Amnesiac Civilization: Part 5 / David Rosenthal

Part 2 and Part 3 of this series established that, for technical, legal and economic reasons there is much Web content that cannot be ingested and preserved by Web archives. Part 4 established that there is much Web content that can currently be ingested and preserved by public Web archives that, in the near future, will become inaccessible. It will be subject to Digital Rights Management (DRM) technologies which will, at least in most countries, be illegal to defeat. Below the fold I look at ways, albeit unsatisfactory, to address these problems.

There is a set of assumptions that underlies much of the discussion in Rick Whitt's "Through A Glass, Darkly": Technical, Policy, and Financial Actions to Avert the Coming Digital Dark Ages. For example, they are made explicit in this paragraph (page 195):
Kirchhoff has listed the key elements of a successful digital preservation program: an independent organization with a mission to carry out preservation; a sustainable economic model to support preservation activities over targeted timeframes; clear legal rights to preserve content; relationships with the content owners, and the content users; a preservation strategy and supporting technological infrastructure; and transparency about the key decisions.
The assumption that there is a singular "independent organization with a mission to carry out preservation" to which content is transferred so that it may be preserved is also at the heart of the OAIS model. As in almost all discussions of digital preservation, it is not surprising to see it here.

There are three essential aspects; the singular organization, its independence, and the transfer of content. They are related to, but not quite the same as, the three options Whitt sets out on page 209:
Digital preservation should be seen not as a commercial threat, but as a new marketplace opportunity, and even advantage. Some voluntary options include persuading content owners to (1) preserve the materials in their custody, (2) cede the rights to preserve to another entity; and/or (3) be willing to assume responsibility for preservation, through "escrow repositories" or "archives of last resort."
Let's look at each in turn.

Not Singular

If the preservation organization isn't singular at least some of it will be independent and there will be transfer of content. The LOCKSS system was designed to eliminate the single point of failure created by the singular organization. The LOCKSS Program provided software that enabled the transfer of content to multiple independent libraries, each taking custody of the content they purchased. This has had some success in the fairly simple case of academic journals and related materials, but it is fair to say that there are few other examples of similarly decentralized preservation systems in production use (Brian Hill at Ars Technica points to an off-the-wall exception).

Solutions that are not singular have several disadvantages to set against their lack of a single point of failure. They still need permission from the content owners, which except for the special case of LOCKSS tends to mean individual negotiation between each component and each publisher, raising costs significantly. And managing the components into a coherent whole can be like herding cats.

Not Independent

The CLOCKSS Archive is a real-world example of an "escrow repository". It ingests content from academic publishers and holds it in a dark archive. If the content ever becomes unavailable, it is triggered and made available under Creative Commons licenses. The content owners agree up-front to this contingency. It isn't really independent because, although in theory publishers and libraries share equally in the governance, in practice the publishers control and fund it. Experience suggests that content owners would not use escrow repositories that they don't in practice control.

"Escrow repositories" solve the IP and organizational problems, but still face the technical and cost problems. How would the "escrow repositories" actually ingest the flow of content from the content owners, and how would they make it accessible if it were ever to be triggered? How would these processes be funded? The CLOCKSS Archive is economically and technically feasible only because of the relatively limited scale of academic publishing. Doing the same for YouTube, for example, would be infeasible.

No Transfer

I was once in a meeting with major content owners and the Library of Congress at which it became clear to me that (a) hell would freeze over before these owners would hand a copy of their core digital assets to the Library, and (b) even after hell froze the Library would lack the ability or the resources to do anything useful with them. The Library's handling of the feed that Twitter donated is an example of (b). Whitt makes a related point on page 209:
In particular, some in the content community may perceive digital obsolescence not as a flaw to be fixed, but a feature to be embraced. After all, selling a single copy of content that theoretically could live on forever in a variety of futuristic incarnations does not appear quite as financially renumerative as leasing a copy of content that must be replaced, over and over, as technological innovation marches on.
In the Web era, only a few cases of successful pay-per-view models are evident. Content that isn't advertiser-supported, ranging from academic journals to music to news and TV programs, is much more likely to be sold as an all-you-can-eat bundle. The more content available only as part of the bundle, the more valuable the bundle. Thus the obsession of content owners with maintaining control over the only accessible version of each item of content (see, for example, Sci-Hub), no matter how rarely accessed.

The scale of current Web publishing platforms, the size and growth rates of their content, and the enormous cash flow they generate all militate against the idea that their content, the asset that generates the cash flow, would be transferred to some third party for preservation. In this imperfect world the least bad solution may be some form of "preservation in place". As I wrote in The Half-Empty Archive discussing ways to reduce the cost of ingest, which is the largest cost component of preservation:
It is becoming clear that there is much important content that is too big, too dynamic, too proprietary or too DRM-ed for ingestion into an archive to be either feasible or affordable. In these cases where we simply can't ingest it, preserving it in place may be the best we can do; creating a legal framework in which the owner of the dataset commits, for some consideration such as a tax advantage, to preserve their data and allow scholars some suitable access. Of course, since the data will be under a single institution's control it will be a lot more vulnerable than we would like, but this type of arrangement is better than nothing, and not ingesting the content is certainly a lot cheaper than the alternative.
This approach has many disadvantages. It has a single point of failure. In effect preservation is at the whim of the content owner, because no-one will have standing, resources and motivation to sue in case the owner fails to deliver on their commitment. And note the connection between these ideas and Whitt's discussion of bankruptcy in Section III.C.2:
Bankruptcy laws typically treat tangible assets of a firm or individual as private property. This would include, for example, the software code, hardware, and other elements of an online business. When an entity files for bankruptcy, those assets would be subject to claims by creditors. The same arguably would be true of the third party digital materials stored by a data repository or cloud services provider. Without an explicit agreement in place that says otherwise, the courts may treat the data as part of the estate, or corporate assets, and thus not eligible to be returned to the content "owner."
But to set against these disadvantages there are two major advantages:
  • As the earlier parts of this series show, there may be no technical or legal alternative for much important content.
  • Preservation in place allows for the survival of the entire publishing system, not just the content. Thus it mitigates the multiple version problem discussed in Part 3. Future readers can access the versions they are interested in by emulating the appropriate browser, device, person and location combinations.
I would argue that an urgent task should be to figure out the best approach we can to "preservation in place". A place to start might be the "preservation easement" approach taken by land trusts, such as the Peninsula Open Space Trust in Silicon Valley. A viable approach would preserve more content at lower cost than any other.

OpenSRF 2.5.0 released / Evergreen ILS

We are pleased to announce the release of OpenSRF 2.5.0, a message routing network that offers scalability and failover support for individual services and entire servers with minimal development and deployment overhead.

New features in OpenSRF 2.5.0 include:

  • Support for message chunking, i.e., breaking up large OpenSRF messages across multiple XMPP envelopes.
  • The ability to detect the time zone of client applications and include it in messages passed to the server.
  • Dispatch mode for method_lookup subrequests.
  • Example configuration files for using NGINX or HAProxy as a reverse proxy for HTTP, HTTPS, and WebSockets traffic. This can be useful for Evergreen systems that wish to use port 443 for both HTTPS and secure WebSockets traffic.

OpenSRF includes various other improvements as detailed in the release notes.

OpenSRF 2.5.0 will be the minimum version of OpenSRF required for the upcoming release of Evergreen 2.12.

To download OpenSRF, please visit the downloads page.

We would also like to thank the following people who contributed to the release:

  • Ben Shum
  • Bill Erickson
  • Chris Sharp
  • Dan Scott
  • Galen Charlton
  • Jason Etheridge
  • Jason Stephenson
  • Jeff Davis
  • Kathy Lussier
  • Mike Rylander
  • Remington Steed

Announcing the DPLAfest 2017 Travel Award Recipients / DPLA

We are thrilled to officially introduce the five talented and diverse members of the extended DPLA community who will be attending DPLAfest 2017 in Chicago as recipients of the travel awards announced last month! We received a tremendous response to the call from many excellent members of our field and are grateful that in addition to the three travel awards initially announced, we are also able to welcome two members of the Greater Chicago cultural heritage community to the fest.

The selected awardees represent a broad cross-section of the DPLA community including graduate students and established professionals studying and working in public libraries, government institutions, and local colleges. Together, this group also serves diverse communities across the country, from Los Angeles to North Carolina.

Here are the folks to look for at DPLAfest:

Tommy Bui
Los Angeles Public Library

At the Los Angeles Public Library, Tommy Vinh Bui works to promote literacy and to bridge the information gap in his community. He encourages utilizing emerging technologies and guides stakeholders to become critical and self-aware consumers of information and teaches good information literacy. He holds an MLIS in Library and Information Science with an emphasis on Digital Assets Management. He previously worked with the Los Angeles County Metropolitan Transportation Authority in the Art and Design Department organizing their image collection and served in the Peace Corps abroad. Tommy Vinh Bui is enthused to be attending the conference and avers, “Attending DPLAFest allows me an ideal opportunity to network and collaborate with like-minded professionals and peers who are passionate about digital public libraries and the increasingly significant role they’ll play in creating a verdant and informed society.”

 

Amanda Davis
Charlotte Mecklenburg Library

Amanda H. Davis is an Adult Services Librarian at Charlotte Mecklenburg Library in North Carolina. She received her MLIS from Valdosta State University and is a proud ALA Spectrum Scholar and ARL Career Enhancement Program Fellow. Her professional interests include diversity in LIS, public librarianship, community building, and creative writing. She is excited about attending DPLAfest because of her interest in making sure her city’s diverse perspectives are meaningfully and sustainably recorded.

 

Raquel Flores-Clemons
Chicago State University

Raquel Flores-Clemons is the University Archivist and Director of Archives, Records Management, and Special Collections at Chicago State University. In this role, she manages over thirty collections that reflect the history of CSU as well as capture the historical narratives of South Side communities of Chicago. Raquel maintains a deep commitment to capturing the historical narratives of communities of color and has a strong research interest in hip hop and its use as a platform for social justice and change, as well as the use of hip hop pedagogy to enhance information literacy. Raquel is interested in attending DPLAfest to share the unique archival collections and digital projects happening at Chicago State University, as well as to connect with and learn from other LIS professionals to expand collaboration and techniques in preserving historical materials through digital means. Raquel also looks forward to engaging with other professionals who are working to elevate unknown histories.

 

Valerie Hawkins
Prairie State College

Prior to her current position at Prairie State College, Valerie Hawkins spent nearly twenty years as Library Reference Specialist in the American Library Association (ALA) Library at its headquarters in downtown Chicago, answering most of the questions that came in to its reference desk, from member and non-member libraries as well as from the public. Valerie was on the front lines of the transition within librarianship to electronic and online communications, publications, resources, and tools. Valerie is also deeply interested in pop culture, performing arts, and media representations of African American history. She writes, “It’s greatly informed my deliberate moves to increase the visibility of works by people of color and other marginalized communities, including the disabled and LGBTQ, in a public e-newsletter I curate called ‘Diverse Books and Media.’” Of her interest in attending DPLAfest, Valerie says, “The past, present, and future of librarianship is digital. Once materials are digitized, the work has actually just begun, not ended.” At DPLAfest, she looks forward to engaging in discussion around questions of organizing and providing maximal access to digital collections as well as user experiences.

 

Nicole Umayam
Digital Arizona Library

Nicole Umayam works as a content and metadata analyst for the Digital Arizona Library. She also works as a corps member of the National Digital Inclusion Alliance to engage tribal and rural community stakeholders in Arizona in increasing home and public broadband access and digital literacy skills. She worked previously with tribal communities in Oklahoma on various endangered language revitalization projects, including building a digital community language archive and training community members in using technology for language documentation. Nicole holds an MLIS and an MA in Applied Linguistic Anthropology from the University of Oklahoma. Nicole says, “I am eager to attend DPLAfest to learn more about creating inclusive and culturally relevant metadata, increasing discoverability, and forging digital library partnerships. I hope to contribute to future efforts of providing equitable access to digitized cultural heritage resources for diverse communities.” Learn more about Nicole’s work at the Arizona Memory Project in this DPLAfest session.

 

Congratulations to all – we look forward to meeting you in Chicago next month!

 

Code4Lib 2017 and The Cobra Effect / OCLC Dev Network

I have recently returned from Code4Lib 2017 in Los Angeles. This was my first national Code4Lib, and I have brought back much more than a great t-shirt to our Dublin, Ohio, office.

Scientific Publisher Celebrates Open Data Day / Open Knowledge Foundation

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office.  This event was supported through the mini-grants scheme under the Open Research theme.

The Electrochemical Society (ECS) hosted an Open Data Day event with assistance from Open Knowledge International and SPARC-Science.

The Electrochemical Society, a nonprofit, international scientific publisher, communicated with over 27,000 scientists about the importance of open data in the scientific disciplines between 2nd and 4th March. ECS encouraged the researchers contacted to take a survey to assess the interest in and need for open data services in the scientific community, the knowledge gaps that exist, and responsiveness to open data tools.

Participants from 33 institutions, 14 countries, and six continents gave the scholarly publisher information about what they felt was necessary for open data to be successful in their field and what they didn’t know about open data concepts.

The Electrochemical Society has been exploring open research policies, including open access for its flagship journal, and other open science practices. Eager to contribute to the world of open data and data science, the society has been making strides to incorporate research projects that implement open data and data science practices into its publications.

In order to determine the next steps for socialising open data in the community, the survey asked questions including:

  1. How often do you access data via repositories?
  2. How often do you deposit data into repositories?
  3. Do you feel there are enough open notebook tools for this specific field of science?
  4. Did you know what open data was before today?
  5. What concentrated areas of open data do you most contribute to?

ECS’s PRiME Meeting Hall, where scientists from around the world came to openly discuss and share the results of their research.

In October, ECS will host its annual fall meeting, which will introduce symposia on energy data science and open science, including open data and open access practices.

Outcomes

The event successfully enabled The Electrochemical Society to determine the needs of their constituents in the electrochemical and solid state science community in terms of open data and open science platforms.

The Society randomly selected survey participants and issued 20 open access article credits, allowing 20 scholarly papers to be published at no charge to their authors and made free to read.

The event led to the announcement of their contribution to an open research repository and the launch of a new open science tool.

ECS’s celebration of Open Data Day helped to identify knowledge gaps in the field, assess the need for more open data tools, and plan next steps for open science and open data within the organization; it will also result in the anticipated publication of 20 new research papers and, most importantly, an increased understanding of open data within the ECS community.

ECS’s Open Data Day celebration is part of a larger initiative to incorporate open science practices into scientific and scholarly communications. You can learn more here about the Free the Science initiative and why open research and open data are critical to the advancement of science. Below is also a short video on the New Model for Scientific Publishing, #FreetheScience!

Open data hackathon brings evidence on the right to health care / Open Knowledge Foundation

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was part of the Human rights theme. 

This is an English translation of the Latvian blog at http://www.datuskola.lv/2017/03/05/atverto-datu-hakatona-laika-atklaj-vairakus-datos-balstitus-pieradijumus-sabiedribas-veselibas-nozare/

On Saturday 4th March, during Open Data Day, six new data projects were started at a data hackathon. Three of them were dedicated to human rights: to access to good health care and to finding new solutions for making health care more accessible. The teams also focused on the right to education for people with special needs, to understand whether inclusive education actually works in Riga.


 

The participants of the hackathon chose two projects as winners. One of the winning teams created a prototype of a tool that would allow patients to enter their diagnosis and see all compensated medicines and the compensation amounts in the selected country. The new tool draws attention to the support each individual could get in the case of a rare disease.

Translation: 1st Graphic: Sum for one person
2nd Graphic: Number of patients
3rd Graphic: Sum for one person
Table – 1) Disease 2) Date 3) Number of patients 4) The amount of compensation 5) The sum for one person

 

The second winning team addressed social security as a human right. The team analysed pension yields and bank fees across all three Baltic states to compare investment profitability, and concluded that from a salary of 1,000 euros an individual gets the least in Estonia and the most in Lithuania.

Translation: If gross salary is 1k euros, how much will 2nd pension pillars contributions earn in financial markets (720 euros in a year)?
Latvia: 22 euros (for you 22 euros + for bank 11 euros)
Lithuania: 32 euros (for you 25 euros + for bank 7 euros)
Estonia: 21 euros (for you 13 euros + for bank 8 euros)

 

Another team focused on health check availability, an important topic for the public health sector. They produced a map of pharmacies in Riga, the capital of Latvia, where testing stations could be placed for men between 20 and 40 years old to check whether they have been infected with HIV.

This year one team also researched the defence budget. The team created a detailed view of how the Latvian state has tried to reach a defence budget of 2% of gross domestic product (GDP).

The right to inclusive education has also been a topic this year, viewed from a data perspective. Even though 34 million euros of funding has been allocated for inclusive education for the period 2014-2020, more and more children in Riga who are registered as having mental and learning disorders are being separated from their peers and taught in special schools.

Besides these projects, the participants of the hackathon also saw a presentation of a data project created outside the hackathon, which gathered data about electromagnetic radiation in the centre of Riga. The project mapped spots where the radiation exceeds the norm, in places by as much as 5 times, and harms people’s health.

Work on the projects started during the Open Data hackathon continues: results will be prepared for publication in the media and for use in NGO projects.

This is already the second Open Data hackathon. We thank the visualisation company Infogram, the Nordic Council of Ministers’ office in Latvia, Open Knowledge International and the UK Foreign & Commonwealth Office for their support in creating this event.

Listen: Personas, Jobs to be Done, and LITA (18:08) / LibUX

Recently, LITA embarked on a big persona-making project in order to better align its services with the needs of its members, the results of which it has just revealed. This provides a solid jumping-off point to examine conceptual problems with personas and to introduce a potentially better-suited approach: jobs to be done.

  • 00:43 – LITA created a bunch of personas
  • 2:14 – What does LITA actually want?
  • 3:39 – Personas are more noise than signal
  • 5:37 – Personas are best as a demographic snapshot
  • 6:05 – The User Story
  • 7:35 – The Job Story
  • 8:04 – Jobs to be Done
  • 11:36 – So what jobs do LITA personas need done?
  • 14:04 – What should LITA do, then?
  • 15:44 – Support Metric: https://patreon.com/libux
  • 16:42 – How to enter for our giveaway: a copy of Practical Design Discovery by Dan Brown.

You can also download the MP3 or subscribe to Metric: A UX Podcast on Overcast, Stitcher, iTunes, YouTube, SoundCloud, Google Music, or just plug our feed straight into your podcatcher of choice.

#ALAWO is tracking #SaveIMLS and collecting your stories / District Dispatch

Since 11 a.m. last Thursday (and as of 5 p.m. this afternoon), there have been 3,838 tweets under the #saveIMLS hashtag on Twitter. That is over 767 tweets a day. Or, sliced another way, there are currently 1,800 people participating in the conversation on Twitter. Any way you dice it, we need this momentum to continue.

Right now, the ALA Washington Office is collecting your tweets and stories via TAGS, the Twitter Archiving Google Sheet. You can see the conversation as it has unfolded via this afternoon’s snapshot:

TAGS data visualization photo of networked #SaveIMLS conversation online.

#SaveIMLS conversation on Twitter from March 17 through March 20. The Washington Office is collecting your stories. View and explore the live version here.

As we march towards the next phase of the appropriations process, we need to keep IMLS at the center of the conversation. We need you to keep beating the drum and sharing your stories.

How can you tell an impactful story?

  • First, look up what IMLS does for you specifically. Search their database to see what they have funded in your zip code.
  • Then, pick a project (from the database or one you already know about) and tweet about its impact with the hashtag #saveIMLS. (Bonus points: Enter your zip code into GovTrack so you can find and tag your senator or representative; their social media information is listed.)

While your “numbers” — how many computers, how many programs, how many books, how many patrons — are very important, the best kind of stories talk about how IMLS or LSTA funding has helped you to contribute to the “big picture.” A powerful story from your Congressional district can and will move mountains.

Here are some examples, from the 3,838 tweets, that we thought were great. Keep it coming!

Tweet text: "Because 1 in 5 Americans earning less than $30K a year have to rely on their smartphone for online access."

Tweet Text: "I am a librarian. We receive @US_IMLS $. What do I do? I train seniors in using technology to overcome age-related #digitaldivide #SaveIMLS."

Tweet Text: "Funding from @US_IMLS allowed us to empower students to be a self-directed and creative learner with technologies. #saveimls"

Stay tuned for more information, particularly as it pertains to the upcoming advocacy campaign around “Dear Appropriator” letters. Meanwhile, subscribe to our action alerts to ensure you receive the latest updates on the budget process.

The post #ALAWO is tracking #SaveIMLS and collecting your stories appeared first on District Dispatch.

LITA @ ALA Annual 2017 – Chicago / LITA

Early bird registration closes at noon, Wednesday March 22 central time.

Start making your plans for ALA Annual now by checking out all the great LITA events.

Go to the LITA at ALA Annual conference web page.

Attend the LITA President’s Program featuring Kameron Hurley
Sunday June 25, 2017 from 4:30 pm – 5:30 pm
Program Title: We are the Sum of our Stories

Kameron Hurley headshot

LITA President Aimee Fifarek welcomes Kameron Hurley, author of the essay collection The Geek Feminist Revolution, as well as the award-winning God’s War Trilogy and The Worldbreaker Saga. Hurley has won the Hugo Award, Kitschy Award, and Sydney J. Bounds Award for Best Newcomer. She was also a finalist for the Arthur C. Clarke Award, the Nebula Award, and the Gemmell Morningstar Award. Her short fiction has appeared in Popular Science Magazine, Lightspeed Magazine, and many anthologies. Hurley has written for The Atlantic, Entertainment Weekly, The Village Voice, Bitch Magazine, and Locus Magazine. She posts regularly at KameronHurley.com.

Register for ALA Annual and Discover Ticketed Events.

Sign up for the LITA AdaCamp preconference

Friday, June 23, 2017, 9:00 am – 4:00 pm
Northwestern University Libraries, Evanston, IL
Facilitators: Margaret Heller, Digital Services Librarian, Loyola University Chicago; Evviva Weinraub, Associate University Librarian for Digital Strategies, Northwestern University.

Women in technology face numerous challenges in their day-to-day work. If you would like to join other women in the field to discuss topics related to those challenges, AdaCamp is for you. This one-day LITA preconference during ALA Annual in Chicago will allow female-identifying individuals employed in various technological industries an opportunity to network with others in the field and to collectively examine common barriers faced.

Other Featured LITA Events Include

Top Technology Trends
Sunday, June 25, 2017, 1:00 pm – 2:30 pm

LITA’s premier program on changes and advances in technology, Top Technology Trends features our ongoing roundtable discussion about trends and advances in library technology by a panel of LITA technology experts and thought leaders. The panelists will describe changes and advances in technology that they see having an impact on the library world, and suggest what libraries might do to take advantage of these trends. This conference’s panelists and their suggested trends include:

  • Margaret Heller, Session Moderator, Digital Services Librarian, Loyola University Chicago
  • Emily Almond, Director of IT, Georgia Public Library Service
  • Marshall Breeding, Independent Consultant and Founder, Library Technology Guides
  • Vanessa Hannesschläger, Researcher, Austrian Centre for Digital Humanities/Austrian Academy of Sciences
  • Jenny Jing, Manager, Library Systems, Brandeis University Library
  • Veronda Pitchford, Director of Membership and Resource Sharing, Reaching Across Illinois Library System (RAILS)
  • Tara Radniecki, Engineering Librarian, University of Nevada, Reno

LITA Imagineering: Generation Gap: Science Fiction and Fantasy Authors Look at Youth and Technology
Saturday June 24, 2017, 1:00 pm – 2:30 pm

Join LITA, the Imagineering Interest Group, and Tor Books as a panel of Science Fiction and Fantasy authors discuss how their work can help explain and bridge the interests of generational gaps, as well as what it takes for a literary work to gain crossover appeal for both youth and adults. This year’s line up is slated to include:

  • Cory Doctorow
  • Annalee Newitz
  • V.E. Schwab
  • Susan Dennard

LITA Conference Kickoff
Friday June 23, 2017, 3:00 pm – 4:00 pm

Join current and prospective LITA members for an overview and informal conversation at the Library Information Technology Association (LITA) Conference Kickoff. All are welcome to meet LITA leaders, committee chairs, and interest group participants. Whether you are considering LITA membership for the first time, a long-time member looking to engage with others in your area, or anywhere in between, take part in great conversation and learn more about volunteer and networking opportunities at this meeting.

LITA Happy Hour
Sunday, June 25, 2017, 6:00 pm – 8:00 pm

This year the LITA Happy Hour continues the year-long celebration of LITA’s 50th anniversary. Expect anniversary fun and games. Make sure you join the LITA Membership Development Committee and LITA members from around the country for networking, good cheer, and great fun! There will be lively conversation and excellent drinks; cash bar.

Find all the LITA programs and meetings using the online conference scheduler.

More Information about LITA conference events and Registration

Go to the LITA at ALA Annual Conference web page.

Open Data Durban celebrates Open Data Day building an Arduino weather station / Open Knowledge Foundation

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open Environment theme.

This post was first published on the Open Data Durban website: https://opendata.durban

Open Data Durban in partnership with The MakerSpace Foundation hosted International Open Data Day in a dual-charged effort to ignite openness and participation in the Durban community. Together with environmentalists, ecologists, data wranglers, techies and active citizens we built an Arduino weather station. According to Arduino.cc, “Arduino is an open-source electronics platform based on easy-to-use hardware and software.”

How did we promote diversity on the day?

On arrival, participants selected coloured stickers representing what they thought best described their interests and skill sets, choosing from data wrangler, maker, environmentalist, techie and, most importantly, learner. The latter was an obvious choice for Mondli, Nolwazi and Nosipho, three learners from Umkhumbane Secondary School, located in Chesterville, a township on the periphery of the Durban CBD. We invited the learners as part of our data clubs programme, rolling out soon, in which learners will also be building an Arduino weather station.

It was essential that the teams were made up of each of the skill sets above to ensure:

  • the project speaks to the broader theme of informed decision-making through the micro-weather station data;
  • participants are assisted in assembling the electronics;
  • the IoT device is programmed through code;
  • participants gain critical environmental insights towards the practical use of the tool;

and more importantly to enable and create a guild of new-age active citizens and evangelists of open knowledge.

Each team was provided with an Arduino weather kit consisting of dust, gas, temperature and rainfall sensors and all the other components needed to build the weather station. We did not provide the teams with step-by-step build instructions. Instead, we challenged them to search online for the instructions and figure out the steps themselves. Within minutes, the teams were busy scouring various websites such as Instructables. This emphasised the openness of sharing knowledge and introduced the learners to open knowledge and to how someone from another part of the world can share their expertise with you.

What were some of the insights from the environmentalists?

Bruce and Lee, a retired ecologist and environmentalist respectively, were charming in their approach to problem-solving and tinkering with the electronic parts. Although not well versed in the Arduino toolkit, they gallantly learned the build and later tutored the learners on it.

Their insights into the environmental status of Durban were unmatched, and they painted a grim picture of the Durban community’s awareness of the problems that exist.

What were some of the insights from the techies?

Often at our events we have a number of techies who are brilliant at coding but have no concept of data science or of how coding can be used to address economic, social and environmental issues. This event helped introduce such techies to how coding Arduino boards and sensors can be used to gather weather data, and how that data can then be used to monitor the weather conditions in a given area.

This data allows the public to be aware of local conditions, such as the concentration of harmful gases in the air. The city can also map out pollution hotspots and identify trends that aid decision-making to manage or improve air quality.
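
To give a concrete, if simplified, picture of how the station's readings become usable data, here is a minimal Python sketch for logging the board's serial output to a CSV file. It is an illustration rather than anything a team wrote on the day: it assumes the Arduino sketch prints comma-separated temperature, humidity and dust readings, that the pyserial package is installed, and that the port name matches what your board enumerates as.

    # log_weather.py -- append the weather station's serial readings to a CSV file.
    # Assumes lines like "23.5,41.2,0.08" (temperature C, humidity %, dust mg/m3);
    # the field names and serial port are illustrative.
    import csv
    import time

    import serial  # pip install pyserial

    PORT = "/dev/ttyACM0"  # adjust to the port your board appears on
    BAUD = 9600

    with serial.Serial(PORT, BAUD, timeout=2) as board, \
            open("weather_log.csv", "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "temperature_c", "humidity_pct", "dust_mg_m3"])
        while True:
            line = board.readline().decode("ascii", errors="ignore").strip()
            if not line:
                continue  # timed out or blank line; keep waiting
            writer.writerow([int(time.time())] + line.split(","))
            f.flush()  # make each reading visible to other tools immediately

From there the CSV can be opened in a spreadsheet or notebook to look for trends such as pollution spikes.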

How did the learners participate in the session? What were some of their learnings?

The participants spoke many different languages, which made communication across the groups a challenge. However, their confidence and enthusiasm to learn prompted the learners to ask their group members some captivating questions, most notably in their pursuit of understanding how things in the space work.

All the attendees were attempting to build the Arduino weather station for the first time. The adult attendees were at first quite hesitant to share what they were doing with the learners, because they were not certain whether it was correct and did not want to confuse them. Once the adults were confident in the build method, they began to communicate more with the learners.

Outcomes of the day

We eventually saw one complete weather station, built by Sphe Shandu, who stayed behind after some team members drifted off to tinker with other equipment in the MakerSpace. The only missing piece was the LCD component, which no team figured out.

Learnings

  1. Lend an extra hand to students engaging with maker spaces for the first time in an urban setting; they have a natural, innate understanding of the moving parts in the MakerSpace (3D printers, laser cutters, electronics, etc.) but not necessarily of the context of new-age manufacturing, its practicality and its potential outputs.
  2. After lunch the teams became quite weary; progress dipped, but the teams managed to pull through and complete as much as they could. Long events tend to be vigorous at the beginning and stall towards the end. A possible lesson is to host much shorter events.
  3. Teachers need to be incentivised to attend the programme outside formal school learning.
  4. Parents proved to be the most difficult stakeholders to engage: although involved in their children’s learning, they need encouragement to attend such functions.
  5. For community events on a Saturday, it is most difficult to rally large attendance numbers.

 

 

Celebrating International Open Data Day in Chicago / Open Knowledge Foundation

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open Research theme. 

This post was originally published on Teodora’s webpage: http://bit.ly/2nn6q0m

Open Data Day Chicago 2017 was a great experience for everyone.

I am so glad to be the organiser of the celebration of open data in Chicago this year. 

We were about 30 people, working on 6 projects. There were participants from UChicago (RCC, ImLab, Becker Friedman Institute), Smart Chicago Collaborative, Cook County, Google NYC, Open Source Policy Center, Cornell University, and others. The participants’ backgrounds were also very diverse: economics, software engineering, genomics, humanities, web development, and others. While the hackathon was scheduled from 10 am to 5 pm, we worked on some projects until 8 pm.

More information about the event: http://wiki.opendataday.org/Chicago2017

As with any good hackathon, the day started with coffee and sandwiches. Thanks to Research Computing Center and SPARC-Science for sponsoring. 

With the opening festivities done, we started the day by presenting the 6 registered projects:

  1. Visualising Open Data of Chicago
  2. Taxbrain Web Application
  3. Open Data and Virtual Reality  
  4. Reddit Data Visualizer
  5. Computational methods for analysing large amounts of genomic data
  6. Exploring the data portal of Cook County

A member of each team presented their project to the participants so that anyone interested could join that team.

Kathleen Lynch, the Information Technology Communications Manager of Cook County Government, presented the county’s data portal and the work they are doing to support open data. She also presented Chicago Hack Night, an event that takes place every Tuesday evening for people interested in supporting open data to meet, build, share, and learn about civic tech. After that, Josh Kalov, Consultant at Smart Chicago Collaborative, presented the work they are doing to support Cook County Government, with a focus on access, skills, and data.

Work on the projects then started. Using mainly datasets from the City of Chicago Data Portal, we analysed Red Light Traffic Violations in Chicago, and also the Beach Weather Stations. Here you can see a map we created in Python with the main locations in Chicago where the most red light traffic violations occur. Of course, the next step will be to label the locations.
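
As a rough illustration of that kind of workflow (not the team's actual notebook), the sketch below assumes a CSV export of the red light camera violations dataset with LATITUDE, LONGITUDE and VIOLATIONS columns; the column names, and the choice of pandas plus folium, are assumptions made for the example.

    # red_light_map.py -- plot the intersections with the most red light violations.
    import folium          # pip install folium
    import pandas as pd    # pip install pandas

    # Assumed CSV export from the City of Chicago Data Portal; column names are illustrative.
    df = pd.read_csv("red_light_camera_violations.csv")

    # Total violations per camera location, keeping the 25 busiest spots.
    top = (df.groupby(["LATITUDE", "LONGITUDE"], as_index=False)["VIOLATIONS"]
             .sum()
             .nlargest(25, "VIOLATIONS"))

    m = folium.Map(location=[41.88, -87.63], zoom_start=11)  # centred on Chicago
    for _, row in top.iterrows():
        folium.CircleMarker(
            location=[row["LATITUDE"], row["LONGITUDE"]],
            radius=3 + 12 * row["VIOLATIONS"] / top["VIOLATIONS"].max(),
            popup="{} violations".format(int(row["VIOLATIONS"])),
        ).add_to(m)

    m.save("red_light_map.html")  # open in a browser; labelling the locations is the next step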

We also created an application to visualise charts on virtual reality devices like Google Cardboard. We used Three.js and D3 to create the 3D charts, together with Google Chrome VR.

We designed some graphical widgets for the TaxBrain web application, a platform for accessing open-source tax models. We also learned about Tax-Calculator, a tool that computes federal individual income taxes and Federal Insurance Contributions Act (FICA) taxes for a sample of tax filing units in years beginning with 2013.

We also discussed how we could integrate the Reddit Data Visualizer with other open datasets.

Professor Hae Kyung Im from the Im Lab at UChicago led a discussion on the Genomic Data Commons Data Portal and the prediction models offered by the MetaXcan tool.

The projects that we worked on are all available on OpenDataDayChicago2017‘s Github.

After all the hard work during the hackathon, we decided to continue working after hours on some of the projects. The projects we worked on were later presented at Chicago Hack Night.

All in all, the day was productive, entertaining, and educational. We celebrated open data in a pleasant way, and good friendships were formed and strengthened.

Testing DASH and HLS Streams on Linux / Jason Ronallo

I’m developing a script to process some digitized and born digital video into adaptive bitrate formats. As I’ve gone along trying different approaches for creating these streams, I’ve wanted reliable ways to test them on a Linux desktop. I keep forgetting how I can effectively test DASH and HLS adaptive bitrate streams I’ve created, so I’m jotting down some notes here as a reminder. I’ll list both the local and online players that you can try.

While I’m writing about testing both DASH and HLS adaptive bitrate formats, really we need to consider 3 formats, as HLS can be delivered as MPEG-2 TS segments or fragmented MP4 (fMP4). Since mid-2016 and iOS 10+, HLS segments can be delivered as fMP4. This now allows you to use the same fragmented MP4 files for both DASH and HLS streaming. Until uptake of iOS 10 is greater you likely still need to deliver video with HLS-TS as well (or go with an HLS-TS everywhere approach). While DASH can use any codec, I’ll only be testing fragmented MP4s (though maybe not fully conformant to DASH-IF AVC/264 interoperability points). So I’ll break down testing by DASH, HLS-TS, and HLS-fMP4 when applicable.

The important thing to remember is that you’re not playing back a video file directly. Instead, these formats use a manifest file that lists the various adaptations–different resolutions and bitrates–that a client can choose to play based on bandwidth and other factors. So what we want is the ability to play back video by referring to the manifest file rather than to any particular video file or files. In some cases the video files will be self-contained, muxed video and audio, and byte range requests will be used to serve up segments; in other cases the video is segmented, with the audio either in a separate single file or segmented in the same way as the video. Depending on how the video files are created, they may even lack the data necessary to play back independently of another file. For instance, it is possible to create a separate initialization MP4 file that contains the metadata a client needs in order to play back each of the segment files, which lack this information themselves. All of these files are intended to be served over HTTP. They can also include links to text tracks like captions and subtitles, though support for captions in these formats is lacking in many HTML5 players.
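
To make the manifest idea concrete, here is a small sketch, using only the Python 3 standard library, that fetches an HLS master playlist and prints the variant streams it advertises; a DASH MPD plays the same role, just expressed as XML. The default URL is a placeholder and the parsing is deliberately naive, since the point is only that the manifest is a plain-text index of renditions rather than the media itself.

    # list_hls_variants.py -- print the variant streams an HLS master playlist advertises.
    # Usage: python3 list_hls_variants.py http://localhost/pets/hls/master.m3u8
    import sys
    import urllib.request

    url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost/pets/hls/master.m3u8"
    playlist = urllib.request.urlopen(url).read().decode("utf-8")

    lines = playlist.splitlines()
    for i, line in enumerate(lines):
        # Each variant is an #EXT-X-STREAM-INF tag followed by the URI of its media playlist.
        if line.startswith("#EXT-X-STREAM-INF:"):
            attributes = line.split(":", 1)[1]  # e.g. BANDWIDTH=..., RESOLUTION=...
            uri = lines[i + 1] if i + 1 < len(lines) else "?"
            print(attributes)
            print("  ->", uri)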

Also note that all this testing is being done on Ubuntu 16.04.1 LTS (the Xubuntu variant), and it is possible I’ve compiled some of these tools myself (like ffmpeg) rather than using the version in the Ubuntu repositories.

Playing Manifests Directly

I had hoped that it would be fairly easy to test these formats directly without putting them behind a web server. Here’s what I discovered about playing the files without a web server.

GUI Players

Players like VLC and other desktop players have limited support for these formats, so even when they don’t work in these players that doesn’t mean the streams won’t play in a browser or on a mobile device. I’ve had very little luck using these directly from the file system. Assume for this post that I’m already in a directory with the video manifest files: cd /directory/with/video

So this doesn’t work for a DASH manifest (Media Presentation Description): vlc stream.mpd

Neither does this for an HLS-TS manifest: vlc master.m3u8

In the case of HLS it looks like VLC is not respecting relative paths the way it needs to. Some players appear like they’re trying to play HLS, but I haven’t yet found a Linux GUI player that can play the stream directly from the file system like this. Suggestions?

Command Line Players

DASH

Local testing of DASH can be done with the GPAC MP4Client: MP4Client stream.mpd

This works and can tell you if the stream is basically working and whether a separate audio file is synced, but it only appears to show the first adaptation. There are also times when it will not play a DASH stream that plays just fine elsewhere. It will not show you whether the sidecar captions are working, and I’ve not been able to use MP4Client to figure out whether the adaptations are set up correctly. Will the video sources actually switch with restricted bandwidth? There’s a command line option for this but I can’t see that it works.

HLS

For HLS-TS it is possible to use the ffplay media player that uses the ffmpeg libraries. It has some of the same limitations as MP4Client as far as testing adaptations and captions. The ffplay player won’t work though for HLS-fMP4 or MPEG-DASH.

Other Command Line Players

The mpv media player is based on MPlayer and mplayer2 and can play back both HLS-TS and HLS-fMP4 streams, but not DASH. It also has some nice overlay controls for navigating through a video including knowing about various audio tracks. Just use it with mpv master.m3u8. The mplayer player also works, but seems to choose only one adaptation (the lowest bitrate or the first in the list?) and does not have overlay controls. It doesn’t seem to recognize the sidecar captions included in the HLS-TS manifest.

Behind a Web Server

One simple solution to be able to use other players is to put the files behind a web server. While local players may work, these formats are really intended to be streamed over HTTP. I usually do this by installing Apache and allowing symlinks. I then symlink from the web root to the temporary directory where I’m generating various ABR files. If you don’t want to set up Apache you can also try web-server-chrome which works well in the cases I’ve tested (h/t @Bigggggg_Al).

GUI Players & HTTP

I’ve found that the GStreamer based Parole media player included with XFCE can play DASH and HLS-TS streams just fine. It does appear to adapt to higher bitrate versions as it plays along, but Parole cannot play HLS-fMP4 streams yet.

To play a DASH stream: parole http://localhost/pets/fmp4/stream.mpd

To play an HLS-TS stream: parole http://localhost/pets/hls/master.m3u8

Are there other Linux GUIs that are known to work?

Command Line Players & HTTP

ffplay and MP4Client also work with localhost URLs. ffplay can play HLS-TS streams. MP4Client can play DASH and HLS-TS streams, but for HLS-TS it seems to not play the audio.

Online Players

And once you have a stream already served up from a local web server, there are online test players that you can use. There is no need to open up a port on your machine, since all the requests are made by the browser to the local server, which it already has access to. This is more cumbersome with copy/paste work, but it is probably the best way to determine whether the stream will play in Firefox and Chromium. The main thing you’ll need to do is set CORS headers appropriately. If you have any problems with this, check your browser console to see what errors you’re getting. Besides the standard Access-Control-Allow-Origin “*”, for some players you may need to set pre-flight Access-Control-Allow-Headers such as “Range” for byte range requests.
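
If you would rather not touch the Apache configuration at all, a few lines of Python are enough to serve a directory with permissive CORS headers for this kind of testing. This is a sketch rather than a hardened server: it allows any origin plus the Range header mentioned above, so keep it on localhost.

    # cors_server.py -- serve the current directory with permissive CORS headers for ABR testing.
    # Run it from the directory holding the .mpd/.m3u8 and media files: python3 cors_server.py 8000
    import sys
    from http.server import HTTPServer, SimpleHTTPRequestHandler
    from socketserver import ThreadingMixIn

    class CORSRequestHandler(SimpleHTTPRequestHandler):
        def end_headers(self):
            # Add CORS headers, including the Range header needed for byte range requests.
            self.send_header("Access-Control-Allow-Origin", "*")
            self.send_header("Access-Control-Allow-Headers", "Range")
            self.send_header("Access-Control-Expose-Headers", "Content-Length, Content-Range")
            SimpleHTTPRequestHandler.end_headers(self)

        def do_OPTIONS(self):
            # Answer pre-flight requests with an empty 204; end_headers() adds the CORS headers.
            self.send_response(204)
            self.end_headers()

    class ThreadingHTTPServer(ThreadingMixIn, HTTPServer):
        pass  # threads let a player fetch several segments in parallel

    if __name__ == "__main__":
        port = int(sys.argv[1]) if len(sys.argv) > 1 else 8000
        ThreadingHTTPServer(("0.0.0.0", port), CORSRequestHandler).serve_forever()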

The Bitmovin MPEG-DASH & HLS Test Player requires that you select whether the source is DASH or HLS-TS (or progressive download). Even though Linux desktop browsers do not natively support playing HLS-TS this player can repackage the TS segments so that they can be played back as MP4. This player does not work with HLS-fMP4 streams, though. Captions that are included in the DASH or HLS manifests can be displayed by clicking on the gear icon, though there’s some kind of double-render issue with the DASH manifests I’ve tested.

Really when you’re delivering DASH you’re probably using dash.js underneath in most cases so testing that player is useful. The DASH-264 JavaScript Reference Client Player has a lot of nice features like allowing the user to select the adaptation to play and display of various metrics about the video and audio buffers and the bitrate that is being downloaded. Once you have some files in production this can be helpful for seeing how well your server is performing. Captions that are included in the DASH manifest can be displayed.

The videojs-contrib-hls project has a demo VideoJS HLS player that includes support for fMP4.

The hls.js player has a great demo site for each version that has a lot of options to test quality control and show other metrics. Change the version number in the URL to the latest version. The other nice part about this demo page is that you can just add a src parameter to the URL with the localhost URL you want to test. I could not get hls.js to work with HLS-fMP4 streams, though there is an issue to add fMP4 support. Captions do not seem to be enabled.

There is also the JW Player Stream Tester. But since I don’t have a cert for my local server I need to use the JW Player HTTP stream tester instead of the HTTPS one. I was able to successfully test DASH and HLS-TS streams with this tool. Captions only displayed for the HLS stream.

The commercial Radiant media player has a DASH and HLS tester that can be controlled with URL parameters. I’m not sure why the streaming type needs to be selected first, but otherwise it works well. It knows how to handle DASH captions but not HLS ones, and it does not work with HLS-fMP4.

The commercial THEOplayer HLS and DASH testing tool only worked for my HLS-TS stream and not the DASH or HLS-fMP4 streams I’ve tested. Maybe it was the test examples given, but even their own examples did not adapt well and had buffering issues.

Wowza has a page for video test players but it seems to require a local Wowza server be set up.

What other demo players are there online that can be used to test ABR streams?

I’ve also created a little DASH tester using Plyr and dash.js. You can either enter a URL to an MPD into the input or append a src parameter with the URL to the MPD to the test page URL. To make it even easier to use, I created a short script that allows me to launch it from a terminal just by giving it the MPD URL. This approach could be used for a couple of the other demos above as well.
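
A launcher like that can be trivial. Here is a hedged sketch of the idea; the tester URL is a placeholder for wherever you host a Plyr/dash.js (or similar) test page that accepts a src query parameter.

    #!/usr/bin/env python3
    # dashtest -- open the local DASH test page with a given MPD URL preloaded.
    # Usage: dashtest http://localhost/pets/fmp4/stream.mpd
    import sys
    import urllib.parse
    import webbrowser

    TESTER = "http://localhost/dash-tester/"  # placeholder for wherever the test page lives

    if len(sys.argv) != 2:
        sys.exit("usage: dashtest MPD_URL")

    webbrowser.open(TESTER + "?src=" + urllib.parse.quote(sys.argv[1], safe=""))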

One gap in my testing so far is the Shaka player. They have a demo site, but it doesn’t allow enabling an arbitrary stream.

Other Tools for ABR Testing

To test automatic bitrate switching, it is useful to simulate different bandwidth conditions. The latest Chromium and Firefox Nightly both have tools for this built into their developer tools. In Chromium this is under the Network tab, and in Firefox Nightly it is only accessible when turning on the mobile/responsive view. If you set the bandwidth to 2G you ought to see network requests for a low bitrate adaptation, and if you change it to wifi it ought to adapt to a high bitrate adaptation.

Summary

There are decent tools to test HLS and MPEG-DASH while working on a Linux desktop. I prefer using command line tools like MP4Client (DASH) and mpv (HLS-TS, HLS-fMP4) for quick tests that the video and audio are packaged correctly and that the files are organized and named correctly. These two tools cover both formats and can be launched quickly from a terminal.

I plan on taking a DASH-first approach, and for desktop testing I prefer to test in video.js if caption tracks are added as track elements. With contributed plugins it is possible to test DASH and HLS-TS in browsers. I like testing with Plyr (with my modifications) if the caption file is included in the DASH manifest, since Plyr was easy to hack to make this work. For HLS-fMP4 (and even HLS-TS) there’s really no substitute for testing on an iOS device (and for HLS-fMP4 on an iOS 10+ device), as the native player may be used in full screen mode.

IMLS support for free and open source software / Galen Charlton

The Institute of Museum and Library Services is the U.S. government’s primary vehicle for direct federal support of libraries, museums, and archives across the entire country. It should come as no surprise that the Trump administration’s “budget blueprint” proposes to wipe it out, along with the NEA, NEH, Meals on Wheels, and dozens of other programs.

While there is reason for hope that Congress will ignore at least some of the cuts that Trump proposes, the IMLS in particular has been in the sights of House Speaker Paul Ryan before. We cannot afford to be complacent.

Loss of the IMLS and the funding it delivers would be a disaster for many reasons, but I’ll focus on just one: the IMLS has played a significant role in funding the creation and use of free and open source software for libraries, museums, and archives. Besides the direct benefit to the institutions that were awarded grants to build or use F/LOSS, such grants are a smart investment on the part of the IMLS: a dollar spent on producing software that anybody can freely use redounds to the benefit of many more libraries.

For example, here is a list of some of the software projects whose creation or enhancement was funded by an IMLS grant:

This is only a partial list; it does not include LSTA funding that libraries may have used to either implement or enhance F/LOSS systems or money that libraries contributed to F/LOSS development as part of a broader grant project.

IMLS has also funded some open source projects that ultimately… went nowhere. But that’s OK; IMLS funding is one way that libraries can afford to experiment.

Do you or your institution use any of this software? Would you miss it if it were gone — or never existed — or was only available in some proprietary form? If so… write your congressional legislators today.

Inspired by music: a copyright history / District Dispatch

I started to work for ALA as a copyright specialist during the Eldred vs. Ashcroft public domain court battle that ultimately went to the Supreme Court. The question was whether the recent extension of the copyright term under the Sonny Bono Copyright Term Extension Act of 1998 from life plus 50 years to life plus 70 years was constitutional. In a 7-2 ruling, the Court said that the term was constitutional and that Congress could determine any term of copyright as long as it was not forever. Even one day less than forever met the definition of “limited times” in the Copyright Clause. I was shattered because I was sure we were going to win. Naïve me.

Cover of Theft! A History of Music, a graphic novel

ALA was one of the amici that supported Eric Eldred, an Internet publisher who relied on public domain materials for his business. A lot can be said about the case and a lot has been written. I have argued that the silver lining of the disastrous ruling was the formation of the Duke Center for the Study of the Public Domain, Creative Commons and other open licensing movements. The ruling also led to the publication of a comic book called Bound by Law? Tales from the Public Domain by James Boyle and Keith Aoki. It is a great book that should be in the collection of every library.

This year, there is another book by Boyle, Aoki and Jennifer Jenkins that should be in the collection of every library. It’s called Theft: A History of Music. It examines the certainty that music could not have been written without relying on music that was created before—the “standing on the shoulders of giants” idea. There’s a great documentary called John Lennon’s Jukebox that illustrates how music that Lennon loved—rock n’ roll records from the United States—ended up in his music. This music inspired him to be a musician. Its creativity planted the seeds for his own creativity. You can hear a riff on the intro of Richie Barrett’s “Some Other Guy” on “Instant Karma.” That’s cool. (Meanwhile, we see court cases like Blurred Lines and Stairway to Heaven.)

Theft: A History of Music is a labor of love as well as a primer on copyright overall. If you are teaching copyright to librarians or students, this might be the only required text that you assign.

Available online under a Creative Commons license and in print. Here’s a video teaser.

The post Inspired by music: a copyright history appeared first on District Dispatch.

Look Back, Move Forward: Freedom of Information Day / District Dispatch

Senator Tester accepting the James Madison Award at the Newseum in Washington, D.C. The award is given to those who have worked to protect public access to government information.

At the tail end of this year’s #SunshineWeek, let’s take a quick moment to #FlashbackFriday (or should we say #FOIAFriday?) to 29 years ago yesterday, when the American Library Association began celebrating Freedom of Information Day. In honor of the day this year, ALA presented U.S. Senator Jon Tester of Montana with the 2017 James Madison Award for his advocacy for public access to government information. Upon accepting the award, Senator Tester gave a short speech, which you can watch here.

“It is a true honor to receive this award. Throughout my time in the U.S. Senate, I have made it a priority to bring more transparency and accountability to Washington. By shedding more light across the federal government and holding officials more accountable, we can eliminate waste and ensure that folks in Washington, D.C. are working more efficiently on behalf of all Americans.”

At the ceremony, Senator Tester affirmed his longstanding commitment to increasing public access to information by formally announcing the launch of the Senate Transparency Caucus, which aims to shed more light on federal agencies and hold the federal government more accountable to taxpayers.

Earlier this week, Senator Tester also reintroduced the Public Online Information Act, which aims to make all public records from the Executive Branch permanently available on the Internet in a searchable database at no cost to constituents. In other words, this bill (if enacted) would cement the simple concept we know to be true: in the 21st century, public means online.

In honor of Senator Tester, here is a look back at the origins of ALA’s Freedom of Information Day: a 1988 resolution signed by Council to honor the memory of James Madison.

1988 Resolution on Freedom of Information Day; Resolved, that American Library Association encourage librarians throughout the country to bring the issues of freedom of information and barriers to information access into public consciousness and public debate by mounting appropriate information programs within libraries and their communities on March 16.

1988 Resolution on Freedom of Information Day.

 

The post Look Back, Move Forward: Freedom of Information Day appeared first on District Dispatch.

The Amnesiac Civilization: Part 4 / David Rosenthal

Part 2 and Part 3 of this series covered the unsatisfactory current state of Web archiving. Part 1 of this series briefly outlined the way the W3C's Encrypted Media Extensions (EME) threaten to make this state far worse. Below the fold I expand on the details of this threat.

The W3C's abstract describes EME thus:
This proposal extends HTMLMediaElement [HTML5] providing APIs to control playback of encrypted content.

The API supports use cases ranging from simple clear key decryption to high value video (given an appropriate user agent implementation). License/key exchange is controlled by the application, facilitating the development of robust playback applications supporting a range of content decryption and protection technologies.
The next paragraph is misleading; EME does not merely enable DRM, it mandates at least an (insecure) baseline implementation of content encryption:
This specification does not define a content protection or Digital Rights Management system. Rather, it defines a common API that may be used to discover, select and interact with such systems as well as with simpler content encryption systems. Implementation of Digital Rights Management is not required for compliance with this specification: only the Clear Key system is required to be implemented as a common baseline.
The Clear Key system requires that content be encrypted, but the keys to decrypt it are passed in cleartext. I will return to the implications of this requirement.

EME data flows
The W3C's diagram of the EME stack shows an example of how it works. An application, i.e. a Web page, requests the browser to render some encrypted content. It is delivered, in this case from a Content Distribution Network (CDN), to the browser. The browser needs a license to decrypt it, which it obtains from the application via the EME API by creating an appropriate session then using it to request the license. It hands the content and the license to a Content Decryption Module (CDM), which can decrypt the content using a key in the license and render it.

What is DRM trying to achieve? Ostensibly, it is trying to ensure that each time DRM-ed content is rendered, specific permission is obtained from the content owner. In order to ensure that, the CDM cannot trust the browser it is running in. For example, it must be sure that the browser can see neither the decrypted content nor the key. If the browser could see either one, and save it for future use, that would defeat the purpose of DRM.

The CDM is running in an environment controlled by the user, so the mechanisms a DRM implementation uses to obscure the decrypted content and the key from the environment are relatively easy to subvert. This is why in practice most DRM technologies are "cracked" fairly quickly after deployment. As Bunnie Huang's amazing book about cracking the DRM of the original Xbox shows, it is very hard to defeat a determined reverse engineer.

Content owners are not stupid. They realized early on that the search for uncrackable DRM was a fool's errand. So, to deter reverse engineering, they arranged for the 1998 Digital Millennium Copyright Act (DMCA) to make any attempt to circumvent protections on digital content a criminal offense. Cory Doctorow explains what this strategy achieves:
So if DRM isn't anti-piracy, what is it? DRM isn't really a technology at all, it's a law. Specifically, it's section 1201 of the US DMCA (and its international equivalents). Under this law, breaking DRM is a crime with serious consequences (5 years in prison and a $500,000 fine for a first offense), even if you're doing something that would otherwise be legal. This lets companies treat their commercial strategies as legal obligations: Netflix doesn't have the legal right to stop you from recording a show to watch later, but they can add DRM that makes it impossible to do so without falling afoul of DMCA.

This is the key: DRM makes it possible for companies to ban all unauthorized conduct, even when we're talking about using your own property in legal ways. This intrudes on your life in three ways:
  1. It lets companies sue and threaten security researchers who find defects in products
  2. It lets companies sue and threaten accessibility workers who adapt technology for use by disabled people
  3. It lets companies sue and threaten competitors who want to let you do more with your property -- get it repaired by independent technicians, buy third-party parts and consumables, or use it in ways that the manufacturer just doesn't like.
Of course, among the "ways that the manufacturer just doesn't like" can be archiving.

IANAL, but I do not believe that it is a defense under the DMCA that the "protections" in question are made of tissue paper. Thus, for example, it is likely that even an attempt to reverse-engineer an implementation of EME's Clear Key system in order to preserve the plaintext of some encrypted content would risk severe criminal penalties. Would an open source implementation of Clear Key be legal?

It is this interaction between even purely nominal DRM mechanisms and the DMCA that has roused opposition to EME. J. M. Porup's A battle rages for the future of the Web is an excellent overview of the opposition and its calls on Tim Berners-Lee to decry EME. Once he had endorsed it, Glyn Moody wrote a blistering takedown of his reasoning in Tim Berners-Lee Endorses DRM In HTML5, Offers Depressingly Weak Defense Of His Decision. He points to the most serious problem EME causes:
Also deeply disappointing is Berners-Lee's failure to recognize the seriousness of the threat that EME represents to security researchers. The problem is that once DRM enters the equation, the DMCA comes into play, with heavy penalties for those who dare to reveal flaws, as the EFF explained two years ago.
How do we know that this is the most serious problem? Because, like all the other code running in your browser, the DRM implementations have flaws and vulnerabilities. For example:
Google's CDM is Widevine, a technology it acquired in 2010. David Livshits, a security researchers at Ben-Gurion University and Alexandra Mikityuk from Berlin's Telekom Innovation Laboratories, discovered a vulnerability in the path from the CDM to the browser, which allows them to capture and save videos after they've been decrypted. They've reported this bug to Google, and have revealed some proof-of-concept materials now showing how it worked (they've withheld some information while they wait for Google to issue a fix).

Widevine is also used by Opera and Firefox (Firefox also uses a CDM from Adobe).

Under German law -- derived from Article 6 of the EUCD -- Mikityuk could face criminal and civil liability for revealing this defect, as it gives assistance to people wishing to circumvent Widevine. Livshits has less risk, as Israel is one of the few major US trading partners that has not implemented an "anti-circumvention" law, modelled on the US DMCA and spread by the US Trade Representative to most of the world.
Note that we (and Google) only know about this flaw because one researcher was foolhardy and another was from Israel. Many other flaws remain unrevealed:
The researchers who revealed the Widevine/Chrome defect say that it was likely present in the browser for more than five years, but are nevertheless the first people to come forward with information about its flaws. As many esteemed security researchers from industry and academe told the Copyright Office last summer, they routinely discover bugs like this, but don't come forward, because of the potential liability from anti-circumvention law.
Glyn Moody again:
The EFF came up with a simple solution that would at least have limited the damage the DMCA inflicts here:
a binding promise that W3C members would have to sign as a condition of continuing the DRM work at the W3C, and once they do, they not be able to use the DMCA or laws like it to threaten security researchers.
Alas, Cory Doctorow again:
How do we know that companies only want DRM because they want to abuse this law, and not because they want to fight piracy? Because they told us so. At the W3C, we proposed a compromise: companies who participate at W3C would be allowed to use it to make DRM, but would have to promise not to invoke the DMCA in these ways that have nothing to do with piracy. So far, nearly 50 W3C members -- everyone from Ethereum to Brave to the Royal National Institute for Bind People to Lawrence Berkeley National Labs -- have endorsed this, and all the DRM-supporting members have rejected it.

In effect, these members are saying, "We understand that DRM isn't very useful for stopping piracy, but that law that lets us sue people who aren't breaking copyright law? Don't take that away!"
It's not as though, as an educated Web user, you can decide that you don't want to take the risks inherent in using a browser that doesn't trust you, or the security researchers you depend upon. In theory Web DRM is optional, but in practice it isn't. Lucian Armasu at Tom's Hardware explains:
The next stable version of Chrome (Chrome 57) will not allow users to disable the Widevine DRM plugin anymore, therefore making it an always-on, permanent feature of Chrome. The new version of Chrome will also eliminate the “chrome://plugins” internal URL, which means if you want to disable Flash, you’ll have to do it from the Settings page.
You definitely want to disable Flash. To further "optimize the user experience":
So far only the Flash plugin can be disabled in the Chrome Settings page, but there is no setting to disable the Widevine DRM plugin, nor the PDF viewer and the Native Client plugins. PDF readers, including the ones that are built into browsers, are major targets for malicious hackers. PDF is a “powerful” file format that’s used by many, and it allows hackers to do all sorts of things given the right vulnerability.

People who prefer to open their PDF files in a better sandboxed environment or with a more secure PDF reader, rather than in Chrome, will not be able to do that anymore. All PDF files will always open in Chrome’s PDF viewer, starting with Chrome 57.
But that's not what I came to tell you about. I came to talk about archiving.

I fully appreciate the seriousness of the security threat posed by EME, but it tends to overwhelm discussion of EME's other impacts. I have long been concerned about the impact of Digital Rights Management on archiving. I first wrote about the way HTML5 theoretically enabled DRM for the Web in 2011's Moonalice plays Palo Alto:
Another way of expressing the same thought is that HTML5 allows content owners to implement a semi-effective form of DRM for the Web.
That was then, but now theory is practice. Once again, Glyn Moody is right on target:
One of the biggest problems with the defense of his position is that Berners-Lee acknowledges only in passing one of the most serious threats that DRM in HTML5 represents to the open Web. Talking about concerns that DRM for videos could spread to text, he writes:
For books, yes this could be a problem, because there have been a large number of closed non-web devices which people are used to, and for which the publishers are used to using DRM. For many the physical devices have been replaced by apps, including DRM, on general purpose devices like closed phones or open computers. We can hope that the industry, in moving to a web model, will also give up DRM, but it isn't clear.
So he admits that EME may well be used for locking down e-book texts online. But there is no difference between an e-book text and a Web page, so Berners-Lee is tacitly admitting that DRM could be applied to basic Web pages. An EFF post spelt out what that would mean in practice:
A Web where you cannot cut and paste text; where your browser can't "Save As..." an image; where the "allowed" uses of saved files are monitored beyond the browser; where JavaScript is sealed away in opaque tombs; and maybe even where we can no longer effectively "View Source" on some sites, is a very different Web from the one we have today.
It's also totally different from the Web that Berners-Lee invented in 1989, and then generously gave away for the world to enjoy and develop. It's truly sad to see him acquiescing in a move that could destroy the very thing that made the Web such a wonderfully rich and universal medium -- its openness.
The EFF's post (from 2013) had several examples of EME "mission creep" beyond satisfying Netflix:
Just five years ago, font companies tried to demand DRM-like standards for embedded Web fonts. These Web typography wars fizzled out without the adoption of these restrictions, but now that such technical restrictions are clearly "in scope," why wouldn't typographers come back with an argument for new limits on what browsers can do?

Indeed, within a few weeks of EME hitting the headlines, a community group within W3C formed around the idea of locking away Web code, so that Web applications could only be executed but not examined online. Static image creators such as photographers are eager for the W3C to help lock down embedded images. Shortly after our Tokyo discussions, another group proposed their new W3C use-case: "protecting" content that had been saved locally from a Web page from being accessed without further restrictions. Meanwhile, publishers have advocated that HTML textual content should have DRM features for many years.
Web archiving consists of:
content ... saved locally from a Web page ... being accessed without further restrictions.
It appears that the W3C's EME will become, in effect, a mandatory feature of the Web. Obviously, the first effect is that much Web video will be DRM-ed, making it impossible to collect in replayable form and thus preserve. Google's making Chrome's video DRM impossible to disable suggests that YouTube video will be DRM-ed. Even a decade ago, to study US elections you needed YouTube video.
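One way to see the scale of the problem is that a crawler cannot even tell, without executing the page, that the media it captures will be useless. A hypothetical probe like the sketch below, injected before page scripts run, would at least let an archiving crawler flag pages that request a CDM; the wrapper and logging are my own illustration, not a feature of any existing crawler.

```typescript
// Hypothetical crawler-side probe: wrap the EME entry point so pages that
// request a CDM can be flagged as un-replayable from the archive.
// (Illustration only; injection timing and reporting are left out.)
const originalRequest = navigator.requestMediaKeySystemAccess.bind(navigator);

navigator.requestMediaKeySystemAccess = (
  keySystem: string,
  configs: MediaKeySystemConfiguration[]
): Promise<MediaKeySystemAccess> => {
  // Record that this page relies on DRM; saving the bytes the crawler sees
  // will not yield replayable video.
  console.warn(`EME key system requested: ${keySystem}`);
  return originalRequest(keySystem, configs);
};
```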

But that's not the big impact that EME will have on society's memory. It will spread to other forms of content. The business models for Web content are of two kinds, and both are struggling:
  • Paywalled content. It turns out that, apart from movies and academic publishing, only a very few premium brands such as The Economist, the Wall Street Journal and the New York Times have viable subscription business models based on (mostly) paywalled content. Even excellent journalism such as The Guardian is reduced to free access, advertising and voluntary donations. Part of the reason is that Googling the headline of paywalled news stories often finds open access versions of the content. Clearly, newspapers and academic publishers would love to use Web DRM to ensure that their content could be accessed only from their site, not via Google or Sci-Hub.
  • Advertising-supported content. The market for Web advertising is so competitive and fraud-ridden that Web sites have been forced to let advertisers run ads that are obnoxious and often riddled with malware, and to load their sites with trackers, to the point that many users have rebelled and use ad-blockers. These days doing so is pretty much essential, both to keep yourself safe and to reduce bandwidth consumption. Sites are very worried about the loss of income from blocked ads. Some, such as Forbes, refuse to supply content to browsers that block ads (which, in Forbes' case, turned out to be a public service; the ads carried malware). DRM-ing a site's content would prevent its ads being blocked. Thus ad space on DRM-ed sites will be more profitable, and sell for higher prices, than space on sites where ads can be blocked. The pressure on advertising-supported sites, which include both free and subscription news sites, to DRM their content will be intense.
Thus the advertising-supported bulk of what we think of as the Web, and the paywalled resources such as news sites that future scholars will need, will become un-archivable. Kalev Leetaru will need to add a fourth, even more outraged, item to his list of complaints about Web archives.

The prospect for academic journals is somewhat less dire. Because the profit margins of the big publishers are so outrageous, and because charging extortionate subscriptions for access to the fruits of publicly and charitably-funded research is so hard to justify, they are willing to acquiesce in the archiving of their content provided it doesn't threaten their bottom line. The big publishers typically supply archives such as Portico and CLOCKSS with content through non-Web channels. CLOCKSS is a dark archive, so is no threat to the bottom line. Portico's post-cancellation and audit facilities can potentially leak content, so Portico will come under pressure to DRM content supplied to its subscribers.

Almost all the world's Web archiving technology is based on Linux or other Open Source operating systems. There is a good reason for this, as I wrote back in 2014:
One thing it should be easy to agree on about digital preservation is that you have to do it with open-source software; closed-source preservation has the same fatal "just trust me" aspect that closed-source encryption (and cloud storage) suffer from.
Lucian Armasu at Tom's Hardware understands the issue:
there may also be an oligopoly issue, because the content market will depend on four, and perhaps soon only three, major DRM services players: Google, Microsoft, and Apple. All of these companies have their own operating systems, so there is also less incentive for them to support other platforms in their DRM solutions.

What that means in practice is that if you choose to use a certain Linux distribution or some completely new operating system, you may not be able to play protected content, unless Google, Microsoft, or Apple decide to make their DRM work on that platform, too.
So it may not even be possible for Web archives to render such content, even if its owner wished to give them permission.

Creating awareness about Open Data in Kyambogo University, Uganda / Open Knowledge Foundation

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open Research theme.

Kyambogo University’s Open Data Day event was about creating awareness of open data within the University community.

Held in the Library's computer lab on 4th March 2017, the event included presentations on open data and open access resources; an exhibition of open access library resources; and a bonus event, a tour of the library service centres. It was attended by librarians, academic staff, and students drawn from different faculties of the University.

Participants registering for the open data day event

The event kicked off with a presentation about open data by Mr Wangwe Isimail, a computer technician in Kyambogo University Library. He covered the following topics: what open data is; kinds of open data; who can open data; why open data; key features of openness; how to open data; and top 21 data sources.

He briefed the participants on an open access workshop that was organised by the Kyambogo University Library in June 2016 which was attended by librarians, deans of faculties, lecturers, researchers and graduate students. The open access workshop was facilitated by Mr David Ball, a SPARC Europe Project Officer for PASTEUR4OA and FOSTER [European Union projects]. Mr Wangwe, in his presentation, emphasised the importance of open data as another element of open science in addition to open access and open source. Hopefully, in the future, the library will organise a workshop on open source too.

Mr Wangwe Isimail delivering a presentation on open data

At the end of the presentation, participants were asked to work in groups of five to discuss what Kyambogo University can contribute towards open access. Participants demonstrated an understanding of initiatives to promote open data. They suggested:

  • Increasing participation in World Open Data Day celebrations so as to raise awareness among a wider audience
  • Setting up a data repository (Kyambogo University Library is already in the process of setting up an institutional repository). It was exciting to hear participants ask that university management be sensitised so that data is deposited in the institutional repository to increase transparency in the university.
  • Carrying out sensitisation workshops in Kyambogo University to encourage people to open up their research data

Participants in a group discussion

The second presentation, on open access resources, was delivered by Mary Acanit, an Assistant Librarian and Head of ICT Services in Kyambogo University Library. The presentation covered: the meaning of open access; open access resources available at Kyambogo University; a comparison between open access resources and subscribed resources; how to access open access resources; and information searching techniques.

Ms. Mary Acanit delivering a presentation on open access resources

The presentation further looked at the benefits of open access and at open access publishing models. In addition, participants went through hands-on training on how to search for open access resources; each was asked to select one of the open access databases and download an article of their choice on any topic.

After the presentation, participants were given a tour of the library services. In the interest of time, participants were asked to visit a library service centre of their choice and were guided by librarians on duty.

There are four service centres, located in different parts of the university campus. Barclays Library, in the east end of the campus, has subject strengths in humanities, social sciences, and business and management; it mainly serves the Faculty of Arts and Social Sciences, the School of Management and Entrepreneurship, and the Faculty of Vocational Studies. The West End Library has subject strengths in science, technology and engineering and serves the Faculty of Science and the Faculty of Engineering. The Faculty of Education Library is a faculty library with subject strengths in education. The Faculty of Special Needs and Rehabilitation Library (FSN&R) is also a faculty library, in the north end of the campus, with subject strengths in special needs studies. Each of the service centres has a wireless internet connection to facilitate access to online library resources, including open access resources.

Some learnings from our Open Data Day event

I am glad to be part of a community that organised the open data day event at my institution and added a voice to promoting access to research data.

I was overwhelmed by the support I received from my library. I shared the idea of an Open Data Day event with my colleagues and they were willing to lend a hand: making presentations, guiding participants during the library tour, arranging logistics, distributing invitations, and more. I learnt that we can make greater strides if we work as a team. My advice to people planning to organise similar events: connect with people who are passionate about the same cause as you and start your local open data community.