Blogs and feeds of interest to the Code4Lib community, aggregated.
May 16, 2012
Here are the steps I just took to install metaproxy (which requires yaz and yaz++) on Red Hat Enterprise Linux 6.2. I did this because Indexdata’s RPMs don’t work on 6.2 (the versions of boost-devel and icu-devel they require seem to be available only in 5.5). Since I expect Indexdata to eventually release 6.2-compatible RPMs, I installed all of this into /opt/local so it’s easy to remove (of course, if you’re already using /opt/local, you might want to use somewhere else). Also, this assumes you’ll put a metaproxy.xml in /opt/local/etc/metaproxy/, so keep that in mind.
- yum install boost boost-devel icu icu-devel libxml2 libxml2-devel gnutls gnutls-devel libxslt libxslt-devel gcc-c++ libtool
- Install yaz:
- wget http://ftp.indexdata.dk/pub/yaz/yaz-4.2.33.tar.gz
- tar -zxvf yaz-4.2.33.tar.gz
- cd yaz-4.2.33
- ./configure --prefix=/opt/local
- make
- make install
- Install yaz++
- wget http://ftp.indexdata.dk/pub/yazpp/yazpp-1.3.0.tar.gz
- tar -zxvf yazpp-1.3.0.tar.gz
- cd yazpp-1.3.0
- ./configure --prefix=/opt/local/ --with-yaz=/opt/local/bin
- make
- make install
- Install metaproxy
- wget http://ftp.indexdata.dk/pub/metaproxy/metaproxy-1.3.36.tar.gz
- tar -zxvf metaproxy-1.3.36.tar.gz
- cd metaproxy-1.3.36
- ./configure --prefix=/opt/local --with-yazpp=/opt/local/bin/
- make
- make install
- cd /opt/local
- mkdir etc; mkdir etc/metaproxy; mkdir etc/sysconfig
- Copy this gist as /etc/rc.d/init.d/metaproxy
- chmod 744 /etc/rc.d/init.d/metaproxy
- Copy this gist as /opt/local/etc/sysconfig/metaproxy
- chkconfig --add /etc/rc.d/init.d/metaproxy
- /etc/init.d/metaproxy start
by Ross at May 16, 2012 08:08 PM
As a web developer, I cringe at deprecated code and try my best to keep up to date, which right now means familiarizing myself with HTML5 and CSS3. In reflecting on how best to update our website, I realized that with a CMS, naturally some things are out of my control.
Giving Up Control & Relying on Developers
Whether it’s the core or plugins, users of a CMS are reliant on its developers to keep things up to date. Is that loss of control worth the benefits? Generally, I would say yes, but that doesn’t stop me from wishing that the technology we use would adopt new specifications sooner.
WordPress & HTML5
Image Tags & Properties
I think it’s interesting that HTML5 now has the figure and figcaption elements. If they are taken advantage of, I think they definitely help to parse information in a webpage and to identify text that is directly related to images.
One thing that does bother me about WordPress (which actually has nothing to do with HTML5) is that it forces users to have a title, and leaves alt text blank by default. I don’t know what the best solution may be, but I would propose inserting the title text into the alt text by default and then allowing the user to change it. If they want to leave it blank, then there should be a checkbox to mark it “intentionally left blank” or something. Perhaps this could be an admin option, but I would definitely want something like that since I would really like to force our users to have alt text, but I don’t want to touch the WP core obviously.
Text Formatting Tags
It’s a bit of a minor thing, and some may argue about the usefulness of the different semantic tags, but users of the rich text editor would have no notion that they’re using <strong> instead of <b> or <em> instead of <i>. While I admit that even I struggle with the appropriate use of each (I have to look it up every time I think about it), if we want to see widespread adoption, then we need to get users to think about their writing and what they intend to do when using any of strong, em, b, i.
Tables
While we avoid tables and they should never be used for layouts, users will still want to insert tables to display data without resorting to an image. I’ve always wondered why WordPress doesn’t have a table insertion button even under the kitchen sink. What worries me is that users who have a basic knowledge of HTML will then insert tables themselves using the HTML view, with improperly formed code.
Layout & Forms
You might wonder why I’d lump the two together. It’s because, other than the default comment form, both of these are dependent on a WordPress setup.
Forms will generally depend on the plugin. Similarly, whether the layout is in HTML5 is very dependent on the theme, along with many elements of accessibility.
Unfortunately, while HTML5 themes are relatively easy to find, most form plugins do not tell you whether they are using HTML5 or how much of it.
Why Not Adopt HTML5
I do realize that while there are a number of advantages to HTML5, especially in terms of structure, it’s still in development. Working in an educational institution, I also find that ensuring backwards compatibility is more work and sometimes difficult.
In particular, screen readers do not necessarily support all the new HTML5 elements and will frequently ignore whole chunks of text or have difficulty with reading links, etc. Even the newest versions of screen readers do not necessarily recognize elements and properties designed to make webpages easier for screen readers to interpret.
I would like to think that, since WordPress talks about trying to be accessible, anything in the WordPress core will be updated once there is widespread adoption not only among browsers, but also among screen readers. Obviously, adoption will take time though. For example, many form input types have been adopted by most browsers, but have not been adopted by IE at all (they will be in IE10).
One can only hope that adoption will pick up once various parts of the HTML5 specifications are ‘cemented.’
Filed under:
Web design
by Cynthia at May 16, 2012 06:50 PM
New vacancy listings are posted weekly on Wednesday at approximately 11:00 a.m. Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
by vedmonds at May 16, 2012 04:56 PM
UBC announced today that they ARE NOT signing the Access Copyright agreement. This is a sensible decision and one that speaks to what every university in Canada should do. If you are a Canadian institution that has not yet signed on to the Access Copyright/AUCC "license" please forward this to your decision-makers and encourage the same approach. I love the statement encapsulating why:
We believe we are taking the bolder, more principled and sustainable option, which best serves the fundamental and long-term interests of our academic community.
Finally. Leadership that stands up for the fundamental rights and freedoms of learners.
by mleggott at May 16, 2012 12:15 AM
May 15, 2012
We have a great panel this year for Top Tech Trends at the ALA Annual Conference in Anaheim! The panelists will describe changes and advances in technology that they see having an impact on the library world, and suggest what libraries might do to take advantage of these trends. Presentation of LITA Awards and Scholarships will take place prior to the Top Tech Trends program.
When: Sunday, June 24, 2012 – 1:30pm – 3:30pm
Where: Anaheim Convention Center, Ballroom A
The panelists:
- Stephen Abram, Gale Cengage Learning
- Lorcan Dempsey, OCLC
- Meredith Farkas, Portland State University Library
- Clifford Lynch, CNI
- Nina McHale, Arapahoe Library District, Colorado
by mprentice at May 15, 2012 08:47 PM
The May 2012 edition of the Evergreen newsletter focuses on the April International Conference in Indianapolis, Indiana.
You can read the full text of the newsletter by visiting the following Evergreen wiki page.
To submit your own entries for the June newsletter, you can email Amy Terlaga at terlaga@biblio.org.
by Amy Terlaga at May 15, 2012 06:29 PM
Last week saw a big (well, big for library data nerds) announcement from OCLC that they are making the data for the Virtual International Authority File (VIAF) available for download under the terms of the Open Data Commons Attribution (ODC-BY) license. If you’re not already familiar with VIAF, here’s a brief description from OCLC Research:
Most large libraries maintain lists of names for people, corporations, conferences, and geographic places, as well as lists to control works and other entities. These lists, or authority files, have been developed and maintained in distinctive ways by individual library communities around the world. The differences in how to approach this work become evident as library data from many communities is combined in shared catalogs such as OCLC’s WorldCat.
VIAF’s goal is to make library authority files less expensive to maintain and more generally useful to the library domain and beyond. To achieve this, VIAF seeks to include authoritative names from many libraries into a global service that is available via the Web. By linking disparate names for the same person or organization, VIAF provides a convenient means for a wider community of libraries and other agencies to repurpose bibliographic data produced by libraries serving different language communities
More specifically, the VIAF service: links national and regional-level authority records, creating clusters of related records and expands the concept of universal bibliographic control by:
- allowing national and regional variations in authorized form to coexist
- supporting needs for variations in preferred language, script and spelling
- playing a role in the emerging Semantic Web
If you went and looked at the OCLC Research page you’ll notice that last month the VIAF project moved to OCLC. This is evidence of a growing commitment on OCLC’s part to make VIAF part of the library information landscape. It currently includes data about people, places and organizations from 22 different national libraries and other organizations.
Already there has been some great writing about what the release of VIAF data means for the cultural heritage sector. In particular Thom Hickey’s Outgoing is a trove of information about the project, which provides a behind-the-scenes look at the various services it offers.
Rather than paraphrase what others have said already I thought I would download some of the data and report on what it looks like. Specifically I’m interested in the RDF data (as opposed to the custom XML, and MARC variants) since I believe it to have the most explicit structure and relations. The shared semantics in the RDF vocabularies that are used also make it the most interesting from a Linked Data perspective.
Diving In
The primary data structure of interest in the data dumps that OCLC has made available is what they call the cluster. A cluster is essentially a hub-and-spoke model with a resource for the person, place or organization in the middle that is attached via the spokes to conceptual resources at the participating VIAF institutions. As an example, here is an illustration of the VIAF cluster for the Canadian archivist Hugh Taylor:

Here you can see a FOAF Person resource (yellow) in the middle that is linked to from SKOS Concepts (blue) for Bibliothèque nationale de France, The Libraries and Archives of Canada, Deutschen Nationalbibliothek, BIBSYS (Norway) and the Library of Congress. Each of the SKOS Concepts has its own preferred label, which you can see varies across institutions. This high level view obscures quite a bit of data, which is probably best viewed in Turtle if you want to see it:
<http://viaf.org/viaf/14894854>
    rdaGr2:dateOfBirth "1920-01-22" ;
    rdaGr2:dateOfDeath "2005-09-11" ;
    a rdaEnt:Person, foaf:Person ;
    owl:sameAs <http://d-nb.info/gnd/109337093> ;
    foaf:name "Taylor, Hugh A.", "Taylor, Hugh A. (Hugh Alexander), 1920-", "Taylor, Hugh Alexander 1920-2005" .
<http://viaf.org/viaf/sourceID/BIBSYS%7Cx90575046#skos:Concept>
    a skos:Concept ;
    skos:inScheme <http://viaf.org/authorityScheme/BIBSYS> ;
    skos:prefLabel "Taylor, Hugh A." ;
    foaf:focus <http://viaf.org/viaf/14894854> .
<http://viaf.org/viaf/sourceID/BNF%7C12688277#skos:Concept>
    a skos:Concept ;
    skos:inScheme <http://viaf.org/authorityScheme/BNF> ;
    skos:prefLabel "Taylor, Hugh Alexander 1920-2005" ;
    foaf:focus <http://viaf.org/viaf/14894854> .
<http://viaf.org/viaf/sourceID/DNB%7C109337093#skos:Concept>
    a skos:Concept ;
    skos:inScheme <http://viaf.org/authorityScheme/DNB> ;
    skos:prefLabel "Taylor, Hugh A." ;
    foaf:focus <http://viaf.org/viaf/14894854> .
<http://viaf.org/viaf/sourceID/LAC%7C0013G3497#skos:Concept>
    a skos:Concept ;
    skos:inScheme <http://viaf.org/authorityScheme/LAC> ;
    skos:prefLabel "Taylor, Hugh A. (Hugh Alexander), 1920-" ;
    foaf:focus <http://viaf.org/viaf/14894854> .
<http://viaf.org/viaf/sourceID/LC%7Cn++82148845#skos:Concept>
    a skos:Concept ;
    skos:exactMatch <http://id.loc.gov/authorities/names/n82148845> ;
    skos:inScheme <http://viaf.org/authorityScheme/LC> ;
    skos:prefLabel "Taylor, Hugh A." ;
    foaf:focus <http://viaf.org/viaf/14894854> .
The Numbers
The RDF Cluster Dataset http://viaf.org/viaf/data/viaf-20120422-clusters.xml.gz is 2.1G gzip compressed RDF data. Rather than it being one complete RDF/XML file, each line has a complete RDF/XML document on it, which represents a single cluster. All in all there are 20,379,541 clusters in the file.
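I quickly hacked together an rdflib filter that reads the uncompressed line-oriented RDF/XML and writes the RDF as ntriples:
import sys
import rdflib

# Each input line is a complete RDF/XML document describing one cluster.
for line in sys.stdin:
    g = rdflib.Graph()
    g.parse(data=line, format='xml')
    # Write the cluster's triples as ntriples (Python 3: bytes go to stdout.buffer).
    g.serialize(destination=sys.stdout.buffer, format='nt')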
This took 4 days to run on my (admittedly old) laptop. If you are interested in seeing the ntriples let me know and I can see about making it available somewhere. It is 2.8G gzip compressed. An ntriples dump might be a useful version of the RDF data for OCLC to make available, since it would be easier to load into triplestores, and otherwise muck around with (more on that below) than the line oriented RDF/XML. I don’t know much about the backend that drives VIAF (has anyone seen it written up?)…but I would understand if someone said it was too expensive to generate, and was intentionally left as an exercise for the downloader.
Given its line-oriented nature, ntriples is very handy for doing analysis from the Unix command line with cut, sort, uniq, etc. From the ntriples file I learned that the VIAF RDF dump is made up of 377,194,224 assertions or RDF triples. Here’s the breakdown on the types of resources present in the data:
| Resource Type | Number of Resources |
| --- | --- |
| skos:Concept | 26,745,286 |
| foaf:Document | 20,379,541 |
| foaf:Person | 15,043,112 |
| rda:Person | 15,043,112 |
| foaf:Organization | 3,722,318 |
| foaf:CorporateBody | 3,722,318 |
| dbpedia:Place | 195,472 |
Here’s a breakdown of predicates (RDF properties) that are used:
| RDF Property | Number of Assertions |
| --- | --- |
| rdf:type | 84,851,159 |
| foaf:focus | 45,510,716 |
| foaf:name | 44,729,247 |
| rdfs:comment | 41,253,178 |
| owl:sameAs | 32,741,138 |
| skos:prefLabel | 26,745,286 |
| skos:inScheme | 26,745,286 |
| foaf:primaryTopic | 20,379,541 |
| void:inDataset | 20,379,541 |
| skos:altLabel | 16,702,081 |
| skos:exactMatch | 8,487,197 |
| rda:dateOfBirth | 5,215,150 |
| rda:dateOfDeath | 1,364,355 |
| owl:differentFrom | 1,045,172 |
| rdfs:seeAlso | 1,045,172 |
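The predicate counts above came from Unix tools, but the same tally can be sketched in Python. This is only a minimal sketch, assuming one triple per line with whitespace separating subject, predicate and object. The function name `count_predicates` and the three sample lines are illustrative, not the commands actually used for this post.

```python
from collections import Counter

def count_predicates(lines):
    """Tally how often each predicate URI appears in ntriples lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        # ntriples layout: <subject> <predicate> <object> .
        # Subject and predicate never contain whitespace, so two splits suffice.
        parts = line.split(None, 2)
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts

# Illustrative sample built from the Hugh Taylor cluster shown earlier.
sample = [
    '<http://viaf.org/viaf/14894854> <http://xmlns.com/foaf/0.1/name> "Taylor, Hugh A." .',
    '<http://viaf.org/viaf/sourceID/BIBSYS%7Cx90575046#skos:Concept> <http://xmlns.com/foaf/0.1/focus> <http://viaf.org/viaf/14894854> .',
    '<http://viaf.org/viaf/sourceID/DNB%7C109337093#skos:Concept> <http://xmlns.com/foaf/0.1/focus> <http://viaf.org/viaf/14894854> .',
]

for predicate, n in count_predicates(sample).most_common():
    print(predicate, n)
```

To run it over a full dump, the list could be swapped for `sys.stdin` or a file opened with `gzip.open(path, 'rt')`, since the function only needs an iterable of lines.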
I’m expecting these statistics to be useful in helping target some future work I want to do with the VIAF RDF dataset (to explore what an idiomatic JSON representation for the dataset would be, shhh). In addition to the RDF, OCLC also makes a dump of link data available. It is a smaller file (239M gzip compressed) of tab delimited data, which looks like:
...
http://viaf.org/viaf/10014828 SELIBR:219751
http://viaf.org/viaf/10014828 SUDOC:052584895
http://viaf.org/viaf/10014828 NKC:xx0015094
http://viaf.org/viaf/10014828 BIBSYS:x98003783
http://viaf.org/viaf/10014828 LC:24893
http://viaf.org/viaf/10014828 NUKAT:vtls000425208
http://viaf.org/viaf/10014828 BNE:XX917469
http://viaf.org/viaf/10014828 DNB:121888096
http://viaf.org/viaf/10014828 BNF:http://catalogue.bnf.fr/ark:/12148/cb13566121c
http://viaf.org/viaf/10014828 http://en.wikipedia.org/wiki/Liza_Marklund
...
There are 27,046,631 links in total. With a little more Unix commandline-fu I was able to get some stats on the number of links by institution:
| Institution | Number of Links |
| --- | --- |
| LC NACO (United States) | 8,325,352 |
| Deutschen Nationalbibliothek (Germany) | 7,732,546 |
| SUDOC (France) | 2,031,452 |
| BIBSYS (Norway) | 1,822,681 |
| Bibliothèque nationale de France | 1,643,068 |
| National Library of Australia | 977,141 |
| NUKAT Center (Poland) | 894,981 |
| Libraries and Archives of Canada | 674,088 |
| National Library of the Czech Republic | 598,848 |
| Biblioteca Nacional de España | 519,511 |
| National Library of Israel | 327,455 |
| Biblioteca Nacional de Portugal | 321,064 |
| English Wikipedia | 301,345 |
| Vatican Library | 247,574 |
| Getty Union List of Artist Names | 202,711 |
| National Library of Sweden | 161,845 |
| RERO (Switzerland) | 119,366 |
| Istituto Centrale per il Catalogo Unico (Italy) | 45,208 |
| Swiss National Library | 33,866 |
| National Széchényi Library (Hungary) | 33,727 |
| Bibliotheca Alexandrina (Egypt) | 26,877 |
| Flemish Public Libraries | 4,819 |
| Russian State Library | 997 |
| Extended VIAF Authority | 109 |
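For anyone who prefers Python to the command line, per-institution counts like these can be sketched as follows. It assumes each line of the link dump holds a VIAF URI, whitespace, then a target of the form `PREFIX:identifier`, with bare URLs (such as the Wikipedia links) grouped by host name. The names `link_source` and `count_links` are made up for this illustration, and the sample reuses lines from the excerpt above.

```python
from collections import Counter
from urllib.parse import urlparse

def link_source(target):
    """Return the source label for a link target such as 'DNB:121888096'."""
    if target.startswith(('http://', 'https://')):
        # Bare URLs carry no institution prefix, so fall back to the host name.
        return urlparse(target).netloc
    return target.split(':', 1)[0]

def count_links(lines):
    """Count links per source in the delimited VIAF link dump."""
    counts = Counter()
    for line in lines:
        parts = line.split(None, 1)  # VIAF URI, then the link target
        if len(parts) == 2:
            counts[link_source(parts[1].strip())] += 1
    return counts

sample = [
    'http://viaf.org/viaf/10014828\tSELIBR:219751',
    'http://viaf.org/viaf/10014828\tDNB:121888096',
    'http://viaf.org/viaf/10014828\tBNF:http://catalogue.bnf.fr/ark:/12148/cb13566121c',
    'http://viaf.org/viaf/10014828\thttp://en.wikipedia.org/wiki/Liza_Marklund',
]

for source, n in count_links(sample).most_common():
    print(source, n)
```

Mapping the short prefixes (SELIBR, DNB, BNF and so on) to the institution names in the table would still need a hand-made lookup.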
The 301,345 links to Wikipedia are really great to see. It might be a fun project to see how many of these links are actually present in Wikipedia, and if they can be automatically added with a bot if they are missing. I think it’s useful to have the HTTP identifier in the link dump file, as is the case for the BNF identifiers. I’m not sure why the DNB, Sweden, and LC identifiers aren’t expressed as URLs as well.
One other parting observation (I’m sure I’ll blog more about this) is that it would be nice if more of the data that you see in the HTML presentation were available in the RDF dumps. Specifically, it would be useful to have the Wikipedia links expressed in the RDF data, as well as linked works (uniform titles).
Anyway, a big thanks to OCLC for making the VIAF dataset available! It really feels like a major sea change in the cultural heritage data ecosystem.
by ed at May 15, 2012 05:57 PM
LITA is pleased to announce the availability of a new webinar, “Social Networking the Catalog: Community Based Approaches to Building Catalogs and Collections,” presented by Margaret Heller (Dominican University) and held June 7, 11:00 am – Noon CDT. This presentation will introduce the Read/Write Library Chicago, a new model for libraries that exists to illuminate and create connections between people, materials, and institutions in the city of Chicago. Participants will learn about new trends and features in social reading and cataloging, social library catalogs and integrated library systems, and crowdsourcing platforms. The catalog builds a social network, and the social network in turn builds the catalog. We love the read/write internet. It’s time to create the read/write library, where everyone has a voice in the collections, catalog, programming, and mission. This model is extensible to more conventional libraries through a multitude of tools and approaches that are already widely used.
Additional information and registration are available at http://www.ala.org/lita/learning/online/socialcatalog
by mprentice at May 15, 2012 05:54 PM
Hello everyone,
Evergreen 2.2 rc1 was released today, 15 May 2012. This is the
release candidate. The Evergreen community hopes that Evergreen 2.2.0
will follow in about two weeks, depending as always on feedback from
those who test it.
This release includes various bug fixes; please see the full list of
changes.
The 2.2 series includes many new features over the 2.1 series, including
the Template Toolkit OPAC (TPAC) and too many others to count.
Please report any new bugs on Launchpad.
I would like to particularly thank Thomas Berezansky, Ben Shum, Jason
Stephenson and Dan Scott for assisting in innumerable ways with the
mechanics of publishing this release candidate. I am surely neglecting a
couple of other folks whose help was invaluable, but at least they have
their karma.
Thanks everyone!
by Lebbeous Fogle-Weekley at May 15, 2012 03:10 PM
Editorial by Laurence Lannom, CNRI
May 15, 2012 01:18 PM
Article by Natasha Simons, Griffith University, Australia
May 15, 2012 01:18 PM
Article by R. Niccole Westbrook and Dan Johnson, University of Houston Libraries; Karen Carter, Rutgers School of Communication and Information; Angela Lockwood, Texas Women's University School of Library and Information Studies
May 15, 2012 01:18 PM
Article by Christopher A. Lee, Alexandra Chassanoff, and Kam Woods, University of North Carolina, Chapel Hill; Matthew Kirschenbaum and Porter Olsen, University of Maryland
May 15, 2012 01:18 PM
Article by Andras Holl, Konkoly Observatory, Budapest, Hungary
May 15, 2012 01:18 PM

For those looking for yet another reason to join us for OKFestival in Helsinki this September, the OKFestival Core Organising Team is proud to announce the inspiring public outcomes of our unconventional First Call for Proposals – and to request your participation for our Second Call to share your ideas in Finland.
As we’ve noted previously, because OKFestival is the first event of its kind, combining Open Knowledge Conference and Open Government Data Camp together for a week-long celebration of action and collaboration, we decided to take a risk by opening up over 2/3 of the week’s programme to you as festival participants.
So last month, we released the First Call for Proposals, crossing our fingers expectantly as we did it. A few of us on the Core Organising Team (photo) were, admittedly, a tad worried – would global communities rise to the challenge? Or would we be left alone in cyberspace without even a programme to our name? We presented the festival to audiences at FREE CITY in Tallinn, at Re:Publica in Berlin and to local stakeholders in Finland. And we waited in anticipation.
In the end, we didn’t have to worry at all. The response to our First Call for Proposals was both overwhelming and encouraging. Open knowledge and data enthusiasts around the world did take the reins – and now, a month later, we have a groundbreaking, action-focused programme planned in co-operation with citizen teams of Guest Programme Planners all over the world. For a summary of the Open Knowledge Festival planning process in 14 slides, see our first Slideshare presentation here.
As you'll see above, the First Call for Proposals allowed the Core Organising Team to determine the most important themes and salient ideas, the subjects of which are highlighted through our 13 guest-organised Topic Streams of 2012:
- Open Democracy and Citizen Movements
- Open Government Data
- Open Cities
- Open Design, Hardware & Manufacturing
- Open Cultural Heritage
- Open Development
- Open Research and Education
- Open Geodata
- Open Source Software
- Data Journalism and Data Visualization
- Gender / Diversity in Openness
- Open Business and Corporate Data
- Open Knowledge and Sustainability
These topics are quite diverse - indeed, such breadth is close to unprecedented for an event of this kind. Going through the topics above and learning more about how their Guest Programme Planners are determining the programming on the Public Planning Wiki, it's hard not to feel a sense of excitement about what's to come.
For the Second (and last!) Call for Proposals, we encourage ideas that further enrich each of these themes with new perspectives. We want your lightning talks, lectures, panel discussions, workshops, hackathons and all things in between. Let's fill Helsinki's streets with innovative new ideas, new collaborations between civil society and government, and new projects that provoke openness in unexpected ways.
It is our hope that together, these themes will illustrate the importance of diverse understandings within open knowledge and open data communities - and we look forward to seeing even more of you get involved in this inspiring process.
The Second Call for Proposals is here. Deadline for submission is June 1st - go to okfestival.org for details. And feel free to mix and remix the Slideshare presentation above for your own uses - it's meant to be shared!

by Kat Braybrooke at May 15, 2012 12:05 PM
May 14, 2012
I started blogging a little over three years ago. I found that it was a great way to organize my thoughts and it gave me an excuse to talk to people and ask questions about things that interested me. It became an extended conversation with so many readers about the future of libraries and the role of books and readers in our changing society. Also polarons, faster-than-light neutrinos, and log-normal distributions.
But I'm not the type to just write about things. We live in a time where it's easier than ever before for small groups of people to build new things, and if you've been reading the blog, you've had a front row seat to watch the development of such a thing. You've heard the story of sculptors who chip away stone to free the figures trapped inside the rock, or the novelist whose characters struggle to tell their stories. For me, Unglue.it is like that: it's something formed from the raw material of ideas from many people. It just wants to exist.
If you've not been paying attention, Unglue.it is an effort to crowd-fund creative commons ebooks. If you can find a way to cover the fixed costs, you can make the ebooks free to everyone, everywhere. Libraries, who can make possible the effective distribution of these ebooks, are tired of being shut out of popular ebook lending and need new ways forward.
One really exciting thing is that it's not just us. There's starting to be a Movement. Making books more available and more useful to everyone, everywhere is a huge undertaking, and there are a variety of efforts nucleating to address many different bits of the problem. Last week, I got together with Francis Pinter, whose "Knowledge Unlatched" effort could revolutionize scholarly monograph publishing. In April, I got together with Ash Kalb, who's bringing vintage science fiction books back to life at Singularity and Co. I've written here about DPLA, Internet Archive, Hathitrust, Library Renewal, Project Gutenberg and more. We're all on the same team.
This morning, we started the last testing of the Unglue.it machinery before launch. We're using real money. I'm offering to "unglue" an ebook comprised of five blog posts I wrote last year on Open Access eBooks. The campaign will end tomorrow no matter what, and we'll verify that we can collect money through Amazon Payments. (See the Unglue.it blog for the payment processor saga.)
If you want, you can help us test the site. You can enter a pledge (remember, it's real money!) and request premiums. Whether you pledge or not, you'll end up with a real ebook with a CC BY-SA license. You can make derivatives, add content, make translations, experiment. (But you might need to wait a week or two to get it.) We'll use any cash we take in to cover some expenses (like the block of ISBNs that we bought). My lawyer says we can't offer premiums that include alcohol, but she didn't say I couldn't let people hit me up for a beer.
Already we've received a bunch of really great bug reports and suggestions. It turns out that if you want to pledge $100 billion billion, for example, the website isn't going to let you, and it won't give you a sensible error message.
We start "real" campaigns at noon (EDT) on Thursday (fingers crossed). Our launch line-up will have 5 campaigns. Until then we're frantically busy making sure everything is working as well as possible.
See you on the other side.
by Eric (noreply@blogger.com) at May 14, 2012 07:29 PM
The Koha News Tool allows librarians to post news items to the OPAC and the staff client. In Koha 3.4 the news tool can also be used to add news items to your circulation receipts or slips. This tutorial will walk you through the simple steps of adding and viewing news items.
If you have an idea for a video, please just let me know and I’ll add it to my list of things to record.
Related posts:
- Koha Offline Circulation Tool
- So much Koha news today
- Adding a Child Patron in Koha 3.2
by Nicole at May 14, 2012 03:00 PM
Using OAI-PMH to populate the “Catholic Portal” is not straightforward, and this posting outlines some of my investigations in this regard.
Introduction
As you may or may not know, OAI-PMH is a “standard” protocol designed for harvesting metadata. It only understands six commands (or in OAI-PMH parlance, “verbs”). These commands are sent to remote computers in the form of URLs, and the remote computer is expected to respond in the form of specifically shaped XML streams. These commands include:
- Identify – Lists who manages the repository and what type of content it contains.
- ListMetadataFormats – Lists the various metadata schemes used to describe the repository’s content. At least one of these schemes must be Dublin Core.
- ListSets – Specifies how the repository’s content is subdivided. There can be zero or more of these subdivisions.
- ListIdentifiers – Returns a list of keys pointing to specific records in the repository.
- ListRecords – An enhanced version of ListIdentifiers, this verb downloads whole records, not just identifiers.
- GetRecord – Given a specific identifier, this verb retrieves a single record.
Through a conversation of these verbs and the returned XML streams, metadata between computers can be exchanged. It is then up to the computer doing the harvesting to implement some sort of cool and interesting service with the harvested content. Here at Catholic Portal Central we want to index the metadata and provide immediate access to remote digitized content.
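To make the exchange concrete, here is a rough Python sketch of one turn in that conversation: building the URL for a verb, then reading identifiers out of the returned XML. The endpoint `http://example.org/oai`, the helper names `oai_url` and `record_identifiers`, and the trimmed response are all invented for illustration. A real harvester would fetch the URL over HTTP and would also need to handle resumption tokens and errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {'oai': 'http://www.openarchives.org/OAI/2.0/'}

def oai_url(base_url, verb, **params):
    """Build an OAI-PMH request URL for one of the six verbs."""
    query = {'verb': verb}
    query.update(params)
    return base_url + '?' + urlencode(query)

def record_identifiers(xml_text):
    """Pull the record identifiers out of a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall('.//oai:header/oai:identifier', OAI_NS)]

# Hypothetical endpoint and set name, used only to show the URL shape.
url = oai_url('http://example.org/oai', 'ListIdentifiers',
              metadataPrefix='oai_dc', set='coll6')
print(url)

# Hand-written, trimmed response standing in for a live request.
response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:coll6/45</identifier></header>
    <header><identifier>oai:example.org:coll6/46</identifier></header>
  </ListIdentifiers>
</OAI-PMH>"""
print(record_identifiers(response))
```

The same `oai_url` helper covers the other verbs, for example `oai_url(base, 'GetRecord', identifier=key, metadataPrefix='oai_dc')`.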
Investigations
At least three Catholic Research Resources Alliance (CRRA) members have OAI-PMH repositories: Duquesne University, Boston College, and Loyola University Chicago. Using a little Perl script, I most recently investigated the content of the repositories of Boston College and Loyola University Chicago. Through this process I learned what metadata formats they supported and what sets were used to subdivide their collections, and I output Dublin Core metadata from a few selected sets.
The harvested Dublin Core metadata was typical of OAI-PMH repositories: thin, a bit ambiguous, and somewhat inconsistent across repositories. It is thin because many of the Dublin Core elements are left unpopulated. It is ambiguous because many of the fields are repeated, and the values of repeated elements are of different types. For example, a description field may be empty, contain an abstract of the work, the full text of the work, or the process used to digitize the material. It is inconsistent because things like dates, names, and subject entries are formatted differently. In some places names are listed in first name/last name order. Other times it is last name/first name order. Dates can be anything from “February 12, 2012” to “2012-02-12” to “Twelfth Century”. None of this is new to the world of OAI-PMH. It is typical.
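As a small illustration of what coping with the date problem might look like, the following Python sketch tries a short list of known formats and gives up on anything else. The format list and the name `normalize_date` are assumptions for this example; a real harvest would meet many more variations and would need a policy for values like “Twelfth Century”.

```python
from datetime import datetime

# Only the two fully specified formats quoted above; real data needs more.
KNOWN_FORMATS = ('%Y-%m-%d', '%B %d, %Y')

def normalize_date(value):
    """Return an ISO 8601 date string, or None if the value is not a full date."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None

for raw in ('February 12, 2012', '2012-02-12', 'Twelfth Century'):
    print(repr(raw), '->', normalize_date(raw))
```

Returning None rather than guessing keeps unparseable values visible, so they can be reviewed by hand instead of being silently indexed under the wrong date.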
All is not lost. There are patterns to this apparent randomness. Using my script I can sometimes output titles, descriptions, subject headings, and URLs of digitized objects. For example, here is such a list from the Loyola University Chicago repository:
item: 46
key: oai:content.library.luc.edu:coll6/45
title(s): Letter to the Secretary of the Literary Agency of London, 1908
title(s): Catholic Women Poets
identifier(s): cudahy219e3
identifier(s): 003_kayesmith_1908;pg3.jpg
identifier(s): http://content.library.luc.edu/u?/coll6,45
subject(s): Shelia Kaye-Smith; poets; women poets; Catholic poets
subject(s): Local
description(s): third page of letter requesting appointment
description(s): does not suit you any other time up to 4 15 will do Would you kindly send a reply to me c o Miss F E Walters Girton College Cambridge With apologies for troubling you believe me Yours faithfully Sheila Kaye Smith
description(s): Master file scanned at 600 dpi RGB in reflective mode from original document using MicroTek ScanMaker 1000XL
description(s): http://www.luc.edu.archives
type: image
From this output it becomes apparent that the first title is the title of the artifact, the third identifier is the URL of the digitized object, the first subject field is a delimited list of keywords, the first description is a sort of abstract, and the type field contains a value denoting what kind of digitized thing is in question. Thus, the output follows a pattern, and computers are very good at patterns. A computer program could therefore easily be written to read this particular OAI-PMH output and store it in the Portal’s index.
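As a rough illustration of that pattern, here is a minimal Python sketch (not the Perl script mentioned above). It assumes a harvested record has already been parsed into a dictionary of repeated Dublin Core fields; the function name `map_record` and the output field names are hypothetical, not the Portal's actual index schema.

```python
def map_record(record):
    """Map one harvested Dublin Core record to a flat dictionary.

    Assumes the positional pattern seen in the Loyola output above:
    first title, third identifier, first subject, first description.
    """
    def pick(field, position):
        # Return the value at the given position, or None if it is missing.
        values = record.get(field, [])
        return values[position] if len(values) > position else None

    subjects = pick("subject", 0) or ""
    return {
        "title": pick("title", 0),
        "url": pick("identifier", 2),
        "keywords": [s.strip() for s in subjects.split(";") if s.strip()],
        "abstract": pick("description", 0),
        "type": pick("type", 0),
    }


# Values copied from the sample record listed above.
record = {
    "title": [
        "Letter to the Secretary of the Literary Agency of London, 1908",
        "Catholic Women Poets",
    ],
    "identifier": [
        "cudahy219e3",
        "003_kayesmith_1908;pg3.jpg",
        "http://content.library.luc.edu/u?/coll6,45",
    ],
    "subject": ["Shelia Kaye-Smith; poets; women poets; Catholic poets", "Local"],
    "description": ["third page of letter requesting appointment"],
    "type": ["image"],
}
mapped = map_record(record)
print(mapped["title"], mapped["url"], mapped["keywords"])
```

A mapping like this only holds for one repository's habits. A record with fewer than three identifiers simply yields no URL, which is the fragility discussed under "Next steps".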
Next steps
My next steps are two-fold. First, I will harvest and index some of the metadata from selected Loyola University Chicago OAI-PMH sets. Second, I will let colleagues from various CRRA committees (specifically the Digital Access Committee as well as the Collection Committee) peruse the results. In the end I hope to get feedback on how to proceed. Should I index more content? Less? None? If more, then how should records be displayed, and exactly how ought the Dublin Core metadata be mapped to VuFind’s underlying Solr index fields?
All of this work is entirely feasible. At the same time it is not enormously scalable. Hand-crafting the parsing of OAI-PMH output, and hand-crafting how it all gets mapped to Solr’s index, is time-consuming and fragile. The Portal Home Planet can easily do this work for no more than a dozen different repositories, but after that some other means of production will need to be examined.
by Catholic Portal at May 14, 2012 02:59 PM
Call me crazy, but I think the secret life of checkout slips is fascinating.
Some moms use their foot-long slips filled with children’s books as a master list, crossing off items as they’re returned. One regular patron I knew kept every checkout slip she ever received. Upon returning items, she’d ask us to cross off the titles on the original slip and initial it. This behavior was the result of a typical “I returned that”/“Not according to our computers” interaction.
And, of course, countless slips are used as bookmarks or refrigerator-mounted notices or simply left in dust jackets for weeks. However small, these slips are touchpoints—ways that people interact with us—and collectively we’re pumping out thousands of these things daily.
Likewise, in some small way, we’re representing ourselves through these little scraps of paper. Yet, most of us are churning out slips that could be easier on the eye and more helpful to our users.
This isn’t something to keep you up at night, but it’s still worth thinking about, because details matter. All of these little touchpoints add up to create people’s experience of our libraries. And dispensing ugly checkout receipts illustrates that we haven’t spent enough time sweating these details. Even worse, this inattention is at the root of complaints about hard-to-use websites and repeated questions about where the restrooms are.
What is a good checkout slip?
To answer this we have to know what a checkout slip is supposed to do. As I see it, there are a few core functions:
- remind people when items are due (patron need)
- remind people what items they have checked out (patron need)
- facilitate the return of materials (library need)
Beyond that, some slips have secondary functions:
- facilitate renewing items (patron need)
- promote library events (library need)
- broadcast policy changes (library need)
- alert people to holiday hour changes (patron need)
With these factors in mind, we can now think of some other factors surrounding the design of an ideal checkout slip:
- They should respect people’s privacy.
- They should include the library’s name and branding.
- They should be easy to read. This includes obeying graphic design basics as well as not cutting off item titles, etc.
- Ideally, they’d show some personality and/or be friendly.
- Item types could be helpful to patrons trying to locate a misplaced item.
Checking out the fun
After compiling these lists of functions, I started to think about whether there was a way to make checkout slips more fun, or whether that was a terrible impulse. More seriously, I considered what would be the minimum amount of information required to make an item easily identifiable and other basic considerations, such as why there is a due date listed for each item when most items share a due date with others. With all of these things in mind, I took a crack at designing a checkout slip.
There’s nothing very different about this design, but I reckon it is a bit easier to use when hanging on a refrigerator than the current crop. Aside from sensible typography, the only thing notable is that items are grouped by due date rather than listing a due date for each item.
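For anyone who can customize their receipt template, the grouping idea might be sketched in Python like this. The item titles and dates are made-up sample data, and `group_by_due_date` is a hypothetical helper, not a feature of any particular ILS.

```python
from collections import defaultdict
from datetime import date


def group_by_due_date(items):
    """Return (due_date, [titles]) pairs, soonest due date first."""
    groups = defaultdict(list)
    for title, due in items:
        groups[due].append(title)
    return sorted(groups.items())


# Made-up checkouts: (title, due date)
checkouts = [
    ("Where the Wild Things Are", date(2012, 6, 4)),
    ("Goodnight Moon", date(2012, 6, 4)),
    ("Cars (DVD)", date(2012, 5, 21)),
]

for due, titles in group_by_due_date(checkouts):
    # One heading per due date instead of a date after every item.
    print("Due " + due.strftime("%B %d, %Y"))
    for title in titles:
        print("  " + title)
```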

I really like the idea of a checkout slip that includes an extra bit that’s specifically meant to be displayed on a refrigerator or corkboard, though such a design could add about three or more inches of length per due date. It might be cumbersome, but consider how much better this communicates your library’s philosophy.
Just remember: the details matter, especially when these checkout slips are the most visible output of your library that most users will see.
This first appeared in “The User Experience,” a column I write for LJ.
by Aaron Schmidt at May 14, 2012 12:35 PM
Dan Olds at The Register comments on an interview with co-director of the Wharton School Customer Analytics Initiative Dr. Peter Fader:
Dr Fader ... coins the terms "data fetish" and "data fetishist" to describe the belief that people and organisations need to capture and hold on to every scrap of data, just in case it might be important down the road. (I recently completed a Big Data survey in which a large proportion of respondents said they intend to keep their data “forever”. Great news for the tech industry, for sure.)
The full interview is worth reading, but I want to focus on one comment, which is similar to things I hear all the time:
But a Big Data zealot might say, "Save it all—you never know when it might come in handy for a future data-mining expedition."
Follow me below the fold for some thoughts on data hoarding.
Clearly, the value that could be extracted from the data in the future is non-zero, but even the Big Data zealot believes it is probably small. The reason the Big Data zealot gets away with saying things like this is because he and his audience believe that this small value outweighs the cost of keeping the data indefinitely. They believe that because they believe Kryder's Law will continue.
Let's imagine that everyone thought that way, and decided to keep everything forever. The natural place to put it would be in S3. According to IDC, in 2011 the world stored 1.8 Zettabytes (1.8 billion TB) of data. If we decided to keep it all for the long term in the cloud, we would be effectively endowing it. How big would the endowment be? Applying our model, starting with S3's current highest-volume price of $0.055/GB/mo and assuming that price continues to drop at the 10%/yr historic rate for S3's largest tier, we need an endowment of about $6.3K/TB. So the net present value of the cost of keeping all the world's 2011 data in S3 would be about $11.4 trillion. The 2011 Gross World Product (GWP) at purchasing power parity is almost $80 trillion. So keeping 2011's data would consume 14% of 2011's GWP. The world would be writing S3 a check each month of the first year for almost $100 billion, unless the world got a volume discount.
IDC estimates that 2011's data was 50% larger than 2010's; I believe their figure for the long-run annual growth of data is 57%/yr. Even if it is only 50%, compare that with even the most optimistic Kryder's Law projections of around 30%. But we're using S3, and a 10% rate of cost decrease. So 2012's endowment will be (50-10)=40% bigger than 2011, and so on into the future. The World Bank estimates that in 2010 GWP grew 5.1%. Assuming this growth continues, endowing 2012's data will consume 19% of GWP. On these trends, endowing 2018's data will consume more than the entire GWP for the year.
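The arithmetic above can be checked with a few lines of Python. This is only a sketch under the post's own assumptions: the $6.3K/TB endowment is taken as given rather than derived from the underlying model, GWP is rounded to $80 trillion, and the yearly growth in the endowment uses the same (50-10)=40% approximation as the text.

```python
TB_STORED_2011 = 1.8e9          # 1.8 zettabytes expressed in TB (IDC figure)
ENDOWMENT_PER_TB = 6300.0       # dollars; taken from the text, not derived here
GWP_2011 = 80e12                # dollars; "almost $80 trillion", rounded
S3_PRICE_PER_GB_MONTH = 0.055   # dollars

DATA_GROWTH = 0.50              # yearly growth in data stored
COST_DECREASE = 0.10            # yearly drop in S3 price
GWP_GROWTH = 0.051              # yearly GWP growth

endowment_2011 = TB_STORED_2011 * ENDOWMENT_PER_TB
first_month_bill = TB_STORED_2011 * 1000 * S3_PRICE_PER_GB_MONTH


def share_of_gwp(year):
    """Fraction of that year's GWP needed to endow that year's data."""
    years_out = year - 2011
    # Additive approximation from the text: 50% more data, 10% cheaper storage.
    endowment = endowment_2011 * (1 + DATA_GROWTH - COST_DECREASE) ** years_out
    gwp = GWP_2011 * (1 + GWP_GROWTH) ** years_out
    return endowment / gwp


for year in (2011, 2012, 2018):
    print(year, f"{share_of_gwp(year):.0%}")
```

Under these assumptions the share grows by a factor of about 1.4/1.051 each year, which is why it passes 100% of GWP within seven years.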
So, we're going to have to throw stuff away. Even if we believe keeping stuff is really cheap, it's still too expensive. The bad news is that deciding what to keep and what to throw away isn't free either. Ignoring the problem incurs the costs of keeping the data; dealing with the problem incurs the costs of deciding what to throw away. We may be in the bad situation of being unable to afford either to keep or to throw away the data we generate. Perhaps we should think more carefully before generating it in the first place. Of course, thought of that kind isn't free either ...
by David. (noreply@blogger.com) at May 14, 2012 10:00 AM
RDFa and HTML5 microdata, are, I think, basically interchangeable.
RDF and microdata both use the same fundamental triple data model. Please note that schema.org is just a specific set of vocabularies that can be used with HTML5 microdata; HTML5 microdata goes beyond this. schema.org is a pretty good microdata tutorial though, if you remember you don’t have to use its vocabularies. Here’s the actual microdata spec. Here’s a good microdata tutorial that pre-dates schema.org and is not schema.org-specific.
You can take pretty much anything that’s RDF, from any vocabularies, and use an RDFa style approach to express (basically) the same semantics in HTML5 microdata instead.
This is a good thing for RDF, because there’s no good way to do RDFa in HTML (or anything but xHTML which is basically an abandoned approach — RDFa needs XML namespaces). You can go from (any) html5 microdata to RDF too — although there are a couple gaps I’ll discuss at the end.
First, let’s show how you’d do RDFa-style RDF semantics expressed in HTML5 microdata. Let’s take the complete example from the RDFa wikipedia article, as it’s small but makes us actually use a pretty complete complement of microdata features. There are in fact a couple weird details I’m not sure about.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
version="XHTML+RDFa 1.0" xml:lang="en">
<head>
<title>John's Home Page</title>
<base href="http://example.org/john-d/" />
<meta property="dc:creator" content="Jonathan Doe" />
<link rel="foaf:primaryTopic" href="http://example.org/john-d/#me" />
</head>
<body about="http://example.org/john-d/#me">
<h1>John's Home Page</h1>
<p>My name is <span property="foaf:nick">John D</span> and I like
<a href="http://www.neubauten.org/" rel="foaf:interest"
xml:lang="de">Einstürzende Neubauten</a>.
</p>
<p>
My <span rel="foaf:interest" resource="urn:ISBN:0752820907">favorite
book is the inspiring <span about="urn:ISBN:0752820907"><cite
property="dc:title">Weaving the Web</cite> by
<span property="dc:creator">Tim Berners-Lee</span></span>
</span>
</p>
</body>
</html>
Here’s the same thing, using the same vocabularies, with HTML5 microdata. (Yes, contrary to some belief, you can mix and match more than one vocabulary in microdata too, although you’ve got to spell out the complete URI for all but one in any given scope.)
<html lang="en">
<head>
<title>John's Home Page</title>
<base href="http://example.org/john-d/" />
<link rel="http://xmlns.com/foaf/0.1/primaryTopic" href="http://example.org/john-d/#me" />
</head>
<body itemscope itemtype="http://purl.org/dc/elements/1.1/" itemid="http://example.org/john-d/#me">
<h1>John's Home Page</h1>
<p>My name is <span itemprop="http://xmlns.com/foaf/0.1/nick">John D</span> and I like
<a href="http://www.neubauten.org/" itemprop="http://xmlns.com/foaf/0.1/interest"
lang="de">Einstürzende Neubauten</a>.
</p>
<p>
<span itemscope itemtype="http://purl.org/dc/elements/1.1/" itemprop="http://xmlns.com/foaf/0.1/interest" itemid="urn:ISBN:0752820907">
My favorite
book is the inspiring <cite
itemprop="title">Weaving the Web</cite> by
<span itemprop="creator">Tim Berners-Lee</span>
</span>
</p>
</body>
</html>
Mismatches and missing semantics
While the fundamental approach is compatible, there are a few mismatches and semantics lost or less clear in html5 microdata. Here are some I feel like noting; there may be others.
- I’m not sure I did the right thing with the <link> in the <head> section: html5 has kind of an odd fork in it between maintaining more or less backwards compatibility with old-style <link> and <meta> (in <head>, using ‘rel’), and microdata style (in <body>, using ‘itemprop’). There are weird things with ‘rel’ only being allowed in certain places and ‘itemprop’ in others; you also are never supposed to have both ‘rel’ and ‘itemprop’. So anyway, I’m not sure what a proper way of expressing a relationship with the document as the subject is, in microdata; I may have done something not right here.
- RDFa takes XML’s namespaces to express vocabularies. RDFa’s namespace+name is analogous to microdata’s itemtype+itemprop. But.
- In microdata, you can do something dear to RDF’s heart, and express the predicate URL as a literal absolute URL – which is what you have to do to mix namespaces/vocabularies, and that’s really just fine. You can also do the equivalent of a namespace (in an itemtype) and a non-URI bare name belonging to that namespace (in an itemprop), but you only get one namespace at a time like this.
- But also, RDFa, via XML, is quite clear that you concatenate a namespace and a bare name to get the complete URI. We used this same convention when putting our RDFa into microdata, which works because itemtypes are always URIs too. But it’s just a convention; microdata isn’t clear about that, and microdata examples often use itemtype URIs that clearly weren’t intended like this. Take schema.org: itemtype="http://schema.org/Book" + itemprop="bookFormat" concatenated == "http://schema.org/BookbookFormat". Um, that’s not quite sensible, not what anyone’s looking for… although it is a legal URI….
- microdata makes it a lot ‘easier’ to use what the RDFistas call ‘blank nodes’: nodes whose ‘subject’ lacks a specified URI. Idiomatic microdata actually generally has a bunch of those, including the top-level one(s). The microdata spec tries to tell you that you can only use an `itemid` for certain vocabularies that establish its use; ideally, I think this would be opened up, and even encouraged. The semantics should be made more clearly compatible with RDF: the itemid is an identifier for the ‘itemscope’d thing, that is, the ‘subject’ URI of any itemprops in that itemscope, and that should be made clear.
- I personally think allowing idiomatic blank nodes is a good thing for microdata, making it more usable, letting people get started with the minimal semantics for their use cases, not making them spend time on metadata design/control they don’t need yet. Even if RDFistas disagree, I suggest they focus on making it easier to avoid blank nodes — more idiomatic, more encouraged by docs, more generally legal — and give up on making it hard or impossible to have blank nodes in html5 microdata.
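To make the itemtype+itemprop concatenation point concrete, here is a small Python sketch. The function `predicate_uri` is hypothetical and encodes only the convention used in this post's example, not anything the microdata spec itself defines.

```python
def predicate_uri(itemtype, itemprop):
    """Build a predicate URI by the namespace + bare-name convention.

    An itemprop that is already an absolute URL or URN is used as-is.
    """
    if "://" in itemprop or itemprop.startswith("urn:"):
        return itemprop
    return itemtype + itemprop


# Works when the itemtype was chosen to act like a namespace...
print(predicate_uri("http://purl.org/dc/elements/1.1/", "title"))
# ...but gives a legal yet meaningless URI for a schema.org-style itemtype.
print(predicate_uri("http://schema.org/Book", "bookFormat"))
# Absolute itemprops pass through untouched.
print(predicate_uri("http://purl.org/dc/elements/1.1/", "http://xmlns.com/foaf/0.1/nick"))
```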
Whither RDF/RDFa
(That’s “whither”, not “wither”. Hopefully).
There are probably other rough spots than the ones I’ve identified. And the ones I mentioned include some tough ones (the itemtype+itemprop==URI issue).
But by and large, HTML5 microdata’s fundamental model is RDF compatible. Hopefully the RDFistas are focused on figuring out how to lessen the impedance mismatches, if necessary by lobbying the html5 working group to make minimized interventions. Hopefully they’re not still stuck on an xhtml/rdfa/why-didn’t-they-do-things-our-way train, because that train isn’t leaving the station. Instead though, they can contribute to sanding off a few rough spots in microdata to make it quite capable of doing what they want (and, if they’re right, everyone else will eventually realize they want too). Work on tools to turn microdata to RDF, too, hopefully.
microdata could actually be a great thing for RDF. If handled correctly, it should be possible to express full RDF semantics in microdata; microdata can be the RDF-in-HTML-markup standard that RDFa wanted to be. (microdata’s designers clearly knew about RDF/RDFa and were influenced by it). It’s also possible to leave a lot of semantics out when writing microdata, but often in ways you could do with RDF/RDFa too, lots of blank nodes, etc.; RDF/RDFa just tries to make it inconvenient and non-idiomatic.
While the RDFistas may be rueing that microdata makes it so easy to not have completely specified triples with no blank nodes everywhere, I think the flip side of this is actually what will allow it to possibly get more uptake, and be an easy start on the road to RDF, if RDF plays its cards right. That’s because you have to think through the complete vocabularies and semantics less: you can get started with just the semantics you need, and not be forced to do more up front metadata design than you need for your identified use cases, or more than you can afford or have the skills to do. That, and some of the immediate use cases in ‘Google will use it!’ of course. But if Google had tried to say they used RDFa (didn’t they once, maybe, sort of?), I don’t think it would have gone anywhere; RDFa is just too overwhelming.
by jrochkind at May 14, 2012 04:59 AM
May 13, 2012
In the past few years, I’ve received numerous emails from the Institute for Cultural Diplomacy announcing upcoming conferences and educational programs. The messages say, “If you do not wish to receive emails from the ICD in the future, please send us an email to info@culturaldiplomacy.org indicating this.” I have, six times, spread out over half a year. It didn’t work. Twice, I cc:ed Mark Donfried, the ICD’s “Director and Founder,” over whose name the emails are written. I never received a response from him, just more spam. I called the ICD’s office, in Germany, and asked to be removed from their list. The woman who answered the phone promised I would be. She lied: the email continued.
This is unethical behavior, inconsistent with the values the ICD supposedly represents. It’s disrespectful, dishonest, and disreputable. I doubt that any of my readers run in ICD circles, but if you do, please think hard about what it says about the ICD as an organization.
by James Grimmelmann (james@grimmelmann.net) at May 13, 2012 06:58 PM
On Friday, the long-awaited decision in the Georgia State e-reserves case (a.k.a. Cambridge University Press v. Becker) dropped. By way of context, the case is a challenge by three academic publishers (Oxford University Press, Cambridge University Press, and Sage Publications) against Georgia State University’s e-reserves policy. The publishers sued in April 2008, in a lawsuit funded by the Association of American Publishers and the Copyright Clearance Center, claiming that the e-reserves policy went far beyond the bounds of fair use. Georgia State, as a state university, invoked the doctrine of sovereign immunity, the practical implication of which is that the publishers can only obtain injunctions against future infringements, not damages for past infringements. Since it also tightened up its e-reserves policy in December 2008, it successfully argued to the court that only the uses made under the new policy should be relevant to any potential injunction.
There was a trial a year ago, and then long silence from the court. Now we know why it was taking so long: the opinion is 350 pages. That number is a little misleading, in that over two thirds of the opinion are dedicated to a highly methodical copyright ownership, infringement, and fair use analysis of seventy-four separate claims of infringement, using standard templates and highly repetitive language. Having now dug through the details, I’d like to offer a few observations.
First, over a third of the claims didn’t even make it to the fair use stage at the heart of the case. In many cases, the publishers were unable to prove to the court’s satisfaction that they owned the copyright in the portions of the books that were copied and uploaded. Sometimes they couldn’t produce a timely registration certificate and there were proof problems with originality; sometimes they couldn’t find a work-made-for-hire agreement or copyright assignment from the authors of individual chapters in edited volumes. The court was unsympathetic: no documented chain of title, no lawsuit. There’s a looming e-rights mess, loosely akin to the robosigning mess around ownership of securitized mortgages: in both cases, the putative owners don’t have all their papers in order. This opinion either recognizes or contributes to the mess, depending on your point of view.
Other claims dropped out before the fair use stage because they were uploaded to the e-reserves system but never downloaded by students. The court dismisses these from the lawsuit as de minimis, explaining that these uses by the University, while technically implicating the copyright owners’ exclusive rights, don’t affect the incentives for authors to create. This puts more teeth in the de minimis doctrine in copyright: it goes beyond the view that de minimis means “not substantially similar.” It also strengthens the argument that “internal use” copies never used to reach an audience that reads them for their content don’t infringe. Think, for example, of the HathiTrust’s archive of scans from Google Books.
(As an aside, the e-reserve logfiles played a key evidentiary role in the case. Specific users were never identified, but if a file had a total hit count of two, it’s unlikely that students actually read it. This stands in contrast to other cases, like American Geophysical, which was tried by sampling: the parties selected a single scientist at random, examined his files looking for photocopies, and treated him as representative of a cohort of 500. Here, the logs permitted an analysis of the copying done for numerous faculty members—presumably all those who assigned any excerpts from any of the plaintiffs’ books.)
When the court did reach fair use, it held across the board that two of the four factors favored Georgia State. The purpose of the use, while not transformative, was nonetheless for highly favored educational purposes by a nonprofit institution. And the nature of the works was consistently informational.
On the third factor, the amount copied, the court repudiated the Classroom Guidelines, calling them “not compatible with the language and intent of § 107.” It noted that the numerical limits in the Guidelines are so stringent that not one of the excerpts at issue in the case would fit within them. It was particularly uninterested in the Guidelines’ position that copying not “be repeated with respect to the same item by the same teacher from term to term,” which the court described as “an impractical, unnecessary limitation.”
Instead, the court fashioned its own quantitative test. For books of nine or fewer chapters, the court set a threshold of 10% of the total page count; for books of ten chapters or more, the threshold was a single complete chapter. (The chapter-based rule creates an odd incentive for publishers to create books with a surfeit of tiny chapters.) Copying of any amount under this threshold, the court held, would be treated as “decidedly small.” In practical terms, this ended up being a one-sided bright-line rule: copying of less than 10% or one chapter always ended in a fair use win for Georgia State.
Finally, the fourth factor, the effect on the market, favored the publishers whenever CCC was offering a digital license for copying the book in question, and favored Georgia State whenever there was “no evidence in the record to show that digital excerpts from this book were available for licensing” as of the date of infringement. In practice, this was another one-sided bright-line rule: no digital license meant an instant win for Georgia State. The court repeatedly emphasized that students would not have bought the assigned books as a substitute for the excerpts posted on the e-reserve system.
This treatment of licensing is likely to have significant implications. On the one hand, it suggests that libraries may have a freer hand to make expanded uses of orphan works, since by definition, no one will be licensing them. And on the other, the court didn’t consider photocopying licenses to be a suitable substitute for digital licenses. This will put significant pressure on publishers to turn on digital licensing.
Only in seven instances did Georgia State use more than 10% or one chapter of a book that was available for digital licensing. When this happened, the court took a more detailed look at the specifics of the book’s licensing market and the portion copied. Generally, this turned on whether the book made significant revenues via licensing: if so, the use was unfair. (In one instance, the court did a “heart of the work” analysis under factor three to find no fair use because the professor had assigned chapters that “essentially sum up the ideas in the book.”)
Thus, the operational bottom line for universities is that it’s likely to be fair use to assign less than 10% of a book, to assign larger portions of a book that is not available for digital licensing, or to assign larger portions of a book that is available for digital licensing but doesn’t make significant revenues through licensing. This third prong is almost never going to be something that professors or librarians can evaluate, so in practice, I expect to see fair-use e-reserves codes that treat under 10% as presumptively okay, and amounts over 10% but less than some ill-defined maximum as presumptively okay if it has been confirmed that a license to make digital copies of excerpts from the book is not available.
The most interesting issue open in the case is the scope of any possible injunction. Given that Georgia State won on sixty-nine out of seventy-four litigated claims, while the publishers won on only five, I expect that any injunction will need to be rather narrow. But given how amenable the court’s proposed limits are to bright-line treatment, it is likely that the publishers will push to write them into the injunction.
My bottom line on the case is that it’s mostly a win for Georgia State and mostly a loss for the publishers. The big winner is CCC. It gains leverage against universities for coursepack and e-reserve copying with a bright-line rule, and it gains leverage against publishers who will be under much more pressure to participate in its full panoply of licenses.
by James Grimmelmann (james@grimmelmann.net) at May 13, 2012 04:25 PM
I hope many of you have seen the excellent and fun WebGL GLSL sandbox at http://glsl.heroku.com/. This live editing of shaders is an excellent learning tool, as it allows you to watch the consequences of any changes you make.
I am constructing some simple OpenGL ES scripts as a learning resource for anyone new to it, in part as a way to help me learn it too. As part of this, I’ve written a little script that gives a similar kind of experience, but on the commandline, with a Raspberry Pi. It overlays the render display over the top of the framebuffer window, and reloads the shader any time you save the script.
You can run this from the terminal (as I do in the video later on) or from within LXDE (X11). (You may wish to replace ‘nano’ with ‘leafpad’ in glsl_sandbox.sh if you are running it in LXDE)
Prerequisites:
* Install pyinotify
$ sudo apt-get install python-pyinotify
* Get the repository:
$ wget https://github.com/benosteen/pyopengles/zipball/master -O pyopengles.zip
$ unzip pyopengles.zip
Inflating benosteen-pyopen....
...
cd into the repository and you are ready to go!
Usage of ‘glsl_sandbox.sh’:
$ bash glsl_sandbox.sh [NAME_OF_SHADER_FILE]
The repository has a few demo shaders that are known to work included for you to try – they are copied from the WebGL sandbox site (http://glsl.heroku.com/)

‘basic.glsl’ – From http://glsl.heroku.com/e#2423.0

‘leds.glsl’ – From http://glsl.heroku.com/e#2450.0

‘raymarch.glsl’ – From http://glsl.heroku.com/e#2171.0
Pass the script any new filename, and it will create a new shader from the template and save it to that location.
This uses the “nano” text editor by default, as that is installed on the reference debian image, but if you look in the script, it is not hard to change to your preference.
(NB Ctrl-O, followed by enter to save the file, and Ctrl-X to quit nano.)
Here is a video of it in action, as it is quite hard to describe in words.
by benosteen at May 13, 2012 02:10 PM
I have written before about some issues relating to RDA and RDF. Today I want to consider some things that should cause us to question the concept of "RDA in RDF."
For many decades we have been using relational databases to store our bibliographic data, bibliographic data that we create and exchange using the MARC format. Doing so was not by any means natural or intuitive because there is nothing about the structure or content of the MARC record that lends itself to being stored and managed in a relational database. The results were often awkward, inefficient, and unsatisfying.
Part of the reason for this is the unitary and flat nature of MARC. In spite of the long history of creating separate authority files, each MARC record is a complete and closed document with no actual connections to data outside of itself. While some database implementations for MARC do create relational tables for headings, the degree to which a MARC record can be separated out into tables is minimal and gains us very little in terms of the functionality of an RDBMS.
The underlying problem, however, is not in the structure of the MARC record but in the content of our catalog records. Moving from the card to a database for our data requires more than adding mark-up coding around the catalog data; to do so successfully requires re-thinking the data in terms of relational database principles. There are two basic principles to relational database design: repetition and combination.
To design for relational databases you look at your data to see what elements will be repeated in many different records. Rather than carrying those data elements in multiple records, you create a separate database table for each repeating element, and you store that element once. For example, if you are creating a database of mailing addresses, you see quickly that elements like state and zip code will appear in multiple records. You therefore create a table of state names and one of zip codes, and perhaps even one that links zip codes to city names. In this way, your database carries the string "Mississippi" only once, and that string is replaced in the records with a database pointer that uses much less internal storage. Ditto the zip code. And if the zip code is associated in a table with a city name, you also only store city names once, and each address record needs only a pointer to the zip code, not a city name. In fact, with a zip code you can get the city and state, and your design might look like:

In this way you have saved a huge amount of storage space. You have also made selection of your records on zip code, city and state much more efficient than if they were stored in every address record, because a search on a zip code, for example, retrieves a single entry in the zip code table, and that entry has database-managed links to the relevant records.
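One way such an address design might be laid out is sketched below using Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (state_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE zip_codes (
        zip TEXT PRIMARY KEY,
        city TEXT,
        state_id INTEGER REFERENCES states(state_id)
    );
    CREATE TABLE addresses (
        address_id INTEGER PRIMARY KEY,
        recipient TEXT,
        street TEXT,
        zip TEXT REFERENCES zip_codes(zip)
    );
""")

# "Mississippi" is stored once; each address carries only a zip code.
conn.execute("INSERT INTO states VALUES (1, 'Mississippi')")
conn.execute("INSERT INTO zip_codes VALUES ('39201', 'Jackson', 1)")
conn.executemany(
    "INSERT INTO addresses (recipient, street, zip) VALUES (?, ?, ?)",
    [("A. Reader", "1 Example St", "39201"),
     ("B. Writer", "2 Sample Ave", "39201")],
)

# City and state are recovered from the zip code by joining the tables.
rows = conn.execute("""
    SELECT a.recipient, z.city, s.name
    FROM addresses a
    JOIN zip_codes z ON z.zip = a.zip
    JOIN states s ON s.state_id = z.state_id
    WHERE s.name = 'Mississippi'
    ORDER BY a.address_id
""").fetchall()
print(rows)
```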
In a database of customer orders that has your inventory information along with customer addresses, you use the tables in your database to search for things like "all customers in Mississippi who have ordered WidgetX in the last six months." Information about your inventory and information about purchases are all in appropriate sets of tables in your database and you can combine the data elements to develop different views of the data.
Where the goal in relational database design is to identify and isolate data elements that are the same, the goal in library cataloging data is exactly the opposite: headings are developed primarily to differentiate at the data creation point rather than allow combination within the database management system. The goal is to have each data point be as unique as possible and to be assigned to as few records as possible. Thus, library cataloging creates headings whose purpose is to distinguish between entries:
Shakespeare, William, 1564-1616. As you like it
Shakespeare, William, 1564-1616. As you like it. 1905
Shakespeare, William, 1564-1616. As you like it. 1911.
Shakespeare, William, 1564-1616. As you like it. 1919.
Shakespeare, William, 1564-1616. As you like it. Czech
Shakespeare, William, 1564-1616. As you like it. French
These headings are counter to the functioning of a database management system. If moved to a database table to facilitate retrieval, they will each point to only one or a very small number of records. This negates the space-saving aspect of database management, and it does not facilitate combination of data elements for retrieval. In the case of headings, the combination of elements is pre-coordinated in the data, rather than post-coordinated in the database retrieval function.
A database approach might break this data into four tables, one each for author, title, date, and language:
In this way one could search for this data by title, by title + author, date + language, or by any other combination of these four data elements. To search the library headings as anything but a single keyworded string, that is to use these headings to perform searches on title or date or language, would be incredibly inefficient. The upshot is that library headings are not "relational" and do not contribute to the functionality that database management systems can provide. Instead, database management systems make use of the separate coded elements, such as date and language, for combinatorial retrieval. Names and titles, because they are text strings and do not have an identified presence in the stored records, must be searched separately rather than being available for relational combination. The results of this type of searching are less than optimal in speed and accuracy.
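To make the contrast with pre-coordinated headings concrete, here is a minimal sketch of the four-table approach, again with Python's sqlite3. The schema, the helper names, and the sample records (including the pairing of dates with languages and the extra "Hamlet" row) are invented for illustration and do not describe real catalog records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors   (id INTEGER PRIMARY KEY, name     TEXT UNIQUE);
    CREATE TABLE titles    (id INTEGER PRIMARY KEY, title    TEXT UNIQUE);
    CREATE TABLE dates     (id INTEGER PRIMARY KEY, year     INTEGER UNIQUE);
    CREATE TABLE languages (id INTEGER PRIMARY KEY, language TEXT UNIQUE);
    CREATE TABLE records (
        id          INTEGER PRIMARY KEY,
        author_id   INTEGER REFERENCES authors(id),
        title_id    INTEGER REFERENCES titles(id),
        date_id     INTEGER REFERENCES dates(id),
        language_id INTEGER REFERENCES languages(id)
    );
""")


def lookup_id(table, column, value):
    """Store each distinct value once and return its row id."""
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = conn.execute(f"SELECT id FROM {table} WHERE {column} = ?", (value,))
    return row.fetchone()[0]


def add_record(author, title, year, language):
    """A record holds only pointers into the four element tables."""
    conn.execute(
        "INSERT INTO records (author_id, title_id, date_id, language_id) "
        "VALUES (?, ?, ?, ?)",
        (
            lookup_id("authors", "name", author),
            lookup_id("titles", "title", title),
            lookup_id("dates", "year", year),
            lookup_id("languages", "language", language),
        ),
    )


# Hypothetical sample records.
SHAKESPEARE = "Shakespeare, William, 1564-1616"
for sample in [
    (SHAKESPEARE, "As you like it", 1905, "English"),
    (SHAKESPEARE, "As you like it", 1911, "Czech"),
    (SHAKESPEARE, "As you like it", 1919, "French"),
    (SHAKESPEARE, "Hamlet", 1911, "French"),
]:
    add_record(*sample)

# Searchable elements mapped to the columns that hold them.
FIELDS = {"author": "a.name", "title": "t.title",
          "year": "d.year", "language": "l.language"}


def search(**criteria):
    """Post-coordinated search on any combination of the four elements."""
    where = " AND ".join(f"{FIELDS[key]} = ?" for key in criteria) or "1 = 1"
    sql = f"""
        SELECT a.name, t.title, d.year, l.language
        FROM records r
        JOIN authors a   ON r.author_id = a.id
        JOIN titles t    ON r.title_id = t.id
        JOIN dates d     ON r.date_id = d.id
        JOIN languages l ON r.language_id = l.id
        WHERE {where}
        ORDER BY r.id
    """
    return conn.execute(sql, tuple(criteria.values())).fetchall()
```

With this layout search(title="As you like it") and search(year=1911, language="French") are both ordinary queries, and the author string is stored once no matter how many records point to it.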
All of this may seem obvious to some of you, so you may be asking yourselves why I bring this up. I bring it up because even though RDA claims to have as its goal the creation of records in a relational design (see scenario one in this JSC document), it continues to instruct catalogers to create pre-coordinated headings like the ones above. Not only will these not be efficient or fruitful in a relational database, this calls into question whether RDA is truly modeled on the principles it claims to embrace. If it is not, we have cause to worry: we cannot move forward with data that does not conform to a modern model.
Note that in this post I have been emphasizing the use of relational database design for the data. The current plans for a new bibliographic framework appear to call for a data model for RDA that is based on semantic web principles. Those principles are yet another significant evolution following on the database model, which is now considered waning technology. Other communities, ones that have been designing for database management requirements for their data for decades, are now looking at ways to transform that data to RDF. It is possible that we can skip the relational database phase of our data development and move directly into a semantic web model. However, to think that data created following RDA instructions, which is not even suitable for a relational database, could be made usable on the semantic web without major modifications is simply wrong. If we create a bibliographic framework that takes RDA as it has been described and ports that, unchanged, to RDF we will create a data model that does not serve us, does not serve our users, and that cannot reasonably interact with other linked data on the web.
What we need is an analysis of our data, not a transformation of it "as is" to a new technology. If we aren't ready to admit that some traditional practices, like headings, are no longer useful or usable in today's technological environment, we cannot have any hope that our data will be relevant in the future.
(p.s. I anticipate that someone will state that headings are needed for alphabetical displays, to "collocate" the records. To that I reply: 1) you can do the same collocation using the data elements, and in fact you could devise multiple collocations by combining the elements in different ways and 2) a linear, alphabetic display is so anachronistic with today's technology, and so seldom used when available, that it is hard to justify the use of human catalogers to create these fields. If you still believe that library records must contain hand-crafted headings, all I can say is: you can believe what you want, but some of us will be exploring other solutions.)
by Karen Coyle (noreply@blogger.com) at May 13, 2012 11:42 AM
After yesterday's blog post, I thought I'd have a go at narrowing down my definition of a "separate search".
If a user enters some search terms, and then uses 2 facets to refine the search before clicking on a result, I was classing that as 3 separate searches — what niggled me overnight was that that approach might inflate the facet use statistics …after all, 30.6% of all searches used at least one facet felt a little high given that I'm forever hearing staff moan that students never use the facets, no?!
For today's blog post, I've removed all searches that didn't lead to a result click. (There's a small caveat that my jQuery code currently doesn't capture a result click for links to the OPAC where the user clicked on the availability message (highlighted in red below) — this is because my jQuery code that captures the result clicks runs once the page has loaded, but before the AJAX'd availability information has been retrieved. When I get some time, I'll see if I can find a way around that.)

So, let's see how much of a difference that makes to yesterday's stats…
- 29.4% of searches used at least one facet to refine the results
- 10.4% of searches were refined using the content type facet (e.g. newspaper articles, book reviews, books/ebooks, journal articles, etc)
- 7.8% of searches were refined to just items with full text available online
- 9.4% of searches were refined by publication date
- 5.6% of searches were refined to just articles from scholarly publications (including peer-review)
- 3.7% of searches were refined using the language facets
- 2.5% of searches were refined using the subject term facets
- 2.1% of searches used a Boolean operator, with nearly all of them being AND
So, that overall figure for the % of searches which used at least one facet hasn't dropped by much from yesterday's figure of 30.6%.
Anyone who follows me on Twitter will know that I like to cheekily mock the importance of Boolean and the data from the last 7 days reveals a few things:
- no-one who used a Boolean NOT in their search clicked on a result
- only 0.07% of searches (that's just 7 searches in every 10,000!) used a Boolean OR, which is arguably the most useful operator to use
- unless you're using a search that includes one of the other Boolean operators, the use of AND is pretty much redundant as it's the default Boolean operator in a search (i.e. the search "dogs AND cats" is the same as "dogs cats")… so why are we telling students to use it in Summon?
After poking a bit of fun at someone for entering a 356 word search query yesterday, I can reveal that the longest search in the last 7 days that resulted in a result click was 98 words (it was a paragraph copied and pasted from a journal article).
I guess the big question here is why the disconnect between the "students don't use facets" mantra and the actual usage data?
Finally, I thought I'd figure out how many results are clicked on after a search…

by Dave Pattern at May 13, 2012 09:54 AM
May 12, 2012
I have an Acer Liquid E phone running Android. All of the photographs it takes are timestamped in the internal EXIF metadata to 8 December 2002. Turns out there's a bug in the camera app: it seems that instead of using "insert correct date here" in libcamera.so somehow the December 2002 date got hardcoded in.
The timestamp on the file itself is correct, however, so I wrote this script to use that to edit the EXIF times. It uses ExifTool, which is probably in your package manager:
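#!/bin/bash
# Fix the EXIF dates on Acer Liquid E photos using each file's own timestamp.
for I in 2002-12-08*jpg
do
    # The file's modification time, which the phone does set correctly
    TIMESTAMP=$(stat --printf "%y" "$I")
    # Strip the bogus date prefix from the file name
    NAME=$(echo "$I" | sed 's/2002-12-08 12.00.00-//')
    NAME="acer-$NAME"
    echo "$I --> $NAME ($TIMESTAMP)"
    # Work on a copy so the original file is left untouched
    cp -p "$I" "$NAME"
    exiftool -P -DateTimeOriginal="$TIMESTAMP" -CreateDate="$TIMESTAMP" "$NAME"
done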
For example, a file named 2002-12-08 12.00.00-460.jpg timestamped 2012-04-10 19:30 would have the DateTimeOriginal and CreateDate EXIF fields corrected to 2012-04-10 19:30 and the file would be renamed to acer-460.jpg. The original file is left untouched.
It worked for me, and it won't delete your files, so use it if it helps. Make sure that whenever you copy files off your Acer phone you use cp -p to preserve the original timestamp. Otherwise your photos will have their internal dates set to today!
by wtd at May 12, 2012 06:26 PM
I discussed the utility of the sub-property relationship in Getting to higher MARC branches, Netting more MARC fruit, and Adding MARC fruit to the cornucopia. Coincidentally, Bob DuCharme posted Simple federated queries with RDF which outlines the same technique and provides additional information on its use for resource discovery. Those posts are somewhat technical, and I tried to lighten up in my presentation Turtle dreaming at the recent Dublin Core Metadata Initiative (DCMI) seminar Five years on. This post is another attempt to demonstrate in a non-technical way (I hope) how useful and powerful the sub-property relationship can be.
A metadata attribute, like ‘title’, that is to be used for linked data in the Semantic Web is usually represented in Resource Description Framework (RDF) as a property. A property can be used as the predicate part of a triple: “Subject – predicate – object”, where ‘Subject’ is what the triple is about (e.g. a resource), ‘predicate’ is the aspect of the subject, and ‘object’ is the value of that aspect for the specified subject. For example:
“This resource – (has) title – ‘Using the sub-property ladder’”
is a single metadata statement in triple format. We can think of this as conforming to the triple template:
“Specified resource – (has) attribute – value”.
Note that prefixing the predicate with ‘has’ turns it into a verbal phrase and renders the statement in (near) natural language.
We can also make meta-metadata statements in triple format. These are ‘data about metadata’ rather than ‘data about data’, and are often referred to as ontological triples to distinguish them from data triples such as the example above. The triple template for one type of meta-metadata statement is:
“Specified RDF element – (has) relationship – Other specified RDF element”
Note that a relationship between metadata elements is also represented in RDF as a property. In particular, ‘sub-property’ is a pre-defined relationship between two RDF property elements, giving the ontological triple:
“Property 1 – (is) sub-property of – Property 2”
Furthermore, such relationships can embed semantic rules that can be processed automatically by software known as ‘semantic reasoners’ or just plain ‘reasoners’. The rule embedded in the sub-property relationship is: If “P1 – (is) sub-property of – P2”, then any data triple using P1 as its predicate can generate another data triple using P2 as its predicate, with the same subject and object. Let’s call this kind of ontological triple a mapping triple, because it effectively maps one property to another.
Suppose we have two attributes ‘title’ and ‘varying form of title’. I can create the mapping triple:
“‘varying form of title’ – (is) sub-property of – ‘title’”.
If we have a data triple:
“This resource – (has) varying form of title – ‘Pat presents cataloguing for beginners’”
then a reasoner will automatically generate the data triple:
“This resource – (has) title – ‘Pat presents cataloguing for beginners’”
In a similar way, we can create the mapping triple:
“‘title statement’ – (is) sub-property of – ‘title’”
and from the data triple:
“This resource – (has) title statement – ‘Cataloguing for beginners’”
generate:
“This resource – (has) title – ‘Cataloguing for beginners’”
So what? Further suppose that the ‘title’ attribute is from the DCMI metadata terms, and the ‘varying form of title’ and ‘title statement’ attributes are from the MARC21 tags 245 and 246. So a MARC21 record for the resource might contain the set of data triples:
“This resource – (has) 245 [title statement] – ‘Cataloguing for beginners’”
“This resource – (has) 246 [varying form of title] – ‘Pat presents cataloguing for beginners’”
A reasoner can generate the set of data triples:
“This resource – (has) [DC] title – ‘Cataloguing for beginners’”
“This resource – (has) [DC] title – ‘Pat presents cataloguing for beginners’”
In other words, we have generated a DC record from a MARC21 record. Or we have generated a title index for the MARC21 record. Or both.
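For readers who like to see the rule spelled out, here is a minimal sketch in Python of what a reasoner does with a sub-property mapping. The property labels (such as "marc21:245 title statement" and "dc:title") are shorthand I have made up for this example, not real property URIs, and a real reasoner does a great deal more than this.

```python
# Mapping triples: sub-property -> super-property (labels are illustrative).
SUB_PROPERTY_OF = {
    "marc21:245 title statement": "dc:title",
    "marc21:246 varying form of title": "dc:title",
}

# Data triples as (subject, predicate, object) tuples.
MARC_TRIPLES = [
    ("This resource", "marc21:245 title statement",
     "Cataloguing for beginners"),
    ("This resource", "marc21:246 varying form of title",
     "Pat presents cataloguing for beginners"),
]


def infer(data_triples, sub_property_of):
    """Apply the sub-property rule once to every data triple.

    If a triple's predicate is a sub-property of some broader property,
    generate a new triple with the same subject and object but with the
    broader property as its predicate.
    """
    inferred = []
    for subject, predicate, obj in data_triples:
        if predicate in sub_property_of:
            inferred.append((subject, sub_property_of[predicate], obj))
    return inferred


dc_triples = infer(MARC_TRIPLES, SUB_PROPERTY_OF)
```

Here dc_triples holds the two DC title triples listed above, and MARC_TRIPLES is left unchanged, so the original, more detailed statements remain available.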
Let’s add an RDA attribute and an ISBD attribute mapping to the mix:
“[RDA] ‘title proper’ – (is) sub-property of – [DC] ‘title’”
“[ISBD] ‘has title proper’ – (is) sub-property of – [DC] ‘title’”
The data triples:
“That resource – (has) [RDA] title proper – ‘Cataloguing for geeks’”
“Another resource – [ISBD] has title proper – ‘Does cataloguing have a future?’”
can generate the corresponding DC triples, and we end up with:
“This resource – (has) [DC] title – ‘Cataloguing for beginners’”
“That resource – (has) [DC] title – ‘Cataloguing for geeks’”
“Another resource – (has) [DC] title – ‘Does cataloguing have a future?’”
“This resource – (has) [DC] title – ‘Pat presents cataloguing for beginners’”
So now we have a title index to metadata from multiple heterogeneous sources. And the beginnings of a set of records in Dublin Core format.
Note that the attribute which is the sub-property must be entirely narrower in its semantics than the related super-property. If we create the mapping triple:
“‘title’ – (is) sub-property of – ‘varying form of title’”
then we generate the data triple:
“This resource – (has) 246 [varying form of title] – ‘Cataloguing for beginners’”
which is incorrect.
As a result, a data triple generated by a sub-property mapping triple is usually ‘dumber’ than the original data triple; detail is lost because the generated triple uses an attribute which is broader in meaning than the original. This ‘dumbing-up’ is necessary to produce interoperable metadata from different schemas – but data is not permanently lost because the original triple is still available for use in other applications. Needless to say, data triples created with broad attributes cannot be “smartened-down”, at least on their own.
The sub-property relationship can be chained. We can create a new attribute property, MARC21 ‘title’, which could be used in an application for making a title index to MARC21 records, as already mentioned. This new attribute is a super-property of all the MARC21 title-type attributes, and is also a sub-property of the DC ‘title’ attribute:
“[MARC21] ‘title statement’ – (is) sub-property of – [MARC21] ‘title’”
“[MARC21] ‘varying form of title’ – (is) sub-property of – [MARC21] ‘title’”
“[MARC21] ‘title’ – (is) sub-property of – [DC] ‘title’”
Doing this does not affect the previous mapping triples relating each MARC21 title-type attribute directly to the DC ‘title’ attribute, although it makes them redundant because this new set of mapping triples generates exactly the same data triples at the DC level from the MARC21 originals.
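A chain like this can be followed mechanically. The sketch below, in Python, walks up the ladder one rung at a time. It assumes, purely to keep the example short, that each property has at most one direct super-property, which RDF itself does not require. The property labels are again invented shorthand.

```python
# Chained mapping triples: each key is a sub-property of its value.
SUB_PROPERTY_OF = {
    "marc21:title statement": "marc21:title",
    "marc21:varying form of title": "marc21:title",
    "marc21:title": "dc:title",
}


def super_properties(prop, mapping):
    """Return every broader property reachable from prop, nearest first."""
    found = []
    # Stop when the ladder ends, or if a badly formed map loops back.
    while prop in mapping and mapping[prop] not in found:
        prop = mapping[prop]
        found.append(prop)
    return found


def dumb_up(triple, mapping):
    """Generate one broader data triple for each rung above the predicate."""
    subject, predicate, obj = triple
    return [(subject, broader, obj)
            for broader in super_properties(predicate, mapping)]


generated = dumb_up(
    ("This resource", "marc21:title statement", "Cataloguing for beginners"),
    SUB_PROPERTY_OF,
)
```

The single MARC21 triple yields one triple at the MARC21 ‘title’ level and one at the DC ‘title’ level, which is why the direct MARC21-to-DC mappings become redundant once the chain exists.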
Different applications can therefore re-use and, if necessary, augment the sub-property chains for each of the high-level core attributes found in most bibliographic metadata schemas, such as title, author/creator/agent, subject, target audience, etc. The chains form a net(work) of mappings, or map, which can automatically dumb-up triples from any level of semantic granularity to any higher level.
We should only have to publish such maps or part-maps once, openly so that anyone can use them and add to them. If the professional communities develop the maps first, much effort will be saved and much authority imparted. This requires collaboration and action real soon now – the ISBD Review Group and the Joint Steering Committee for Development of RDA have started with the development of a mapping between the ISBD and RDA element sets.
These maps should remain valid forever, so the effort is worth expending. The original data triples use the original properties based on the schema attributes at the time and they will be valid “for their time”, in the same way that many catalogues are likely to contain records created under the Anglo-American Cataloguing Rules, with its ‘general material designation’ attribute, long after the successor standard RDA: resource description and access has been adopted with its ‘content type’ and ‘carrier type’ attributes.
And mappings from the MARC21 element sets will show, we hope, that it may not be necessary to convert the entire contents of every MARC21 record as a result of the Bibliographic Framework Transition Initiative!
But the professional communities lack a framework to help them collaborate as a super-community. A network of mappings is more (socially) efficient than an aggregation of one-to-one mappings between pairs of schemas. We need (name)spaces to add intermediary attribute properties and publish the mappings; we need protocols for managing semantic change as schemas evolve; we need lightweight protocols for authorizing mappings; we need systems for ensuring the long-term preservation of RDF element sets and mapping triples.
by Gordon Dunsire at May 12, 2012 06:17 PM
[ update: slightly revised stats are available here! ]
We've just started collecting in-depth data about how students are searching Summon (keywords entered, facets selected, etc) and I thought some of you might be interested in an early analysis from the last 7 days (just under 40,000 separate searches by 2,807 students)…
- On average, students used 4.5 keywords per search (the mode is 3 keywords and the majority of searches used 3 keywords or less — view graph) [1]
- 30.6% of searches used at least one facet to refine the results [2]
- 11.7% of searches were refined using the content type facet (e.g. newspaper articles, book reviews, books/ebooks, journal articles, etc)
- 9.5% of searches were refined to just items with full text available online
- 9.2% of searches were refined by publication date [3]
- 7.2% of searches were refined to just articles from scholarly publications (including peer-review)
- 3.4% of searches were refined using the language facets [4]
- 2.6% of searches were refined using the subject term facets
- 2.3% of searches used a Boolean operator, with AND being by far the most common (2.23% of searches) [5]
notes:
[1] – One student copied & pasted the following 356 word title & abstract into the search box!
Peter J. Shaw, David J. Rawlins Article first published online 2 AUG 2011 DOI:10.1111/j.1365-2818.1991.tb03168.x 1991 Blackwell Science Ltd Issue Journal of Microscopy Volume 163, Issue 2, pages 151–165, August 1991 Additional Information(Show All) How to CiteAuthor InformationPublication History SEARCH Search Scope Search String Advanced >Saved Searches > ARTICLE TOOLS Get PDF (1119K) Save to My Profile E-mail Link to this Article Export Citation for this Article Get Citation Alerts Request Permissions Share Abstract References Cited By Get PDF (1119K) Keywords Confocal microscopy;three-dimensional fluorescence microscopy;point-spread function;deconvolution;computer image processing SUMMARY We have measured the point-spread function (PSF) for an MRC-500 confocal scanning laser microscope using subresolution fluorescent beads. PSFs were measured for two lenses of high numerical aperture—the Zeiss plan-neofluar 63 × water immersion and Leitz plan-apo 63 × oil immersion—at three different sizes of the confocal detector aperture. The measured PSFs are fairly symmetrical, both radially and axially. In particular there is considerably less axial asymmetry than has been demonstrated in measurements of conventional (non-confocal) PSFs. Measurements of the peak width at half-maximum peak height for the minimum detector aperture gave approximately 0·23 and 0·8 μm for the radial and axial resolution respectively (4·6 and 15·9 in dimensionless optical units). This increased to 0·38 and 1·5 μm (7·5 and 29·8 in dimensionless units) for the largest detector aperture examined. The resulting optical transfer functions (OTFs) were used in an iterative, constrained deconvolution procedure to process three-dimensional confocal data sets from a biological specimen—pea root cells labelled in situ with a fluorescent probe to ribosomal genes. The deconvolution significantly improved the clarity and contrast of the data. 
Furthermore, the loss in resolution produced by increasing the size of the detector aperture could be restored by the deconvolution procedure. Therefore for many biological specimens which are only weakly fluorescent it may be preferable to open the detector aperture to increase the strength of the detected signal, and thus the signal-to-noise ratio, and then to restore the resolution by deconvolution. Get PDF (1119K) More content like thisFind more content like this article Find more content written by Peter J. ShawDavid J. RawlinsAll Authors ABOUT USHELPCONTACT USA
…sadly, Summon failed to find a result for that as we don't subscribe to the article!
[2] – Normally, you search Summon by entering your keywords then, after the results appear, you select facets to refine your search and each facet selection invokes a new search. So, if you run a search and then select 2 facets, that will be logged as 3 separate searches (1 without any facets, and 2 with).
[3] – Mostly, the publication date facet is being used to limit the search to the X most recent years.
[4] – The vast majority of the content in our Summon instance is in English and, apart from one search that refined the results to just Italian, every use of the language facet was to refine the results to English only.
[5] – Boolean operators have to be entered in UPPER CASE in Summon, with an invisible AND being implicit in any multi keyword search that doesn't include Boolean. Looking at the search queries that included a Boolean operator, 6% were entered entirely in upper case, implying that the user wasn't consciously invoking a Boolean search.
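The counting rule in note [2] can be sketched in a few lines of Python. The log entry layout and the facet names here are hypothetical (this is not how Summon stores its logs), and I'm assuming each new facet is added on top of the ones already selected.

```python
def log_entries(keywords, facets_selected):
    """One log entry per search run: the initial keyword search,
    then one more each time a facet is selected."""
    entries = [{"keywords": keywords, "facets": []}]
    for count in range(1, len(facets_selected) + 1):
        entries.append({"keywords": keywords,
                        "facets": list(facets_selected[:count])})
    return entries


def share_with_facets(entries):
    """Fraction of logged searches that used at least one facet."""
    with_facets = sum(1 for entry in entries if entry["facets"])
    return with_facets / len(entries)


# One user action sequence: a search followed by two facet selections.
entries = log_entries("dogs cats", ["Full text online", "Journal article"])
```

That single sequence produces three logged searches, two of which count as using a facet, so a per-search percentage will always run higher than a per-session one.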
by Dave Pattern at May 12, 2012 01:24 PM
May 11, 2012
Equinox Software, Inc. is excited to announce the development of link checker functionality in Evergreen. Evergreen currently has no built-in mechanism for verifying the validity of URLs stored in MARC records. The ability to verify URLs will be of particular benefit to locations with large electronic resource collections. The requirements for this project are being developed in partnership with NRCan Library and Statistics Canada Library. The technical specifications for this project will be shared with the Evergreen Community once they are ready. Equinox developers estimate that coding will be completed no later than the end of the third quarter of 2012.
Once the coding is finished, the code will be submitted to launchpad, where another developer will need to review and approve it. Once it has been signed off on by another developer, it can be included in the next major release of Evergreen. End user documentation will also be made available to the Evergreen Community. For additional information, contact George Duimovich, NRCan Library, or Suzannah Lipscomb, Equinox Software.
by slipscomb at May 11, 2012 07:08 PM
I'd love to have a law named after me, so here goes:
Dave's Law
users should not have to become mini-librarians in order to use the library
If you ever find yourself needing to invoke Dave's Law, please let me know
by Dave Pattern at May 11, 2012 11:52 AM
Following the fun we had at March’s Meet-up ‘launch’, we will be having another gathering of people interested in open data next Wednesday 16th May. Hosted by the Wash Bar, Edinburgh, from 19.00, come and join us to discuss ideas, projects and plans in relation to openness.
Lightning Talks will include Federico Sangati on crowdsourcing and education, ahead of his presentation at Dev8ed later this month, and a sneak preview of the hackathon that Open Biblio will be running 12-14th June in collaboration with OKFN’s Open GLAM and Cultural Heritage Working Group and DevCSI.
If you would like to give a lightning talk (informal 2-3 minute presentations) about anything related to open data or knowledge, contact naomi.lillie [@] okfn.org.
Sign up here and we’ll see you there!
For this and other events in Edinburgh and the rest of Scotland, sign up here.
by Naomi Lillie at May 11, 2012 08:30 AM
This is the fifth and last in a series about using R to look at reference desk statistics recorded in LibStats. Previously:
I've been making some other charts showing other kinds of ratios and calculations but I'm going to skip to one last pair of charts where I bring in the number of our students to figure out how many students we help with research each week and for how long.
First, a brief review of the four branches of the York University Libraries system we're looking at:
- Scott is arts, humanities and social sciences, and the building includes the map library, the archives, and music/film library
- Bronfman is business
- Frost is on the Glendon campus in another part of the city and handles all of the students there
- Steacie is science, engineering and health
(Osgoode is law but they don't use LibStats so we'll forget about them for now.)
I calculated how many "home students" each library has. Bronfman handles everyone in the business school and in the administrative studies program in another faculty. Steacie handles everyone in the science and health faculties (except psychology, which is handled at Scott). Frost handles everyone at Glendon. Scott handles everyone else. The York University Factbook let me look up how many students were in each faculty, and I did a bit of adding and subtracting and figured out:
- Scott has 34,388 "home students"
- Bronfman has 6,050
- Frost has 2,677
- Steacie has 10,018
That's 53,133 students total, as of last fall. (We have about 43 librarians and archivists, for a ratio of 1235 students to each librarian, which is one of the worst in Canada.)
You can figure out something very similar for your library, probably.
With those numbers, we're all set for some more work in R.
First, I make a libstats.bigscott data frame, which gloms together all of the reference desk activities that happen in the Scott Library building (which as I said contains three smaller libraries) into one. This is necessary to group together all possible arts/humanities/social sciences questions. These lines below rename certain library.name fields by saying, for example for SMIL, for every entry in this data frame where library.name equals "SMIL", make library.name equal "Scott". Nice example of vector thinking in R.
> libstats.bigscott <- libstats
> libstats.bigscott$library.name[libstats.bigscott$library.name == "SMIL"] <- "Scott"
> libstats.bigscott$library.name[libstats.bigscott$library.name == "ASC"] <- "Scott"
> libstats.bigscott$library.name[libstats.bigscott$library.name == "Maps"] <- "Scott"
> libstats.bigscott$week <- as.Date(cut(as.Date(libstats.bigscott$timestamp, format="%m/%d/%Y %r"), "week", start.on.monday=TRUE))
Next, use our old friend ddply to count how many research questions are asked each week.
> research.users <- ddply(subset(libstats.bigscott,
question.type %in% c("4. Strategy-Based", "5. Specialized")),
.(library.name, week), nrow)
> names(research.users)[3] <- "users"
> research.users$user.ratio <- NA
> head(research.users)
  library.name       week users user.ratio
1     Bronfman 2011-01-31    48         NA
2     Bronfman 2011-02-07    80         NA
3     Bronfman 2011-02-14    42         NA
4     Bronfman 2011-02-21    61         NA
5     Bronfman 2011-02-28    53         NA
6     Bronfman 2011-03-07    59         NA
Now, another probably heinous non-R way of dividing the number of users (or, actually, questions) each week by the number of "home students":
> for (i in 1:nrow(research.users)) {
if (research.users[i,1] == "Bronfman" ) { research.users[i,4] = research.users[i,3] / 6050 }
if (research.users[i,1] == "Frost" ) { research.users[i,4] = research.users[i,3] / 2677 }
if (research.users[i,1] == "Scott" ) { research.users[i,4] = research.users[i,3] / 34388 }
if (research.users[i,1] == "Steacie" ) { research.users[i,4] = research.users[i,3] / 10018 }
}
> head(research.users)
  library.name       week users  user.ratio
1     Bronfman 2011-01-31    48 0.007933884
2     Bronfman 2011-02-07    80 0.013223140
3     Bronfman 2011-02-14    42 0.006942149
4     Bronfman 2011-02-21    61 0.010082645
5     Bronfman 2011-02-28    53 0.008760331
6     Bronfman 2011-03-07    59 0.009752066
user.ratio there is what we're after. It looks low, doesn't it? Multiply it by 100 to get a percentage. It's still low.
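The same per-branch division can also be written as a lookup table rather than a chain of if statements. Here is a rough sketch of that idea in Python (not R), using the home student numbers from above. The function and variable names are my own.

```python
# "Home students" per branch, from the figures listed earlier.
HOME_STUDENTS = {
    "Scott": 34388,
    "Bronfman": 6050,
    "Frost": 2677,
    "Steacie": 10018,
}


def user_ratio(library_name, users):
    """Research questions in a week divided by that branch's home students."""
    return users / HOME_STUDENTS[library_name]


# The first two Bronfman weeks from the output above.
weekly = [("Bronfman", "2011-01-31", 48), ("Bronfman", "2011-02-07", 80)]

# Multiply by 100 to express each ratio as a percentage.
percentages = [(branch, week, 100 * user_ratio(branch, users))
               for branch, week, users in weekly]
```

The lookup keeps the student counts in one place, so adding a branch or updating a number means changing one line rather than one if statement.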

The y-axis is per cent, so this shows that usually through term time we give research help to under 1% of our students. There are a few weeks in some branches where it gets above that, but it's never above 1.5%.
That really surprised me. I have no idea what the numbers are like at other universities. If you figure it out for where you work, let me know. Perhaps one per cent is a common figure? Could it be five per cent at some universities? It would have to be a small university, I think, or have a lot of librarians.
Now that we know how many students we help with research, I wondered how long we spend helping them. More calculations in R, using ref.desk.spent, the function I defined in the last post to add up an estimate of how much time is spent at the desk. Here we break it down by branch by week, create a research.time.bigscott data frame, which I then merge with research.users so I can divide to create the research.mins.ratio which is what I'm after:
> research.time.bigscott <- data.frame(library.name = factor(), week = factor(), research.mins = numeric())
> branches <- c("Scott", "Frost", "Bronfman", "Steacie")
> for (i in 1:length(branches)) {
branchname <- branches[i]
for (j in 1:length(weeks)) {
spent <- desk.time.spent(ddply(subset(libstats.bigscott,
library.name == branchname & week==weeks[j] &
question.type %in% c("4. Strategy-Based", "5. Specialized")),
.(time.spent), nrow))
rbind(research.time.bigscott,
data.frame(library.name = branchname, week = weeks[j], research.mins = spent)) -> research.time.bigscott
}
}
> research.users$week <- as.factor(research.users$week) # Necessary for merge to work cleanly
> research.time.bigscott <- merge(research.time.bigscott, research.users, by=c("library.name", "week"))
> research.time.bigscott$research.mins.ratio <- research.time.bigscott$research.mins / research.time.bigscott$users
> head(research.time.bigscott)
library.name week research.mins users user.ratio research.mins.ratio
1 Bronfman 2011-01-31 758 48 0.007933884 15.79167
2 Bronfman 2011-02-07 1340 80 0.013223140 16.75000
3 Bronfman 2011-02-14 595 42 0.006942149 14.16667
4 Bronfman 2011-02-21 997 61 0.010082645 16.34426
5 Bronfman 2011-02-28 775 53 0.008760331 14.62264
6 Bronfman 2011-03-07 901 59 0.009752066 15.27119
> xyplot(research.mins.ratio ~ as.Date(week) | library.name, data = research.time.bigscott,
type = "h",
ylab = "Length of average research interaction (minutes)",
xlab = "Week",
main = "Average length of research interactions (Scott includes ASC/Maps/SMIL)",
sub = paste("From Feb 2011 to", up.to.week),
abline=list(h=15, lty=3, col="lightgrey")
)
In this xyplot command I throw in an extra abline to draw a dotted light grey line (lty=3) along y=15, to help point out that we generally spend about fifteen minutes on each research interaction.

The Steacie library stands out from the others, and there are some peaks here and there, but overall we spend on average about fifteen minutes on each research interaction with students.
Put those two charts together and it shows that during term time we spend on average about fifteen minutes a week giving research help to each of under one per cent of our students.
by wtd at May 11, 2012 05:44 AM
May 10, 2012
(All of this is accurate, so far as I know, in Rails 3.2.3. If you are reading this later in future Rails versions, mileage may vary).
Rails 3 introduced plugins-as-gems, and the special case of Engines. An Engine is basically a library of code that can define its own views, controllers, models, assets, etc., in its own codebase, all of which will be available to the Rails app. (An Engine doesn’t actually need to be defined in its own gem; it can be defined anywhere that ends up in the load path, but its own gem is typical.) You can have Rails generate a skeleton for an Engine plugin-as-gem with `rails plugin new enginename --full`. (Without the `--full`, it’d be a less powerful plugin without full Engine features; actually it ends up being pretty much just an ordinary gem.)
A “plain” engine (as opposed to the ‘isolated’ engine we’ll discuss later) basically “inserts” controllers, views, and models into the host app: they’re added to the load paths as part of the host app, the same as any locally defined controller, view or model.
Additionally, routes defined in your $engine/config/routes.rb will be automatically included in the host app. I’m not sure if they’ll be included before or after host app routes; route definition order matters in Rails3 routing.
Name collisions?
If there’s a name collision, the thing with the same name in the host app will usually ‘win’, and the one in the gem will be inaccessible to most code (in gem or in host app). If there’s a name collision between two gems, it probably depends on load order (what order they’re referenced in the Gemfile, usually).
This is pretty much what you’d expect to happen, so long as the host app version really wins, I think it’s “right”. (With helpers specifically, things can sometimes get confusing and not behave how you expect. I now can’t find the message I think I sent to the rails-user listserv on this at some point, and maybe it’s been changed/fixed in recent versions of rails.)
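As a rough analogy for why load order decides the winner, here is a plain-Ruby sketch using `require` and `$LOAD_PATH`. It does not use Rails, whose autoloading has its own machinery; the directory names, file name and constant are all hypothetical and exist only for illustration.

```ruby
# Plain-Ruby analogy (no Rails): when two files share a name, the directory
# that appears first in the load path wins. All names here are hypothetical.
require "tmpdir"

Dir.mktmpdir do |root|
  host_dir   = File.join(root, "host_app")
  engine_dir = File.join(root, "engine")
  [host_dir, engine_dir].each { |dir| Dir.mkdir(dir) }

  # Both locations define the same file name and the same constant.
  File.write(File.join(host_dir, "collision_demo_widget.rb"),
             "COLLISION_DEMO_SOURCE = 'host app'\n")
  File.write(File.join(engine_dir, "collision_demo_widget.rb"),
             "COLLISION_DEMO_SOURCE = 'engine'\n")

  # Put the host app directory ahead of the engine directory.
  $LOAD_PATH.unshift(engine_dir)
  $LOAD_PATH.unshift(host_dir)

  # Only the first match is loaded; the engine's copy is never read.
  require "collision_demo_widget"
  puts COLLISION_DEMO_SOURCE   # prints "host app"
end
```

Swapping the two `unshift` lines makes the engine's copy win instead, which is the same kind of ordering sensitivity you get between two gems.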
You can put your models, views, and controllers in module namespaces exactly the same as you would if you were adding them to any Rails app, in order to try to prevent name collisions. They’ll work exactly the same way: the point of an Engine is that the stuff in an engine is in the host app’s load paths just the same as if it were really in the host app’s source locations.
Avoiding routing name collisions can be handled the same way, in a ‘plain’ engine, using the Rails3 router :namespace function, or any of the other related router functions (:as, :module, :path, etc.)
Some Engines handle routing by not including routes in $engine/config/routes.rb, where they’ll be automatically loaded by Rails, but instead loading routes into the host app using their own logic, so it can be done just so. This is especially useful for routes that should be changed by host app configuration. One example is Devise and its `devise_for` method, which the host app calls manually in its own routes.rb.
Isolated Engines: Rails 3.1
Rails 3.1 introduced the “isolate_namespace” directive, which you can add to your engine module.
The one main effect this has is actually on routing. $engine/config/routes.rb are not added to the host app’s routing. Instead, Rails creates a little Rack mini-app out of your engine (or maybe any Engine already is this?), with your engine’s routing in it, so that host app can mount the Engine into the host app’s own routing, using the standard Rails routing ‘mount’ directive for Rack apps. See the Engines guide (or the edgeguide version, with slightly expanded information).
It also makes the engine’s $engine/config/routes.rb behave a bit differently as far as default routing params go: it assumes all routes are :namespace’d, makes sure the routing helper methods are available to your Engine’s controllers and helpers (and at the right method names), etc.
On top of this, it changes how rails generators work inside your engine. You can use rails generators inside an engine to add controllers and models. In a ‘plain’ engine, if you call `rails generate controller foo`, it’ll add an `$engine/app/controllers/foo_controller.rb`, just like any rails app. It’ll add an `$engine/app/views/foo` directory and an `$engine/app/helpers/foo_helper.rb`. Just like an app.
In an Engine with `isolate_namespace`, if you call `rails generate controller foo`, it’ll namespace everything it generates for you: `$engine/app/controllers/$enginename/foo_controller.rb` will contain a controller whose class is EngineName::FooController. Similarly, the view folder goes in `$engine/app/views/$enginename/foo`, etc.
Isolated engines are convenient for many cases. You can have Rails generate a new skeleton for an isolated engine with `rails plugin new enginename --full --mountable`
There’s one aspect of them, though, that you may or may not want — and is fortunately pretty easy to change, giving you what I’ll call a Semi-Isolated Engine.
More Isolation Than you Might Want: Controller inheritance
There’s one aspect of isolated engines that ends up being a bit confusing — It’s actually not caused by the `isolate_namespace` directive in the Engine, but purely by the Rails generators — in fact, purely by the `--mountable` arg to `rails plugin new engine_name --full --mountable`.
Let’s look at how controller inheritance works.
If you use `rails generate controller` to generate a controller in your engine and look at it, you’ll see that it’s defined as `< ApplicationController`, inheriting from the class called ApplicationController, just like a controller in a normal app. But your engine gem doesn’t have an ApplicationController (at least it ought not to, at least not a top-level-namespace ::ApplicationController), so what’s it inheriting from? Well, it’s inheriting from the ApplicationController in whatever host app it happens to be running in.
This means common logic in the host app’s ApplicationController is available to engine controllers. (Say, a current_user? method; the engine would obviously need to document its conventions.) It also means all the helper methods loaded into the host app in a way that they apply to all controllers will be available to engine controllers/views. It also means that, by default, the default rails template layout for controllers in the engine is the host app’s `application` layout, or any other default layout specified in the host app’s ApplicationController.
Sometimes that’s all actually nice, but sometimes you want more isolation. If you generate an engine with `rails plugin new x --full --mountable`, you get it. But how you get it is a bit confusing at first.
mountable/isolated generation of Engine::ApplicationController
If you generate a `mountable` (i.e., isolated) engine, and then you use `rails g controller` to generate a controller, you’ll see it’s still defined as `< ApplicationController`. And yet it doesn’t actually inherit the behavior of the host app ApplicationController: it’s got no logic from the host app ApplicationController, no helpers, won’t find its layout, etc.
What’s going on? It’s a different ApplicationController. When you generate an engine with `rails plugin new enginename --full --mountable`, it generates an EngineName::ApplicationController at $engine/app/controllers/$engine_name/application_controller.rb.
Because of the way Rails constant lookup works, it’s finding this ApplicationController.
And it generated a layout in your engine too, at $engine/app/views/layouts/$engine_name/application.html.erb.
That’s the layout used by all your engine controllers by default, too.
multiple ApplicationController’s, really?
While this level of isolation is perhaps useful for many (most?) Engines, I question the decision to ‘override’ the ApplicationController class name and count on ruby constant-lookup in namespaces to get to the right one. ruby namespaced constant lookup is notoriously confusing, and changes from ruby version to version not always in documented ways. I think it’s just asking for developer confusion and bugs.
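The constant lookup at issue can be reproduced in plain Ruby, with no Rails involved. This is a minimal sketch: the `EngineName` module stands in for a mountable engine, and the `origin` methods are hypothetical, added only so the two classes can be told apart.

```ruby
# Plain-Ruby sketch (no Rails): two classes named ApplicationController.
# EngineName and the `origin` methods are hypothetical, for illustration only.

# Stands in for the host app's top-level controller.
class ApplicationController
  def origin
    "host app"
  end
end

module EngineName
  # Stands in for the controller that --mountable generates.
  class ApplicationController
    def origin
      "engine"
    end
  end

  # Inside `module EngineName`, the bare name resolves lexically to
  # EngineName::ApplicationController, not ::ApplicationController.
  class FooController < ApplicationController
  end

  # A leading :: forces the top-level constant instead.
  class BarController < ::ApplicationController
  end
end

puts EngineName::FooController.new.origin   # prints "engine"
puts EngineName::BarController.new.origin   # prints "host app"
```

The two controllers differ only by the leading `::`, which is easy to miss when reading the code, and is the kind of confusion I'm worried about.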
Fortunately, it’s only a feature of the Rails generators (both `rails plugin new` and `rails generate controller` within an isolate_namespace engine). It’s got nothing to do with actual rails runtime logic.
If you want to do it differently, no problem. Go change $engine/app/controllers/$engine_name/application_controller.rb to, say, engine_name_controller.rb instead (renaming the class to EngineNameController to match), and the layout to engine_name.html.erb. All of your engine controllers should now be `< EngineNameController` instead of `< ApplicationController`.
You’ve got the exact same behavior, just with less confusing and error prone names.
Sadly, `rails g controller` in an isolate_namespace engine will still generate `< ApplicationController`; you’ll have to manually change it each time you use the generator.
Now, for the Semi-Isolated Engine
Okay, now we can get to the actual point. While isolating controllers like this can be useful sometimes, sometimes it’s not. You might still want the routing isolation that “isolate_namespace” gives you, and the convenient change in behavior of the rails generators under that condition.
But you do want your engine controllers to inherit from the host app ApplicationController. No problem! Just change that engine ‘main’ controller to “< ApplicationController”. You could do that even without the name change we discussed above, by properly scoping to top-level namespace, but that would lead to the confusing (but correct!) EngineName::ApplicationController < ::ApplicationController.
Less confusing if we changed the name as recommended above, say if your engine is the Widgetizer, Widgetizer::WidgetizerController < ApplicationController.
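Here is a plain-Ruby sketch of that inheritance chain, again without Rails. The `current_user` and `engine_label` methods and the `GadgetsController` class are hypothetical; in a real engine the top-level class would be the host app's Rails controller.

```ruby
# Plain-Ruby sketch (no Rails) of the semi-isolated inheritance chain.
# current_user, engine_label and GadgetsController are hypothetical names.

# Stands in for the host app's ApplicationController.
class ApplicationController
  def current_user
    "alice"
  end
end

module Widgetizer
  # Engine 'main' controller, renamed so nothing in the engine shadows the
  # host class. The bare ApplicationController now falls through to the
  # top-level constant.
  class WidgetizerController < ApplicationController
    # Logic shared by engine controllers only lives here.
    def engine_label
      "widgetizer"
    end
  end

  class GadgetsController < WidgetizerController
  end
end

controller = Widgetizer::GadgetsController.new
puts controller.current_user   # inherited from the host app class
puts controller.engine_label   # inherited from the engine's main controller
p Widgetizer::GadgetsController.ancestors.take(3)
```

Because there is no `Widgetizer::ApplicationController` in this version, the bare constant can only mean the host app's class, so no `::` prefix is needed.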
Now,
- any logic in the host app ApplicationController is available in engine controllers.
- Your engine controllers are by default using your engine’s ‘main’ layout instead of the host app’s. Just delete the engine layout and they’re by default using the host app’s, that’s it! (Delete $engine/app/views/layouts/widgetizer/application.html.erb, or $engine/app/views/layouts/widgetizer/widgetizer.html.erb if you changed the names as recommended.)
- If you have logic which you do want available to all engine controllers but which shouldn’t be in the host app, just add it to your intermediary engine main controller, right? Because SomeEngineController < EngineController < ApplicationController. (With Rails 3.2+ hierarchical view lookup, all views can be looked up through this chain, not just layouts.)
- Because of isolate_namespace, the host app is still not automatically given the helpers in the engine — great! (If you want to manually expose engine helpers in the host app, see advice in the Engine Guide).
- Helpers in the engine are a bit more confusing. Since engine controllers subclass the host app ApplicationController, helpers from the host app are available in engine controllers. In some cases this is useful; in most others it probably won’t cause a problem.
- If there is a name collision between helper methods in host app and engine, when called from within an engine controller, the engine helper method ‘wins’. Which is great. (The engine helper can even call ‘super’ to get access to the host app version, although there are few cases where an engine helper could rely on ‘super’ existing.) However, this is reliant on details of how and in what order Rails includes helper modules into controllers, something that’s changed in past rails versions. I’d be a bit cautious of relying on this continuing to work, sadly.
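The helper collision behaviour comes down to ordinary Ruby module inclusion order, which this plain-Ruby sketch shows. It assumes the engine's helper module is included after the host app's, which is exactly the ordering the last point depends on; the module, class and method names are hypothetical.

```ruby
# Plain-Ruby sketch (no Rails) of a helper-method name collision.
# Module, class and method names are hypothetical. It assumes the engine's
# helper module is included after the host app's helper module.

module HostAppHelper
  def page_title
    "Host App"
  end
end

module WidgetizerHelper
  def page_title
    # super reaches the host app's version, if one exists.
    "Widgetizer | #{super}"
  end
end

class ViewContext
  include HostAppHelper
  include WidgetizerHelper   # included last, so searched first
end

puts ViewContext.new.page_title   # prints "Widgetizer | Host App"
```

If the two `include` lines were swapped, the host app's method would win and the engine's version would never be called, which is why relying on this ordering is fragile.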
So there you have it, the “Semi-Isolated Rails Engine”, a design that works well for me for certain kinds of engines. It’s a testament to Rails 3.x’s nice, clean, flexible, consistent, well-designed architecture that we don’t need to fight with Rails’ actual runtime logic at all to do this. We don’t even need to change it; we just need to make different choices than the Rails engine generators make. If someone wanted to, they could even make their own generators that behaved this way for a ‘semi-isolated rails engine’.
Filed under:
General
by jrochkind at May 10, 2012 08:17 PM
Bibliomation, Inc., Connecticut’s largest library consortium, is sponsoring the integration of Syndetic Solutions by Bowker with Template Toolkit OPAC (TPAC) in Evergreen. Equinox developers will be writing the code for this project. TPAC will be able to support cover images, reviews, summaries, table of contents, excerpts, and author notes from Syndetic Solutions. Once the code is written, it will be submitted on launchpad, where another developer will need to review and approve it. Once the code is signed off on by another developer, then it can be submitted for inclusion in the next major release of Evergreen. For more information, contact Amy Terlaga at Bibliomation / terlaga@biblio.org or Suzannah Lipscomb at Equinox / slipscomb@esilibrary.com
by slipscomb at May 10, 2012 05:26 PM
How do we measure social progress? Academics and international institutions have struggled with employing measures of human development which go beyond GDP per capita: education, health, and the economy. But then what values do we attach to these?
In countries like Italy stark regional differences have dominated over time. Particularly in times of fiscal austerity when the country attempts to recover from an economic crisis with major social consequences, seeing how and why the South and the North differ is an important step in a consensus-building process to find solutions and realise collaboration with the citizens.

The Open Economics Working Group of the Open Knowledge Foundation released YourTopia Italia – an application which gives the users a chance to input their priorities in eight categories of socio-economic progress:
- Labour Market
- Education
- Health
- Environment and Energy
- Science and Research
- Household Income and Inequality
- Public Safety
- Social Life
Each category is comprised of sub-indicators e.g. Neighbourhood Safety, Income Inequality, Problems with Air Quality or Friends Networks. While the Northern regions fare rather well in most indicators, which are highly correlated with income per capita, Social Life seems to be better in the Italian South, where more people get married, fewer people separate and more people meet friends in their free time.

YourTopia Italia gives a chance to the user to adjust weights of their personal priorities and see how the map changes when some indicators are excluded altogether. A timeline visualisation also gives the perspective of how Italian regions have developed over time.

All YourTopias can be saved and shared through social media.
So, join our efforts: go to italia.yourtopia.net and define the YourTopia that reflects your vision of social progress!
The application was created with a dataset assembled from istat, and the source code of the application is released under an open license. This project is a result of a team work effort and follows up on ideas initiated during the Open Economics Hackday in January this year.
by Velichka Dimitrova at May 10, 2012 01:00 PM
Tomorrow, I will travel to Peru for the third BSLA review meeting. I’m looking forward to hearing what they’ve been up to (a lot!) and sharing what’s already come out of Botswana and Cameroon. One of the best things about this project has been how enthusiastically everyone has embraced working with colleagues in their own region, and across the world when we meet. I’m sorry to say it, but far, far too often at home I encounter that all-too-familiar syndrome: Not Invented Here. Every country, organisation, and many individuals want to put their own stamp on the profession, which is great, but at the same time that can lead to a lot of reinventing of the wheel and ignorance of resources that already exist. I’ll list just one obvious institutional example: study guides. I’ve been guilty of this too. Research, statistics, standards and training materials are other resources that also tend to be rewritten frequently. There is a barrier between research and practice. One way we’ve tried to help with that is by distilling research into practical case studies.
In countries with few resources, there’s a tendency by many librarians to work smarter, across borders and regions to get the information, specialists, and advice that’s needed to develop the profession and library services. A librarian may travel to Cameroon from Senegal to train on digitisation. A librarian in Cameroon may go to Angola to advise on LIS curricula. Regional associations provide a common meeting place. Certainly, it would be preferable to have enough resources in the country itself, and replicating projects is never so simple as just running the same thing again in a different place.
The recognition of the necessity to work together, across borders and sectors, building on where you are and what you already have, is something that we could all gain from.
by Fiona at May 10, 2012 12:33 PM
The W3C has published the Open Annotation Core Data Model.
Annotating, the act of creating associations between distinct pieces of information, is a pervasive activity online in many guises but lacks a structured approach. Web citizens make comments about online resources using either tools built in to the hosting web site, external web services, or the functionality of an annotation client. Comments about photos on Flickr, videos on YouTube, people's posts on Facebook, or mentions of resources on Twitter could all be considered as annotations associated with the resource being discussed. In addition, there are a plethora of closed and proprietary web-based "sticky note" systems, and stand-alone multimedia annotation systems. The primary complaint about all of these systems is that the user created annotations cannot be shared or reused, due to a deliberate "lock-in" strategy within the environments where they were created, or at the very least the lack of a common approach to expressing the annotations.
The Open Annotation data model provides an extensible, interoperable framework for expressing annotations such that they can easily be shared between platforms, with sufficient richness of expression to satisfy complex requirements while remaining simple enough to also allow for the most common use cases, such as attaching a piece of text to a single web resource.
Seen on Digital Koans.
by David (noreply@blogger.com) at May 10, 2012 10:51 AM
If you are at all familiar with the open textbook world, you’ve likely heard of the startup called
Boundless Learning. Leveraging information in the public domain, as well as dipping into the enormous
stockpile of learning that is Open Education Resources, Boundless Learning has created a tool that
hopes to eventually replace the traditional textbook model.

Just like “open” anything, however, Boundless Learning has not gone without its fair share of trouble
from vested industry interests. Recently, the textbook publishing giant Pearson, along with MacMillan
and Cengage, filed a complaint alleging copyright infringement. Even though Boundless Learning
culls its information from material available to the public through Creative Commons licensing, the
publishers allege that “Defendant [Boundless Learning] exploits and profits from Plaintiffs’ successful
textbooks by making and distributing the free “Boundless Version” of those books in the hopes that it
can later monetize the user base that it draws to its Boundless Web site. In short, to build its business on
Plaintiff’s intellectual property rights.”
Boundless Learning, on the other hand, claims that the accusations are patently false. The startup states
that it only uses information already in the public domain, and said in a Boston.com article, “you can’t
copyright facts and ideas. When you look at educational information, it’s primarily facts and ideas.”
Boundless Learning will soon send out a legal response, and has expressed disappointment that the
textbook publishers didn’t communicate with Boundless Learning amicably before resorting to litigation.
So what does this mean for the open textbook movement? Can we expect more lawsuits of this nature
against innovative businesses? For one, Boundless Learning has truly launched a paradigm-shifting
product. Most open textbooks are presented to students in PDF format using e-readers and other
devices. However, Boundless Learning has extended beyond just digitizing traditional books by offering
more. Their content is distinctly interactive, and students may build upon Boundless Learning material
in a way that closely resembles both Facebook and Wikipedia. You can study along with other students,
help each other in the learning process, and do it all online. For free.
Lawsuits of this sort aren’t anything new, and it’s important for those of us who are believers in the open
textbook movement that we understand what we’ll have to fight against to live in a more open society.
While Boundless Learning may have been careless in copying the format of copyrighted textbooks,
down to the pagination, it does offer a platform that is new, that goes beyond mere open versions of
closed textbooks. It’s with this innovative spirit that we can effectively, legally, and affordably, make
information available to all. The world is not yet open, but we can get it there.
This guest post is contributed by Katheryn Rivas, who writes on the topics of online university. She
welcomes your comments at her email Id: katherynrivas87@gmail.com.
by Katheryn Rivas at May 10, 2012 09:30 AM
Last Friday (May 4), the Ministry of Planning in Brazil launched the final version of the Brazilian Open Data Portal. In line with the federal government policy to promote the use of free software in public administration, the portal was made using only free and open source tools. Among them is the Open Knowledge Foundation’s open-source data portal software CKAN. Moreover, the whole process of development of the portal was conducted with the participation of concerned citizens in an open way to promote open data.

Opening Data Openly
The development project of the Brazilian Open Data Portal takes the concept of social participation to the extreme. From the beginning, planning meetings and development forums were open to any interested citizen, announced in advance on open discussion lists and, where possible, relayed via live streaming video to the Internet (webcast).

In each planning meeting, the development tasks were selected by a flexible development process, in which people presented ideas of what they thought was needed on small ticket records. At the end of the round, the tickets were grouped, categorized and prioritized. At the end of the meeting, the events were recorded in a publicly accessible wiki (INDA wiki), and a publicly visible task manager (Trac).

We engaged the participation of people from civil society and of civil servants, who collaborated in various ways. Some people were involved right through the process, while others made contributions along the way. We had contributions in the form of software development, design, and information architecture, among others. The latter began with an experimental “card sorting” conducted with the participants of the event Campus Party 2012 in Sao Paulo. This synergy between government and citizens working together for the common good is what we mean by open government.
The Portal has gone through several versions, but the most important are the first (a simple HTML page with a tagcloud of catalogue data), followed by its beta, a little more prepared and documented, and then the current version with a new set of features and extensive reference material and learning.
The dados.gov.br now has 78 data sets with 849 resources. These have mostly been catalogued based on a survey of data that public bodies already publish on the Internet, but that until then were scattered and lacked a central access point where the public could find them. They are, however, the tip of the iceberg compared to what there is to be opened around public data in Brazil.
Recognizing this and the urgency in meeting the new law on access to information, the Secretariat for Logistics and Information Technology is preparing a workshop to guide public bodies on how to include their data in the catalogue. This will take place in early June.
The portal is part of a larger project called the National Infrastructure Open Data – INDA. The general idea of INDA is to establish technical standards for open data, promote training and support public bodies in the task of publishing open data. This entire process is done through intra-government cooperation and cooperation between government and citizens, always aiming to achieve a real platform for open government.
by Rufus Pollock at May 10, 2012 08:59 AM
May 09, 2012
One thing I can say about Kindle: error reporting is easier.

You report problems in context, by selecting the offending text. No need to explain where - just what the problem is.

Feedback receipt is confirmed, along with the next steps for how it will be used.
By contrast, to report problems to academic publishers, you often must fill out an elaborate form (e.g. Springer or Elsevier). Digging up contact information often requires going to another page (e.g. ACM). Some make you *both* go to another page to leave feedback and then fill out a form (e.g. EBSCO). Do any academic publishers keep the context of what journal article or book chapter you’re reporting a problem with? (If so, I’ve never noticed!)
by jodi at May 09, 2012 09:45 PM
The Rails Guides are actually really good overview documentation. The days of saying Rails documentation is terrible are over, with the good guides, and good api-level docs too.
I knew there was a Plugin Guide, but I only just noticed there’s an Engine Guide too.
For reasons I don’t know, the Engines guide is not listed on the Guides home table of contents page, even though it’s available at guides.rubyonrails.org. It also doesn’t google very well.
So here’s my part to publicize it. Both the Engines and Plugin guides are pretty good. They’re also overlapping in coverage. As the Engines guide says:
Engines are also closely related to plugins where the two share a common lib directory structure and are both generated using the rails plugin new generator. The difference being that an engine is considered a “full plugin” by Rails as indicated by the --full option that’s passed to the generator command, but this guide will refer to them simply as “engines” throughout. An engine can be a plugin, and a plugin can be an engine.
Maybe ideally they’d be merged, but they’re both good guides; you’ll probably want to review both if you’re working on Rails plugins OR engines.
Actually, you won’t find those exact words above in the current stable release of the Engines Guide; you’ll find them on edgeguides.rubyonrails.org instead. I believe the actual guides are versioned with Rails releases: after one is released to guides.rubyonrails.org with the most recent rails version, it’s never changed (except perhaps for very serious errors); rather, any changes will be released with the next Rails release. But you can preview them on edgeguides. (I’m not sure if new ‘stable’ guides at guides.rubyonrails.org are pushed with new patch-releases of Rails, or just new minor-releases.)
So you might find edgeguides contains some content that doesn’t apply to the current Rails release; but it may also contain improved explication or better examples or more coverage, as a result of contributors improving things. It’s worth checking edgeguides for a complicated topic. In this case, the Engines guide is rather improved in the ‘edge’ version as of this writing, and I don’t believe it includes anything that won’t work in Rails 3.2, it’s worth checking out.
A while ago I learned about the `rails plugin new plugin_name` command to give you a gem plugin skeleton, including a very useful dummy app for testing. Before that I had been doing it by hand! But this generates a very lightweight plugin, and I was going in by hand and adding the files and methods to turn it into a heavyweight engine. I only just now, from the Engines Guide, noticed you can do `rails plugin new plugin_name --full` to get a fully engine-ized plugin.
Note on plugins ‘vs’. gems
It’s not a “vs.”. Since Rails 3.0, Rails plugins could be distributed as gems. Since Rails 3.2, distributing a plugin as anything but a gem is deprecated — vendor/plugins is probably going to go away in future releases.
The great architecture of Rails 3 is such that it wouldn’t be that hard to put it back into a particular app yourself, though, if you really needed to for legacy purposes. But in general, you don’t want to; plugins-as-gems are great, making dependency management a lot more sane.
The current guides.rubyonrails.org plugin guide still gives you the option of a vendor/plugins plugin or a plugin-as-gem. It ought not to, since the current Rails version deprecates vendor/plugins. So don’t do that with a new plugin.
The edgeguide is more clear. (I made the commit myself! Did you know anyone can commit to docrails github repo? The commits are reviewed by editors before being merged into actual rails, and in the case of guides, eventually deployed to guides.rubyonrails).
Filed under:
General
by jrochkind at May 09, 2012 06:46 PM
New vacancy listings are posted weekly on Wednesday at approximately 11:00 a.m. Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
by vedmonds at May 09, 2012 05:04 PM
I’m expecting you already know about the important Oracle v. Google court case over Android’s use of Java APIs, including both copyright and patent claims. But it would be hard to find a more detailed and direct account than Groklaw’s series of notes from the courtroom like this one from the copyright phase, ultimately subtitled Partial Verdict; Oracle Wins Nothing That Matters. For the entire ongoing catalog, try this rabbit hole.
Having read direct courtroom reporting and the Court’s own documents, the headlines in some mainstream news outlets declaring Oracle the “winner” and Google “guilty” will start to look awfully remote and more than a little bizarre. Google has moved for a new case since this jury was unable to determine whether their code constitutes a “fair use” of the Java API, so we might get to see the whole thing play out again.
More likely, the Court itself could deliver definitive resolution to the question whether APIs are copyrightable, particularly if the Court’s opinion converges with its EU counterpart in a very recent case (Ars Technica via Google Cache, since the original is 404-ing for some reason). One can hope. The EU ruling was particularly bold because it protects reimplementation to the extent of voiding any agreements (read: EULAs) inhibiting that right. That is a smart extra step in order to make the rights not immediately click-through disposable.
For more, follow along during the trial’s current patent phase at Groklaw.
by Joe Atzberger at May 09, 2012 04:34 PM

The Open Knowledge Foundation’s Open Biblio group, and Working Group on Open Data in Cultural Heritage, along with DevCSI, present BiblioHack: an open Hackathon to kick-start the summer months. From Wednesday 13th – Thursday 14th June, we’ll be meeting at Queen Mary, University of London, East London, and any budding hackers are welcome, along with anyone interested in opening up metadata and the open cause – this free event aims to bring together software developers, project managers, librarians and experts in the area of Open Bibliographic Data. A workshop will run alongside the coding on the 13th, and a meet-up on the evening of the 12th is open to all whether you’re attending the Hackathon or not.
What is BiblioHack?
BiblioHack will be two days of hacking and sharing ideas about open bibliographic metadata.
There will be opportunities to hack on open bibliographic datasets and experiment with new prototypes and tools. The focus will be on building things and improving existing systems that enable people and institutions to get the most of bibliographic data.
If you’re a non-coder there are sessions for you too. We will be running a hands-on workshop addressing the technical aspects to opening up cultural heritage data looking at best of breed open source tools for doing that, preparing your data for a hackathon and the best standards for storing and exposing your data to make it more easily re-used.
When and where?
- The main hackathon will take place over two days between 13th and 14th June at Queen Mary University of London
- On the morning of the 13th June we’ll be running the workshop addressed at the technical challenges to opening up metadata. So for those unable to participate in the hack due to time constraints or lack of coding know how – this is for you!
- On the 12th June – Tuesday evening (details TBC but will be a pub in central / east London!) – we’ll also be hosting a meet-up for anyone attending the hack and open data more generally. Whether it’s open bibliographic data, spending or government data that floats your boat all tribes are welcome!
Who is organising the event?
Who else is involved?
We’ve already lined up a whole host of speakers and groups who’ll be attending both the hack and the workshop. The list so far includes UK Discovery, CKAN, Europeana, Total Impact, Neontribe, The British Library with many more to be added in the coming days…
You’re giving your time and expertise – what do you get if you attend the whole hack?
- Accommodation at QMUL overnight on the 13th
- Food and drink across the 3 days
- The chance to work with experts in their fields
- Admiration and respect from your peers
- We could expound at length, but… go on, you know you want to (it’s free!)
How can I sign up?
- Register here for the 2 day hack
- Register here for workshop only
- Register here for Meet-up only
Please note, if you wish to attend all 3 events you should sign up for each, and the Workshop will run in parallel with the hacking on the morning of the 13th.
More questions?
Contact Naomi Lillie on admin [@] okfn.org.
See you there!
by Naomi Lillie at May 09, 2012 12:04 PM
[Had some problems with the images in this post at first, should be fixed now]
At UWS we’re about to start work on our Research Data
Repository project, which you can read about over
on the UWS eResearch blog. The starting point will be the Research Data
Catalogue component of the repository. The main point of the catalogue is to describe research data collections for
the purposes of discovery, reuse, reporting and archiving. But what is a research data collection and how
might a researcher put one together?
I won’t attempt an all-encompassing answer to that, but I
would like to look at one common case, where a data collection is a set of
files. How can we help researchers deal with file-based data efficiently, and
as generically as possible?
This is important – we know from talking to eResearch and IT
people from other universities that if you provide raw storage to researchers
people will use it; it will start to fill-up and at some point the institution
will be scratching its eResearch head and asking “what exactly do we have
here?”. We really need to get data described both early and often, and to think
about data in the context of the research lifecycle; applying for grants,
reporting on grants, publishing and so on.
This post tackles the question “How can we help our
researchers keep track of the vast amounts of stuff that will start
accumulating on our servers when we roll out file storage services?” It summarizes
a recent demo I gave at Intersect NSW,
our eResearch partner, at a meeting organized by Ingrid Mason (@1n9r1d). The demo is designed to show
how generic file management services could help researchers to select and
package file-based data for easy deposit into long-term curated, managed
storage in a couple of scenarios.
I have written about this before, and showed
some other Intersect people a similar demo last year in the hope that the
demo might be of interest to the team working on the application formerly known
as FieldHelper. FieldHelper
is about getting files labeled and bundled for repository deposit as efficiently
as possible. I’d love to hear from the Intersect team about what other
applications they’ve found in this space, and their experiences with The Fascinator software,
comments are open below, or there is an active
mailing list for the software.
Previously I have shown the same demo software looking at
other kinds of data such as computational
chemistry in the Beyond the PDF workshop organized by Prof Phil Bourne, and
documents, such
as Joss Winn’s thesis.
The use-case for a data collection is where there are a
number of files that need to be grouped together:
A desktop or laptop computer.
A workgroup or departmental shared drive.
An institutional data storage service.
A replicated cloud service like Dropbox or Google Drive.
So, there’s a bunch of data files sitting somewhere; on a
laptop, a share, a USB memory stick, in Dropbox,
etc.
Nobody knows what they’re for apart from the
people who created/collected them.
The university, the
researchers, funding bodies, the public, the government, lots of stakeholders
want to make sure the data files are looked after; so they can be
reused, so the publication can be validated so that others can build on the
work, so they can be cited, and archived for the appropriate length of time.
For the data for this demo I chose an example from the
University of Western Sydney, where I work, using a data collection collected
by Professor
Roger Dean and Dr Freya Bailes from the
MARCS institute. This data set is one of the exemplars from the university’s
Seeding the Commons project, funded by the Australian
National Data Service. It’s a
collection of measurements of audio intensity in a range of musical works
consisting of 51 files, all plain text. This data set is explained in a journal article, A rise-fall temporal asymmetry of intensity
in composed and improvised electroacoustic music.
There’s a web application (The Fascinator)
watching the relevant storage, finding all the files you put there and showing
them to you as best it can through a web browser. There are two ways to package
the files:
In a hand-curated
‘package’ where you can corral a group of files, optionally provide some navigation
hierarchy and describe the data. This was the main focus of this particular
demo.
In a dynamic
view of the working storage that watches the storage for data with certain properties
such as a location on disk, a tag, or a metadata field and does something with
it, like routing it to a repository or a collaboration-space.
The demo:
There’s a Dropbox
folder (as in dropbox.com) on my machine. I put the sound intensity data files
in there:

I’ve set up a server using a free (as in beer) virtual
machine from the NeCTAR research cloud, funded by the Australian
government.

On the server I have installed a copy of The Fascinator
in its default, un-customized guise – but remember the same software could be
installed on a laptop, or in the lab. (The Fascinator is the Free Software
toolkit that was used to build the ReDBox Research
Data Catalogue that’s being widely deployed in Australian Universities now,
including at UWS).
The server also has the Dropbox
folder so anything I put in the folder on my machine turns up there (there’s
still no compelling Free (as in libre) alternative to
Dropbox that I could have used, but we keep looking –
has anyone tried OwnCloud or SparkleShare?
Let me know in the comments).
The Fascinator is, by virtue of a few lines of
configuration, watching the Dropbox folder. Anything
that appears in the folder gets processed. Metadata is extracted,
web-previews are generated for office documents, images, videos etc using an extensible set of plugins. If there was a
business case someone could write a plugin for the sound intensity data, to
show it as a graph, or do analysis across samples.
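As a rough illustration of the "everything in the folder gets described" idea, here is a minimal Python sketch. It is not The Fascinator's harvester or plugin API; the function name `describe_folder` and the record fields are assumptions made up for this example, and it only collects the most basic file-level metadata.

```python
import mimetypes
from pathlib import Path


def describe_folder(root):
    """Return one basic metadata record per regular file under root.

    Hypothetical helper for illustration only; a real harvester would
    also extract embedded metadata and generate previews.
    """
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories
        stat = path.stat()
        mime, _ = mimetypes.guess_type(path.name)
        records.append({
            "path": str(path.relative_to(root)),
            "bytes": stat.st_size,
            "modified": stat.st_mtime,
            "mime": mime or "application/octet-stream",
        })
    return records
```

Running something like this on a schedule and comparing the results with the previous run would be one simple way to notice new or changed files, under the assumption that polling is acceptable for the storage in question.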
You can see the files in the web interface via
the file system:
And via a search interface:

And there is a mechanism to package several
files together, and build a navigational structure for them. This produces a
navigable package outline.

Here’s what it looks like when browsing the
package online to find an individual file:

So we have:
Found the data.
Packaged it together and ordered it.
An interface, using The Fascinator where we can
eyeball the data file by file, tag things, or apply formal metadata; there’s a
huge list of features in The Fascinator, we would need to work out which ones are
useful to which researchers if we deployed it in this kind of role.
The next step is not done yet, but soon we will
demonstrate a very simple workflow showing a path from files on disc, to a
package in the institutional Research Data Repository. I could tag this package
as ‘CurateMe’ and the institutional Research Data
Catalogue could pick it up and put it in the work-queue for a research
librarian to help with long-term curation. This is exactly
the same model we described for linking
our Data Capture project for ecological data to the Research Data Catalogue.
This work is a demo that was built by the team back at the
University of Southern Queensland. The work there was halted, but now with many
institutions building institutional Research Data Catalogues with their free
ANDS Metadata Stores money it is time to think about how we might capture some
of the long-tail research data which is never going to have a $200,000 data
capture project devoted to it and how we are going to keep track of data
throughout the research lifecycle.
Copyright
Peter Sefton 2012. Licensed under Creative Commons Attribution-Share Alike 2.5
Australia.
<http://creativecommons.org/licenses/by-sa/2.5/au/>
by ptsefton at May 09, 2012 12:09 AM
May 08, 2012
Just a warning that some of this gets fairly technical, especially with hardware setup, and without the related diagrams, it may be difficult to understand, but the basics are there.
Presenters
- Marc Lalond
- Andrew McAlorum
- Graham Stewart
Evolution of the UTL Website
- recognized need for CMS back in 2003
- 2005 – used Plone
- 2008 – had to move frontpage out of CMS, which meant more maintenance
- 2010 – took another look at new CMS since current CMS needed a lot of Python knowledge and decided on Drupal
- 2011 – launched new site in Drupal
Drupal
- little coding work
- modules (much like WordPress plugins) that are available
- steep learning curve, but coding not necessary
Drupal Related Additions
- Drupal Commons – community distribution, pre-configured package
- Islandora – digital asset management, Fedora database backend
- Solr – well integrated into Drupal, especially for faceted searching
Implementation
- multi-site drupal allows multiple instances
- especially useful for simpler sites with little custom code and modules that are updated
- built custom Drupal distribution for UTL with all modules, theme, settings
- theme built on LayoutStudio starter theme
- next: responsive version
Training
- regularly schedule training, about once a month
- covers setup and config, users, content, etc.
Performance
- 2000 visits per hour
- single page load = 279 MySQL queries
- initial loads for 2000 page loads = 558,000 MySQL queries
- Drupal on one box: User <-> Apache Web Services <-> PHP <-> MySQL
- problem occurs when there is a bottleneck with a single point of failure
- Solution: horizontal scaling with multiple servers with Drupal functions split into smaller boxes
- very flexible
- less expensive
- more adaptive
- fully redundant
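The query-load figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes one page load per visit (the notes use the two interchangeably) and an even spread of traffic across the hour, which real traffic will not have; the function names are made up for the example.

```python
# Figures quoted in the talk notes.
VISITS_PER_HOUR = 2000
QUERIES_PER_PAGE_LOAD = 279


def queries_per_hour(visits, queries_per_page):
    """Total MySQL queries if every visit is exactly one uncached page load."""
    return visits * queries_per_page


def queries_per_second(per_hour):
    """Average rate, assuming traffic is spread evenly over the hour."""
    return per_hour / 3600


total = queries_per_hour(VISITS_PER_HOUR, QUERIES_PER_PAGE_LOAD)
print(total)                      # 558000
print(queries_per_second(total))  # 155.0
```

An average of about 155 queries per second on a single box gives a sense of why caching layers and splitting functions across servers were attractive.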
Setup
- individual servers are virtual machines, built using KVM virtualization with Ubuntu Linux
- High availability with Keepalived
- Load balancing with HAProxy
- Caching using Varnish and XCache
- Storage on shared high performance disk
- MySQL query caching with Memcached and Keepalived
- MySQL master/slave replication + Keepalived
Results
- 5 minute downtime (planned and unplanned) between Sept 2011 and April 2012
- load time = < 2s on campus, 4.1s on simulated DSL in Virginia
- 100% Open Source
As I posted on twitter, I’m quite glad we don’t take care of our own hardware, especially since we just don’t have the people and resources (including not having any server admin), but I was quite impressed with the setup of the UTL Drupal setup. Quite interesting to hear what they’re doing.
Filed under:
Academic,
Technology
by Cynthia at May 08, 2012 11:33 PM
Once again I bring you a gCal of the LITA-sponsored events at ALA Annual conference.
Also, I encourage you to go and check out the awesome ALA Annual Conference Scheduler and build your own custom calendar for the conference.
Please note that the LITA Happy Hour will be on Sunday 6/24/2012 from 5:30pm – 8:00pm (a change from our usual Friday evening, so I felt it should be highlighted)
by AaronDobbs at May 08, 2012 10:41 PM
Angela Hamilton, U of T Scarborough, spoke about technologies that she has used particularly at a campus where many are commuter or distance education students.
Libguides: Customized tools
- branding yourself for students to recognize you as their librarian: picture, meebo, contact info
- info on what is an article, database, annotated bibliography, etc.
- custom course guide
- use the tools available to you
- helps to build relationship with users
Online Meeting Software
- e.g. Adobe Connect
- for more advanced reference questions
- share screen – the “show-er” needs to install a plugin, but viewer doesn’t need to
- one-to-one, but also for teaching sessions
Screencapture Videos
- check vendors for already made videos e.g. ISI for Web of Science
- Jing (sp?) – free 5 min videos
- answer longer questions
- can also do it at the reference desk and e-mail it to them
- esp useful for non-techsavvy and ESL students to review later
- can also work for one-on-one session if have software for longer videos
I think some of the ideas presented here are great ways to give students further reference on how to do their research, especially on-the-spot screencasts for customized tutorials for them to review later.
Filed under:
Library,
Technology
by Cynthia at May 08, 2012 08:32 PM
This presentation actually not only talks about digital signage itself, but also the work culture change that happened in the systems department at UTL.
Presenters
- Sian Meikle
- Bilal Khalid
- Graham Stewart
Good Signs Can Make a Difference
- brief
- consistent
- easily read
Writing the Message
- simple
- reduce: punctuation, pictures, words
- headline: 22 characters
- body: 10-18 words
- short URLs
- brief
- 5 seconds per slide
- 8-10 seconds total
- usually less is more
- clear
- call to action e.g. Chat with a librarian
- photographs can be powerful
- coherent design
I don’t know that I agree with all of these, but then it was clear that it depends on the size and distance of the sign as well as where it is.
Presenting the Message
- Chunking
- Coding
- position
- prime spots on a list: first and last get noticed the most
What Makes Digital Signage Different?
- easy to update
- can differentiate content by
- time of day
- audience
- viewing time
What Users Say
- Help me make better decisions
- chat with a librarian, workshops
- Save me time
- maps: library, stacks, workstations
- directories: by floor, service, name, library
- Show me something relevant to me
- Tell me something new and interesting
- Give me ideas
This is not what their actual users were saying. These ideas were based on a talk done by someone outside of the library and the list here is how those ideas might be applied in a library setting.
Touchscreen Kiosks
- PHP – CodeIgniter
- jQuery
- MySQL
- Closed Environment – not open to the Internet
- Javascript Keyboard
Interaction
- Most Frequent Pageviews
- since May 2011
- Libraries & Hours
- Robarts Directory
- Workstations
- User Feedback
- Let me find a book
- Let me access this information from my phone
What’s Next
- catalogue search
- entire catalogue available
- StackMap
- map of physical item location, with directions
- Responsive Design
- designed to be used on any device
This is interesting, because we’re working on something similar at our library and we were considering how responsive to make the site. Obviously, we need to seriously consider designing from desktop down to mobile.
Overhead Signage
- 4 vertical screens
- PHP + AJAX
- Media Commons
- JavaScript video player
- Fishers Rare Book
- screensaver
Features
- auto refresh
- detection of new content
- remote control
- ability to have different slideshows
- control to switch between slideshows
- control through phone
What’s Next
- Scala software across all overhead screens
- content regions e.g. time at bottom of screen
- RSS Feeds to Drupal based on another content type
- Scheduling e.g. times of day
Building Directories
- one PHP + JavaScript page per vertical pylon (two vertical screens)
- alternating event feed display (from Drupal, via AJAX)
Development – Devops Movement
- focus on increase collaboration and cooperation
- agile methodology applied to system administration
- agile development and teams (self organizing, cross functional, quick daily meetings, open environments, face to face meetings, encourage input)
On System Administration
- timeframes all shrink
- web presence critical
- software is developed much faster and changes are more frequent
- massive growth in automation tools
- growth in OSS: sharing and collaboration
Devops Goals
- Eliminate stereotypes
- developers are careless, arrogant while sysadmins always say no and work all night
- Increase communication between developers, operations, and management
- Continuous systems improvement
- Break down barriers and silos
- Develop methods to encourage all team members to see the organization’s goals
Advantages
- all staff use all their skills
- diversity
- use knowledge outside defined roles
- roles expand
- cross pollination
- creativity
- “many minds”
- enhanced mutual respect and communication
- greater trust
- shared responsibility
- everyone feels a sense of ownership over the end product
- greater commitment to the product
- everyone focused on the organization’s end goal
- happier, more productive staff
Implementing DevOps With Digital Signage
- operations and development involved jointly from the start
- weekly full meetings and as necessary (often daily) with quick interrupts/one-on-ones for specific issues
- fast code releases: several times/week
- “many minds”
- two screen display: one browser? 2 PCs?
- disabling right click
- URL shortening
- Planning and execution
- browser choice
- OS choice
- development options
- design decisions
- New and experimental project
- innovative methods required
I thought it was interesting that they spoke a lot about the more technical aspect as well as development methodology. I think it’s a good lesson for a lot of library IT departments that agile development with integrated back and front end staff can be very beneficial, particularly because it makes development faster and more flexible.
One of the things that came up during the code4lib conference too is that developers should have a small amount of time to work on whatever seems interesting to develop new tools or services.
Filed under:
Events,
Technology,
Web design
by Cynthia at May 08, 2012 08:19 PM
What other sites share the same infrastructure with your site, or any other? Bing‘s IP search can answer. Do a search by IP number:
by Casey Bisson at May 08, 2012 07:39 PM
We now know much more about the Google Street View WiFi story, thanks to Google’s decision to release an unredacted version of the FCC report, to the New York Times’s identification of the Google employee involved as Marius Milner, and to further reporting from Ars Technica. The picture it paints is in some respects more flattering to Google, and in some respects worse.
Milner is the creator of NetStumbler, a tool for detecting and analyzing WiFi access points. It makes sense in hindsight that he ended up using his 20% time for the part of the Street View project that aimed to build a database of WiFi networks. And it turns out that he thought about the ethics and legality of recording payload data. He appears to have read some law-review scholarship on wardriving. He considered potential privacy issues, and concluded that the mobility of the Street View cars would minimize the risk of extensive data-gathering from any one user. Further, he emphasized that none of the data would be shared with Google users.
This is, I have to say, above the baseline of ethical cognition for programmers. Looking to legal scholarship at all is quite unusual. In fact, Milner’s thoughtfulness strikes me as roughly par for the course for front-line Google technologists. It’s a company that hires reasonably thoughtful people and encourages them to think about the implications of what they do for society, both good and bad.
But if Google is a company of smart, reflective, and well-intended individuals, collectively they make bad choices. Milner put his privacy concerns and the details of the WiFi payload recording in a design document. The document included a “to do”: “[D]iscuss privacy considerations with Product Counsel.” He talked to a member of the search quality team about the idea; he circulated the design document together with his code to Street View’s project leaders, who forwarded it to the entire Street View team. And he exchanged emails with other Street View programmers and managers that made clear Google was collecting payload data. But nothing happened. For fifteen months, Google Street View cars sucked up and recorded WiFi payload data.
As I said in an earlier post:
When it comes to privacy, this is a company out of control. Google’s management is literally not in control of the company.
Google’s Street View managers failed badly at their jobs. One of them “pre-approved” the design document before it was written, demonstrating complete failure to understand the purpose of managerial review. No one followed up to make sure the discussion with Product Counsel actually happened. Other engineers read the design document and Milner’s code, but either missed the fact that it was collecting payload data or didn’t realize that this could be a potential issue. Again, this is a failure of management: it’s an important part of their job to make programmers aware of the possible legal trouble zones in the areas they’re working on.
Milner has invoked the Fifth and isn’t talking to reporters. He made a mistake, but he’s not a legal expert and it’s a bit unfair to expect him to be. No, his managers let him—and the rest of us—down.
by James Grimmelmann (james@grimmelmann.net) at May 08, 2012 05:09 PM
So it turns out there’s a significant typo, that keeps the algorithm from working right, in the several previously blogged descriptions of reddit’s story-ranking algorithm.
update 6:28PM ET 8 May On reddit, someone with a flag suggesting they ought to know tells me I got it wrong and the original algorithm was correct and is used in production. All I can say is I can’t figure out how that could be: I could not get it to work in a non-reddit codebase, but I could get it to work with my amendment.
If I haven’t corrected a typo, then I guess I’ve derived my own variation which works a lot better for me (that is, works at all for me, but also seems to mimic reddit), in my own codebase. Good enough for me. If you are trying to reimplement this algorithm in a non-reddit codebase, I suspect you’ll find my investigation useful.
Now back to the blog post as originally written.
More oddly, this same significant typo is in the public version of reddit’s code released on github.
I’ve found myself finding joy in code-for-code’s-sake like I haven’t since past days of being an undergrad staying up all the night in the CS lab working on the most fun homework ever. And so I found myself staying up into the wee hours last night investigating reddit’s story ranking algorithm (the one used for stories/posts in the default ‘hot’ ranking, that is time-of-posting sensitive. A different algorithm is used for comments).
The wrong algorithm
The (typo’d) algorithm is most nicely described by Amir Salihefendic. He even provides a python implementation. I figured I’d translate it to my preferred comfortable language ruby, and play around with it changing different parameters to get a feel for it, and get a feel of how it might be modified/tuned to behave somewhat differently if one wanted.
My assumption was that this algorithm outputs a number which can be stored in the database, and stories can be ordered purely by this number, to produce the on-page ranking. This seems indeed to be true, although I was doubting it a bit in the middle. (Another nice thing about this particular algorithm, that everyone did catch on to even in the typo’d version, is that a ranking order calculation only needs to be done when a ‘vote’ happens, then it can be stored in the db unchanging forever (until another vote happens)).
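The "compute on vote, store, then order by the stored number" idea can be sketched as follows. This is a minimal illustration using an in-memory SQLite table; the table layout, the function names and especially `toy_rank` are assumptions invented for the example, and `toy_rank` is a placeholder, not reddit's formula.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical stories table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stories (id INTEGER PRIMARY KEY, ups INTEGER, "
        "downs INTEGER, posted REAL, score REAL)"
    )
    return conn


def add_story(conn, story_id, posted, rank_fn):
    """Insert a story with no votes and its initial stored score."""
    conn.execute(
        "INSERT INTO stories VALUES (?, 0, 0, ?, ?)",
        (story_id, posted, rank_fn(0, 0, posted)),
    )


def record_vote(conn, story_id, up, rank_fn):
    """Apply one vote, then recompute and store that story's score."""
    column = "ups" if up else "downs"  # fixed choice, never user input
    conn.execute(
        f"UPDATE stories SET {column} = {column} + 1 WHERE id = ?", (story_id,)
    )
    ups, downs, posted = conn.execute(
        "SELECT ups, downs, posted FROM stories WHERE id = ?", (story_id,)
    ).fetchone()
    conn.execute(
        "UPDATE stories SET score = ? WHERE id = ?",
        (rank_fn(ups, downs, posted), story_id),
    )


def listing(conn):
    """The page order is a plain sort on the stored number."""
    rows = conn.execute("SELECT id FROM stories ORDER BY score DESC")
    return [row[0] for row in rows]


def toy_rank(ups, downs, posted):
    """Placeholder ranking for the demo only; NOT reddit's formula."""
    return posted + (ups - downs)
```

With two stories posted at times 100.0 and 101.0, `listing` returns the newer one first; two upvotes on the older one move it ahead, and nothing is recalculated for the story that received no vote.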
So I translated Amir’s python to ruby and started playing, and the results made no sense. They didn’t match how things work on reddit, and they didn’t result in any kind of useful ranking algorithm.
Users of reddit know that the story listings are mostly chronological. The vote count will change a story’s position somewhat, but not put it dozens of weeks or pages ahead or behind of its strictly chronological order. But that’s what this algorithm did. It also gave any story with a net-negative vote a negative score. Which would put all the net-positive vote stories before any of the net-negative voted stories. Which is not how reddit works.
Looking at the math again now, I’m kind of embarrassed I didn’t immediately see the problem, it’s not complicated math. But I didn’t, it just made my head swim. I’ll give you the relevant line from Amir’s python version here, maybe you can do better than I did, now that I’ve primed you:
return round(order + sign * seconds / 45000, 7)
Before I give you the answer, I’m going to tell you all the things I did first:
- Went back and from scratch re-ported Amir’s python to ruby, doing as literal a translation as possible. Same nonsensical result.
- Took the more mathematical description on Amir’s page (the one in a giant png? That he took from SEOmoz?), and implemented that in ruby. Same nonsensical results (modulo some new bugs I introduced).
- Googled around for anyone else’s description of the algorithm to see if they had a different version, or explained it better, or explained how it fit into the context of the software as a whole (maybe I was wrong that it was supposed to produce the actual bare ordering number?). No dice.
- Found the relevant source in the reddit github open repository (Amir’s link was broken as reddit moved their public source repo. Hooray github for being so easy to navigate on the web). Translated this to ruby. Same results.
- Okay, noted that the reddit source mentions an “equivalent function in
(Yep, the implementation in github public reddit has the typo, and is wrong!).
At this point, I gave up on understanding the reddit algorithm, I figured there was something I was missing (wrong, only thing I was missing was the typo). But, okay, I dove back into the math, trying to understand it and convert it to something that would work for me.
Take a moment to note lesson learned
Like many programmers, I rather like working from fixed assumptions and constraints, and building on top. This is kind of the nature of abstraction, don’t question the lower levels, take em as assumptions, don’t question em, build upon them.
This is the second time recently that’s led me astray into butting my head against a dead end wall repeatedly, assuming the problem was in my own implementation or understanding, instead of in the framework code I was using, or the published algorithm or explanation I was working from.
Sometimes you’ve got to start questioning the validity of the algorithm you’re working from or the correctness of the library/framework code you’re using sooner rather than later, to save yourself some time. However, do it privately, if you start questioning your dependencies in public without evidence, everyone’s probably (rightly) going to tell you “occam’s razor, the bug is probably in your code, not the dependency.”
The Fix
WRONG: return round(order + sign * seconds / 45000, 7)
RIGHT: return round( (order * sign) + (seconds / 45000), 7)
Is it obvious now that you see it, that the first one makes no sense, but the second one does? Maybe if you see it in context, here’s my ruby implementation of the corrected algorithm.
I feel kind of stupid for not noticing this right away; on the other hand, as far as I can find on google, nobody’s pointed out the typo bug before, and several have commented on the (wrong, typo buggy) algorithm.
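To make the difference between the two lines concrete, here is a small Python sketch that evaluates both expressions side by side. Only the two `return` expressions come from the post; the definitions of `order_of` and `sign_of` are assumptions inferred from the description of a sign-less logarithmic scale, and `seconds` is simply taken as "seconds since some fixed reference time". It says nothing about what reddit runs in production.

```python
from math import log10


def order_of(ups, downs):
    """Assumed definition: log10 of the absolute vote difference, at least 0."""
    return log10(max(abs(ups - downs), 1))


def sign_of(ups, downs):
    """Assumed definition: +1, 0 or -1 depending on the net vote."""
    diff = ups - downs
    return (diff > 0) - (diff < 0)


def rank_as_published(ups, downs, seconds):
    # The line marked WRONG above: the sign multiplies the time term.
    return round(order_of(ups, downs) + sign_of(ups, downs) * seconds / 45000, 7)


def rank_as_amended(ups, downs, seconds):
    # The line marked RIGHT above: the sign multiplies the vote term.
    return round(order_of(ups, downs) * sign_of(ups, downs) + seconds / 45000, 7)


t = 450000  # ten 45000-second units after the reference time
print(rank_as_published(10, 0, t))  # 11.0
print(rank_as_published(0, 10, t))  # -9.0
print(rank_as_published(0, 0, t))   # 0.0
print(rank_as_amended(10, 0, t))    # 11.0
print(rank_as_amended(0, 10, t))    # 9.0
print(rank_as_amended(0, 0, t))     # 10.0
```

Under these assumptions the published expression sends every net-negative story below zero and every zero-net story to exactly zero regardless of age, while the amended expression keeps all three stories within one unit of their position on the timeline.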
The Explanation of the Algorithm
With the typo corrected, it’s much easier to explain the algorithm. The crucial line, from my ruby version, with variables named how I think is clearer:
return (displacement * sign.to_f) + ( epoch_seconds(date) / 45000 )
It plots each story on a fixed timeline by post, and then displaces a story on the timeline by its votes. It uses only the vote difference between up and down for displacement, the total number of votes is irrelevant. First:
... + ( epoch_seconds(date) / 45000 )
This just plots each story on a fixed timeline, with distance between two stories always exactly proportional to difference between absolute posting time.
The `/45000` fixes the units of the timeline as “12.5 hour periods” (45000 seconds is 12.5 hours; 12 hours would be 43200 seconds), rather than seconds. This reduces the order of magnitude of the units by 4.5ish, making them conveniently less likely to overflow wherever you’re keeping them. But more importantly, choosing the units matters for how much displacement the actual votes will cause, making sure they match appropriately. Then:
(displacement * sign.to_f) + ....
Here’s our displacement. `displacement` is based on the vote difference (up – down), but on a logarithmic scale. The way the logarithmic scale is calculated, it loses the sign, so the sign just has to be added back in, so that net-down-votes will displace the story to be older on the timeline, and net-up-votes will displace the story to be newer on the timeline.
Why is a logarithmic scale used? Other explainers have said something like “to weight the first votes higher than the rest.” While it might have this effect because of reddit voters’ behavior, this is a misleading explanation. The algorithm pays no attention to which votes were made first, either in absolute chronological time or in sequence. It’s just vote-difference. “10 up, 1 down” has exactly the same effect as “100 up, 91 down” or “1000 up, 991 down”. And it doesn’t matter what order the ups and downs were placed.
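The point that only the vote difference matters is easy to check. This is a minimal Python sketch, assuming the displacement is the signed base-10 logarithm of the absolute vote difference (with differences of 0 and 1 both mapping to 0); the function name `displacement` mirrors the variable in the ruby line above, but this definition is an assumption, not the author's code.

```python
from math import log10


def displacement(ups, downs):
    """Signed log10 of the net vote; the vote totals never enter into it."""
    diff = ups - downs
    sign = (diff > 0) - (diff < 0)
    return sign * log10(max(abs(diff), 1))


# Three very different totals, one identical net vote of +9.
same = {displacement(10, 1), displacement(100, 91), displacement(1000, 991)}
print(len(same))  # 1

# A mirrored net vote gives the mirrored displacement.
print(displacement(1, 10) == -displacement(10, 1))  # True
```

Because the function only ever sees `ups - downs`, the order in which the votes arrived cannot influence the result either.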
The logarithmic scale is in fact used to prevent the displacement-from-voting from displacing the display order too much. Reddit doesn’t want a very high or low voted story to be months ahead or behind, the reddit ‘hot’ order is mostly chronological, with just some displacement from votes.
I don’t do this kind of mathematical analysis much, and don’t know how to get, say, R, to make you a pretty plot (it ought to be an actual function plot not a bar graph, for explanatory power). So I’ll just give you some samples of how much displacement a given vote-diff can get. Again, vote-diff is just ups minus downs; the total number of votes doesn’t matter. I’ve converted from the “12.5-hour units” the displacement is actually expressed in to more comprehensible ‘in hours’ units.
| vote-diff | displacement in hours |
| --- | --- |
| 0 | 0 |
| 1 | 0 |
| 2 | 3.8 |
| 3 | 6.0 |
| 4 | 7.5 |
| 5 | 8.7 |
| 6 | 9.7 |
| 7 | 10.6 |
| 8 | 11.3 |
| 9 | 11.9 |
| 10 | 12.5 |
| 100 | 25.0 |
| 1000 | 37.5 |
| 10000 | 50.0 |
As you see, even something with an absurdly high 10000 vote-diff only gets put 50 hours ahead of its usual place in the timeline. Likewise, if it had a -10000 vote-diff (10k more downvotes than upvotes), it would be only 50 hours behind its usual place in the timeline.
That’s what the log scale does: it keeps votes from changing the position of a story too much, whether by keeping it at the top forever or by moving it so many pages in that nobody ever sees it.
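The table above can be reproduced with a few lines of Python. This is only a minimal sketch of the displacement as this post describes it, not reddit's actual source. The function names (`displacement_units`, `displacement_hours`, `hot_score`) are made up for illustration, `seconds` stands for the story's position on the timeline in seconds from whatever epoch you pick, and one timeline unit is assumed to be 45000 seconds.

```python
import math

SECONDS_PER_UNIT = 45000  # the divisor from the formula; 45000 s is 12.5 hours


def displacement_units(ups, downs):
    """Signed log10 displacement, in timeline units, for a given vote-diff."""
    diff = ups - downs
    # Clamp the magnitude to 1 so a vote-diff of 0 or 1 gives no displacement
    magnitude = math.log10(max(abs(diff), 1))
    sign = (diff > 0) - (diff < 0)  # 1, 0, or -1
    return sign * magnitude


def displacement_hours(ups, downs):
    """The same displacement converted from timeline units to hours."""
    return displacement_units(ups, downs) * SECONDS_PER_UNIT / 3600


def hot_score(ups, downs, seconds):
    """Timeline position plus vote displacement (hypothetical helper)."""
    return displacement_units(ups, downs) + seconds / SECONDS_PER_UNIT


if __name__ == "__main__":
    for diff in (0, 1, 2, 10, 100, 10000):
        print(diff, round(displacement_hours(diff, 0), 1))
```

Because only `ups - downs` enters the calculation, `displacement_hours(10, 1)` and `displacement_hours(100, 91)` return the same value, which is the point made above about total vote count not mattering.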
And that scale pretty well matches what we reddit users actually observe on reddit. I went and checked it against some popular reddits. Reddit only displays the approximate posting time of a story as far as I can tell (“1 day ago” could mean 28 hours or 32 hours or whatever), so I can’t check completely, but the actual ordering could be explained by the corrected algorithm.
Wrong in the public source on github?
update 6:55pm ET 8 May 2012. reddit assures me that this code is what reddit runs live, and I have made some really stupid mistake. Fair enough. Struck out this section.
Unless I’m making some really stupid mistake, this typo-bug is present in the reddit source publicly shared on github as of time of this writing. [1], [2]
This means that there’s pretty much no way actual reddit.com is using the code they’ve posted publicly on github. At the very least, they’ve fixed this bug in the implementation they’re actually running.
It probably means nobody else is using the reddit github source either, cause it wouldn’t work right with ‘hot’ ranking. (Or someone else is using it, and fixed the bug in their source but didn’t send it back).
How did this bug end up in the publicly shared reddit source? Not fixed yet? I’m kind of curious, and curious as to what relationship this publicly shared source has to what reddit actually runs.
Considering tweaking the algorithm
Now that we understand the basic “timeline + displacement” algorithm, we can consider tweaks/modifications/tuning of the algorithm to behave differently in different environments. Curiosity about that was my original motivation for looking into this in the first place.
You might want vote displacement to have more of an effect, or have the effect trail off faster or slower. You’d still want to use a log-scale (or a mathematical function with similar properties) to keep very high vote-diffs from displacing a story too much; you still want a trail-off effect.
You could change the log from base-10 to base-something else to affect the velocity of the trail-off effect. You could also introduce a factor into the operand of the log, taking `log(factor * vote-diff)` instead of just `log(vote-diff)`. You potentially could change the units from 12-hour units to something else (the 45000 number), but that could get confusing quick, and you might need to add another factor on the left-hand side of the sum to compensate. So actually, instead, you can just add a factor on the left-hand side: `factor * log(vote-diff)` instead of just `log(vote-diff)`.
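The three knobs mentioned above could be sketched like this. It is a hypothetical helper, not anything from reddit's source: the name `tuned_displacement` and the parameters `base`, `inner_factor` and `outer_factor` are invented here, and the choice to clamp after applying the inner factor is an assumption.

```python
import math


def tuned_displacement(diff, base=10.0, inner_factor=1.0, outer_factor=1.0):
    """Signed displacement in timeline units with three hypothetical knobs.

    base          -- base of the logarithm (10 in the untweaked formula)
    inner_factor  -- multiplies the vote-diff inside the log
    outer_factor  -- multiplies the result of the log
    """
    # Clamp to 1 so the logarithm never goes negative (assumed behaviour)
    magnitude = max(abs(diff) * inner_factor, 1.0)
    sign = (diff > 0) - (diff < 0)  # 1, 0, or -1
    return sign * outer_factor * math.log(magnitude, base)
```

Two identities help when guessing what the knobs do. Since log(factor * d) = log(factor) + log(d), the inner factor mostly adds a constant offset rather than changing how fast the effect trails off. And since a log in base b equals log10 divided by log10(b), changing the base is equivalent to choosing an outer factor of 1 / log10(b).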
I’m not enough of a math guy to predict exactly what all those things would do. I’d want to actually plot the function in R (or something else) and see what it looks like when you change the various factors, and I don’t know enough R (or anything else) to do it. I think plotting vote-diff vs hours-of-displacement is the right thing to plot, though, to give you the right feedback.
You could also try to introduce something to the equation to take account of total number of votes, so “10 up, 1 down” and “100 up, 91 down” don’t have exactly the same effect. You’d want to base this somehow on the Wilson score confidence interval used by reddit for default comment ranking; that’s the right way to take account of total number of votes, but it’s not immediately clear to me where or how you’d introduce that into the equation (Did I mention I’m not a math guy?). That would make it a bit harder to see what it does by plotting it, since it’ll be a multi-variate function now, doh.
And you might not want to trust that the algorithm found in reddit’s public github source for Wilson score confidence interval is actually bug free. Last year someone said they found a bug in at least one published implementation; I think I saw someone say it had been fixed on reddit.com, but I don’t know if it’s been fixed in the github public source.
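For reference, here is a minimal sketch of the lower bound of the Wilson score interval, written from the standard textbook formula rather than copied from reddit's source. The function name and the default `z=1.96` (roughly 95% confidence) are assumptions, and the sketch does not try to answer where this would plug into the 'hot' equation.

```python
import math


def wilson_lower_bound(ups, downs, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of upvotes.

    z is the normal quantile for the chosen confidence level; 1.96 is an
    assumption here, not necessarily the value reddit uses.
    """
    n = ups + downs
    if n == 0:
        return 0.0  # no votes, no evidence either way
    phat = ups / n  # observed fraction of upvotes
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)
```

Unlike the vote-diff, this score does tell “10 up, 1 down” apart from “100 up, 91 down”: the first gets a noticeably higher lower bound, because nearly all of its votes are upvotes.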
You might also want to make upvotes worth more or less than downvotes, instead of equivalent. Not quite sure how you’d do that. You could make net-negative votes worth more or less than the same absolute value net-positive, just by using a factor in `factor * log(diff)` that depends on diff being positive or negative.
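That last idea, a factor that depends on the sign of the vote-diff, might look something like the following. Again this is just a sketch under stated assumptions: `asymmetric_displacement`, `up_factor` and `down_factor` are hypothetical names, and it only weights the net result, not individual up or down votes.

```python
import math


def asymmetric_displacement(diff, up_factor=1.0, down_factor=1.0):
    """Displacement in timeline units, scaled differently by sign of diff."""
    # Same clamped log10 magnitude as the plain displacement
    magnitude = math.log10(max(abs(diff), 1))
    if diff > 0:
        return up_factor * magnitude
    if diff < 0:
        return -down_factor * magnitude
    return 0.0
```

With `down_factor=2`, a story at -100 would be pushed back twice as far as a story at +100 is pushed forward.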
Filed under:
General
by jrochkind at May 08, 2012 04:18 PM
Presenters
- Judith Logan – Robarts Library, UTL
- Michelle Spence – Engineering & Computer Science Library, UTL
The Basics
- LibAnswers: User Knowledgebase FAQ database powered by Springshare
- Contact Information if question not answered
Implementation
- Designed to have one FAQ system per library, but too many libraries at UTL
- 3 libraries grouped together: Gerstein, OISE, Robarts
- launched Dec 2011
Training
- relied on Springshare’s training materials and FAQ
Workflow
- Questions come into system
- => access & information staff member reads and answers questions
- or assigns questions appropriate for other libraries/services
- send on to specific library if needed
Guidelines and Best Practices
- developed collaboratively
- ensure questions get answered in a timely manner
- ensure answers are up to date (each library check their questions)
- tips for writing for the web
- default settings/entering questions manually (private by default, so not in knowledgebase because frequently includes personal info)
- applicable to all libraries (in most cases)
- FAQ under Quick links
- E-mail contact link now goes to submission form to cut down on spam
- FAQ browse and search on Contact Us page
- Library FAQs button under every Ask Us chat – widget includes tag cloud and contact info
Statistics
- Knowledge Base Explorer that tracks public and private questions
- Query Spy tracks user interaction with the system
- Custom analysis queries
Typical Month
- 57% find an instant answer
- 13.5% receive an answer within one business day
- 30% do not find their answer (successive queries or outside scope of FAQ service)
- unanswered usually using the wrong search: searching for staff, database, or research question
Future
- analyze query spy data further
- integrate with other reference service vehicles
- promote as a resource for staff
- expand to suburban campuses and more St. George libraries
- create workflow to maintain currency and accuracy of articles
- enrich resources with multimedia (images & videos)
Filed under:
Events,
Library
by Cynthia at May 08, 2012 03:52 PM
Panelists
- Mandissa Arlain – RULA
- Monique Flaccavento – OISE Library, UTL
- Ricardo Laskaris – YorkU Libraries
- Fangmin Wang – RULA
- Jenaya Webb – OISE Library, UTL
Loaning Device
- Laptops
- iPads (with covers & cables) at OISE
- York also provides many other gadgets & accessories
- most 4 hour loans (York 1-4 days), restricted to university community
Marketing
- posters
- social media: twitter, facebook, blog
- LCD screens
- website
- branding of bags
Popularity
- iPad > laptop at OISE
- 12.5% of circulation stats at Ryerson
- laptops & iPads at York
Security
- sign waiver first time
- replacement fee for lost items
- personal data cleared by deep freeze software once powered down
- iPads cleared manually (~20 minutes each time) whenever returned
- theft reported to security & IT
- repairs sent to IT
Staffing Considerations
- training sessions for staff including hands on experience
- basic use and troubleshooting help
- technical support & issues to IT
- working group meeting to discuss issues
- chargers with devices
Financial Support
- education commons as pilot project at OISE
- library itself & one-time funding from provost office to upgrade at Ryerson
- library paid & some donations at York
- apps purchased with gift card so as not to associate credit card #
Software & Apps Selection
- laptop software same as what’s on desktop
- productivity apps e.g. Dropbox
- educational
- preferred free, but some money to purchase apps
Age & Replacement Schedule
- no formal refreshment cycle
- mostly depends on budget, try to repair existing laptops
- replacements determined by IT
User Feedback
- informally, anecdotal
- from student committee
- studies planned for future: focus groups, survey
Future Directions
- meeting demands, so unlikely to expand
- no money to expand
- future to encourage students to bring their own devices
Filed under:
Academic,
Events,
Technology
by Cynthia at May 08, 2012 02:55 PM
The next #OpenDataCBG meet-up will take place this Monday 14th May, at 7pm in the Panton Arms. Sign up now!

OpenDataCBG is back for its third bi-monthly meet-up!
The previous two meet-ups have been a huge success, with almost thirty people squeezing into the function room of the Panton Arms for an evening of talks, discussion and socialising.
On Monday 14th May we will gather in the Panton Arms from around 7pm, to get in a round of drinks before lightning talks kick-off at 7:30pm.
Give a talk
Confirmed to speak so far we have Tom Oinn, who will be giving a lively talk about Overtone, featuring a live demo of ‘things that change colour and go beep’.
There is still space for a couple more talks, so get in touch asap if you’d like to get involved.
Lightning talks are short 2-3 minute presentations on any topic related to open data. The talks are relaxed and informal, and anyone is welcome to join in! Contact laura.newman [@] okfn.org for more details.
Get involved
Whatever your interests – whether government, science, cultural heritage, hardware, design, transport, or something else entirely! – you are sure to find like-minded people eager to discuss your ideas and share their own.
Sign-up on our meet-up page and tweet using the #OpenDataCBG hashtag.
If you live in or near Cambridge, we hope to see you next week!
by Laura Newman at May 08, 2012 02:08 PM
We haven't done much with JSON in VIAF yet, but Ralph came up with a new feature for VIAF for which JSON seemed a natural fit: requesting a view of a VIAF cluster that just shows the links. Here's the JSON for this new view of the Mark Twain cluster, http://viaf.org/viaf/50566653/justlinks.json:
{ "viafID":"50566653",
"BAV":["ADV10188047"],
"BIBSYS":["x90056487"],
"BNE":["XX945992"],
"BNF":["http://catalogue.bnf.fr/ark:/12148/cb11927291n"],
"DNB":["118624822"],
"EGAXA":["vtls000823270"],
"JPG":["500020427"],
"LAC":["000002392","0105C0556"],
"LC":["340758"],
"NDL":["00459304"],
"NKC":["jn19981002263"],
"NLA":["000035028957"],
"NLIara":["000160730"],
"NLIcyr":["000156897"],
"NLIheb":["000175478"],
"NLIlat":["000133341"],
"NUKAT":["vtls000634910"],
"PTBNP":["46269"],
"RERO":["vtls000164542"],
"RSL":["nafpn-000083946"],
"SELIBR":["98057"],
"SUDOC":["027171876"],
"VLACC":["000002392"],
"WKP":"Mark_Twain"}
We return an array for each source because some may have multiple IDs (e.g. LAC in the example). The array labels are VIAF's abbreviations (e.g. http://viaf.org/authorityScheme/LAC) for each of the source files, with WKP standing for Wikipedia.
For those of you that prefer content-negotiation to mangling URIs, the mime-type 'application/vnd.oclc.links+json' should also work (the older 'application/json+links' should still work too).
Already built into VIAF is the ability to go from the source ID to the VIAF cluster: http://viaf.org/viaf/sourceID/SUDOC|027171876 will take you to the VIAF cluster http://viaf.org/viaf/50566653, and http://viaf.org/viaf/sourceID/SUDOC|027171876/justlinks.json will give you just the links for that page.
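As a rough illustration of consuming this view, here is a small Python sketch. It only builds the URLs described above and parses a trimmed copy of the Mark Twain response; it makes no network request. The helper names are invented for this example, and treating a bare string (as with WKP in the sample) as a one-element list is an assumption about how a client might want to normalise the data.

```python
import json
import urllib.request

VIAF_BASE = "http://viaf.org/viaf"
LINKS_MIME = "application/vnd.oclc.links+json"


def justlinks_url(viaf_id):
    """URL of the just-links JSON view for a VIAF cluster."""
    return f"{VIAF_BASE}/{viaf_id}/justlinks.json"


def justlinks_url_for_source(source, source_id):
    """URL of the just-links view looked up by a source ID."""
    return f"{VIAF_BASE}/sourceID/{source}|{source_id}/justlinks.json"


def negotiated_request(viaf_id):
    """Build (but do not send) a request that asks for the links via Accept."""
    return urllib.request.Request(f"{VIAF_BASE}/{viaf_id}",
                                  headers={"Accept": LINKS_MIME})


def parse_justlinks(text):
    """Return (viaf_id, {source: [ids]}) from a just-links JSON document."""
    data = json.loads(text)
    viaf_id = data.pop("viafID")
    links = {}
    for source, ids in data.items():
        # Normalise a bare string to a one-element list (assumed convenience)
        links[source] = [ids] if isinstance(ids, str) else list(ids)
    return viaf_id, links


# Trimmed copy of the sample response shown above
SAMPLE = ('{"viafID":"50566653", "LAC":["000002392","0105C0556"], '
          '"SUDOC":["027171876"], "WKP":"Mark_Twain"}')

if __name__ == "__main__":
    viaf_id, links = parse_justlinks(SAMPLE)
    for source, ids in sorted(links.items()):
        print(viaf_id, source, ids)
```

To actually fetch a cluster you would pass one of these URLs, or the request object, to `urllib.request.urlopen`; the `|` in the source-ID URL is left exactly as written above, and whether it needs percent-encoding for a given client is not something this sketch settles.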
If you want ALL the links out of VIAF then visit the VIAF Dataset page at http://viaf.org/viaf/data. It has a pointer to a file that lists each of the 27 million VIAF ID to source ID links.
We're having fun here at VIAF Central, especially Ralph LeVan, who thought of and implemented this!
--Th
Update: 2012.05.09 changed mime type (old one should still work)
by Thom at May 08, 2012 02:00 PM
The Loop’s Jim Dalrymple compiled the following numbers for the time it takes various tech sites to load in a browser in late 2011:
- The Loop: 38 requests; 38.66KB; 1.89 secs
- Daring Fireball: 23 requests; 49.82KB; 566 milliseconds
- Macworld: 130 requests; 338.32KB; 8.54 secs
- Ars Technica: 120 requests; 185.99KB; 2.08 secs
- Apple: 46 requests; 419KB; 1.39 secs
- CNN: 196 requests; 269.41KB; 4 secs
- BGR: 368 requests; 2.74MB; 35.33 secs
- AppleInsider: 141 requests; 649.39KB; 5.64 secs
- Facebook: 137 requests; 993.54KB; 11.19 secs
- MacStories: 119 requests; 2.16MB; 2.13 secs
John Gruber started this by calling out The Next Web for its slow performance:
- TheNextWeb: 342 requests; 6MB; no time info
More benchmarks can be seen at Browsermob.
by Casey Bisson at May 08, 2012 01:36 PM
The Open Knowledge Foundation are currently recruiting for a Data Wrangler and a Data Visualisation Developer. If you’d like to join our team, please visit our jobs page.

At the Open Knowledge Foundation, we build tools and communities to create, use and share open knowledge – and to help others to do the same. In recent months, we have become involved in a growing number of open data projects, and two new positions have now been created within our team.
We are seeking two data experts to join us as a Data Wrangler and a Data Visualisation Developer. Read on to find out more about what the roles involve.
Data Wrangler
We’re looking for a data wrangler who is excited to tell stories through data. You will work on various datasets, to understand them and to tell their story to a broader audience. You will also be involved in training efforts, creating and teaching courses in data analysis to technical and non-technical audiences.
Your role will be exciting and varied, and will include:
- Work on the School of Data, building learning challenges and course content (see our previous post for more information on the School)
- Research for our new data blog, coming soon.
- Collaborations with our Working Groups, for example the Working Group on Open Economics
- Work on OpenSpending, one of our flagship projects.
Skills
We are open to people from a wide variety of backgrounds; whether coding, visualisation, journalistic, statistical or otherwise. We are seeking someone who has:
- Experience in data analysis and statistical methods
- Experience with data cleansing, ETL patterns
- Good written communication skills
- Experience with R/Stata/SPSS
- Coding skill in a modern script language, e.g. Python, Javascript.
- Basic skills in information/data visualization
If that sounds like you, please visit our jobs page to find out more.
Data Visualisation Developer
As a Data Visualisation Developer, much of your time will be spent on our flagship OpenSpending project.
OpenSpending is about mapping the money. We want to make government finances accessible to advocates, journalists and citizens. Our goal is to collect budgeting information from across the world and to present it in a form that promotes understanding, analysis and participation. Some of the questions we ask are:
- How much is government spending on health? Is expenditure growing or shrinking? How does this translate into results?
- What are the proportions of different government programmes? What is spending on prisons compared to schools? How much is Ghana spending on education compared to Nigeria?
- How much tax do I pay into which area of government?
Our day-to-day work has many facets. We work on the core platform, undertake journalistic projects as part of “Spending Stories”, which won the Knight News Challenge in 2011, and work with organizations and civic activists world-wide to set up local budget transparency projects.
Your role with us
You’ll help us to create new visualizations to answer spending questions through meaningful, visual narration.
Skills we’re looking for:
- Strong visual design skills
- HTML5/Javascript visualisation experience
- Familiarity with several visualization toolkits (e.g. D3, Raphael)
- Experience with cross-browser compatibility
- Plus (but optional): Knowledge of Python
Basically: send us some demos of good stuff you’ve done.
Come and join us!
For more information, please email jobs [@] okfn.org.
Applicants should send a CV and covering letter/email to jobs [@] okfn.org, highlighting their relevant skills and suitability for the job.
If you’re interested in the Data Visualisation job, we would also be keen to see some demos of your work.
by Laura Newman at May 08, 2012 09:30 AM
This is Hat Rack, by Marcel Duchamp, at the Art Institute of Chicago. The label dates it at 1964 and adds "(1916 original now lost)."
Here is the signature on the bottom:
Seeing Duchamp's work in any gallery is a joy.
by wtd at May 08, 2012 12:29 AM
May 07, 2012
Kris Carpenter Negulescu of the Internet Archive and I organized a half-day workshop on the problems of harvesting and preserving the future Web during the International Internet Preservation Consortium General Assembly 2012 at the Library of Congress. My involvement was spurred by my long-time interest in the evolution of the Web from a collection of linked documents whose primary language was HTML to a programming environment whose primary language is Javascript.
In preparation for the workshop Kris & I, with help from staff at the Internet Archive, put together a list of 13 areas already causing problems for Web preservation:
- Database driven features
- Complex/variable URI formats
- Dynamically generated URIs
- Rich, streamed media
- Incremental display mechanisms
- Form-filling
- Multi-sourced, embedded content
- Dynamic login, user-sensitive embeds
- User agent adaptation
- Exclusions (robots.txt, user-agent, ...)
- Exclusion by design
- Server-side scripts, RPCs
- HTML5
A forthcoming document will elaborate with examples on this list and the other issues identified at the workshop. Some partial solutions are already being worked on. For example, Google, the Institut national de l'audiovisuel in France, and the Internet Archive among others have active programs involving executing the content they collect using "headless browsers" such as Phantom JS.
But the clear message from the workshop is that the old goal of preserving the user experience of the Web is no longer possible. The best we can aim for is to preserve a user experience, and even that may in many cases be out of reach. An interesting example of why this is so is described in an article on A/B testing in Wired. It explains how web sites run experiments on their users, continually presenting them with randomly selected combinations of small changes as part of a testing program:
Use of a technique called multivariate testing, in which myriad A/B tests essentially run simultaneously in as many combinations as possible, means that the percentage of users getting some kind of tweak may well approach 100 percent, making “the Google search experience” a sort of Platonic ideal: never encountered directly but glimpsed only through imperfect derivations and variations.
It isn't just that one user's experience differs from another's. The user can never step into the same river twice. Even if we can capture and replay the experience of stepping into it once, the next time will be different, and the differences may be meaningful, or random perturbations. We need to re-think the whole idea of preservation.
by David. (noreply@blogger.com) at May 07, 2012 10:00 PM
v0.5.0
- Extensive rewrite of MARC::Reader (ISO 2709 binary reader) to
provide a fairly complete and consistent handling of char encoding
issues in ruby 1.9.
- This code is well covered by automated tests, but ends up complex,
there may be bugs, please report them.
- May not work properly under jruby with non-unicode source
encodings.
- Still can't handle Marc8 encoding.
- May not have entirely backwards compatible behavior with regard to
char encodings under ruby 1.9.x as previous 0.4.x versions. Test
your code. In particular, previous versions may have automatically
_transcoded_ non-unicode encodings to UTF-8 for you. This version
will not do so unless you ask it to with correct arguments.
`gem install ruby-marc -v 0.5.0`
https://github.com/ruby-marc/ruby-marc
by jrochkind at May 07, 2012 06:50 PM
As a web based system, Koha becomes inaccessible if you lose your Internet connection. That doesn’t mean you can’t continue to handle the functions of your circulation desk. The Koha Offline Circulation Tool developed by Kyle Hall at Mill Run Technology Solutions is a handy way to circulate items when you’re offline and this tutorial will walk you through the process.
If you have an idea for a video, please just let me know and I’ll add it to my list of things to record.
by Nicole at May 07, 2012 03:00 PM
Twitter front-end guy Nicolas Gallagher likes both CSS and speech bubbles enough to want them unadulterated by images and non-semantic markup. The lesson from his many examples is that it all comes down to an :after pseudo element that puts the little triangle in there:
.speechbubble:after {
content:"";
position:absolute;
bottom:-15px; /* value = - border-top-width - border-bottom-width */
left:50px; /* controls horizontal position */
border-width:15px 15px 0; /* vary these values to change the angle of the vertex */
border-style:solid;
border-color:#f3961c transparent;
/* reduce the damage in FF3.0 */
display:block;
width:0;
}
More examples on Nicolas’ site.
by Casey Bisson at May 07, 2012 02:20 PM
News from LC.
The MADS 2.0 User Guidelines http://www.loc.gov/standards/mads/userguide/index.html are now available on the Library of Congress' MADS Web site: http://www.loc.gov/mads, along with the XML schema itself, an Outline of Elements and Attributes, and a mapping and XSLT from the MARC 21 Authority Format to MADS 2.0.
by David (noreply@blogger.com) at May 07, 2012 10:37 AM
Crowdsourcing cataloging at the Bodleian Library.
What's the Score at the Bodleian? is a project which aims to enlist the wider community's help in describing a selection of digitised scores from the Bodleian Library's extensive music collections, thereby facilitating access to valuable and interesting material which has not been catalogued and is therefore difficult to find. The approach is two-fold in that it combines a process of rapid digitization of the scores and the creation of descriptive metadata through crowd-sourcing, and it is hoped that the outcomes of the project can be used to inform an efficient yet cost-effective approach to creating access to other music-related material in the Bodleian in the future. It is hoped that there will also be scope in the final delivery of images and crowd-sourced data for additional enhancements such as the hosting of audio performances relating to the music scores and provision of external links to video performances.
My feeling is that for some material this makes sense. For items that may take years or decades to fully catalog, this may be a good interim solution. Or for items of low importance that may never get described, some metadata is better than none. I'm reminded of the 4 levels of access and description once proposed. Most stuff, of little importance, is indexed by search engines. More important stuff gets some metadata, like PDF and Word description fields. Materials of still more importance get Qualified Dublin Core or something on that level. The most important get full treatment by a trained professional: FGDC, MARC/RDA/ISBD, MODS, whatever standard fits. Crowdsourcing could move materials at the search index level up a level or two. It would improve access without using lots of resources.
by David (noreply@blogger.com) at May 07, 2012 10:34 AM
Altmetrics is hitting its stride: 30 months after the Altmetrics manifesto, there are 6 tools listed. This is great news!
I tried out the beta of a new commercial tool, The Altmetric Explorer, from Altmetric.com. They are building on the success and ideas of the academic and non-profit community (but not formally associated with Altmetrics.org). The Altmetric Explorer gives overviews of articles and journals by their social media mentions. You can filter by publisher, journal, subject, source, etc. The Altmetric Explorer is in closed beta, but you can try the basic functionality on articles with their open tool, the PLoS Impact explorer.

"The default view shows the articles mentioned most frequently in all sources, from all journals. Various filters are available.

Rolling over the donut shows which sources (Twitter, blogs, ...) an article was mentioned in.

Sparklines can be used to compare journals.

A 'people' tab lets you look at individual messages. Rolling over the photo or avatar shows the poster's profile.
Altmetric.com seems largely aimed at publishers. This may add promotional noise, not unlike coercive citation, if it is used as an evaluation metric as they suggest:
Want to see which journals have improved their profile in social media or with a particular news outlet?
Their API is currently free for non-commercial use. Altmetric.com have been crawling Twitter since July 2011 and focusing on papers with PubMed, arXiv, and DOI identifiers. They also get data from Facebook, Google+, and blogs, but they don’t disclose how. (I assume that blogs using ResearchBlogging code are crawled, for instance.)
by jodi at May 07, 2012 09:18 AM
May 06, 2012
I’m going to focus on some highlights, rather than rehashing the entire Library Journal Design Institute, but overall it was a timely, highly worthwhile event, a solid mix of panel sessions and interactive problem-solving sessions. Most of the attendees were from public libraries, but there were a few academics, and the ones I spoke with were in agreement that academic librarians can learn a lot from studying public library design (not just facilities, either, but services as well).
The informal theme of this institute — I think I heard LJ has done about 20 of these? — is, in Joseph Sanchez’s terms, “library as question mark.” Sanchez, from the Auraria Library at the University of Colorado, was on an opening panel where he and Matt Hamilton from Anythink Libraries talked about the impact of changes in the reading ecology on how library space is used, with a lot of conversation about users creating digital content. Traci Lesneski from MS&R talked about the library as extrovert: more transparent, more visible — a point that resonated as I thought about our library becoming more proactively welcoming.
Nevertheless, for all the talk about content creation, library gardens, gaming, and so on, implicit in all the sessions that day was the idea that when users walk into a library, they want to see people and products (versus wandering into an empty space – I saw this at a fairly new university library where my first thought was that the first-floor lobby was a missed opportunity).
Those products will probably include books, but can also include DVDs and other media. In some cases, the users themselves may be the attraction, on display as they create, browse, and read (not unlike watching the pizza maker twirl his dough). And build in a visible location for a helpful human presence — call it a librarian or library worker, but I hear the word “concierge” a lot these days (waving at West Hollywood!), and think that’s a good fit for that role.
There were tours the previous day which my travel schedule didn’t let me attend, but I did get the tour of Denver Public Library, which for me had several ah-hah moments. As one librarian, a facility manager, observed, I got the tour I needed. It’s a midcentury building about the same age as my library, and it had a renovation and expansion in 1990 led by Michael Graves. So their challenge was to preserve an iconic look and feel while bringing the library into the technology era. I don’t have those challenges per se, but renovating a pre-technology building with “good bones” is certainly relevant.

Two Benches, Paired
Plus I saw Michael Graves benches scattered about DPL, and thought, Perfect. Benches. Which leads (rather loosely, like a dog galloping ahead of its owner) to a point made at the Institute: the project lead for a library design needs to be outgoing and friendly, but also firm. That describes me to a tee on my best days. (I will refrain from commenting on what I’m like on my worst days.)
The leader must also have strong and well-communicated ideas and opinions (like, those benches are a great fit) but be flexible. One strong idea I had early on (courtesy of Linda Demmers, a bit more on her below) is that our library would absolutely need a thorough facility inspection before any other design activity moved forward (with the exception of the computer classroom), and that’s wrapping up as I write this. (By the way, who knew there were so many ways to use asbestos?)
I was right, and sticking to my guns was the right thing to do. It doesn’t mean I’m always right, or that, when I know I’m right, I always stick to my guns, but when I’ve got the Greek Chorus of Experienced Library Administrators chanting “You’ll be sorry if you don’t do that,” I do try to do the right thing.
I realized as well that I am beginning to synthesize and organize my ad-hoc education in library facilities. During one interactive session, people discussed how to guide users through a library. “Power paths,” I piped up. I had seen an example of these in a public library when Linda Demmers came to our library for a day-long consulting visit and did a slideshow. Think of how Ikea pulls you through a store.
There are many ways to design a path, and they don’t have to be underfoot, either — think of that part of O’Hare where you are guided through a connector by undulating lights. The group liked the power-path idea, as did the architects (for whom it wasn’t a new concept, as became apparent when they pulled out their drawing, which sure enough featured a power path).
I’m also getting to the point that I feel I at least know the major vendors in the field and can pick out an Agati easy chair at 20 paces, plus figure out whether that media desk is from Steelcase or KI. But even more significantly, I’m feeling the landscape of this knowledge area and beginning to understand where my rather significant gaps are — for example, sustainability.
Most of all I realized I’m catching the facility bug. There was a time when a renovation or new-building project only interested me in the most abstract, utilitarian manner possible. I have even felt relief that my career had not overlapped with anything more involved than upgrading computers and so forth. Now I’m genuinely excited to be on this journey; it is a big part of what puts spring in my step as I walk into the library every day.
Anyway, the following is just a pastiche of ideas gleaned from the day (sans synthesis, but with a bonus digression or two). LJ also encouraged everyone to see the LandMark Libraries discussed in their May 15 issue, and to watch for the July 1 issue for academic libraries.
- Content creation is more than just about digital experience — it can include visual, applied, and performance arts; crafts; library gardens; etc.
- Using local materials roots the building in the community
- IT costs are hard to quantify because you always want more
- Under-carpet wiring has improved a lot (aside: I remember dealing with an under-carpet wiring issue over a decade ago that was originally presented as an electricity shortage; staff were actually taking turns using computers because there “wasn’t enough electricity.” My dad was an electrician, among other things, and that popped my B.S. flag. Sure enough, there was plenty of electricity, once it had wiring to flow through).
- Openness and flexibility can interfere with comfort. Broad open spaces don’t make us feel comfortable, and don’t make us want to linger. Look at creating rooms within rooms. (We have these spaces under stairwells I’ve wanted to equip with easy chairs, small rugs, and hanging lamps. Maybe this will help me find the time to do this.)
- Lighting: libraries tend to use the same lighting everywhere in public areas. Focus more on task lighting — it’s more flexible. The brightest light is not the light you always need or want.
- A good design must be founded on sustainable principles (this is one of the Landmark Library guidelines).
- Develop a strong statement that establishes guiding principles for the project. People may come and go from all parts of the project; the project needs coherent, continuous direction.
- Yes, you need good signage: big, simple, and clear.
- Question assumptions.
- Get big results from small decisions.
- Bring in the light (walking through the Denver airport where there was a display on city architecture, I learned that’s called “daylighting”).
- Think privacy, but think collaboration.
- Rediscover quiet (amen on that one: one of my summer projects is to visibly zone the library into noise levels).
- Respect history.
- In comparing construction costs, be sure you’re comparing apples to apples.
- Keep your facilities people involved.
- There is no greener building than the building that already exists.
- Don’t build a 15-year building; don’t make them so cheap that they can’t last a long time.
- There is no value in value engineering. “Value engineering” only prolongs the problem; it will end up costing you more. N.b. those comments intrigued me even as they resonated. Worth hunting down relevant articles.
- Hiring an architect with a lot of library experience is a cost-cutting exercise; it’s incredibly important to the success of this project.
- Design spaces that can be used for a variety of purposes and at different times.
- Look at your existing assets and find ways to leverage them better.
- Drywall on masonry is putting 15-year material on top of 100-year material.
- Sustainability is ultimately about using less.
- A great location makes it much easier to get people in the library.
- Be buyer-aware about the people who are going to use your money.
- Integrated project delivery is the newest PM approach; the whole team is put together at the beginning. Instead of silos, the team is more like a studio.
- Watch out for project soft costs: construction, land, shelving, furniture, technology, infrastructure for wiring for phones, computers, etc.; adding more books, collection development, research to start the project, site surveys, geotech consultants, lawyers for reviewing contracts, moving costs (especially if doing the project in phases), developing a phasing plan.
- Costing sustainability: don’t guess, bring an energy analyst on board to work with the design team to evaluate cost decisions.
- LEED isn’t the be-all end-all of sustainability, and can be an expensive, difficult direction. You can design a sustainable library you’re proud of even if you don’t get a plaque. (That said, I bet a workshop or class about LEED would be a great introduction to sustainability in construction.)
- Nobody said this, I just thought it: “iconic” seems to be a synonym for “expensive and difficult to renovate.”
So yes, it was worth it. As with most activities, networking with others was a major part of the experience. There’s a lot of wisdom in LibraryLand, and some of it is translated into building and updating enduring and beloved landmarks.
Finally, this is an area where I want to grow. As noted on Facebook and Twitter, at the closing reception I asked a library director, how can I learn more about library building projects? There’s so much to know! Him: um… by reading books? OH RIGHT, BOOKS. I’ve found two to start with (and I visited Donald Barclay’s library and he knows whereof he speaks); recommendations welcome.
by K.G. Schneider at May 06, 2012 11:55 PM
Inspired by the Summon result click stats that Matthew Reidsma has extracted (and, to be honest, I find myself being regularly inspired by what Matthew's doing!), I've started tracking the clicks on our Summon instance too.
Anyone who's had the misfortune to hear me present recently will know I've been waffling on about the importance of making e-resources easy to use and painless to access, and the fact that most of us are biologically programmed to follow the easiest route to information…
…an information [seeker] will tend to use the most convenient search method, in the least exacting mode available. Information seeking behaviour stops as soon as minimally acceptable results are found.
Wikipedia, Principle of least effort
Why will our students not get up and walk a hundred meters to access a key journal article in the library? … the overwhelming propensity of most people is to invest as absolutely little effort into information seeking as they possibly can.
Prof Marcia J. Bates, "Toward an Integrated Model of Information Seeking & Searching" (2002)
…numerous studies have shown users are often willing to sacrifice information quality for accessibility. This fast food approach to information consumption drives librarians crazy. "Our information is healthier and tastes better too" they shout. But nobody listens. We're too busy Googling.
Peter Morville, "Ambient Findability" (O'Reilly 2005)
As early as 2004, in a focus group for one of my research studies, a college freshman bemoaned, "Why is Google so easy and the library so hard?"
Carol Tenopir, "Visualize the Perfect Search" (Library Journal 2009)
The present findings indicated that the principle of least effort prevailed in the respondents' selection and use of information sources.
Liu & Yang, "Factors Influencing Distance-Education Graduate Students' Use of Information Sources: A User Study" (2004)
People do not just use information that is easy to find; they even use information that they know to be of poor quality and less reliable — so long as it requires little effort to find — rather than using information they know to be of high quality and reliable, though harder to find.
Jason Vaughan, "Web Scale Discovery Services" (ALA TechSource 2011)
If you're looking at Discovery Services, demand a trial and don't get distracted by how many options the advanced search page has, how well it handles complex Boolean queries, or how many obscure specialist subject headings it supports — to misquote Obi-Wan Kenobi, "these are not the features you are looking for". The real questions you should be asking are:
- Can students use the skills they've already picked up from a lifetime of searching Google to use this thing?
- If I pluck 2 or 3 vaguely relevant keywords out of the air and type them in (possibly misspelling them), do I get useful and relevant results?
- If I choose some slightly more carefully considered keywords, are the first 5 results on the first page all relevant?
- Does the interface look uncluttered, straightforward to use and, if I wanted to, is it obvious how to refine the search?
- Does this product work with EZProxy (or similar) to provide easy off-campus access to articles?
…in fact, and please don't take this the wrong way, you're possibly not the best person to be answering some of those questions as your neural pathways have been severely damaged by years of using poorly designed journal database interfaces and you have an unhealthy (bordering on the sexually perverse) obsession with "advanced" search pages.
Instead, grab some of your newest students (ideally ones who look blankly at you when you ask them if they know what a Boolean operator is) and let them play with it — the more Information Illiterate they are, the better! Treat their comments as pearls of wisdom ("out of the mouth of babes…") and try to see the library's e-resource world through their eyes for what it really is: a scary alien landscape of weird library terminology, perplexing login screens, and unnecessary friction at every turn. Above all, never forget that "Libraries are a cruel mistress"!
Matt Borg nicely summed up the above when he cheekily said (and apologies for paraphrasing you, Matt!)…
The trouble with Summon is that students don't need to be taught how to use it, but librarians do
In other words, you shouldn't have to be an Information Professional to use a Discovery Service and you don't have to become a mini-librarian just in order to figure out how the damn thing works. If the interface looks comfortable and familiar to you, it's probably been designed for librarians to use and will scare the bejebus out of most of your students. Swallow hard, gird your loins and remember that you're not buying this product to make your life easier (although chances are it will), you're buying it to make life easier for your users.
Or, to put it another way, if a Discovery Service looks like a journal database and acts like a journal database, then it probably is a journal database and not a Discovery Service. There's a very good reason Summon looks more like Google and less like <insert name of your favourite database here>
(If your idea of a "good time" is to scare undergraduates in training sessions by showing them journal database interfaces — "it's OK, I'm a friendly librarian and I'm here to show you just how hard it can be to find an article!" — then it's probably high time you sought medical counselling.)
OK, so why am I ranting on about all this stuff? It's simply because I've been pulling out some usage stats from our Summon instance…
- The library's print collection accounts for just 0.3% of the items, but accounts for 10.3% of the result clicks — I think our users are trying to tell us that they think our OPAC sucks and they'd rather use Summon to search for books
- 89% of the results clicked on appeared on the first page of results — as with Google, users rarely delve any further than page 1 of the results
- Only 2% of result clicks came from beyond the 4th page of the results — very few users will explore the long tail of results
- 50.5% of result clicks were for the first 4 results on page 1 — the majority of users won't even bother to scroll down the page!
- 72.3% of searches used 3 keywords or less — students are using their Google skills
- Since launching Summon, we've seen increases of 300% to 1000% in the COUNTER full-text download stats for many of the journal platforms we subscribe to — although "cost per use" can be a crude measure, we're getting much better value out of our e-resource subscriptions now
All of the above tells me that Summon is doing all the things we originally bought it for and that the relevancy ranking is schmokin'!
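For anyone wanting to work out similar percentages from their own click data, the arithmetic can be sketched roughly as below. This is only an illustration: the record shape (`page`, `position`), the `shareOfClicks` helper and the sample clicks are all hypothetical, and none of it reflects how the figures above were actually produced.

```javascript
// Hypothetical click records: which results page was showing, and which
// position on that page was clicked. Not real Summon data.
var sampleClicks = [
  { page: 1, position: 1 },
  { page: 1, position: 3 },
  { page: 1, position: 7 },
  { page: 2, position: 2 }
];

// Percentage (to one decimal place) of clicks that satisfy the given test.
function shareOfClicks(clicks, predicate) {
  if (clicks.length === 0) {
    return 0; // avoid dividing by zero when nothing has been logged yet
  }
  var hits = clicks.filter(predicate).length;
  return Math.round((hits / clicks.length) * 1000) / 10;
}

var firstPage = shareOfClicks(sampleClicks, function (c) {
  return c.page === 1;
});
var topFour = shareOfClicks(sampleClicks, function (c) {
  return c.page === 1 && c.position <= 4;
});

console.log(firstPage + '% on page 1, ' + topFour + '% in the first 4 results');
```

Keeping the test as a separate function means the same helper can answer "beyond the 4th page" or "print collection only" style questions, provided the log records the relevant field.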
"Yes", there's still a place for Information Literacy in all of this, and, "yes", we need to be able to support researchers and Boolean Buffs, but the majority of students just want to whack in a few keywords and quickly find something that's relevant — if you select a product that allows them to do just that, they will come
by Dave Pattern at May 06, 2012 09:55 AM
A tabletop storytelling interface called a Narration Negotiation and Reconciliation Table allows disagreements to be visually represented:
Points of Disagreement… can be dragged onto any part of a story to explicitly denote disagreement without preventing the story from continuing.
From A Reflection on Using Technology for Reconciliation through Co-Narration (PDF) by Oliviero Stock, Massimo Zancanaro of FBK-irst, Italy and Chaya Koren, Zvi Eisikovitz, Patrice L. (Tamar) Weiss of University of Haifa, Israel. In the CHI2012 HCI for Peace workshop.
The multitouch table interface was tested for peace reconciliation work with Israeli-Jewish and Palestinian-Arab teen boys.
I’d love a screenshot. Quick searching turned up a project description and an (unrelated) discussion of the role of narrative in reconciliation. I excerpt:
The textbooks juxtaposed both historical narratives on the same page: on the right side of the page, the Israeli narrative began with the birth of Zionism in the 19th century; on the left, the Palestinian narrative commenced with Napoleon’s plans to establish a Jewish state in Palestine. Historical events faced off like soldiers in trenches; and while students were scrutinizing their positions, they were simultaneously recognizing their own involvement in the conflict. This, of course, was an intended pedagogical tool carefully thought out by the authors of the book.
From Political Reconciliation and Narrative Negotiation (PDF): by Nadim Khoury of the Department of Politics at the University of Virginia.
This points out the obvious: reconciliation first requires understanding and externally representing the disagreements. Rooting out the disagreement in mundane situations discussed online, and providing representations for them, are a big part of my current work.
by jodi at May 06, 2012 08:11 AM
Touché is a new sensing technology that proposes a novel Swept Frequency Capacitive Sensing technique that can not only detect a touch event, but simultaneously recognize complex configurations of the human hands and body during touch interaction. This significantly enhances touch interaction in a broad range of applications, from enhancing conventional touchscreens to designing interaction scenarios for unique use contexts and materials. For example, in our explorations we added complex touch and gesture sensitivity not only to computing devices and everyday objects, but also to the human body and liquids. Importantly, instrumenting objects and material with touch sensitivity is easy and straightforward: a single wire is sufficient to make objects and environments touch and gesture sensitive.
(via @genebecker)
by Fiacre at May 06, 2012 03:17 AM
May 05, 2012
Having gone 34 years without being a coffee drinker, I personally never got why people wanted coffee shops in libraries. But over the last year, my wife and Greenhill Farms, a Kona Coffee Grower in Hawaii, convinced me that not all coffee is bad. I’m so convinced, that having a morning cup of coffee (black, no sugar – yuck) has become a bit of a habit.
Well, this morning, I was hanging out in Vancouver, BC killing time before heading to the airport. Since I didn’t have anything to do, I grabbed my copy of Norman Mailer’s “The Castle in the Forest” and headed a couple blocks down the road to Tim Horton’s. There, I grabbed a medium cup of black coffee, and found myself a quiet table to just sit and read. And I have to admit, kicking back, nursing my cup of coffee and enjoying a good book was really appealing. Without realizing it, I’d spent about an hour and a half in my little corner of the coffee shop. I think I now definitely understand the draw.
Maybe now that I’ve had this breakthrough, I’ll be able to unwrap other mysteries – like why people enjoy watching talk shows and reality TV, who actually liked the show Firefly and why (because as a sci-fi fan, I don’t get it), the infatuation with Dr. Pepper, and why my cats always look like they are plotting to kill me in my sleep.
–tr
by Administrator at May 05, 2012 08:09 PM
OK, I'll admit it, I've fallen in love with jQuery over the last 18 months
I've ended up using quite a bit of jQuery in our new reading list software ("MyReading"), to add various bells and whistles, including dropping an "add to MyReading" option into the Summon interface.
Like they say, "when you've got a hammer, everything looks like a nail", once you know a bit of jQuery, every web page looks hackable, so I've been pondering what else might be fun and/or sensible to do. To be honest, I really like the Summon interface, so making any major changes to it feels a bit like drawing a moustache on the Mona Lisa (or Mr Graham Stone, for that matter).
So, rather than hack the interface around too much, you could use jQuery to start collecting usage data from Summon ("hmmmm… [drool] usage data!")…
…or maybe add a helpful hint if a search brings back a silly number of results?

To do the above, you'll need to host a JavaScript file on your own web server and then include a link to that file in the Summon Administration options (using the "Custom Link" option).

Because Summon already uses jQuery, it means you can put jQuery code into your JavaScript file without having to worry about loading the jQuery library yourself. To do the above helpful hint, you could use the following 7 lines of code:
$(document).ready(function() {
  var count = $('#summary .highlight:last').html();
  count = count.replace(/[^0-9]/g,'');
  if( count > 50000 ) {
    $('#summary').append('<div style="margin-top:5px;"><span id="refineSearchHelp" style="display:none; font-style:italic;">Too many results? Use the options below to refine your search...</span> </div>');
    $('#refineSearchHelp').delay(1000).fadeIn(1000);
} });
Let's walk through each of those lines…
line 1
Typically, you don't want your jQuery JavaScript to run until the web page has finished loading, so you'll often see this line of code — it ensures what follows won't be executed until after the web page has loaded. If you've coded JavaScript before, you'll probably be familiar with using the onload event in the body HTML tag to do that.
line 2
jQuery lets you easily grab bits of the web page, typically by referencing id attributes (which should be unique) and/or class attributes (which can be repeated). In the same way that CSS uses "#" and "." to style ids and classes, jQuery uses them to select elements of the page.
If you hunt through the source of a Summon results page, you'll find something like the following bit of HTML…
<h1 id="summary">
<span class="label">Search Results:</span>
Your search for
<span class='highlight'>germany</span>
returned
<span class='highlight'>3,892,793</span>
results
</h1>
…so, the number of results (3,892,793) appears in a span with a class value of highlight, which itself is inside a h1 with an id of summary. Unfortunately, there's another span that also has the same class value before it, so we need to use :last in the jQuery to make sure we fetch the HTML contents of the second (i.e. last) span.
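To see which piece of text that selector ends up with, here is a rough plain-JavaScript stand-in that works on a string copy of the HTML above. It is not jQuery and not how jQuery does it internally; the `lastHighlightText` function and its regular expression are purely illustrative assumptions, and a regex like this is only good enough for a fixed snippet such as this one.

```javascript
// Illustrative stand-in for $('#summary .highlight:last').html():
// walk through every span with class "highlight" and keep the last one found.
function lastHighlightText(html) {
  var pattern = /<span class=['"]highlight['"]>([\s\S]*?)<\/span>/g;
  var last = null;
  var match;
  while ((match = pattern.exec(html)) !== null) {
    last = match[1]; // overwrite on each match, so the final value wins
  }
  return last; // null if there was no matching span at all
}

// A string copy of the results summary shown above.
var summaryHtml =
  '<h1 id="summary">' +
  '<span class="label">Search Results:</span> Your search for ' +
  "<span class='highlight'>germany</span> returned " +
  "<span class='highlight'>3,892,793</span> results</h1>";

console.log(lastHighlightText(summaryHtml)); // 3,892,793
```

The first highlight span (the search term) is matched and then overwritten by the second, which is the behaviour :last is relied on for here.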
line 3
OK, at this point, we should have a JavaScript variable named count that contains the string 3,892,793, so this line strips out the commas (in fact, it strips out anything that isn't a digit), which should leave count containing 3892793.
line 4
How many results is too many results? Let's say we'll display the message for anything more than 50,000 results…
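Lines 3 and 4 compare a string of digits against a number and rely on JavaScript converting the string during the comparison. A slightly more defensive sketch of the same check is below; the `isTooManyResults` name and the choice to return false when no digits are found are assumptions made for illustration, not part of the original snippet.

```javascript
// Decide whether a results count such as "3,892,793" is over the threshold.
// The conversion to a number is made explicit rather than left to the comparison.
function isTooManyResults(countText, threshold) {
  // Strip anything that isn't a digit, as line 3 does; tolerate missing text.
  var digits = String(countText || '').replace(/[^0-9]/g, '');
  if (digits === '') {
    return false; // nothing to go on, so don't show the hint
  }
  return parseInt(digits, 10) > threshold;
}

console.log(isTooManyResults('3,892,793', 50000)); // true
console.log(isTooManyResults('50,000', 50000));    // false
console.log(isTooManyResults(undefined, 50000));   // false
```

The guard for missing text matters if the page markup changes and the selector on line 2 stops matching anything, in which case calling replace on the result would otherwise throw an error.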
line 5
Time for some more jQuery!
jQuery lets you add new bits of HTML to a page, so let's create a new div — that will appear underneath the results summary message — by appending it to that existing h1. Just to show off, we're going to have the helpful hint gradually fade in, so we'll pop the text within its own span that has an id value of refineSearchHelp and we'll style it so it's initially hidden (display:none).
In case you're wondering, I added that space character just so that the div contains something to start off with, which should ensure the page doesn't suddenly jump as the hint fades in.
line 6
So, now that we've got our helpful hint in a hidden span, let's wait a second (delay(1000) …OK, we'll actually wait 1,000 milliseconds!) before letting the message gradually fade in (fadeIn(1000)).
line 7
We've got to balance the books, so for every brace and bracket we've opened, we need to close them, otherwise the web browser might get upset.
Disclaimer!
Dropping jQuery into Summon isn't officially supported by Serials Solutions, so be sure to take full responsibility for anything you do and thoroughly test it to make sure you've not broken Summon for your users, otherwise they'll be grumpy.
The other thing to be aware of is that Summon is in a state of continual development, so you'll need to test any tweaks you've made after each update to make sure that they still work and that they don't conflict with any changes Serials Solutions have made to the Summon HTML.
Addendum
By subverting the "Custom Link" option to insert the JavaScript file, you lose the opportunity to add in a normal custom link (this appears to the left of the "Help | About | Feedback" options at the top right of the Summon interface)… or do you?
Well, there's absolutely no reason why you can't use jQuery to do that and, in fact, rather than just having one custom link, you could add 2 or 3…
$('#topbar .link').prepend('<a href="http://library.hud.ac.uk/wiki/">A to Z List of Electronic Resources</a>');
The default links appear in a div with a class of link, which has a parent div with an id of topbar. To add in our new extra link before those existing links, we have to prepend it.
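If you did want 2 or 3 links rather than one, a rough sketch might build the HTML from a list and prepend it in one go. The `buildLinksHtml` and `escapeHtml` helpers and the second link (example.org) are hypothetical; only the first URL and the `#topbar .link` selector come from the line above. The jQuery call is wrapped in a check so the snippet does nothing harmful if jQuery isn't present.

```javascript
// Escape characters that would otherwise break the generated HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn a list of { href, text } objects into a run of anchor tags.
function buildLinksHtml(links) {
  return links.map(function (link) {
    return '<a href="' + escapeHtml(link.href) + '">' + escapeHtml(link.text) + '</a>';
  }).join('');
}

var extraLinks = [
  { href: 'http://library.hud.ac.uk/wiki/', text: 'A to Z List of Electronic Resources' },
  { href: 'http://example.org/help', text: 'Library Help' } // hypothetical second link
];

// Only touch the page if jQuery is available (as it is inside Summon).
if (typeof jQuery !== 'undefined') {
  jQuery(function ($) {
    $('#topbar .link').prepend(buildLinksHtml(extraLinks));
  });
}
```

Because everything is prepended as a single string, the links appear in the order given in the list.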
by Dave Pattern at May 05, 2012 06:17 PM
I’m humbled to be reelected to ALA Council for what will be my fourth term, and congrats to Barb Stripling (ALA President), Trevor Dawes (ACRL President), Cindi Trainor (LITA President), and everyone else elected yesterday. As a sign of the times, I first learned this yesterday from a dear colleague’s post to my Facebook wall. So this is an unusually short post from me, because I’d like to shut up and listen: how can I be part of the change you want to see in ALA?
by K.G. Schneider at May 05, 2012 01:41 PM
May 04, 2012
Please join us in congratulating the newly elected LITA Board members:
- Cindi Trainor, Vice-President/President-Elect
- Cody Hanson, Rachel Vacek, Directors-at-Large
Additionally, the proposed changes to LITA Bylaws Section 1, Article IX, were adopted. This change requires the Nominating Committee to present the names of candidates to the Board of Directors at the Midwinter Meeting, instead of the Annual Conference, preceding an election.
LITA members elected as ALA Councilors-at-Large include: Aaron Dobbs, Mario M. Gonzalez, Joan S. Weeks, Karen G. Schneider, Eric D. Suess, and Courtney Louise Young.
by mprentice at May 04, 2012 09:25 PM
Just throwing this up here because I didn’t find it elsewhere.
I want to run ruby scripts from the command line or in a cronjob, and I do not want to always have to type “ruby scriptname”.
But, I use rvm. I want to run a particular ruby, maybe identified by an alias, maybe with a specific gemset.
It turns out you can use the env program with rvm do to accomplish this.
#!/usr/bin/env rvm 1.9 do ruby
require 'mygem'
o = MyGem.new
# blah blah blah
In this example, 1.9 is the name of the ruby (actually, an rvm alias) I want to use, and it could just as easily specify a gemset as well (e.g., 1.9@mygems).
If you’re running in cron, don’t forget you need to load the environment variables first. Here I use the bash . command to source my .bashrc.
54 9-16 * * 1-5 . /Users/dueberb/.bashrc; /Users/dueberb/bin/exercise
Nothing fancy, but worth knowing.
by Bill at May 04, 2012 02:58 PM
The VIAF dataset is now available for public consumption! http://viaf.org/viaf/data describes and links to the files involved and describes how we expect the ODC-By license to be applied. We are not sure just how popular the files will be, so if the site appears slow, please stop downloading and come back later. From my machine here at OCLC my browser is estimating 20-30 minutes to download the larger files, from my home it was double that.
For more about this, see a previous post: http://outgoing.typepad.com/outgoing/2012/04/viaf-developments.html.
One question that has come up was whether it would be possible to incorporate VIAF identifiers and information into a dataset that is released under a CC-0 license. The short answer is yes. Here's a longer answer:
- We would like to see acknowledgement of VIAF as a source somewhere on your site
- We encourage the use of VIAF URI's where appropriate and they can be considered acknowledgement
- Incorporation of those VIAF URI's and associated information from VIAF should not prevent you from releasing your dataset under CC-0, since the URI's are considered sufficient acknowledgment
--Th
by Thom at May 04, 2012 01:05 PM
Thom Hickey has details about the Virtual International Authority File being publicly available on his Outgoing weblog.
The VIAF dataset is now available for public consumption! http://viaf.org/viaf/data describes and links to the files involved and describes how we expect the ODC-By license to be applied. We are not sure just how popular the files will be, so if the site appears slow, please stop downloading and come back later. From my machine here at OCLC my browser is estimating 20-30 minutes to download the larger files, from my home it was double that.
by David (noreply@blogger.com) at May 04, 2012 10:48 AM