The following is a guest post by John Cummings, Wikipedian and founder of the Monmouthpedia project.
Monmouthpedia is the first Wikipedia project to cover a whole town. The project aims to cover every notable place, person, artefact, species of flora and fauna, and other subject in Monmouth in as many languages as possible. We will use QRpedia codes: a type of barcode that a smartphone can read through its camera and that takes you to a Wikipedia article in your language. QR codes are extremely useful here, as physical signs cannot display the same amount of information, let alone in a potentially huge number of languages. We aim to have 1,000 QRpedia codes in Monmouth by April, including in the museums. We are also going to have a free wifi network throughout the town and tablets in the museums to lower the cost of accessing the information.
So far, contributors have created 54 new articles and improved 70 existing ones, and six articles have been featured in the "Did You Know?" section of the English Wikipedia main page. Contributors are choosing to learn how to edit Wikipedia and to give their time to the combined knowledge of others; I think this demonstrates how much people value free information and its benefits. It's been amazing to teach people simple tools that give a wider reach to the information they have.
I started Monmouthpedia because I wanted everyone to have free and easily available information about the place in which they live. I grew up in Monmouth, so I knew enough about the area to make a start by myself and to make a plan that let other people see what I was doing, join in, and add to and change it. Local groups and the councils (Monmouthshire County Council have recently adopted the Open Government Licence) have been wonderfully supportive, and there is a well-connected network of people who are willing to help. Wikimedia UK have been very helpful and have put a lot of time and effort into supporting me. I feel as though for the most part I have been pushing against open doors: I've had a steady stream of new people to teach Wikipedia editing to since I started.
The project is still very much a work in progress. We are starting to work with schools and other groups, and there are so many opportunities for so many groups of people to get involved that it feels like we're trying something new every day.
For more information on the project, visit monmouthpedia.org or tweet at @Monmouthpedia. To get in touch with John by email, write to john.cummings [at] monmouthpedia.org.
The following guest post is by Denise Recheis from reegle, the clean energy info portal.
Offering multiple explanations for a concept increases understanding, and using Linked Open Data (LOD) allows both humans and machines to semantically connect related content. This is a huge advantage in our increasingly complex world!
Especially in the field of clean energy, the increasing availability of LOD is really beneficial. To make sense of the often complex factors contributing to climate change, the highly technical solutions to it, and the rapid developments in national and international policy on these issues, access to high-quality and timely information is crucial.
The clean energy info portal www.reegle.info and the energy info wiki www.openEI.org see themselves as gateways to a wealth of information regarding renewable energy, energy efficiency and climate change issues. They are hosted by REEEP (Renewable Energy and Energy Efficiency Partnership – where I work) and NREL (National Renewable Energy Laboratory) respectively. Both organizations have a strong commitment to the idea of Linked Open Data (LOD) and have been integrating the core principles of LOD into their online portals.
In an effort to increase awareness about the possibilities associated with publishing and consuming LOD, we organized a well-attended workshop in Abu Dhabi in January 2012. Alongside the event, we brought out a publication explaining the basics of LOD, as well as the first steps for any organization considering joining the LOD cloud. “Linked Open Data: The Essentials” (published by Semantic Web Company and REEEP) is available as a downloadable PDF, as well as a booklet which can be ordered.
“Linked Open Data: The Essentials” also highlights some best practice examples, two of them being reegle and OpenEI.
Reegle's country energy profiles are a prime example of mashed-up open data. These dossiers present the reader with statistics, maps, general facts, and policy and regulatory details in a pleasant design. The information comes from LOD providers such as DBpedia (Wikipedia), the UN, the World Bank, OpenEI and other highly trusted sources. Reegle has also developed an extensive thesaurus covering clean energy and climate compatible development with full linked data capabilities. It is free to re-use as a widget or WordPress plugin and is currently the basis for a brand-new API. Of course, reegle provides all its datasets as Linked Open Data free for re-use, in RDF (Resource Description Framework) format and via a SPARQL endpoint on our data portal.
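For readers who have not used a SPARQL endpoint, the request itself is plain HTTP: under the SPARQL 1.1 Protocol, a query can be sent as a GET request with the query text in a `query` parameter. The sketch below only builds such a URL. The endpoint address is a hypothetical placeholder rather than reegle's actual endpoint, and the example query assumes the thesaurus is published with SKOS labels, which the post does not state.

```javascript
// Build a SPARQL 1.1 Protocol "query via GET" request URL.
// ENDPOINT is a hypothetical placeholder, not reegle's actual endpoint.
const ENDPOINT = "https://sparql.example.org/sparql";

function buildSparqlGetUrl(endpoint, query) {
  const url = new URL(endpoint);
  // The protocol carries the query text in the "query" parameter;
  // URLSearchParams takes care of percent-encoding it.
  url.searchParams.set("query", query);
  return url.toString();
}

// Example query: English labels of SKOS concepts. This assumes the
// thesaurus uses skos:prefLabel, which is not confirmed by the post.
const labelQuery = `PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10`;

const requestUrl = buildSparqlGetUrl(ENDPOINT, labelQuery);
console.log(requestUrl);
```

To run the query, the URL can be passed to `fetch` with an `Accept: application/sparql-results+json` header, which asks a SPARQL 1.1 endpoint for the standard JSON results format.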
OpenEI (Open Energy Information) has always seen sharing as one of its key missions. Its data is available through a RESTful API, as RDF and via SPARQL, for integration into external websites. But even when browsing the site, users benefit from a variety of LOD sources that enrich the information presented. For example, several definitions in the glossary are collected from different LOD sources, and OpenEI's country pages feature information from a variety of sources, including reegle's country energy profiles. This is easy when organizations rely on LOD: when several websites describe the same things, they can all be connected to give users a more rounded picture of sometimes difficult subjects.
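Connecting "the same things" works because linked data identifies things by URI, and one common mechanism is an `owl:sameAs` link declaring that two URIs denote the same resource. The sketch below merges statements from two sources once such a link is known. It is not OpenEI's or reegle's code; the URIs and predicate names are invented for illustration, and a real consumer would use an RDF library rather than plain arrays.

```javascript
// Invented example data: two sources describe the same country under
// different URIs. Triples are [subject, predicate, object].
const sourceA = [["http://a.example.org/country/AT", "ex:name", "Austria"]];
const sourceB = [["http://b.example.org/id/austria", "ex:capital", "Vienna"]];

// An owl:sameAs-style link, stored as [alias URI, canonical URI].
const sameAsLinks = [["http://b.example.org/id/austria", "http://a.example.org/country/AT"]];

// Group all statements under one canonical subject per real-world thing.
// Only direct links are followed; chains of sameAs links are not resolved.
function mergeDescriptions(triples, links) {
  const canonical = new Map(links);
  const merged = new Map();
  for (const [subject, predicate, object] of triples) {
    const key = canonical.get(subject) ?? subject;
    if (!merged.has(key)) merged.set(key, []);
    merged.get(key).push([predicate, object]);
  }
  return merged;
}

const profile = mergeDescriptions([...sourceA, ...sourceB], sameAsLinks);
console.log(profile.get("http://a.example.org/country/AT"));
```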
Our expected end users include the education sector: students across the world can study laws and regulations, efficient engineering and the latest ideas in clean energy from many different authoritative sources through a single gateway. Specialists and project developers can quickly gather valuable information about specific regions and energy-relevant issues.
Integrating the principles of LOD has had a pleasant side effect, which was highlighted at the recent workshop in Abu Dhabi: sharing data is often a starting point for fruitful collaborations between organizations with a similar agenda. Sharing data very often also means sharing the work burden. Each organization can then focus on its specific areas of expertise, while freeing up resources from areas that can be taken over by other organizations. Sharing the results of such targeted efforts generates high-quality content and makes it available to all stakeholders in renewable energy, energy efficiency and climate adaptation/mitigation.
We are committed to increasing the share of information available as LOD, and will continue to actively support other organizations thinking of joining the LOD cloud.
I had never heard the term "flipped" teaching before, so I wanted to make a note of it. It comes via Mel Chua, who says:
my classmate Nikitha’s project for pedagogy class: redesign Purdue’s MATLAB-heavy intro-to-engineering first-year class to use a “flipped” model – view lectures at home, work on homework in class where there’s help available. (Mind you, this doesn’t mean they’ll implement it; she’s a TA, not the prof. Still, it’s cool.)
Flipped teaching is one of the ideas that could help sustain and justify small-group teaching as highly scalable online learning becomes feasible and productive. MSNBC notes that "Apparently even the Stanford students preferred watching the classroom lectures as online videos on their own time." It reports that 85 to 90 percent of the students in Thrun's in-person AI class at Stanford had stopped attending by the end of the course. Imagine Thrun's shock: "These are students who pay $30,000 a year to Stanford to see the best and brightest of our professors, and they prefer to see us on video?"
In-class homework is not a new idea: the success of Berkeley’s Math/Science Workshop and UT’s Emerging Scholars Program (which I TA’d back in the days when I taught calculus) was based on providing challenging problems and group study support in class.
Nor is avoiding lectures a new idea: it would be interesting to compare the underlying philosophy of flipped teaching to that of St. John's, where, rather than listening to lectures, students discuss the source materials in small groups. It was a wonderful way to learn!
The following guest post is by Lori Byrd Phillips, 2012 US Cultural Partnerships Coordinator for the Wikimedia Foundation. She was the second person to become a Wikipedian in Residence and has served in that role at The Children's Museum of Indianapolis for the past year and a half, where she is now also part-time staff. It is cross-posted from openglam.org.
Wikipedians in Residence from left to right: Liam Wyatt, British Museum; Lori Phillips, The Children's Museum of Indianapolis; Benoît Evellin, Wikimédien en résidence au Château de Versailles; Sarah Stierch, The Smithsonian Institution. Photo by Andrew Lih (cc-by-sa 3.0).
It was just under two years ago when Liam Wyatt proposed a concept that seemed so bold, it required the British Museum to run a risk assessment before they'd agree to it. Liam suggested that he serve as the "Wikipedian in Residence," a role that would allow him to put into practice the idea that cultural institutions should share their knowledge with Wikipedia. Thankfully, the British Museum agreed. That basic premise has turned into a global movement known as GLAM-WIKI (Galleries, Libraries, Archives, and Museums). Today, the GLAM-WIKI community is made up of Wikimedians from around the world who work to establish models and best practices that help cultural institutions share their resources with Wikimedia.
Prior to Liam’s residency in June 2010, cultural institutions had donated images to Wikimedia Commons, but there had not yet been an institution that committed to establishing a relationship with the Wikimedia community. The concept of building a mutually beneficial cooperation is at the heart of the Wikipedian in Residence scheme. The main role of a resident is to serve as a liaison between the museum and Wikipedia. Projects still include image donations, but now more often focus on staff workshops, outreach events (such as “Backstage Passes”) to connect with local Wikipedians, and on-site events (such as “Edit-a-Thons”) that help get cultural content out of the filing cabinets and into Wikipedia.
I’ve enjoyed watching the evolution of the Wikipedian in Residence concept as it has been implemented in different institutions. Each residency has shown its own strength. At the Derby Museum, Roger Bamkin followed through on an idea to improve the multilingual capabilities of QR codes in exhibits. What resulted was QRpedia, a QR code-generating website that detects the language of the user’s phone and links to the Wikipedia article in that language. QRpedia has now been implemented in museums in the US and Europe and has been nominated for a Smart UK award.
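QRpedia's own code is not reproduced here, but the behaviour described (send the reader to the Wikipedia edition matching the phone's language) can be sketched as HTTP content negotiation on the `Accept-Language` header that mobile browsers send. The article map, the function names and the English fallback below are hypothetical choices for illustration.

```javascript
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Parse an Accept-Language header into primary language subtags,
// most preferred first (by q-value, then by order in the header).
function preferredLanguages(header) {
  return header
    .split(",")
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(";");
      let q = 1;
      for (const param of params) {
        const [key, value] = param.trim().split("=");
        if (key === "q") q = Number.parseFloat(value);
      }
      // Keep only the primary subtag: "fr-FR" becomes "fr".
      return { lang: tag.trim().toLowerCase().split("-")[0], q, index };
    })
    .filter((entry) => entry.lang !== "" && entry.lang !== "*" && entry.q > 0)
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map((entry) => entry.lang);
}

// articles: hypothetical map of Wikipedia language code -> article title.
// Returns the URL in the best matching language, else in `fallback`
// (an assumed default), else null.
function pickArticleUrl(acceptLanguage, articles, fallback = "en") {
  let lang = preferredLanguages(acceptLanguage).find((code) => hasOwn(articles, code));
  if (lang === undefined) {
    if (!hasOwn(articles, fallback)) return null;
    lang = fallback;
  }
  const title = articles[lang].replace(/ /g, "_");
  return `https://${lang}.wikipedia.org/wiki/${encodeURIComponent(title)}`;
}

console.log(pickArticleUrl("fr-FR,fr;q=0.9,de;q=0.8", { de: "Beispiel", en: "Example" }));
```

With the header above, neither French entry has an article, so the German one is chosen. A phone set to Welsh (`cy`) would likewise reach a Welsh article if the map contained one.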
Dominic McDevitt-Parks, the Wikipedian in Residence at the US National Archives and Records Administration (NARA), has broken new ground in facilitating the digitization and transcription of primary source materials through Wikisource and Wikimedia Commons. NARA's cooperation with Wikipedia has become a core part of its broader strategy of increasing digital access to its holdings, and has proven to be a point of pride for the Archivist of the United States, David Ferriero.
The concept of the Wikipedian in Residence has come a long way since the British Museum's big gamble. Now, those who have served as Wikipedians in Residence travel the world presenting projects to increasingly enthusiastic cultural professionals. In April, four residents from three countries will come together to present at the American Association of Museums, the largest and most significant museum conference in the US. I can't wait to see what incredible residencies and collaborations are around the next corner.
For additional information about Wikipedians in Residence, see the information page on GLAM Outreach or the GLAM Infographic.
The source code listed below has been recently approved. The code will be added to the applicable Source Codes for Vocabularies, Rules, and Schemes list. See the specific source code list for current usage in MARC fields and MODS/MADS elements.
The code should not be used in exchange records until 60 days after the date of this notice, to give implementers time to include the newly defined code in any validation tables.
Subject Heading and Term Source Codes
The following source code has been added to the Subject Heading and Term Source Codes list for usage in appropriate fields and elements.
Addition:
collett
Collett-bibliografi: litteratur av og om Camilla Collett (Oslo: Nasjonalbiblioteket)
The Joint Steering Committee for Development of RDA (JSC), the DCMI Bibliographic Metadata Task Group (formerly DCMI/RDA Task Group), and ALA Publishing (on behalf of the co-publishers of RDA) are pleased to announce the publication of a second set of vocabulary terms as linked open data. The RDA Carrier Type, Content Type and Media Type vocabularies have been reviewed and approved, and their status in the Open Metadata Registry (OMR) has been changed to 'published.' The finished vocabularies can be viewed by following the links from the terms above. (The links lead to the description of the vocabulary itself; the specific terms can be viewed under the 'concepts' tab.)
Terms in the Content Type vocabulary refer to the intellectual or artistic content of a resource, such as text or notated music; terms in the Carrier Type vocabulary refer to the means and methods by which content is conveyed, such as volume, sheet or computer disk; terms in the Media Type vocabulary specify the general type of intermediation device (if any) required to view, play or run the content of a resource. These vocabularies are derived from the RDA/ONIX framework for resource categorization, which established an extensible methodology for categorizing resources according to content and carrier.
These tools aim to provide a "write once, run everywhere" capability for developers of apps for the Web. Content is developed once in JavaScript and HTML5. It can run in a browser, but also as an app in environments such as iOS and Android. When a user invokes an app such as Yahoo's Livestand, they are actually invoking what Yahoo calls a "chromeless" browser: an app that, like a browser, is generic, in that it is the same irrespective of the content. Unlike a browser, the chromeless browser provides no user interface, just a JavaScript VM and a rendering engine. It downloads and runs the content just as a browser would, but gives the content developer total control over the user experience. Yahoo's tools also address one major problem with this approach: the amount of code that needs to be downloaded and run on the client before the user experience is functional. They run the code on the server first to provide an initial, simplified user interface that is used while the full version is being downloaded and executed.
As I've been saying for some time, techniques like this are making our current approaches to collecting and preserving Web content less and less effective as time goes by. It is time to invest in some R&D.
Gimme is an interesting discovery tool from Scottsdale Public Library. It has a fantastic visual design and it is easy to use. Yes!
I can understand why they turned this into a web app but I’d also like to see something like this integrated into a main library site. Speaking of taking library users to perhaps disparate places, clicking “more…” on the staff reviews whisks users to the library’s reviews on Goodreads. I’d rather see an accordion function expand the rest of the review, keeping people on Gimme.
If you click through, be sure to resize your browser (or visit it on both a mobile device and a desktop). This is the first responsive library-related website I've come across. Really nice.
Perhaps they intended this to be used mainly on mobile devices. Clicking the “Reserve” button takes users to the mobile version of their catalog. Or maybe they just think the mobile version is better than their normal catalog and want to send users there.
With a little iteration this could go from great to really really great.
My hotel’s WiFi is $9.95 per day, added to your room bill. You purchase it by logging into the network with your room number as the username and the hotel’s name as the password. All of this is explained on the authentication page you encounter as soon as you start trying to use the WiFi.
At first, I thought they might as well have a tip jar at the front desk saying, "If you liked our complimentary WiFi, why not express your gratitude by leaving us $10?" But then I realized that a tip jar wouldn't let you add charges to other guests' bills at will. So really, it's more like a tip jar for use by pickpockets.
I am teaching the following workshops for ByWater Solutions and you’re all welcome to join in – for FREE!
Open Source in Libraries: Freedom and Community
Librarians have adopted a culture of helplessness and workarounds when it comes to our software. Open source software is a way to break free of these chains. But open source is about more than just software; it's about community and a philosophy of freedom. This session will give librarians the facts about open source software by introducing what open source is and what it means for libraries.
Join us online for one of our first two lessons (space is limited so register early).
A preview version of "unglue.it", the crowd-funding site for Creative Commons ebooks that I've been working on for more than a year, opened last week. Some key features are missing (pledging, campaigns), but the site lets you make a list of books you would support for "ungluing".
You can't really plan for a launch. Things always happen that you don't expect. It helps to have had a good night's rest, but other than that...
Our first unexpected event was that Library Journal ran a piece about our "soft launch" on their Digital Shift website, while we were in the process of deploying the website to production. They didn't link to us, but a few impatient readers typed in the website name and started exercising the site before we were finished testing the deployment. Nothing awful happened. Thanks, dave and gsf! Then Google spidered the site, exposing one or two errors. Thanks, googlebot!
We wanted our mailing list subscribers to be the first to see our work, and we finally sent out the email on Thursday. List readers discovered that our "popular" and "unglued" views were running very very slowly, loading down the site. Raymond studied the problem and, as seems to happen so often with Django, found the answer hidden in plain sight (in the documentation). After moving some nested queries, the pages returned 100x faster. The miracles of EC2 allowed us to spin up a bigger server to help with the load. And the high load from the glacial queries helped expose some concurrency problems that we never would have found in a million years of normal operation. Or so says the errant coder: me.
We wanted to open the website when we did so that we could show our work to our many friends at the American Library Association Midwinter meeting in Dallas. So on Friday, I got up at 5AM (after bugfixing till 1AM) to catch a flight. I decided not to have any coffee so I could sleep on the plane. When I arrived at my Dallas hotel, I discovered another unexpected occurrence: I had left my laptop on the plane.
Before reading the next paragraph, check your laptop, your iPad, your Kindle, your Nook, or whatever. It probably looks plain, like mine (left). If there is no identification on it, go get one of those free address labels you got from the Awful Disease Foundation and stick it on. Add some stickers from your favorite organizations, too. When you are done, it should look like @vmbrasseur's (right). Are you done?
Here's what I learned about lost MacBook Pros and airlines. Once the battery runs out, you can't even find a serial number. The typical baggage claim operation does not have geek squad backup. They don't have spare power cords or batteries to help them ID lost laptops. What they DO have is a safe, and that's where errant laptops go to die. If you ever find yourself in my position, go to the airport and ask the friendly lost-luggage attendant to go look in the safe. Otherwise, you will never see your laptop again, even if you have entered its serial number into the web form that has replaced the lost and found phone number that no one helpful ever answers.
In contrast, Jeanette, the DFW Continental Airlines baggage claim professional that I talked to in person, was very helpful. She called back to the guy with the safe, and we hardly had time to joke about the huge bag of dried fish from Africa that was smelling up the lost baggage area before safe-guy came out with MY LAPTOP. Yay!
Raymond had emailed me with a status report from the virtual home office, reproduced here in its entirety:
So I was feeling pretty good. I got to the Convention Center and found Andromeda doing an in-person demo of unglue.it. The in-person demos are an invaluable complement to submitted feedback reports because they let you see expectation mismatch as well as outright failures. Andromeda seemed to have the demo drill down to a science. I am thankful for the generosity of our in-person testers, whose insights will soon be incorporated into the site.
Since this post has been accepting digressions, I must note here that Andromeda gives new meanings to the adjective "awesome". At some point over the past few months, most likely due to lack of proper supervision coupled with web development despair, Andromeda has learned to code javascript and CSS. In a subsequent period of inadequate supervision, Andromeda seems to have recruited a squadron of librarians learning to code, which is somehow becoming an official ALA "codeyear" Interest Group. I doubt that we have heard the last of this.
On Sunday, Beth Kephart's article on Unglue.it went live at Publishing Perspectives. Beth writes so beautifully that it hurts. Her first book, A Slant of Sun, is on my Unglue.it wish list. Her article introduced the Unglue.it concept to hundreds of new book lovers and more than a few rights holders, and generated a bunch of traffic. We've always expected that we'd need some publicity to find significant numbers of rights holders willing to take the plunge on a completely new business model, and the Publishing Perspectives article was a great start. Ed Nawotka's more cautious commentary is spot on, as well.
In its first week, the preview site signed up 133 users (a conversion rate of about 10%), and we've received numerous suggestions for improvement. Our intrepid ungluing pioneers have added over 7500 works to our database. The most frequent comment is that we need better ways to indicate works that are already "unglued", either because they are in the public domain or because they are already available under Creative Commons licenses. Raymond is currently working on loading Project Gutenberg titles; more "unglued" books will be added as we go on, as well as ways of adding them directly. Coming in second were requests for more selective imports from GoodReads and LibraryThing.
I should mention that we've had some great talent helping the core unglue.it team. Most prominent is the design work of Stefan from Design Anthem. We've had part-time help on systems and software from Ed Summers and Jason Kace. And it's hard to overlook the contribution of the countless developers who contributed to the open source Python and Django projects.
If you haven't tried the site yet, please give it a spin and tell us what you like (or dislike). The more people that sign up, the less skeptical rights holders with interesting books will be about the concept. If you're a rights holder or a rights manager of any kind, please contact Amanda at rights@gluejar.com with your ideas and questions. Follow @unglueit on Twitter, like unglueit on Facebook.
Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. New this year is that Pinboard has replaced FriendFeed as my primary aggregation service. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.
Code Year: Learn to Code in 2012
Sign up for Code Year to start receiving a new interactive programming lesson every Monday. You’ll be building apps and websites before you know it!
Code Year is a project of internet startup Codecademy, a service that teaches people how to code (only JavaScript, at the moment). Three classes have been posted already, and the website says registrations are still being accepted on the homepage. Code Year is free, and it sends an e-mail at the beginning of each week with a link to that week's course. More questions? See the frequently asked questions.
What I think is really cool about this is that a group of librarians has self-organized to support each other through the year. There is a community area on ALA Connect and a list of resources on the catcode wiki that includes examples tailored to cataloging challenges. ("catcode" is a unique story unto itself. It is a wiki created to "help support dialogue between catalogers and coders.")
Apple Introduces iBooks Author
Graphic from Talking Points Memo
Educators so far seem excited about the potential promise of a learning “revolution” enabled by Apple’s new iBooks Author app. However, not everyone is feeling that same level of enthusiasm: e-book publishing experts have concerns about the formatting that iBooks Author can output, which isn’t fully ePub 2 or ePub 3 compliant. Furthermore, Apple has added a clause to iBooks Author’s end user license agreement that prohibits selling e-books created with iBooks Author anywhere but the iBookstore.
Last week saw the big introduction of iBooks Textbooks for iPad and the iBooks Author ebook creation utility. The combination was billed as a promising new way for students to interact with course materials and for teachers to build their own content. There were some not-so-nice surprises in the implementation, though. First, the ebook format is close to the ePub standard from the International Digital Publishing Forum, but strays from it in enough important ways that the iBooks Textbooks themselves won't be usable on non-Apple devices. Second, the End-User License Agreement for the iBooks Author software includes terms saying that content created with iBooks Author can be given away freely but can only be sold through Apple's iBookstore. Apple also reserves the right to decide whether your work is sold in the iBookstore, with no recourse for rejected works. The article above has more details, and press coverage of iBooks Textbooks and iBooks Author has been generally negative so far.
That’s right, the Stop Online Piracy Act (SOPA) in the House and the PROTECT IP Act (PIPA) in the Senate are, for all practical purposes, dead in the water.
Sure, Senate Majority Leader Harry Reid (D-NV) and Rep. Lamar Smith (R-TX) used the word “postponed” in their announcements, saying that Congress would only take a breather, but would certainly not give up for good on its goal of passing some sort of legislation designed to combat overseas “rogue” websites hosting pirated American content.
But whenever Congress decides to re-engage the online piracy fight — and it could be a while, given just how acrimonious the debate over the bills became in the last week — it’s almost certain that SOPA and PIPA won’t be revived in any recognizable form.
Who would have thought it? Grassroots organizations convinced major internet presences to "black out" or otherwise inform users about (at best) ill-considered provisions in the legislation, and in turn those users buried both houses of Congress in so much anti-SOPA and anti-PIPA feedback that they effectively killed the bills. Is this the closest we've come to direct democracy since ancient Athens? Perhaps! The article quoted above goes into great detail about the formational elements of SOPA and PIPA and the forces that gathered to stop them.
I had planned to go along to SummonCamp at ALA Midwinter on Sunday and talk about using the Summon API but, perhaps all too predictably, I ended up staying up waaaaay too late on Saturday night sampling some yummy US beers, forgot to set my alarm and overslept.
Anyway, here's what I would have talked about if I hadn't been asleep at the time…
MyReading Project
For the last 12 months, I've been working on developing reading list software for the University of Huddersfield (home page and blog). By making use of both the Summon and 360 Link APIs, I've been able to cut down development time and also improve the functionality of the software for both staff and students.
360 Link API
E-journals and e-journal articles make up about 15% of all the reading list references in the software. One of the primary issues was how to provide accurate links to that material and how to ensure those links are updated whenever we change e-journal subscriptions or database platforms. On top of that, we also needed to ensure that authentication was as seamless as possible. Seeing as our link resolver (360 Link) already does all of the above, it made sense to use that.
So, for journal and article references, we're storing the OpenURL so that we can query the 360 Link API on-the-fly to fetch back current access links. As 360 Link also handles the creation of EZProxy URLs for authentication, the API will return EZProxy prepended URLs when relevant.
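Because the OpenURL is stored, fetching current links amounts to re-sending its key/value pairs to the link resolver's XML interface. The sketch below assumes a 360 Link XML API base of the form `<library code>.openurl.xml.serialssolutions.com/openurlxml` with a `version=1.0` parameter; treat both as assumptions to check against Serials Solutions' documentation. The library code `xx1xx`, the function name and the citation values are hypothetical placeholders.

```javascript
// Hypothetical placeholder; the host pattern is an assumption, so check it
// against your own 360 Link XML API details.
const LINK_API_BASE = "https://xx1xx.openurl.xml.serialssolutions.com/openurlxml";

// Turn a stored OpenURL (a full URL or just its query string) into a
// request URL for the link resolver's XML API.
function buildLinkApiUrl(apiBase, storedOpenUrl) {
  const queryStart = storedOpenUrl.indexOf("?");
  const query = queryStart === -1 ? storedOpenUrl : storedOpenUrl.slice(queryStart + 1);
  const params = new URLSearchParams(query);
  params.set("version", "1.0"); // assumed API version parameter
  const url = new URL(apiBase);
  url.search = params.toString();
  return url.toString();
}

// Example stored OpenURL with made-up citation values.
const exampleOpenUrl =
  "http://resolver.example.org/?url_ver=Z39.88-2004&rft.issn=1234-5678" +
  "&rft.volume=12&rft.spage=34&rft_id=info:doi/10.1000/example";

const linkApiUrl = buildLinkApiUrl(LINK_API_BASE, exampleOpenUrl);
console.log(linkApiUrl);
```

Rebuilding the request from the stored OpenURL each time means a change of subscription or platform only has to be reflected in 360 Link; the reading list never stores the access links themselves.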
By calling the 360 Link API with the stored OpenURL, we get back a page of XML.
At the time of writing, the ssopenurl:linkGroups element contains a couple of ssopenurl:linkGroup elements of type holding which, in turn, contain the current article access links for SwetsWise Online Content and ScienceDirect Journals.
So, as long as we've got an accurate OpenURL for a reference, we should be able to automatically insert the correct access links into the reading list. But, how do you get the OpenURL in the first place…?
Summon API
Once staff are logged into the reading list software, they'll find an option to import any result from Summon as a reference into one of their reading lists…
Although Summon doesn't officially support modifications like this, unofficially it's possible to execute jQuery by hacking in a link to suitable JavaScript via the "Custom Link" option within the Summon Administration Console…
As doing this isn't officially supported by Serials Solutions, it's possible that it could stop working at any time. But, until that day comes, it's a useful way of making minor tweaks to the Summon interface.
I'm only a beginner with jQuery, so the following might not be the most efficient and/or elegant way of adding the custom links, but it does the job…
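$(document).ready(function(){ doMyReading( ); });

// Add an "add to MyReading" link to every Summon result, passing the
// result's Summon document ID to the reading list import script.
function doMyReading( )
{
    $( '.metadata' ).each(function( )
    {
        // In Summon's markup at the time of writing, the document ID sits on the
        // seventh ancestor of each .metadata element; .parents().eq( 6 ) is
        // equivalent to chaining .parent() seven times.
        var myReadingDocID = $( this ).parents().eq( 6 ).attr( "id" );
        if( myReadingDocID )
        {
            // Encode the ID so that any reserved characters survive in the query string.
            $( this ).append( '<div style="margin-top:3px;background:#004088;color:#ccf;padding:3px 8px;font-size:98%; white-space:nowrap;">item options: <a title="add this item to MyReading" style="color:#fff;" href="http://library.hud.ac.uk/myreading/perl/admin/import_summon.pl?id=' + encodeURIComponent( myReadingDocID ) + '">add to MyReading</a></div>' );
        }
    });
}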
…the important bit is that we grab the document ID value for the result (myReadingDocID in the above), which we can then use to retrieve the exact same result via the Summon API.
When the staff user clicks on the "add to MyReading" link, the reading list software uses the document ID to pull in the reference's details from the Summon API and automatically populates the reference form…
…which includes the OpenURL and DOI, both of which can subsequently be used to query the 360 Link API to fetch access links.
We can also use the document ID to retrieve the article's subject terms and abstract from Summon…
Summary
So, in summary, we've used the APIs to:
avoid having to manually maintain links to e-journal content
make it both quicker and easier for staff to add items from Summon (which currently encompasses over 600,000,000 items!) to reading lists
enhance records by bringing in abstracts and subject terms from Summon
New vacancy listings are posted weekly and appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
The following guest post is from Regards Citoyens, a French organisation that promotes open data.
As part of The Law Factory project we are running an international conference for hacktivists and academics working on parliamentary monitoring and legislative studies. The conference will take place on the 6th and 7th of July in Paris.
The conference will be held in English to ensure maximum participation from communities all over the world. At the event, a team from the project we have been working on with SciencesPo will unveil the first prototype of our legislative evolution monitoring tool.
We firmly believe that “open data means better science”. Panton Fellowships have been created in order to support scientists – particularly graduate students and early-stage career scientists – to explore this idea, and to tackle those barriers which currently prevent science data from being made open.
Dr Cameron Neylon, of the Panton Fellowships Advisory Board, commented on the ‘real potential’ of the Fellowships to influence practice surrounding open data in the scientific community. ‘Panton Fellowships will allow those who are still deeply involved in research to think closely about the policy and technical issues surrounding open data’, observed Dr Neylon. By allowing scientists the scope both to explore the ‘big picture’ – gathering evidence to promote discussion throughout the community – and also to work on specific technical solutions to individual problems, the Panton Fellowship scheme has the potential to make a real impact upon the practice of open data in science.
Panton Fellows will have the freedom to undertake a range of activities, and prospective applicants are encouraged to formulate their own work plan. As Fellows will continue to be employed and/or study at their current institution, activities undertaken for the Panton Fellowship should ideally complement and enhance their existing work.
Policies and Permissions in FulfILLment will be very flexible and highly configurable. Read on to learn more.
FulfILLment will have the ability to create policies to determine/control patron eligibility. This is flexible down to specific libraries or specific item types.
Authorized staff will be able to create org groupings to funnel patrons into searching pre-defined groups (county-wide, regional, state, school, etc.).
FulfILLment will allow specific libraries to block particular material types from being used to fill ILL requests, either for all patrons or for specific patrons or groups.
Circulation policy can be generated from either the owning or lending library policies.
When an item is put in transit to a library to fill an ILL request, a brief bib record is pushed to the home ILS system (if the system permits it) to facilitate checkout. If the home ILS does not permit it, then staff will need to create a brief record.
The Administration module will allow local definition of what actions staff may perform and at what locations.
The Administration module also allows fine granularity of policies and permissions to be set/configured. Here are a few examples:
1. The ability to configure max requests per patron.
2. The “Need by” date for material can be configured.
3. Org units can be configured to be valid pick-up locations.
4. The ability to designate a library to be the “lender of last resort”.
5. Renewals will be configurable on a material type or library-by-library basis.
6. The ability to configure standard processing charges per library. This amount will be visible to patrons in “My Requests”.
Remember to check back next week. Next week’s post will cover Patrons and Staff Use in FulfILLment.
Problem: MySQL taking forever to load some large data dumps. Forever or longer.
“mysql> show processlist;” shows it wedged at “Repair with keycache” and “Waiting for table metadata lock”.
According to a handy Stack Overflow article, this is a known and dreaded condition, which can be addressed by making sure the tmp dir has plenty of space and increasing myisam_max_sort_file_size from 2G (2146435072) to 30G (32212254720). Using MAMP 1.9.6, it took some more digging to find out how to add a local my.cnf settings file for MySQL. This now lives in /Applications/MAMP/conf/my.cnf (I added a line saying ‘myisam_max_sort_file_size = 30G’, or thereabouts, to the [mysqld] section). Shut down the MySQL server, create that my.cnf and restart; then confirm it read your config using ‘show variables’.
Does this work? Well, I don’t know yet. But I’ve searched around and found my own notes enough times before that I thought I should at least write this much down for my future self to find :)
Update: it worked. A data import that took 2+ weeks (before I gave up) now runs in a few hours. After the bulk of the data was imported, we see ‘Repair by sorting’ in ‘show processlist’ for a while (couple of hours for 15 million records, in my case). This is, as promised, faster than ‘Repair with keycache’. I’ve done this on two machines now (with the same data); on one of them I did notice some ‘Waiting for table metadata lock’ processes in the list, but it still successfully completed overnight.
I took the Stanford AI class. Overall I’m very impressed with the scalability and quality of the experience. I’m tempted to check Thrum’s TA roles for Udacity! While I don’t think that *every* kind of class should be taught in this way, for technical material with clear “right answers”, this is the (a?) right approach.
The class worked incredibly well overall despite a number of flaws. The best part was immediate feedback, on quiz material and on homework once the deadline had passed. When something is fresh in your mind, that is the real learning moment, so that feedback is invaluable for keeping students engaged. Even though so much of the class was rough around the edges, that, and a trust in the knowledge/expertise of the instructors, is the key aspect that would drive me to take this sort of class again. I also went in expecting to learn more about online learning (having both taken and TA’d other online classes); I did not expect to have such a powerful experience of the potential of this (relatively impersonal) approach.
“Education” means many things to many people. To the extent that education is about filling heads with core technical material, this is the future of education. In my mind, this pushes educators in several directions:
pedagogical development / teacher training
There are definite skills to be learned both in presenting course material and in structuring courses for optimal learning.
curriculum development and curriculum systems development
There is incredible potential for increasing the personalization: for instance, there’s a kind of error that gives the teacher insight into how the student is thinking. For these kinds of errors, when a particular wrong answer characterizes a certain wrong way of thinking, you can provide specific feedback about the error, and even appropriate follow-up questions.
the tutorial model
Formerly common in the UK — perhaps in Ireland, too?
My impression is that this supplemented reading with personal interaction with a knowledgeable person. Here the videos/class assignments provide structure, which could be supplemented as needed.
exploring and justifying the need for liberal arts education, not as an alternative to technical training, but ideally in conjunction with and countering it.
exploring and justifying the role of education in subsidizing research and stimulating researchers.
Here are some pros and cons, off the top of my head:
+
Material was chunked into short segments.
There were clear, regular assignments.
There was an active community of students, discussing in many places.
Discussion approaches improved over the course of the class (e.g. tagging questions to particular homework problems).
There were attempts at engagement (e.g. “office hours” where questions were submitted via
The students built tools for their own and others’ use; I relied heavily on the subtitling (much easier for skimming through the parts that I already understood).
The material was well-chosen.
Eminent instructors who know the field and are passionate about it.
The feeling of being part of a game-changing educational endeavor.
-
The classes were very video-focused:
–Other learning modalities were not well accommodated.
–Watching required a lot of time and for AI, no inherent speed-up capabilities were built in.
Feedback was not personal.
The schedule varied a bit more than necessary (DOS attacks, scalability issues, changes of plans)
Repetitive conversations in the online discussions made things hard to follow.
Lack of engagement in some ways.
Reliance on a number of external tools (e.g. Google Hangouts for “office hours”, aiqus, reddit, …) made it difficult to keep up even with core discussion.
Difficulty inherent in the size of the class/scalability (e.g. Google Docs couldn’t handle the number of editors for course notes)
Assignments were not proofread in advance
Assignments could have made better use of the particular kinds of typical mistakes (“insightful errors”)
Class communication could have used improvement — for instance, announcements didn’t use RSS and important corrections didn’t always get shared.
There were two instructors with very different styles and pedagogical skills.
Some people complained about the “low tech” approach to videos — but I found it helped avoid sterility.
Class materials were sometimes not available at the moments when I had time.
Some material seemed simplistic.
Many students dropped out or lost interest (I haven’t seen statistics which were promised).
+/-
Assessment could be seen as varied or insufficient: several German testing centres were opened to allow students to prove their mettle in a timed environment.
The Stanford online classes (AI etc.) and Udacity came up in a DERI listserv discussion about the future of education. This is an answer to: “Care to share experience of the classes? How did it compare to a conventional lecture series? What were the pros and cons?”
The following is a post by Sam Leon, Community Co-ordinator for The Public Domain Review and other Open Knowledge Foundation projects. It is cross-posted from the Open Knowledge Foundation’s Public Domain Working Group Blog.
At The Public Domain Review we’re always scouring the internet for public domain gems. It’s simply incredible how much of our shared cultural heritage is now available for free online. But with so much content out there and with so many different digital collections to choose from it can often be difficult to know where to start looking for interesting and curious works. On top of this, it can often be difficult to understand what you’re allowed to do with a given work and what the license that is applied to it actually means.
It was because of these difficulties that we decided to write a Guide to Finding Interesting Public Domain Works Online. In the guide you’ll find information on how to collect leads, an overview of the main online public domain collections (e.g. Project Gutenberg, the Internet Archive & Wikisource) as well as some basic legal information about licensing and the public domain.
Happy exploring! If you come across something that you think could be featured on The Public Domain Review give me a shout at sam.leon [at] publicdomainreview [dot] org.
The first operational eXtensible Catalog is Cute.Catalog at Kyushu University Library.
Cute.Catalog completely covers the bibliographic information for academic resources at Kyushu University, which include not only library holdings but also research output produced by Kyushu University researchers.
This report, sponsored by the European Science Foundation (ESF), outlines and describes what is needed to advance digital humanities research and its supporting research infrastructures (RI). The report is complete with summaries and case-statements from existing centers. Many of the centers are based in Europe. Priorities from the conclusion include: 1) creating an inventory of activities and needs, 2) fostering partnerships, 3) establishing RI ecosystems, and 4) developing more higher education programs. This is a good report for folks who are creating digital humanities laboratories.
DAVID MAMET is sitting quietly, drinking a scotch.
LOU COSTELLO enters, sees MAMET, and reacts with his catchphrase: Heyyyyyyyyy, Mam-met!
MAMET: I'm here. I'm here. Don't, the shouting. Don't.
LOU sits down in the booth.
LOU: You're running a baseball team now, right?
MAMET: I won the, in a poker game. I won the team in a poker game.
LOU: In a poker game? A baseball team?
MAMET: I had three kings. He had two aces. [Beat.] He lost the thing that matters most to him in the world. Now he has nothing. Now he goes back to work. That's what a man does. He works.
LOU: Well, I never met the guys on the team, so you'll have to tell me their names, and then I'll know who's playing on the team.
MAMET: The names? The names, I'll tell you, I'll tell you, but--funny names. Peculiar names.
LOU: Peculiar?
MAMET: Who's on first, What's on second, and I Don't Know is on third.
LOU: You're the owner?
MAMET: Yeah.
LOU: And you're--
MAMET: And the manager.
LOU: The manager.
MAMET: And the ...
LOU: The ...
MAMET: The coach. I'm the coach. The coach too.
LOU: And you don't know the fellows' names?
MAMET (angrily): You're saying I don't know their names? I own the team and I don't know their names? I manage the team and I don't know their names? I coach the team and I don't know their names? What am I? What am I?
LOU: Hey, hey, all I'm saying is--
MAMET: I know their names. I'm telling you their names. Are you listening? I'm talking. I'm telling you their names.
LOU: I'm saying, who's on first?
MAMET: Yes.
LOU: The guy on first.
MAMET: Yes.
LOU: I mean the fellow's name.
MAMET: Yes.
LOU: The first baseman.
MAMET: Yes. Are you listening? Are you hearing me?
LOU: I'm hearing you.
MAMET: No, no, no, listen. Listen. We're talking here--
LOU: We're talking.
MAMET: We're talking. We're talking about the baseball team.
LOU: We're speaking about the team.
MAMET: Speaking? Speaking? The hell? The hell? We're speaking about the team now? Are we actually speaking about it? Or are we just talking?
LOU (hands up, placatingly): We're just talking. We're just talking.
MAMET: We're just talking.
LOU: All right.
MAMET: All right. All right. Jesus. We're talking.
LOU: But who's on first?
MAMET: Are you hearing me? Are you hearing me. I don't think you are. We're sitting here and I'm talking but you are not [pause, then more calmly] hearing me. Jesus. Yes. Who is on first.
LOU: That's what I'm asking you!
MAMET: I'm telling you.
LOU: Who.
MAMET: Yes.
LOU: Who is on first.
MAMET: Yes.
LOU: How often do you pay the players?
MAMET: Every week.
LOU: When you pay the first baseman, what name do you put on the cheque?
MAMET: Cheque? I don't pay them with a cheque. I use direct deposit.
LOU: What's the name on his bank account?
MAMET: Who.
LOU: The first baseman.
MAMET: Yes.
LOU: Yes?
MAMET: Yes. Yes.
LOU: Yes is the name on his bank account.
MAMET: What do you, what do you, what is this? I mean what is this? Why do you care how he gets paid? Murray got the account number off a blank cheque. What do I care about his account?
I’ve revised my blog post on The Orphan Wars into a short essay for the EDUCAUSE Review. It bears the same title, but I’ve updated it for the higher-education IT community. Here’s the new opening paragraph:
“Orphan books”—books that are in copyright but whose copyright owners can’t be found—have been in the news lately, thanks to lawsuits over Google’s plan to scan a copy of every book ever published. What started as a project to make a better search engine has gradually become a focal point for debate over whether the legal system can find a way to rescue the orphans from copyright limbo. Some of the libraries working with Google have announced plans to make available to their patrons digital versions of the books they think are orphans; an authors’ group has sued to stop them. In this column, I’ll review the convoluted history of the Google Books lawsuits, with an eye toward what they might mean for orphan books.
The following is a blog post by Rufus Pollock, co-founder of the Open Knowledge Foundation.
I have a dream, one which I’ve had for a while.
In this dream I’m able to explore, seamlessly, online, every text ever written. With the click of a button I can go from Pynchon to Proust, from Musil to Machiavelli, from Homer to Hugo.
And in this dream not only can I read, but I myself am able to contribute, to write upon these texts — to annotate, to anthologize, to interlink, to translate, to borrow — and to share what I do with others.
I can see what others have shared, what notes they have added, what selections they have made. I can see the interweaving of these texts created by borrowing, by inspiration, by reference, all made concrete by the insight and efforts of myself and others and their ability to layer their insights freely upon those original texts — just as those writers built upon the works that had gone before them.
And while each text can still stand alone (in all its greatness or mediocrity), we have something new: a single unified corpus woven together out of this multitude of separate texts, e pluribus unum.
A whole that is a concrete instantiation in an immaterial realm of the cultural achievement of mankind as expressed in the written word.
Dream Meets Reality
Why is this dream not yet a reality? After all, don’t we have the tools and technology?
One answer is legal, one answer is technological, and one answer is social. The legal issue is copyright, at least in its current exclusive rights form.1 Copyright means this vision is only really possible for works in the public domain, which in most countries means works that are a hundred years old or more. This isn’t necessarily that big a problem, at least for texts: the public domain, though old, is already incredibly rich, so we already have more than enough material to be getting on with.
On the technology front we have the cost of digitization, processing and storage. Digitization costs are significant. This has meant that digitization activities have either been limited or the material created has not been released openly (for example, the material produced by Google’s efforts with its Books project, which is probably the largest effort to date, is not open). That said, efforts like Project Gutenberg and the Internet Archive have already made available tens of thousands of texts, and there are now several digitization projects underway that will result in even larger amounts of material freely and openly available.
Third, we have the social issue, or rather the question of how technology can support the social activities required for this dream of a unified text to become real. Specifically, to realize our dream we need to bring material (texts and the writing upon them) together in a single coherent experience. Yet the centralization (and ownership) that implies may be a significant obstacle to mass participation.2 Similarly, we need it to be possible for anyone with ‘net access to contribute to the weaving of the unified inter-text but, at the same time, to be able to select which contributions we want to see (if we are not to be overwhelmed by an avalanche of material, much of it possibly of dubious quality).
Conclusion
We have, then, within our grasp the realization of the dream of a unified text. By combining text and technology we can create something truly extraordinary.
Interested in making this happen? Come join us at the Textus Project.
Let me be clear, I’m not saying that copyright is per se bad or that everything should be ‘free’. Time, energy and capital are required to create books, music and films and that expenditure often needs to be recompensed. However, the current system of copyright is by no means the best way to achieve this. This is not something I wish to explore in detail here. More can be found on my personal website and in papers such as Forever Minus a Day: Theory and Empirics of Optimal Copyright↩
This tension between distributed collaboration and centralizing tendencies of coordination and scale is a common theme in many ‘net projects. ↩
In converting my blog from WordPress to Octopress, I
had a lot of old posts I was leaving unpublished. I wanted to keep them around
but don’t see the need to republish them right now. I also want to be able to
create a lot of drafts of ideas and leave them unpublished. Then whenever I’m
ready to work on a post, they’re all right there in my repository already.
Problem is that I find it hard to read through the filenames of posts and try to
remember which have been published and which have not. So in order to see
the publication status of all my posts, I created this rake task. I just
dropped this at the end of Rakefile and run rake listpub.
When posts have published: false in the YAML front matter, they get no
asterisk. All other posts get an asterisk, as they either have no published
field and so are published
by default, or have it set explicitly with published: true.
The method of extracting the YAML front matter from the post with a regular
expression is taken
from Jekyll.
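desc "List all blog posts and an asterisk if they are published"
task :listpub do |t|
  Dir.glob("#{source_dir}/#{posts_dir}/*.markdown").sort.each do |post|
    file = File.read(post)
    # Capture the YAML front matter between the two --- lines (regex borrowed from Jekyll)
    file =~ /^(---\s*\n.*?\n?)^(---\s*$\n?)/m
    data = YAML.load($1)
    # '*' unless the front matter explicitly says published: false
    status = data['published'] || data['published'] == nil ? '*' : ' '
    puts "#{status}#{File.basename(post)}"
  end
end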
Here’s some partial output showing two published posts and one unpublished:
If you don’t know John MacFarlane’s Pandoc, the “Swiss army knife of document formats”, you should definitely give it a try! Pandoc’s abstract document model and its serialization in an extended variant of Markdown markup let you focus on the structure and content of a text instead of dealing with formats and user interfaces. In my opinion pandoc is the best tool for document creation invented since (La)TeX (moreover, pandoc is a good argument to finally learn programming in Haskell). Images in pandoc markdown documents, however, are only referenced by file name. This requires some preprocessing if you want different image files for different document formats, especially bitmap images versus vector images. So I hacked a little preprocessing script that lets you embed images in pandoc’s markup language. For instance, you write:
~~~~ {.dot .Grankdir:LR}
digraph {
A -> B -> C;
A -> C;
}
~~~~
A detailed description is included in the manual, which has been transformed automatically to HTML and to PDF. Compare both documents to see that the HTML version includes PNG images while the PDF contains vector images!
Feel free to reuse and modify the script, for instance by adding more diagram types! How about ASCII tabs and ABC notation if you write about music?
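The script itself isn't reproduced here. As a hypothetical sketch of the same idea (not that script), the Ruby below finds `~~~~ {.dot ...}` blocks, swaps each one for an image reference, and builds the Graphviz `dot` command that would render it. Treating a class such as `.Grankdir:LR` as Graphviz's `-Grankdir=LR` option is an assumption read off the example above, and the `figureN` file names are made up.

```ruby
# Hypothetical preprocessor sketch: matches ~~~~ {.dot ...} fenced blocks (exactly four tildes).
DOT_BLOCK = /^~~~~[ \t]*\{\.dot([^}]*)\}[ \t]*\n(.*?)\n~~~~[ \t]*$/m

# Returns [markdown_with_image_links, blocks]; each block has :name, :source, :options.
def extract_dot_blocks(markdown, image_ext)
  blocks = []
  output = markdown.gsub(DOT_BLOCK) do
    classes, source = $1, $2
    name = "figure#{blocks.length + 1}"              # hypothetical naming scheme
    # Assumption: .Gname:value / .Nname:value / .Ename:value map to dot's -Gname=value etc.
    options = classes.scan(/\.([GNE]\w+):(\S+)/).map { |flag, value| "-#{flag}=#{value}" }
    blocks << { name: name, source: source, options: options }
    "![](#{name}.#{image_ext})"                      # pandoc image reference replaces the block
  end
  [output, blocks]
end

# Argument list for rendering one block with Graphviz's dot (e.g. "png" for HTML, "pdf" for PDF).
def dot_command(block, image_ext)
  ["dot", "-T#{image_ext}", *block[:options], "-o", "#{block[:name]}.#{image_ext}", "#{block[:name]}.dot"]
end
```

To render, a caller could write each block with `File.write("#{b[:name]}.dot", b[:source])` and then run `system(*dot_command(b, "png"))` for HTML output, or pass `"pdf"` for PDF output, so each format gets its own kind of image.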
That’s what I’m doing right now, ensconced in my window seat in coach on my flight home, playing Aretha Franklin’s “Young, Gifted, and Black” tuned up loud enough to drown out the food-smackers behind me while I tidy up trip reports and budget forecasts and put the buff on a small preservation planning grant.
But it was also what I did at ALA…
… When I picked up my badge and began my peregrinations through meetings and exhibits
… When I met up with old and new colleagues over dinner, coffee, lunch, walks down the street, hugs in the hallways
… When I walked into the Council chambers at ALA Midwinter to hustle up a few signatures for my petition to run as an at-large Council candidate.
I felt it was time to get back into ALA governance. I had been puzzling over whether this was, in fact, the right thing for me to do (in addition to LITA Nominations and GLBTRT External Relations and the occasional panel, such as the “ROI in Academic Libraries” Springer hosted last Friday) until I walked into the Council Chambers.
When I push open our door tonight, I know what to expect: Sandy, our cat Emma, my favorite spot on the green couch, a pile of unopened mail, the Sutro Tower twinkling on the hill. I am not being arch when I say I had a similar (if not quite as numinous) experience in the Council chambers today, when I tweeted that I had a petition and within minutes it was overflowing with signatures from Councilors both fresh and well-aged.
I sat a spell, watching the text transcripts unfold on the wall, watching Councilors debate and stand up and stretch and fill out ballots and knit and scoot onto the Web. (A colleague asked me how anyone could “stand” to be in Council for all those hours, and I replied, “These days, the Internet.” By gum, when I was in my first term we sat there in our analog misery, front and center!)
There’s been a lot of water under the bridge since my third term on Council. Financial downturn for my job (Librarians’ Internet Index). The move to Florida. The Florida Era. The move back to California. I’m still me, six years later, but I have that slightly smudged patina of accumulated experience.
We don’t get an Undo button in life, however useful that would be. We’re blessed and cursed with our history. One truth I have had to learn is that for some of us — many of us? — our sense of place looms large in that history.
For many years I preached — and lived — the mantra of “geographic flexibility.” Education, jobs, other opportunities: first I, then we, could follow the wind. I have repeatedly counseled librarians that they had to have geographic flexibility for their careers. I judged them for not seeking jobs far and wide. I looked to myself as an example–I, who had lived worldwide.
Yet it took the Florida Experience to teach me why some people (and I now realize I am in their numbers) have an allegiance to the place they call home so powerful that it is on the other issues in life that they compromise. It’s not that Florida was insanely horrible; it’s that experiences that were less than stellar (and life always has them) took place in a context of alien other-ness, and it was this alien experience that made them sad, at times overwhelmingly so.
There’s an expression, generally condescending: “She knows her place.” It’s too bad it’s never intended as a compliment. I do indeed know my place. I know where I am not “other.” I know where I belong. Not necessarily on this particular block in the Inner Sunset of San Francisco, but not much farther.
On a related note, I’ve been thinking about the events at Harvard last week, where the administration presented tough news about reorganization and downsizing. I can’t speak to what — or who — is right or wrong (if anyone or anything is right or wrong). But I can empathize with the sense that one’s place has become liquid under one’s feet, like one of those rolling earthquakes that feel as if they are never going to stop. Even if you know the Big One is going to hit, that’s an intellectual abstraction until the floor has become molten and undulating and the bookcases are swaying to and fro and it occurs to you that your world as you know it is going to end.
I had a very bad moment about six months into the Florida Experiment where I sobbed, “I want my old life back.” Yes, I did. I forgive myself for that highly emotional moment because I had hit upon a fundamental truth about being and place. There was no magic wand, of course, but I made one change, which led to another, and eventually we got very, very, very lucky.
Naturally, I do not have my old life back. That will never happen. We move forward in time, no flux capacitor to reorder that reality, and only through rigorous memory work (personal reflection, and efforts such as writing, film, music, and dance) can we run our fingers over the fluttering fabric of the past.
But I am no longer a displaced person, living in the backward glance. This may not be forever — it’s not mine to predict cataclysmic change or natural disaster — but it is at least how I plan to spend my days, God willing and the creek don’t rise. And for those who thought the same and have learned otherwise, you have my love and sympathy.
ee. On or about October 18, 2007, BENCKO sent an e-mail to VAN DER KOLK indicating that “sorry to bother you but if you would have a second to find me some links for the ‘Grand Archives’ band id be very happy.” On or about the same day, VAN DER KOLK responded to BENCKO with an e-mail that contained a Megaupload.com link to a Grand Archives music album with the statement “That’s all we have. Cheers mate!”
At least they had good taste in the music they were pirating.
Digital Collections Services Through Using Web Crawls
Digital libraries have attempted to provide various aggregations of their
content. Usually the participants in the aggregation already make that content
accessible on the open web. The approaches to aggregating content
that have been taken in the past have relied on hosting institutions to provide
their metadata in new ways and support additional infrastructure and workflows.
An alternative approach to creating aggregations is to perform targeted crawls
and reuse the content on the pages. The problem with the crawler approach
is identifying items in the collection as opposed to other pages. This document
presents a few possibilities for how to identify items.
If you know of prior work with similar critiques and suggested solutions, please
let me know. I am eager to improve on the techniques of this approach.
Problems with technical approaches taken so far to achieve aggregations
Leaving aside the usefulness of aggregations, the approaches taken to achieve
aggregations of digital collections have had
problems for those who want to be a part of such aggregations. These problems
fall into the categories of separate standards, separate infrastructure, and
metadata dumb-down.
The primary way in which
institutions make their resources available is on the Web. Great effort is
expended to make web pages that are optimized for search engines and usable
and attractive for users. The standards used are common, ubiquitous standards
like HTML and HTTP
shared by developers throughout the world. The metadata presented provides as
rich context as is available for the objects made accessible.
Contrast this with the common approach found in aggregations created by libraries.
They often rely on an OAI-PMH gateway
for harvesting item-level metadata about collections.
In order for collections to take part in an OAI-PMH enabled aggregation,
institutions which host digital collections must expose their information
through special XML rather than the HTML they already have. The different
standards and tools are foreign and a barrier to entry for the many developers
more familiar with web standards.
An OAI-PMH gateway is another separate service which needs to
be maintained in addition to the web site. With this approach, the effort of providing services
and harmonizing data is pushed down from the aggregator to the source
collections. That takes
extra effort on the part of institutions which are already squeezed. There are
maintenance costs to keeping these services up and in sync with the data which
is exposed through the website. New aggregators would do well to investigate
the problems encountered by previous aggregations that used this kind of approach,
such as the DLF Aquifer.
Metadata dumb-down is where metadata goes through a transformation which
decreases the level of precision of metadata. It has been a valuable strategy
for harmonizing metadata across institutions. The problem is that many
institutions have rich, specialized metadata that, when dumbed-down, loses much
of its value. This rich content is often exposed on collection websites but
cannot make it through the transformation to the shared metadata schema. In
order to create the powerful aggregations that we want, we need to look for
new ways to leverage more of the rich metadata our institutions have invested in.
Crawling for Data
Instead of relying on separate standards or infrastructure to provide an
aggregation, it is possible to crawl websites. This significantly
lowers the barrier for institutions to participate in an aggregation as it
relies on the existing web sites of digital collections. When a robot crawls
a web site, it may follow all of the links on a page. Pages like browse, search,
about, and contact pages can be crawled along with the pages that describe
individual digital objects. For many kinds of search, an index of all of that
crawled content could be useful.
Some aggregations, though, prefer to only expose item-level metadata including
small surrogates for objects like thumbnails. Crawling digital collections is not
incompatible with being able to identify individual items in a digital
collection. There are relatively simple solutions for identifying items which
could cover many of the digital collections that already exist on the web.
Some of these will be set forth below.
Even if an aggregator is able to identify pages that describe individual items,
there is still the problem of how to extract useful information to allow for
functionality like faceted search interfaces. Until recently search engines
have had to be content with using some HTML semantics (<ol> means an ordered
list), along with natural language processing, in order to discover the core
content and meaning of pages on the web. These approaches can go very far in
extracting meaning from the unstructured data on a page, but they have their
limits.
Recently there have been renewed efforts to create simple standards for
embedding data in HTML. These allow search engines to extract data from the
page and make sense of it. While these efforts so far have been targeted
toward the use cases of search engines and commercial organizations, there are
possibilities for the cultural heritage sector to make use of these same
technologies. If digital collections embed hidden markup in their HTML,
the crawler approach could also have access to the rich content that is
already accessible, in a form which would enable more interesting interfaces
for aggregations.
The following sections will make the case for how this could be accomplished.
Collection Profiles
The first problem encountered in using crawlers to create item-level
aggregations is curatorial. It is infeasible for most institutions to crawl
the whole web looking for appropriate digital collections content. Instead
there needs to be a store of metadata about digital collection web sites to
know what
is accessible and where to find it. Such a system can store metadata about
collections that enable crawling and other services.
Collection Profiles are a compilation of metadata about digital collections on
the open web. The other solutions set out below could use such a metadata
store, though the approaches themselves do not rely on it. I have done prior
work, along with Tito Sierra, in setting out what a system for compiling
metadata about digital
collections might look like and how it might function. For more information
on this work see the documentation for the
Collection Achievements and Profiles System.
Possible Solutions for Detecting Item-Level Pages
Identifying Item Pages
The first problem is determining which pages in a crawl of a site are
item-level pages rather than search or browse pages.
Below are different solutions for being able to discover item-level pages and
make the best use of them.
URL Template
The URLs for item-level pages often have a standard pattern on a single site.
Knowing the pattern would allow a system to only use those crawled URLs which
fit the pattern to be included in a search aggregation. For example, it would
be simple to determine that URLs of this pattern are item-level:
These URL templates could be stored with a centralized Collection Profile or
made available through metadata in the head of web pages. This is a simple,
low-cost solution that could work for many digital collections sites.
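To make the URL-template idea concrete, here is a minimal Python sketch. The template `https://example.org/collection/item/{id}` is a hypothetical pattern, not one taken from an actual collection, and the single `{id}` placeholder convention is an assumption; the sketch compiles the template into an anchored regular expression and keeps only the crawled URLs that match.

```python
import re

# Hypothetical item-level URL template for one collection site. In practice it
# might come from a Collection Profile or from metadata in the page head.
ITEM_TEMPLATE = "https://example.org/collection/item/{id}"

def template_to_regex(template):
    """Turn a template with a single {id} placeholder into an anchored regex."""
    prefix, _, suffix = template.partition("{id}")
    # Escape the literal parts so characters like '.' match only themselves.
    # The identifier may not contain '/', '?' or '#'.
    return re.compile("^" + re.escape(prefix) + r"([^/?#]+)" + re.escape(suffix) + "$")

def item_level_urls(crawled, template=ITEM_TEMPLATE):
    """Keep only the crawled URLs that fit the item-level template."""
    pattern = template_to_regex(template)
    return [url for url in crawled if pattern.match(url)]

crawled = [
    "https://example.org/collection/item/0042",
    "https://example.org/collection/search?q=maps",
    "https://example.org/collection/about",
]
print(item_level_urls(crawled))  # ['https://example.org/collection/item/0042']
```

Anchoring the expression and excluding `/`, `?`, and `#` from the identifier keeps deeper pages such as `.../item/0042/edit` and query-string variants out of the item-level set; a looser pattern would be a reasonable choice for sites whose item URLs carry parameters.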
A site which can add content to the head of the HTML on item pages could add a
meta element with a particular name and a value specifying that the page is an
item page. The HTML5 specification has a way to officially extend the values
allowed for the name attribute of the meta element.
<meta name="itempage" content="true">
The problem with this approach is that consumers would have to know to look
for this particular extension.
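A consumer that does know to look for this extension could detect it with a few lines of Python's standard `html.parser`. This is only a sketch: `itempage` is the hypothetical extension discussed here, not a registered meta name.

```python
from html.parser import HTMLParser

class ItemPageMetaDetector(HTMLParser):
    """Detects the hypothetical <meta name="itempage" content="true"> extension."""

    def __init__(self):
        super().__init__()
        self.is_item_page = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <meta ... /> tags are also routed here by the default handler.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip().lower()
        if name == "itempage" and content == "true":
            self.is_item_page = True

def is_item_page(html):
    detector = ItemPageMetaDetector()
    detector.feed(html)
    detector.close()
    return detector.is_item_page
```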
Sitemaps
The Sitemap Protocol is a simple standard to allow
crawlers to see a list of
the pages that a site suggests could be crawled. In most cases a site will only
want to expose the most important pages. Item-level pages would be included,
while search pages or pages with duplicate content could be excluded from
a sitemap. These non-item-level pages could still be crawled by search
engines, but excluding them from the sitemap makes it easier for a crawler
to zero in on the item-level pages.
Many sites are already publishing sitemaps, making them discoverable through
their robots.txt, and sending them to search engines. There are common tools
for creating sitemaps based on existing sites.
An aggregation could specify that the sitemap provided to their crawler should
only contain item-level URLs. Since this is something that many sites are
already doing, it could be a low-cost way for many digital collections to
give an indication as to what pages are item-level.
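Reading the item-level URLs out of such a sitemap needs nothing beyond an XML parser. The sketch below assumes a plain `<urlset>` sitemap in the sitemaps.org namespace; sitemap index files, gzip compression, and discovery through robots.txt are left out, and the example URLs are made up.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the Sitemap Protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_xml):
    """Return the <loc> of every <url> entry in a sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]

# Hypothetical sitemap that lists only item-level pages.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/collection/item/0001</loc></url>
  <url><loc>https://example.org/collection/item/0002</loc><lastmod>2011-12-01</lastmod></url>
</urlset>"""

print(sitemap_urls(SAMPLE))
```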
Extracting Meaning
Once item-level pages are identified, there are various ways to extract meaning
from the pages or otherwise communicate the metadata.
Links to Alternate Representations
One mechanism which has been available for a long time is the ability to add
links to alternative representations of resources in the head of the HTML. For
instance, a page which describes a book could advertise that an RIS
formatted representation is available.
This communicates that there is an alternative representation of a particular
type at the given URL.
The metadata in the alternative representation could be retrieved, parsed,
and indexed.
A problem with this approach is that the alternative representation could go
out of sync with the public-facing HTML or the API could be neglected.
Nevertheless this approach could be a bridge between a specialized API and a
crawl-centered approach.
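As a sketch of how a crawler might discover such alternate representations, the Python below collects `<link rel="alternate">` elements from a page and resolves their URLs against the page URL. The media type `application/x-research-info-systems` is an assumption here (it is a commonly used, unofficial type for RIS), and the example page and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed media type for RIS; sites may advertise RIS under other types.
RIS_TYPE = "application/x-research-info-systems"

class AlternateLinkFinder(HTMLParser):
    """Collects <link rel="alternate"> elements as (media type, absolute URL) pairs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        # rel is a space-separated list of tokens, e.g. rel="alternate nofollow".
        rels = (a.get("rel") or "").lower().split()
        if "alternate" in rels and a.get("href"):
            media_type = (a.get("type") or "").lower()
            self.alternates.append((media_type, urljoin(self.base_url, a["href"])))

def find_alternates(html, base_url, media_type=None):
    """Return alternate-representation URLs, optionally filtered by media type."""
    finder = AlternateLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return [url for kind, url in finder.alternates if media_type is None or kind == media_type]
```

The URLs returned could then be fetched and parsed with an RIS parser before indexing.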
Meta head content
Digital collections may already make some of their structured data
about digital objects
available by exposing Dublin Core terms through meta elements in the head of
the HTML.
<head>
  <meta name="DC.title" content="Title of Digital Object" />
</head>
The content of the meta elements could be supplemented with parsing
and indexing of the
visible content of the page.
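A minimal sketch of that combination, assuming the common `DC.` prefix convention for element names (variants such as `DCTERMS.` are not handled): the parser gathers Dublin Core meta values into lists keyed by element name and also collects the visible text of the page, skipping `<script>` and `<style>` content.

```python
from html.parser import HTMLParser

class DublinCoreMetaParser(HTMLParser):
    """Collects DC.* meta elements plus visible text for full-text indexing."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # e.g. {"title": ["Title of Digital Object"]}
        self.text_parts = []  # visible text fragments, in document order
        self._skip = 0        # depth inside <script>/<style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name") or ""
        content = a.get("content")
        # Match the "DC." prefix case-insensitively; repeated elements accumulate.
        if name.lower().startswith("dc.") and content is not None:
            self.fields.setdefault(name[3:].lower(), []).append(content)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())

def extract_dc(html):
    """Return (Dublin Core fields, visible text) for one page."""
    parser = DublinCoreMetaParser()
    parser.feed(html)
    parser.close()
    return parser.fields, " ".join(parser.text_parts)
```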
Semantic HTML Markup
Semantic markup is a promising future direction for digital collections
aggregators to be able to extract richer metadata out of the visible content
on web pages.
Semantic markup is the use within HTML of additional, hidden markup and
vocabularies to more fully express the meaning of the visible content of the page.
Cultural heritage organizations often have detailed metadata about the
digital and physical objects they curate. Digital collections metadata are
often stored in relational databases with rich schema. When they currently
expose their metadata through the Web, search engines for the most part get
plain text and need to attempt to make sense of the page with natural language
processing.
Semantic markup would allow cultural heritage organizations to expose their
metadata in a way that preserves more of the intention and meaning. Rather
than dumbing down the metadata to Dublin Core before an aggregator uses it,
consumers of structured data could utilize something much closer to the
intention of the producer. It also allows organizations to expose this
structured data without the burden of maintaining a separate, little-used
service. Since effort is already being put into the website, it allows
organizations to leverage that representation of the data which they already
make available.
Producers of digital collections only need to expose to the public one version
of the metadata they have. Everyone wants to have web pages for their digital
collections. The semantic HTML enables both human users and machines to access
the same data and understand it. Relying on collections to have metadata
gateways or APIs to expose alternative representations of their data can be
burdensome. It adds greater maintenance costs for making the data available.
Having more than one public representation of the data leads to the less public
one becoming unmaintained and getting stale or out of sync.
Semantic markup is implemented through various syntaxes.
Microformats and RDFa
syntaxes have allowed for making embedded, structured
data available on the Web through HTML for a while and have existing communities
of practice. The syntaxes are sometimes difficult to implement. The newer standard
HTML5 Microdata
has learned from previous efforts and attempts to be simpler for page authors.
RDFa Lite is a profile of RDFa which tries
to simplify the RDFa syntax similar to Microdata.
These syntaxes are used in combination with a vocabulary to make meaning. In
2011 the big search engines announced the schema.org
vocabulary and support
for Microdata. When the search engines make an announcement it gets the
attention of webmasters. This means that many webmasters will be applying
Microdata and schema.org. The tools and ecosystem will be growing. Already
many content management systems are providing a way to implement this kind of
markup.
There are simple to complex ways in which semantic markup could be implemented
to better support aggregations through crawling. For instance, it is simple in
Microdata using Schema.org to specify that a particular page is an
item-level page.
Even this amount of information would allow
an aggregator to sort item-level pages from others. Layered on top of that,
digital collections sites could embed a whole host of other types of data.
This structured data could be extracted to create new tools and more powerful
search interfaces.
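A hedged Python sketch of how an aggregator might sort pages using Microdata. It assumes item pages declare an element with `itemscope` and `itemtype="http://schema.org/ItemPage"`; `ItemPage` is a real schema.org type, but treating it as the item-level signal is an assumption for illustration. The scanner also picks up `itemprop` values supplied through a `content` attribute, a small first step toward richer extraction.

```python
from html.parser import HTMLParser

# Assumption for illustration: item pages declare a schema.org ItemPage scope.
ITEM_PAGE_TYPES = {"http://schema.org/ItemPage", "https://schema.org/ItemPage"}

class MicrodataScanner(HTMLParser):
    """Collects itemtype URLs and itemprop values given via a content attribute."""

    def __init__(self):
        super().__init__()
        self.types = []
        self.props = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # itemscope is a boolean attribute, so its value is None; test for the key.
        if "itemscope" in a and a.get("itemtype"):
            # itemtype may hold several space-separated type URLs.
            self.types.extend(a["itemtype"].split())
        # Only <meta itemprop content>-style values are captured here; full
        # Microdata extraction also reads element text, href/src, and nesting.
        if a.get("itemprop") and a.get("content") is not None:
            self.props.setdefault(a["itemprop"], []).append(a["content"])

def scan(html):
    scanner = MicrodataScanner()
    scanner.feed(html)
    scanner.close()
    return scanner

def declares_item_page(html):
    return any(t in ITEM_PAGE_TYPES for t in scan(html).types)
```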
Utilizing semantic markup would allow aggregators, search engines, and
others to do more interesting
things with the pages of resources that collections already make accessible.
This broad benefit for collection producers could encourage more involvement
from within and outside the cultural heritage community. So while benefiting
digital collections through the aggregation this would also benefit them on
the open Web.
Semantic HTML markup and relevant vocabularies are also areas where cultural
heritage organizations could still have input on the web. While the predominant
vocabularies in use now meet the needs of e-commerce, they could be improved
and expanded to better fit cultural heritage materials. Using a common,
standard, widely-accepted, and widely-consumed web vocabulary for description
of cultural heritage materials would have multiplier effects for these
collections. Aggregators could be a strong force for spurring creation of web
vocabularies which better fit our data models and the use cases we want to
enable. Through using common web standards, aggregators could also encourage
consumption of this data at scale in ways that have until now been impossible.
Conclusion
Creating aggregations through crawling can lessen the burden on institutions
that want to make their collections more discoverable. With just a little more
effort, the crawler approach to aggregations can still be narrowed to item-level
resources. Collections can implement semantic markup to make structured data
available on their pages, then rich semantic data can be extracted
from pages for more powerful aggregations. All of these web-centered approaches
to the item-level problem benefit digital collections broadly on the web as
well as aggregators, search engines, and developers.
At this point an aggregator may need to use some of the existing infrastructure
distributed digital collections have in place for exposing item-level resources,
but standards ought to nudge future work in this more Web-centric direction.
There are possibilities for merging the
HTML and alternative (e.g. XML) versions for indexing. While a current
implementation of an aggregation is likely to use niche protocols like OAI-PMH
to include some collections, these approaches should be deprecated.
Please let me know if you know of other and better ways aggregators could
identify item-level pages and extract meaning from those pages. I would also
be interested to hear of any aggregators that already use these approaches.
Collection Achievements and Profiles System and DPLA Crawler Services
This is a quick strawman proposal for what the Digital Public Library of
America should build as the first parts of a generative platform. This document
is not in a finished state, but just as the DPLA has been good at opening up its
process with the Beta Sprint,
I wanted to release this
document early even in this unfinished state.
I attended the December DPLA Technical Workshop in Cambridge and was
inspired by the discussion there. I hope that this document
clarifies some of the approaches I and others at that meeting
were advocating. I shared this with the DPLA Interim Development Team a couple
of weeks ago, and now that development has started I thought I would share it
here as well.
While the first iteration of the DPLA platform may be set and on
its way, I still wanted to share one vision of what a generative platform for
aggregations might involve. The main point is to get the DPLA to the
aggregations they likely need to present at some point. This document leaves
aside the question of whether creating aggregations is a good idea. The desire
to create aggregations is a big, often unquestioned, assumption of big
digital library projects.
I think what is set out below is one simple architecture for
accomplishing aggregations in a very Web-centered way while potentially having
more reuse outside of just aggregations.
Introduction
This proposal gives a high-level overview of one possible DPLA technical
architecture. This gives the idea of what a beginning of a scalable,
extensible DPLA platform could look like. The architecture starts with a
foundation in the distributed digital collections which already exist on
the Web. The platform set out here works with the way the Web works
while allowing the DPLA to meet its goals. As a result it will also help
cultural heritage organizations to meet their goals for greater
discoverability of their collections.
Collecting and keeping track of those existing collections is the
job of the Collection Achievements and Profiles System (CAPS). The
metadata CAPS collects can be reused to do focused crawls of digital
collections through a DPLA Crawl Service. The results of those crawls
can be analyzed, and the data used for a variety of applications
including topical or format aggregations, mashups, visualizations, and
other internal and external tools.
This architecture can be summarized through the following diagram.
The rest of this document begins with an overview of how these major
components fit together. It then goes into more detail about the
architecture and technical components required to implement the two
major pieces of this platform:
Collection Achievements and Profiles System
Crawler Services: Crawler Services are further divided into the Raw
Crawler Service and the Analyzed Crawl Service.
Architecture Overview
The foundation for the DPLA platform would be the distributed
collections on the Web. The DPLA can help make these distributed
collections more discoverable on the open Web and enable new services. A
user with a browser can already enter the address for these digital
collections and get something useful back. In this scenario any digital
collection on the Web can be a part of the DPLA. The collection (and
hosting institution) is not required to implement any new metadata
format, gateway, or API. The existing published HTML pages are enough to
gain the initial benefits of a DPLA. Collections can choose to adopt
other Web standards or provide more information on their collections to
gain more of the benefits of the DPLA and the Web at large.
The technical barriers to entry into the DPLA are purposefully low to
maximize participation. The DPLA has an opportunity to be a truly big
tent approach to solving the problems of making America’s cultural
heritage accessible and discoverable. When suggesting digital
collections adopt standards or make changes, this proposal gives
preference to asking digital collections to optimize for the Web for
broad applicability. The technical decisions made here always choose
what would make the system simpler and easier for producers of digital
collections over what would be easier for the DPLA or other aggregators.
The Collection Achievements and Profiles System (CAPS) is an editable
directory of Web-accessible digital collections. In its most basic form
Collection Profiles hold the name and URL of digital collections.
Achievements are a way to expand Collection Profiles through gathering
discrete pieces of data about collections and their institutions. In
order to validate various Achievements, CAPS can request pages and
resources from a collection Website. Full Collection Profiles with all
completed Achievements are available through a simple API.
The Raw Crawler Service finds new collections to crawl through CAPS. The
Raw Crawler Service can then launch crawls of a collection Web site. The
raw crawl data can be made available to external developers who want to
do their own analysis of the raw crawl data and build new services.
The Analyzed Crawl Service makes use of the raw crawl data to extract
data and text from pages. CAPS can use this data to perform work like
validating Achievements, assigning automated tags to collections, and
confirming the health of Web sites. The DPLA could use this analyzed
crawl data to create various aggregations, search interfaces, and other
services for digital collections that are only possible through having
this central data store. External developers could access the analyzed
crawl data to create their own aggregations, mashups, and other
services.
The DPLA can create a generative platform through using the existing
digital collections on the Web and adding value. Each major component of
the DPLA platform can make its data available to the world to enable the
creation of novel services and new creative works.
Collection Achievements and Profiles System (CAPS)
The first component that the DPLA could build is a Web application which
allows for collecting basic information about collections. We call this
the Collection Achievements and Profiles System (CAPS). DPLA
Collection Profiles provide a mechanism for the DPLA to host a
centralized Web-based, editable directory of collections on the
Internet. DPLA Collection Achievements provide a mechanism for
progressively expanding Collection Profile descriptions, promoting
standards adoption, validating adherence to standards, and progressively
engaging the community. While the initial barrier to entry is low, CAPS
encourages digital collection managers to adopt standards that benefit
the discoverability of their collections and benefit the goals of the
DPLA.
Before reading the following technical aspects of the CAPS proposal it
would be best to familiarize yourself with the
Collection Achievements and Profiles System documentation.
This detailed documentation was done as part of a DPLA Beta Sprint
submission, and it forms the initial thinking for this work, including a
narrative, wireframes, and Achievement ideas.
Technical Components of CAPS
CAPS can be modeled with the following diagram. Elements in blue are
managed by the DPLA. Other colors refer to various external contributors
and consumers.
Following is a description of the major components of CAPS.
The CAPS Web application allows for editing and managing Collection
Profiles and Achievements (information about collections on the Web).
Collection managers and DPLA volunteers can create and update Profiles
and Achievement data through Web forms. A researcher looking for
collections (on a particular topic, in a geographic region, or other
relevant facet) could also discover collections and see all information
about the collections.
The Web application can request pages or other resources for a
collection on the Web using the stored URL. For instance when a
Collection Profile is first created the URL is validated for being
well-formed and then the page is requested to check that it returns a
200 OK status code. Other Achievement validations could also request
information from the site (e.g. robots.txt, sitemap). To ensure that the
CAPS application returns a timely response for editors, CAPS can defer
some of these longer running processes and validations to a background
job queue.
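The validation flow described here could look roughly like this in Python. The in-process `queue.Queue`, the profile dictionary, and the function names are hypothetical stand-ins for whatever persistent store and background job system CAPS would actually use; the well-formedness check runs immediately, while the HTTP request is deferred to a worker.

```python
import http.client
import queue
import urllib.request
from urllib.parse import urlparse

# Hypothetical in-process job queue; a real deployment would likely use a
# persistent background job system instead.
jobs = queue.Queue()

def is_well_formed(url):
    """Fast check run while the editor waits: absolute http(s) URL with a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def returns_200(url, timeout=10):
    """Slow network check meant for a background worker.

    urlopen follows redirects, so a redirect chain ending in 200 counts as OK.
    4xx/5xx responses raise HTTPError, a subclass of URLError and OSError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        return False

def create_profile(name, url):
    """Validate the URL synchronously, then defer the status check."""
    if not is_well_formed(url):
        raise ValueError(f"not a well-formed http(s) URL: {url!r}")
    profile = {"name": name, "url": url, "status_ok": None}  # None = not yet checked
    jobs.put(profile)
    return profile

def run_pending_jobs(check=returns_200):
    """Drain the queue; `check` is injectable so the logic can run offline."""
    while not jobs.empty():
        profile = jobs.get()
        profile["status_ok"] = check(profile["url"])
        jobs.task_done()
```

Keeping the network request out of `create_profile` is what lets the editing form respond immediately; injecting the check also makes the flow easy to exercise without touching the network.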
CAPS would require a persistent data store for Collection Profiles and
Achievement data. Periodically a data dump of all data could be created
for consumption by external aggregators, crawlers, and service
providers. Access to all of the data or searches for slices of data
would be available through a Web API. Consumers of the API could be
aggregators, crawlers, and other service providers. Through the API the
DPLA could also provide other services like aggregated sitemaps. Having
multiple ways (data dump and API) of accessing the data lowers the
barrier for developers both internally and externally to build new and
interesting applications.
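One low-cost shape for the data dump is JSON Lines, one profile per line, which lets consumers stream a large file. The sketch below is illustrative only: the `CollectionProfile` fields and the idea of filtering by a named Achievement as an API "slice" are assumptions, not part of a defined DPLA API.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class CollectionProfile:
    """Hypothetical minimal profile: name and URL, plus Achievement data by name."""
    name: str
    url: str
    achievements: dict = field(default_factory=dict)

def dump_profiles(profiles):
    """Serialize profiles as JSON Lines (one object per line) for a bulk data dump."""
    return "".join(json.dumps(asdict(p), sort_keys=True) + "\n" for p in profiles)

def profiles_with(profiles, achievement):
    """An API-style slice: profiles that have completed a given Achievement."""
    return [p for p in profiles if p.achievements.get(achievement)]
```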
Standards through Achievements
Through Achievements the DPLA can encourage the use of various standards
which can make Collection Profiles more useful.
Initial effort can be put into Achievements which can be automatically
detected, therefore requiring minimal effort from contributing
collections. The DPLA can adopt Achievements for Web standards that will
improve the discoverability of collections on the open Web. For instance
it is possible to automatically check whether the site allows for
crawlers (robots.txt) and has a sitemap of the most important pages to
crawl (sitemap protocol). When digital collections implement these kinds
of standards it benefits all consumers of digital collections resources,
including the DPLA and search engines.
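Both of these checks can be automated with Python's standard `urllib.robotparser`. In this sketch the robots.txt body is passed in as text (fetching it is left out), and the crawler name `DPLA-Crawler` is hypothetical; `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

def robots_achievements(robots_txt, site_root, user_agent="DPLA-Crawler"):
    """Evaluate two automatically detectable Achievements from a robots.txt body."""
    parser = RobotFileParser()
    # parse() also records a check time, which can_fetch() requires.
    parser.parse(robots_txt.splitlines())
    return {
        # Is the collection's front page open to our (hypothetical) crawler?
        "allows_crawling": parser.can_fetch(user_agent, site_root),
        # site_maps() returns the listed Sitemap: URLs, or None if there are none.
        "declares_sitemap": bool(parser.site_maps()),
    }
```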
These same Achievements will have interconnections with the rest of the
architecture laid out here. For instance automated Achievement
validations can require analyzed crawl data to be confirmed.
Other Achievements which would benefit libraries and museums could also
figure prominently. Knowing the hours of operation and geographic
location of the access point to the physical collections could help
encourage visits to a library or museum. CAPS can provide the data to
start making connections between the digital and the physical.
Since Achievements add a named, small, discrete piece of information to
a Collection Profile, the code required to implement an Achievement is
relatively small and self-contained. The starter set of Achievements
could create a basic functional system that can be delivered quickly to
help bootstrap the rest of the DPLA effort. Achievements can be
incrementally added over time. Communities and developers could work to
create and incubate new Achievements around new standards before they
become part of the DPLA core platform. Achievements are another way in
which the DPLA could continue to spur innovation around digital
collections standards and services.
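To illustrate how small and self-contained an Achievement could be, here is a hypothetical registry in Python where each Achievement is just a name, a description, and a validation function over a profile dictionary. The two Achievements shown and their field names are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class Achievement:
    """A named, discrete piece of information about a collection plus its check."""
    name: str
    description: str
    validate: Callable[[dict], bool]

# Hypothetical registry; new Achievements can be incubated and added one at a time.
REGISTRY: Dict[str, Achievement] = {}

def register(achievement):
    REGISTRY[achievement.name] = achievement
    return achievement

register(Achievement(
    "has_name_and_url",
    "Profile has a collection name and an http(s) URL",
    lambda p: bool(p.get("name")) and str(p.get("url", "")).startswith(("http://", "https://")),
))
register(Achievement(
    "lists_hours",
    "Physical access point lists its hours of operation",
    lambda p: bool(p.get("hours")),
))

def completed_achievements(profile):
    """Names of all registered Achievements the profile currently satisfies."""
    return sorted(name for name, a in REGISTRY.items() if a.validate(profile))
```

Because each Achievement is only a name and a function, adding one does not touch any existing code.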
Crawler Services
Crawler Services are responsible for coordinating robots to crawl
digital collections sites, analyzing the data, and making it available.
Benefits of Crawler Services
Rather than using new or existing niche library protocols, the DPLA
could make use of common, ubiquitous Web protocols and standards.
Encouraging standards (through Achievements) that help the DPLA do its
work to crawl digital collections will also aid the discoverability of
digital collections on the open Web.
The data created through the Crawler Services is important background
information for CAPS to validate some Achievements. Certain standards
which the DPLA may want to promote through Achievements would require
requesting multiple pages from a collection. For instance validating a
sitemap could involve requesting each of the listed URLs. The resulting
data could be used to confirm the presence of listed URLs.
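Assuming the Raw Crawler Service can report the HTTP status it observed for each URL (represented here as a plain dictionary, a hypothetical structure), the sitemap check reduces to a comparison:

```python
def validate_sitemap_against_crawl(sitemap_urls, crawl_status):
    """Check every sitemap URL against statuses observed by the crawler.

    crawl_status: hypothetical mapping of URL -> HTTP status code from the raw crawl.
    """
    not_crawled = [url for url in sitemap_urls if url not in crawl_status]
    not_ok = [url for url in sitemap_urls
              if url in crawl_status and crawl_status[url] != 200]
    return {
        "valid": len(not_crawled) == 0 and len(not_ok) == 0,
        "not_crawled": not_crawled,  # listed but never fetched (yet)
        "not_ok": not_ok,            # fetched but returned something other than 200
    }
```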
Analyzed crawl data could form the basis of various DPLA aggregations
and services. Making this data available will also encourage other
developers to create applications using digital collections.
Architecture of Crawler Services
Crawler Services can be split into two interrelated, but separate,
applications. The Raw Crawler Service is responsible for coordinating
crawls of digital collections sites and making the raw crawl data
available. The Analyzed Crawl Service is responsible for extracting data
and meaning from the raw crawl data to enable DPLA services. Building
them as two independent applications can allow much of their development
to happen in parallel.
Raw Crawler Service
The Raw Crawler Service uses the data collected by CAPS to discover
digital collections to crawl. Crawls could fall into different
categories. CAPS or other DPLA services could require a focused crawl of
a collection to be triggered for timely data. Extensive crawls of
digital collections sites could also be made.
The final product of the Raw Crawler Service is a store of the pages
crawled along with technical metadata. Technical metadata would include
when the page was last crawled and the HTTP headers returned with the
request including the status code.
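A possible record layout for the raw crawl store, sketched as a Python dataclass. The field names are assumptions; they simply hold the page body and the technical metadata listed here (crawl time, HTTP headers, status code). The helper keeps the most recent crawl of each URL.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class RawCrawlRecord:
    """Hypothetical raw crawl record: page body plus technical metadata."""
    url: str
    crawled_at: datetime
    status: int
    headers: tuple  # (name, value) pairs as returned, preserving order and repeats
    body: bytes

    def header(self, name):
        """Case-insensitive lookup of the first matching header value."""
        wanted = name.lower()
        for key, value in self.headers:
            if key.lower() == wanted:
                return value
        return None

def latest_by_url(records):
    """Keep the most recent crawl of each URL, e.g. to build a current snapshot."""
    latest = {}
    for record in records:
        current = latest.get(record.url)
        if current is None or record.crawled_at > current.crawled_at:
            latest[record.url] = record
    return latest
```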
This data could be made available to external developers who want to
conduct their own research or analysis on this slice of the Web. Both an
API and data dump could be made available. Whether the API only provides
for discovery of available raw crawl data or actually returns crawl
data is an open question. Because of the size of the corpus, it may be
that the raw crawl data is made available in a lower cost way through
cloud services. (See the Common Crawl for
more on how this might work.)
The Raw Crawler Service would require an application to coordinate
robots, a data store for raw crawl data, and a database for technical
metadata about the crawl data. A Web application would also be needed to
create the API service.
Analyzed Crawl Service
The DPLA could also provide an Analyzed Crawl Service. This service
analyzes the raw crawl data to extract data and text from the crawled
pages. At this stage it can also begin to make connections across
repositories. For various ways the DPLA can get to item-level data
through crawl analysis, see
Solving the Item-Level Problem on the Web.
Because crawls capture the full text of each page, there is the potential to
provide rich item-level data without reliance on niche protocols.
The initial consumer of this service would be the DPLA. The resulting
data could be used as the source metadata about collection pages
underlying new DPLA aggregations and services. The data could be made
available to external developers to create new aggregations, mashups,
and services.
Conclusion
This high-level overview of a DPLA platform architecture is intended to
spur discussion. There are many possibilities for what a DPLA technical
architecture may look like. Presented here is a technical architecture
which would enable the DPLA to function in the way that the Web works.
Development could be scaffolded quickly and immediately begin to provide
real benefits from the DPLA effort.
If the model set out here is not followed, the hope is that some of the
principles here will remain in the DPLA effort. Allowing for content
producers at all levels of technological sophistication to be part of a
big tent DPLA effort is an important underlying principle. Technically,
the DPLA can ensure that at every level of the platform the data
and metadata it creates are made easily accessible for reuse.
Is there any merit to this kind of approach for digital library aggregations?
Feedback welcome in the comments.
Harvard’s famously byzantine library system, comprising over forty libraries and administratively divided into two separate library systems (confusingly called the Harvard University Library or HUL, and the Harvard College Library, or HCL), has changed very little in terms of organizational structure since the late 19th century.
Harvard is not alone here. In fact, I’d suggest that the oldest academic libraries, and ironically especially the old ones that really excelled 80+ years ago, are most likely to have completely dysfunctional organizational structures and organizational behavior today.
Libraries today aren’t the same as libraries 80+ years ago, especially with regard to the electronic content we purchase, which has different workflows to manage and different economics of purchase. The same goes for metadata maintenance: the blog author rightly points out that libraries realized the benefits of cooperating, coordinating, and sharing many years ago, but sharing cards (or data to print cards) through LC is a very different beast from what modern metadata control needs.
I also generally agree with the blogger’s conclusion — but with less optimism:
But second, the importance of catalogers, and more broadly speaking, librarians is not necessarily diminishing into nothingness. The environment has changed radically, and there are sure to be plenty of future “massacre-like” events that will painfully remind us of these changes. But librarians do have a future, and I think it may even be a bright one: they just need to accept that it won’t be quite the same as the past.
I fully agree that there is still as much of a need for the tasks librarians have always done as ever — most definitely and even especially including cataloging/metadata control.
However, despite agreeing with that, I am actually not optimistic, like that blogger is. We are running out of time to demonstrate that our profession, community, and industry is capable of meeting the metadata control needs of the 21st century. We are not doing a good job of it. We do not seem to be capable of changing our priorities, expertise, organizational structures, and inter-organizational collaborative infrastructures, to deal with it.
The traditional goals of libraries are still as useful and needed as ever, but the ways of accomplishing them are different. There is still a great need for an organization specializing in information management on behalf of a user community, without trying to make a profit off that user community. But I am, sadly, no longer particularly optimistic that libraries as they are now are actually capable of accomplishing those goals. Even in the best of cases, trying will result in some painful organizational reorgs; nobody likes change. (It’s of course also possible for painful reorgs to end up entirely useless or even counter-productive, or to be simply admissions of defeat as libraries slowly die.)
Hint: If you or your organization thinks that putting all our metadata into RDF as quickly as possible, and thereby “doing linked data”, is necessary and sufficient to handle modern metadata control needs, you have not only missed the boat, you are on the wrong boat. I have lately been seeing a worrying increase in people suggesting “oh, we just need linked data to solve that problem”, with “linked data” meaning “the data we’ve already got, expressed in RDF”, and with a worrying ignorance of, or disregard for, what good data actually entails in the 21st-century systems environment.
JISC’s Digital Infrastructure innovation team is aiming to release a Grant Funding call at the end of this month. Its aim is to fund work to enable the UK Further and Higher Education communities to improve the digital infrastructure in the areas of managing research data, library systems, disciplinary vocabularies, access and identity management, research tools, and research information management. More details on most of these are available from the JISC Funding Roadmap. We’ll be using this and related blogs to provide further information and an FAQ.
I am a huge fan of Zotero, so much so that I use it for all of my books and share my bibliographies online with others. Koha integrates nicely with Zotero and this tutorial video will show you exactly how.
If you have an idea for a video, please just let me know and I’ll add it to my list of things to record.
Here are the steps I took to install Ruby and Rails on a fresh and updated Ubuntu 11.10 install. The two places where there were hiccups involved having to install openssl through rvm and updating to a more recent version of rubygems. Some steps are thrown in there just to show how rvm and gem provide some information. I used a VirtualBox image to allow me to have a clean install.
There's a lot of talk about doing away with bibliographic records and replacing them with collections of linked data. In this scenario keeping track of the links is of vital importance. The recent paper How To Track Your Data: The Case for Cloud Computing Provenance by Olive Qing Zhang, Markus Kirchberg, Ryan K. L. Ko, and Bu Sung Lee addresses this topic.
Provenance, a meta-data describing the derivation history of data, is crucial for the uptake of cloud computing to enhance reliability, credibility, accountability, transparency, and confidentiality of digital objects in a cloud. In this paper, we survey current mechanisms that support provenance for cloud computing, we classify provenance according to its granularities encapsulating the various sets of provenance data for different use cases, and we summarize the challenges and requirements for collecting provenance in a cloud, based on which we show the gap between current approaches to requirements. Additionally, we propose our approach, DataPROVE, that aims to effectively and efficiently satisfy those challenges and requirements in cloud provenance, and to provide a provenance supplemented cloud for better integrity and safety of customers' data.
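The abstract does not spell out a data model, so as a rough illustration only: one common way to keep track of links is to log, for every derived item, which inputs it came from and what operation produced it, then walk that log backwards to recover the item's lineage. The Python sketch below assumes that model; `ProvenanceLog`, `record`, and `lineage` are hypothetical names, and this is not the DataPROVE approach the paper proposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """One derivation step: which inputs an output came from, and how."""
    output_id: str
    operation: str
    input_ids: tuple = ()


class ProvenanceLog:
    """Hypothetical in-memory provenance store (illustration only)."""

    def __init__(self) -> None:
        self._records: dict = {}

    def record(self, output_id: str, operation: str, input_ids=()) -> None:
        # Remember how output_id was produced and from which inputs.
        self._records[output_id] = ProvenanceRecord(output_id, operation, tuple(input_ids))

    def lineage(self, item_id: str) -> list:
        """Return every upstream item id that item_id was derived from."""
        seen = []
        stack = [item_id]
        while stack:
            current = stack.pop()
            rec = self._records.get(current)
            if rec is None:
                continue  # an original input with no recorded derivation
            for parent in rec.input_ids:
                if parent not in seen:  # also guards against cycles
                    seen.append(parent)
                    stack.append(parent)
        return seen
```

For example, after recording a harvested MARC file, an RDF conversion of it, and a merge of that RDF with an external link set, `lineage("merged")` lists the RDF file, the link set, and the original MARC file.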
The latest MarcEdit update has been baked and pushed out the door. If you are running a current version of MarcEdit, you can expect to see the program prompt you to update (unless you’ve disabled that functionality). Otherwise, you can find the update at: http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html. Originally, this update was planned to be primarily cosmetic, with two small bug fixes. However, after working with a colleague playing with some large Hathi Trust metadata files, a few other updates ended up squeezing in. So what’s changed? See below:
Enhancement: MARCXML => MARC enhancements. When translating from MARCXML to MARC, MarcEdit now checks whether the record data is too long (over 99,999 bytes) or the field data is too long (over 9,999 bytes). Records that are too long are truncated, and field data that is too long is split. If either operation occurs, MarcEdit will recode the 008/38 to an "s". This enhancement only affects the MARCXML=>MARC conversion function; however, that means that any function that converts data to MARC through MARCXML is affected by this change.
I discussed this change at more length here, but essentially, it was necessitated because I occasionally run into XML data that I’d like to translate into MARC but that is simply too large. These changes allow MarcEdit, when translating data through the MARCXML=>MARC process, to automatically adjust records that would otherwise be generated as invalid (as currently happens). If you’d like to see how MarcEdit handles these types of errors, you can look at a sample file at: http://people.oregonstate.edu/~reeset/marcedit/anonymous/long_xml.xml. This file has 3 MARCXML records. The first one is roughly 3 times too large for a traditional MARC record, thanks to the many 9xx fields in the record. Prior to this update, MarcEdit would generate a record but calculate the record length incorrectly: it would calculate the length, then take the first 5 digits of the value, and since the length of an oversized record has more than 5 digits, the record length would be wrong. After this update, MarcEdit will truncate fields once the record limit has been reached and notify the user through the UI that the truncation took place, in addition to making the 008 modification mentioned above.
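The length problem comes from the record layout itself: leader positions 00-04 hold the total record length as exactly five digits, so 99,999 bytes is the ceiling. The sketch below assumes the standard ISO 2709 layout (a 24-byte leader, one 12-byte directory entry per field, and a terminator after the directory, after each field, and at the end of the record). It computes that length and contrasts a checked five-digit value with the "first five digits" behavior described above. All function names are hypothetical; this is not MarcEdit's code.

```python
LEADER_LEN = 24             # fixed-length leader
DIRECTORY_ENTRY_LEN = 12    # 3-char tag + 4-digit field length + 5-digit start position
FIELD_TERMINATOR_LEN = 1    # 0x1E after the directory and after every field
RECORD_TERMINATOR_LEN = 1   # 0x1D at the very end of the record
MAX_RECORD_LEN = 99_999     # largest value that fits in leader positions 00-04


def marc_record_length(field_data):
    """Total byte length of a record whose fields hold field_data
    (a list of bytes objects, each without its field terminator)."""
    directory = DIRECTORY_ENTRY_LEN * len(field_data) + FIELD_TERMINATOR_LEN
    fields = sum(len(data) + FIELD_TERMINATOR_LEN for data in field_data)
    return LEADER_LEN + directory + fields + RECORD_TERMINATOR_LEN


def leader_length(length):
    """Leader positions 00-04: the record length as five zero-padded digits."""
    if length > MAX_RECORD_LEN:
        raise ValueError(f"record length {length} does not fit in five digits")
    return f"{length:05d}"


def leader_length_first_five_digits(length):
    # The failure mode described above: keep only the first five digits,
    # so an oversized length silently becomes a wrong, smaller value.
    return str(length).zfill(5)[:5]
```

A record of 312,345 bytes would be stamped "31234" by the second function, while `leader_length` refuses it outright, which is what forces the truncation step.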
Bug Fix: Swap Field function: Under certain rare conditions, moving data from a control field to a variable field resulted in the delimiter value being dropped from the swapped data.
Bug Fix: Set Font function — when the function fails, the program will now exit the function gracefully and render the font in its default state.
Enhancement: The Validator has been augmented so that invalid records in .mrk format can be identified outside of the MarcEditor.
Enhancement: Added a new Change Case shortcut that allows users to set the initial character in a field to upper case, without modifying the case of any other characters in the subfield.
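The distinction matters because a plain "capitalize" operation usually lower-cases the rest of the text; Python's own `str.capitalize()` does exactly that. A minimal sketch of the behavior described, assuming it is given the field or subfield text with indicators and subfield codes already stripped (`upper_initial` is a hypothetical name, not MarcEdit's implementation):

```python
def upper_initial(text: str) -> str:
    """Upper-case only the first character, leaving every other character as-is."""
    # Slicing keeps this safe for empty strings.
    return text[:1].upper() + text[1:]
```

So `upper_initial("the MARC format")` returns "The MARC format", while `"the MARC format".capitalize()` returns "The marc format".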
So that’s it for the updates. The MARCXML=>MARC changes were very significant, but hopefully they will be useful ones. I know that they will be welcomed at OSU, since we occasionally run into issues of fields being too long when harvesting our ETD records from DSpace to generate our MARC records for the catalog.
At our meeting on Friday, the LITA board (your board!) conducted an exercise to craft an updated vision statement for our organization. We started out by thinking of organizations that inspire us and listing the attributes that we admire about those organizations. Some people talked about non-profits, some about online businesses or communities and some about their current or former places of work. Again and again, we came back to the idea that the people make the organization. To that end, we tried to come up with a short statement to describe our vision for LITA. Give it a test-drive and let us know what you think. Is this your LITA? We welcome comments and suggestions!
LITA: Libraries. Innovation. Technology. Awesomeness: Choose your own adventure!*
We probably knew this on the day of the meeting, but “Choose Your Own Adventure” is a registered trademark, so this is probably a good place to start with the discussion.
To all LITA Members – current, past and future and Friends of LITA:
You are invited to virtually join the LITA members present in Dallas for the LITA Town Meeting – Monday 1/23 from 8-10am CST. Instructions and links for how to participate virtually will be posted to the LITA Blog (http://litablog.org) approximately 30 minutes before the session starts.
We will be using UStream (audio only) and ALA Connect Chat to facilitate virtual participation. We will stream the audio for the outbound channel – so virtual participants can hear the general introduction and overview and then break out into three streams – one on each topic – so that the discussion in the room can be heard. Remote participants can log in to ALA Connect (http://connect.ala.org/) chat to send questions and comments to a discussion moderator who will share those with the participants in the room.
No account is required to access the audio streams. However, to participate in the chat, you must have an account set up in ALA Connect. You do not have to be a member of ALA or LITA to create an account in ALA Connect (http://connect.ala.org/user/register).
We also encourage you to send in questions for the LITA Leadership. These can be sent in advance (post them as comments on the LITA blog at http://litablog.org/2012/01/lita-president-elect-zoe-stewart-marshall-hosts-the-lita-town-meeting/#respond, or reply to this posting with your question). Or you can send in your questions on the day of the Town Meeting through any of the chat streams. Or you can submit your questions via Twitter; please use the hashtag #LITATM12 so we will be sure to see them.
There will be three general topics for discussion:
* what is the value that participation in professional associations in general, and LITA in particular, provides;
* how LITA can provide support for developing leadership skills in emerging technology leaders;
* where and how should LITA provide support for development of technology skills.
My thanks to the LITA members who have volunteered to wrangle the software and equipment needed to facilitate virtual participation!
Please do plan to join us and help us beta test ways to improve and enhance virtual participation in the work of the association. Your input on these important topics is greatly appreciated.
The difference between a culture and its subculture fascinates me; “A subculture is a group of people with a culture (whether distinct or hidden) which differentiates them from the larger culture to which they belong.” In my opinion, the label of subculture tends to be accurate only for a very short time before it has built upon its own ideas and is worthy of a less derogatory definition.
The early thinking toward what helps a subculture thrive was based on the idea of seeking subversion, of subverting the norm. This is a powerful emotional driver, but many other concepts and systems of positive and negative feedback act as cultural attractors in the same manner. A metaphor I use here is one from maths and science, specifically the idea that given a dynamic and ‘chaotic’ system, in which there is a complex interplay between all actors and aspects within, there can exist things called ‘attractors’. The dynamic parts of the system gravitate around these attractor sets, the ‘empty’ central areas that the paths curve around.
Culture vs subculture
I find it hard to accept the picture of a culture in which subcultures exist, subverting its norms. Rather, I picture a very large space of ideas and concepts, through which we all make our own paths. We interact with each other and with the ideas themselves, moving differently as a consequence. The attractors are how trends and fashions show up in the system, causing many of us to swing around these points together, reinforcing the attractor by doing so.
Fifty years ago, it was hard to quickly traverse this space of ideas – you would need to make an effort to find ideas you wouldn’t ordinarily be aware of, let alone know about. Word of mouth could only reach so far, and at a slow speed. Mainstream media provided well-controlled sources of information, forming powerful attractors in the space, pulling everything round with them. Small pockets of the aforementioned subversive culture sprang up, mini-galaxies of people rotating counter to the larger galaxy of culture around them.
Nowadays, this space is trivial not just to move through, but also to navigate, due to the embedded nature of the internet in many people’s lives. People are being limited by their own personal, biological bandwidth rather than the limits of their communication channels; there is a limit on the amount of information you can consume and create in one lifetime that has nothing to do with technology. Word of mouth is now an incredibly powerful tool and can spread an idea to all the corners of the internet in minutes. It is easier to find people who are taking similar paths through this idea space and so, it is easier to be affected by them and they by you. People can pass many more cultural attractors in their lives, and subjectively, may choose to move around different ones to the mainstream. The act of living online is an attractor in and of itself, and is pulling more people away from the predictable movement around the older, mainstream attractors.
What might be seen as a fragmentation of culture, a ‘loss of traditional values’, is the adoption of differing cultural vocabularies and ideas, different to those tropes and other norms of behaviour that people are meant to participate in. I do not see this as a bad thing. People adopt a culture for many reasons: obligation, tradition, apathy, passion, belief, exposure through peers and so on. However, of those, the people who change culture do so for much more active reasons, mainly in the belief that they will be happier because of it.
What may have started as a collection of like-minded fans has certainly grown into a culture of its own. In the documentary – see below – the concept of family was repeated many times and intimately linked with the idea of what it means to be a Juggalo. Many of those who responded said explicitly that their Juggalo friends meant more and did more for them than their own blood families, and that they felt part of a community. It’s unfair and a mistake to call ‘Juggalos’ a gang, as the FBI has, as I hope you’ll see from the documentary below.
An apparent trend in the way that established media reports on these new cultures is to focus on how they are more uncivilised, more uncouth – simply, worse – than ‘we’ are. With the Juggalos, it’s easy to focus on their hedonism and loud, brash mannerisms and ignore their human sides. One culture that arguably peaked some years ago is Straight-edge, which is based around the ideas of abstinence from drugs, alcohol, tobacco, to respect your body, and to not be promiscuous, unified by a number of straight-edge bands, in a similar manner to the Juggalos above.
Even though the ideals are good, a tiny number of people who identified themselves as straight-edgers were reportedly violent to people they disagreed with. Of course, this was the story that the movement became associated with in the wider media. Compare the following documentary of some straight-edge people talking about themselves and their beliefs with the National Geographic documentary after it (it is poor quality and in several parts, I’m afraid. Click through to the YouTube pages to find the other parts, if you are interested.)
Is there a mainstream any more?
From the perspective of many internet communities, it can be hard even to know where the mainstream is anymore, as some of the older groups have built up a radically different cultural vocabulary which they share with each other, but very little of which is broadcast over the wider channels such as TV, newspapers and radio. However, the truth of it is that the mainstream still has many orders of magnitude more impact on the majority of the population than it would appear to those outside of it. The internet has allowed a few groups to establish themselves and grow strong enough to exist without the support or even acknowledgement of the normal channels. The geek culture that has grown over the past ten years or so is one that I readily subscribe to, as the use of the word (online at least) is more akin to a passion for something than anything negative or unpleasant.
I am very pleased to see that some of these communities are strong enough to interact with the mainstream and perhaps change it for the better. I have a real soft spot for these incursions, whether they are flash mobs, large gatherings or, as is the case with ‘Bats Day’ at Disneyland, a community sharing a holiday together. Bats Day in the Fun Park is a simple idea – a date is picked and goths are encouraged to visit, as they will be among like-minded people and can dress naturally. Here’s a little video scanning the crowd as they line up for the signature group photo of the day:
My point?
What we call the mainstream is going to have less and less power to drive fashions and trends. People will no longer encounter ideas as they are meted out by these sources; instead, they will discover and rediscover ideas at many different points. What looks like social fragmentation is really the drive for community reasserting itself, bringing together people that are near each other in mindset, if not geographically.
I’ll finish this post with a few socially angled documentaries that I’ve enjoyed recently in the hope that you may enjoy them too:
“Winters of My Life is a portrait of Howard Weamer. For the past 35 years he has spent his winters as a hutkeeper in Yosemite’s backcountry. He fills his days writing, reading, photographing, and being an ambassador to mountain culture. This is a brief look into his world and why he chooses to stay.”
“ALBERTO MIELGO is a Spanish painter. This short documentary follows two different artistic world[s] for an unexpected combination.” – nudity warning.
“A rogue with an eye for salvage – and the ladies – Ray: A Life Underwater is an affectionate portrait of one man’s deep sea diving career, told through his extraordinary collection of marine artefacts.
Like a modern-day pirate, 75-year-old Ray Ives has been scouring the seabed for treasure his whole life.”
“Paul Mawhinney was born and raised in Pittsburgh, PA. Over the years he has amassed what has become the world’s largest record collection. Due to health issues and a struggling record industry Paul is being forced to sell his collection.
This is the story of a man and his records. I hope you enjoy it.” (by Sean Dunne, the same film-maker as ‘American Juggalos’ above.)
“At The Barbershop in Drexel, NC, the atmosphere is laid back, the conversation free, and the music a cut above the rest. Emmy® nominee, Official Selection of over 60 film festivals, and Best Documentary Short Film winner at the Florida Film Festival and Woodstock Film Festival.”
“The story of the last glass eye maker in Britain.”
There is something slightly off about sentences like this one, from the New York Times:
At a minimum, it is clear that Republican voters, after delivering three different winners in the first three stops in the nominating contest, are in no rush to settle on their nominee.
You see this kind of coverage a lot in primary season: commentators make sweeping statements about the electorate’s desires based on polling or primary numbers. The problem is that it hopelessly conflates individual and collective preferences. In this sentence alone, there are two such slippages. First, clear votes for different candidates are smushed together into a collective indecision. And then, that indecision is attributed back to the individual voters. But the fact that Republican voters in three states disagree about which candidate they would like to nominate does not tell us that they are happy to make the choice slowly. It could just be that their primary process is set up in a way that doesn’t lead to an unambiguous early winner, given this year’s candidates and political climate.
The mild version of this mistake comes up in two-party elections or in polling on an issue that has only two choices. To win an election by 10 votes out of 1,000,000 cast does not mean that the electorate collectively has spoken and pointed to you. (Indeed, with many voting technologies, this difference would be well within any reasonable margin of error, so we couldn’t even be sure that the “winning” candidate got more votes.) All it means is that the election process resulted in selecting one candidate over the other, which, by the rules of our system, means giving the winner the job. A “close” election could reflect a bitter partisan divide, or vast collective indifference as between two decent choices. The number alone says little. The same goes for margins of victory: even a 65-35 victory in the popular vote — an unprecedented margin for a presidential election in U.S. history — need not mean uniform national agreement on anything. The many millions of voters on the short side of the count need not be acquiescent, just outnumbered.
In multi-way contests — like early primaries — the gap between individual and collective preferences is even more severe. Indeed, thanks to Arrow’s impossibility theorem, there may be no coherent way to combine individuals’ choices at all. Every voting (or polling) system will fail in one way or another.
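Arrow's theorem itself is hard to show in a few lines, but a simpler, related result, the Condorcet paradox, shows concretely how combining individual rankings can fail. In this sketch the three voters, the candidates X, Y, Z, and their ballots are invented for illustration: each voter's ranking is perfectly consistent, yet pairwise majority voting produces a cycle.

```python
from itertools import combinations

# Three hypothetical voters, each ranking candidates from most to least preferred.
BALLOTS = [
    ["X", "Y", "Z"],
    ["Y", "Z", "X"],
    ["Z", "X", "Y"],
]


def pairwise_winner(ballots, a, b):
    """Return whichever of a, b a majority ranks higher, or None on a tie."""
    a_first = sum(1 for ranking in ballots if ranking.index(a) < ranking.index(b))
    b_first = len(ballots) - a_first
    if a_first > b_first:
        return a
    if b_first > a_first:
        return b
    return None


if __name__ == "__main__":
    for a, b in combinations(["X", "Y", "Z"], 2):
        print(f"{a} vs {b}: majority prefers {pairwise_winner(BALLOTS, a, b)}")
```

Running it shows that a majority prefers X to Y, Y to Z, and Z to X, so there is no candidate this electorate "collectively" prefers, even though no individual voter is confused.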
This year’s Republican primaries, in particular, seem to be suffering from a severe independence of irrelevant alternatives problem. The media has settled on a narrative in which the essential choice for Republican voters is between Romney and Not-Romney. But that choice is never on the ballot, and it seems never to be presented in polls. Instead, they’re asked to cast a single vote for one candidate, with a wide array of would-be Not-Romneys to choose among.
Under these circumstances, we simply do not know what the statistical preference of Republican voters looks like. It could be that as the various Not-Romneys drop out, their support will largely break for Romney, or for the other Not-Romneys. If the former, then Romney is in fact broadly preferred by the Republican electorate; if the latter, then it in fact broadly rejects him. By the first-past-the-post standards of political reporting, these are two very different outcomes. But by those same first-past-the-post standards, Santorum “won” Iowa, Romney “won” New Hampshire, and Gingrich “won” South Carolina, and we’re no closer to having a meaningful picture of overall individual preferences.
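To make the two scenarios concrete, the sketch below compares plurality (first-past-the-post) with instant-runoff voting, used here only as one simple model of support transferring as candidates drop out. The ballot counts and the labels "R" (the front-runner) and "S", "G", "P" (three rivals) are hypothetical, not polling data.

```python
from collections import Counter

# Hypothetical profiles: (number of voters, ranking from most to least preferred).
CONSOLIDATE = [  # rivals' supporters rank the other rivals next
    (35, ("R",)),
    (25, ("S", "G", "P", "R")),
    (22, ("G", "S", "P", "R")),
    (18, ("P", "S", "G", "R")),
]
BREAK_FOR_R = [  # rivals' supporters rank R second
    (35, ("R",)),
    (25, ("S", "R")),
    (22, ("G", "R")),
    (18, ("P", "R")),
]


def plurality_winner(ballots):
    """First-past-the-post: only first choices count."""
    tally = Counter()
    for count, ranking in ballots:
        tally[ranking[0]] += count
    return tally.most_common(1)[0][0]


def instant_runoff_winner(ballots):
    """Repeatedly drop the last-place candidate and move those ballots to their
    next remaining choice until someone holds a majority of live ballots."""
    remaining = {c for _, ranking in ballots for c in ranking}
    while True:
        tally = Counter({c: 0 for c in remaining})
        for count, ranking in ballots:
            for choice in ranking:
                if choice in remaining:
                    tally[choice] += count
                    break  # exhausted ballots simply stop counting
        leader, votes = tally.most_common(1)[0]
        if 2 * votes > sum(tally.values()) or len(remaining) == 1:
            return leader
        # Eliminate the weakest candidate (ties broken alphabetically for determinism).
        remaining.remove(min(remaining, key=lambda c: (tally[c], c)))
```

Plurality names R in both profiles. Instant runoff names S when the rivals' supporters prefer one another and R when they break for R, even though the first-choice counts are identical, which is exactly the information a single-choice count hides.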
The secret, of course, is that reflecting individual preferences isn’t necessarily the point of a voting process. Simple coordination — telling the party stalwarts whom to believe in and coalesce around — is frequently valued in a primary. So is awarding delegates in a way that permits a relatively small group of party insiders to push a favored candidate (remember the superdelegates?). And at the general election, the media narrative of a decisive “choice” almost certainly helps promote social cohesion and effective governance, even if the narrative itself is a lie.
In any event, here’s my suggestion for how to perform more illuminating primary polling. Don’t just ask people who they’ll vote for. Instead, ask them to rank-order candidates from most preferred to least. Or ask them whether they consider each individual candidate “acceptable” or “not acceptable” as a possible nominee. No poll is perfect, but I think this sort would be substantially less misleading, because it would focus attention on actual voters’ desires, not the fictional desires of a mythical collective hypothesized from raw vote totals.
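Both suggested poll formats are straightforward to tally. The sketch below uses an approval count for the "acceptable / not acceptable" question and a Borda count for the ranked question; Borda is only one of several ways to aggregate rank orders and has flaws of its own. The respondents and candidates are hypothetical.

```python
from collections import Counter

# Hypothetical answers to "is each candidate acceptable as a nominee?"
RESPONSES = [
    {"X": True, "Y": True, "Z": False},
    {"X": False, "Y": True, "Z": True},
    {"X": True, "Y": True, "Z": False},
]
# The same hypothetical respondents' full rankings, most to least preferred.
RANKINGS = [
    ["X", "Y", "Z"],
    ["Z", "Y", "X"],
    ["X", "Y", "Z"],
]


def approval_tally(responses):
    """Count how many respondents called each candidate acceptable."""
    tally = Counter()
    for response in responses:
        for candidate, acceptable in response.items():
            if acceptable:
                tally[candidate] += 1
    return tally


def borda_tally(rankings):
    """Borda count over complete rankings: with n candidates, first place
    earns n - 1 points, second n - 2, ..., last place 0."""
    tally = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            tally[candidate] += n - 1 - position
    return tally
```

With this data, Y is acceptable to all three respondents even though it is nobody's first choice, which a single-choice poll would never reveal; the Borda count, meanwhile, favors X (4 points to Y's 3 and Z's 2).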
We’re streaming Top Tech Trends from ALA Midwinter 2012! Join us, ask questions, and enjoy. If you’d like to chime in on Twitter, the hashtag is #alamwttt
WordPress is a very fine writing and publishing platform but it was developed for blogging and the heyday of blogging is over. I still adore the WordPress platform and so do many other writers. Most of these people are writing long-form. All the "micro-bloggers" have migrated to Twitter. There are also poets who can make good literary use of short post-length spaces and this opinion piece does not apply to them. Put simply, long-form writers are moving in the direction of a book. A blogging platform is not the best choice.
I started my "I, Reader" series in Drupal. What a flop. I am currently much happier with WordPress but even so I struggle. Take comments. There are many lively blogs with lively comment dialogues. In the heyday of blogging I enjoyed some of that too. Today, most commenting has moved to Twitter. Today, the best use I get from comments is to make my own updates to my own posts. I add new research or comments that I will use in my next draft. I can also notify followers of updates using comment subscriptions. It's a blog platform, and I want its collaborative elements, but it is also like a wiki, being drafted toward a single better form. When it is done, I may want to publish it as a book. I am very interested in the release of PressBooks: "a new book publishing platform, built on WordPress, that makes it easy to collaborate with an editorial team, and to generate clean, well-formatted books in multiple outputs: .epub, print-ready PDF, InDesign-ready XML, and of course HTML."
I am also interested in new writing tools. I earlier wrote a post on a concept piece, dubbed Lila. Someone said to try Scrivener or some of the other writing software tools out there. I have looked at them to some extent. I have much more in mind. I am experimenting with them in the background as I write "I, Reader". I am not opening a new project at this time, just updating you on my thinking. It's on the back burner at present, but count on something in the future.
One of the benefits of moving the MARCXML=>MARC translation algorithm away from XSLT to an inline function is the ability to provide some sanity checking beyond simple XML validation. One of the issues that I see periodically when working with XML conversions is the need to code data truncation into my XSLT stylesheets. For example, the ETD process that we use with DSpace looks for the abstract and makes sure that the data in the abstract doesn’t exceed the 9,999-byte limit for a MARC field.
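Doing this truncation correctly means counting bytes, not characters: the MARC directory records field length in bytes, and a naive cut can land in the middle of a multi-byte UTF-8 character. A minimal Python sketch, assuming UTF-8 data and a 520 field holding a single $a; the helper name and the 5-byte overhead figure for that layout are assumptions for illustration, not the ETD stylesheet's logic.

```python
MAX_FIELD_BYTES = 9_999  # largest length a 4-digit directory entry can record

# The directory length also counts indicators, the subfield code and the field
# terminator, so a lone 520 $a gets slightly less than the full 9,999 bytes.
ABSTRACT_BUDGET = MAX_FIELD_BYTES - 5  # 2 indicators + 0x1F + "a" + 0x1E


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Shorten text so its UTF-8 encoding fits in max_bytes
    without splitting a multi-byte character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The slice may end partway through a multi-byte character;
    # errors="ignore" drops that incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

For example, `truncate_utf8("café", 4)` returns "caf" rather than producing a broken byte sequence, because "é" takes two bytes and only one of them would fit.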
Recently, however, I found a different problem, one that I don’t run into often but that showed up when working with some data provided by the Hathi Trust. Some colleagues were given a large sample of data (32 GB of MARCXML) to do some research into providing better identification of government documents records. The new MarcEdit MARCXML process is able to make short work of this 32 GB file, translating the data into MARC in ~20 minutes. The problem that arises, however, is that some of these records are too long. For reasons I cannot understand, the Hathi Trust data includes a local 9xx field that, from the context, appears to be item information. Unfortunately, some records include thousands of items, meaning that when the data is translated, the resulting record is too large (it exceeds the maximum total length of 99,999 bytes).
However, because of the new MARCXML process, I’ve been able to create a workaround for situations like this. When processing MARCXML data, MarcEdit will internally track the record length of a translated record. If that record would exceed the maximum record length, MarcEdit will truncate the record by dropping fields off the end of the record. The program will also modify the 008/38 byte, setting the value to “s” (which MARC 21 defines as a shortened record), and will visually notify the user that a truncation occurred by changing the results panel purple.
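The approach in the paragraph above can be sketched as follows. This is a hypothetical illustration, not MarcEdit's source: fields are modeled as (tag, bytes) pairs, the running record length is updated as trailing fields are dropped, and 008/38 (the "Modified record" position) is overwritten only if something was removed. A production version would presumably also protect required fields and rebuild the directory and leader.

```python
LEADER_LEN = 24
DIRECTORY_ENTRY_LEN = 12
MAX_RECORD_LEN = 99_999


def record_length(fields):
    """Byte length of a MARC record; fields is a list of (tag, data) pairs
    where data is bytes without the field terminator."""
    return (LEADER_LEN
            + DIRECTORY_ENTRY_LEN * len(fields) + 1     # directory + its terminator
            + sum(len(data) + 1 for _, data in fields)  # each field + terminator
            + 1)                                        # record terminator


def truncate_record(fields):
    """Drop fields off the end until the record fits in 99,999 bytes.

    If anything was dropped, set 008/38 to "s". Returns (kept_fields, truncated).
    """
    kept = list(fields)
    length = record_length(kept)
    truncated = False
    while kept and length > MAX_RECORD_LEN:
        _, data = kept.pop()
        # Removing a field frees its directory entry plus its data and terminator.
        length -= DIRECTORY_ENTRY_LEN + len(data) + 1
        truncated = True
    if truncated:
        for i, (tag, data) in enumerate(kept):
            if tag == "008" and len(data) > 38:
                kept[i] = (tag, data[:38] + b"s" + data[39:])
                break
    return kept, truncated
```

For a record carrying thirty 5,000-byte local 9xx fields (about 150 KB in total), this drops the last eleven of them, leaving a record of 95,367 bytes with "s" in 008/38.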
While I generally take a hands-off approach to modifying MARC data through the translation process, this seems to be a good compromise for dealing with what is now a rare situation, but one that I predict will become all too common as more data is created in systems without MARC's record limitations.
These changes to the translation engine will occur on the next MarcEdit update (scheduled for 1/23/2012), when I’ll post both an announcement and include a small record set that can demonstrate the new functionality. Hopefully, folks will find these changes useful, especially as technical services departments find themselves having to deal with more and more non-MARC metadata.
My university is moving its entire web content management system over to Microsoft SharePoint, and so I thought it would be a good idea to dive into the platform to get ready. My recent experience with a sandbox SharePoint installation on a hosted server has answered some questions, and made me appreciate some of SharePoint’s power, but in many ways, my initial concerns remain.
It almost goes without saying that anything Microsoft is going to be clunky. The once-enigmatic Tech Titan has not aged well. The list of failures is legion: Internet Explorer, XP, Vista, Windows Mobile…As a result, many people have gone Mac or Android, Mozilla or Chrome.
It didn’t help that during its heyday, Microsoft’s market strategy focused on forcing the world of round pegs to conform to its square peg model. So the public at large has been fleeing in droves and the once mighty emperor is left naked on the stage as was the case at this year’s CES keynote given by MSFT CEO Steve “I’m going to F**kin’ Kill Google” Ballmer. Ouch.
Sadly, MSFT remains fairly entrenched in business and education, even as the precipitous flight of employees and students to anything not-Microsoft carries on. And so, we’re left with this disconnect between the ecosystem of our users and the ecosystem of IT departments. Long-term, this will get sorted out in a way that is highly unlikely to favor Microsoft, but in the transition period, we all must do what we can to fit those square pegs into their assigned holes.
Such was my primary mission for Project Spork, a multi-pronged CMS exploration that tested different CMSs against a selection of requirements from our library’s production site. In this project, we built three prototype sites in LibGuides, Drupal and SharePoint.
SporkSP (the SharePoint version) started off with a large dose of suspension of disbelief as I waded through a product that was originally designed as a document-oriented wiki. It bears noting that the WCMS components in SharePoint were only added later, when Microsoft realized that many of its Intranet customers were applying SharePoint to Internet problems. And this really shows when you start trying to create an institutional website with it.
Like Drupal and other CMS systems, if you come from a hand-coding/Dreamweaver background, you’re going to be put off right away. But in SharePoint, it’s much easier than in Drupal to code the old-fashioned way. However, while you can easily jump into codeview in SharePoint Designer, there is never any guarantee that the wiki-oriented SharePoint won’t strip out your code and drive you batty with security warnings.
SharePoint’s real strength, however, is in its database tools. With one click, you can connect SharePoint to an XML file, RSS feed, REST web services or external database (of almost any flavor). You can even easily make SharePoint the database editor for your external SQL database. And once you’ve made the connections to that data, SharePoint seamlessly integrates the database with the rest of your site. On this level, it blows Drupal away.
But lest you forget that you are working in a Microsoft product, let me recall the list of “Microsoft” issues that come up with this tool:
IE is required to create pages and carry out many editing tasks within the browser due to certain ActiveX-based features
You must install Silverlight for no better reason than the controls in the browser are built with Silverlight. In other words, without Silverlight, you can’t work effectively in a SharePoint site.
And of course, nothing is straightforward:
CSS class and ID names are frightfully machine-oriented and even change!
If you build a relational database (or something that feels like a relational database) in SharePoint (say in your test environment) you cannot move it without all the lookup fields being broken.
The development tools are divided between the IE interface and SharePoint Designer, meaning you are constantly jumping between them and it is never clear where a given feature will be found. For example, you have to create a database in SharePoint Designer, but then create the lookup fields within IE…say what?
But if I had to pick the worst FAIL of them all, it would be the wonky way SharePoint requires you to integrate external widgets and tools into your SharePoint site. Again, to be fair, SharePoint was designed as a wiki-like tool that focused on internally held documents, so support for bringing in external web features was never built into the initial product. As a consequence, the most straightforward way to do so is through an iFrame…yuck!
For example, bringing in our WorldCat Local Search Box required the iFrame method, because SharePoint strips out the search form when you place the code directly into the page html. Okay, so you use an iFrame. But this method ensures that, one, you will need to host the form html and styling on an external server, and two, that iFrames are always fraught with display issues across browsers and devices.
Later investigation suggests that the more stable way around this is to develop web parts that can display external content. But that requires that you learn .NET and Visual Studio…just to put a search form on a web page, folks.
For a site whose most important feature (the Library Catalog) needs to be brought in through an iFrame or some other even more heavy-handed method…well, that would normally be a deal breaker for any library comparing its requirements against a CMS.
I’m sure that Campus IT can solve some of these kinds of issues for us. But that brings up my final point: using this platform means that a Library web team will almost always be reliant on Campus IT for things that they would otherwise be able to do with ease. Or, alternatively, be forced to learn to develop for SharePoint. This isn’t necessarily a bad thing, of course, but when you’re trying to keep pace with libraries that use more intuitive platforms (PHP, MySQL, Drupal, WordPress, etc.), well, that means you’re likely to play catch-up for years.
Library web services are always changing and will undoubtedly change even faster as we speed into the increasingly shifting landscape of e-readers, mobile devices, etc. The whole reason librarians got into the IT business in the first place was that only they know their boutique technologies well enough to make them all play nicely together.
SharePoint never had the library ecosystem in mind. Indeed, it never had web design in mind. Still, you can use it to build very beautiful web pages that allow multiple users to edit them. The very nicely done pages that have already been deployed in SharePoint at my institution testify to how well this product can work in some cases. But as a tool that improves the productivity of library web teams and library system teams, it fails.
eXtensible Catalog is open source, user-centered, next generation software for libraries. It comprises four software components that can be used independently to address a particular need or combined to provide an end-to-end discovery system to connect library users with resources.
Through this website, we will continue, unify, and expand NASA’s open source activities. The site will serve to surface existing projects, provide a forum for discussing projects and processes, and guide internal and external groups in open development, release, and contribution.
This is the 4th book that I have finished in my Two-Thirds Book Challenge. I started it 6 October 2011 and finished it 15 January 2012. I had not intended to take so long but it is somewhat complex and, in all honesty, the rampant Freudianism/psychoanalysis is simply too much at times.
I have almost 6 pages of notes but I think I will ignore them for this review.
The central thesis is, I believe, reasonably sound. Although, certainly, it is not the only way to spin a description of cross-cultural mythology. It is in some of the (psychoanalytic) interpretation that the spinning out of control happens.
This past fall semester I took a course in classic literature and mythology, and as of today I finished a quick 3-week romp through 30 of the Grimms’ fairy tales. This book explains, or at least describes, much of what is present and happening in these stories.
One of the things I appreciated and respected is that Campbell clearly includes the stories of the Christian Bible–Old and New Testaments–in his analysis of myth.
One of the things I am unsatisfied with (as is, I fear, to be expected in Western culture and, in particular, with psychoanalysis) is the gendered explanation.
I do think the book is worth reading; some parts are certainly much better than others. In most places my notes are fairly detailed but in a few I wrote “This [such and such] is crap!” or “mumbo jumbo.”
I am going to give a detailed list of the contents, as perhaps that will provide the best overview of what the book contains/discusses:
Prologue: The Monomyth
1. Myth and Dream
2. Tragedy and Comedy
3. The Hero and the God
4. The World Navel
Part I: The Adventure of the Hero
Chapter I: Departure
1. The Call to Adventure
2. Refusal of the Call
3. Supernatural Aid
4. The Crossing of the First Threshold
5. The Belly of the Whale
Chapter II: Initiation
1. The Road of Trials
2. The Meeting with the Goddess
3. Woman as the Temptress
4. Atonement with the Father
5. Apotheosis
6. The Ultimate Boon
Chapter III: Return
1. Refusal of the Return
2. The Magic Flight
3. Rescue from Without
4. The Crossing of the Return Threshold
5. Master of the Two Worlds
6. Freedom to Live
Chapter IV: The Keys
Part II: The Cosmogonic Cycle
Chapter I: Emanations
1. From Psychology to Metaphysics
2. The Universal Round
3. Out of the Void–Space
4. Within Space–Life
5. The Breaking of the One into the Manifold
6. Folk Stories of Creation
Chapter II: The Virgin Birth
1. Mother Universe
2. Matrix of Destiny
3. Womb of Redemption
4. Folk Stories of Virgin Motherhood
Chapter III: Transformations of the Hero
1. The Primordial Hero and the Human
2. Childhood of the Human Warrior
3. The Hero as Warrior
4. The Hero as Lover
5. The Hero as Emperor and as Tyrant
6. The Hero as World Redeemer
7. The Hero as Saint
8. Departure of the Hero
Chapter IV: Dissolutions
1. End of the Microcosm
2. End of the Macrocosm
Epilogue: Myth and Society
1. The Shapeshifter
2. The Function of Myth, Cult, and Meditation
3. The Hero Today
As a follow-up book to this one, I began another of my 2/3rds Challenge books, Mircea Eliade’s The Myth of the Eternal Return: Cosmos and History. It, too, is in the Bollingen Series. So far I am enjoying it. It is also a quite deep book and I am taking many notes. Thus, it may also take a while to get through.
This past weekend, I made my annual pilgrimage to Cambridge for the MIT Mystery Hunt, a puzzle competition on a grand scale. Teams of up to 200 people attempt to be first to solve over a hundred puzzles and put the answers together to find a coin that has been hidden somewhere on the MIT campus. Last year, my team, Codex, won the Hunt, which meant that by tradition it was our turn to write and run the Hunt this year. It was an intense, exhausting, and deeply fulfilling experience.
I like to think of the Mystery Hunt as a gift economy. Each year’s Hunt is a gift given by the previous year’s winner to the other teams. I put in hundreds of hours writing and test-solving puzzles, plus an intense final sprint behind the scenes at Hunt HQ from Friday morning until late on Sunday. Codex’s leaders easily spent thousands of hours each making the Hunt come together. All of this was completely unpaid.
Why would any sane person sacrifice a year this way? Part of it is pride: just like solving a puzzle is a way to show off your cleverness, creating one lets you show off your creativity. But I think the reciprocal obligation that gift exchange creates best explains why every year the winners take on this tremendous burden. The winning team in a Hunt is the one that has most fully enjoyed the puzzles, that has been the greatest recipient of that year’s gift. This creates a social debt, one that can be repaid only with a return gift: another Hunt. Every year, teams joke that they will locate the coin, then walk away and leave it alone so that someone else can write the Hunt. No one ever does it: everyone understands what a cheap move it would be.
This also explains something else. Each year’s Hunt is typically a little more ambitious than the ones before it. The overall number of puzzles has been rising with time, and the writing teams are always adding some new element. Last year’s Hunt had an incredibly clever structure, with unusually imaginative metapuzzles. (A metapuzzle is a puzzle based on combining the answers to other puzzles.) This year, we had teams come and put on fake Broadway productions. These something mores, I think, are a way for the writing team to demonstrate that it isn’t just returning exactly the gift it was given and is obligated to give back. They show that the Hunt, a labor of love, is freely given, that we chose to add something unique and not required.
This year’s Hunt theme was musical theater, as filtered through The Producers. It’s an apt metaphor: running the Hunt reminded me of working backstage on college theater productions. Everything is a complete disaster up through and including the dress rehearsal, but on opening night, everything always comes together in front of the curtain. I had the best seat in the house to appreciate the brilliance and inexhaustible work of my teammates, and to see the ingenuity and enthusiasm of the Hunters in the audience rising to the occasion. At the Hunt wrap-up — presented as an awards show for things like “Best Wrong Answer” — I found myself choking up. Getting to be part of a Mystery Hunt is an emotional, uplifting, humbling thing.
And now for some details of the puzzles I worked on, and my favorite puzzles. Warning: some mild spoilers lie ahead.
Written by me:
My favorite is 25th Annual Putnam County Debate Tournament. It requires solvers to classify the syllogisms hidden within a series of intentionally terrible arguments. The difficulty was slightly miscalibrated: many teams got stuck on the step of realizing that there were syllogisms involved, rather than on the more fun step of peeling away the informal arguments to find the (amusingly invalid) syllogisms within. It got called “this year’s WTF puzzle” by one solver.
Tax in Space was described by one reviewer as “straightforward(ish).” This puzzle started life as a logic problem that would actually use some real legal doctrine, and mutated repeatedly. In its final version, it’s a shaggy-dog puzzle: a long and convoluted joke. As a bonus, there are in-jokes for anyone who’s studied basic tax law (e.g. “Capital Gains” and “lower-case gains”).
Raw Bar was a late-in-the-day idea. I was looking over a sushi menu and thought, “You know what looks kind of like a puzzle: sushi menus.” It seemed obvious that the ingredients in a roll could make a cryptogram, and from there, what could they be a cryptogram for? This one didn’t quite work; it was both too hard and too easy, even if the concept is decent.
I also helped write a piece of the endgame, which isn’t yet online. As part of it, I got to dress up as Watson 2.0.
My favorite other puzzles:
Potlines: A cute, well-executed idea. Once you have the “aha” about what the diagrams represent, what remains is just the right level of difficulty: doable but not trivial. The elegance of the illustrations makes this one work.
Slash Fiction: Very nerdy and very silly. The idea is clever (although likely to be baffling if you don’t have computer experience), but the execution absolutely sells it. Seth and Vera took a secret four-day trip to Paris to film it.
Yo Dawg, I Herd You Like Puzzle Hunts: A multiply recursive puzzle that requires no special expertise to solve, this one’s construction is absolutely brilliant. And it had the best title in the Hunt. Whenever we called a team about this puzzle, we’d lead off with “Yo Dawg, …”
Paper Trail: A nice little diagramless crossword with a twist.
Winning Conditions: Play with this for a bit, until you get the idea. Then try to win. Yeah, it’s devious. And fun.
B.J. Blazkowicz in ‘Wintertime for Hitler’: Yes, it’s a Wolfenstein 3D / “Springtime for Hitler” mashup. And yes, it really is playable. And yes, it’s a good reminder about how much we’ve learned about FPS level design in the last two decades.
Incredible Edibles: Another cute, well-executed idea. A good one for non-puzzle-experts to try their hands at.
Critical Thinking: Like my puzzles, this one has a prominent humorous strain. But this one has an actual humorous payoff each time you make progress in solving it.
Dawn of a New Era: Kai has a real gift for elegant puzzle mechanics. You’ll learn a lot in the course of solving this one.
In Vivo and Makefiles: For heavy UNIX users only, but lots of fun for them.
Twosquare: I helped fact-check this one, and it was plenty of fun. Prepare to watch some truly stunning magic tricks, I mean illusions. Be sure to read the alt-text on the images; it provides a subtle but important hint.
Picture an Acorn: Not only are the individual pictures fun to identify, but the extraction of the final answer is exceedingly clever.
Itinerant People of America: I didn’t solve this one, and I admire anyone who can. Notable because we got John Hodgman to embed an important clue in one of his blog posts.
Award-Winning Poetry: Another puzzle whose humor is perfectly embedded. Broadway musical fans have a shot at this one; anyone else should just keep moving on.
Carb Pool: We gave each team two bags of pasta: one intact and one broken. And just to be sure that they didn’t think the number of pieces was important, we broke it in front of them, violently. This one required several hours of cutting dry pasta by hand.
Set Theory: Not a novel idea, not that difficult, very well-executed.
Cross-Breeding: A puzzle whose implementation perfectly reflects its concept.
Course 7E: The first puzzle I test-solved, and still a favorite. Not quite “funny” per se, but definitely enjoyable.
Functions: Arguably the most widely admired puzzle in the Hunt, judging by the number of Codexians who were raving about it.
Rats: You had to see Michael (an actual MIT alumni interviewer) in action to get the most out of this one, but having an interview to be admitted to the second half of the puzzle was an idea of loopy genius.
Sovereignty: I fact-checked and helped edit this puzzle, and in its final form it requires some very nice logical reasoning. Per the references to “players,” should probably not be attempted by non-gamers.
Argh: Like Andrew, I couldn’t believe this one hadn’t been done before. But it hadn’t, and now it has been, and in style.
Encoded: I haven’t otherwise coded in at least a year, but I installed two programming environments and learned some new libraries to do this one.
Screen Test: I like the concept, but I couldn’t have solved this one alone.
My favorite metapuzzles were:
Charles Lutwidge Dodgson: play chess and Scrabble simultaneously, each with a hidden twist. I spent a day grinding through the chess half during test-solving, and never noticed the time flying by.
Blogs are bringing the tools of scholarly communication to the mass market, and with the leverage the mass market gives the technology, may well overwhelm the traditional forms.
It was developed based on experience with PLOS Currents, a rapid publishing journal hosted at Google. After a detailed review of the alternatives, the developers decided to implement Annotum as a WordPress theme providing the capabilities needed for journal publishing, such as multiple authors, strict adherence to JATS (the successor to the NLM DTD), tables, figures, equations, references and review. The leverage of mass-market publishing technology is considerable. The paper describing Annotum is well worth a read.
The Great Wikipedia Protest Blackout of 2012 did not result in any particularly significant increase in site searching at the University of Michigan Library. While traffic was up on January 18, 2012, over the same day the previous week (January 11), the increase was about the same as for the day before and the day after, reflecting the increasing workload of the academic semester more than any Wikipedia-inspired bump.
January 18 compared to January 11, 2012
Here are some numbers to illustrate the point. For "Outage Wednesday" compared to the previous Wednesday (January 11), searches were up slightly: 4% overall, and 14.5% for the default "MLibrary" site search:
Search Kind | 1/18/12 | 1/11/12 | Change | Percent
MLibrary | 4260 | 3719 | 541 | 14.55%
Catalog | 4518 | 4870 | -352 | -7.23%
Articles | 1805 | 1579 | 226 | 14.31%
Total | 10583 | 10168 | 415 | 4.08%
January 17 compared to January 10, 2012
However, a somewhat larger overall increase shows up when comparing the Tuesday before the outage (January 17) with the Tuesday a week earlier (January 10): up 9% overall and 10% for the default "MLibrary" site search:
Search Kind | 1/17/12 | 1/10/12 | Change | Percent
MLibrary | 4297 | 3894 | 403 | 10.35%
Catalog | 5326 | 5427 | -101 | -1.86%
Articles | 2211 | 1465 | 746 | 50.92%
Total | 11834 | 10786 | 1048 | 9.72%
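The Change and Percent columns in both tables are plain week-over-week arithmetic: change = this week's count - last week's count, and percent = change / last week's count x 100. As a rough way to double-check figures like these, the short Python sketch below recomputes them from the raw counts in the two tables above. The helper name `week_over_week` is purely illustrative, not part of any library analytics tool.

```python
def week_over_week(current, previous):
    """Return (absolute change, percent change rounded to 2 decimal places)."""
    change = current - previous
    percent = round(change / previous * 100, 2)
    return change, percent


# Counts copied from the two tables: (search kind, outage-week day, prior-week day)
jan18_vs_jan11 = [("MLibrary", 4260, 3719), ("Catalog", 4518, 4870), ("Articles", 1805, 1579)]
jan17_vs_jan10 = [("MLibrary", 4297, 3894), ("Catalog", 5326, 5427), ("Articles", 2211, 1465)]

for label, rows in [("1/18 vs 1/11", jan18_vs_jan11), ("1/17 vs 1/10", jan17_vs_jan10)]:
    print(label)
    for kind, cur, prev in rows:
        change, pct = week_over_week(cur, prev)
        print(f"  {kind:<9} {change:+6d} {pct:+7.2f}%")
    # The Total row is the sum of the three search kinds, not an average of percentages
    total_cur = sum(r[1] for r in rows)
    total_prev = sum(r[2] for r in rows)
    change, pct = week_over_week(total_cur, total_prev)
    print(f"  {'Total':<9} {change:+6d} {pct:+7.2f}%")
```

Note that the Total percentage is computed from the summed counts; averaging the per-kind percentages would give a different (and misleading) number, since the Catalog search volume is much larger than Articles.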
January 6-12 compared to January 13-19, 2012
For a seven-day week extending from the Friday before the outage to the Thursday after, compared with the previous week, Outage Week actually shows a decrease: searches were down 2.45% overall, although the default "MLibrary" site search saw a small 2.2% increase.
I found this call for participation that I thought would be of interest to many of you. Please share your excellent training stories so that others can learn from what you’ve done.
We are looking for higher education libraries, particularly in the US, UK and in Scandinavia, which are delivering exceptionally good and/or innovative support services to research and teaching staff.
If you think your academic library is doing well in supporting research and teaching faculty, we want to hear from you! Your library could be featured as an example of good practice helping the academic library community
to promote and develop novel ways to strengthen its relations with academic departments;
to enhance the marketing and profiling of library services for this constituency;
to maximise its value to research and teaching staff; and
to demonstrate that value within and beyond the institution.
If you would like to be considered as one of our eight case studies, to be undertaken during January to March 2012, or would like more information, please contact us.
The Rise of the New Groupthink : “Solitude is out of fashion. Our companies, our schools and our culture are in thrall to an idea I call the New Groupthink, which holds that creativity and achievement come from an oddly gregarious place. Most of us now work in teams, in offices without walls, for managers who prize people skills above all. Lone geniuses are out. Collaboration is in.”
The Amazing Discussion That Led to the Wikipedia Blackout : “At Wikipedia, one of the corest of core values is Neutral Point of View, contributors’ collective goal of “representing fairly, proportionately, and as far as possible without bias, all significant views that have been published by reliable sources.”… So! The decision to make English Wikipedia dark tomorrow — to go from no POV to whoa, POV — wasn’t one that Wikipedians took lightly. It was, on the contrary, like almost everything that happens on Wikipedia, the result of extensive deliberation and debate. It was agonized over. Like, agonized.”
Young, in Love and Sharing Everything, Including a Password : “Young couples have long signaled their devotion to each other by various means — the gift of a letterman jacket, or an exchange of class rings or ID bracelets. Best friends share locker combinations. The digital era has given rise to a more intimate custom. It has become fashionable for young people to express their affection for each other by sharing their passwords to e-mail, Facebook and other accounts. Boyfriends and girlfriends sometimes even create identical passwords, and let each other read their private e-mails and texts.”
Pinterest Works Better Than Google+ : “Let’s be grown up about this. Pinterest is an app for sharing lists of scrumptious-looking stuff. It’s not for girls or guys, it’s for people who like looking at things. The story I’ve heard is that it was designed for architects and designers and “then brides found it.” This is why, my sources explain, it tends toward the jewelry-and-table-settings end of the spectrum.”
The Year in Review at Kickstarter : “Darling of the crowdfunders, Kickstarter released its stats for the past year, and there is a lot of data to digest. The total number of projects is more than double from last year, the success rates for funding them is up slightly, and the total dollars pledged is close to a $100 million, which is more than triple what was pledged last year.”
I noticed this cover of “Five Laws” on the Otlet’s Shelf example. I’ve never read the book (gasp!) so I don’t know if the design is meaningful or just random. Is there a theme of three or thirds?
Due to space issues, I had to move my MySQL data files (/var/lib/mysql) on an Ubuntu box to another file system. I did that and created a symlink, but MySQL would not start. It turns out to be an easy fix. I found a post on the MySQL Forums from someone who had a similar problem, and someone named Richard Guy posted the solution that worked for me:
Did you fix the apparmor config file for mysqld and restart apparmor?
First, you need to edit /etc/apparmor.d/usr.sbin.mysqld and add the new fully qualified (i.e., no symbolic link) path(s). [You may want to leave the original /var/lib/mysql entries intact.]
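The key point in that fix is that AppArmor checks the real, resolved path, so a rule for /var/lib/mysql does not cover the directory a symlink points to. As a rough sketch, this Python helper resolves the symlink and prints candidate rules to paste into the profile. It assumes your profile grants access to the data directory with the same `r` (directory) and `rwk` (contents) pattern as the stock /var/lib/mysql entries in Ubuntu's usr.sbin.mysqld; that is an assumption, so compare against your own profile before copying anything. The function name `apparmor_rules_for` is hypothetical.

```python
import os


def apparmor_rules_for(data_dir):
    """Build AppArmor rules for a MySQL data directory (hypothetical helper).

    AppArmor matches the resolved path, so symlinks are followed first.
    The 'r' and 'rwk' permissions are assumed to mirror the stock
    /var/lib/mysql entries in Ubuntu's usr.sbin.mysqld profile.
    """
    real = os.path.realpath(data_dir)  # follow any symlinks to the physical path
    return [
        f"  {real}/ r,",       # permission to read the directory itself
        f"  {real}/** rwk,",   # read/write/lock everything beneath it
    ]


if __name__ == "__main__":
    for line in apparmor_rules_for("/var/lib/mysql"):
        print(line)
```

After adding the printed lines to the profile, reload AppArmor and then start MySQL again, as the forum answer above suggests.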
Over the holidays while I wasn’t paying attention, there were some interesting filings in the HathiTrust case.
First, the Authors Guild and HathiTrust reached an agreement on how to litigate, given the state universities’ sovereign immunity. Under the stipulation, all of the individual regents named in the lawsuit are out. In their places, the presidents of the universities involved have agreed to be defendants. If they lose, they agree that they have the authority to order their libraries and HathiTrust to knock off whatever activities the court orders them to knock off.
Second, HathiTrust filed a motion for “judgment on the pleadings.” As usual, the motion itself is boring; all the action is in the associated brief. HathiTrust claims that the Authors Guild and other authors’ groups don’t have standing to sue on behalf of their members, and that none of the plaintiffs have standing to sue to stop the use of orphan works. In both cases the basic argument is the same: you’re not allowed to sue for infringement of a copyright owned by someone else. This doesn’t go to the part of the lawsuit over the HathiTrust database itself: the motion would narrow the lawsuit, not block it entirely.
And finally, Judge Baer entered a new scheduling order. He gave the plaintiffs until January 31 to respond, and the defendants until February 17 to reply to the response. Oral argument will be held on March 2.
The snow is falling here in central Ohio, so I’m eager to leave here and head to warm Dallas for ALA Midwinter 2012. I’m looking forward to catching up with colleagues; making new acquaintances; learning the latest thinking on RDA, linked data, and standards activity; and talking about free/open source software in libraries. On the latter point, I encourage you to come see me give an introduction to the newly announced FOSS4LIB site, answer questions, and take feedback on Saturday morning (10:30 to 11:30) or Sunday morning (10:30 to 11:30). (Or, if you are not coming to Midwinter, sign up for one of the free webinar sessions later in January and February.)
ALA is using a new iteration of its scheduler this year, and it keeps getting better and better. This one even allows you to embed your selected schedule as an <iframe> on an arbitrary page.
You can follow me on Twitter where I’ll be tweeting about #alamw12. A Twitter mention or direct message is also the best way to get ahold of me while in Dallas.
Safe travels if you are headed to Midwinter, and I hope to run into you there.
I posted earlier today about a survey for libraries to share their satisfaction with their ILS. What I didn’t know was that if you chose any library type other than Academic or Public, you would be kicked out of the survey.
As an SLA member and a special library supporter, I find this a bit offensive. Why don’t our opinions matter? A survey of libraries in the US done by the ALA finds that there are actually more special libraries than there are academic libraries (in fact, there are many, many more school libraries than public or academic ones, and they too are excluded from this survey). This is a big portion of our profession, and sharing their opinions will help others make informed decisions about their future ILS purchases.
This came across my email today and I thought I should share it with you all.
Library Journal is conducting a snap survey to determine library and patron satisfaction with integrated library systems (ILS) in both public and academic libraries. Are you in charge of technology, collections, or reference at your library? We are eager to hear your thoughts about the systems that you and your patrons use every day.
Also on Tuesday, Judge Chin allowed three of the six representative plaintiffs to withdraw from the case. Herbert Mitgang, Daniel Hoffman, and Paul Dickson are out; Betty Miles, Joseph Goulden, and Jim Bouton remain. I would love to know what the story behind this move is.
In a previous post, I skimped on the details about the lightning talk strand at Dev8D. Time to sort that out, starting with a definition of “why”.
Lightning talks should provide enough information, presented concisely, to interest, inform, bootstrap and otherwise start people talking about something. A 3 to 5 minute talk with or without slides and props is more than sufficient for this.
A lightning talk should not do the job of the internet. Give an overview of something, and share the URL. Don’t seek to talk through all the text on the site.
It can be many things, such as:
A way to better kickstart someone’s exploration of a topic, project or tool,
A foundation to a discussion that you want to have, either at the event or online,
A ‘mini-workshop’ to illustrate a technical point that is often confused or that has confusing documentation,
a ‘one-liner’ that you’ve found saved you years of time,
or a taster of something that you feel passionately about.
Straightforward. But this only considers it in one direction. What about the audience?
The audience needs to be given the mental space and time to digest, note and take in the information presented, especially for these short, dense talks. I think this is key.
Information Whiplash
In my own limited experience, I have found too many events that use lightning talks as a means to cram in as many talks as possible. I tend to remember one, maybe two of these, and even more rarely remember the URLs I’ve been given. I just didn’t feel I had the time to make good notes or good decisions about whether this information was actually useful to me. The URLs would remain as tabs in my browser for a while, until I forgot why I even had them open.
Let’s work some space into the schedule, then, where nothing happens. Nothing formal, anyhow. You’ll have time to talk, ask questions of one another, talk to one of the speakers or give yourself a break and do absolutely nothing. Not everyone can ‘sprint’ through information and take it in at the same time.
The other strands of the event are mostly portioned into blocks of an hour in length, so it makes sense to organise the talks in a similar fashion. It’s not a sensible idea to join a workshop halfway through for example, unless the workshop has been built around that idea.
Consider some example schemes:
Scheme A: (10 talks/hr) This is the scheme I have most experience enduring (the ‘sardine’ scheme of talk arrangements) but with a little prescribed Q&A/relaxation time added in. What typically happens here is that one or more of the first talks overruns, and the last talk is squeezed in and oops, there goes the Q&A time as we have to start the next series of talks. By the time a sanctioned break happens, that inspiration, that spark of an idea you had because of a talk, is often gone and you can’t remember what precisely it was that you wanted to investigate further. I am not a fan of scheme A.
Scheme B: (8 talks/hr) Same as A but with more leeway given. 4 talks and then a large break, which due to human factors won’t be 12 minutes in reality, but should be at least 5 minutes in length. Getting better, but how about if we split the talks up a little more?
Scheme C: (still 8 talks/hr) Scheme B split into paired talks: 4 sets per hour, with two talks per set. (It would be interesting for the topics of the paired talks to complement each other in this scheme.) It provides a good spacing of talks, leaving the gaps required for people to be comfortable sitting, processing what they have just heard.
Scheme D: (6 longer talks/hr) Essentially, scheme B with 5 minute talks. This gives more space to talk through a topic or idea, and might also leave enough time to be responsive to audience queries or directions.
There is also a scheme E, “Normal talk series”: 4 x 13-minute, 3 x 18-minute or 2 x 28-minute talks. These I didn’t think I’d have to illustrate.
What’s the plan then?
(Comments on this very welcome, as this is just my opinion.)
During the first period of the day (‘Core Skills’): scheme D, three five-minute talks per 30 minutes, shifting to two talks (skipping the middle talk) per 30 minutes if that makes more sense. This is because I expect that people won’t be so talkative at this point in the day, and will be more prone to sitting and taking in information.
During lunchtime and the ‘Emerging Technologies’ period: scheme C, two three-minute talks per 15-minute block.
During the final part of the day, ‘Pushing Ideas Further’, this will be flexible, but allocation will begin in the scheme D pattern.
Pre-planned talks will happen earlier in the day, with more open spots appearing later on that are available to anyone who wants to talk on the day. This will mirror the theme of the conference as a whole; starting with our guess of what might be useful to you and ending with your choice of talks and conversations that are useful to you.
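How much breathing room each pattern leaves is simple arithmetic, so here is a small Python sketch that works it out for the two patterns named in the plan: scheme C (two three-minute talks per 15-minute block) and scheme D (three five-minute talks per 30-minute block). It assumes zero changeover time between speakers, which real events never quite achieve, so treat the results as an upper bound on free time. The `Scheme` class and `free_minutes` function are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Scheme:
    name: str
    talks_per_block: int
    talk_minutes: int
    block_minutes: int


def free_minutes(s: Scheme) -> tuple[int, int]:
    """Return (unscheduled minutes per block, unscheduled minutes per hour).

    Assumes talks run back to back with no changeover time between speakers.
    """
    per_block = s.block_minutes - s.talks_per_block * s.talk_minutes
    blocks_per_hour = 60 // s.block_minutes
    return per_block, per_block * blocks_per_hour


schemes = [
    Scheme("C (paired talks)", talks_per_block=2, talk_minutes=3, block_minutes=15),
    Scheme("D (longer talks)", talks_per_block=3, talk_minutes=5, block_minutes=30),
]

for s in schemes:
    block, hour = free_minutes(s)
    talks_per_hour = s.talks_per_block * (60 // s.block_minutes)
    print(f"Scheme {s.name}: {talks_per_hour} talks/hr, "
          f"{block} free min per block, {hour} free min per hour")
```

Under these assumptions scheme C comes out at 8 talks per hour with 9 free minutes per block, and scheme D at 6 talks per hour with 15 free minutes per block, matching the talk counts given for each scheme.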
Please, add comments below if you agree, disagree, hate, or just don’t care.
Yesterday was Internet blackout day. To do my little part, I followed Wikipedia, 2600, Reddit and a host of others by blacking out my site. So just how many people were inconvenienced by what I did?
Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. New this year is that Pinboard has replaced FriendFeed as my primary aggregation service. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.
Support for Web Bill Wanes as Protests Spread
When the powerful world of old media mobilized to win passage of an online antipiracy bill, it marshaled the reliable giants of K Street — the United States Chamber of Commerce, the Recording Industry Association of America and, of course, the motion picture lobby, with its new chairman, former Senator Christopher J. Dodd, the Connecticut Democrat and an insider’s insider.
Yet on Wednesday this formidable old guard was forced to make way for the new as Web powerhouses backed by Internet activists rallied opposition to the legislation through Internet blackouts and cascading criticism, sending an unmistakable message to lawmakers grappling with new media issues: Don’t mess with the Internet.
The population of the internet became very familiar with the Stop Online Piracy Act (SOPA) and the PROTECT-IP Act (a.k.a. PIPA) today with major internet services like Wikipedia blocking access to its articles and Google placing a black rectangle over its logo. Advocacy sites like americancensorship.org and blacklist.eff.org and www.google.com/landing/takeaction sprang up to prompt U.S. citizens to call their Senators and non-U.S. citizens to petition the U.S. State Department to set in motion opposition to bills that once seemed inevitable. And all sorts of people took to Twitter to protest the fact that they couldn’t use Wikipedia to answer their homework.
It wasn’t all a one-way street, though. Former Senator Chris Dodd (and now MPAA chairperson) denounced the protests as “an irresponsible response and a disservice to people who rely on [the sites] for information and [who] use their services.” House Judiciary Committee Chairperson Lamar Smith announced that his committee will resume consideration of SOPA in February. And PROTECT-IP Act sponsor Senator Leahy released a point-by-point rebuttal to some of the claims made by opponents.
I’ve stated my objections to SOPA and my objections to PROTECT-IP, and reiterated them today by putting up an anti-SOPA/PROTECT-IP splash page on DLTJ. I also still think there is more to learn a few levels deeper than the anti-SOPA/PROTECT-IP advocacy. ProPublica has a project called Who in Congress Supports SOPA and PIPA/PROTECT-IP? that offers a variety of ways to categorize supporters and opponents of the legislation, including an accounting of campaign donations by industry. On my own Stop-SOPA/PROTECT-IP page, I ask readers to look into Lawrence Lessig’s #Rootstrikers movement. A big part of the disconnect and dysfunction among public office holders is the role that campaign contributions play (or, at best, appear to play) in public policy decision making. So while SOPA/PROTECT-IP opponents may have won the battle, there is much to do to win the war against the undue influence that created SOPA and PIPA in the first place.
More Legislative Shenanigans: Research Works Act
In case SOPA, the Stop Online Piracy Act, hasn’t given you enough heartburn, here’s another development on the legislative horizon to be concerned about–H.R. 3699, the Research Works Act. The Association of American Publishers has provided a summary of what they hope the bill will accomplish, which is a frightening read for those of us committed to the principles of Open Access. It appears that H.R. 3699 would seriously threaten public access to federally funded research and deal a critical blow to the Open Access movement, which has been buoyed by exactly the kind of activity H.R. 3699 seeks to curtail in the AAP’s view, namely public access mandates and the development of repositories for publicly funded research.
Yes, that’s right: more intellectual property legislation in front of the U.S. Congress. This time it is a bill that would protect the business interests of academic publishers by preventing the U.S. government from mandating open access to federally funded research. An article in the U.K.’s Guardian newspaper says academic publishers have become the enemies of science. The twist here is that one of the sponsors of the Research Works Act is none other than Representative Darrell Issa, one of the leading opponents of SOPA in the House Judiciary Committee. As you might guess, campaign donations are involved, and so there is a call from #Rootstrikers to help fight “SOPA v2”.
Internet2, McGraw-Hill, Courseload, and Five Universities Implement eText Pilot in Spring 2012
Participating universities in the pilot get McGraw-Hill eTexts, the Courseload reader and annotation platform integrated with their Learning Management System, and can be part of a joint research study of eText use and perceptions. Through the Courseload software, students can print, use social annotation with classmates and instructors, and access their eTexts on any HTML5-capable tablet, smartphone, or computer. Students will receive their eTexts at no cost as the institutions are subsidizing the study, and students who prefer a full hardcopy book may optionally order a print-on-demand version of the eText for a $28 fee. Faculty interest at the pilot institutions has been very strong.
This is good news for students and etextbooks. It sounds like a good experiment and I’m eager to see the outcomes of the pilot. And something that might make next week’s DLTJ Thursday Threads? The rumor that Apple is expected to delve into textbooks in an announcement today.
Help get the NACO/LC Authority File ready for RDA.
The Acceptable Headings Implementation Task Group has been established by the Program for Cooperative Cataloging to develop an implementation plan for preparing the LC/NACO authority file for RDA. The work of this group is largely based on the report of an earlier PCC Task Group; this group recommended a series of mechanical operations designed to make as many of the records in the LC/NACO authority file as useful as possible under RDA without individual review. The present group is exploring each of the changes suggested by the first group in detail, and fitting each into a proposed schedule.
The group has created a Facebook page as one means of communication between the group and the larger community: http://www.facebook.com/#!/pages/PCC-Acceptable-Headings-Implementation-Task-Group/232585923488557 We invite comments on our work, but ask that comments follow the guidelines found in the “Info” section of this page. The Info section describes the Group's activities, including the broad areas in which the Group is interested in receiving comments and those areas outside the Group's charge on which it is not seeking comments.
a document describing a phased implementation of the suggested changes
a discussion of the issues involved in the handling of subfield $c in personal names
a discussion of the suppression (or otherwise) of 4XX fields for AACR2 forms of name
The Group is in the midst of drafting a series of documents describing the stages in which this work should be performed, and the details involved in the work. These documents will also be posted to the Group's download site, and notices of the postings placed on the Facebook page. The Group is actively soliciting volunteers interested in assisting the Group in its work. These tasks will include the review of long lists of changed headings for correctness.
While I was at the PASIG conference this last weekend, a number of people talked about the death of the hard drive, at least in our personal portable devices. The popularity of ultrabooks and small-form-factor notebooks came up many times, with people noting that personal computing will move further away from local copies and toward cloud-based drives because:
Solid state drives provide the instant-on performance that people want in their portable devices
The expense of solid state drives and their currently small capacities will eventually push storage off the local device and into the cloud.
While I certainly agree that this will likely continue to be a trend (look at how tools like Dropbox are changing the way researchers store and share their data), I think that many of the folks at PASIG may be too quick to overlook some very cool developments in SSD technology that allow for micro form factors, letting ultraportables support both an SSD and a traditional spinning drive. Of course, I’m talking about the current work being done with mSATA drives.
Currently, very few mainstream systems support mSATA, which is unfortunate because these really are cool devices. The two best are probably Intel's, which come in 40 GB and 80 GB versions (http://ark.intel.com/products/56547/Intel-SSD-310-Series-(80GB-mSATA-3Gbs-34nm-MLC)). When I was shopping for a replacement laptop last month, I was looking specifically for a device with both an SSD and a traditional drive. My requirement that the system be compact and under 4 lbs made this a difficult search. However, in doing my research, I stumbled upon Intel's mSATA drives.
SSDs are small to begin with, but mSATA drives are downright microscopic. The image below, taken from a review of these devices, shows just how small. In fact, when I ordered one, I had a hard time believing that they really fit an 80 GB drive on a chip a little bigger than a quarter. Yet they did.
So how well does this work? From my limited experience with it (about 2 weeks), great. Intel provides a set of disk tools that let you migrate your current partitions onto the SSD; however, I chose to do a fresh install. Installing Windows and all my programs onto the SSD took about 35 GB. With a little symlinking, I moved all the data components to the traditional hard drive (500 GB), leaving the SSD for just the operating system and programs. Then I tested.
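The post doesn't say exactly how the symlinking was done (on Windows 7 this is typically done with `mklink /D` or `mklink /J`). As a rough, cross-platform illustration of the idea, the Python sketch below moves a data folder from the SSD to the spinning drive and leaves a directory symlink at the old path, so programs still find their data where they expect it. The function name and folder layout are hypothetical; on Windows, creating symlinks may require administrator rights or Developer Mode.

```python
import os
import shutil
from pathlib import Path


def relocate_with_symlink(src: Path, dest_root: Path) -> Path:
    """Move directory `src` under `dest_root` and leave a symlink at `src`.

    Hypothetical helper: `src` would be a data folder on the SSD and
    `dest_root` a folder on the larger spinning drive. Returns the new path.
    """
    src = Path(src)
    dest = Path(dest_root) / src.name
    if src.is_symlink():
        raise ValueError(f"{src} is already a symlink")
    if dest.exists():
        raise FileExistsError(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Move the real data onto the big drive...
    shutil.move(str(src), str(dest))
    # ...then point the old location at it. target_is_directory only matters
    # on Windows, where file and directory symlinks are distinct.
    os.symlink(dest, src, target_is_directory=True)
    return dest
```

Programs keep using the old path, while the bytes live on the 500 GB drive and the SSD is left for the operating system and applications.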
When I first received the laptop, I did some startup and shutdown testing. On a clean system, the laptop, running an Intel Core i7 with 8 GB of RAM, took approximately 35 seconds for Windows 7 to finish its startup cycle. Not bad, but not great. Additionally, on a full charge, the system ran for ~3.7 hours on battery (not good). The Windows Experience Index gave the 500 GB, 7200 rpm drive a score of 6.2 (of 7.9).
After installing the mSATA drive and making it the primary boot partition, I ran the tests again and the difference was striking. First, the Windows Experience Index rating changed significantly: with the SSD as the primary system disk, it gave the Intel 80 GB mSATA drive a score of 7.7 (of 7.9), a pretty high score. So what does that mean in real life? Well, let’s start with boot times. From a cold boot, Windows 7 now takes approximately 5-7 seconds. Closing the lid and opening it back up has essentially become instant-on (for a while, I wondered whether the system was actually going to sleep when I closed the lid, because it was on as soon as I opened it). And finally, battery life: on a full charge, under heavy use at the PASIG conference, I got nearly 8 hours.
While the move away from local disks may indeed happen in the near future, my recent laptop purchase showed me that, for those who want a very high-performance system in a small form factor, it is possible to have the best of both worlds, using these emerging SSD technologies to build very high-performance (and relatively low-cost) portable systems.
On Tuesday, without fanfare, Judge Chin entered a new scheduling order in the Authors Guild case. The new deadlines are:
The authors file their response to Google’s motion to dismiss by February 6; Google replies by February 17. [Previously, these dates were January 23 and February 3, respectively.]
Google responds to the authors’ motion for class certification by February 8; the authors reply by April 3. [Previously, these dates were January 26 and March 12, respectively.]
Discovery closes on April 13. [Previously, this date was March 30.]
Thus, this new order slows down the current round of dueling motions by roughly two weeks.
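To check the "two weeks" figure, the short Python sketch below computes how far each deadline moved. It assumes all dates fall in 2012 (the year is not stated in the order summary above, but matches the surrounding posts); the labels are paraphrases of the list above.

```python
from datetime import date

# (old deadline, new deadline) for each item in the revised scheduling order.
# Assumption: every date is in 2012.
DEADLINES = {
    "Authors' response to motion to dismiss": (date(2012, 1, 23), date(2012, 2, 6)),
    "Google's reply on motion to dismiss": (date(2012, 2, 3), date(2012, 2, 17)),
    "Google's response to class certification motion": (date(2012, 1, 26), date(2012, 2, 8)),
    "Authors' reply on class certification": (date(2012, 3, 12), date(2012, 4, 3)),
    "Close of discovery": (date(2012, 3, 30), date(2012, 4, 13)),
}


def shifts_in_days(deadlines):
    """Return how many days each deadline was pushed back."""
    return {name: (new - old).days for name, (old, new) in deadlines.items()}


for name, days in shifts_in_days(DEADLINES).items():
    print(f"{name}: +{days} days")
```

Most deadlines move by 13 or 14 days; the authors' class certification reply is the outlier, moving by 22 days.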
Federal Government of Canada libraries are taking a hit. The Ottawa Citizen covered the closure of the HRSDC library earlier here, and the HRSDC Library posted its update earlier today on the FLC-CBF listserv (login required), copied here. This is neither the beginning nor the end: inside reports confirm a range of libraries [...]
This is that time of year when I try to perfect the look on my face that says, "If the first sentence out of your mouth doesn't include the words 'at ALA' or 'gushing blood,' I don't really have time to talk." It's another busy one as usual, with my time split between OCLC activities, LITA activities, and my first year as an ALA Councilor. Here are a few highlights for me this year:
OCLC (I'll be at or near the podium at the following information sessions)
OCLC Americas Regional Council Annual Member Meeting and Symposium, Friday 12-5pm, Omni Dallas Hotel, Dallas Ballroom EFG
The Power of Cooperation at Webscale: OCLC's Strategy for Academic Libraries, Saturday 8:30-10am, Dallas Convention Center, Room C155
The Power of Cooperation at Webscale: OCLC's Strategy for Public Libraries, Saturday 10:30am-12pm, Dallas Convention Center, Room C155
Whether or not you've heard about WorldShare, Webscale, and the power of cooperation in libraries, I would encourage you to join Cathy De Rosa and her OCLC colleagues at these great events.
Workflows Transformed: Librarians Share Experiences with OCLC WorldShare Management Services, Saturday 1:30-3:00 pm, Dallas Convention Center, Room C141
I'm excited about this one, as I will speak for only 5 minutes before turning things over to Lynne Jacobsen (Pepperdine) and Stefanie Wittenbach (Texas A&M San Antonio), from the two libraries that have been live with WorldShare Management Services the longest.
E-resources at Webscale: Simple Solutions for Management, Discovery and Delivery, Saturday 4-5:30pm, Dallas Convention Center, Room C156
I'll be at the podium for this one with lots of audience support from my colleagues. Isn't it time that we started talking about new solutions for managing our most valuable resources? Wouldn't it be even cooler if you could do something about it now? Well, you can...come and find out.
Admit it, you're awake anyway...why not come get a great full breakfast and get a great overview of everything the world's largest library cooperative is up to?
For a full list of OCLC events and registration, go here.
LITA
Happy Hour, Top Technology Trends, LITA Town Hall....LITA Rocks. I suggest you check out the full list of events here. But I also want to add a selfish shout-out for a couple of interest group events happening this time:
The first meeting of the "Technology and Industry Interest Group." I had a hand in putting this IG together and am thrilled that Marshall Breeding and my colleague Matt Goldner have agreed to serve as co-chairs of this cool new IG. Marshall posted about it on GuidePosts. Saturday, 10:30 - 12 noon, Dallas Convention Center A303
My boss, Robin Murray, will be at the Next Generation Catalog Interest Group on Sunday 10:30 am - 12:00 noon, Dallas Convention Center C156, to talk about next-generation systems and services
And every other time gap is filled with Council meetings! It's going to be another great ALA.
The Evergreen community participates along with hundreds of other websites in raising awareness about two pieces of U.S. legislation, SOPA and PIPA, by posting the following banner on the official website.
Please be aware that many websites have decided to “go dark” today to raise awareness about two pieces of U.S. legislation, SOPA and PIPA. Some say these bills seek to fight piracy and protect intellectual property. Others say the bills “reduce freedom of expression, increase cybersecurity risk, or undermine the dynamic, innovative global Internet.”
As libraries and members of an online community, we felt it was important to raise awareness of this issue. For more information on SOPA and PIPA and suggestions for how you can take action, see http://sopastrike.com/strike.
On Saturday 28th January we’re getting together for an Open Economics Hackday where we’ll be wrangling data and building apps related to economics. All are welcome!
Where: Online (IRC, Skype) and in person in London (public space of the main hall on floor G at the Barbican)
Who: Anyone! Coders, data wranglers, economists, illustrators or writers …
As with all hackdays, exactly what gets worked on is decided on the day (you can add suggestions to the etherpad). However, one particular idea, which could become a submission to Apps4Italy, is set out below.
One Idea for What We’ll Work On: ProgressVote
One of the most fundamental questions in economic research is: how do we measure social progress? Policy makers have come up with alternative measures accounting for environmental impacts, inequality, happiness and other indicators of human development.
However, the multiplicity of factors has caused another problem – how do we decide on the importance of each individual factor in a composite index? They could be either equally important (such as in the HDI) or they could be given different weights.
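The weighting problem can be made concrete with a small Python sketch of a composite index as a weighted arithmetic mean of normalized indicator scores. This is a simplification for illustration: the real HDI, for example, has used a geometric mean of its dimension indices since 2010. The indicator names and numbers are made up.

```python
def composite_index(scores, weights=None):
    """Weighted arithmetic mean of indicator scores, each normalized to [0, 1].

    With weights=None every indicator counts equally; otherwise `weights`
    maps each indicator name to a non-negative weight.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total


# Hypothetical country with normalized scores on three dimensions.
scores = {"health": 0.9, "education": 0.6, "income": 0.3}
print(composite_index(scores))                                              # ≈ 0.6
print(composite_index(scores, {"health": 2, "education": 1, "income": 1}))  # ≈ 0.675
```

The same data produce a different ranking depending on who picks the weights, which is exactly the question YourTopia and ProgressVote try to hand back to users.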
In our last project YourTopia – which was one of the winners of last year’s World Bank Apps4Development Prize – we offered one possible solution by letting you decide on which dimensions and aspects of economic development to prioritize.
However there are limitations to such an approach: faced with a myriad of technical indicators people are often overwhelmed by the complexity: Does life expectancy at birth matter more than the inflation rate or the M2 money supply? And what does M2 money supply even mean?
In ProgressVote, we’d like to improve on YourTopia in a variety of ways:
First, by combining proxy voting with the crowd-based YourTopia approach: instead of voting for indicators, people vote for expert statements that interpret the dashboard of variables. By doing so, we hope to strike a balance between expert judgement and the interpretation of the general public: experts may be better able to interpret technical data, but in the end it is the citizens who decide which expert statement to endorse.
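One way the proxy-voting idea could work in code, purely as a hypothetical design sketch (the post does not specify a mechanism): suppose each expert statement implies a set of indicator weights, and citizens endorse statements. The index weights are then the vote-share-weighted average of the endorsed statements' weights. The statement IDs, indicators and data representation below are all assumptions.

```python
from collections import Counter


def weights_from_votes(statement_weights, votes):
    """Blend expert statements into one set of indicator weights.

    statement_weights: {statement_id: {indicator: weight}}, the weighting
        each expert statement implies (hypothetical representation).
    votes: iterable of statement_ids, one per citizen endorsement.
    Returns {indicator: weight}, normalized to sum to 1.
    """
    tally = Counter(votes)
    total_votes = sum(tally.values())
    if total_votes == 0:
        raise ValueError("no votes cast")
    blended = Counter()
    for statement, count in tally.items():
        w = statement_weights[statement]
        norm = sum(w.values())
        if norm <= 0:
            raise ValueError(f"statement {statement!r} has no positive weights")
        share = count / total_votes
        for indicator, value in w.items():
            # Each statement contributes in proportion to its vote share.
            blended[indicator] += share * value / norm
    return dict(blended)


statements = {
    "A": {"health": 1, "income": 1},   # "health and income matter equally"
    "B": {"health": 0, "income": 1},   # "only income matters"
}
print(weights_from_votes(statements, ["A", "A", "A", "B"]))
```

Citizens never touch the technical indicators directly; they only choose which interpretation to trust, and the weights follow from that choice.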
Second, we’d like to add support for time series, so you can see how progress (or lack of it) has evolved over time, as well as better geo support, so that it is possible to look at how regions as well as countries have performed (consider Italy, for instance).
Interested? Then come join us on Saturday 28th January!
This blog will present first-time users with a warning page on January 18, 2012 (the day that many internet sites are using to protest the Stop Online Piracy Act, SOPA) and on January 23, 2012 (the day before the U.S. Senate may vote on the PROTECT-IP Act). DLTJ is proud to join many other sites in this demonstration of solidarity for an open, transparent internet.
Thought you heard that SOPA was dead? Or was modified to be acceptable? Or that PIPA is on the ropes? As of January 17th, these statements aren’t true:
This legislation is bad for the health of the internet, bad for companies — those that exist now and those that would otherwise come — that make their living on the internet, and bad for the standing of the United States in the global community supporting freedom of speech and due process principles.
Looking for something to do to make your opinions known? Try one or more of these:
Look into the #Rootstrikers movement. A big part of the disconnect and dysfunction among public office holders is the role that campaign contributions play (or, at best, appear to play) in public policy decision making. This certainly seems to be true for the current SOPA debate.
Watch the Learn to Be a Better Activist webcast on January 18th (or the recording after that day). It is a full day of talks from people who know something about making voices heard in Congress.
Add a banner to the bottom of your Twitter profile picture to spread the word of your opinions.
Just before Michael Gove delivered his speech at BETT 2012 on the forthcoming (and huge) changes to the way IT is treated as a subject in schools, many articles were published about the dire shortage of IT-trained teachers. Most rested on a statistic that only 3 teachers out of 28 thousand had any IT background prior to training. For example, it was often repeated in newspapers:
This statistic sounded awfully low. Really, really low. Too low, in fact.
@mberry pointed out that King’s College London (KCL) alone had 18 computing graduates in a single year's class:
“From @mberry: @benosteen I know that in that year KCL had 18 computing graduates out of 28 ICT/Computing PGCE trainees. 18>3” (11 Jan 12)
So what was going on here? I wrote to the GTC to find out, as a few quotes attributed this statistic to them. Helpfully, they got back to me to clarify this data point (emphasis my own):
This data showed the number of teachers who qualified in 2010 who listed computing or computing science as their initial teacher training (ITT) qualification subject – in other words, their main area / subject of specialism.
It is accurate to say that of the 28767 teachers who were awarded qualified teacher status (QTS) in 2010, and registered with the GTCE, three gained QTS with computing or computing science as their main teaching specialism.
This compares with 750 teachers, in that 28767, who qualified with a specialism in ICT and were registered with the GTCE. Please note that the GTC will be closing at the end of March 2012 and we now no longer have the resources to respond to totally new data requests that cannot be fulfilled from previous recent data analysis work.
Regards Roger [Roger Greenway, Systems Manager, GTC]
So, now you know. The statistic was about registered teachers who had computing or computing science as their main teaching specialism, not about CS grads entering teaching.
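Putting the GTC's figures side by side shows why the headline misled: the "3" counts only the computing specialism, while 750 new teachers in the same cohort had an ICT specialism. A quick Python calculation using only the numbers in Roger Greenway's reply:

```python
QTS_2010 = 28767            # teachers awarded QTS in 2010 and registered with the GTCE
COMPUTING_SPECIALISTS = 3   # computing / computing science as main specialism
ICT_SPECIALISTS = 750       # ICT as main specialism


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


print(f"Computing specialists: {share(COMPUTING_SPECIALISTS, QTS_2010):.3f}% of the cohort")
print(f"ICT specialists:       {share(ICT_SPECIALISTS, QTS_2010):.2f}% of the cohort")
print(f"Computing + ICT:       {COMPUTING_SPECIALISTS + ICT_SPECIALISTS} teachers")
```

That is roughly 0.01% versus 2.6% of the cohort, or 753 teachers with a computing or ICT specialism rather than 3. Neither figure says how many new teachers hold CS degrees, which is the question @mberry's KCL example speaks to.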
The Vancouver Public Library is documenting a large co-creation process called Free-For-All. They’re soliciting community input about the following topics:
Public places and learning spaces
Future directions of library collections including digital formats
The role of the library supporting children and families
The role and purpose of public programming and training
And not, according to its architect, Piers Gough, for whom “books haven’t gone away. Libraries still hold these magic realms of invention, realms of ideas. They’re places where you’re not told what to think; they’re also places where you can stay and stop and spend as long as you like.”
Today, January 18, is a day of widespread Internet blackouts to oppose the PROTECT IP Act (PIPA) and the Stop Online Piracy Act (SOPA) in the US. Just about every post from every country in my RSS reader this morning is discussing this topic. Global Voices has a particularly good post on the issues, and importantly, the post is available in 6 different languages.
I recently returned from vacation, including a week in Los Angeles. I attended a TV show taping, visited Warner Bros studios, hung out at the Santa Monica pier (as seen in any number of TV shows), went to the movies, and spent much of the week on buses and trains reading The War for Late Night by Bill Carter. I am an unashamed fan of The Business. And a librarian who believes in equitable access to information, globally, to all.
One of the most interesting aspects of Carter’s book is the idea of Late Night being a symbol of the problems in media: the divide between the television business as it has always been, tied to ratings and advertising dollars, and new business, embracing a new on-demand, timeshifting culture:
“For Conan, the change he had made in his career had taken on the trappings of religious conversion. He had seen the future; there was no going back. ‘Everybody is facing the complete transformation of our business,’ he said. ‘You can resist and fight it or you can go with it. I’m saying go with it. Let’s see where this goes.’” (Carter, p. 404)
The Hollywood Reporter posted an astonishing article on Monday, Why Hollywood Is Losing the Public Relations War on Piracy (Analysis). The article is a surprisingly honest critique of how media companies have got it wrong in framing the debate on copyright infringement. An argument we hear most is that downloading costs jobs in Hollywood, yet:
“Making an argument on jobs might seem like a winning political one in a tough economic climate. But increasingly, the tech sector is seen as the engine of economic growth in this nation. Hollywood’s estimates of piracy’s economic damage have been picked apart by observers, but more importantly, arguing about the economic sufferings of one industry sector rings hollow as another industry sector thrives.”
We are beginning to see fractures in the PIPA/SOPA debate – former supporters jumping across to oppose the proposed bills, and Internet media companies not joining forces with the big studios. Beyond this issue, there is a larger shift with more shows making clips and full shows available online. Had The X-Files been around now, maybe we wouldn’t have spent all our time in the 90s defending websites against Cease and Desist notices *
Perhaps we need to broaden whom we lobby and how we engage on these issues. Seeing the debate as openness vs. big business isn’t productive. We now see some individual shows, agents, and artists embracing the Internet as a way to build audiences and find new sources of advertising revenue. Is there a role for librarians in this? If your library has a celebrity or author champion, or READ posters, it might be interesting to know whether they are aware of this proposed legislation, and what their position on it is. Reach out, and start talking.
* I had a very popular fansite in the late 90s. There is debate about whether it actually impacted a minor storyline on the show. It was featured in Yahoo! Internet Life magazine, the Official X-Files Magazine, and a number of newspapers. None of those publications still exist, nor does the website.
As I mentioned in my last post, the US Congress is currently considering two bills, the Stop Online Piracy Act (SOPA) and the PROTECT IP Act (PIPA), that would make it easy for copyright infringement complaints (whether ultimately justified or not) to wipe entire sites off the Net by various means, with little recourse or due process for site owners.
As the Electronic Frontier Foundation points out, these bills, if enacted, threaten censorship of a wide variety of sites that host controversial content or unfiltered public discourse, not just flagrant bootleg sites. Sites hosting online books, in particular, could be cut off in various ways if they host a book that someone says infringes copyright in some way. (Even the threat of wholesale cutoff could cause them to take the book down, without any sort of judicial hearing.) Even linking to a site that has content that’s the subject of a complaint could put a site at risk.
Many sites are “going dark” in various ways on the 18th, to raise awareness of these bills and show what it could be like if they became targets of SOPA or PIPA-enabled censorship. This includes a number of the sites linked to from The Online Books Page. For example, the Internet Archive, which hosts 2 million volumes, is out of service for 12 hours on the 18th.
The Online Books Page will not go offline, but we will turn many of our pages black for the 18th, as a warning both that some of the links on the site may be out of service, and that the site itself, which links to more than 1.4 million books on thousands of sites around the world, could be at risk if the bills currently under consideration in Congress pass.
My objection to the bills is not an objection to opposing copyright violations. As the US Constitution recognizes, appropriately bounded copyrights serve a useful purpose: “to promote the Progress of Science and useful Arts.” A fair bit of the time I spend on The Online Books Page is devoted to making sure the online books I curate do in fact comply with applicable copyright law. Without clear and reasonable boundaries, though, copyright and its enforcement can inhibit rather than promote the progress of knowledge and the arts, by becoming tools of censorship and chilled speech. I believe the current bills in Congress unfortunately do that. If you are concerned about them as well, I encourage you to contact members of Congress to make your concerns known.
A friend on Facebook posted a link the other day to an article about University of Illinois President Michael Hogan’s chief of staff resigning after an anonymous e-mail was sent to the University Senates Conference from a Yahoo! e-mail account. I don’t know much about what is happening at the University of Illinois, but I was intrigued by the attempt at anonymous e-mail.
The article stated that a computer science professor, Roy Campbell, was able to determine that the e-mails may have been sent by someone in the president’s office. The initial article I read didn’t say how he figured that out, so I thought he might have looked at the e-mail headers. I did some checking with e-mails sent to my personal account from people with Yahoo! addresses and found that, indeed, Yahoo! Mail does include the sender's IP address in the header (actual IP replaced by XXX.XXX.XXX.XXX):
Received: from [XXX.XXX.XXX.XXX] by web112906.mail.gq1.yahoo.com via HTTP; Fri, 13 Jan 2012 12:11:28 PST
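Reading that address out of a raw message is straightforward. The Python sketch below uses the standard library's `email` package to collect every bracketed IPv4 address from a message's Received headers. It assumes you have the full raw message source (most mail clients offer a "show original" or "view source" option), and the sample uses a documentation address (192.0.2.10) in place of the real one. Keep in mind that the address found may belong to a NAT gateway, proxy, or shared network rather than a single machine.

```python
import ipaddress
import re
from email import message_from_string

# Matches a bracketed IPv4 literal such as "[192.0.2.10]".
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message: str) -> list[str]:
    """Return bracketed IPv4 addresses found in the Received headers,
    in header order (topmost header, i.e. the last hop, first)."""
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        for candidate in BRACKETED_IPV4.findall(str(header)):
            try:
                ipaddress.IPv4Address(candidate)
            except ValueError:
                continue  # e.g. "999.1.1.1" fits the pattern but is not an address
            ips.append(candidate)
    return ips


raw = (
    "Received: from [192.0.2.10] by web112906.mail.gq1.yahoo.com via HTTP; "
    "Fri, 13 Jan 2012 12:11:28 PST\n"
    "From: someone@example.com\n"
    "Subject: test\n"
    "\n"
    "Body text\n"
)
print(received_ips(raw))
```

Real messages usually carry several Received headers, one per relay, so the webmail "via HTTP" line is the one that points back toward the sender.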
However, I came across another article with a little more information. While I don’t know that Dr. Campbell didn’t look at the headers (I imagine he did), he also found clues about where the e-mails were sent from, because the person who sent them composed them in Microsoft Word and then pasted the content into Yahoo! Mail. A Chicago Tribune article quoted Dr. Campbell as saying, “One should also be careful writing anonymous email using (Microsoft) Word.”
I did some testing with cutting and pasting from Microsoft Word, and I wasn’t able to find any personally identifying information in the markup that comes across when you don’t send the e-mail as plain text via Yahoo!, but I am sure that, depending on your configuration and version of Word, it could happen.
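Even without personal details, pasted markup can reveal that a message was written in Word, which narrows things down. The Python sketch below scans an HTML e-mail body for a few markers that Word-generated HTML commonly contains (the `MsoNormal` class, `mso-` CSS properties, the Office XML namespace, and `<!--[if gte mso` conditional comments). The marker list is an illustrative assumption, not an exhaustive or version-exact catalogue, and a match shows only provenance, not identity.

```python
import re

# Markers commonly left behind when rich text is pasted from Microsoft Word.
# Illustrative only: exact output varies by Word version and by how the
# receiving editor (here, webmail) sanitizes pasted HTML.
WORD_MARKERS = {
    "MsoNormal class": re.compile(r'class="?MsoNormal', re.IGNORECASE),
    "mso-* CSS property": re.compile(r"\bmso-[a-z-]+\s*:", re.IGNORECASE),
    "Office XML namespace": re.compile(r"urn:schemas-microsoft-com:office", re.IGNORECASE),
    "conditional comment": re.compile(r"<!--\[if gte mso", re.IGNORECASE),
}


def word_markers(html: str) -> list[str]:
    """Return the names of Word-specific markers found in an HTML body."""
    return [name for name, pattern in WORD_MARKERS.items() if pattern.search(html)]


sample = '<p class=MsoNormal style="mso-margin-top-alt:auto">Hello</p>'
print(word_markers(sample))
```

Sending the message as plain text would drop all of this markup, which is consistent with the observation above that the risk depends on how the e-mail is sent.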
I think the take-away from this story, as far as e-mail is concerned, is that you should never assume any e-mail you send is truly anonymous. You can make it “more anonymous” and harder to trace depending on how you send it and what tools you use, but unless you go to great lengths and know what you are doing, someone with enough resources who wants badly enough to know where an e-mail came from can probably figure it out, or come close enough. Maybe not enough for a court of law, but enough that you’ll probably wish you hadn’t sent it. While it was a computer science professor who first figured out that the e-mail was probably not from someone on the committee, it really wouldn’t have taken a computer genius in this case to figure out where it may have come from.