Planet Code4Lib

Mozilla Festival Day 1: Closing Keynotes / Cynthia Ng

We ended the first day with the closing plenary featuring numerous people. Mark Surman was back on stage to help set the context of the evening talks. 10 5 minute talks, relay race. Mobile and the Future Emerging Markets and Adoption Chris Locke emerging markets in explosion of adoption of mobile social good example: mobile to […]

Citations get HOT / Karen Coyle

The Public Library of Science research section, PLOS Labs (ploslabs.org), has announced some very interesting news about the work that they are doing on citations, which they are calling "Rich Citations".

Citations are the ultimate "linked data" of academia, linking new work with related works. The problem is that the link is human-readable only and has to be interpreted by a person to understand what the link means. PLOS Labs have been working to make those citations machine-expressive, even though they don't natively provide the information needed for a full computational analysis.

Given what one does have in a normal machine-readable document with citations, they are able to pull out an impressive amount of information:
  • What section the citation is found in. There is some difference in meaning depending on whether a citation is found in the "Background" section of an article or in the "Methodology" section. This gives only a hint of the meaning of the citation, but it's more than no information at all.
  • How often a resource is cited in the article. This could give some weight to its importance to the topic of the article.
  • What resources are cited together. Whenever a sentence ends with "[3][7][9]", you at least know that those three resources equally support what is being affirmed. That creates a bond between those resources.
  • ... and more
As an open access publisher, they also want to be able to take users as directly as possible to the cited resources. For PLOS publications, they can create a direct link. For other resources, they make use of the DOI to provide links. Where possible, they reveal the license of cited resources, so that readers can know which resources are open access and which are pay-walled.

This is just a beginning, and their demo site, appropriately named "alpha," uses their rich citations on a segment of the PLOS papers. They also have an API that developers can experiment with.
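For developers wondering what working with such an API might look like, here is a minimal, purely hypothetical sketch in Python; the endpoint, parameters, and JSON field names are invented for illustration and are not PLOS's actual interface.

# Hypothetical sketch only: the endpoint and field names below are
# placeholders, not the real PLOS Rich Citations API.
import json
import urllib.parse
import urllib.request

doi = "10.1371/journal.pone.0000000"  # placeholder DOI
url = "https://api.example.org/papers?doi=" + urllib.parse.quote(doi, safe="")

with urllib.request.urlopen(url) as response:
    paper = json.load(response)

# Walk the (hypothetical) citation groups, echoing the idea that references
# cited together in one sentence -- "[3][7][9]" -- form a meaningful bond.
for group in paper.get("citation_groups", []):
    section = group.get("section", "unknown section")
    dois = [ref.get("doi") for ref in group.get("references", [])]
    print(f"{section}: cited together -> {dois}")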

I was fortunate to be able to spend a day recently at their Citation Hackathon where groups hacked on ongoing aspects of this work. Lots of ideas floated around, including adding abstracts to the citations so a reader could learn more about a resource before retrieving it. Abstracts also would add search terms for those resources not held in the PLOS database. I participated in a discussion about coordinating Wikidata citations and bibliographies with the PLOS data.

Being able to datamine the relationships inherent in the act of citation is a way to help make visible and actionable what has long been the rule in academic research, which is to clearly indicate upon whose shoulders you are standing. This research is very exciting, and although the PLOS resources will primarily be journal articles, there are also books in their collection of citations. The idea of connecting those to libraries, and eventually connecting books to each other through citations and bibliographies, opens up some interesting research possibilities.

Open Access Week in Nepal / Open Knowledge Foundation

This post is part of our Open Access Week blog series to highlight great work in Open Access communities around the world.

Open Access Week was celebrated for the first time in Nepal during the first two days: October 20 and 21. The event, led by the newly founded Open Access Nepal and supported by EIFL and R2RC, featured a series of workshops, presentations, and peer-to-peer discussions and training by the country's leaders in Open Access, Open Knowledge, and Open Data, including a three-hour workshop on Open Science and Collaborative Research by Open Knowledge Nepal on the second day.

Open Access Nepal is a student-led initiative made up mostly of MBBS students. The audience for the Open Access Week celebrations therefore consisted largely of medical students, but engineering students, management students, librarians, professionals, and academics were also well represented. Participants discussed open access developments in Nepal and their roles in promoting and advancing open access.

EIFL and Right to Research Coalition provided financial support for the Open Access Week in Nepal. EIFL Open Access Program Manager Iryna Kuchma attended the conference as speaker and facilitator of workshops.


Open Knowledge Nepal hosted an interactive session on Open Science and Collaborative Research on the second day. The session was led by Kshitiz Khanal, Team Leader of Open Access / Open Science for Open Knowledge Nepal, with support from Iryna Kuchma and Nikesh Balami, Team Leader of Open Government Data. About 8-10 of the country's Open Access experts were in the hall to assist participants. The session began half an hour before lunch, and participants were asked to brainstorm over the break about what they think Open Science and Collaborative Research are, and about the Open Access challenges they have faced or might face in their research. Participants were seated at round tables in groups of 7-8, making a total of 5 groups.

After lunch, one member from each group took a turn at the front to present a summary of their brainstorming on colored chart paper. Participants came up with near-exact definitions and reflected the troubles researchers in the country have been facing regarding Open Access. As one might expect of industrious students, some groups impressed the session hosts and experts with interesting graphical illustrations.


Iryna followed the presentations with a talk introducing the concept, principles, and examples of Open Science. Kshitiz followed with his presentation on Collaborative Research.


The session on Collaborative Research featured industry-academia collaborations facilitated by government. Collaborative Research needs more attention in Nepal, as World Bank data shows that the country's total R&D investment is equivalent to only 0.3% of GDP. The Lambert Toolkit, created by the Intellectual Property Office of the UK, was also discussed. The toolkit provides sample agreements for industry-university collaborations and multi-party consortiums, along with a few decision guides for such collaborations. The session also introduced version control and discussed simple web-based tools for Collaborative Research such as Google Docs, Etherpads, Dropbox, Evernote, and Skype.

On the same day, Open Nepal also hosted a workshop about open data, and the organizers hosted a session on the Open Access Button. Sessions on the previous day introduced the audience to Open Access, Open Access repositories, and the growing Open Access initiatives around the world.

This event dedicated to Open Access was well received by the open communities of Nepal, which have mostly concerned themselves with Open Data, Open Knowledge, and Open Source software. A new audience became aware of the philosophy of Open. This author believes the event was a success.


IL2014: More Library Mashups Signing/Talk / Nicole Engard

I’m headed to Monterey for Internet Librarian this weekend. Don’t miss my talk on Monday afternoon followed by the book signing for More Library Mashups.

From Information Today Inc:

This October, Information Today, Inc.’s most popular authors will be at Internet Librarian 2014. For attendees, it’s the place to meet the industry’s top authors and purchase signed copies of their books at a special 40% discount.

The following authors will be signing at the Information Today, Inc. booth on Monday, October 27 from 5:00 to 6:00 P.M. during the Grand Opening Reception.

Book Signing

The post IL2014: More Library Mashups Signing/Talk appeared first on What I Learned Today....

Mozilla Festival Day 1: CC Tools for Makers / Cynthia Ng

Creative Commons folks hosted a discussion on barriers and possible solutions to publishing and using CC licensed content. Facilitators Ryan Merkley (CEO) Matt Lee (Tech Lead) Ali Al Dallal (Mozilla Foundation) Our Challenge our tech is old, user needs are unmet (can be confusing, don’t know how to do attribution) focus on publishing vs. sharing […]

Mozilla Festival Day 1: Notes from Opening Plenary / Cynthia Ng

Start of Mozilla Festival 2014 with opening circle. CoderDojo Mary Moloney, Global CEO @marydunph @coderdojo global community of free programming clubs for young people each one of you is a giant, because you understand technology and the options it can give people how can I reach down to a young person and put them on […]

Testing Adobe Digital Editions 4.0.1, round 2 / Galen Charlton

Yesterday I did some testing of version 4.0.1 of Adobe Digital Editions and verified that it is now using HTTPS when sending ebook usage data to Adobe’s server adelogs.adobe.com.

Of course, because the HTTPS protocol encrypts the datastream to that server, I couldn’t immediately verify that ADE was sending only the information that the privacy statement says it is.

Emphasis is on the word “immediately”.  If you want to find out what a program is sending via HTTPS to a remote server, there are ways to get in the middle.  Here’s how I did this for ADE:

  1. I edited the hosts file to refer “adelogs.adobe.com” to the address of a server under my control.
  2. I used the CA.pl script from openssl to create a certificate authority of my very own, then generated an SSL certificate for “adelogs.adobe.com” signed by that CA.
  3. I put the certificate for my new certificate authority into the trusted root certificates store on my Windows 7 deskstop.
  4. I put the certificate in place on my webserver and wrote a couple of simple CGI scripts to emulate the ADE logging data collector and capture what got sent to them (a minimal sketch of such a capture endpoint follows this list).
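For the curious, here is a minimal sketch in Python of what a capture endpoint like the one in step 4 could look like. This is not the actual CGI script used in the test; it assumes a certificate/key pair for adelogs.adobe.com signed by the home-made CA from step 2, and it simply logs whatever the client posts.

# Minimal sketch of a logging-endpoint impersonator (file names are illustrative).
# Run on the server that the hosts file now claims is adelogs.adobe.com.
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer

class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Append the request path and raw body to a file for later inspection.
        with open("ade-capture.log", "ab") as log:
            log.write(self.path.encode() + b"\n" + body + b"\n\n")
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    httpd = HTTPServer(("0.0.0.0", 443), CaptureHandler)
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # server.crt/server.key: cert for adelogs.adobe.com signed by the local CA.
    context.load_cert_chain("server.crt", "server.key")
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()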

I then started up ADE and flipped through a few pages of an ebook purchased from Kobo.  Here’s an example of what is now getting sent by ADE (reformatted a bit for readability):

"id":"F5hxneFfnj/dhGfJONiBeibvHOIYliQzmtOVre5yctHeWpZOeOxlu9zMUD6C+ExnlZd136kM9heyYzzPt2wohHgaQRhSan/hTU+Pbvo7ot9vOHgW5zzGAa0zdMgpboxnhhDVsuRL+osGet6RJqzyaXnaJXo2FoFhRxdE0oAHYbxEX3YjoPTvW0lyD3GcF2X7x8KTlmh+YyY2wX5lozsi2pak15VjBRwl+o1lYQp7Z6nbRha7wsZKjq7v/ST49fJL",
"h":"4e79a72e31d24b34f637c1a616a3b128d65e0d26709eb7d3b6a89b99b333c96e",
"d":[  
   {  
      "d":"ikN/nu8S48WSvsMCQ5oCrK+I6WsYkrddl+zrqUFs4FSOPn+tI60Rg9ZkLbXaNzMoS9t6ACsQMovTwW5F5N8q31usPUo6ps9QPbWFaWFXaKQ6dpzGJGvONh9EyLlOsbJM"
   },
   {  
      "d":"KR0EGfUmFL+8gBIY9VlFchada3RWYIXZOe+DEhRGTPjEQUm7t3OrEzoR3KXNFux5jQ4mYzLdbfXfh29U4YL6sV4mC3AmpOJumSPJ/a6x8xA/2tozkYKNqQNnQ0ndA81yu6oKcOH9pG+LowYJ7oHRHePTEG8crR+4u+Q725nrDW/MXBVUt4B2rMSOvDimtxBzRcC59G+b3gh7S8PeA9DStE7TF53HWUInhEKf9KcvQ64="
   },
   {  
      "d":"4kVzRIC4i79hhyoug/vh8t9hnpzx5hXY/6g2w8XHD3Z1RaCXkRemsluATUorVmGS1VDUToDAvwrLzDVegeNmbKIU/wvuDEeoCpaHe+JOYD8HTPBKnnG2hfJAxaL30ON9saXxPkFQn5adm9HG3/XDnRWM3NUBLr0q6SR44bcxoYVUS2UWFtg5XmL8e0+CRYNMO2Jr8TDtaQFYZvD0vu9Tvia2D9xfZPmnNke8YRBtrL/Km/Gdah0BDGcuNjTkHgFNph3VGGJJy+n2VJruoyprBA0zSX2RMGqMfRAlWBjFvQNWaiIsRfSvjD78V7ofKpzavTdHvUa4+tcAj4YJJOXrZ2hQBLrOLf4lMa3N9AL0lTdpRSKwrLTZAFvGd8aQIxL/tPvMbTl3kFQiM45LzR1D7g=="
   },
   {  
      "d":"bSNT1fz4szRs/qbu0Oj45gaZAiX8K//kcKqHweUEjDbHdwPHQCNhy2oD7QLeFvYzPmcWneAElaCyXw+Lxxerht+reP3oExTkLNwcOQ2vGlBUHAwP5P7Te01UtQ4lY7Pz"
   }
]

In other words, it’s sending JSON containing… I’m not sure.

The values of the various keys in that structure are obviously Base64-encoded, but when run through a decoder, the result is just binary data, presumably the result of another layer of encryption.
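A quick way to confirm that for yourself, using the first "d" value from the capture above:

import base64

value = ("ikN/nu8S48WSvsMCQ5oCrK+I6WsYkrddl+zrqUFs4FSOPn+tI60Rg9ZkLbXaNzMo"
         "S9t6ACsQMovTwW5F5N8q31usPUo6ps9QPbWFaWFXaKQ6dpzGJGvONh9EyLlOsbJM")
decoded = base64.b64decode(value)
print(len(decoded))   # 96 bytes
print(decoded[:16])   # opaque binary, not readable JSON or text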

Thus, we haven’t actually gotten much further towards verifying that ADE is sending only the data they claim to.  That packet of data could be describing my progress reading that book purchased from Kobo… or it could be sending something else.

That extra layer of encryption might be done as protection against a real man-in-the-middle attack targeted at Adobe’s log server — or it might be obfuscating something else.

Either way, the result remains the same: reader privacy is not guaranteed. I think Adobe is now doing things a bit better than they were when they released ADE 4.0, but I could be wrong.

If we as library workers are serious about protecting patron privacy, I think we need more than assurances — we need to be able to verify things for ourselves. ADE necessarily remains in the “unverified” column for now.

Bookmarks for October 24, 2014 / Nicole Engard

Today I found the following resources and bookmarked them:

  • Klavaro
    Klavaro is just another free touch typing tutor program. We felt like doing it because we became frustrated with the other options, which relied mostly on a few specific keyboards. Klavaro intends to be keyboard- and language-independent, saving memory and time (and money).

Digest powered by RSS Digest

The post Bookmarks for October 24, 2014 appeared first on What I Learned Today....

CrossRef and Inera Recognized at New England Publishing Collaboration Awards Ceremony / CrossRef

On Tuesday evening, 21 October 2014, Bookbuilders of Boston named the winners of the first New England Publishing Collaboration (NEPCo) Awards. From a pool of ten finalists, NEPCo judges October Ivins (Ivins eContent Solutions), Eduardo Moura (Jones & Bartlett Learning), Alen Yen (iFactory), and Judith Rosen of Publishers Weekly selected the following:

  • First Place: Inera, Inc., collaborating with CrossRef

  • Second Place (Tie): Digital Science, collaborating with portfolio companies; and NetGalley, collaborating with the American Booksellers Association

  • Third Place: The Harvard Common Press, collaborating with portfolio companies

Based on an embrace of disruption and the need to transform the traditional value chain of content creation, the New England Publishing Collaboration (NEPCo) Awards showcase results achieved by two or more organizations working as partners. Other companies short-listed for the awards this year were Cenveo Publisher Services, Firebrand Technologies, Focal Press (Taylor & Francis), Hurix Systems, The MIT Press, and StoryboardThat.

Criteria for the awards included results achieved, industry significance, depth of collaboration, and presentation.

An audience voting component was also included: Digital Science was the overall winner among audience members.

Keynote speaker David Weinberger, co-author of Cluetrain Manifesto and senior researcher at the Harvard Berkman Center, was introduced by David Sandberg, co-owner of Porter Square Books.

Source: Bookbuilders of Boston http://www.nepcoawards.com/

Link roundup October 24, 2014 / Harvard Library Innovation Lab

Frames, computers, design, madlibs and boats. Oh my!

Building the Largest Ship In the World, South Korea

What a _________ Job: How Mad Libs Are Written | Splitsider

Introduction – Material Design – Google design guidelines

Freeze Frame: Joey McIntyre and Public Garden Visitors Hop Into Huge Frames – Boston Visitors’ Guide

Uncovering the true cost of access / Open Knowledge Foundation

This post is part of our Open Access Week blog series to highlight great work in Open Access communities around the world.

Large amounts of public money are spent on obtaining access to published research results, amounting to billions of dollars per year.

Despite the huge amounts of public money spent on allowing researchers to access the published results of taxpayer funded research [1], there is little fiscal transparency in the scholarly publishing market and frequent examples of secrecy, where companies or brokers insert non-disclosure clauses into contracts so the cost of subscriptions remains opaque. This prevents objective analysis of the market, prevents libraries negotiating effectively with publishers for fair prices and makes it hard to ascertain the economic consequences of open access policies.

This matters. Open access campaigners are striving to make research results openly and freely available to everyone in a sustainable and cost effective manner. Without detailed data on current subscription costs for closed content and the emerging cost of article processing charges (APCs) [2], it is very difficult to accurately model and plan this transition.

Library budgets are stretched and their role within institutions is changing, making high journal costs an increasing concern.

Specifically, there are concerns that in the intervening period, publishers may be benefiting from ‘double dipping’ – offering hybrid products which incur APCs for open access articles and subscription fees for all other content which could result in higher overall income. In a market where the profit margins of several major publishers run at 35-40% and they exert monopolistic control over a large proportion of our accumulated scientific and scholarly knowledge, there is understandably a lot of anger and concern about the state and future of the market.

Over the past year, members of the Open Knowledge open science and open access working groups have joined many other advocates and concerned researchers, librarians and citizens in working tirelessly to gather information on the true cost of knowledge. Libraries do not routinely publish financial information at this level of granularity and may be constrained by contractual obligations, so the route chosen to obtain data in the UK has been Freedom of Information Act (FOI) requests. High-profile mathematician and OA advocate Tim Gowers revealed the cost of Elsevier journal subscriptions at top universities. Two further rounds of FOI requests by librarian and OKFest attendee Stuart Lawson and Ben Meghreblian have given an even broader overview across five major publishers. This has been released as open data and efforts continue to enrich the dataset. Working group members in Finland and Hong Kong are working to obtain similar information for their countries and further inform open access advocacy and policy globally.

Subscription data only forms part of the industry picture. A data expedition at Oxford Open Science for Open Data Day 2014 tried to look into the business structure of academic publishers using Open Corporates and quickly encountered a high level of complexity so this area requires further work. In terms of APCs and costs to funders, the working groups contributed to a highly successful crowdsourcing effort led by Theo Andrew and Michelle Brook to validate and enrich the Wellcome Trust publication dataset for 2013-2014 with further information on journal type and cost, thus enabling a clearer view of the cost of hybrid journal publications for this particular funder and also illustrating compliance with open access policies.

Mapping open access globally at #OKFestOA. The session conclusion was that far more data is needed to present a truly global view.

This work only scratches the surface and anyone who could help in a global effort to uncover the cost of access to scholarly knowledge would be warmly welcomed and supported by those who have now built up experience in obtaining this information. If funders and institutions have datasets they could contribute this would also be a fantastic help.

Please sign up to the wiki page here and join the related discussion forum for support in making requests. We hope by Open Access Week 2015 we’ll be posting a much more informative and comprehensive assessment of the cost of accessing scholarly knowledge!

Footnotes:

[1] A significant proportion of billions of dollars per year (estimated $9.4 billion on scientific journals alone in 2011). See STM report (PDF – 6.3MB).

[2] An open access business model where fees are paid to publishers for the service of publishing an article, which is then free to users.

Photo credits:

Money by 401(K) 2012 under CC-BY-SA 2.0

OKFest OA Map, Jenny Molloy, all copyright and related or neighboring rights waived to the extent possible under law using CC0 1.0 waiver. Published from the United Kingdom.

Library by seier+seier under CC-BY 2.0

Residency Program Success Stories, Part Two / Library of Congress: The Signal

The following is a guest post by Julio Díaz Laabes, HACU intern and Program Management Assistant at the Library of Congress.

This is the second part of a two-part series on the former class of residents from the National Digital Stewardship Residency program. Part One covered four residents from the first year of the program and looked at their current professional endeavors and how the program helped them achieve success in their field. In this second part, we take a look at the successes of the remaining six residents of the 2013-2014 D.C. class.

Top (left to right): Lauren Work, Jaime McCurry and Julia Blase
Bottom (left to right): Emily Reynolds, Molly Schwartz and Margo Padilla.

Lauren Work is employed as the Digital Collections Librarian at the Virginia Commonwealth University in Richmond, VA. She is responsible for Digitization Unit projects at VCU and is involved in a newly launched open access publishing platform and repository. Directly applying her experience during the residency, Lauren is also part of a team working to develop digital preservation standards at VCU and is participating in various digital discovery and outreach projects. On her experience being part of NDSR, Lauren said, “The residency gave me the ability to participate in and grow a network of information professionals focused on digital stewardship. This was crucial to my own professional growth.” Also, the ability to interact with fellow residents gave her “a tightly-knit group of people that I will continue to look to for professional support throughout my career.”

Following her residency at the Folger Shakespeare Library, Jaime McCurry became the Digital Assets Librarian at Hillwood Estate, Museum and Gardens in Washington, D.C. She is responsible for developing and sustaining local digital stewardship strategies, preservation policies and workflows; developing a future digital institutional repository; and performing outreach to raise understanding of and interest in Hillwood's digital collections. Asked about the most interesting aspect of her job, Jaime said “it’s the wide range of digital activities I am able to be involved in, from digital asset management to digital preservation, to access, outreach and web development.” In line with Lauren, Jaime stated, “NDSR helped me to establish a valuable network of colleagues and professionals in the DC area and also to further strengthen my project management and public speaking skills.”

At the conclusion of NDSR, Julia Blase accepted a position with Smithsonian Libraries as Project Manager for the Field Book Project, a collaborative initiative to improve the accessibility of field book content through cataloging, conservation, digitization and online publication of digital catalog data and images. For Julia, one of the most exciting aspects of the project is its cooperative nature; it involves staff at Smithsonian Libraries, Smithsonian Archives, Smithsonian National Museum of Natural History and members and affiliates of the Biodiversity Heritage Library. “NDSR helped introduce me to the community of digital library and archivist professionals in the DC area. It also gave me the chance to present at several conferences, including CNI (Coalition for Networked Information) in St. Louis, where I met some of the people I work with today.”

Emily Reynolds is a Library Program Specialist at the Institute of Museum and Library Services, a federal funding agency. She works on discretionary grant programs including the Laura Bush 21st Century Librarian Program, which supports education and professional development for librarians and archivists (the NDSR programs in Washington D.C., Boston and New York were funded through this program). “The NDSR helped in my current job because of the networking opportunities that residents were able to create as a result. The cohort model allowed us to connect with professionals at each other’s organization, share expertise with each other, and develop the networks and professional awareness that are vital for success,” she said. On the most interesting aspect of her job, Emily commented that “because of the range of grants awarded by IMLS, I am able to stay up-to-date on some of the most exciting and innovative projects happening in all kinds of libraries and archives. Every day in the office is different, given the complexities of the grant cycle and the diversity of programs we support.”

Molly Schwartz was a resident at the Association of Research Libraries. Now she is a Junior Analyst at the U.S. State Department in the Bureau of International Information Programs’ Office of Audience Research and Measurement. One of her biggest achievements is being awarded a 2014-2015 Fulbright Grant to work with the National Library of Finland and Aalto University on her project, User-Centered Design for Digital Cultural Heritage Portals. During this time, she will focus her research on the National Library of Finland’s online portal, Finna, and conduct user-experience testing to improve the portal’s usability with concepts from user-centered design.

Lastly, Margo Padilla is now the Strategic Programs Manager at the Metropolitan New York Library Council. She works alongside METRO staff to identify trends and technologies, develop workshops and services and manage innovative programs that benefit libraries, archives and museums in New York City. She is also the Program Director for NDSR-New York. “I used my experience as a resident to refine and further develop the NDSR program. I was able to base a lot of the program structure on the NDSR-DC model and the experience of the NDSR-DC cohort.” Margo also says that her job is especially rewarding “because I have the freedom to explore new ideas or projects, and leveraging the phenomenal work of our member community into solutions for the entire library, archive and museum community.”

Seeing the wide scope of positions the residents accepted after finishing the program, it is clear the NDSR has been successful in creating in-demand professionals to tackle digital preservation in many forms across the private and public sectors. The 2014-2015 Boston and New York classes are already underway and the next Washington D.C. class begins in June of 2015 (for more on that, see this recent blog post). We expect these new NDSR graduates to form the next generation of digital stewards and to reach the same level of success as those in our pilot program.

 

Global Open Data Index 2014 – Week ending October 24: Update / Open Knowledge Foundation


Thank you so much for the amazing number of submissions we have received this week.

Entering the final week of the global sprint!

Next week is the FINAL WEEK of the global Index sprint. Please make sure your country is represented in the Global Open Data Index 2014.

Find out how you can contribute here.

If you would like to be reviewer for Global Open Data Index 2014, please sign up here.

We are missing some countries – can you help?

Europe – Armenia, Croatia, Hungary, Ukraine, Slovenia, Norway, Finland, Denmark, Sweden, Portugal, Poland

Americas – Honduras, Nicaragua, Guatemala, Brazil, USA

Asia/Pacific – South Korea, Taiwan, India, Mongolia, New Zealand, Australia

Africa – Sierra Leone, Mali, Malawi, Mozambique 

Join the Global Madness Day: October 30

Take part in a day of activities to make sure we get the most submissions through that we can for the Global Open Data Index 2014. Make sure your country is represented – the October 30 Global Madness Day is the last day in the sprint!

At 2pm GMT Rufus Pollock, the President & Founder of Open Knowledge will be chatting to Mor Rubinstein about the Index in a Google Hangout. Make sure you join the chat here!

Other events will take place throughout the day. See our twitter feed for updates #openindex14

Some practical tips…

Lastly, a couple of reminders on some key questions around the Index from Mor Rubinstein, Community coordinator for the Index:

1. What is machine readable? – This year we added help text for this question; please read it when making your submissions. Contributors frequently categorise HTML as a machine readable format. While it is easy to scrape HTML, it is actually NOT machine readable (see the short illustration after these reminders). Please use our guide if you are in doubt, or send an email to the census list.

2. What is Openly Licensed? – Well, most of us are not lawyers, and the majority of us never pay attention to the terms and conditions on a website (well, they are super long… so I can’t blame any of you for that). If you are confused, go to the Open Definition, which gives a one-page overview of the subject.
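To illustrate the machine-readable distinction in code (the file and column names here are hypothetical): a CSV file can be loaded straight into structured values, while an HTML page only gives you markup that still has to be scraped and interpreted before the numbers can be used.

import csv

# Machine readable: each row becomes a dict of named values.
with open("spending.csv", newline="") as f:
    rows = list(csv.DictReader(f))
    total = sum(float(row["amount"]) for row in rows)
    print(total)

# Not machine readable: HTML is just a string of markup; the structure
# has to be scraped back out before the data can be reused.
with open("spending.html") as f:
    markup = f.read()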

Escape Meta Alt from Word / William Denton

Escape from Microsoft Word by Edward Mendelson is an interesting short post about writing in Microsoft Word compared to that old classic WordPerfect:

Intelligent writers can produce intelligent prose using almost any instrument, but the medium in which they write will always have some more or less subtle effect on their prose. Karl Popper famously denounced Platonic politics, and the resulting fantasies of a closed, unchanging society, in his book The Open Society and Its Enemies (1945). When I work in Word, for all its luxuriant menus and dazzling prowess, I can’t escape a faint sense of having entered a closed, rule-bound society. When I write in WordPerfect, with all its scruffy, low-tech simplicity, the world seems more open, a place where endings can’t be predicted, where freedom might be real.

But of course if the question is “Word or WordPerfect?” the answer is: Emacs. Everything is text.

Testing Adobe Digital Editions 4.0.1 / Galen Charlton

A couple hours ago, I saw reports from Library Journal and The Digital Reader that Adobe has released version 4.0.1 of Adobe Digital Editions.  This was something I had been waiting for, given the revelation that ADE 4.0 had been sending ebook reading data in the clear.

ADE 4.0.1 comes with a special addendum to Adobe’s privacy statement that makes the following assertions:

  • It enumerates the types of information that it is collecting.
  • It states that information is sent via HTTPS, which means that it is encrypted.
  • It states that no information is sent to Adobe on ebooks that do not have DRM applied to them.
  • It may collect and send information about ebooks that do have DRM.

It’s good to test such claims, so I upgraded to ADE 4.0.1 on my Windows 7 machine and my OS X laptop.

First, I did a quick check of strings in the ADE program itself — and found that it contained an instance of “https://adelogs.adobe.com/” rather than “http://adelogs.adobe.com/”.  That was a good indication that ADE 4.0.1 was in fact going to use HTTPS to send ebook reading data to that server.
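One way to do that kind of quick check, sketched in Python (the file name is illustrative, and strings stored as UTF-16 in a Windows binary would need a second pass):

import re

# Scan the program binary for the logging server's URL.
with open("DigitalEditions.exe", "rb") as f:   # illustrative path/name
    data = f.read()

for match in set(re.findall(rb"https?://adelogs\.adobe\.com[^\s\"']*", data)):
    print(match.decode("ascii", errors="replace"))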

Next, I fired up Wireshark and started ADE.  Each time it started, it contacted a server called adeactivate.adobe.com, presumably to verify that the DRM authorization was in good shape.  I then opened and flipped through several ebooks that were already present in the ADE library, including one DRM ebook I had checked out from my local library.

So far, it didn’t send anything to adelogs.adobe.com.  I then checked out another DRM ebook from the library (in this case, Seattle Public Library and its OverDrive subscription) and flipped through it.  As it happens, it still didn’t send anything to Adobe’s logging server.

Finally, I used ADE to fulfill a DRM ePub download from Kobo.  This time, after flipping through the book, it did send data to the logging server.  I can confirm that it was sent using HTTPS, meaning that the contents of the message were encrypted.

To sum up, ADE 4.0.1’s behavior is consistent with Adobe’s claims – the data is no longer sent in the clear and a message was sent to the logging server only when I opened a new commercial DRM ePub.  However, without decrypting the contents of that message, I cannot verify that it contains only information about that ebook from Kobo.

But even then… why should Adobe be logging that information about the Kobo book? I’m not aware that Kobo is doing anything fancy that requires knowledge of how many pages I read from a book I purchased from them but did not open in the Kobo native app.  Have they actually asked Adobe to collect that information for them?

Another open question: why did opening the library ebook in ADE not trigger a message to the logging server?  Is it because the fulfillmentType specified in the .acsm file was “loan” rather than “buy”? More clarity on exactly when ADE sends reading progress to its logging server would be good.

Finally, if we take the privacy statement at its word, ADE is not implementing a page synchronization feature as some, including myself, have speculated – at least not yet.  Instead, Adobe is gathering this data to “share anonymous aggregated information with eBook providers to enable billing under the applicable pricing model”.  However, another sentence in the statement is… interesting:

While some publishers and distributors may charge libraries and resellers for 30 days from the date of the download, others may follow a metered pricing model and charge them for the actual time you read the eBook.

In other words, if any libraries are using an ebook lending service that does have such a metered pricing model, and if ADE is sending reading progress information to an Adobe server for such ebooks, that seems like a violation of reader privacy. Even though the data is now encrypted, if an Adobe ID is used to authorize ADE, Adobe itself has personally identifying information about the library patron and what they’re reading.

Adobe appears to have closed a hole – but there are still important questions left open. Librarians need to continue pushing on this.

Evolving Role of VIVO in Research and Scholarly Networks Presented at the Thomson Reuters CONVERIS™ Global User Group Meeting / DuraSpace News

Winchester, MA  Thomson Reuters hosted a CONVERIS Global User Group Meeting for current and prospective users in Hatton Garden, London, on October 1-2, 2014.  About 40 attendees from the UK, Sweden, the Netherlands, European Institutions from other countries, and the University of Botswana met to discuss issues pertaining to Research Information Management Systems, the CONVERIS Roadmap, research analytics, and new features and functions being provided by CONVERIS (http://converis5.com).

Notes from the DC-2014 Pre-conference workshop “Fonds & Bonds: Archival Metadata, Tools, and Identity Management” / HangingTogether

Earlier this month I had the good fortune to attend the “Fonds & Bonds” one-day workshop, just ahead of the DC-2014 meeting in Austin, TX. The workshop was held at the Harry Ransom Center of the University of Texas, Austin, which was just the right venue. Eric Childress from OCLC Research and Ryan Hildebrand from the Harry Ransom Center did much of the logistical work, while my OCLC Research colleague Jen Schaffner worked with Daniel Pitti of the Institute for Advanced Technology in the Humanities, University of Virginia and Julianna Barrera-Gomez of the University of Texas at San Antonio to organize the workshop agenda and presentations.

Here are some brief notes on a few of the presentations that made a particular impression on me.

The introduction by Gavan McCarthy (Director of the eScholarship Research Centre (eSRC), University of Melbourne) and Daniel Pitti to the Expert Group on Archival Description (EGAD) included a brief tour of standards development, how this led to the formation of EGAD, and noted EGAD’s efforts to develop the conceptual model for Records in Context (RIC). Daniel very ably set this work within its standards-development context, which was a great way to help focus the discussion on the specific goals of EGAD.

Valentine Charles (of Europeana) and Kerstin Arnold (from the ArchivesPortal Europe APEx project) provided a very good tandem presentation on “Archival Hierarchy and the Europeana Data Model”, with Kerstin highlighting the work of Archives Portal Europe and the APEx project. It was both reaffirming and challenging to hear that it’s difficult to get developers to understand an unexpected data model, when they confront it through a SPARQL endpoint or through APIs. We’ve experienced that in our work as well, and continue to spend considerable efforts in attempting to meet the challenge.

Tim Thompson (Princeton University Library) and Mairelys Lemus-Rojas (University of Miami Libraries) gave an overview of the Remixing Archival Metadata Project (RAMP), which was also presented in an OCLC webinar earlier this year. RAMP is “a lightweight web-based editing tool that is intended to let users do two things: (1) generate enhanced authority records for creators of archival collections and (2) publish the content of those records as Wikipedia pages.” RAMP utilizes both VIAF and OCLC Research’s WorldCat Identities as it reconciles and enhances names for people and organizations.

Ethan Gruber (American Numismatic Society) gave an overview of the xEAC project (Ethan pronounces xEAC as “zeek”), which he also presented in the OCLC webinar noted previously in which Tim presented RAMP. xEAC is an open-source XForms-based application for creating and managing EAC-CPF collections. Ethan is terrific at delving deeply into the possibilities of the technology at hand, and making the complex appear straight-forward.

Gavan McCarthy gave a quite moving presentation on the Find & Connect project, where we were able to see some of the previously-discussed descriptive standards and technologies resulting in something with real impact on real lives. Find & Connect is a resource for Forgotten Australians, former child migrants and others interested in the history of child welfare in Australia.

And Daniel Pitti gave a detailed presentation on the SNAC project. OCLC Research has supported this project from its early stages, providing access to NACO and VIAF authority data, and supplying the project with over 2M WorldCat records representing items and collections held by archival institutions … essentially the same data that supports most of OCLC Research’s ArchiveGrid project. The aspirations for the SNAC project are changing, moving from an experimental first phase where data from various sources was ingested, converted, and enriched to produce EAC-CPF records (with a prototype discovery layer on top of those), to the planning for a Cooperative Program which would transform that infrastructure into a sustainable international cooperative hosted by the U.S. National Archives and Records Administration. This is an ambitious and important effort that everyone in the community should be following.

The workshop was very well attended and richly informative. It provided a great way to quickly catch up on key developments and trends in the field. And the opportunity to easily network with colleagues in a congenial setting, including an hour to see a variety of systems demonstrated live, was also clearly appreciated.

ATO2014: Lessons from Open Source Schoolhouse / Nicole Engard

Charlie Reisinger from the Penn Manor School District talked to us next about open source at his school. This was an expanded version of his lightning talk from the other night.

Penn Manor has 9 IT team members – which is a very lean staff for 4500 devices. They also do a lot of their technology in house.

Before we talk about open source we took a tangent into the nature of education today. School districts are so stuck on the model they’re using and have used for centuries. But today kids can learn anything they would like with a simple connection to the Internet. You can be connected to the most brilliant minds that you’d like. Teachers are no longer the fountains of all knowledge. The classroom hasn’t been transformed by technology – if you walked into a classroom 60 years ago it would look pretty much like a classroom today.

In schools that do allow students to have laptops they lock them down. This is a terrible model for student inquiry. The reason most of us are here today is because we had a system growing up that we could get into and try to break/fix/hack.

So what is Penn Manor doing differently? First off they’re doing everything with open source. They use Koha, Moodle, Linux, WordPress, Ubuntu, OwnCloud, SIPfoundry and VirtualBox.

This came to them partially out of fiscal necessity. When Apple discontinued the white MacBook the school was stuck in a situation where they needed to replace these laptops with some sort of affordable device. Using data they collected from the students’ laptops they found that students spent most of their time in the browser or in a word processor, so they decided to install Linux on the laptops. Ubuntu was the choice because the state-level testing would work on that operating system.

This worked in elementary, but they needed to scale it up to the high schools which was much harder because each course needed different/specific software. They needed to decide if they could provide a laptop for every student.

The real guiding force in deciding to provide one laptop per student was the English department. They said that they needed the best writing device that could be given to them. This knocked out the possibility of giving tablets to all students – instead a laptop allows for this need. Not only did they give all students laptops with Linux installed – they gave them all root access. This required trust! They created policies and told the students they trusted them to use the laptops as responsible learners. How’s that working out? Charlie has had 0 discipline issues associated with that. Now, if they get into a jam where they screwed up the computer – maybe this isn’t such a bad thing because now they have to learn to fix their mistake.

They started this as a pilot program for 90 of their online students before deploying to all 1700 students. These computers include not just productivity software, but Steam! That got the kids’ attention. When they deployed to everyone though, Steam came off the computers, but the kids knew it was possible so it forced them to figure out how to install it on Linux, which is not always self-explanatory. This prodded the kids into learning.

Charlie mentioned that he probably couldn’t have done this 5 years ago because the apps that are available today are so dense and so rich.

There was also the issue of training the staff on the change in software, but also in having all the kids with laptops. This included some training of the parents as well.

Along with the program they created a help desk program as a 4-credit honors-level independent study course for the high school students. They spent the whole time supporting the one-to-one program (one laptop per student). These students helped with the unpacking, inventorying, and the imaging (using github.com/pennmanor/FLDT, built by one of the students) of the laptops over 2 days. The key to the program is that the students were treated as equals. This program was picked up and talked about on Linux.com.

Charlie’s favorite moment of the whole program was watching his students train their peers on how to use these laptops.

The post ATO2014: Lessons from Open Source Schoolhouse appeared first on What I Learned Today....

Bookmarks for October 23, 2014 / Nicole Engard

Today I found the following resources and bookmarked them:

  • Nest
  • Material Design Icons
    Material Design Icons are the official open-source icons featured in the Google Material Design specification.
  • SmartThings
    Control and monitor your home from one simple app

Digest powered by RSS Digest

The post Bookmarks for October 23, 2014 appeared first on What I Learned Today....

ATO2014: Open sourcing the public library / Nicole Engard

Phil Shapiro, one of my fellow opensource.com moderators, talked to us next about open source and libraries.

Too many people ask what the future of libraries is, rather than what the future should be. A book that we must read is “Expect More: Demanding Better Libraries For Today’s Complex World”. If we don’t expect more of libraries we’re not going to see libraries change. We have to change the frame of mind that libraries belong to the directors – they actually belong to the people and they should be serving the people.

Phil asks how we get the community to participate in managing libraries. Start looking at your library’s collection and see if at least 1% of the collection is in the STEM arena. Should that percentage be more? 5%, 10%, more? There is no real answer here, but maybe we need to make a suggestion to our libraries. Maybe instead our funds should go to empowering the community more in the technology arena. Maybe we should have co-working space in our library – this can even be fee based – could be something like $30/mo. That would be a way for libraries to help the unemployed and the community as a whole.

Libraries are about so much more than books. People head to the library because they’re wondering about something – so having people who have practical skills on your staff is invaluable. Instead of pointing people to the books on the topic, having someone for them to talk to is a value-added service. What are our competitors going to be doing while we’re waiting for the transition from analog to digital to happen in libraries? We need to set some milestones for all libraries. Right now it’s only the wealthy libraries that seem to be moving in this way.

A lot of the suggestions Phil had I’ve seen some of the bigger libraries in the US doing like hosting TED Talks, offering digital issues lectures, etc. You could also invite kids in there to talk about what they know/have learned.

Phil’s quote: “The library fulfills its promise when people of different ages, races, and cultures come together to pool their talents in creating new creative content.” One thing to think about is whether this change from analog to digital can happen in libraries without changing their names. Instead we could call them the digital commons [I'm not sure this is necessary - I see Phil's point - but I think we need to just rebrand libraries and market them properly and keep their name.]

Some awesome libraries include Chattanooga Public Library which has their 4th floor makerspace. In Colorado there are the Anythink Libraries. The Delaware Department of Libraries is creating a new makerspace.

Books are just one of the tools toward helping libraries enhance human dignity – there are so many other ways we can do this.

Phil showed us a video of his:



You can bend the universe by asking questions – so call your library and ask questions about open source or about new technologies so that we plant the seeds of change.

Further reading from Phil: http://sites.google.com/site/librarywritings.

The post ATO2014: Open sourcing the public library appeared first on What I Learned Today....

Homework assignment #5 – bis Sketchbookskool / Patrick Hochstenbach

I was so happy with my new Lamy fountain pen that I drew a second version of my homework assignment: one using my favorite Arthur and Fietje Precies characters.

Homework assignment #5 Sketchbookskool / Patrick Hochstenbach

As a second assignment we needed to draw some fantasy image, preferably using some meta story inside the story. I was drawing monsters the whole week during my commute so I used these drawings as inspiration.

Homework assignment #4 Sketchbookskool / Patrick Hochstenbach

This week we were asked to draw a memory: our first day at school. I tried to find old school pictures but didn’t find anything nice I could use. I only remembered I cried a lot on my first day

ATO2014: How ‘Open’ Changes Products / Nicole Engard

Next up at All Things Open was Karen Borchert talking about How ‘Open’ Changes Products.

We started by talking about the open product conundrum. There is a thing that happens when we think about creating products in an open world. In order to understand this we must first understand what a product is. A product is a good, idea, method, information or service that we want to distribute. In open source we think differently about this: we think more about tools and toolkits instead of packaged products, because these things are more conducive to contribution and extension. With ‘open’, products work a bit more like Ikea – you have all the right pieces and instructions but you have to make something out of it – a table or chair or whatever. Ikea products are toolkits to make things. When we’re talking about software, most buyers are thinking about what they get out of the box, so a toolkit is not a product to our consumers.

Open Atrium is a product that Phase2 produces, and people say a lot of things about it, like “It’s an intranet in a box” – but in reality it’s a toolkit. People use it in a lot of different ways – some do what you’d expect them to do, others make something completely different. This is the great thing about open source, but it also causes a problem for us, because in Karen’s example a table != a bike. “The very thing that makes open source awesome is what makes our product hard to define.”

Defining a product in the open arena is simple – “Making an open source product is about doing what’s needed to start solving a customer problem on day 1.” Why are we even going down this road? Why are we creating products? Making something that is useable out of the box is what people are demanding. They also provide a different opportunity for revenue and profit.

This comes down to three things:

  • Understanding the value
  • Understanding the market
  • Understanding your business model

Adding value to open source is having something that someone who knows better than me put together. If you have an apple you have all you need to grow your own apples, but you’re not going to bother to do that. You’d rather (or most people would rather) leave that to the expert – the farmer. Just because anyone can take the toolkit and build whatever they want with it doesn’t mean that they will.

Markets are hard for us in open source because we have two markets – one that gives the product credibility and one that makes money – and often these aren’t the same market. Most of the time the community isn’t paying you for the product – they are usually other developers or people using it to sell to their clients. You need this market because you do benefit from it even if not financially. You also need to think about the people who will pay you for the product and services. You have to invest in both markets to help your product succeed.

Business models include the ability to have two licenses – two versions of the product. There is a model around paid plugins or themes to enhance a product. And sometimes you see services built around the product. These are not all of the business models, but they are a few of the options. People buy many things in open products: themes, hosting, training, content, etc.

What about services? Services can be really important in any business model. You don’t have to deliver a completely custom set of services every time you deliver. It’s not less of a product because it’s centered around services.

Questions people ask?

Is it going to be expensive to deal with an open source product? Not necessarily but it’s not going to be free. We need to plan and budget properly and invest properly.

Am I going to make money on my product this year?
Maybe – but you shouldn’t count on it. Don’t bet the farm on your product business until you’ve tested the market.

Everyone charges $10/mo for this so I’m just going to charge that – is that cool? Nope! You need to charge what the product is worth and what people will pay for it and what you can afford to sell it for. Think about your ROI.

I’m not sure we want to be a products company. It’s very hard to be a product company without buy in. A lot of service companies ask this. Consider instead a pilot program and set a budget to test out this new model. Write a business plan.

The post ATO2014: How ‘Open’ Changes Products appeared first on What I Learned Today....

ATO2014: Women in Open Source Panel / Nicole Engard

Over lunch today we had a panel of 6 women in open source talk to us.

The first question was about their earlier days – what made them interested in open source or computer science or all of it.

Intros

Megan started in the humanities and then just stumbled into computer programming. Once she got into it she really enjoyed it. Elizabeth got involved with Linux through a boyfriend early on. She really fell in love with Linux because she was able to do anything she wanted with it. She joined the local Linux users group, and they were really supportive and never made a big deal about the fact that she was a woman. Her first task in the open source world was writing documentation (which was really hard), but from there her career grew. Erica has been involved in technology all her life (which she blames her brother for). When she went to school she wanted to be creative and study the arts, but her father gave her the real-life speech and she realized that computer programming let her be creative and practical at the same time. Estelle started by studying architecture, which was more sexist than her computer science program – toward the end of her college career she found that she was teaching people to use their computers. Karen was always the geekiest person she knew growing up, and her father really encouraged her. She went to engineering school, and her interest took off once she set up her Unix account at the college computer center. She got passionate about open source because of the pacemaker she needs to live – she realized that the entire system is completely proprietary and started thinking about the implications of that.

The career path

Estelle has noticed in the open source world that the men she knows on her level work for big corporations whereas the women are working for themselves. This is because there aren’t as many options for women to move up the ladder. As for why she picked the career she picked, it was because her parents were sexist and she wanted to piss them off! Elizabeth noticed that a lot of women get involved in open source because they’re recruited into a volunteer organization. She also notices that more women are being paid to work on open source whereas men are more often doing it for fun. Megan had never been interviewed by or worked for a woman until she joined academia. Erica noticed that the career path of the women she has met is more convoluted than that of the men: the men take computer science classes and then go into the field, whereas women didn’t always know that these opportunities were available to them. Karen sees that women who are junior have to work a lot harder – they have to justify their work more often [this is something I totally had to deal with in the past]. Women in these fields get so tired because it’s so much work – so they move on to do something else. Erica says this is partially why she has gone to work for herself, because she gets to push forward her own ideas. Megan says that a lot of factors are involved in this problem – it’s not just one thing.

Is diversity important in technology?

Erica feels that if you’re building software for people you need ‘people’ – not just one type of person – working on the project. Megan says that a variety of perspectives is necessary. Estelle says that because women often follow a different path to technology it adds even more diversity than just gender [I for example got into the field because of my literature degree and the fact that I could write content for the website]. It’s also important to note that diversity isn’t just about gender – but so much more. Karen pointed out that even at 20 months old we’re teaching girls and boys differently – we start teaching boys math and problem solving earlier and we help the girls for longer. This reinforces the gender roles we see today. Elizabeth feels that diversity is needed to engage more talent in general.

What can we do to change the tide?

Megan likes to provide a variety in the types of problems she uses in her classes, with a variety of approaches, so that she reaches a variety of students instead of alienating those who don’t learn the way she’s teaching. Karen wants us to help keep women from being overlooked. When a woman makes a suggestion, acknowledge it – and stop people from interrupting women (because we are interrupted more). Don’t just repeat what the woman says but amplify it. Estelle brings up an example from SurveyMonkey – they have a mentorship program and also let you take time off when you need to (very good for parents). Erica tries to get to youth before the preconception forms that technology is for boys. One of the things she noticed was that language matters as well – telling girls you’re going to teach them to code turns them off, but saying we’re going to create apps gets them excited. Elizabeth echoed the language issue – a lot of job ads are geared toward men as well. Editing your job ads will actually attract more women.

What have you done in your career that you’re most proud of?

Estelle’s example is not related to technology – it was an organization called POWER that was meant to help students who were very likely to have a child before graduation actually graduate before becoming parents. It didn’t matter what field they went into – just that they finished high school. Erica is proud that she has a background that lets her mentor so many people. Elizabeth wrote a book! It was on her bucket list and now she has a second book in the works. It was something she never thought she could do, and she did it. She also said that it feels great to be a mentor to other women. Megan is just super proud of her students and watching them grow up, get jobs and be successful. Karen is mostly proud of the fact that she was able to turn something that was so scary (her heart condition) into a way to articulate why free software is so important. She loves hearing others tell her story to other people to explain why freedom in software is so important.

The post ATO2014: Women in Open Source Panel appeared first on What I Learned Today....

Open Access and the humanities: On our travels round the UK / Open Knowledge Foundation

This post is part of our Open Access Week blog series to highlight great work in Open Access communities around the world. It is written by Alma Swan, Director of Key Perspectives Ltd, Director of Advocacy for SPARC Europe, and Convenor for Enabling Open Scholarship.

Large amounts of public money are spent on obtaining access to published research results, amounting to billions of dollars per year.

Whither the humanities in a world moving inexorably to open values in research? There has been much discussion and debate on this issue of late. It has tended to focus on two matters – the sustainability of humanities journals and the problem(s) of the monograph. Neither of these things is a novel topic for consideration or discussion, but nor have solutions been found that are satisfactory to all the key stakeholders, so the debate goes on.

While it does, some significant developments have been happening, not behind the scenes as such but in a quiet way nevertheless. New publishers are emerging in the humanities that are offering different ways of doing things and demonstrating that Open Access and the humanities are not mutually exclusive.

These publishers are scholar-led or are academy-based (university presses or similar). Their mission is to offer dissemination channels that are Open, viable and sustainable. They don’t frighten the horses in terms of trying to change too much, too fast: they have left the traditional models of peer review practice and the traditional shape and form of outputs in place. But they are quietly and competently providing Open Access to humanities research. What’s more, they understand the concerns, fears and some bewilderment of humanities scholars trying to sort out what the imperative for Open Access means to them and how to go about playing their part. They understand because they are of and from the humanities community themselves.

The debate about OA within this community has been particularly vociferous in the UK in the wake of the contentious Finch Report and the policy of the UK’s Research Councils. Fortuitously, the UK is blessed with some great innovators in the humanities, and many of the new publishing operations are also UK-based. This offers a great opportunity to show off some of these new initiatives and help reassure UK humanities authors at the same time. So SPARC Europe, with funding support from the Open Society Foundations, is now endeavouring to bring these new publishers together with members of the UK’s humanities community.

We are hosting a Roadshow comprising six separate events in different cities round England and Scotland. At each event there are short presentations by representatives of the new publishers and from a humanities scholar who can give the research practitioner perspective on Open Access. After the presentations, the publishers are available in a small exhibition area to display their publications and talk about their publishing programmes, their business models and their plans for the future.

The publishers taking part in the Roadshow are Open Book Publishers, Open Library of the Humanities, Open Humanities Press and Ubiquity Press. In addition, the two innovative initiatives OAPEN and Knowledge Unlatched are also participating. The stories from these organisations are interesting and compelling, and present a new vision of the future of publishing in the humanities.

Humanities scholars from all higher education institutions in the locality of each event are warmly invited to come along to the local Roadshow session. The cities we are visiting are Leeds, Manchester, London, Coventry, Glasgow and St Andrews. The full programme is available here.

We will assess the impact of these events and may send the Roadshow out again to new venues next year if they prove to be successful. If you cannot attend but would like further information on the publishing programmes described here, or would like to suggest other venues the Roadshow might visit, please contact me at sparceurope@arl.org

Results from the 2013 NDSA U.S. Web Archiving Survey / Library of Congress: The Signal

The following is a guest post from Abbie Grotke, Web Archiving Team Lead, Library of Congress and Co-Chair of the NDSA Content Working Group.

The National Digital Stewardship Alliance is pleased to release a report of a 2013 survey of Web Archiving institutions (PDF) in the United States.

A bit of background: from October through November of 2013, a team of National Digital Stewardship Alliance members, led by the Content Working Group, conducted a survey of institutions in the United States that are actively involved in, or planning to start, programs to archive content from the web. This survey built upon a similar survey undertaken by the NDSA in late 2011 and published online in June of 2012. Results from the 2011-2012 NDSA Web Archiving Survey were first detailed on May 2, 2012, in “Web Archiving Arrives: Results from the NDSA Web Archiving Survey” on The Signal, and the full report (PDF) was released in July 2012.

The goal of the survey was to better understand the landscape of web archiving activities in the U.S. by investigating the organizations involved, the history and scope of their web archiving programs, the types of web content being preserved, the tools and services being used, access and discovery services being provided and overall policies related to web archiving programs. While this survey documents the current state of U.S. web archiving initiatives, comparison with the results of the 2011-2012 survey enables an analysis of emerging trends. The report therefore describes the current state of the field, tracks the evolution of the field over the last few years, and forecasts future activities and developments.

The survey consisted of twenty-seven questions (PDF) organized around five distinct topic areas: background information about the respondent’s organization; details regarding the current state of their web archiving program; tools and services used by their program; access and discovery systems and approaches; and program policies involving capture, availability and types of web content. The survey was started 109 times and completed 92 times for an 84% completion rate. The 92 completed responses represented an increase of 19% in the number of respondents compared with the 77 completed responses for the 2011 survey.

Overall, the survey results suggest that web archiving programs nationally are both maturing and converging on common sets of practices. The results highlight challenges and opportunities that are, or could be, important areas of focus for the web archiving community, such as opportunities for more collaborative web archiving projects. We learned that respondents are highly focused on the data volume associated with their web archiving activity and its implications on cost and the usage of their web archives.

Based on the results of the survey, cost modeling, more efficient data capture, storage de-duplication, and anything that promotes web archive usage and/or measurement would be worthwhile investments by the community. Unsurprisingly, respondents continue to be most concerned about their ability to archive social media, databases and video. The research, development and technical experimentation necessary to advance the archiving tools on these fronts will not come from the majority of web archiving organizations with their fractional staff time commitments; this seems like a key area of investment for external service providers.

We hope you find the full report interesting and useful, whether you are just starting out developing a web archiving program, have been active in this area for years, or are just interested in learning more about the state of web archiving in the United States.

ATO2014: Open source, marketing and using the press / Nicole Engard

Steven Vaughan-Nichols was up to talk to us about open source, marketing and using the press.

Before Steven was a journalist he was a techie. This makes him unusual: a journalist who actually gets technology. Steven is here to tell us that marketing is a big part of your job if you want a successful open source company. He has heard a lot of people saying that marketing isn’t necessary anymore. It is necessary because writing great code is not enough – if no one else knows about it, it doesn’t matter. You need to talk with people about the project to make it a success.

We like to talk about open source being a meritocracy – that’s not 100% true – the meritocracy is the ideal or a convenient fiction. The meritocracy is only part of the story – it’s not just about your programming it’s about getting the right words to the right people so that they know about your project. You need marketing for this reason.

Any successful project needs two things. The first you already know: it solves a problem that needs a solution. The other is that it must be able to convince a significant number of people that it is the solution to their problem. One problem open source projects have is that they confuse open source with the community – they are not the same thing. Marketing is getting information about your project to the world. The community is what defines what the project really is.

Peter Drucker says, “The aim of marketing is to know and understand the customer so well the product or service fits him and sells itself.” Knowing the customer better than they know themselves is not an easy job – but it’s necessary to market and sell your product or service. If your project doesn’t fit the needs of your audience then it won’t go anywhere.

David Packard: “Marketing is too important to be left to the marketing department” – and it really is. There is a tendency to see marketing as a separate thing. Marketing should not be separate – it should be honest about what you do, and it should be the process of getting that message to the world. Each person who works on the project (or for the company) is a representative of your product – we are always presenting our product to the world (you might not like it – but it’s true). If your name is attached to a project or company then people are going to be watching you. You need to avoid zinging competing products and portray a positive image of you and your product. Even if you’re not thinking about what you’re saying as marketing, it is.

Branding is another thing that open source projects don’t always think through enough – they assume it’s trivial. Branding actually does matter! The images, words and name you use to describe your product matter. These become the shorthand that people see your project as. For example, if you see the Apple logo you know what it’s about. In our world of open source there is the Red Hat shadow man – whenever you see that image you know it means Red Hat and all the associations you have with that. You can use that association in your marketing. People might not know what Firefox is (yes, there are people who don’t) but they do recognize the cute little logo.

You can no longer talk just on IRC or online, you have to get out there. You need to go to conferences and make speeches and get the word out to people. And always remember to invite people to participate because this is open source. You have to make an active network and get away from the keyboard and talk to people to get the word out there. At this point you need to start thinking about talking to people from the press.

One thing to say to people, and to the press, is a statement that will catch on – a catch phrase that will reach the audience you want to reach. The press are the people who talk to the world at large. Talking to people at opensource.com and other tech sites is great – but if you want to make the next leap you need to get to these people who reach the broader world. Don’t assume that the press you’re talking to don’t know what you’re talking about – and just because they happen to like open source or what you’re doing, it does not mean that they will write only positive things. The press are critics – they’re not really on your side – even if they like you they won’t just talk your products up. You need to understand that going in.

Having said all that – you do need to talk to the press at some point. And when you do, you need to be aware of a few things. Never ever call the press – they are always on perpetual deadline – but you can’t go wrong with email. When you do send an email, be sure to cover a few important things: tell them what you’re doing, tell them what’s new (they don’t care that you have a new employee – they might care if a bigwig quits or is fired), get your message straight (if you don’t know what you’re doing then the press can’t figure it out), and hit it fast (tell them in the first line what you’re doing, who your audience is and why the world should care). Be sure to give the name of someone they can call and email for more info – this can’t be emphasized enough – so often Steven has gotten press releases without contact info on them. Put the info on your website – make sure that there is always a contact in your company for the press. If your project is pretty, remember to send screenshots – this will save the press a lot of time installing and getting the right images. Steven says “You need to spoon feed us”.

You also want to be sure you know what the press person you’re contacting writes about – do your homework – don’t contact them with your press release if it’s not something they cover. Also be sure to speak in a language that the person you’re talking to will understand [I know I always shy away from OPAC and ILS when talking to the press]. Not everyone you’re talking to has experience in technology. Don’t talk down to the press, just be sure to talk to the person in words they understand. Very carefully craft your message – be sure to give people context and tell them why they should care – if you can’t tell them that, then they can’t tell anyone else your story.

Final points – remember to be sweet and charming when talking to the press. When they say something that bothers you, don’t insult them. If you alienate the press they will remember. In the end the press has more ink and pixels than you do – their words will have a longer reach than yours. If the press completely misrepresents you, send a polite note to the person explaining what was wrong – without using the word ‘wrong’. Be firm, but be polite.

The post ATO2014: Open source, marketing and using the press appeared first on What I Learned Today....

Facebook's Warm Storage / David Rosenthal

Last month I was finally able to post about Facebook's cold storage technology. Now, Subramanian Muralidhar and a team from Facebook, USC and Princeton have a paper at OSDI that describes the warm layer between the two cold storage layers and Haystack, the hot storage layer. f4: Facebook's Warm BLOB Storage System is perhaps less directly aimed at long-term preservation, but the paper is full of interesting information. You should read it, but below the fold I relate some details.

A BLOB is a Binary Large OBject. Each type of BLOB contains a single type of immutable binary content, such as photos, videos, documents, etc. Section 3 of the paper is a detailed discussion of the behavior of BLOBs of different kinds in Facebook's storage system.

Figure 3 shows that the rate of I/O requests to BLOBs drops rapidly through time. The rates for different types of BLOB drop differently, but all 9 types have dropped by 2 orders of magnitude within 8 months, and all but 1 (profile photos) have dropped by an order of magnitude within the first week.

The vast majority of Facebook's BLOBs are warm, as shown in Figure 5 - notice the scale goes from 80-100%. Thus the vast majority of the BLOBs generate I/O rates at least 2 orders of magnitude less than recently generated BLOBs.

In my talk to the 2012 Library of Congress Storage Architecture meeting I noted the start of an interesting evolution:
a good deal of previous meetings was a dialog of the deaf. People doing preservation said "what I care about is the cost of storing data for the long term". Vendors said "look at how fast my shiny new hardware can access your data".  ... The interesting thing at this meeting is that even vendors are talking about the cost.
This year's meeting was much more cost-focused. The Facebook data make two really strong cases in this direction:
  • That significant kinds of data should be moved from expensive, high-performance hot storage to cheaper warm and then cold storage as rapidly as feasible.
  • That the I/O rate that warm storage should be designed to sustain is so different from that of hot storage, at least 2 and often many more orders of magnitude, that attempting to re-use hot storage technology for warm and even worse for cold storage is futile.
This is good, because hot storage will be high-performance flash or other solid state memory and, as I and others have been pointing out for some time, there isn't going to be enough of it to go around.

Haystack uses RAID-6 and replicates data across three data centers, using 3.6 times as much storage as the raw data. f4 uses two fault-tolerance techniques:
  • Within a data center it uses erasure coding with 10 data blocks and 4 parity blocks. Careful layout of the blocks ensures that the data is resilient to drive, host and rack failures at an effective replication factor of 1.4.
  • Between data centers it uses XOR coding. Each block is paired with a different block in another data center, and the XOR of the two blocks stored in a third. If any one of the three data centers fails, both paired blocks can be restored from the other two.
The result is fault-tolerance to drive, host, rack and data center failures at an effective replication factor of 2.1, reducing overall storage demand from Haystack's factor of 3.6 by nearly 42% for the vast bulk of Facebook's BLOBs.  When fully deployed, this will save 87PB of storage. Erasure-coding everything except the hot storage layer seems economically essential.
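
To make the arithmetic concrete, here is a minimal Python sketch – an illustration of the XOR geo-replication idea described in the paper, not Facebook's code. Two data blocks live in two data centers and their XOR lives in a third; losing any one copy leaves enough information to rebuild it. The comments redo the effective replication factors from the numbers above.

    def xor_blocks(a: bytes, b: bytes) -> bytes:
        """Byte-wise XOR of two equal-length blocks."""
        return bytes(x ^ y for x, y in zip(a, b))

    block_a = b"block held in data center A"
    block_b = b"block held in data center B"
    parity = xor_blocks(block_a, block_b)   # stored in data center C

    # Lose data center A: rebuild its block from B and the parity block.
    assert xor_blocks(block_b, parity) == block_a
    # Lose data center B: rebuild from A and the parity block.
    assert xor_blocks(block_a, parity) == block_b

    # Effective replication, restating the paper's figures:
    #   within a cell, 10 data + 4 parity blocks   -> 14 / 10 = 1.4
    #   across cells, each block plus half an XOR  -> 1.4 * 1.5 = 2.1
    #   versus Haystack's 3.6: (3.6 - 2.1) / 3.6   ~= 42% less storage

The 1.4 and 2.1 factors here simply restate the paper's numbers; the real system distributes the erasure-coded and XOR blocks across drives, hosts and racks as described above.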

Another point worth noting that the paper makes relates to heterogeneity as a way of avoiding correlated failures:
We recently learned about the importance of heterogeneity in the underlying hardware for f4 when a crop of disks started failing at a higher rate than normal. In addition, one of our regions experienced higher than average temperatures that exacerbated the failure rate of the bad disks. This combination of bad disks and high temperatures resulted in an increase from the normal ~1% AFR to an AFR over 60% for a period of weeks. Fortunately, the high-failure-rate disks were constrained to a single cell and there was no data loss because the buddy and XOR blocks were in other cells with lower temperatures that were unaffected.

ATO2014: Women in Open Source / Nicole Engard

DeLisa Alexander from Red Hat was up next to talk to us about women in open source.

How many of you knew that the first computer – the ENIAC – was programmed by women mathematicians? DeLisa is here to share with us a passion for open source and transparency – and something similarly important – diversity.

Why does diversity matter? Throughout history we have been able to innovate our way out of all kinds of problems. In the future we’re going to have to do this faster than ever before. Diversity of thoughts, theories and views is critical to this process. It’s not just “good” to think about diversity, it’s important to innovation and to solving problems more quickly.

Why are we having so much trouble finding talent? 47% of the workforce is made up of women, but women are earning only 12% of computer and information science degrees – and only 1-5% of open source contributors are women. How much faster could we solve the world’s big problems if the other half of the population were participating? We need to be part of this process.

When you meet a woman who is successful in technology – there is usually one person who mentored her (man or woman) to feel positive about her path – we could be that voice for a girl or woman that we know. Another thing that we can do is help our kids understand what is going on and what opportunities there are. Kids today don’t think about the fact that the games they’re playing were developed by a human – they just think that computers magically have software on them. They have no clue that someone had to design the hardware and program the software [I actually had someone ask me once what 'software' was - the hardest question I've ever had to answer!].

We can each think about the opportunities in open source. There is the GNOME for women program, Girl Develop It and the Women in Open Source award.

The challenge for us is to decide on one person whom we’re going to try to influence to stay in the field, join the field, or nominate for an award. If each of us does this one thing, next year this room could be filled with 50% women.

The post ATO2014: Women in Open Source appeared first on What I Learned Today....

ATO2014: Open Source at Facebook / Nicole Engard

James Pearce from Facebook started off day 2 at All Things Open with his talk about open source at Facebook.

James started by playing a piece of music for us that was only ever heard in the Vatican until Mozart, as a boy, wrote down the music he heard and shared it with the world. This is what open source is like: getting beautiful content out to the world. Being open trumps secrecy. At Facebook they have 211 open source projects – nearly all on Github – with about 21 thousand forks and over 10 million lines of code. In addition to software, Facebook also open sources their hardware. Open source has always been part of the Facebook culture since day 1. The difference is that now that Facebook is so large, they are much more capable of committing to share via open source.

Here’s the thing people forget about open source – open source is a chance to open the windows on what you’re doing – “Open source is like a breeze from an open window”. By using open source they have to think things through more, and it means they’re doing a better job on their coding. Facebook, however, was not always so dedicated to open source – if you looked at their Github account a few years ago you would see a lot of unsupported or undocumented projects. “The problem is, if you throw something over the wall and don’t care about it, it’s worse than not sharing it at all”. About a year ago Facebook decided to get their open source house in order.

The first thing they needed to do was find out what they owned and what was out there – which projects were doing well and which were doing badly. The good news was that they were able to use Github’s API to gather all this information and put it into a database. They then make all this data available via the company intranet so that everyone can see the status of things. One of the nice side effects of sharing this info and linking an employee to each project is that it gamifies things. The graphs can be used to make the teams play off each other. Using things like Github stars and forks they compete to see who is more popular. While they’re not optimizing for the number of stars, it does make things fun and keeps people paying attention to their projects.
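
As a rough illustration of that inventory step, the sketch below pulls an organization's public repositories from GitHub's REST API and records the popularity signals mentioned above (stars and forks). The endpoint and field names are GitHub's public API; the "database" is reduced to a list, and this is not a description of Facebook's internal tooling.

    import requests

    def org_repo_stats(org: str) -> list[dict]:
        """Collect name, stars and forks for every public repo in an org."""
        stats, page = [], 1
        while True:
            resp = requests.get(
                f"https://api.github.com/orgs/{org}/repos",
                params={"per_page": 100, "page": page},
            )
            resp.raise_for_status()
            repos = resp.json()
            if not repos:
                break
            for repo in repos:
                stats.append({
                    "name": repo["name"],
                    "stars": repo["stargazers_count"],
                    "forks": repo["forks_count"],
                })
            page += 1
        return stats

    # Ten most-starred repos for an organization, e.g. "facebook".
    for row in sorted(org_repo_stats("facebook"),
                      key=lambda r: r["stars"], reverse=True)[:10]:
        print(row)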

Also using the data, they were able to clean up their “social debt” – they had some pull requests that were over a year old with no response. This gets them thinking about the community health of these projects. They think about the depth of a project, how it is going to be used and how it is going to continue on. Sometimes the things they release are just a read-only kind of thing. Other times they will have forked something and will have a stated goal to upstream it to the original project. Sometimes a project is no longer a Facebook-specific project. Sometimes Facebook will deprecate a project – this happens when a project is ‘done’ or is no longer of use to anyone. Finally, they have in the past rebooted a project when upstreaming was not an option.

After giving talks like this, James finds that lots of people approach him to talk about their solutions, and they find that they’re all coming up with the same solutions and reinventing the wheel. So these groups have come together with the idea of pooling their resources and sharing. This was the way TODO started. This is not a Facebook initiative – they’re just one of 13 members who are keen to contribute and share what they learned. This group is thinking about a lot of challenges: why use open source in the first place, what the policies are for launching a new project, licenses, how to interact with communities, what metrics to use to measure the success of a project, and so on. What they hope to do is start conversations around these topics and publish them as blog posts.

The post ATO2014: Open Source at Facebook appeared first on What I Learned Today....

Making Open Access Everyone’s Business / ACRL TechConnect

Librarians should have a role in promoting open access content. The best methods and whether they are successful is a matter of heated debate. Take for example a recent post by Micah Vandergrift on the ACRL Scholarly Communications mailing list, calling on librarians to stage a publishing walkout and only publish in open access library and information science journals. Many have already done so. Others, like myself, have published in traditional journals (only once in my case) but make a point of making their work available in institutional repositories. I personally would not publish in a journal that did not allow such use of my work, and I know many who feel the same way.1 The point is, of course, to ensure that librarians are not hypocritical in their own publishing and their use of repositories to provide open access – a long-standing problem pointed out by Dorothea Salo (“Innkeeper at the Roach Motel,” December 11, 2007, http://digital.library.wisc.edu/1793/22088), among others.2 We know that many of the reasons that faculty may hesitate to participate in open access publishing relate to promotion and tenure requirements, which generally are more flexible for academic librarians (though not in all cases – see Abigail Goben’s open access tenure experiment). I suspect that much of the reason librarians aren’t participating more in open access comes down to more mundane things: forgetting to do so, or fearing that their work is not good enough to make public.

But it shouldn’t be only staunch advocates of open access, open peer review, or new digital models for work and publishing who are participating. We have to find ways to advocate and educate in a gentle but vigorous manner, and reach out to new faculty and graduate students who need to start participating now if the future will be different. Enter Open Access Week, a now eight-year-old celebration of open access organized by SPARC. Just as Black Friday is the day that retailers hope to be in the black, Open Access Week has become an occasion to organize around and finally share our message with willing ears. Right?

It can be, but it requires a good deal of institutional dedication to make it happen. At my institution, Open Access Week is a big deal. I am co-chair of a new Scholarly Communications committee which is now responsible for planning the week (the committee used to just plan the week, but the scope has been extended). The committee has representation from Systems, Reference, Access Services, and the Information Commons, and so we are able to touch on all aspects of open access. Last year we had events five days out of five; this year we are having events four days out of five. Here are some of the approaches we are taking to creating successful conversations around open access.

    • Focus on the successes and the impact of your faculty, whether or not they are publishing in open access journals.

The annual Celebration of Faculty Scholarship takes place during Open Access Week, and brings together physical material published by all faculty at a cocktail reception. We obtain copies of articles and purchase books written by faculty, and set up laptops to display digital projects. This is a great opportunity to find out exactly what our faculty are working on, and get a sense of them as researchers that we may normally lack. It’s also a great opportunity to introduce the concept of open access and recruit participants to the institutional repository.

    • Highlight the particular achievements of faculty who are participating in open access.

We place stickers on materials at the Celebration that are included in the repository or are published in open access journals. This year we held a panel with faculty and graduate students who participate in open access publishing to discuss their experiences, both positive and negative.

  • Demonstrate the value the library adds to open access initiatives.

Recently bepress (which creates the Digital Commons repositories on which ours runs) introduced a real-time map of repository downloads that was a huge hit this year. It was a compelling visual illustration of the global impact of work in the repository. Faculty were thrilled to see their work being read across the world, and it helped to solve the problem of invisible impact. We also highlighted our impact with a new handout that lists key metrics around our repository, including hosting a new open access journal.

  • Talk about the hard issues in open access and the controversies surrounding it, for instance, CC-BY vs. CC-NC-ND licenses.

It’s important to not sugarcoat or spin challenging issues in open access. It’s important to include multiple perspectives and invite difficult conversations. Show scholars the evidence and let them draw their own conclusions, though make sure to step in and correct misunderstandings.

  • Educate about copyright and fair use, over and over again.

These issues are complicated even for people who work on them every day, and are constantly changing. Workshops, handouts, and consultation on copyright and fair use can help people feel more comfortable in the classroom and participating in open access.

  • Make it easy.

Examine what you are asking people to do to participate in open access. Rearrange workflows, cut red tape, and improve interfaces. Open Access Week is a good time to introduce new ideas, but this should be happening all year long.

We can’t expect revolutions in policy and practice to happen overnight, or without some sacrifice. Whether you choose to make your stand by only publishing in open access journals or by some other path, make your stand and help others who wish to do the same.

Notes
  1. Publishers have caught on to this tendency in librarians. For instance, Taylor and Francis has 12-18 month repository embargoes for all its journals except LIS journals. Whether this is because of the good work we have done in advocacy or a conciliatory gesture remains up for debate.
  2. Xia, Jingfeng, Sara Kay Wilhoite, and Rebekah Lynette Myers. “A ‘librarian-LIS Faculty’ Divide in Open Access Practice.” Journal of Documentation 67, no. 5 (September 6, 2011): 791–805. doi:10.1108/00220411111164673.

a wayback machine (pywb) on a cheap, shared host / Raffaele Messuti

For a long time the only free implementation of web archival replay software (I'm unaware of commercial ones) has been the Wayback Machine (now OpenWayback). It's a stable and mature piece of software, with a strong community behind it.
To use it you need to be comfortable deploying a Java web application; not so difficult, and the documentation is exhaustive.
But there is a new player in the game, pywb, developed by Ilya Kreymer, a former Internet Archive developer.
Built in Python, it is relatively simpler than wayback, and it is now used in a professional archiving project at Rhizome.

Stump The Chump D.C.: Meet The Panel / SearchHub

If you haven’t heard: On November 13th, I’ll be back in the hot seat at Lucene/Solr Revolution 2014 answering tough Solr questions — submitted by users like you — live, on stage, sight unseen.

Today, I’m happy to announce the Panel of experts that will be challenging me with those questions, and deciding which questions were able to Stump The Chump!

In addition to taunting me with the questions, and ridiculing all my “Um”s and “Uhh”s as I struggle to answer them, the Panel members will be awarding prizes to the folks who submit the questions that do the best job of “Stumping” me. Questions can be submitted to our panel via stump@lucenerevolution.org any time until the day of the session. Even if you won’t be able to attend the conference, you can still participate – and do your part to humiliate me – by submitting your tricky questions.

To keep up with all the “Chump” news fit to print, you can subscribe to this blog (or just the “Chump” tag).

The post Stump The Chump D.C.: Meet The Panel appeared first on Lucidworks.

ATO2014: Pax Data / Nicole Engard

Doug Cutting from Cloudera gave our closing keynote on day 1.

Hadoop started a revolution. It is an open source platform that really harnesses data.

In movies the people who harness the data are always the bad guys – so how do we save ourselves from becoming the bad guy? What good is coming out of good data?

Education! The better data we have the better our education system can be. Education will be much better if we can have a custom experience for each student – these kinds of observations are fed by data. If we’re going to make this happen we’re going to need to study data about these students. The more data you amass the better predictions you can make. On the flip side it’s scary to collect data about kids. inBloom was an effort to collect this data, but they ended up shutting down because of the fear. There is a lot of benefit to be had, and it would be sad if we didn’t enable this type of application.

Healthcare is another area where this becomes handy. Medical research benefits greatly from data. The better data we collect, the better we can care for people. Once again this is an area where people have fears about shared data.

Climate is the last example. The climate is changing, and in order to understand how we can affect it, data plays a huge role. Data about our energy consumption is part of this. Some people say that certain data is not useful to collect – but this isn’t a good approach. We want to collect all the data and then evaluate it. You don’t know in advance what value the data you collect will have.

How do we collect this data if we don’t have trust? How do we build that trust? There are some technology solutions like encrypting data and anonymizing data sets – these methods are imperfect though. In fact if you anonymize the data too much it muddies it and makes it less useful. This isn’t just a technical problem – instead we need to build trust.

The first way to build trust is to be transparent. If you’re collecting data you need to let people know you’re collecting it and what you’re going to use it for.

The next key element is establishing best practices around data. These are the technical elements like encryption and anonymization. This also includes language to agree/disagree to ways our data is shared.

Next we need to draw clear lines that people can’t step over – for example we can’t show someone’s home address without their express permission. Which gives us a basis for the last element.

Enforcement and oversight is needed. We need someone who is checking up on these organizations that are collecting data. Regulation can sound scary to people, but we have come to trust it in many markets already.

This is not just a local issue – it needs to be a global effort. As professionals in this industry we need to think about how to build this trust and get to the point where data can be stored and shared.

The post ATO2014: Pax Data appeared first on What I Learned Today....

ATO2014: Saving the world: Open source and open science / Nicole Engard

Marcus Hanwell, another fellow opensource.com moderator, was the last session of the day with his talk about saving the world with open source and open science!

In science there was a strong ethic of ‘trust, but verify’ – and if you couldn’t reproduce the efforts of the scientist then the theory was dismissed. The ‘but verify’ part of that has kind of gone away in recent years. In science the primary measure of whether you were successful or not is whether you publish – citations to your work are key. Then when you do publish, your content is locked down in costly journals instead of being available in the public domain. So if you pay large amounts of money you can have access to the article – but not necessarily the data. Data is kept locked up more and more to keep the findings with the person who published them, so that they get all the credit.

Just like in the talk earlier today on what academia can learn from open source, Marcus showed us an article from the 17th century next to an article from today – the method of publishing has not changed. Plus these articles are full of academese, which is obtuse.

All of this makes it very important to show what’s in the black box. We need to show what’s going on in these experiments at all levels. This includes sharing the steps used to run calculations – the source code used to get this information should be released as open source, because right now the tools used are basically notebooks with no version control system. We have to stop putting scientists on pedestals and start to hold them accountable.

A great quote that Marcus shared from an Economist article was: “Scientific research has changed the world. Now it needs to change itself.” Another was “Publishing research without data is simply advertising, not science.” Scientists need to think more about licenses – they give their rights away to journals because they don’t pay enough attention to the licenses that are out there like the creative commons.

What is open? How do we change these behaviors? Open means that everyone has the same access. Certain basic rights are granted to all – the ability to share, modify and use the information. There is a fear out there that sharing our data means that someone could prove we’re wrong or stupid. We need to change this culture. We need more open data (shared in open formats), more open source software, more open standards and open access.

We need to push boundaries – most of what is published is publicly funded, so it should be open and available to all of us! We do need some software to share this data – that’s where we come in and where open source comes in. In the end the lesson is that we need to get scientists to show all their data and stop rewarding academics solely for their citations, because this model is rubbish. We do need to find a new way to reward scientists, though – a more open model.

The post ATO2014: Saving the world: Open source and open science appeared first on What I Learned Today....

ATO2014: Open Source in Healthcare / Nicole Engard

Luis Ibanez, my fellow opensource.com moderator, was up next to talk to us about Open Source in Healthcare. Luis’s story was so interesting – I hope I caught all the numbers he shared – but the moral of the story is that hospitals could save insane amounts of money if they switched to an open system.

There are 7 billion people on the planet making $72 trillion a year. In the US we have 320 million people – 5% of the global population – but we account for 22% of the economic production on the planet. What do we do with that money? 24% of it is spent on healthcare ($3.8 trillion) – not just by the government, this is the spending of the entire country. This is more than they’re spending in Germany and France. Yet we’re ranked 38th in healthcare quality in the world. France is #1, and they spend only 12% of their money on healthcare. This is an example of how spending more money on the problem is not helping.

Is there something that geekdom can do to set this straight? Luis says ‘yes!’

So, why do we go to the doctor? To get information. We want the doctor to tell us if we have a problem they can fix and know how to fix it. Information connects directly to our geekdom.

Today if you go to a hospital, your data will be stored on paper and will go into a “data center” (a filing cabinet). In 2010, 84% of hospitals were keeping paper records rather than using software. The healthcare industry is the only industry that needs to be paid to switch to using software to store this information – $20 billion was spent between 2010 and 2013 to get us to 60% of hospitals storing information electronically. This is one of the reasons we’re spending so much on healthcare right now.

The problem here (and this is Luis’s rant) is that hospitals have to pay for this software in the first place. And you’re not allowed to share anything about the system: you can’t take screenshots, you can’t talk about the features, you are completely locked down. This system will run your hospital (a combination of hotel, restaurant, and medical facility) – hospitals have been called the most complex institution of the century. These systems cost $100 million for a 400-bed hospital – and hospitals have to buy them with little or no knowledge of how they work, because of the security measures around seeing and sharing information about the software. This works against the idea of a free market because of the NDA you have to sign to see and use the software.

An example that Luis gave us was Wake Forest hospital, which ended up in the red by $56 million – all because they bought software for $100 million, leading them to fire people, stop making retirement payments and make other cuts. [For me this sounds a lot like what libraries are doing - paying so much for an ILS instead of saving money on the ILS and putting that money toward people and services]

Another problem in the medical industry is that only 41% of hospitals (less than half) have the capability to send secure messages to patients. This is not a technology problem – this is a cultural problem in the medical world. Other industries have solved this technology problem already.

So, why do we care about all of this? There are 5,723 hospitals in the US: 211 of them are federally run (typically military hospitals), 413 are psychiatric, 2,894 are nonprofits and the others are private or state run. That totals nearly 1 million beds, and $830 billion a year is spent in hospitals. The software that these hospitals are buying costs about $250 billion.

The federal hospitals are running a system that was released into the public domain called VistA. OSEHRA was founded to protect this software. This software, though, is written in MUMPS. This is the same language that the $100 million software is written in! Except there is a huge difference in price.

If hospitals switched they’d spend $0 on the software. To keep this software running and updated we’d need about 20 thousand developers – but if you divide that by the number of hospitals, that’s about 4 developers per hospital. These developers don’t need to be programmers, though – they could be doctors, nurses or pharmacists – because MUMPS is so easy to learn.

The post ATO2014: Open Source in Healthcare appeared first on What I Learned Today....

LITA Forum: Online Registration Ends Oct. 27 / LITA

Don’t miss your chance to register online for the 2014 LITA Forum “From Node to Network” to be held Nov. 5-8, 2014 at the Hotel Albuquerque in Albuquerque N.M. Online registration closes October 27, 2014. You can register on site, but it’s so much easier to have it all taken care of before you arrive in Albuquerque.

Book your room at the Hotel Albuquerque. The guaranteed LITA room rate date has passed, but when you call 505-843-6300, ask for the LITA room rate; there might be a few rooms left in our block.

Three keynote speakers will be featured at this year’s forum:

  • AnnMarie Thomas, Engineering Professor, University of St. Thomas
  • Lorcan Dempsey, Vice President, OCLC Research and Chief Strategist
  • Kortney Ryan Ziegler, Founder Trans*h4ck.

More than 30 concurrent colleague inspired sessions and a dozen poster sessions will provide a wealth of practical information on a wide range of topics.

Two preconference workshops will also be offered:

  • Dean B. Krafft and Jon Corson-Rikert of Cornell University Library will present
    “Linked Data for Libraries: How libraries can make use of Linked Open Data to share information about library resources and to improve discovery, access, and understanding for library users”
  • Francis Kayiwa of Kayiwa Consulting will present
    “Learn Python by Playing with Library Data”

Networking opportunities, a major advantage of a smaller conference, are an important part of the Forum. Take advantage of the Thursday evening reception and sponsor showcase, the Thursday game night, the Friday networking dinners or Kitchen Table Conversations, plus meals and breaks throughout the Forum to get to know LITA leaders, Forum speakers, sponsors, and peers.

2014 LITA Forums sponsors include EBSCO, Springshare, @mire, Innovative and OCLC.

Visit the LITA website for more information.

Library and Information Technology Association (LITA) members are information technology professionals dedicated to educating, serving, and reaching out to the entire library and information community. LITA is a division of the American Library Association.

LITA and the LITA Forum fully support the Statement of Appropriate Conduct at ALA Conferences

Islandora Deployments Repo / Islandora

Ever wonder what another institution's Islandora deployment looks like in detail? Look no further: York and Ryerson have shared their deployments with the community on GitHub, including details such as software versions, general settings, XACML policies, and Drupal modules. If you would like to share your deployment, please contact Nick Ruest so he can add you as a collaborator on the repo.

ATO2014: Open Source & the Internet of Things / Nicole Engard

Erica Stanley was up next to talk to us about Open Source and the Internet of Things (IoT).

The Internet of Things (Connected Devices) is the connection of things and people over a network. Why the Internet of Things? Why now? Because technology has made it a possibility. Why open source Internet of Things? To ensure that innovation continues.

Some of the applications we have for connected devices are health/fitness, home/environment and identity. Having devices that are always connected to us allows us to do things like monitor our health so that we can see when something might be wrong before we feel symptoms. Devices like this include vision-related devices (Google Glass), smart watches, wearable cameras, wristbands (Fitbit), smart home devices (some of which are on my wishlist), connected cars (cars that see that the car in front of you has stopped rather than slowed down) and smart cities like Raleigh.

There are many networking technologies these devices can use to stay connected, but Bluetooth seems to be the default. There is a central device and a peripheral device – the central device wants the data that the peripheral device has. They use Bluetooth to communicate with each other – the central device requesting info from the peripheral.
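
As a hedged sketch of that central/peripheral exchange, the snippet below uses the Python bleak BLE library (my choice of toolkit – the session did not name one): the central scans for peripherals, connects to one, and reads a characteristic. The UUID here is the standard Bluetooth GATT battery-level characteristic; any real wearable would expose its own services.

    import asyncio
    from bleak import BleakScanner, BleakClient

    BATTERY_LEVEL = "00002a19-0000-1000-8000-00805f9b34fb"  # standard GATT battery characteristic

    async def main():
        # The central device discovers nearby peripherals...
        devices = await BleakScanner.discover(timeout=5.0)
        for d in devices:
            print("found peripheral:", d.address, d.name)

        # ...then connects to one and requests the data it is holding.
        if devices:
            async with BleakClient(devices[0].address) as client:
                value = await client.read_gatt_char(BATTERY_LEVEL)
                print("battery level:", value[0], "%")

    asyncio.run(main())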

Cloud computing, another important technology, has been one of the foundations for the Internet of Things – this is how we store all the info we’re passing back and forth. As our devices get more ability to learn, we get more devices that can act on the data they’re gathering (there is a fitness app/device that will encourage you to get up and move once in a while, for example).

Yet another technology that’s important is augmented reality, showing us the results of data in our day-to-day lives (Google Glass showing you the directions to where you’re walking).

One challenge facing us is the fact that we have devices living in silos. So we have Google devices and Samsung devices – but they don’t talk to each other. We need to move towards a platform for connected devices. This will allow us to have a user controlled and created environment – where the devices I want to talk to each other can and the people I want to see the data can see the data. This allows us to personalize our environment but also secure our environment.

Speaking of security, there are some guidelines for developers that we can all follow to be sure to create secure devices. When building these devices we want to think about security from the very beginning. We need to understand our vulnerabilities, build security from the ground up. This starts with the OS so that we’re building an end-to-end solution. Obviously you want to be proactive in testing your apps and use updated APIs/frameworks/protocols.

Some tools you can use to get started as far as hardware: Arduino Compatible devices (Lilypad, Adafruit Flora and Gemma), Tessel, and Metawear. Software tools include: Spark Core, IoT Toolkit, Open.Sen.se, Cloud Foundry, Eclipse IoT Tools, and Huginn (which is kind of an open source IFTTT).

One thing to keep in mind when designing for IoT is that we no longer own the foreground – we might not have a screen, or a full-sized screen. We also have to think about integration with other devices and discoverability of functionality if we don’t have a screen (gesture-based devices). Finally, we have to keep in mind low energy and computing power. On the product side you want to think about the form factor – you don’t want a device that no one will want to wear. This also means creating personalizable devices.

Remember that there is no ‘one size fits all’ – your device doesn’t have to be the same as others that are out there. Try to not get in the way of your user – build for people not technology! If we don’t try to take all of the user’s attention with the wearable then we’ll get more users.

The post ATO2014: Open Source & the Internet of Things appeared first on What I Learned Today....

ATO2014: How Raleigh Became an Open Source City / Nicole Engard

Next up was Jason Hibbets and Gail Roper who gave a talk about the open source initiative in Raleigh.

Gail started by saying ‘no one told us we had to be more open’. Instead there were signs that showed that this was a good way to go. In 2010 Forbes labeled Raleigh one of the most wired cities in the country, but what they really want is to be the most connected city in the country.

Raleigh has 3 initiatives open source, open data, and open access – the city wants to get gigabit internet connections to every household. So far they have a contract with AT&T and they are working with Google to see if Raleigh will become a Google fiber city.

The timeline leading up to this though required a lot of education of the community about what open meant. It didn’t mean that before this they were hiding things from the community. Instead they had to teach people about open source and open access. There were common stereotypes that the government had about open source – the image of a developer in his basement being among them.

Why did they do this? Why do they want to be an open city? Because of SMAC (Social, Mobile, Analytics, Cloud). Today’s citizens expect that anywhere on any device they should be able to connect to the web. Government organizations like Raleigh’s will have 100x the data to manage. So providing a government that is collaborative and connected to the community becomes a necessity not an option.

“Empowerment of individuals is a key part of what makes open source work, since in the end, innovations tend to come from small groups, not from large, structured efforts.” -Tim O’Reilly

Next up was Jason Hibbets who is the team lead on opensource.com by day and by night he supports the open Raleigh project. Jason shared with us how he helped make the open Raleigh vision a reality. He is not a coder, but he is a community manager. Government to him is about more than putting taxes in and getting out services – it’s about us – the members of the community.

Jason discovered CityCamp – a government unconference that brings together local citizens to build stronger communities where they live. These camps have allowed for people to come together to share their idea openly. Along the way the organizers of this local CityCamp became members of Code for America. Using many online tools they have made it easy to communicate with their local brigade and with others around the state. There is also a meetup group if you’re in the area. If you’re not local you can join a brigade in your area or start your own!

Jason has shared his story in his book The foundation for an open city.

The post ATO2014: How Raleigh Became an Open Source City appeared first on What I Learned Today....

Jobs in Information Technology: October 22 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Head of Technology, Saline County Library,  Benton,  AR

Science Data Librarian,  Penn State University Libraries, University Park,  PA

Visit the LITA Job Site for more available jobs and for information on submitting a  job posting.

The Variation and the Damage Done / HangingTogether

Some of you may already know about my “MARC Usage in WorldCat” project, where I simply expose the contents of a number of MARC subfields in ordered lists of strings. The point, as I state on the site itself, is to expose “which elements and subfields have actually been used, and more importantly, how? This work seeks to use evidence of usage, as depicted in the largest aggregation of library data in the world — WorldCat — to inform decisions about where we go from here.”

One aspect of this is the quality, or lack thereof, of the actual data recorded. As an aggregator, we see it all. We see the typos, the added punctuation where none should be. We see the made up elements and subfields (yes, made up). We see data that is clearly in the completely wrong place in the record (what were they thinking?). We see it all.

So this week, when I received a request for a specific report, as sometimes happens, I was happy to comply. The correspondent wanted to see the contents of the 775 $e subfield, which, according to the documentation, should only contain a “language code”. Catalogers know that you can’t make these up; they must come from the Library of Congress’ MARC Code List for Languages.
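
As a minimal sketch of how such a report could be assembled, here is some Python using the pymarc library; the file name is a placeholder, and pymarc itself is my assumption about tooling rather than anything the project states it uses:

from collections import Counter
from pymarc import MARCReader

# Tally every value found in 775 $e across a batch of binary MARC records.
counts = Counter()
with open("records.mrc", "rb") as fh:          # "records.mrc" is a placeholder file name
    for record in MARCReader(fh):
        for field in record.get_fields("775"):
            for value in field.get_subfields("e"):
                counts[value] += 1

# Emit the ordered list of strings, most common first.
for value, n in counts.most_common():
    print(n, value)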

Sounds simple, right? If you encode a language in the 775 $e, it must come from that list. But that doesn’t prevent catalogers from embellishing (see all the variations for “eng” below and the number of times they were found; this does not include variations like “anglais”). Why not add punctuation? Or additional information, such as “bilingual”? I’ll tell you why not. Because it renders the data increasingly unusable without normalization.

And normalization comes at a cost. Easy normalization, such as removing punctuation, is straightforward. But at some point the easiest thing to do is to simply throw it away. If a string only occurs once, how important can it be?

As we move into a more fully machine-supported world for library metadata we will be facing more of these choices. Some will be harder than others. If you don’t believe me, just check out what we have to do with dates.

52861 eng
1249 eng.
400 (eng)
20 (eng.)
12 (eng).
3 eeng
2 [eng]
1 feng
1 eng~w(CaOOP) a472415
1 engw(CaOOP) a459037
1 engw(CaOOP) a371268
1 engw(CaOOP) 1-181456
1 engw(CaOOP) 01-0314275
1 engw(CaOOP) 01-0073869
1 enge
1 eng..
1 eng,
1 eng(CaOOP) a359090
1 eng(CaOOP) 1-320212
1 eng$x0707-9311
1 bilingual eng
1 (eng),
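
To make the cost concrete, here is a rough Python sketch of the kind of cleanup these variants force on anyone who wants to use the data. The sample values come from the list above, but the tiny stand-in code list and the validation rules are illustrative assumptions, not part of the WorldCat tooling:

import re

KNOWN_CODES = {"eng", "fre", "ger", "spa"}   # tiny stand-in for the full LC MARC Code List for Languages

def normalize(value):
    """Strip brackets and punctuation from a 775 $e value, then validate it."""
    cleaned = value.strip().lower().strip("()[].,~ ")   # the easy fixes
    cleaned = re.split(r"[^a-z]", cleaned)[0]           # keep only the leading run of letters
    return cleaned if cleaned in KNOWN_CODES else None  # anything else gets thrown away

for sample in ["eng.", "(eng)", "[eng]", "eng~w(CaOOP) a472415", "bilingual eng", "feng"]:
    print(repr(sample), "->", normalize(sample))

Even this much only rescues the mechanical variants; a value like “bilingual eng” still needs a human to decide what was meant, which is exactly the point at which throwing it away starts to look attractive.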

Photo by Suzanne Chapman, Creative Commons license CC BY-NC-SA 2.0


ATO2014: What Academia Can Learn from Open Source / Nicole Engard

Arfon Smith from GitHub was up next to talk to us about academia and open source.

Arfon started with an example of a shared research proposal. You create a document and then edit the filename with each iteration, because word processing applications are not good at tracking changes and enabling collaboration. Git, though, is meant for exactly this, so he showed us a book example on GitHub where the collaborators worked together on a document.

In open source there is a ubiquitous culture of reuse. Academia doesn’t do this, but why not? The problem is the publishing requirement in academia. The first problem is that ‘novel’ results are preferred; you’re incentivized to publish new things to move ahead. The second is that citations count for more than the number of people you’ve worked with. And third, more generally, the format sucks. Even if it’s an electronic document, it’s still hard to collaborate on (see the document example above). This is state-of-the-art technology ... for the late 17th century. (Reinventing Discovery)

So, what do open source collaborations do well? There is a difference between open source and an open source collaboration, and it’s an important distinction. Open source is the right to modify; it is not the right to contribute back. An open source collaboration is a highly collaborative development process that allows anyone who shows an interest to contribute. This brings us back to the ubiquitous culture of reuse. These collaborations also expose the process by which they work together, unlike the current black box of research in academia.

How do we get 4,000 people to work together, then? Using git, and GitHub specifically, you can fork the code from an existing project and work on it without breaking other people’s work; when you want to contribute it back, you submit a pull request to the project. The beauty of this is ‘code first, permission later’, and every time this process happens the community learns.
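
As a rough sketch of what that same fork-and-pull-request loop looks like when driven from code rather than the web interface, here is a Python example using the GitHub REST API via the requests library; the repository names, branch names, and token are placeholders of my own, not anything from Arfon’s talk:

import requests

TOKEN = "ghp_example"                                   # placeholder personal access token
HEADERS = {"Authorization": f"token {TOKEN}"}
API = "https://api.github.com"

# 1. Fork the existing project: "code first, permission later".
requests.post(f"{API}/repos/upstream-org/project/forks", headers=HEADERS)

# 2. ...commit your changes to a branch of your fork with git as usual...

# 3. Offer the work back by opening a pull request against the original project.
pr = requests.post(
    f"{API}/repos/upstream-org/project/pulls",
    headers=HEADERS,
    json={
        "title": "Fix typos in chapter 2",
        "head": "your-username:my-branch",   # the branch on your fork
        "base": "main",                      # the branch on the upstream project
    },
)
print(pr.json().get("html_url"))             # link to the new pull request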

The goal of a contribution on GitHub is to get it merged into the product. Not all open source projects are receptive to these pull requests, though; those are not the collaborative type of project.

Fernando Perez: “open source is ... reproducible by necessity.” If people didn’t collaborate, these projects wouldn’t move forward, so they need to be collaborative. The difference in academia is that you have to work alone and in a closed fashion to move ahead and get recognition.

Open can mean within your team or institution; it doesn’t have to be worldwide like in open source. But making your content electronic and available (which does not mean a Word doc or email) makes working together easier. Academia can learn from open source; more importantly, academia must learn from open source to move forward.

All of the above sounds kind of negative, but Arfon did show us a lot of examples where people are sharing in academia; we just need to make this more widespread. Where might more significant change happen? The most obvious place to look is where communities form, such as around a shared challenge or around shared data. Science and big data are, hopefully, where we’re going to see this most.

There are still challenges, though: how do we make sharing the norm? The main problem is how academia rewards ‘credit’, namely articles written solely by you. A tool like Astropy is hugely successful on GitHub, but the authors still had to write a paper about it to get credit. The other issue is trust: academics are reluctant to use other people’s work because they don’t know whether it is of value. Open source has already solved this problem; if a package has been downloaded thousands of times, it’s probably reliable. There are also tools like Code Climate that give your code a grade.

In short, the barriers are cultural, not technical!

The post ATO2014: What Academia Can Learn from Open Source appeared first on What I Learned Today....

New Open Access Button launches as part of Open Access Week / Open Knowledge Foundation

This post is part of our Open Access Week blog series to highlight great work in Open Access communities around the world.


Push Button. Get Research. Make Progress.

If you are reading this, I’m guessing that you too are a student, researcher, innovator, an everyday citizen with questions to answer, or just a friend to Open Knowledge. You may be doing incredible work, writing a manuscript or presentation, or simply have a burning desire to know everything about anything. If so, I know that you are also denied access to the research you need, not least because of paywalls blocking the knowledge you seek. This happens to me too, all the time, but we can do better. This is why we started the Open Access Button: for all the people around the world who deserve to see and use more research results than they can today.

Yesterday we released the new Open Access Button at a launch event in London; you can download it from openaccessbutton.org. The next time you’re asked to pay to access academic research, push the Open Access Button on your phone or on the web. The Open Access Button will search the web for a version of the paper that you can access.

If you get your research, you can make progress with your work. If you don’t, your story will be used to help change the publishing system so it doesn’t happen again. The tool seeks to help users get the research they need immediately, or adds unavailable papers to a wishlist we can get started on. The apps work by harnessing search engines, research repositories, automatic contact with authors, and other strategies to track down papers that are available and present them to the user, even on a mobile device.

The London launch led off other events showcasing the Open Access Button throughout the week in Europe, Asia, and the Middle East. Notably, the new Open Access Button was previewed at the World Bank headquarters in Washington, D.C. as part of the International Open Access Week kickoff event. During yesterday’s launch we reached at least 1.3 million people on social media alone. The new apps build upon a successful beta released last November that attracted thousands of users from across the world and drew lots of media attention. They could not have been built without a dedicated volunteer team of students and young researchers, and the invaluable help of a borderless community responsible for designing, building, and funding the development.

Alongside supporting users, we will start using the data and the stories collected by the Button to help make the changes required to really solve this issue. We’ll be running campaigns and supporting grassroots advocates at openaccessbutton.org/action, as well as building a dedicated data platform for advocates to use our data. If you go there now you can see the map waiting to be filled, and take your first action: sign our first petition, in support of Diego Gomez, a student who faces 8 years in prison and a huge monetary fine for doing something citizens do every day, sharing research online for those who cannot access it.

If you too want to contribute to these goals and advance your research, these are exciting opportunities to make a difference. So install the Open Access Button (it’s quick and easy!), give it a push, click or tap when you’re denied access to research, and let’s work together to fix this problem. The Open Access Button is available now at openaccessbutton.org.