In other words, it’s sending JSON containing… I’m not sure.
The values of the various keys in that structure are obviously Base64-encoded, but when run through a decoder, the result is just binary data, presumably the result of another layer of encryption.
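One rough way to back up the "just binary data" observation is to measure the byte entropy of each decoded value. The Python sketch below is illustrative only and is not the procedure used above; the helper names are hypothetical. Encrypted or compressed data tends to sit near 8 bits per byte on longer samples, while readable text usually scores well below that. A high score is therefore consistent with another layer of encryption, but it cannot prove it or say what is inside.

```python
import base64
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte: 0.0 (one repeated byte) up to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0-255
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def inspect_value(b64_value: str) -> tuple[int, float]:
    """Hypothetical helper: decode one Base64 string and return (length, entropy)."""
    raw = base64.b64decode(b64_value)
    return len(raw), shannon_entropy(raw)
```

Running `inspect_value` over each value in the captured JSON gives a quick, if noisy, signal; very short values produce unreliable entropy estimates.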
Thus, we haven’t actually gotten much further towards verifying that ADE is sending only the data they claim to. That packet of data could be describing my progress reading that book purchased from Kobo… or it could be sending something else.
That extra layer of encryption might be done as protection against a real man-in-the-middle attack targeted at Adobe’s log server — or it might be obfuscating something else.
Either way, the result remains the same: reader privacy is not guaranteed. I think Adobe is now doing things a bit better than they were when they released ADE 4.0, but I could be wrong.
If we as library workers are serious about protecting patron privacy, I think we need more than assurances; we need to be able to verify things for ourselves. ADE necessarily remains in the “unverified” column for now.
Klavaro is just another free touch typing tutor program. We decided to write it because we were frustrated with the other options, most of which relied on a few specific keyboard layouts. Klavaro intends to be keyboard and language independent, saving memory and time (and money).
First Place: Inera, Inc., collaborating with CrossRef
Second Place (Tie): Digital Science, collaborating with portfolio companies; and NetGalley, collaborating with the American Booksellers Association
Third Place: The Harvard Common Press, collaborating with portfolio companies
Based on an embrace of disruption and the need to transform the traditional value chain of content creation, the New England Publishing Collaboration (NEPCo) Awards showcase results achieved by two or more organizations working as partners. Other companies short-listed for the awards this year were Cenveo Publisher Services, Firebrand Technologies, Focal Press (Taylor & Francis), Hurix Systems, The MIT Press, and StoryboardThat.
Criteria for the awards included results achieved, industry significance, depth of collaboration, and presentation.
An audience voting component was included; Digital Science was the overall winner among audience members.
Keynote speaker David Weinberger, co-author of The Cluetrain Manifesto and senior researcher at the Harvard Berkman Center, was introduced by David Sandberg, co-owner of Porter Square Books.
Source: Bookbuilders of Boston http://www.nepcoawards.com/
The Interlibrary Loan Policies Directory will be updated this weekend. We have changed the mediatype for Atom-wrapped JSON responses from "application/json" to "application/atom+json". This change is backward compatible: users can continue using "application/json" for the time being, but we recommend adopting the new mediatype soon.
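For API clients, the change mostly comes down to which mediatype they ask for and check. A minimal Python sketch, assuming the directory honors standard HTTP `Accept` negotiation (the announcement does not say whether a query parameter is used instead) and using a placeholder URL rather than the directory's real endpoint, might look like this:

```python
import urllib.request

# Hypothetical endpoint: substitute the directory's real URL.
DIRECTORY_URL = "https://example.org/ill-policies/search"

# Prefer the new mediatype, but still accept the old one during the transition.
ACCEPT = "application/atom+json, application/json;q=0.5"

def build_request(url: str = DIRECTORY_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request that negotiates the new mediatype."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT})

def is_atom_json(content_type: str) -> bool:
    """True if a response's Content-Type header names the new mediatype."""
    return content_type.split(";")[0].strip().lower() == "application/atom+json"
```

Listing `application/json` with a lower q-value keeps such a client working during the transition period the announcement describes.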
The following is a guest post by Julio Díaz Laabes, HACU intern and Program Management Assistant at the Library of Congress.
This is the second part of a two-part series on the former class of residents from the National Digital Stewardship Residency program. Part One covered four residents from the first year of the program and looked at their current professional endeavors and how the program helped them achieve success in their field. In this second part, we look at the successes of the remaining six residents of the 2013-2014 D.C. class.
Top (left to right): Lauren Work, Jaime McCurry and Julia Blase. Bottom (left to right): Emily Reynolds, Molly Schwartz and Margo Padilla.
Lauren Work is employed as the Digital Collections Librarian at Virginia Commonwealth University in Richmond, VA. She is responsible for Digitization Unit projects at VCU and is involved in a newly launched open access publishing platform and repository. Directly applying her experience during the residency, Lauren is also part of a team working to develop digital preservation standards at VCU and is participating in various digital discovery and outreach projects. On her experience being part of NDSR, Lauren said, “The residency gave me the ability to participate in and grow a network of information professionals focused on digital stewardship. This was crucial to my own professional growth.” Also, the ability to interact with fellow residents gave her “a tightly-knit group of people that I will continue to look to for professional support throughout my career.”
Following her residency at the Folger Shakespeare Library, Jaime McCurry became the Digital Assets Librarian at Hillwood Estate, Museum and Gardens in Washington, D.C. She is responsible for developing and sustaining local digital stewardship strategies, preservation policies and workflows; developing a future digital institutional repository; and performing outreach to raise understanding of and interest in Hillwood’s digital collections. Asked about the most interesting aspect of her job, Jaime said “it’s the wide range of digital activities I am able to be involved in, from digital asset management to digital preservation, to access, outreach and web development.” Echoing Lauren, Jaime stated, “NDSR helped me to establish a valuable network of colleagues and professionals in the DC area and also to further strengthen my project management and public speaking skills.”
At the conclusion of NDSR, Julia Blase accepted a position with Smithsonian Libraries as Project Manager for the Field Book Project, a collaborative initiative to improve the accessibility of field book content through cataloging, conservation, digitization and online publication of digital catalog data and images. For Julia, one of the most exciting aspects of the project is its cooperative nature; it involves staff at Smithsonian Libraries, Smithsonian Archives, Smithsonian National Museum of Natural History and members and affiliates of the Biodiversity Heritage Library. “NDSR helped introduce me to the community of digital library and archivist professionals in the DC area. It also gave me the chance to present at several conferences, including CNI (Coalition for Networked Information) in St. Louis, where I met some of the people I work with today.”
Emily Reynolds is a Library Program Specialist at the Institute of Museum and Library Services, a federal funding agency. She works on discretionary grant programs including the Laura Bush 21st Century Librarian Program, which supports education and professional development for librarians and archivists (the NDSR programs in Washington, D.C., Boston and New York were funded through this program). “The NDSR helped in my current job because of the networking opportunities that residents were able to create as a result. The cohort model allowed us to connect with professionals at each other’s organization, share expertise with each other, and develop the networks and professional awareness that are vital for success,” she said. On the most interesting aspect of her job, Emily commented that “because of the range of grants awarded by IMLS, I am able to stay up-to-date on some of the most exciting and innovative projects happening in all kinds of libraries and archives. Every day in the office is different, given the complexities of the grant cycle and the diversity of programs we support.”
Molly Schwartz was a resident at the Association of Research Libraries. Now she is a Junior Analyst at the U.S. State Department in the Bureau of International Information Programs’ Office of Audience Research and Measurement. One of her biggest achievements is being awarded a 2014-2015 Fulbright Grant to work with the National Library of Finland and Aalto University on her project, User-Centered Design for Digital Cultural Heritage Portals. During this time, she will focus her research on the National Library of Finland’s online portal, Finna, and conduct user-experience testing to improve the portal’s usability with concepts from user-centered design.
Lastly, Margo Padilla is now the Strategic Programs Manager at the Metropolitan New York Library Council. She works alongside METRO staff to identify trends and technologies, develop workshops and services and manage innovative programs that benefit libraries, archives and museums in New York City. She is also the Program Director for NDSR-New York. “I used my experience as a resident to refine and further develop the NDSR program. I was able to base a lot of the program structure on the NDSR-DC model and the experience of the NDSR-DC cohort.” Margo also says that her job is especially rewarding “because I have the freedom to explore new ideas or projects, and leveraging the phenomenal work of our member community into solutions for the entire library, archive and museum community.”
Seeing the wide scope of positions the residents accepted after finishing the program, it is clear the NDSR has been successful in creating in-demand professionals to tackle digital preservation in many forms across the private and public sectors. The 2014-2015 Boston and New York classes are already underway and the next Washington D.C. class begins in June of 2015 (for more on that, see this recent blog post). We expect these new NDSR graduates to form the next generation of digital stewards and to reach the same level of success as those in our pilot program.
Americas – Honduras, Nicaragua, Guatemala, Brazil, USA
Asia/Pacific – South Korea, Taiwan, India, Mongolia, New Zealand, Australia
Africa – Sierra Leone, Mali, Malawi, Mozambique
Join the Global Madness Day: October 30
Take part in a day of activities to get as many submissions as possible into the Global Open Data Index 2014. Make sure your country is represented: the October 30 Global Madness Day is the last day in the sprint!
At 2pm GMT Rufus Pollock, the President & Founder of Open Knowledge, will be chatting to Mor Rubinstein about the Index in a Google Hangout. Make sure you join the chat here!
Other events will take place throughout the day. See our Twitter feed for updates: #openindex14
Some practical tips…
Lastly, a couple of reminders on some key questions around the Index from Mor Rubinstein, Community coordinator for the Index:
1. What is machine readable? – This year we added help text for this question. Please read it when making your submissions. Contributors frequently categorise HTML as a machine readable format. While it is easy to scrape HTML, it is actually NOT a machine readable format. Please use our guide if you are in doubt, or send an email to the census list.
2. What is Openly Licensed? – Well, most of us are not lawyers, and the majority of us never pay attention to the terms and conditions on a website (well, they are super long… so I can’t blame any of you for that). If you are confused, go to the Open Definition, which gives a one-page overview of the subject.
Escape from Microsoft Word by Edward Mendelson is an interesting short post about writing in Microsoft Word compared to that old classic WordPerfect:
Intelligent writers can produce intelligent prose using almost any instrument, but the medium in which they write will always have some more or less subtle effect on their prose. Karl Popper famously denounced Platonic politics, and the resulting fantasies of a closed, unchanging society, in his book The Open Society and Its Enemies (1945). When I work in Word, for all its luxuriant menus and dazzling prowess, I can’t escape a faint sense of having entered a closed, rule-bound society. When I write in WordPerfect, with all its scruffy, low-tech simplicity, the world seems more open, a place where endings can’t be predicted, where freedom might be real.
But of course if the question is “Word or WordPerfect?” the answer is: Emacs. Everything is text.
A couple hours ago, I saw reports from Library Journal and The Digital Reader that Adobe has released version 4.0.1 of Adobe Digital Editions. This was something I had been waiting for, given the revelation that ADE 4.0 had been sending ebook reading data in the clear.
ADE 4.0.1 comes with a special addendum to Adobe’s privacy statement that makes the following assertions:
It enumerates the types of information that it is collecting.
It states that information is sent via HTTPS, which means that it is encrypted.
It states that no information is sent to Adobe on ebooks that do not have DRM applied to them.
It may collect and send information about ebooks that do have DRM.
It’s good to test such claims, so I upgraded to ADE 4.0.1 on my Windows 7 machine and my OS X laptop.
First, I did a quick check of strings in the ADE program itself — and found that it contained an instance of “https://adelogs.adobe.com/” rather than “http://adelogs.adobe.com/”. That was a good indication that ADE 4.0.1 was in fact going to use HTTPS to send ebook reading data to that server.
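That kind of check can also be scripted. The Python sketch below roughly approximates what the Unix `strings` tool reports by default (runs of at least four printable ASCII bytes) and keeps only the runs that mention the logging host; the function name is hypothetical. It will miss strings stored as UTF-16, which `strings -e l` would catch.

```python
import re

# Runs of at least 4 printable ASCII bytes, similar to `strings` default behavior.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

def find_log_urls(blob: bytes, host: bytes = b"adelogs.adobe.com") -> list[str]:
    """Hypothetical helper: return each printable string in `blob` that mentions `host`."""
    return [
        m.group().decode("ascii")
        for m in PRINTABLE_RUN.finditer(blob)
        if host in m.group()
    ]
```

For example, pass it the bytes of the ADE executable read in binary mode (`open(path, "rb").read()`) and check whether each match begins with `https://` or `http://`.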
Next, I fired up Wireshark and started ADE. Each time it started, it contacted a server called adeactivate.adobe.com, presumably to verify that the DRM authorization was in good shape. I then opened and flipped through several ebooks that were already present in the ADE library, including one DRM ebook I had checked out from my local library.
So far, it didn’t send anything to adelogs.adobe.com. I then checked out another DRM ebook from the library (in this case, Seattle Public Library and its OverDrive subscription) and flipped through it. As it happens, it still didn’t send anything to Adobe’s logging server.
Finally, I used ADE to fulfill a DRM ePub download from Kobo. This time, after flipping through the book, it did send data to the logging server. I can confirm that it was sent using HTTPS, meaning that the contents of the message were encrypted.
To sum up, ADE 4.0.1’s behavior is consistent with Adobe’s claims – the data is no longer sent in the clear, and a message was sent to the logging server only when I opened a new commercial DRM ePub. However, without decrypting the contents of that message, I cannot verify that it contains only information about that ebook from Kobo.
But even then… why should Adobe be logging that information about the Kobo book? I’m not aware that Kobo is doing anything fancy that requires knowledge of how many pages I read from a book I purchased from them but did not open in the Kobo native app. Have they actually asked Adobe to collect that information for them?
Another open question: why did opening the library ebook in ADE not trigger a message to the logging server? Is it because the fulfillmentType specified in the .acsm file was “loan” rather than “buy”? More clarity on exactly when ADE sends reading progress to its logging server would be good.
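Checking which fulfillment type a given .acsm file declares is easy to script. The Python sketch below assumes the value lives in a `fulfillmentType` attribute on the file's root `fulfillmentToken` element; the sample token in the code is trimmed and hypothetical rather than taken from a real download, and the sketch says nothing about how ADE itself uses the value.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, heavily trimmed .acsm content. Real files carry more elements
# (resource details, a signature, etc.); only the root attribute matters here.
SAMPLE_TOKEN = (
    '<fulfillmentToken fulfillmentType="loan" auth="user" '
    'xmlns="http://ns.adobe.com/adept">'
    "<operatorURL>https://example.org/fulfillment</operatorURL>"
    "</fulfillmentToken>"
)

def fulfillment_type(acsm_xml: str) -> Optional[str]:
    """Return the root element's fulfillmentType attribute (e.g. "buy", "loan") or None."""
    root = ET.fromstring(acsm_xml)
    # Unprefixed attributes are not namespaced in ElementTree, so a plain get()
    # works even though the element itself is in the Adobe namespace.
    return root.get("fulfillmentType")

if __name__ == "__main__":
    print(fulfillment_type(SAMPLE_TOKEN))  # loan
```

Comparing this value across the .acsm files for books that did and did not trigger a log message would be one way to test the "loan" versus "buy" hypothesis.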
Finally, if we take the privacy statement at its word, ADE is not implementing a page synchronization feature as some, including myself, have speculated – at least not yet. Instead, Adobe is gathering this data to “share anonymous aggregated information with eBook providers to enable billing under the applicable pricing model”. However, another sentence in the statement is… interesting:
While some publishers and distributors may charge libraries and resellers for 30 days from the date of the download, others may follow a metered pricing model and charge them for the actual time you read the eBook.
In other words, if any libraries are using an ebook lending service that does have such a metered pricing model, and if ADE is sending reading progress information to an Adobe server for such ebooks, that seems like a violation of reader privacy. Even though the data is now encrypted, if an Adobe ID is used to authorize ADE, Adobe itself has personally identifying information about the library patron and what they’re reading.
Adobe appears to have closed a hole – but there are still important questions left open. Librarians need to continue pushing on this.
Winchester, MA – Thomson Reuters hosted a CONVERIS Global User Group Meeting for current and prospective users in Hatton Garden, London, on October 1-2, 2014. About 40 attendees from the UK, Sweden, the Netherlands, European institutions from other countries, and the University of Botswana met to discuss issues pertaining to Research Information Management Systems, the CONVERIS Roadmap, research analytics, and new features and functions being provided by CONVERIS (http://converis5.com).
Earlier this month I had the good fortune to attend the “Fonds & Bonds” one-day workshop, just ahead of the DC-2014 meeting in Austin, TX. The workshop was held at the Harry Ransom Center of the University of Texas, Austin, which was just the right venue. Eric Childress from OCLC Research and Ryan Hildebrand from the Harry Ransom Center did much of the logistical work, while my OCLC Research colleague Jen Schaffner worked with Daniel Pitti of the Institute for Advanced Technology in the Humanities, University of Virginia and Julianna Barrera-Gomez of the University of Texas at San Antonio to organize the workshop agenda and presentations.
Here are some brief notes on a few of the presentations that made a particular impression on me.
The introduction by Gavan McCarthy (Director of the eScholarship Research Centre (eSRC), University of Melbourne) and Daniel Pitti to the Expert Group on Archival Description (EGAD) included a brief tour of standards development, how this led to the formation of EGAD, and noted EGAD’s efforts to develop the conceptual model for Records in Context (RIC). Daniel very ably set this work within its standards-development context, which was a great way to help focus the discussion on the specific goals of EGAD.
Valentine Charles (of Europeana) and Kerstin Arnold (from the Archives Portal Europe APEx project) provided a very good tandem presentation on “Archival Hierarchy and the Europeana Data Model”, with Kerstin highlighting the work of Archives Portal Europe and the APEx project. It was both reaffirming and challenging to hear that it’s difficult to get developers to understand an unexpected data model when they confront it through a SPARQL endpoint or through APIs. We’ve experienced that in our work as well, and continue to spend considerable effort trying to meet the challenge.
Tim Thompson (Princeton University Library) and Mairelys Lemus-Rojas (University of Miami Libraries) gave an overview of the Remixing Archival Metadata Project (RAMP), which was also presented in an OCLC webinar earlier this year. RAMP is “a lightweight web-based editing tool that is intended to let users do two things: (1) generate enhanced authority records for creators of archival collections and (2) publish the content of those records as Wikipedia pages.” RAMP utilizes both VIAF and OCLC Research’s WorldCat Identities as it reconciles and enhances names for people and organizations.
Ethan Gruber (American Numismatic Society) gave an overview of the xEAC project (Ethan pronounces xEAC as “zeek”), which he also presented in the same OCLC webinar where Tim presented RAMP. xEAC is an open-source XForms-based application for creating and managing EAC-CPF collections. Ethan is terrific at delving deeply into the possibilities of the technology at hand, and making the complex appear straightforward.
And Daniel Pitti gave a detailed presentation on the SNAC project. OCLC Research has supported this project from its early stages, providing access to NACO and VIAF authority data, and supplying the project with over 2M WorldCat records representing items and collections held by archival institutions … essentially the same data that supports most of OCLC Research’s ArchiveGrid project. The aspirations for the SNAC project are changing, moving from an experimental first phase where data from various sources was ingested, converted, and enriched to produce EAC-CPF records (with a prototype discovery layer on top of those), to the planning for a Cooperative Program which would transform that infrastructure into a sustainable international cooperative hosted by the U.S. National Archives and Records Administration. This is an ambitious and important effort that everyone in the community should be following.
The workshop was very well attended and richly informative. It provided a great way to quickly catch up on key developments and trends in the field. And the opportunity to easily network with colleagues in a congenial setting, including an hour to see a variety of systems demonstrated live, was also clearly appreciated.
Charlie Reisinger from the Penn Manor School District talked to us next about open source at his school. This was an expanded version of his lightning talk from the other night.
Penn Manor has 9 IT team members – which is a very lean staff for 4500 devices. They also do a lot of their technology work in-house.
Before talking about open source, we took a tangent into the nature of education today. School districts are stuck on the model they’re using and have used for centuries. But today kids can learn anything they would like with a simple connection to the Internet. You can be connected to the most brilliant minds you’d like. Teachers are no longer the fountains of all knowledge. The classroom hasn’t been transformed by technology: if you walked into a classroom 60 years ago it would look pretty much like a classroom today.
Schools that do allow students to have laptops lock them down. This is a terrible model for student inquiry. The reason most of us are here today is that we grew up with systems we could get into and try to break/fix/hack.
So what is Penn Manor doing differently? First off they’re doing everything with open source. They use Koha, Moodle, Linux, WordPress, Ubuntu, OwnCloud, SIPfoundry and VirtualBox.
This came to them partially out of fiscal necessity. When Apple discontinued the white MacBook, the school was stuck needing to replace those laptops with some sort of affordable device. Using data they collected from the students’ laptops, they found that students spent most of their time in the browser or in a word processor, so they decided to install Linux on the laptops. Ubuntu was chosen because the state-level testing would work on that operating system.
This worked in the elementary schools, but scaling it up to the high schools was much harder because each course needed different, specific software. They needed to decide whether they could provide a laptop for every student.
The real guiding force in deciding to provide one laptop per student was the English department. They said that they needed the best writing device that could be given to them. This ruled out giving tablets to all students; a laptop meets this need instead. Not only did they give all students laptops with Linux installed, they gave them all root access. This required trust! They created policies and told the students they trusted them to use the laptops as responsible learners. How’s that working out? Charlie has had 0 discipline issues associated with that. And if students get into a jam where they’ve screwed up the computer, maybe this isn’t such a bad thing, because now they have to learn to fix their mistake.
They started this as a pilot program for 90 of their online students before deploying to all 1700 students. These computers included not just productivity software, but Steam! That got the kids’ attention. When they deployed to everyone, though, Steam came off the computers, but the kids knew it was possible, so they had to figure out how to install it on Linux, which is not always self-explanatory. This prodded the kids into learning.
Charlie mentioned that he probably couldn’t have done this 5 years ago because the apps that are available today are so dense and so rich.
There was also the issue of training the staff on the change in software, but also in having all the kids with laptops. This included some training of the parents as well.
Along with the program, they created a help desk program as a 4-credit, honors-level independent study course for high school students. These students spent the whole time supporting the one-to-one program (one laptop per student). They helped with the unpacking, inventorying, and imaging (github.com/pennmanor/FLDT, built by one of the students) of the laptops over 2 days. The key to the program is that the students were treated as equals. This program was picked up and talked about on Linux.com.
Charlie’s favorite moment of the whole program was watching his students train their peers on how to use these laptops.
Too many people ask “what is the future of libraries?” and not “what should the future be?” A book that we must read is “Expect More: Demanding Better Libraries For Today’s Complex World“. If we don’t expect more of libraries we’re not going to see libraries change. We have to change the mindset that libraries belong to the directors; they actually belong to the people and they should be serving the people.
Phil asks how we get the community to participate in managing libraries. Start by looking at your library’s collection and checking whether at least 1% of it is in the STEM arena. Should that percentage be more? 5%, 10%, more? There is no real answer here, but maybe we need to make a suggestion to our libraries. Maybe our funds should instead go toward empowering the community in the technology arena. Maybe we should have co-working space in our library; it could even be fee-based, something like $30/mo. That would be a way for libraries to help the unemployed and the community as a whole.
Libraries are about so much more than books. People head to the library because they’re wondering about something, so having people with practical skills on your staff is invaluable. Instead of pointing people to the books on the topic, having someone for them to talk to is a value-added service. What are our competitors going to be doing while we’re waiting for the transition from analog to digital to happen in libraries? We need to set some milestones for all libraries. Right now it’s only the wealthy libraries that seem to be moving in this direction.
I’ve seen some of the bigger libraries in the US already doing many of the things Phil suggested, like hosting TED Talks, offering digital issues lectures, etc. You could also invite kids in to talk about what they know and have learned.
Phil’s quote: “The library fulfills its promise when people of different ages, races, and cultures come together to pool their talents in creating new creative content.” One thing to think about is whether this change from analog to digital can happen in libraries without changing their names. Instead we could call them the digital commons [I'm not sure this is necessary - I see Phil's point - but I think we need to just rebrand libraries and market them properly and keep their name.]
Some awesome libraries include Chattanooga Public Library which has their 4th floor makerspace. In Colorado there are the Anythink Libraries. The Delaware Department of Libraries is creating a new makerspace.
Books are just one of the tools toward helping libraries enhance human dignity – there are so many other ways we can do this.
Phil showed us a video of his:
You can bend the universe by asking questions – so call your library and ask questions about open source or about new technologies so that we plant the seeds of change.
I was so happy with my new Lamy fountain pen that I drew a second version of my homework assignment: one using my favorite Arthur and Fietje Precies characters.
For our second assignment we needed to draw a fantasy image, preferably using a meta story inside the story. I was drawing monsters the whole week during my commute, so I used those drawings as inspiration.
Next up at All Things Open was Karen Borchert talking about How ‘Open’ Changes Products.
We started by talking about the open product conundrum. Something happens when we think about creating products in an open world. To understand it, we must first understand what a product is: a good, idea, method, information or service that we want to distribute. In open source we think differently about this. We think more about tools and toolkits than packaged products, because those are more conducive to contribution and extension. ‘Open’ products work a bit more like Ikea: you have all the right pieces and instructions, but you have to make something out of them, a table or a chair or whatever. Ikea products are toolkits to make things. When we’re talking about software, though, most buyers are thinking about what they get out of the box, so a toolkit is not a product to our consumers.
Open Atrium is a product that Phase2 produces, and people say a lot of things about it, like “It’s an intranet in a box”, but in reality it’s a toolkit. People use it in a lot of different ways: some do what you’d expect them to do, others make it into something completely different. This is the great thing about open source, but it also causes a problem for us, because in Karen’s example a table != a bike. “The very thing that makes open source awesome is what makes our product hard to define.”
Defining a product in the open arena is simple: “Making an open source product is about doing what’s needed to start solving a customer problem on day 1.” Why are we even going down this road? Why are we creating products? Because making something that is usable out of the box is what people are demanding. Products also provide a different opportunity for revenue and profit.
This comes down to three things:
Understanding the value
Understanding the market
Understanding your business model
Adding value to open source means offering something that someone who knows better than me has put together. If you have an apple, you have all you need to grow your own apples, but you’re not going to bother doing that. You’d rather (or most people would rather) leave that to the expert, the farmer. Just because anyone can take the toolkit and build whatever they want with it doesn’t mean that they will.
Markets are hard for us in open source because we have two markets, one that gives the product credibility and one that makes money, and often these aren’t the same market. Most of the time the community isn’t paying you for the product; its members are usually other developers or people using it to sell to their clients. You need this market because you do benefit from it, even if not financially. You also need to worry about the people who will pay you for the product and services. You have to invest in both markets to help your product succeed.
Business models include the ability to have two licenses – two versions of the product. There is a model around paid plugins or themes to enhance a product. And sometimes you see services built around the product. These are not all of the business models, but they are a few of the options. People buy many things in open products: themes, hosting, training, content, etc.
What about services? Services can be really important in any business model. You don’t have to deliver a completely custom set of services every time you deliver. It’s not less of a product because it’s centered around services.
Questions people ask:
Is it going to be expensive to deal with an open source product? Not necessarily but it’s not going to be free. We need to plan and budget properly and invest properly.
Am I going to make money on my product this year? Maybe – but you shouldn’t count on it. Don’t bet the farm on your product business until you’ve tested the market.
Everyone charges $10/mo for this so I’m just going to charge that – is that cool? Nope! You need to charge what the product is worth and what people will pay for it and what you can afford to sell it for. Think about your ROI.
I’m not sure we want to be a products company. It’s very hard to be a product company without buy-in. A lot of service companies ask this. Consider instead a pilot program and set a budget to test out this new model. Write a business plan.
Over lunch today we had a panel of 6 women in open source talk to us.
The first question was about their earlier days – what made them interested in open source or computer science or all of it.
Megan started in humanities and then just stumbled into computer programming. Once she got into it, though, she really enjoyed it. Elizabeth got involved with Linux through a boyfriend early on. She fell in love with Linux because she was able to do anything she wanted with it. She joined the local Linux users group, and they were really supportive and never made a big deal about the fact that she was a woman. Her first task in the open source world was writing documentation (which was really hard), but from there her career grew. Erica has been involved in technology all her life (which she blames her brother for). When she went to school she wanted to be creative and study arts, but her father gave her the real-life speech, and she realized that computer programming let her be creative and practical at the same time. Estelle started by studying architecture, which was more sexist than her computer science program; toward the end of her college career she found that she was teaching people to use their computers. Karen was always the geekiest person she knew growing up, and her father really encouraged her. She went to engineering school and it wasn’t until she set up her Unix account at the college computer center. She became passionate about open source because of the pacemaker she needs to live: she realized that the entire system is completely proprietary and started thinking about the implications of that.
The career path
Estelle has noticed that in the open source world the men she knows at her level work for big corporations, whereas the women are working for themselves. She attributes this to there not being as many options for women to move up the ladder. As for why she picked the career she did, it was because her parents were sexist and she wanted to piss them off! Elizabeth noticed that a lot of women get involved in open source because they’re recruited into a volunteer organization. She also notices that more women are being paid to work on open source, whereas men are more often doing it for fun. Megan had never been interviewed by or worked for a woman until she joined academia. Erica noticed that the career paths of the women she has met are more convoluted than those of the men she has met. The men take computer science classes and then go into the field; the women, however, didn’t always know at first that these opportunities were available to them. Karen sees that junior women have to work a lot harder – they have to justify their work more often [this is something I totally had to deal with in the past]. Women in these fields get so tired from all that extra work that they move on to do something else. Erica says this is partially why she has gone to work for herself: she gets to push forward her own ideas. Megan says that a lot of factors are involved in this problem – it’s not just one thing.
Is diversity important in technology?
Erica feels that if you’re building software for people you need ‘people’, not just one type of person, working on the project. Megan says that a variety of perspectives is necessary. Estelle says that because women often follow a different path to technology, it adds even more diversity than just gender [I for example got into the field because of my literature degree and the fact that I could write content for the website]. It’s also important to note that diversity isn’t just about gender – but so much more. Karen pointed out that even at 20 months old we’re teaching girls and boys differently – we start teaching boys math and problem solving earlier and we help the girls for longer. This reinforces the gender roles we see today. Elizabeth feels that diversity is needed to engage more talent in general.
What can we do to change the tide?
Megan likes to vary the types of problems she gives in her classes, with a variety of approaches, so that they reach a range of students instead of alienating those who don’t learn the way she’s teaching. Karen wants us to help keep women from being overlooked. When a woman makes a suggestion, acknowledge it; also stop people from interrupting women (because we are interrupted more). Don’t just repeat what the woman says but amplify it. Estelle brings up an example from SurveyMonkey – they have a mentorship program and also let you take time off when you need to (very good for parents). Erica tries to reach youth before the preconception forms that technology is for boys. One of the things she noticed was that language matters as well – telling girls you’re going to teach them to code turns them off, but saying we’re going to create apps gets them excited. Elizabeth echoed the language issue – a lot of job ads are geared toward men as well. Editing your job ads will actually attract more women.
What have you done in your career that you’re most proud of?
Estelle’s example is not related to technology – it was an organization called POWER that was meant to help students who were very likely to have a child before graduation to graduate before becoming a parent. It didn’t matter what field they went into – just that they finished high school. Erica is proud that she has a background that lets her mentor so many people. Elizabeth wrote a book! It was on her bucket list and now she has a second book in the works. It was something she never thought she could do and she did. She also said that it feels great to be a mentor to other women. Megan is just super proud of her students and watching them grow up and get jobs and be successful. Karen is mostly proud of the fact that she was able to turn something that was so scary (her heart condition) into a way to articulate why free software is so important. She loves hearing others tell her story to other people to explain why freedom in software is so important.
This post is part of our Open Access Week blog series to highlight great work in Open Access communities around the world. It is written by Alma Swan, Director of Key Perspectives Ltd, Director of Advocacy for SPARC Europe, and Convenor for Enabling Open Scholarship.
Whither the humanities in a world moving inexorably to open values in research? There has been much discussion and debate on this issue of late. It has tended to focus on two matters – the sustainability of humanities journals and the problem(s) of the monograph. Neither of these things is a novel topic for consideration or discussion, but nor have solutions been found that are satisfactory to all the key stakeholders, so the debate goes on.
While it does, some significant developments have been happening, not behind the scenes as such but in a quiet way nevertheless. New publishers are emerging in the humanities that are offering different ways of doing things and demonstrating that Open Access and the humanities are not mutually exclusive.
These publishers are scholar-led or are academy-based (university presses or similar). Their mission is to offer dissemination channels that are Open, viable and sustainable. They don’t frighten the horses in terms of trying to change too much, too fast: they have left the traditional models of peer review practice and the traditional shape and form of outputs in place. But they are quietly and competently providing Open Access to humanities research. What’s more, they understand the concerns, fears and some bewilderment of humanities scholars trying to sort out what the imperative for Open Access means to them and how to go about playing their part. They understand because they are of and from the humanities community themselves.
The debate about OA within this community has been particularly vociferous in the UK in the wake of the contentious Finch Report and the policy of the UK’s Research Councils. Fortuitously, the UK is blessed with some great innovators in the humanities, and many of the new publishing operations are also UK-based. This offers a great opportunity to show off some of these new initiatives and help to reassure UK humanities authors at the same time. So SPARC Europe, with funding support from the Open Society Foundations, is now endeavouring to bring these new publishers together with members of the UK’s humanities community.
We are hosting a Roadshow comprising six separate events in different cities round England and Scotland. At each event there are short presentations by representatives of the new publishers and from a humanities scholar who can give the research practitioner perspective on Open Access. After the presentations, the publishers are available in a small exhibition area to display their publications and talk about their publishing programmes, their business models and their plans for the future.
The publishers taking part in the Roadshow are Open Book Publishers, Open Library of the Humanities, Open Humanities Press and Ubiquity Press. In addition, the two innovative initiatives OAPEN and Knowledge Unlatched are also participating. The stories from these organisations are interesting and compelling, and present a new vision of the future of publishing in the humanities.
Humanities scholars from all higher education institutions in the locality of each event are warmly invited to come along to the local Roadshow session. The cities we are visiting are Leeds, Manchester, London, Coventry, Glasgow and St Andrews. The full programme is available here.
We will assess the impact of these events and may send the Roadshow out again to new venues next year if they prove to be successful. If you cannot attend but would like further information on the publishing programmes described here, or would like to suggest other venues the Roadshow might visit, please contact me at email@example.com
A bit of background: from October through November of 2013, a team of National Digital Stewardship Alliance members, led by the Content Working Group, conducted a survey of institutions in the United States that are actively involved in, or planning to start, programs to archive content from the web. This survey built upon a similar survey undertaken by the NDSA in late 2011 and published online in June of 2012. Results from the 2011-2012 NDSA Web Archiving Survey were first detailed on May 2, 2012 in “Web Archiving Arrives: Results from the NDSA Web Archiving Survey” on The Signal, and the full report (PDF) was released in July 2012.
The goal of the survey was to better understand the landscape of web archiving activities in the U.S. by investigating the organizations involved, the history and scope of their web archiving programs, the types of web content being preserved, the tools and services being used, access and discovery services being provided and overall policies related to web archiving programs. While this survey documents the current state of U.S. web archiving initiatives, comparison with the results of the 2011-2012 survey enables an analysis of emerging trends. The report therefore describes the current state of the field, tracks the evolution of the field over the last few years, and forecasts future activities and developments.
The survey consisted of twenty-seven questions (PDF) organized around five distinct topic areas: background information about the respondent’s organization; details regarding the current state of their web archiving program; tools and services used by their program; access and discovery systems and approaches; and program policies involving capture, availability and types of web content. The survey was started 109 times and completed 92 times for an 84% completion rate. The 92 completed responses represented an increase of 19% in the number of respondents compared with the 77 completed responses for the 2011 survey.
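The two percentages above follow directly from the reported counts. As a quick check, here is a minimal Python sketch; the function name `survey_rates` is hypothetical, and the only inputs are the three counts given in the text (109 started, 92 completed, 77 completed in 2011).

```python
from typing import Tuple


def survey_rates(started: int, completed: int, completed_previous: int) -> Tuple[float, float]:
    """Return (completion rate, growth in completed responses vs. the previous survey)."""
    completion_rate = completed / started
    growth = completed / completed_previous - 1
    return completion_rate, growth


# Counts reported for the 2013 survey and for the 2011 survey.
rate, growth = survey_rates(started=109, completed=92, completed_previous=77)
print(f"completion rate: {rate:.1%}")    # 92 / 109
print(f"growth vs. 2011: {growth:.1%}")  # 92 / 77 - 1
```

The growth works out to roughly 19.5%, which the report rounds to 19%; the completion rate is about 84.4%.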
Overall, the survey results suggest that web archiving programs nationally are both maturing and converging on common sets of practices. The results highlight challenges and opportunities that are, or could be, important areas of focus for the web archiving community, such as opportunities for more collaborative web archiving projects. We learned that respondents are highly focused on the data volume associated with their web archiving activity and its implications for cost and for the usage of their web archives.
Based on the results of the survey, cost modeling, more efficient data capture, storage de-duplication, and anything that promotes web archive usage and/or measurement would be worthwhile investments by the community. Unsurprisingly, respondents continue to be most concerned about their ability to archive social media, databases and video. The research, development and technical experimentation necessary to advance the archiving tools on these fronts will not come from the majority of web archiving organizations with their fractional staff time commitments; this seems like a key area of investment for external service providers.
We hope you find the full report interesting and useful, whether you are just starting out developing a web archiving program, have been active in this area for years, or are just interested in learning more about the state of web archiving in the United States.
Steven Vaughan-Nichols was up to talk to us about open source, marketing and using the press.
Before Steven was a journalist he was a techie. This makes him unusual as a journalist who actually gets technology. Steven is here to tell us that marketing is a big part of your job if you want a successful open source company. He has heard a lot of people saying that marketing isn’t necessary anymore. The reason it’s necessary is because writing great code is not enough – if no one else knows about it it doesn’t matter. You need to talk with people about the project to make it a success.
We like to talk about open source being a meritocracy – that’s not 100% true – the meritocracy is the ideal or a convenient fiction. The meritocracy is only part of the story – it’s not just about your programming it’s about getting the right words to the right people so that they know about your project. You need marketing for this reason.
Any successful project needs two things. The first you already know: it solves a problem that needs a solution. The second is that it must be able to convince a significant number of people that your project is the solution to their problem. One problem in open source is that people confuse open source with the community – they are not the same thing. Marketing is getting info about your project to the world. The community is used for defining what the project really is.
Peter Drucker says, “The aim of marketing is to know and understand the customer so well the product or service fits him and sells itself.” Knowing the customer better than they know themselves is not an easy job – but it’s necessary to market/sell your product/service. If your project doesn’t fit the needs of your audience then it won’t go anywhere.
David Packard: “Marketing is too important to be left to the marketing department” – and it really is. There is a tendency to see marketing as a separate thing. Marketing should not be a separate thing – it should be honest about what you do and it should be the process of getting that message to the world. Each person who works on the project (or for the company) is a representative of your product – we are always presenting our product to the world (you might not like it – but it’s true). If your name is attached to a project/company then people are going to be watching you. You need to avoid zinging competing products and portray a positive image of you and your product. Even if you’re not thinking about what you’re saying as marketing, it is.
Branding is another thing that open source projects don’t always think through enough – they think it’s trivial. Branding actually does matter! The images, words and name you use to describe your product matter. These will become the shorthand people use for your project. For example, if you see the Apple logo you know what it’s about. In our world of open source there is the Red Hat shadow man – whenever you see that image you know that means Red Hat and all the associations you have with it. You can use that association in your marketing. People might not know what Firefox is (yes, there are people who don’t know) but they do recognize the cute little logo.
You can no longer talk just on IRC or online, you have to get out there. You need to go to conferences and make speeches and get the word out to people. And always remember to invite people to participate because this is open source. You have to make an active network and get away from the keyboard and talk to people to get the word out there. At this point you need to start thinking about talking to people from the press.
One thing to give people, including the press, is a statement that will catch on: a catchphrase that will reach the audience you want to reach. The press are the people who talk to the world at large. Talking to people at opensource.com and other tech sites is great, but if you want to make the next leap you need to get to these types of people. Don’t assume that the press you’re talking to don’t know what you’re talking about; but just because they happen to like open source or what you’re talking about does not mean that they will write only positive things. The press are critics – they’re not really on your side – even if they like you they won’t just talk your products up. You need to understand that going in.
Having said all that – you do need to talk to the press at some point. And when you do, you need to be aware of a few things. Never ever call the press – they are always on perpetual deadline – you can’t go wrong with email though. When you do send an email be sure to cover a few important things: tell them what you’re doing, tell them what’s new (they don’t care that you have a new employee – they might care if a bigwig quits or is fired), get your message straight (if you don’t know what you’re doing then the press can’t figure it out), and hit it fast (tell them in the first line what you’re doing, who your audience is and why the world should care). Be sure to give the name of someone they can call and email for more info – this can’t be emphasized enough – so often Steven has gotten press releases without contact info on them. Put the info on your website – make sure that there is always a contact in your company for the press. Remember, if your project is pretty, to send screenshots – this will save the press a lot of time installing it and getting the right images. Steven says “You need to spoon-feed us”.
You also want to be sure to know what the press person you’re contacting writes about – do your homework – don’t contact them with your press release if it’s not something they write about. Also be sure to speak in language that the person you’re talking to will understand [I know I always shy away from OPAC and ILS when talking to the press]. Not everyone you’re talking to has experience in technology. Don’t talk down to the press; just be sure to talk to the person in words they understand. Very carefully craft your message: give people context and tell them why they should care. If you can’t tell them that, they can’t tell anyone else your story.
Final points – remember to be sweet and charming when talking to the press. When they say something that bothers you, don’t insult them. If you alienate the press they will remember. In the end the press has more ink/pixels than you do – their words will have a longer reach than yours. If the press completely misrepresents you, send a polite note to the person explaining what was wrong, without using the word ‘wrong’. Be firm, but be polite.
Last month I was finally able to post about Facebook's cold storage technology. Now, Subramanian Muralidhar and a team from Facebook, USC and Princeton have a paper at OSDI that describes the warm layer between the two cold storage layers and Haystack, the hot storage layer. f4: Facebook's Warm BLOB Storage System is perhaps less directly aimed at long-term preservation, but the paper is full of interesting information. You should read it, but below the fold I relate some details.
A BLOB is a Binary Large OBject. Each type of BLOB contains a single type of immutable binary content, such as photos, videos, documents, etc. Section 3 of the paper is a detailed discussion of the behavior of BLOBs of different kinds in Facebook's storage system.
Figure 3 shows that the rate of I/O requests to BLOBs drops rapidly through time. The rates for different types of BLOB drop differently, but all 9 types have dropped by 2 orders of magnitude within 8 months, and all but 1 (profile photos) have dropped by an order of magnitude within the first week.
The vast majority of Facebook's BLOBs are warm, as shown in Figure 5 - notice the scale goes from 80-100%. Thus the vast majority of the BLOBs generate I/O rates at least 2 orders of magnitude less than recently generated BLOBs.
a good deal of previous meetings was a dialog of the deaf. People doing preservation said "what I care about is the cost of storing data for the long term". Vendors said "look at how fast my shiny new hardware can access your data". ... The interesting thing at this meeting is that even vendors are talking about the cost.
This year's meeting was much more cost-focused. The Facebook data make two really strong cases in this direction:
That significant kinds of data should be moved from expensive, high-performance hot storage to cheaper warm and then cold storage as rapidly as feasible.
That the I/O rate that warm storage should be designed to sustain is so different from that of hot storage, at least 2 and often many more orders of magnitude, that attempting to re-use hot storage technology for warm and even worse for cold storage is futile.
This is good, because hot storage will be high-performance flash or other solid state memory and, as I and others have been pointing out for some time, there isn't going to be enough of it to go around.
Haystack uses RAID-6 and replicates data across three data centers, using 3.6 times as much storage as the raw data. f4 uses two fault-tolerance techniques:
Within a data center it uses erasure coding with 10 data blocks and 4 parity blocks. Careful layout of the blocks ensures that the data is resilient to drive, host and rack failures at an effective replication factor of 1.4.
Between data centers it uses XOR coding. Each block is paired with a different block in another data center, and the XOR of the two blocks stored in a third. If any one of the three data centers fails, both paired blocks can be restored from the other two.
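The XOR scheme works because XOR is its own inverse: any one of the three blocks can be recomputed from the other two. The sketch below shows that property on two hypothetical 8-byte blocks; it models the pairing only at the level described in the text and does not reproduce f4's actual volume layout or block sizes.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical 8-byte blocks standing in for real (much larger) blocks.
block_a = b"photo-01"                    # stored in data center A
block_b = b"video-02"                    # its buddy, stored in data center B
xor_ab = xor_blocks(block_a, block_b)    # stored in data center C

# If data center A fails, its block is rebuilt from B and C ...
assert xor_blocks(block_b, xor_ab) == block_a
# ... and if B fails, its block is rebuilt from A and C.
assert xor_blocks(block_a, xor_ab) == block_b
print("both data blocks recoverable after losing any one data center")
```

If data center C is the one that fails, both data blocks are still intact and the XOR block can simply be recomputed from them.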
The result is fault-tolerance to drive, host, rack and data center failures at an effective replication factor of 2.1, reducing overall storage demand from Haystack's factor of 3.6 by nearly 42% for the vast bulk of Facebook's BLOBs. When fully deployed, this will save 87PB of storage. Erasure-coding everything except the hot storage layer seems economically essential.
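The 2.1 and "nearly 42%" figures can be reproduced from the block counts. This is a back-of-the-envelope sketch, assuming (as the 2.1 figure implies) that the XOR blocks in the third data center are themselves stored with the same 10 + 4 erasure code, so the two overheads multiply. The helper name `erasure_overhead` is hypothetical.

```python
def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Bytes stored per byte of raw data for a code with the given block counts."""
    return (data_blocks + parity_blocks) / data_blocks


# Within a data center: 10 data blocks + 4 parity blocks.
within_dc = erasure_overhead(10, 4)        # 1.4

# Across data centers: two data blocks plus one XOR block, i.e. 3 blocks hold 2 blocks of data.
across_dc = erasure_overhead(2, 1)         # 1.5

f4_factor = within_dc * across_dc          # about 2.1
haystack_factor = 3.6                      # figure given for Haystack in the text

reduction = 1 - f4_factor / haystack_factor
print(f"f4 effective replication factor: {f4_factor:.2f}")
print(f"reduction vs. Haystack: {reduction:.1%}")
```

The reduction comes out at about 41.7%, matching the "nearly 42%" above.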
Another point the paper makes that is worth noting relates to heterogeneity as a way of avoiding correlated failures:
We recently learned about the importance of heterogeneity in the underlying hardware for f4 when a crop of disks started failing at a higher rate than normal. In addition, one of our regions experienced higher than average temperatures that exacerbated the failure rate of the bad disks. This combination of bad disks and high temperatures resulted in an increase from the normal ~1% AFR to an AFR over 60% for a period of weeks. Fortunately, the high-failure-rate disks were constrained to a single cell and there was no data loss because the buddy and XOR blocks were in other cells with lower temperatures that were unaffected.
DeLisa Alexander from Red Hat was up next to talk to us about women in open source.
How many of you knew that the first computer – the ENIAC was programmed by women mathematicians? DeLisa is here to share with us a passion for open source and transparency – and something similarly important – diversity.
Why does diversity matter? Throughout history we have been able to innovate our way out of all kinds of problems. In the future we’re going to have to do this faster than ever before. Diversity of thoughts, theories and views is critical to this process. It’s not just “good” to think about diversity, it’s important to innovation and to solving problems quickly.
Why are we having so much trouble finding talent? 47% of the workforce is made up of women, but women earn only 12% of computer and information science degrees – and only 1-5% of open source contributors are women. How much faster could we solve the world’s big problems if the other half of the population were participating? We need to be part of this process.
When you meet a woman who is successful in technology – there is usually one person who mentored her (man or woman) to feel positive about her path – we could be that voice for a girl or woman that we know. Another thing that we can do is help our kids understand what is going on and what opportunities there are. Kids today don’t think about the fact that the games they’re playing were developed by a human – they just think that computers magically have software on them. They have no clue that someone had to design the hardware and program the software [I actually had someone ask me once what 'software' was - the hardest question I've ever had to answer!].
The challenge for us is to decide on one person that we’re going to try to influence to stay in the field, join the field, or nominate for an award. If each of us does this one thing, next year this room could be filled with 50% women.
James Pearce from Facebook started off day 2 at All Things Open with his talk about open source at Facebook.
James started by playing a piece of music for us that was only ever heard in the Vatican until Mozart, as a boy, wrote down the music he heard and shared it with the world. This is what open source is like: getting beautiful content out to the world. Being open trumps secrecy. At Facebook they have 211 open source projects – nearly all on GitHub, with about 21 thousand forks and over 10 million lines of code. In addition to software, Facebook also open sources their hardware. Open source has been part of the Facebook culture since day 1. The difference is that now that Facebook is so large they are much more capable of committing to share via open source.
Here’s the thing people forget about open source – open source is a chance to open the windows on what you’re doing – “Open source is like a breeze from an open window”. Working in the open means they have to think things through more, and it means they do a better job on their code. Facebook, however, was not always so dedicated to open source – if you looked at their GitHub account a few years ago you would see a lot of unsupported or undocumented projects. “The problem if you throw something over the wall and don’t care about it: it’s worse than not sharing it at all”. About a year ago Facebook decided to get their open source house in order.
The first thing they needed to do was find out what they owned and what was out there – which projects were doing well and which were doing badly. The good news was that they were able to use GitHub’s API to gather all this information and put it into a database. They then made all this data available via the company intranet so that everyone can see the status of things. One of the nice side effects of sharing this info and linking an employee to each project is that it gamifies things. The graphs can be used to make the teams play off each other: using things like GitHub stars and forks, they compete to see who is more popular. While they’re not optimizing for the number of stars, it does make things fun and keeps people paying attention to their projects.
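The inventory step described above (pull project data from GitHub's API into a database, then rank projects) can be sketched in a few lines. This is not Facebook's tooling; it is a minimal illustration assuming GitHub's public REST endpoint `GET /orgs/{org}/repos` and its `stargazers_count`, `forks_count` and `open_issues_count` fields, with SQLite standing in for whatever database they used. The function names are hypothetical.

```python
import json
import sqlite3
import urllib.request
from typing import Dict, List, Optional

API_ROOT = "https://api.github.com"


def fetch_org_repos(org: str, token: Optional[str] = None) -> List[Dict]:
    """Page through GET /orgs/{org}/repos and return every repository record."""
    repos: List[Dict] = []
    page = 1
    while True:
        url = f"{API_ROOT}/orgs/{org}/repos?per_page=100&page={page}"
        request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
        if token:  # unauthenticated requests are heavily rate-limited
            request.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:  # an empty page means every repository has been seen
            return repos
        repos.extend(batch)
        page += 1


def store_repos(repos: List[Dict], db: sqlite3.Connection) -> None:
    """Insert or update the fields a status dashboard might chart."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS repos ("
        "name TEXT PRIMARY KEY, stars INTEGER, forks INTEGER, open_issues INTEGER)"
    )
    db.executemany(
        "INSERT OR REPLACE INTO repos VALUES (?, ?, ?, ?)",
        [
            (r["name"], r["stargazers_count"], r["forks_count"], r["open_issues_count"])
            for r in repos
        ],
    )
    db.commit()


def leaderboard(db: sqlite3.Connection) -> List[tuple]:
    """Projects ordered by GitHub stars, most popular first."""
    return db.execute("SELECT name, stars, forks FROM repos ORDER BY stars DESC").fetchall()
```

Usage (needs network access): `db = sqlite3.connect("oss.db")`, then `store_repos(fetch_org_repos("some-org"), db)` and `leaderboard(db)`. Re-running the fetch periodically and keeping snapshots would give the trend graphs mentioned above; this sketch keeps only the latest values.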
Also using the data, they were able to clean up their “social debt” – they had some pull requests that were over a year old with no response. This gets them thinking about the community health of these projects. They think about the depth of a project, how it’s going to be used and how it’s going to continue on. Sometimes the things they release are just a read-only type of thing. Other times they will have forked something and will have a stated goal to upstream it to the original project. Sometimes a project is no longer a Facebook-specific project. Sometimes Facebook will deprecate a project – this happens with a project that is ‘done’ or is no longer of use to anyone. Finally, they have in the past rebooted a project when upstreaming was not an option.
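Finding that kind of social debt is a simple filter once pull request data is in hand. The sketch below assumes records shaped like those GitHub's pull request API returns (`number`, `state`, `updated_at`); because `updated_at` changes on any activity, it is only a rough proxy for "no response". The helper names and the sample records are hypothetical, and the reference date is fixed so the example is deterministic.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def parse_github_time(stamp: str) -> datetime:
    """Parse GitHub's ISO 8601 UTC timestamps such as '2013-09-30T18:20:00Z'."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11, so spell out the offset.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def stale_pull_requests(pulls: List[Dict], now: datetime,
                        max_idle: timedelta = timedelta(days=365)) -> List[int]:
    """Numbers of open pull requests with no activity (per updated_at) for longer than max_idle."""
    return [
        pr["number"]
        for pr in pulls
        if pr["state"] == "open" and now - parse_github_time(pr["updated_at"]) > max_idle
    ]


now = datetime(2014, 10, 23, tzinfo=timezone.utc)
sample = [
    {"number": 1, "state": "open", "updated_at": "2013-06-01T00:00:00Z"},    # idle > 1 year
    {"number": 2, "state": "open", "updated_at": "2014-10-01T00:00:00Z"},    # recent activity
    {"number": 3, "state": "closed", "updated_at": "2012-01-01T00:00:00Z"},  # already closed
]
print(stale_pull_requests(sample, now))  # [1]
```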
After giving talks like this, James finds that lots of people approach him to talk about their solutions, and they find that they’re all coming up with the same solutions and reinventing the wheel. So these groups have come together with the idea of pooling their resources and sharing. This was the way TODO started. This is not a Facebook initiative – they’re just one of 13 members who are keen to contribute and share what they learned. This group is thinking about a lot of challenges, like why to use open source in the first place, what the policies are for launching a new project, licenses, how to interact with communities, what metrics measure the success of a project, etc. What they hope to do is start conversations around these topics and publish them as blog posts.
Librarians should have a role in promoting open access content. The best methods, and whether they are successful, are a matter of heated debate. Take, for example, a recent post by Micah Vandegrift on the ACRL Scholarly Communications mailing list calling on librarians to stage a publishing walkout and publish only in open access library and information science journals. Many have already done so. Others, like myself, have published in traditional journals (only once in my case) but make a point of making their work available in institutional repositories. I personally would not publish in a journal that did not allow such use of my work, and I know many who feel the same way.[1] The point is, of course, to ensure that librarians are not hypocritical in their own publishing and their use of repositories to provide open access, a long-standing problem pointed out by Dorothea Salo (“Innkeeper at the Roach Motel,” December 11, 2007, http://digital.library.wisc.edu/1793/22088), among others. We know that many of the reasons faculty may hesitate to participate in open access publishing relate to promotion and tenure requirements, which generally are more flexible for academic librarians (though not in all cases; see Abigail Goben’s open access tenure experiment). I suspect that librarians’ limited participation in open access has partly to do with more mundane reasons, such as forgetting to do so or fearing that their work is not good enough to make public.
But it shouldn’t be only staunch advocates of open access, open peer review, or new digital models for work and publishing who are participating. We have to find ways to advocate and educate in a gentle but vigorous manner, and reach out to new faculty and graduate students who need to start participating now if the future will be different. Enter Open Access Week, a now eight-year-old celebration of open access organized by SPARC. Just as Black Friday is the day that retailers hope to be in the black, Open Access Week has become an occasion to organize around and finally share our message with willing ears. Right?
It can be, but it requires a good deal of institutional dedication to make it happen. At my institution, Open Access Week is a big deal. I am co-chair of a new Scholarly Communications committee which is now responsible for planning the week (the committee used to just plan the week, but the scope has been extended). The committee has representation from Systems, Reference, Access Services, and the Information Commons, and so we are able to touch on all aspects of open access. Last year we had events five days out of five; this year we are having events four days out of five. Here are some of the approaches we are taking to creating successful conversations around open access.
Focus on the successes and the impact of your faculty, whether or not they are publishing in open access journals.
The annual Celebration of Faculty Scholarship takes place during Open Access Week, and brings together physical material published by all faculty at a cocktail reception. We obtain copies of articles and purchase books written by faculty, and set up laptops to display digital projects. This is a great opportunity to find out exactly what our faculty are working on, and get a sense of them as researchers that we may normally lack. It’s also a great opportunity to introduce the concept of open access and recruit participants to the institutional repository.
Highlight the particular achievements of faculty who are participating in open access.
We place stickers on materials at the Celebration that are included in the repository or are published in open access journals. This year we held a panel with faculty and graduate students who participate in open access publishing to discuss their experiences, both positive and negative.
Demonstrate the value the library adds to open access initiatives.
Recently bepress (which creates Digital Commons, the repository platform on which ours runs) introduced a real-time map of repository downloads that was a huge hit this year. It was a compelling visual illustration of the global impact of work in the repository. Faculty were thrilled to see their work being read across the world, and it helped solve the problem of invisible impact. We also highlighted our impact with a new handout listing key metrics for our repository, including our hosting of a new open access journal.
Talk about the hard issues in open access and the controversies surrounding it, for instance, CC BY vs. CC BY-NC-ND licenses.
It’s important not to sugarcoat or spin challenging issues in open access. Include multiple perspectives and invite difficult conversations. Show scholars the evidence and let them draw their own conclusions, but step in to correct misunderstandings.
Educate about copyright and fair use, over and over again.
These issues are complicated even for people who work on them every day, and are constantly changing. Workshops, handouts, and consultation on copyright and fair use can help people feel more comfortable in the classroom and participating in open access.
Make it easy.
Examine what you are asking people to do to participate in open access. Rearrange workflows, cut red tape, and improve interfaces. Open Access Week is a good time to introduce new ideas, but this should be happening all year long.
We can’t expect revolutions in policy and practice to happen overnight, or without some sacrifice. Whether you choose to make your stand by publishing only in open access journals or by some other path, make your stand and help others who wish to do the same.
[1] Publishers have caught on to this tendency in librarians. For instance, Taylor and Francis has 12–18 month repository embargoes for all its journals except LIS journals. Whether this is because of the good work we have done in advocacy or a conciliatory gesture remains up for debate.
[2] Xia, Jingfeng, Sara Kay Wilhoite, and Rebekah Lynette Myers. “A ‘librarian-LIS Faculty’ Divide in Open Access Practice.” Journal of Documentation 67, no. 5 (September 6, 2011): 791–805. doi:10.1108/00220411111164673.
For a long time the only free implementation of web archive replay software (I'm unaware of any commercial ones) has been the Wayback Machine (now OpenWayback). It's stable, mature software with a strong community behind it.
To use it you need to be comfortable deploying a Java web application; this is not especially difficult, and the documentation is exhaustive.
But there is a new player in the game: pywb, developed by Ilya Kramer, a former Internet Archive developer.
It is built in Python, is relatively simpler than Wayback, and is now used in a professional archiving project at Rhizome.
In addition to taunting me with the questions, and ridiculing all my “Um”s and “Uhh”s as I struggle to answer them, the Panel members will be awarding prizes to the folks who submit the questions that do the best job of “Stumping” me. Questions can be submitted to our panel via firstname.lastname@example.org any time until the day of the session. Even if you won’t be able to attend the conference, you can still participate, and do your part to humiliate me, by submitting your tricky questions.
Doug Cutting from Cloudera gave our closing keynote on day 1.
Hadoop started a revolution. It is an open source platform that really harnesses data.
In movies the people who harness the data are always the bad guys – so how do we save ourselves from becoming the bad guy? What good is coming out of good data?
Education! The better data we have, the better our education system can be. Education will be much better if we can offer a custom experience for each student, and these kinds of observations are fed by data. If we’re going to make this happen, we’re going to need to study data about these students. The more data you amass, the better predictions you can make. On the flip side, it’s scary to collect data about kids. inBloom was an effort to collect this data, but it ended up shutting down because of these fears. There is a lot of benefit to be had, and it would be sad if we didn’t enable this type of application.
Healthcare is another area where this comes in handy. Medical research benefits greatly from data: the better the data we collect, the better we can care for people. Once again, this is an area where people have fears about sharing data.
Climate is the last example. The climate is changing, and data plays a huge role in understanding how we can affect it. Data about our energy consumption is part of this. Some people say that certain data is not useful to collect, but this isn’t a good approach. We want to collect all the data and then evaluate it, because you don’t know in advance what value the data you collect will have.
How do we collect this data if we don’t have trust? How do we build that trust? There are some technology solutions, like encrypting data and anonymizing data sets, but these methods are imperfect. In fact, if you anonymize the data too much, it muddies the data and makes it less useful. This isn’t just a technical problem; we also need to build trust.
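To make the anonymization trade-off concrete, here is a minimal sketch assuming one simple generalization scheme (age bands and truncated ZIP codes). This is only one of many anonymization techniques and not something the talk prescribed; the records and levels are hypothetical. The more the identifying fields are coarsened, the fewer distinguishable profiles remain, which protects individuals but also removes detail an analyst might need.

```python
"""Illustrative sketch: coarser anonymization leaves less detail to analyze.

Records, field names, and generalization levels are hypothetical examples.
"""

# Hypothetical patient records: (age, ZIP code, diagnosis)
records = [
    (34, "27601", "asthma"),
    (37, "27603", "diabetes"),
    (52, "27605", "asthma"),
    (58, "27699", "hypertension"),
]


def generalize(age, zip_code, level):
    """Return a coarsened (age, ZIP) pair; higher levels hide more detail."""
    if level == 0:
        return str(age), zip_code  # raw values, fully identifying
    if level == 1:
        decade = age // 10 * 10
        # Age band plus a 3-digit ZIP prefix
        return f"{decade}-{decade + 9}", zip_code[:3] + "**"
    return "any", "*****"  # fully suppressed


def distinct_profiles(level):
    """Count how many distinguishable (age, ZIP) profiles survive at a level."""
    return len({generalize(age, z, level) for age, z, _ in records})


for level in range(3):
    print(level, distinct_profiles(level))
```

With these four records, the count drops from 4 distinct profiles at level 0, to 2 at level 1, to 1 at level 2: each step protects people better and tells a researcher less.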
The first way to build trust is to be transparent. If you’re collecting data you need to let people know you’re collecting it and what you’re going to use it for.
The next key element is establishing best practices around data. These include technical elements like encryption and anonymization, as well as language that lets people agree or disagree to the ways their data is shared.
Next we need to draw clear lines that people can’t step over: for example, we can’t show someone’s home address without their express permission. This gives us a basis for the last element.
Enforcement and oversight are needed. We need someone checking up on the organizations that collect data. Regulation can sound scary to people, but we have come to trust it in many markets already.
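As a rough illustration of what a “clear line” might look like in code, here is a minimal sketch in which sensitive fields are withheld unless the person has consented. The field names, the `SENSITIVE` set, and the consent structure are all hypothetical; the talk did not describe any particular implementation.

```python
"""Sketch of a "clear line": sensitive fields are withheld unless the person consented.

Field names, the SENSITIVE set, and the consent structure are hypothetical.
"""

SENSITIVE = {"home_address", "phone"}


def public_view(record, consents):
    """Return a copy of `record` without sensitive fields the person hasn't approved."""
    return {
        field: value
        for field, value in record.items()
        if field not in SENSITIVE or field in consents
    }


person = {"name": "Dana", "city": "Raleigh", "home_address": "12 Oak St"}
print(public_view(person, consents=set()))             # address withheld
print(public_view(person, consents={"home_address"}))  # shown with permission
```

Defaulting to “withhold” keeps the line in place even when consent data is missing, which is the safer failure mode.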
This is not just a local issue – it needs to be a global effort. As professionals in this industry we need to think about how to build this trust and get to the point where data can be stored and shared.
Marcus Hanwell, another fellow opensource.com moderator, gave the last session of the day with his talk about saving the world with open source and open science!
In science there was a strong ethic of ‘trust, but verify’: if you couldn’t reproduce the scientist’s results, the theory was dismissed. The ‘but verify’ part has largely gone away in recent years. In science the primary measure of success is publication, and citations to your work are key. Then, when you do publish, your content is locked away in costly journals instead of being available in the public domain. So if you pay large amounts of money you can have access to the article, but not necessarily the data. Data is increasingly kept locked up so that the findings stay with the person who published them and they get all the credit.
Just like in the talk earlier today on what academia can learn from open source, Marcus showed us an article from the 17th century next to an article from today: the method of publishing has not changed. Plus, these articles are full of academese, which is obtuse.
All of this makes it very important to show what’s in the black box. We need to show what’s going on in these experiments at all levels. This includes sharing the steps used to run calculations: the source code used to get results should be open source, because today the tools used are basically notebooks with no version control. We have to stop putting scientists on pedestals and start holding them accountable.
A great quote that Marcus shared from an Economist article was: “Scientific research has changed the world. Now it needs to change itself.” Another was: “Publishing research without data is simply advertising, not science.” Scientists need to think more about licenses: they give their rights away to journals because they don’t pay enough attention to the licenses that are out there, like Creative Commons.
What is open? How do we change these behaviors? Open means that everyone has the same access. Certain basic rights are granted to all: the ability to share, modify, and use the information. There is a fear out there that sharing our data means others could prove that we’re wrong or stupid. We need to change this culture. We need more open data (shared in open formats), more use of open source software, more open standards, and more open access.
We need to push boundaries: most of what is published is publicly funded, so it should be open and available to all of us! We do need software to share this data, and that’s where we come in and where open source comes in. In the end, the lesson is that we need to get scientists to show all their data, and to stop rewarding academics solely for their citations, because this model is rubbish. We do need to find a new way to reward scientists, though: a more open model.
Luis Ibanez, my fellow opensource.com moderator, was up next to talk to us about Open Source in Healthcare. Luis’s story was so interesting – I hope I caught all the numbers he shared – but the moral of the story is that hospitals could save insane amounts of money if they switched to an open system.
There are 7 billion people on the planet making $72 trillion a year. The US has 320 million people, about 5% of the global population, but produces 22% of the planet’s economic output. What do we do with that money? 24% of it ($3.8 trillion) is spent on healthcare, and that’s not just the government; this is the spending of the entire country. This is more than is spent in Germany or France. However, we’re ranked 38th in healthcare quality in the world. France is #1, yet spends only 12% of its money on healthcare. This is an example of how spending more money on the problem is not helping.
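The figures quoted above hang together. A quick back-of-the-envelope check, using only the numbers as reported from the talk (which, as noted, may not all have been captured exactly), shows that 320 million is about 4.6% of 7 billion (rounded to 5% in the talk) and that 24% of 22% of $72 trillion comes to roughly $3.8 trillion.

```python
# Rough check of the figures quoted in the talk (all inputs come from the talk)
world_population = 7e9
world_output = 72e12          # dollars per year
us_population = 320e6
us_share_of_output = 0.22
us_health_share = 0.24

us_output = world_output * us_share_of_output     # about $15.8 trillion
us_health_spending = us_output * us_health_share  # about $3.8 trillion

print(f"US share of population: {us_population / world_population:.1%}")
print(f"US economic output: ${us_output / 1e12:.1f} trillion")
print(f"US healthcare spending: ${us_health_spending / 1e12:.1f} trillion")
```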
Is there something that geekdom can do to set this straight? Luis says ‘yes!’
So, why do we go to the doctor? To get information. We want the doctor to tell us if we have a problem they can fix and know how to fix it. Information connects directly to our geekdom.
Today if you go to a hospital, your data will be stored on paper and will go into a “data center” (a filing cabinet). In 2010, 84% of hospitals were keeping paper records rather than using software. The healthcare industry is the only industry that has had to be paid to switch to storing this information in software: $20 billion was spent between 2010 and 2013 to get us to 60% of hospitals storing information electronically. This is one of the reasons we’re spending so much on healthcare right now.
The problem here (and this is Luis’s rant) is that the hospitals have to pay for this software in the first place. And you’re not allowed to share anything about the system: you can’t take screenshots, you can’t talk about the features, you are completely locked down. This system will run your hospital (a combination of hotel, restaurant, and medical facility); hospitals have been called the most complex institutions of the century. These systems cost $100 million for a 400-bed hospital, and hospitals have to buy them with little or no knowledge of how they work because of the restrictions on seeing and sharing information about the software. This runs against the idea of a free market because of the NDA you have to sign to see and use the software.
An example Luis gave us was Wake Forest hospital, which ended up in the red by $56 million, all because it bought software for $100 million, leading it to fire people, stop making retirement payments, and make other cuts. [For me this sounds a lot like what libraries are doing: paying for an expensive ILS instead of saving money on the ILS and putting it toward people and services.]
Another problem in the medical industry is that only 41% of hospitals (less than half) have the capability to send secure messages to patients. This is not a technology problem; it is a cultural problem in the medical world. Other industries have already solved this technology problem.
So, why do we care about all of this? There are 5,723 hospitals in the US, 211 of them are federally run (typically military hospitals), 413 are psychiatric, 2,894 are non profits and the others are private or state run. That totals nearly 1 million beds and $830 billion a year is spent in hospitals. The software that these hospitals are buying costs about $250 billion.
The federal hospitals run a system called VistA that was released into the public domain. OSEHRA was founded to protect this software. VistA is written in MUMPS, the same language that the $100 million software is written in, yet there is a huge difference in price.
If hospitals switched, they’d spend $0 on the software itself. To keep it running and updated we’d need about 20 thousand developers, but divided across all the hospitals that’s only about 4 developers per hospital. These developers don’t need to be programmers, though: they could be doctors, nurses, or pharmacists, because MUMPS is so easy to learn.
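The hospital numbers can be checked the same way. Using only the figures quoted in the talk, the hospitals not covered by the federal, psychiatric, or nonprofit categories come to 2,205, and 20,000 developers spread over 5,723 hospitals works out to about 3.5 per hospital, so the figure of 4 is a generous rounding.

```python
# Back-of-the-envelope check using the hospital figures quoted in the talk
total_hospitals = 5723
federal, psychiatric, nonprofit = 211, 413, 2894
developers_needed = 20_000

other = total_hospitals - federal - psychiatric - nonprofit  # private or state run
per_hospital = developers_needed / total_hospitals

print(f"Private or state-run hospitals: {other}")
print(f"Developers per hospital: {per_hospital:.1f}")
```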
Dean B. Krafft and Jon Corson-Rikert of Cornell University Library will present “Linked Data for Libraries: How libraries can make use of Linked Open Data to share information about library resources and to improve discovery, access, and understanding for library users”
Francis Kayiwa of Kayiwa Consulting will present “Learn Python by Playing with Library Data”
Networking opportunities, a major advantage of a smaller conference, are an important part of the Forum. Take advantage of the Thursday evening reception and sponsor showcase, the Thursday game night, the Friday networking dinners or Kitchen Table Conversations, plus meals and breaks throughout the Forum to get to know LITA leaders, Forum speakers, sponsors, and peers.
Library and Information Technology Association (LITA) members are information technology professionals dedicated to educating, serving, and reaching out to the entire library and information community. LITA is a division of the American Library Association.
Ever wonder what another institution's Islandora deployment looks like in detail? Look no further: York and Ryerson have shared their deployments with the community on GitHub, including details such as software versions, general settings, XACML policies, and Drupal modules. If you would like to share your deployment, please contact Nick Ruest so he can add you as a collaborator on the repo.
Erica Stanley was up next to talk to us about Open Source and the Internet of Things (IoT).
The Internet of Things (Connected Devices) is the connection of things and people over a network. Why the Internet of Things? Why now? Because technology has made it a possibility. Why open source Internet of Things? To ensure that innovation continues.
Some of the applications we have for connected devices are health/fitness, home/environment, and identity. Having devices that are always connected to us allows us to do things like monitor our health, so that we can see when something might be wrong before we feel symptoms. Examples include vision-related devices (Google Glass), smart watches, wearable cameras, wristbands (Fitbit), smart home devices (some of which are on my wishlist), connected cars (cars that can tell that the car in front of you has stopped rather than slowed down), and smart cities like Raleigh.
There are many networking technologies these devices can use to stay connected, but Bluetooth seems to be the default. There is a central device and a peripheral device: the central device wants the data that the peripheral device has. They use Bluetooth to communicate with each other, with the central device requesting info from the peripheral.
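The central/peripheral relationship can be sketched as a toy, in-process model. This is not a Bluetooth API: the classes and the “heart_rate” and “steps” names are hypothetical stand-ins. In real Bluetooth Low Energy, a peripheral exposes characteristics that a connected central reads, and the sketch only mirrors that request/response direction.

```python
"""Toy model of the central/peripheral pattern described above.

This is NOT a Bluetooth API: the classes and the "heart_rate" characteristic
name are hypothetical stand-ins for how a central reads a peripheral's data.
"""


class Peripheral:
    """A device (e.g. a wristband) that holds data others want."""

    def __init__(self, name, readings):
        self.name = name
        self._characteristics = dict(readings)

    def read(self, characteristic):
        # Respond to a central's read request for one characteristic
        return self._characteristics[characteristic]


class Central:
    """A device (e.g. a phone) that requests data from peripherals."""

    def __init__(self):
        self.connected = []

    def connect(self, peripheral):
        self.connected.append(peripheral)

    def poll(self, characteristic):
        # Ask every connected peripheral for the same characteristic
        return {p.name: p.read(characteristic) for p in self.connected}


phone = Central()
band = Peripheral("wristband", {"heart_rate": 72, "steps": 4300})
phone.connect(band)
print(phone.poll("heart_rate"))  # {'wristband': 72}
```

The key design point is the direction of the exchange: the peripheral only answers, and the central decides what to ask for and when.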
Cloud computing, another important technology, has been one of the foundations of the Internet of Things: this is how we store all the info we’re passing back and forth. As our devices gain more ability to learn, we get more devices that can act on the data they gather (for example, there is a fitness app/device that will encourage you to get up and move once in a while).
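A device acting on its own data can be as simple as a rule over recent readings. The following minimal sketch assumes an inactivity nudge based on hourly step counts; the threshold, window, and function name are hypothetical, and real fitness devices use their own (often proprietary) rules.

```python
"""Sketch of a device acting on the data it gathers: an inactivity nudge.

The threshold, step counts, and function name are hypothetical; real fitness
devices use their own (often proprietary) rules.
"""


def should_nudge(hourly_steps, min_steps=250, idle_hours=2):
    """Return True if the last `idle_hours` hours all fell below `min_steps`."""
    recent = hourly_steps[-idle_hours:]
    return len(recent) == idle_hours and all(s < min_steps for s in recent)


print(should_nudge([900, 120, 80]))   # True: two quiet hours in a row
print(should_nudge([900, 120, 600]))  # False: the wearer moved recently
```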
Yet another important technology is augmented reality, which shows us the results of data in our day-to-day lives (Google Glass showing you directions to where you’re walking).
One challenge facing us is the fact that we have devices living in silos. So we have Google devices and Samsung devices – but they don’t talk to each other. We need to move towards a platform for connected devices. This will allow us to have a user controlled and created environment – where the devices I want to talk to each other can and the people I want to see the data can see the data. This allows us to personalize our environment but also secure our environment.
Speaking of security, there are some guidelines for developers that we can all follow to be sure to create secure devices. When building these devices we want to think about security from the very beginning. We need to understand our vulnerabilities and build security from the ground up. This starts with the OS, so that we’re building an end-to-end solution. Obviously you want to be proactive in testing your apps and use updated APIs/frameworks/protocols.
Some tools you can use to get started as far as hardware: Arduino-compatible devices (Lilypad, Adafruit Flora and Gemma), Tessel, and Metawear. Software tools include: Spark Core, IoT Toolkit, Open.Sen.se, Cloud Foundry, Eclipse IoT Tools, and Huginn (which is kind of an open source IFTTT).
One thing to keep in mind when designing for IoT is that we no longer own the foreground: we might not have a screen, or a full-sized screen. We also have to think about integration with other devices and discoverability of functionality if we don’t have a screen (gesture-based devices). Finally, we have to keep in mind low energy and computing power. On the product side you want to think about the form factor: you don’t want a device that no one will want to wear. This also means creating personalizable devices.
Remember that there is no ‘one size fits all’ – your device doesn’t have to be the same as others that are out there. Try to not get in the way of your user – build for people not technology! If we don’t try to take all of the user’s attention with the wearable then we’ll get more users.
Next up were Jason Hibbets and Gail Roper, who gave a talk about the open source initiative in Raleigh.
Gail started by saying ‘no one told us we had to be more open’. Instead there were signs that showed that this was a good way to go. In 2010 Forbes labeled Raleigh one of the most wired cities in the country, but what they really want is to be the most connected city in the country.
Raleigh has 3 initiatives: open source, open data, and open access. The city wants to get gigabit internet connections to every household. So far they have a contract with AT&T, and they are working with Google to see if Raleigh will become a Google Fiber city.
The timeline leading up to this, though, required a lot of community education about what open meant. It didn’t mean that they had been hiding things from the community before. Instead, they had to teach people about open source and open access. The government held common stereotypes about open source, among them the image of a developer in his basement.
Why did they do this? Why do they want to be an open city? Because of SMAC (Social, Mobile, Analytics, Cloud). Today’s citizens expect that anywhere on any device they should be able to connect to the web. Government organizations like Raleigh’s will have 100x the data to manage. So providing a government that is collaborative and connected to the community becomes a necessity not an option.
“Empowerment of individuals is a key part of what makes open source work, since in the end, innovations tend to come from small groups, not from large, structured efforts.” -Tim O’Reilly
Next up was Jason Hibbets, who is the team lead on opensource.com by day and by night supports the open Raleigh project. Jason shared with us how he helped make the open Raleigh vision a reality. He is not a coder, but he is a community manager. Government to him is about more than putting taxes in and getting services out; it’s about us, the members of the community.
Jason discovered CityCamp, a government unconference that brings together local citizens to build stronger communities where they live. These camps have allowed people to come together to share their ideas openly. Along the way the organizers of this local CityCamp became members of Code for America. Using many online tools, they have made it easy to communicate with their local brigade and with others around the state. There is also a meetup group if you’re in the area. If you’re not local, you can join a brigade in your area or start your own!
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
Some of you may already know about my “MARC Usage in WorldCat” project, where I simply expose the contents of a number of MARC subfields in ordered lists of strings. The point, as I state on the site itself, is to expose “which elements and subfields have actually been used, and more importantly, how? This work seeks to use evidence of usage, as depicted in the largest aggregation of library data in the world — WorldCat — to inform decisions about where we go from here.”
One aspect of this is the quality, or lack thereof, of the actual data recorded. As an aggregator, we see it all. We see the typos, the added punctuation where none should be. We see the made up elements and subfields (yes, made up). We see data that is clearly in the completely wrong place in the record (what were they thinking?). We see it all.
So this week when I received a request for a specific report, as sometimes happens, I was happy to comply. The correspondent wanted to see the contents of the 775 $e subfield, which, according to the documentation should only have a “language code”. Catalogers know that you can’t make these up, they must come from the Library of Congress’ MARC Code List for Languages.
Sounds simple, right? If you encode a language in the 775 $e, it must come from that list. But that doesn’t prevent catalogers from embellishing (see all the variations for “eng” below and the number of times they were found; this does not include variations like “anglais”). Why not add punctuation? Or additional information, such as “bilingual”? I’ll tell you why not. Because it renders the data increasingly unusable without normalization.
And normalization comes at a cost. Easy normalization, such as removing punctuation, is straightforward. But at some point the easiest thing to do is to simply throw it away. If a string only occurs once, how important can it be?
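As a rough illustration of that trade-off, here is a minimal Python sketch, assuming a handful of made-up 775 $e values and a tiny stand-in for the Library of Congress code list (a real system would load the full MARC Code List for Languages). It lowercases each value, pulls out alphabetic tokens, and keeps the first token that is a valid code, so added punctuation and embellishments like “bilingual” are stripped, while values like “anglais” that cannot be mapped fall out as `None`: the “throw it away” case. This is not OCLC’s actual process.

```python
"""Sketch of normalizing 775 $e values before use.

The sample values and the tiny code list are hypothetical; a real system would
load the full MARC Code List for Languages from the Library of Congress.
"""
import re
from collections import Counter

# Hypothetical subset of valid MARC language codes
VALID_CODES = {"eng", "fre", "spa", "ger"}


def normalize(value):
    """Lowercase, split into alphabetic tokens, return the first valid code or None."""
    tokens = re.findall(r"[a-z]+", value.lower())
    for token in tokens:
        if token in VALID_CODES:
            return token
    return None  # nothing recognizable: a candidate for throwing away


# Hypothetical raw subfield values as they might appear in records
raw = ["eng", "eng.", "(eng)", "ENG", "eng bilingual", "eng;", "anglais", "xyz"]
counts = Counter(normalize(v) for v in raw)
print(counts)
```

Taking the first valid token is deliberately crude: a value that legitimately names two languages, or an unrelated word that happens to match a code, would be mishandled. Each smarter rule adds cost, which is exactly the choice the paragraph describes.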
As we move into a more fully machine-supported world for library metadata we will be facing more of these choices. Some will be harder than others. If you don’t believe me, just check out what we have to do with dates.
Arfon Smith from GitHub was up to talk to us about academia and open source.
Arfon started with an example of a shared research proposal. You create a document and then edit the filename with each iteration, because word processing applications are not good at tracking changes and allowing collaboration. Git, though, is meant for this very thing. So he showed us a book example on GitHub where the collaborators worked together on a document.
In open source there is this ubiquitous culture of reuse. Academia doesn’t do this – but why not? The problem is the publishing requirement in academia. The first problem is that ‘Novel’ results are preferred. You’re incentivized to publish new things to move ahead. The second problem is that the value of your citation is more powerful than the number of people you’ve worked with. And thirdly, and more generally, the format sucks. Even if it’s an electronic document it’s still hard to collaborate on it (see the document example above). This is state of the art technology … for the late 17th century. (Reinventing Discovery).
So, what do open source collaborations do well? There is sometimes a difference between open source and open source collaborations, and this is an important distinction. Open source is the right to modify; it’s not the right to contribute back. Open source collaborations are highly collaborative development processes that allow anyone to contribute if they show an interest. This brings us back to the ubiquitous culture of reuse. These collaborations also expose the process by which they work together, unlike the current black box of research in academia.
How do we get 4000 people to work together, then? Using Git, and GitHub specifically, you can fork the code from an existing project and work on it without breaking other people’s work, and then, when you want to contribute it back, you submit a pull request to the project. The beauty of this is ‘code first, permission later’, and every time this process happens the community learns.
The goal of a contribution on GitHub is to get it merged into the product. Not all open source projects are receptive to these pull requests, though, so those are not the collaborative types of projects.
Fernando Perez: “open source is .. reproducible by necessity.” If these projects weren’t collaborative they wouldn’t move forward, so they need to be collaborative. The difference in academia is that you have to work alone and in a closed fashion to move ahead and get recognition.
Open can mean within your team or institution; it doesn’t have to be worldwide like in open source. But making your content electronic and available (which does not mean a Word doc or email) makes working together easier. Academia can learn from open source; more importantly, academia must learn from open source to move forward.
All the above seems kind of negative, but Arfon did show us a lot of examples where people are sharing in academia; we just need to make this more widespread. Where might more significant change happen? The most obvious place to look is where communities form, like around a shared challenge or around shared data. Science and big data are hopefully where we’re going to see more of this.
There are still challenges, though, so how do we make sharing the norm? The main problem is that academia rewards ‘credit’, meaning articles written solely by you. Astropy, for example, is hugely successful on GitHub, but the authors had to write a paper about it to get credit. The other issue is trust: academics are reluctant to use other people’s work because they don’t know if it is of value. In open source we have already solved this problem: if a package has been downloaded thousands of times, it’s probably reliable. There are also tools like Code Climate that give your code a grade.
If you are reading this, I’m guessing that you too are a student, researcher, innovator, an everyday citizen with questions to answer, or just a friend to Open Knowledge. You may be doing incredible work and are writing a manuscript or presentation, or just have a burning desire to know everything about anything. In this case I know that you are also denied access to the research you need, not least because of paywalls blocking access to the knowledge you seek. This happens to me too, all the time, but we can do better. This is why we started the Open Access Button, for all the people around the world who deserve to see and use more research results than they can today.
Yesterday we released the new Open Access Button at a launch event in London; you can download it from openaccessbutton.org. The next time you’re asked to pay to access academic research, push the Open Access Button on your phone or on the web. The Open Access Button will search the web for a version of the paper that you can access.
If you get your research, you can make progress with your work. If you don’t, your story will be used to help change the publishing system so it doesn’t happen again. The tool seeks to help users get the research they need immediately, or adds unavailable papers to a wish list that we can start working on. The apps work by harnessing the power of search engines, research repositories, automatic contact with authors, and other strategies to track down the papers that are available and present them to the user, even on a mobile device.
The London launch led other events showcasing the Open Access Button throughout the week, in Europe, Asia and the Middle East. Notably, the new Open Access Button was previewed at the World Bank Headquarters in Washington D.C. as part of the International Open Access Week kickoff event. During the launch yesterday, we reached at least 1.3 million people on social media alone. The new apps build upon a successful beta released last November that attracted thousands of users from across the world and drew lots of media attention. These could not have been built without a dedicated volunteer team of students and young researchers, and the invaluable help of a borderless community responsible for designing, building and funding the development.
Alongside supporting users, we will start using the data and the stories collected by the Button to help make the changes required to really solve this issue. We’ll be running campaigns and supporting grassroots advocates with this at openaccessbutton.org/action, as well as building a dedicated data platform for advocates to use our data.
If you go there now, you can see the map, ready to be filled, and your first action: signing our first petition, in support of Diego Gomez, a student who faces 8 years in prison and a huge monetary fine for doing something citizens do every day: sharing research online for those who cannot access it.
If you too want to contribute to these goals and advance your research, these are exciting opportunities to make a difference. So install the Open Access Button (it’s quick and easy!), give it a push, click or tap when you’re denied access to research, and let’s work together to fix this problem.
The Open Access Button is available now at openaccessbutton.org.
Robb Hamilton and Greg Sheremeta from Red Hat spoke in this session about Bootstrap.
First up was Robb to talk about the problem. The problem that they had at Red Hat was that they had a bunch of products that all had their own different UI. They decided that as you went from product to product there should be a common UI. PatternFly was the initiative to make that happen.
PatternFly is basically Bootstrap + extra goodness.
Up next was Greg to talk about using PatternFly on his project, oVirt. First, when you have to work with multiple groups/products, you need good communication. The UI team was very easy to reach out to, answering questions in IRC immediately and providing good documentation. One major challenge that Greg ran into was having to write the application in a server-side language and then get it to translate to the web languages that PatternFly was using.
Greg’s favorite quote: “All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections” – David Wheeler. So he needed to come up with a layer of indirection to get from his language to Bootstrap. He Googled the problem, though, and found a library that would work for him.
Dwight Merriman from MongoDB was up next to talk to us about modern applications and data.
We’re not building the same things that we were before; we’re building a whole new class of applications that just didn’t exist before. When creating an app these days you might use pieces from 12 other applications. If you had to do this with closed source software, it would be very difficult. Open source makes modern applications possible; otherwise you have 12 other things to go buy to make your application work.
We’re in the midst of the biggest technology change in the data layer in 25 years. We talk about big data and this is all part of it. One of the differences is the shape of the data. It’s not all tabular data anymore. The new tools we’re creating today are very good at handling these new shapes. Saying ‘unstructured data’ is inaccurate – it’s dynamic data – hence the word ‘shape’.
Speed is another aspect of this. Everything is real-time now; you don’t want to wait overnight for your report anymore. As developers, when we build systems we need to start with a real-time mentality. While this sounds logical, it’s actually a big change from the way we were taught, which was to do things in batches. These days computers are a lot faster, so if you can do it in real time, it’s a lot better.
We also need to think about our approach to writing code these days – this has changed a lot from how we were taught years ago. It’s not just about writing the perfect spec anymore, it’s a lot more collaboration with the customer. Iteration is necessary now – look at how Facebook changes a tiny bit every day.
Dwight then shared with us some real-world examples from John Deere, Bosch and Edeva. Edeva is doing some interesting things with traffic data. They have built a technology that detects your speed as you drive over one particular bridge in Sweden; if you’re going over the speed limit, it creates a speed bump specifically for you. That’s just one way they’re putting their data to use in a real-life scenario.
“There’s new stuff to do in all domains – in all fields – and we have the tools to do them now.”
Jeffrey Hammond from Forrester Research started this morning with a talk about Open Source – The Key Component of Modern Applications. Jeffrey wants to talk to us about why open source matters. It’s the golden age to be a developer. If you have people who work for you who are developers you need to understand what’s going on in our space right now. The industry is changing drastically.
When you started a software company years ago it would cost $5 to $10 million. Today software innovation costs about 90% less than it used to. This is because of a variety of things including: elastic infrastructure, services that we can call upon, managed APIs, open source software, and a focus on measurable feedback. Open source is one of the key parts of this. It is one of the driving forces of modern application development. In 2014, 4 out of 5 developers use or have used open source software to develop or deploy their software.
The traits of modern applications show why we expect to see more and more open source software everywhere. One of those traits is the API. Another is asynchronous communication – a lot of the traditional frameworks that developers are used to using are not conducive to this so we’re seeing new frameworks and these are open source. We’re seeing less and less comparison of open source versus proprietary and more open source compared to open source.
Jeff showed us Netflix’s engagement platform and how every part of their system is built on open source. Most of the popular tools out there have this same architecture built on open source.
This development is being driven by open source communities, which Jeff calls collaborative collectives. Those of us looking to hire developers need to restructure to use the power of these collectives.
When asked if they write code on their own time, 70% of developers say they do. That desire is built on a variety of motives, and all of those motives represent intrinsic motivation: it makes them feel good. Of those developers, a little over 1 in 4 contribute to open source projects on their own time. So, if you’re looking to hire productive developers, Jeff says there is a direct correlation between participating in open source and being an amazing, productive programmer.
I’d add here that we need to educate the next generation in this model better so that they can get jobs when they graduate.
We are in a generational technology shift: web-based applications are very different from the systems that have come before them. The elasticity of open source licenses makes them the perfect fit for these new modern architectures and comes naturally to most developers. Open source projects are driving the formation of groups of people who know how to work collaboratively successfully.
I am a user of technology much more than a creator. After I completed a master’s in educational technology, I knew that to better use the skills I had learned, it would benefit me to gain a better understanding of computer coding. My HTML skills were adequate but rusty, and I didn’t have any experience with other languages. I really did not want to take another for-credit course to build these skills, but I also knew that I would have a better learning experience if I had someone I could ask questions. Around this time, I became aware of Girl Develop It. I have attended a few meetings and truly appreciate the instruction and the opportunity to learn new skills. To introduce Girl Develop It to LITA blog readers who might be interested in adding to their skill set, I interviewed Michelle Brush and Denisse Osorio de Large, the leaders of my local Girl Develop It group.
What is Girl Develop It?
MB: Girl Develop It is a national nonprofit organization dedicated to bringing more women into technology by offering educational and network-building opportunities.
DL: Girl Develop It is a nonprofit organization that exists to provide affordable and accessible programs to women who want to learn web and software development through mentorship and hands-on instruction.
What sparked your interest in leading a Girl Develop It group?
MB: I attended Strange Loop where Jen Myers spoke and mentioned her involvement in Girl Develop It. Then several friends reached out to me about wanting to do more for women in tech in Kansas City, so we decided to propose a chapter in Kansas City.
DL: Growing up, my mom told me my inheritance was my education, and that my education was something no one would ever be able to take away from me. My education has allowed me to have a plentiful life; I wanted to pay it forward, and this organization allowed me to do just that. I’m also the proud mom of two little girls and I want to be a good example for them.
What is your favorite thing about working in the technology industry?
MB: Software can be like magic. You can build very useful and sometimes beautiful things from a pile of keywords and numbers. It’s also very challenging, so you get the same joy when your code works that you do when solving a really hard math problem.
DL: I love the idea of helping to create things that don’t exist and solving problems that no one else has solved. The thought of making things better drives me.
Why do you believe more women should be working in information technology?
MB: If we can get women involved at the same percentages as we have men, we would solve our skills gap. It also helps that women bring a different perspective to the work.
DL: The industry as a whole will benefit from the perspective of a more diverse workforce. Also, this industry has the ability to provide a safe and stable environment where females can thrive and make a good living.
Are there other ways communities can be supportive of women entering the information technology industry?
MB: We need more visibility to the women already in the industry as that will make other women recognize they can be successful in the community as well. Partly it’s on women like me to seek out opportunities to be more visible, but it’s also on the community to remember to look outside of the usual suspects when looking for speakers, mentors, etc. It’s too easy to keep returning to the names you already know. Conferences like Strange Loop and Midwest.io are making strides in this area.
DL: I believe it starts with young girls and encouraging and nurturing their interest in STEM. It is very important that members of the community provide opportunities for girls to find their passion in the field of their choice.
Are any of you reading the LITA blog involved with Girl Develop It? I’d love to hear your stories!
Community is often thought of as a soft topic. In reality, being part of a community (or more than one!) is an admirable and wonderful effort, very fun but also sometimes tough, and building and mobilising community action requires expertise and an understanding of both tools and crowds. All relationships between the stakeholders involved need to be planned with inclusivity and sustainability in mind.
This year Mozilla Festival (London, October 24-26), an event we always find very inspiring to collaborate with, will feature a track focusing on all this and more. Called Community Building, and co-wrangled by me and Bekka Kahn (P2PU / Open Coalition), the track has the ambitious aim to tell the story about this powerful and groundbreaking system, create the space where both newcomers and experienced community members can meet, share knowledge, learn from each other, get inspired and leave the festival feeling empowered and equipped with a plan for their next action, of any size and shape, to fuel the values they believe in.
We believe that collaboration between communities is what can really fuel the future of the Open Web movement, and we have put this belief into practice, from our curatorship structure (we come from different organisations and are loving the chance to work together closely for the occasion) to the planning of the track’s programme, which combines great ideas sent through the festival’s Call for Proposals with invitations we made to folks we knew could blow people’s minds with 60 minutes and a box of paper and markers at their disposal.
The track has two narrative arcs connecting all its elements: one focusing on the topics each session will unpack, from gathering to organising and mobilising community power, and one aiming to embrace all the learnings from the track to empower all of us, as members of communities, to take action for change.
The track will feature participatory sessions (there’s no projector in sight!), an ongoing wall-space action and a handbook writing sprint. In addition, some wonderful allies (Webmaker Mentors, Mozilla Reps and the Space Wranglers team) will help us make one question resonate all around the festival during the whole weekend: “What’s the next action, of any kind/ size/ location, you plan to take for the Open Web movement?” Participants in our track, passers-by feeding our wall action, and folks talking with our allies will be encouraged to think about the answer and, if not before, to join our space for our Closing Circle on Sunday afternoon, when we’ll all share with each other our plans for the next step, local or global, online or offline, that we want to take.
Furthermore, we also invite folks who won’t be able to join us at the event to get in touch with us, learn more about what we’re making, and collaborate with us if they wish. Events can be an exclusive affair (attending them requires time and funds), and we want to try to overcome this obstacle. Anyone will be welcome to connect with us in (at least) three ways. We’ll have a dedicated hashtag to keep all online/remote Community conversations going: follow and engage with #MozFestCB on your social media platform of choice; we’ll record a curated version of the feed on our Storify. We’ll also collect all notes, resources and documentation of anything that happens in and around the track on our online home. Finally, the work to create a much-awaited Community Building Handbook will be kicked off at MozFest, and anyone who thinks they could enrich it with useful learnings is invited to join the writing effort, from anywhere in the world.
If you’d like to get a head start on MozFest this year and spend some time with other open knowledge community minded folks, please join our community meetup on Friday evening in London.
This article discusses the changing nature of animated Graphics Interchange Format images (GIFs) as a form of visual communication on the Web, and how they can be adapted for the purposes of information literacy and library instruction. GIFs can be displayed simultaneously as a sequence of comic-book-like panels, allowing for a bird’s-eye view of all the steps of a process, so that steps can be viewed and reviewed as needed without having to rewind or replay an entire video. I discuss tools and practical considerations as well as limitations and constraints.
Introduction and Background
Animated GIFs are “a series of GIF files saved as one large file. Animated GIFs…provide short animations that typically repeat as long as the GIF is being displayed.” (High Definition) Animated GIFs were at one point one of the few options available for adding video-like elements to a web page. As web design aesthetics matured and digital video recording, editing, playback and bandwidth became more affordable and feasible, the animated GIF joined the blink tag and comic sans font as the gold, silver, and bronze medals for making a site look like it was ready to party like it’s 1999.
Even so, services like MySpace and fresh waves of web neophytes establishing a personal online space allowed the animated GIF to soldier on. Typically used purely for decoration without any particular function, and sometimes funny at first, then less so with each subsequent viewing (like bumper stickers), animated GIFs ranged from benign to prodigiously distracting, best exemplified by that rococo entity: the sparkly unicorn:1
To be fair, some sites used animated GIFs with specific purposes, such as an early version of an American Sign Language site that used animated GIFs to demonstrate signing of individual words.2 As the web continued to evolve and function began to catch up with form, the animated GIF began to fade from the scene, especially with the advent of comparably fast-loading and high-resolution streaming video formats such as QuickTime and RealVideo. Flash, in conjunction with the rise of YouTube, established a de facto standard for video on the web for a time. In turn, with the ongoing adoption of HTML5 standards and the meteoric rise of mobile devices and their particular needs with regards to video formats, the web content landscape continues to develop and change.
I had personally written off the animated GIF as a footnote in early web history, until the last few years when I noticed them cropping up again with regularity. My initial reaction was ‘great, I’m officially old enough to see the first wave of web retro nostalgia’, but I began to notice some differences: instead of being images that simply waved their arms for attention, this new generation of animated GIFs often sketched out some sort of narrative: telling a joke, or riffing on a meme, such as the following:
This example combines an existing visual meme as a ‘punchline’ to clips from scenes in two different movies (Everything is Illuminated and Lord of the Rings) that pivots on two points of commonality: Elijah Wood and potatoes. I should note that when I first created this GIF, it was in ‘stacked’ format, or one continuous GIF, to give the ‘punchline’ more impact, but I separated them here in keeping with the spirit of the article’s topic. In general, further thoughts and observations on the curious persistence and evolution of GIFs as a popular culture entity are discussed in this 2013 Wired article: The Animated GIF: Still Looping After All These Years.
Concepts and Rationale
At some point, an idea coalesced that a similar approach could be applied to instructional videos, specifically those supporting information literacy. Jokes and memes are, after all, stories of sorts, and information literacy instruction is too.
One initial attraction to exploring the use of animated GIFs was as an alternative to video. Given a choice between a video, even a short one, and some other medium such as a series of captioned images or simple text, in most cases I will opt for the latter, especially if the subject matter demonstrates or explains how to do something. Some of this is merely personal preference, but I suspected others had the same inclination. In fact, a study by Mestre that compared the effectiveness of video vs. static images used for library tutorials indicated that participants were disinclined to take the time to view instruction in video form. One participant comment in particular was interesting: “I think that a video tutorial really is only needed if you want to teach the complex things, but if it’s to illustrate simple information you don’t need to do it. In this case, a regular web page with added images and multimedia is all you need” (266). Furthermore, only five of twenty-one participants indicated a preference for video over static image tutorials, and of those five, two “admitted that although they preferred the screencast tutorial, they would probably choose the static tutorial if they actually needed to figure out how to do something” (270). Not only did the study show that students prefer not to watch videos, but students with a variety of learning style preferences were better able to complete or replicate demonstrated tasks when tutorials used a sequence of static images as compared to screencast videos (260).
Some reflection on why yielded the following considerations.
Scope and scale: A group of pictures or block of text gives immediate feedback on how much information is being conveyed. The length of a video will give some indication of this, but at a greater level of abstraction.
Sequence: Pictures and text have natural break points between steps of a process: the next picture, or a new paragraph or bullet point. This allows one to jump back to review an earlier step in the process, then move forward again in a way that is not disruptive to a train of thought. This is more difficult to do in video, especially if appropriate scene junctures are not built in with attendant navigation tools such as a clickable table of contents/scene list (i.e., you have to rewatch the video from the beginning to see step 3 again, or have a deft touch on the rewind/scrub bar). The Mestre study suggested that being able to quickly jump back to or review prior steps was important to participants (265).
Seeing the forest and the trees: This involves the concept of closure as described by Scott McCloud in Understanding Comics: “the…phenomenon of observing the parts but perceiving the whole”(63). Judicious choice and arrangement of sequences can allow one to see both the individual steps of a process and get a sense of an overall concept in less physical and temporal space than either a video or a series of static images. The main challenge in applying this concept is determining natural breaking points in a process, analogous to structuring scenes and transitions in a video or deciding on panel layout and what happens ‘off-screen’ between panels. Does the sequence of GIFs need to be a video that is chopped into as many parts as there are steps, or are there logical groupings that can be combined in each GIF?
Static and dynamic: This is where the animation factor comes into play. A series of animated GIFs allows for incorporating both the sequencing and closure components described above, while retaining some of the dynamic element of video. The static component involves several GIFs being displayed at once. This can be helpful for a multistep process where each step depends on properly executing the one before it, such as tying a bowtie. If you’re in the middle of one step, you can take in, at a glance, the previous or next step rather than waiting for the whole sequence to re-play. Depending on the complexity of the task, the simplification afforded by using several images compared to one can be subtle, but an analogy might be that it can make a task like hopping into an already spinning jump rope more like stepping onto an escalator—both tasks are daunting, but the latter markedly less so. The dynamic component involves how long and how much movement each image should include. A single or too few images, and you might as well stick with a video. Too many images and the process gets lost in a confusing array of too much information.
Using animated GIFs can also leverage existing content or tutorials: a sequence of GIFs can be generated from existing video tutorials. Conversely, the process of producing an efficient series of GIFs can also function as a storyboarding technique for making videos more concise and efficient. Alternatively, with appropriate annotation, selected individual frames of an animated GIF can be adapted into a series of static images for online use or physical handouts.
Animated GIFs might also be explored as an alternative instructional medium where technological limitations are a consideration. For example, the area served by my library has a significant population that does not have access to broadband, and media that is downloaded and cached or saved locally might be more practical than streaming media. In terms of web technology, animated GIFs have been around a long time, but by the same token they are stable and widely supported and can be employed without special plugins or browser extensions. Once downloaded, they may be viewed repeatedly without any further downloading or buffering.
Applications, Practical Considerations, and Tools
In the section below I’ll discuss two specific examples I created of brief library tutorials using animated GIFs. The raw materials for creating the GIFs consisted of video footage recorded on an iPhone, video screen capture, and still images.
The first example of using this format is at http://www2.semo.edu/ksuhr/renew-examples.html. This page features four variants of instructions for renewing a book online. To some extent, the versions represent different approaches to implementing the concept, but probably more poignantly represent the process of trial and error in finding a workable approach. Notice: if the ‘different cloned versions of Ripley’ scene from Alien Resurrection3 disturbed you, you might want to proceed with caution (mostly kidding, mostly). I tried different sizes, arrangements and numbers of images. For the specific purpose here, three images seemed to strike a good balance between cramming too many steps into one segment and blinking visual overload.
The second example, at http://www2.semo.edu/ksuhr/renew-examples.html#findbook, sticks with the three-image approach for demonstrating how to track a call number from the catalog to a physical shelf location. The images produced in this example were very large, as much as 6MB. It is possible to shrink the file size by reducing the overall image size or optimizing the animated GIF. The optimized version is below the original. There is a distinct loss of image quality, but the critical information still seems to be retained; the text can still be read and the video is serviceable, although it has a certain ‘this is the scary part’ quality to it.
Creation of the two examples above revealed an assortment of practical considerations for and constraints of the animated GIF format. Animated GIF file sizes aren’t inherently smaller than video, especially streaming video. One advantage the animated GIF format has, as mentioned above, is that aside from not needing special plugins or extensions, they can be set to loop after downloading with no further user intervention or downloading of data. This facilitates the use of a series of moving images that illustrate steps that happen in sequence and can be parsed back and forth as necessary. This also helps in breaking up a single large video sequence into chunks of manageable size.
Depending on the task at hand, the usefulness of the animation factor can range from clarifying steps that might be difficult to grasp in one long sequence of static images (the bowtie example) to simply adding some visual interest or sense of forward propulsion to the demonstration of a process (the climbing the stairs example).
For some topics, it’s a fine-line judgement call whether animated GIFs would add any clarity or whether a few thoughtfully annotated screenshots would serve. While looking at non-library examples, I found some demonstrations of variations on tying your shoes, illustrated either with static images or with a single GIF demonstrating all of the steps. I found one method learnable from the static images, and I now use it regularly and tie my shoes one or two times a day instead of ten or twenty. A second, more complex method was harder for me to grasp: between the complexity of the task, the number of images needed to illustrate the steps (which were displayed vertically, requiring scrolling to see them all), and the fact that it’s hard to scroll through images while holding shoelaces, I gave up. I also found it difficult to keep track of the steps with the single animated GIF. I can’t help but wonder if using several animated GIFs instead of one for the entire process might have tipped the balance there.
In terms of tools, there is a variety of software that can get the task done. The examples above, including the mashup of Everything is Illuminated / Lord of the Rings, were done using Camtasia Studio versions 4 and 8 (a newer version became available to me whilst writing this article). The GIF optimization was done with Jasc Animation Shop v.2, which has been around for at least fifteen years, but proved useful in reducing the file size of some of the example animated GIFs by nearly half.
Camtasia Studio is not terribly expensive, is available for Mac and Windows, and has some very useful annotation and production tools, but there are also freely-available programs that can be used to achieve similar results. A few Windows examples that I have personally used/tried:4
Screen capture: Jing and Hypercam.
Scene selection and excerpting: Free Video Slicer.
VLC is another option and is available on Mac and Linux as well. There is a Lifehacker article that details how to record a section of video.
Video to GIF conversion: Free Video to Gif Converter.
Captioning: Windows Movie Maker.
The captioning in Camtasia and Movie Maker is a nice feature, but it should be noted that conversion to GIF removes any ADA compliance functionality of closed captions. An alternative is to simply caption each animated GIF with HTML text under each image. An inference can be drawn from the Mestre study that a bit of daylight between the visual and the textual information might actually be beneficial (268).
Some cursory web searching indicates that there are a variety, yea—even a plethora, of additional tools available; web-based and standalone programs, freeware, shareware and commercial.
Discussion and Where Next
The example information literacy GIFs discussed above both deal with very straightforward processes that are very task-oriented. Initial impressions suggest that using animated GIFs for instruction would have a fairly narrow scope for usefulness, but within those parameters it could be a good alternative, or even the most effective approach. Areas for further exploration include using this approach for more abstract ideas, such as intellectual property issues, that could draw more upon the narrative power of sequential images. Conversely, animated GIFs could serve to illuminate even more specific library-related processes and tasks (e.g., how to use a photocopier or self-checkout station). Another unknown aspect is assessment and effectiveness. Since I assembled the examples used, I was naturally very familiar with the processes, and it would be helpful to have data on whether this is a useful or effective method from an end user’s perspective.
The Mestre study made a fairly strong case that static images were more effective than video for instruction in basic tasks and that the sequentiality of the images was an important component of that (260, 265, 270). One aspect that warrants further investigation is whether the dynamic aspects of animated GIFs would add to the advantage of a sequence of images, whether the movement would detract from the effectiveness of purely static images, or whether they would provide a ‘third way’ that draws on the strengths of the other two approaches to be even more effective than either.
In closing, I’d like to note that there is a peculiar gratification in finding a new application for a technology that’s been around at least as long as the Web itself. In reflecting on how the idea took shape, I find it interesting that it wasn’t a case of looking for a new way to deliver library instruction; rather, observing the use of a technology for unrelated purposes led to the recognition that it could be adapted to a particular library-related need. I suppose the main idea I’d really like to communicate here is, to put it simply: be open to old ideas surprising you with new possibilities.
I would like to acknowledge the peer reviewers for this article: Ellie Collier and Paul Pival, and the Publishing Editor Erin Dorney for their kind support, invaluable insights, and unflagging assistance in transforming ideas, notes and thoughts and first drafts into a fully realized article. Many thanks to you all!
“Animated GIF.” High Definition: A-Z Guide to Personal Technology. Boston: Houghton Mifflin, 2006. Credo Reference. Web. 13 Oct. 2014.
McCloud, Scott. Understanding Comics. New York: Paradox Press, 2000. Print.
Mestre, Lori S. “Student Preference for Tutorial Design: A Usability Study.” Reference Services Review 40.2 (2012): 258-76. ProQuest. Web. 26 Sep. 2014.
In one scene in Alien Resurrection, a cloned version of the main character discovers several ‘rough draft’ clones of herself, gruesomely malformed and existing in suffering.
As a side note, I’m simply listing them, rather than providing direct links, erring on the side of caution on security matters, but I have personally downloaded and used all of the above with no issues that I’m aware of. They are all also easily findable via a web search.
This week I, and a group of development data experts from around the world, met for three days in a small farmhouse in the Netherlands to produce a book on Responsible Development Data. Today, we’re very happy to launch the first version: comments and feedback are really welcome, and please feel free to share, remix, and re-use the content.
This book is offered as a first attempt to understand what responsible data means in the context of international development programming. We have taken a broad view of development, opting not to be prescriptive about who the perfect “target audience” for this effort is within the space. We also anticipate that some of the methods and lessons here may have resonance for related fields and practitioners.
The contributors to this book bring together decades of experience in international development. Our first-hand experiences of horrific misuse of data within the sector, combined with anecdotal stories of the (mis)treatment and use of data having catastrophic effects on some of the world’s most vulnerable communities, have highlighted for us the need for a book on how we can all handle data in a responsible and respectful way.
Why this book?
Perhaps it was an uneasy sense that the hype about a data revolution is overlooking both the rights of the people we’re seeking to help and the potential for harm that accompanies data and technology in development contexts. The authors of this book believe that responsibility and ethics are integral to the handling of development data, and that as we continue to use data in new, powerful and innovative ways, we have a moral obligation to do so responsibly and without causing or facilitating harm. At the same time, we are keenly aware that actually implementing responsible data practices involves navigating a very complex and fast-evolving minefield, one in which most practitioners, fieldworkers, project designers and technologists have little expertise. Yet.
We could have written another white paper that only we would read, or organised another conference that people would forget about. Instead, we tried to pool our collective expertise and concerns to produce a practical guide that would help our peers and the wider development community think through these issues. With the support of Hivos, Book Sprints and the engine room, this book was produced collaboratively in just three days at the Bietenhaven farm, 40 minutes outside Amsterdam.
The team: Kristin Antin (the engine room), Rory Byrne (Security First), Tin Geber (the engine room), Sacha van Geffen (Greenhost), Julia Hoffmann (Hivos), Malavika Jayaram (Berkman Center for Internet & Society, Harvard), Maliha Khan (Oxfam US), Tania Lee (International Rescue Committee), Zara Rahman (Open Knowledge), Crystal Simeoni (Hivos), Friedhelm Weinberg (Huridocs), and Christopher Wilson (the engine room), facilitated by Barbara Rühling of Book Sprints.