Imagine a world in which everyone’s life story can be made preservable and accessible.
The myKive project seeks to provide people with a flexible set of tools to preserve, aggregate, and make accessible the digital legacies that are currently trapped in “the cloud” or in other locations that make them difficult to access, preserve, and use. By aligning the work of several innovative projects into a broader software environment and archival framework, myKive seeks to enable the development of the trusting partnerships that underlie successful archival acquisitions, and ultimately to build a storehouse of personal digital archives that are open to research, interpretation, analysis, and innovation.
Intrigued? Read more and comment at https://newschallenge.org/challenge/libraries/submissions/mykive-helping-archives
Kim Martin spoke to us about her dream: a public digital humanities centre. The digital humanities’ laboratory is the library. Researchers do their work in libraries, but what happens when that space changes and becomes digital? She began looking at and interviewing people to learn how they are being affected, and what’s going on in […]
If a machine could read, what would change? That is, if a computer could ingest an article, essay, or book, and then answer questions about it. We’re not talking about Turing Test intelligence here, just the ability to summarize content within a context. What were the main points in the article? Was there any new information compared to previous articles on the subject? What was the theme of the book? How was tension built through the use of imagery? Suppose a machine could respond to those questions. You might still prefer the answer of a human, but the answers would be good enough that you would declare them useful and satisfactory. If a machine could read, what would change?
Washington, DC & BOSTON — The Institute of Museum and Library Services (IMLS) announced today a $999,485 grant to the Digital Public Library of America (DPLA) for a major expansion of its infrastructure. The Digital Public Library of America brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world. It strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science. DPLA aims to expand this crucial realm of openly available materials, and make those riches more easily discovered and more widely usable and used.
This IMLS award builds on a 2012 IMLS grant to DPLA. With the new funding, DPLA will pursue a major expansion of its service hubs network. The goal is to at least double the number of DPLA service hubs and to use IMLS support to encourage other funders to make DPLA service hubs available to all institutions in every state in the union.
DPLA service hubs are state, regional, or other collaborations that host, aggregate, or otherwise bring together digital objects from libraries, archives, museums, and other cultural heritage institutions. State and regional hubs agree to collect content that describes their local history, but also content about the US broadly and, when available, international topics. Each service hub offers its partners services that range from professional development, digitization, metadata creation or enhancement, to hosting or metadata aggregation. They may also provide community outreach programs to increase users’ awareness of digital content of local relevance.
“We are proud of IMLS’s support for this important step in the expansion of the Digital Public Library of America,” said IMLS Director Susan H. Hildreth. “IMLS is committed to helping make the rich holdings of America’s libraries, archives, and museums more accessible to all.”
“The Digital Public Library of America and its rapidly growing community are enormously grateful to IMLS for this support, which will allow us to capitalize on what we have learned in our first year to make substantial progress on a national digital platform in the years to come,” said Dan Cohen, DPLA’s Executive Director.
Since launching in April 2013, DPLA has aggregated nearly 8 million items from well over a thousand institutions. DPLA plans to make its services available to all collections-based institutions in every state in the U.S., and to make collections freely available to students, teachers, researchers, and the general public.
So far this year I've attended two talks that were really revelatory: Krste Asanović's keynote at FAST 13, which I blogged about earlier, and Kestutis Patiejunas' talk about Facebook's cold storage systems. Unfortunately, Kestutis' talk was off-the-record, so I couldn't blog about it at the time. But he just gave a shorter version at the Library of Congress' Designing Storage Architectures workshop, so now I can blog about this fascinating and important system. Below the fold, the details.
The initial response to Facebook's announcement of their prototype Blu-ray cold storage system focused on the 50-year life of the disks, but it turns out that this isn't the interesting part of the story. Facebook's problem is that they have a huge flow of data that is accessed rarely but needs to be kept for the long-term at the lowest possible cost. They need to add bottom tiers to their storage hierarchy to do this.
The first tier they added to the bottom of the hierarchy stored the data on mostly powered-down hard drives. Some time ago a technology called MAID (Massive Array of Idle Disks) was introduced but didn't succeed in the market. The idea was that by putting a large cache in front of the disk array, most of the drives could be spun down to reduce the average power draw. MAID did reduce the average power draw, at the cost of some delay from cache misses, but in practice the proportion of drives that were spun down wasn't as great as expected, so the average power reduction wasn't as much as hoped. And the worst case was about the same as a RAID, because the cache could be thrashed in a way that caused almost all the drives to be powered up.
Facebook's design is different. It is aimed at limiting the worst-case power draw. It exploits the fact that this storage is at the bottom of the storage hierarchy and can tolerate significant access latency. Disks are assigned to groups in equal numbers. One group of disks is spun up at a time in rotation, so the worst-case access latency is the time needed to cycle through all the disk groups. But the worst-case power draw is only that for a single group of disks and enough compute to handle a single group.
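The rotation described above can be pictured with a toy model (a hypothetical simplification; the real system's group sizes, timings, and queueing are not public). Only one group of disks is ever powered, so the worst-case power draw is bounded by a single group, and the worst-case read latency is one full cycle through the groups:

```python
from collections import deque

class ColdStorageRack:
    """Toy model of rotating spin-up groups: only one group of
    disks is powered at a time, bounding worst-case power draw."""

    def __init__(self, num_groups, disks_per_group):
        self.groups = list(range(num_groups))
        self.disks_per_group = disks_per_group  # all spin up together
        self.active = 0  # index of the currently spun-up group
        self.read_queue = {g: deque() for g in self.groups}

    def enqueue_read(self, group, request):
        # Reads wait until their group's turn in the rotation.
        self.read_queue[group].append(request)

    def rotate(self):
        """Spin down the active group, spin up the next one, and
        drain any reads queued for the newly active group."""
        self.active = (self.active + 1) % len(self.groups)
        served = list(self.read_queue[self.active])
        self.read_queue[self.active].clear()
        return served

rack = ColdStorageRack(num_groups=4, disks_per_group=100)
rack.enqueue_read(2, "blob-123")
served = []
for _ in range(4):  # worst case: a full cycle through all groups
    served += rack.rotate()
assert served == ["blob-123"]
```

The numbers here are invented; the point is only that latency is traded for a hard cap on how many disks can draw power at once.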
Why is this important? Because of the synergistic effects knowing the maximum power draw enables. The power supplies can be much smaller, and because the access time is not critical, need not be duplicated. Because Facebook builds entire data centers for cold storage, the data center needs much less power and cooling. It can be more like cheap warehouse space than expensive data center space. Aggregating these synergistic cost savings at data center scale leads to really significant savings.
Despite the rotation, this design has high performance where it matters to Facebook: write bandwidth. While a group of disks is spun up, any reads queued for that group are performed, but almost all the I/O operations in this design are writes. Writes are erasure-coded, and the shards are all written to different disks in the same group. In this way, while a group is spun up, all the disks in the group write simultaneously, providing huge write bandwidth. When the group is spun down, the disks in the next group take over, and the high write bandwidth is only briefly interrupted.
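A minimal sketch of striping an erasure-coded write across the active group, using a single XOR parity shard for illustration (Facebook's actual coding scheme and shard counts are not public; real systems use stronger codes such as Reed-Solomon):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode_write(blob: bytes, k: int):
    """Split blob into k equal data shards plus one XOR parity
    shard. Any single lost shard can be rebuilt from the others."""
    blob += b"\0" * ((-len(blob)) % k)  # pad to a multiple of k
    size = len(blob) // k
    shards = [blob[i * size:(i + 1) * size] for i in range(k)]
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards + [parity]

def recover(shards, missing: int) -> bytes:
    """Rebuild the shard at index `missing` by XORing the rest."""
    rest = [s for i, s in enumerate(shards) if i != missing]
    out = rest[0]
    for s in rest[1:]:
        out = xor_bytes(out, s)
    return out

shards = encode_write(b"cold data block", k=3)
# Each of the 4 shards would go to a different disk in the
# currently spun-up group, so all disks write in parallel.
assert recover(shards, missing=1) == shards[1]
```

Because each shard lands on a different disk, a write engages every disk in the group at once, which is where the high aggregate write bandwidth comes from.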
Next, below this layer of disk cold storage Facebook implemented the Blu-ray cold storage that drew such attention. It has 12 Blu-ray drives for an entire rack of cartridges holding 10,000 100GB Blu-ray disks managed by a robot. When the robot loads a group of 12 fresh Blu-ray disks into the drives, the appropriate amount of data to fill them is read from the currently active hard disk group and written to them. This scheduling of the writes allows for effective use of the limited write capacity of the Blu-ray drives. If the data are ever read, a specific group has to be loaded into the drives, interrupting the flow of writes, but this is a rare occurrence. Once all 10,000 disks in a rack have been written, the disks will be loaded for reads infrequently. Most of the time the entire Petabyte rack will sit there idle.
It is this careful, organized scheduling of the system's activities at data center scale that enables the synergistic cost reductions of cheap power and space. It is, or at least may be, true that the Blu-ray disks have a 50-year lifetime, but this isn't what matters. No-one expects the racks to sit in the data center for 50 years; at some point before then they will be obsoleted by some unknown new, much denser and more power-efficient cold storage medium (perhaps DNA).
After the off-the-record talk I was thinking about the synergistic effects that Facebook got from the hard limit the system provides on the total power consumption of the data center. This limit is fixed; the system schedules its activities to stay under it. I connected this idea to the Internet Archive's approach to operating their data center in the church in San Francisco without air conditioning. Most of the time, under the fog, San Francisco is quite cool, but there are occasional hot days. So the Internet Archive team built a scheduling system that, when the temperature rises, delays non-essential tasks. Since hot days are rare, these delays do not significantly reduce the system's throughput, although they can impact the latency of the non-essential tasks.
The interesting thing about renewable energy sources, such as solar and wind, is that these days their output can be predicted with considerable accuracy. If Facebook's scheduler could modulate the power limit dynamically, it could match the data center's demand to the power available from the solar panels or wind turbines powering it. This could enable, for example, off-the-grid cold storage data centers in the desert, eliminating some of the possible threats to the data.
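One way to picture such a renewable-aware scheduler (purely hypothetical; no such design has been published) is a loop that admits only as many spun-up disk groups as the forecast power budget allows, deferring work when generation is low:

```python
def groups_allowed(power_available_w: int, group_power_w: int,
                   base_load_w: int) -> int:
    """How many disk groups can be spun up under the current
    power budget, after subtracting fixed facility overhead."""
    spare = power_available_w - base_load_w
    return max(0, spare // group_power_w)

# Invented numbers: a solar output forecast for six time slots.
forecast_w = [0, 1500, 4000, 6000, 4000, 1000]
GROUP_W, BASE_W = 1200, 500

schedule = [groups_allowed(p, GROUP_W, BASE_W) for p in forecast_w]
# Work is deferred overnight and caught up at peak generation.
assert schedule == [0, 0, 2, 4, 2, 0]
```

Since cold-storage work is latency-tolerant by design, deferring it to sunny or windy hours costs throughput only if the backlog outgrows the peak capacity.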
Techcrunch is reporting that a group of librarians known as the "ALA ThinkTank" has acquired the assets of shuttered startup Readmill. The new owners will turn the website and apps into a "library books in the cloud" site for library patrons.
Over the weekend, Readmill announced that it had been "acqui-hired" by cloud storage startup Dropbox, and that its app and website would cease functioning on July 1. "Many challenges in the world of ebooks remain unsolved, and we failed to create a sustainable platform for reading," said Readmill founder Henrik Berggren in his farewell message to site members. "Failure to have a sustainable platform for reading really resonates with librarians," responded ThinkTank co-founder J.P. Porcaro. "It's a match made in heaven: devoted users, quixotic economics, and lots of books to distract the staff." Porcaro will serve as CEO of the new incarnation of Readmill.
New Readmill CEO J. P. Porcaro
The acquisition also solves a problem Porcaro had been wrestling with: how to spend the group's Bitcoin millions. Far from its present incarnation as a Social Enterprise/Facebook Group hybrid, ALA ThinkTank originated as a solution for housing destitute librarians from New Jersey during the twice-yearly conventions of the American Library Association. The group figured that by renting a house instead of renting hotel rooms, they could save money, learn from peers and throw great parties. The accompanying off-the-grid commerce in "assets" was never intended - it just sort of happened.
One of the librarians was friends with Penn State grad student Ross Ulbricht, who convinced the group to use Bitcoin for the purchase and sale of beer, pizza and "ebooks". "He kept talking about piracy and medieval trade routes" reported Porcaro, "We thought he was normal ... though in retrospect it was kinda weird when he asked about using hitmen to collect overdue book fines."
The 10,000 fold increase in the value of ThinkTank's Bitcoin account over the past four years caught almost everyone completely off guard. The parties, which in past years were low rent, jeans-and-cardigan affairs, have morphed into multi-story "party hearty" extravaganzas packed with hipster librarians body-pierced with bitcoin encrusted baubles and wearing precious-metal badge ribbons.
Porcaro expects that Readmill's usage will skyrocket with the new management. He thinks that ALA ThinkTank's heady mix of critical pedagogy, "weeding" advice, gaming makerspaces, drink-porn, management theory, gender angst and a whiff of scandal are sure to "make it happen" for the moribund social reading site, which has suffered from the general boringness of books.
ThinkTank members are already hard at work planning the transition. A 13-step procedure that will allow Readmill users to keep their books exactly as they are has been spec-ed out by one library vendor. "If you like your ebooks you can keep them" Porcaro assured me. "If you don't like them, we can send them to India for you. Or Lafourche, Louisiana, your choice."
The backlash against the new Readmill has already begun. "Library books in the cloud is the dumbest thing I've ever heard of. How will people know which bits are theirs, and which need to be returned? How will we do inter-library loan? What will happen if it rains?" complained one senior library director who declined to be identified. "How will we get our books returned then?" she asked. "I don't even know HOW to hire a hitman."
In a press release, Scott Turow, past president of the Authors Guild, expressed his horror at the idea of "library books in the cloud." "Once again, librarians are scheming to take food out of the mouths of authors emaciated by hunger. These poor authors are dying miserable deaths, knowing that their copyrighted works are being misused and unread in this way. Library books in a cloud of nerve gas, more like!"
The American Library Association, which is completely unaffiliated with ALA ThinkTank, has formed a committee to study the cloud library ebook phenomenon.
List ten books that have stayed with you in some way. Don’t take more than a few minutes, and don’t think too hard. It is not about the ‘right’ books or great works of literature, just ones that have affected you in some way.
Once you’ve listed your books, you are supposed to “tag” 10 people. I am not usually a big fan of these chain letter things, but I really enjoyed reading the lists that were posted, particularly when they involved commentary. When my college friend Cathy tagged me, in turn, I asked OCLC Research colleagues to contribute.
Earlier this month, the Facebook Data Science team posted an analysis of the “top” books from the meme. It was interesting to see how many of the books listed showed up on our lists but perhaps even more interesting to see the interests of our group reflected in some of the more unusual choices.
If you’d like to check out our lists, please read on. If you’d like to play, consider yourself tagged — leave your list of books below. And enjoy!
Laozi, Bi Wang, and Zhe Su. Laozi dao de jing. Not sure which edition I read, but this was the first Chinese book I read cover-to-cover, and it served as the basis of a discussion with a philosophy professor at TaiDa. Really opened my mind to a completely different way of thinking, and influences me still.
Hersey, John. The Wall. Read this as a teenager. My introduction to Holocaust fiction and inspired me to read far more about it.
Bosworth, Allan R. America’s Concentration Camps. Read as a teenager. My introduction to Japanese internment camps. One of the books that made me realize that the US has a number of dark periods in its history beyond what I had learned in school…
Sartre, Jean-Paul. Huis clos. The first French book I read cover-to-cover, again as a teenager. It was a time when “L’enfer, c’est les autres” really resonated!
Polo, Marco, William Marsden, and Jon Corbino. The Travels of Marco Polo, the Venetian. Not sure which edition I read, but read it as a teenager and likely put the “traveling bug” into me. A factor in my living/traveling for 9 years before returning to the US…
Bultmann, Rudolf. Jesus Christ and Mythology. My first introduction to the concept of hermeneutics. It wasn’t even central to the book, but it’s such an important concept to understand that I’ll probably always remember where I first read about it.
Frankl, Viktor E. Man’s Search for Meaning. It just put all the petty bullshit “problems” of my life into perspective and offered insight into creating meaning in life.
Palmer, Parker J. Let Your Life Speak: Listening for the Voice of Vocation. For being so short, it’s a pretty gut wrenching book. The short version is that you’re probably not spending your working hours in the most fulfilling way. A simple insight, but the journey there is tough.
Hijuelos, Oscar. The Mambo Kings Play Songs of Love. Cuban American brothers in New York City and their visions of the perfect woman. I felt like the heat and humidity was enveloping me the entire time I was reading. Painfully human characters.
Tartt, Donna. The Secret History. An incredible writer in the technical sense who also weaves weird and wonderful tales. Classics prof draws his students into the supernatural, woo hoo! Many didn’t care for her next (The Little Friend), but I did. Can’t wait to dive into The Goldfinch.
Borges, Jorge Luis. Labyrinths: Selected Stories & Other Writings. Borges: the most amazing short story writer of all time. And of course the most fascinating fantasy library ever imagined is the centerpiece of The Library of Babel.
Irving, John. A Prayer for Owen Meany: A Novel. Such a weird but endearing protagonist, matched only perhaps by Ignatius J. Reilly in A Confederacy of Dunces (which, alas, didn’t occur to me until my list was already at ten).
Gardner, John. The Sunlight Dialogues. Gardner was one of my favorite novelists when I was in my 20s (add to that Vonnegut, Irving, and Robbins, weirdos all). Sunlight stands in for them. Or maybe I should have picked his retelling of Beowulf from the perspective of the monster Grendel.
García Márquez, Gabriel. One hundred years of solitude. Speaking as a student of Latin American literature, is it necessary to explain why this was, and is, so affecting and influential?
Eco, Umberto. The Name of the Rose. It’s the librarian in me, but also, I suppose, the fallen Catholic. Not to mention his amazing depiction of the Middle Ages.
Craig, Charmaine. The Good Men: A Novel of Heresy. More ex-Catholic fascination with Medieval times and the joys of the Inquisition! Craig evokes the era with extraordinary skill. And she did her research in lots of Medieval libraries. Oh, and the Cathar Heresy is a fascinating bit of French history.
Neruda, Pablo, and Nathaniel Tarn. Selected Poems. Extraordinarily beautiful use of the Spanish language, generally well-translated into English, but read him in the Spanish if you’re able. One of the top reasons why I’m glad I majored in Latin American literature.
Bruce says, “Not all of these remain influential, for me anyway. One thing they have in common is that I’ve read each one multiple times and have recommended them to others.”
Gilliam, Harold, and Gene M. Christman. Weather of the San Francisco Bay Region. This might be the little book that has influenced me the most. I still have my copy from 1970. The great Harold Gilliam taught me all about fog. His 1962 commentary at the end regarding climate change is fascinating from this distance.
Lee, Harper. To Kill a Mockingbird. This shows up on many lists, I imagine. If you read it when you were young, especially. Old Atticus is still kind of a role model. And I learned what a “chifforobe” is. That’s important information for a 12 year old.
DeLillo, Don. Libra. I’ve read this a bunch of times and am always entertained. It influenced me to read everything else from DeLillo.
Banks, Russell. Affliction. The take-away for me was advice given to Wade Whitehouse by his brother. Wade is plagued by problems, including a bad tooth. His brother says list your problems in priority order and tackle one at a time starting with the tooth. Wade doesn’t listen.
Catton, Bruce. Mr. Lincoln’s Army. Actually, the Army of the Potomac trilogy. Here’s Catton describing the battle of Antietam: “south of the fence, filling all of the ground between the road and the wood, was Mr. Miller’s thriving cornfield — THE cornfield, forever, after that morning.”
Gibson, William. Pattern Recognition. I’m not exactly sure why but I always really enjoy re-reading this one. There must be some pattern to that.
Roy says, “Although I cheated and did 15. So sue me. ;-)” Never a rule follower, that Roy….
Kazantzakis, Nikos. Report to Greco. I fell in love with Kazantzakis before I met my Greek American wife. So my inevitable trip to his beloved Crete was made all the more sweet when it happened. I raise a glass of Raki and toast him and his work.
Herbert, Frank. Dune. The single best marriage of Ecology and Science Fiction there ever was, or ever will be. Two of my loves, joined at the hip and completely believable. Amazing.
Eiseley, Loren C. All the Strange Hours: The Excavation of a Life. Loren Eiseley is my hero. I need no other. A scientist, a thinker, an outdoorsman, a writer, a poet and a prose poet, a true Renaissance Man. What I aspire to be, and fall short of, but love to strive to achieve.
Tolkien, John Ronald Reuel. The Lord of the Rings. I read this as a mid-teen and the poem “All That is Gold Does Not Glitter” became my mantra, as I spent my seventeenth year virtually alone in my treehouse on an Indiana farm.
Trevor, Elleston. The Flight of the Phoenix. I’ve always tried to be the ultimate Boy Scout — prepared for anything, and ready to deal with whatever is thrown at me. So I fell in love with this story of doing exactly that to survive. Rebuild a crashed plane and fly it out of the desert. Awesome.
Sinclair, Upton. The Jungle. One of the best introductions to Socialism, buried, in the end, by its account of slaughterhouses. Which goes to prove that people care more about what they eat than just about anything else.
Abbey, Edward. Desert Solitaire: A Season in the Wilderness. I’ve always been in love with the outdoors, so this paean to nature, and to the desert that I learned to love in my teens and early 20s, really spoke to me. It still does.
Melissa writes, “Now that I step back and look at it, I wonder what it means that my list is made up of books I read as a child or a young adult. I’ve also read most of them to my children. I can interpret this in several ways: 1.) I’ve read these books so many times they’re burned in my mind, or 2.) I really love children’s books. When I was young I wanted to write children’s books when I grew up. That hasn’t happened yet but it still could. Maybe I just haven’t grown up yet?! ;-)”
Leithold, Louis. The Calculus, with Analytic Geometry. My dad pushed me into mathematics. (I suspect he felt weak in it.) I enjoyed it, but never felt the passion for it that I do for programming. But, this book was just about as good as it got. Leithold had a wonderful way of making the concepts simple.
Pratchett, Terry. Small Gods: A Novel of Discworld. A book about a man and his personal relationship with his god. This is one of the two books I try hardest to get people to read.
Pratchett, Terry. Reaper Man. “There is no justice, there is just us.” Terry Pratchett creates characters that you care about. I often cry while reading his books. One of his most endearing characters is Death.
Knuth, Donald Ervin. The Art of Computer Programming. My sophomore year of college was about working my way through this book. I won’t swear that a lot of it stuck to me, but the experience certainly did.
Heller, Joseph. Catch-22, A Novel. My mother told me to read this. I’ve always respected her suggestions and this was a good one. I was depressed for a week after reading it.
Cheech And Chong. Cheech And Chong. I know this is supposed to be books, but this album was exactly what my life in Azusa was like. I knew all the characters in this album. I snuck friends into the drive-in in the trunk of my car. Dave’s not here.
Chabon, Michael. The Yiddish Policeman’s Union. I love this book. It’s one of two books I try to make people read. It’s a great mystery. It’s a great love story. It’s a loving insight into Yiddish culture. The story is one surprise after another right up to the end.
Cherryh, C. J. Downbelow Station. I love the books of CJ Cherryh! This book is part of her Company series. It does a wonderful job of making you feel like you understand what it’s like to live on a space station. It’s not a happy life.
McCullough, Colleen. The Thorn Birds. The first book I checked out of the adult section of the library – don’t judge, I was 10 or 11.
Michener, James A. Centennial. I loved James Michener books because they were so very, very long. I have never wanted a story I liked to end.
King, Stephen. The Shining. Stephen King is an amazing story teller with a very twisted mind.
Huxley, Aldous. Brave New World. For a while I could not get enough of the dystopian future thing.
Steinbeck, John. The Grapes of Wrath. I spent several years in college and after doing research in archives trying to figure out why in the heck the Joads would move on from the FSA camp, which seemed like heaven to me.
Steinbeck, John. East of Eden. The levels of manipulation are fascinating.
Austen, Jane. Pride and Prejudice. A college friend assigned it to me. I love rereading it, and of course all the derivatives are fantastic.
This summer Springshare released LibGuides 2.0, which is a complete revamp of the LibGuides system. Many libraries use LibGuides, either as course/research guides or in some cases as the entire library website, and so this is something that’s been on the mind of many librarians this summer, whichever side of LibGuides they usually see. The process of migrating is not too difficult, but the choices you make in planning the new interface can be challenging. As the librarians responsible for the migration, we will discuss our experience of planning and implementing the new LibGuides platform.
Making the Decision to Migrate
While migrating this summer was optional, Springshare will probably only support LibGuides 1 for another two years, and at Loyola we felt it was better to move sooner rather than later. Over the past few years there were perpetual LibGuides cleanup projects, and this seemed like a good opportunity to finalize that work. At the same time, we wanted to experiment with new designs for the library’s website that would bring it into closer alignment with the university’s new brand as well as make the site responsive, and LibGuides seemed like the ideal place to try out some of those ideas. Several new features, revealed on Springshare’s blog, resonated with subject-area specialists, which was another reason to push for an early migration. We also wanted to have the new system in place before the first day of classes, which gave us a few months to experiment.
The Reference and Electronic Resources librarian, Will Kent, the Head of Reference, Niamh McGuigan, and the Digital Services Librarian, Margaret Heller, worked in concert to make decisions, and invited all the other reference and instruction librarians (as well as anyone else who was interested) to participate in the process. The core team went by a few ground rules, however: we were definitely migrating, and the process would be iterative, i.e. we weren’t waiting for perfection to launch.
Planning the Migration
During the migration planning process, the small team of three librarians worked together to create a timeline, report to the library staff on progress, solicit feedback on the system, and update the LibGuides policies to reflect the new changes and functions. On the front end, we held large staff-wide meetings, provided updates, polled subject specialists on the progress, prepared our 400 databases for conversion to the new A-Z list, demonstrated new features, and flagged changes that staff should be aware of. We relayed updates from Springshare and handled troubleshooting questions as they arose.
Given the new features (new categories, new ways of searching, the A-Z database list, and more), it was important for us to sit down, discuss standards, and update our content policies. The good news was that most of our content was in good shape for the migration. The process was swift and, barring a few inevitable tiny bugs, went smoothly.
Our original timeline was to present the migration steps at our June monthly joint meeting of collections and reference staff, and give a timeline one month until the July meeting to complete the work. For various reasons this ended up stretching until mid-August, but we still launched the day before classes began. We are constantly in the process of updating guide types, adding new resources, and re-classifying boxes to adhere to our new policies.
Working on the Design
LibGuides 2.0 provides two basic templates: a left navigation menu and a top tabbed menu that looks similar to the original LibGuides (additional templates are available with the LibGuides CMS product). We had originally discussed using the left navigation template and began a design based on it, but ultimately people felt more comfortable with the tabbed navigation.
For the initial prototype, Margaret worked off a template that we’d used before for Omeka, which mirrors the Loyola University Chicago template very closely. We kept the standard LibGuides layout, i.e. 1-3 columns with the number of columns and the sections within each column determined by the page creator, but added a few additional pieces in the header and footer, as well as making big changes to the tabs.
The first step in planning the design was to understand which customizations happened in the template, and which in the header and footer, which are entered separately in the admin UI. Margaret sketched out our vision for the site on the whiteboard wall to determine existing selectors and those that would need to be added, as well as to get a sense of whether we would need to change the content section at all. In the interest of completing the project in a timely fashion, we determined that the bare minimum of customization needed to unify the research guides with the rest of the university websites would be the first priority.
The Look & Feel section under ‘Admin’ has several tabs, with sections for Header and Footer, Custom CSS/JS, and page layout; Guide Pages Layout is the most relevant for this post.
The new LibGuides platform is responsive, but we needed to account for several items we added to the interface. We added a search box that would allow users to search the entire university website, as well as several new logos, so Margaret added a few media queries to adjust these features on a phone or tablet, as well as adjust the spacing of the custom footer.
Improving the Design
Our first design was ready to present to the subject librarians a month after the migration process started. It was based on the principle of matching the luc.edu pages closely (example), in which the navigation tabs across the top have unusual cutouts, and section titles are very large. No one was very happy with this result, however, as it made the typical LibGuides layout with multiple sections on a page unusable and the tabs not visible enough. While one approach would have been to change the navigation to left navigation menu and limit the number of sections, the majority of the subject librarians preferred to keep things closer to what they had been, with a view to moving toward a potential new layout in the future.
Once we determined a literal interpretation of the university website was not usable for our content, we found inspiration for the template body from another section of the university website that was aimed at presenting a lot of dynamic content with multiple sections, but kept the standard luc.edu header. This allowed us to create a page that was recognizably part of Loyola, but presented our LibGuides content in a much more usable form.
The other piece we borrowed from the university website was sticky tabs. This was an attempt to make the tabs more visible and usable, based on what we knew from usability testing on the old platform and what users would already know from the university site. Because LibGuides is based on the Bootstrap framework, it was easy to drop this in using the Affix plugin (tutorial on how to use this).[1] The tabs are translucent so they don’t obscure content as one scrolls down.
Our final result was much more popular with everyone. It has a subtle background color and border around each box with a section header that stands out but doesn’t overwhelm the content. The tabs are not at all like traditional LibGuides tabs, functioning somewhat more like regular header links.
Over the summer we were not able to conduct usability testing on the new interface due to the tight timeline, so the first step this fall is to integrate it into our regular usability testing schedule to make iterative changes based on user feedback. We also need to continue to audit the page to improve accessibility.
The research guides are one of the most used links on our website (anywhere between 10,000 and 20,000 visits per month), so our top priority was to make sure the migration did not interfere with use – both in terms of patron access and content creation by the subject-area librarians. Thanks to our feedback sessions, good communication with Springshare, and a reliable new platform, the migration went smoothly and without interruption.
About our guest author: Will Kent is Reference/Instruction and Electronic Resources Librarian and subject specialist for Nursing and Chemistry at Loyola University Chicago. He received his MSLIS from the University of Illinois Urbana-Champaign in 2011 with a certificate in Community Informatics.
[1] You may remember that in the Bootstrap Responsibly post Michael suggested it wasn’t necessary to use this, but it is the most straightforward way in LibGuides 2.0.
In this interview, part of the Insights Interview series, FADGI talks with Dave Rice and Devon Landes about the QCTools project.
In a previous blog post, I interviewed Hannah Frost and Jenny Brice about the AV Artifact Atlas, one of the components of Quality Control Tools for Video Preservation, an NEH-funded project which seeks to design and make available community-oriented products to reduce the time and effort it takes to perform high-quality video preservation. The less “eyes-on” time routine QC work requires, the more time can be redirected toward quality control and assessment of the digitized content most deserving of attention.
QCTools’ Devon Landes
In this blog post, I interview archivists and software developers Dave Rice and Devon Landes about the latest release of QCTools, an open source software toolset that facilitates accurate and efficient assessment of media integrity throughout the archival digitization process.
Kate: How did the QCTools project come about?
Devon: There was a recognized need for accessible & affordable tools out there to help archivists, curators, preservationists, etc. in this space. As you mention above, manual quality control work is extremely labor and resource intensive but a necessary part of the preservation process. While there are tools out there, they tend to be geared toward (and priced for) the broadcast television industry, making them out of reach for most non-profit organizations. Additionally, quality control work requires a certain skill set and expertise. Our aim was twofold: to build a tool that was free/open source, but also one that could be used by specialists and non-specialists alike.
QCTools’ Dave Rice
Dave: Over the last few years a lot of the building blocks for this project were falling into place. Bay Area Video Coalition had been researching and gathering samples of digitization issues through the A/V Artifact Atlas project, and meanwhile FFmpeg had made substantial developments in their audiovisual filtering library. Additionally, open source technology for archival and preservation applications has been finding more development, application, and funding. Lastly, the urgency of the obsolescence issues surrounding analog video, together with lower costs for digital video management, meant that more organizations were starting their own preservation projects for analog video, creating a greater need for an open source response to quality control issues. In 2013, the National Endowment for the Humanities awarded BAVC a Preservation and Access Research and Development grant to develop QCTools.
Kate: Tell us what’s new in this release. Are you pretty much sticking to the plan or have you made adjustments based on user feedback that you didn’t foresee? How has the pilot testing influenced the products?
QCTools provides many playback filters. Here the left window shows a frame with the two fields presented separately (revealing the lack of chroma data in field 2). The right window here shows the V plane of the video per field to show what data the deck is providing.
Devon: The users’ perspective is really important to us and being responsive to their feedback is something we’ve tried to prioritize. We’ve had several user-focused training sessions and workshops which have helped guide and inform our development process. Certain processing filters were added or removed in response to user feedback; obviously UI and navigability issues were informed by our testers. We’ve also established a GitHub issue tracker to capture user feedback which has been pretty active since the latest release and has been really illuminating in terms of what people are finding useful or problematic, etc.
The newest release has quite a few optimizations to improve speed and responsiveness, some additional playback and viewing options, better documentation, and support for the creation of an XML-format report.
Dave: The most substantial example of going ‘off plan’ was the incorporation of video playback. Initially the grant application focused on QCTools as a purely analytical tool which would assess and present quantifications of video metrics via graphs and data visualization. Initial work delved deeply into identifying a methodology for picking out the right metrics to detect what could be unnatural in digitized analog video (such as pixels too dissimilar from their temporal neighbors, the near-exact repetition of pixel rows, or discrepancies in the rate of change over time between the two video fields). When we presented the earliest prototypes of QCTools to users, a recurring question was “How can I see the video?” We redesigned the project so that QCTools would present the video alongside the metrics, along with various scopes, meters and visual tools, so that it now has both a visual and an analytic side.
Here the QCTools vectorscope shows a burst of illegal color values. With the QCTools display of plotted graphs this corresponds to a spike in the maximum saturation (SATMAX).
Devon: Bay Area Video Coalition connected us with a group of testers from various backgrounds and professional environments, so we’ve been able to tap into a pretty varied community in that sense. Their A/V Artifact Atlas has also been an important resource for us and was really the starting point from which QCTools was born.
Dave: This project would not at all be feasible without the existing work of FFmpeg. QCTools utilizes FFmpeg for all decoding, playback, metadata expression and visual analytics. The QCTools data format is an expression of FFmpeg’s ffprobe schema, which appeared to be one of the only audiovisual file format standards that could efficiently store masses of frame-based metadata.
Kate: What are the plans for training and documentation on how to use the product(s)?
Devon: We want the documentation to speak to a wide range of backgrounds and expertise, but it is a challenge to do that and as such it is an ongoing process. We had a really helpful session during one of our tester retreats where users directly and collaboratively made comments and suggestions to the documentation; because of the breadth of their experience it really helped to illuminate gaps and areas for improvement on our end. We hope to continue that kind of engagement with users and also offer them a place to interact more directly with each other via a discussion page or wiki. We’ve also talked about the possibility of recording some training videos and hope to better incorporate the A/V Artifact Atlas as a source of reference in the next release.
Kate: What’s next for QCTools?
Dave: We’re presenting the next release of QCTools at the Association of Moving Image Archivists Annual Meeting on October 9th, where we anticipate supporting better summarization of digitization issues per file in a comparative manner. After AMIA, we’ll focus on audio and the incorporation of audio metrics via FFmpeg’s ebur128 filter. QCTools has been integrated into workflows at BAVC, Dance Heritage Coalition, MoMA, Anthology Film Archives and Die Österreichische Mediathek, so the QCTools issue tracker has been filling up with suggestions which we’ll be tackling in the upcoming months.
The Open Definition performs an essential function as a “standard”, ensuring that when you say “open data” and I say “open data” we both mean the same thing. This standardization, in turn, ensures the quality, compatibility and simplicity essential to realizing one of the main practical benefits of “openness”: the greatly increased ability to combine different datasets together to drive innovation, insight and change.
Quality: open data should mean the freedom for anyone to access, modify and share that data. However, without a well-defined standard detailing what that means, we could quickly see “open” being diluted as lots of people claim their data is “open” without actually providing the essential freedoms (for example, claiming data is open but actually requiring payment for commercial use). In this sense the Open Definition is about “quality control”.
Simplicity: a big promise of open data is simplicity and ease of use. This is not just about not having to pay for the data itself; it’s about not having to hire a lawyer to read the license or contract, and not having to think about what you can and can’t do and what that means for, say, your business or your research. A clear, agreed definition ensures that you do not have to worry about complex limitations on how you can use and share open data.
Let’s flesh these out in a bit more detail:
Quality Control (avoiding “open-washing” and “dilution” of open)
A key promise of open data is that it can be freely accessed and used. Without a clear definition of exactly what that means (e.g. used by whom, for what purpose) there is a risk of dilution, especially as open data becomes attractive to data users. For example, you could quickly find people putting out what they call “open data” that only non-commercial organizations can access freely.
Thus, without good quality control we risk devaluing open data as a term and concept, as well as excluding key participants and fracturing the community (as we end up with competing and incompatible sets of “open” data).
Compatibility

A single piece of data on its own is rarely useful. Instead data becomes useful when connected or intermixed with other data. If I want to know about the risk of my home getting flooded I need to have geographic data about where my house is located relative to the river and I need to know how often the river floods (and how much).
That’s why “open data”, as defined by the Open Definition, isn’t just about the freedom to access a piece of data, but also about the freedom to connect or intermix that dataset with others.
Unfortunately, we cannot take compatibility for granted. Without a standard like the Open Definition it becomes impossible to know if your “open” is the same as my “open”. This means, in turn, that we cannot know whether it’s OK to connect (or mix) your open data and my open data together (without consulting lawyers!) – and it may turn out that we can’t because your open data license is incompatible with my open data license.
Think of power sockets around the world. Imagine if every electrical device had a different plug and needed a different power socket: when I came over to your house I’d need to bring an adapter! Thanks to standardization, at least within a given country, power sockets are almost always the same – so I can bring my laptop over to your house without a problem. When you travel abroad, however, you may have to take an adapter with you. What drives this is standardization (or its lack): within your own country everyone has standardized on the same socket type, but different countries may not share a standard, and hence you need an adapter (or run out of power!).
For open data, the risk of incompatibility is growing as more open data is released and more and more open data publishers such as governments write their own “open data licenses” (with the potential for these different licenses to be mutually incompatible).
The Open Definition helps prevent incompatibility by:
Setting out a set of clear principles that every open data license should conform to (not by mandating one single license – or even specific license terms)
The Evergreen project will participate in the Outreach Program for Women, a program organized through the GNOME Foundation to improve gender diversity in Free and Open Source Software projects.
The Executive Oversight Board voted last month to fund one internship through the program. The intern will work on a project for the community from December 9, 2014 to March 9, 2015. The Evergreen community has identified five possible projects for the internship: three are software development projects, one is a documentation project, and one is a user experience project.
Candidates for the program have started asking questions in IRC and on the mailing list as they prepare to submit their applications, which are due on October 22, 2014. They will also be looking for feedback on their ideas. Please take the opportunity to share your thoughts with them on these ideas since it will help strengthen their application.
If you are an OPW candidate trying to decide on a project, take some time to stop into the #evergreen IRC channel to learn about our project and to get to know the people responsible for the care and feeding of Evergreen. We are an active and welcoming community that includes not only developers, but the sys admins and librarians who use Evergreen on a daily basis.
To get started, read through the Learning About Evergreen section of our OPW page. Try Evergreen out on one of our community demo servers, read through the documentation, and sign up for our mailing lists to learn more about the community. If you are planning to apply for a coding project, take some time to download and install Evergreen. Each project has an application requirement that you should complete before submitting your application. Please take time to review that requirement and find a way you can contribute to the project.
We look forward to working with you on the project!
From federal funding to support for school librarians to net neutrality, 2015 will be a critical year for federal policies that impact libraries. We need to be working now to build the political relationships necessary to make sure these decisions benefit our community. Fortunately, the November elections provide a great opportunity to do so.
In a new free webinar hosted by the American Library Association (ALA) and Advocacy Guru Stephanie Vance, leaders will discuss how all types of library supporters can legally engage during an election season, as well as what types of activities will have the most impact. Webinar participants will learn 10 quick and easy tactics, from social media to candidate forums, that will help you take action right away. If you want to help protect our library resources in 2015 and beyond, this is the session for you. Register now, as space is limited.
Before you read this post, be aware that this web page is sharing your usage with Google, Facebook, StatCounter.com, unglue.it and Harlequin.com. Google because this is Blogger. Facebook because there's a "Like" button, StatCounter because I use it to measure usage, and Harlequin because I embedded the cover for Rebecca Avery's Maid to Crave directly from Harlequin's website. Harlequin's web server has been sent the address of this page along with your IP address as part of the HTTP transaction that fetches the image, which, to be clear, is not a picture of me.
I'm pretty sure that having read the first paragraph, you're now able to give informed consent if I try to sell you a book (see unglue.it embed -->) and constitute myself as a book service for the purposes of a New Jersey "Reader Privacy Act", currently awaiting Governor Christie's signature. That act would make it unlawful to share information about your book use (borrowing, downloading, buying, reading, etc.) with a third party, in the absence of a court order to do so. That's good for your reading privacy, but a real problem for almost anyone running a commercial "book service".
Let's use Maid to Crave as an example. When you click on the link, your browser first sends a request to Harlequin.com. Using the instructions in the returned HTML, it then sends requests to a bunch of web servers to build the web page, complete with images, reviews and buy links. Here's the list of hosts contacted as my browser builds that page:
seal.verisign.com (A security company)
www.goodreads.com (The review comes from GoodReads. They're owned by Amazon.)
stats.g.doubleclick.net (Doubleclick is an advertising network owned by Google)
cdn.gigya.com (Gigya’s Consumer Identity Management platform helps businesses identify consumers across any device, achieve a single customer view by collecting and consolidating profile and activity data, and tap into first-party data to reach customers with more personalized marketing messaging.)
www.facebook.com (I'm told this is a social network)
fbstatic-a.akamaihd.net (Akamai is here helping to distribute facebook content)
platform.twitter.com (yet another social network)
edge.quantserve.com (QuantCast is an "audience research and behavioural advertising company")
All of these servers are given my IP address and the URL of the Harlequin page that I'm viewing. All of these companies except Verisign, Norton and Akamai also set tracking cookies that enable them to connect my browsing of the Harlequin site with my activity all over the web. The Guardian has a nice overview of these companies that track your use of the web. Most of them exist to better target ads at you. So don't be surprised if, once you've visited Harlequin, Amazon tries to sell you romance novels.
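The fan-out described above can be sketched in a few lines of Ruby (the HTML below is a made-up stand-in, not Harlequin's actual markup): every src or href pointing at another host means your browser will contact that host, handing it your IP address and, usually, the page you were on.

```ruby
require "uri"

# Hypothetical page markup embedding resources from third-party hosts.
html = <<~HTML
  <img src="https://www.example-publisher.com/covers/book.jpg">
  <script src="https://platform.twitter.com/widgets.js"></script>
  <iframe src="https://www.facebook.com/plugins/like.php"></iframe>
HTML

# Each distinct host here receives a request from your browser --
# carrying your IP address -- just to render the page.
third_party_hosts = html.scan(/(?:src|href)="([^"]+)"/)
                        .map { |(url)| URI.parse(url).host }
                        .uniq
third_party_hosts
# => ["www.example-publisher.com", "platform.twitter.com", "www.facebook.com"]
```

This is only a toy parser for illustration; a real page audit would use the browser's network inspector, which is how the host list above was gathered.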
Certainly Harlequin qualifies as a commercial book service under the New Jersey law. And certainly Harlequin is giving personal information (IP addresses are personal information under the law) to a bunch of private entities without a court order. And most certainly it is doing so without informed consent. So its website is doing things that will be unlawful under the New Jersey law.
But it's not alone. Almost any online bookseller uses services like those used by Harlequin. Even Amazon, which is pretty much self contained, has to send your personal information to Ingram to fulfill many of the book orders sent to it. Under the New Jersey law, it appears that Amazon will need to get your informed consent to have Ingram send you a book. And really, do I care? Does this improve my reading privacy?
The companies that can ignore this law are Apple, Target, Walmart and the like. Book services are exempt if they derive less than 2% of their US consumer revenue from books. So yay Apple.
Lord knows we need some basic rules about privacy of our reading behavior. But I think the New Jersey law does a lousy job of dealing with the realities of today's internet. I wonder if we'll ever start a real discussion about what and when things should be private on the web.
This special rate is intended for a limited number of graduate students enrolled in ALA accredited programs. In exchange for a discounted registration, students will assist the LITA organizers and the Forum presenters with on-site operations. This year’s theme is “Transformation: From Node to Network.” We are anticipating an attendance of 300 decision makers and implementers of new information technologies in libraries.
The selected students will be expected to attend the full LITA National Forum, Thursday noon through Saturday noon. This does not include the pre-conferences on Thursday and Friday. You will be assigned a variety of duties, but you will be able to attend the Forum programs, which include 3 keynote sessions, 30 concurrent sessions, and a dozen poster presentations.
The special student rate is $180 – half the regular registration rate for LITA members. This rate includes a Friday night reception at the hotel, continental breakfasts, and Saturday lunch. To get this rate you must apply and be accepted per below.
To apply for the student registration rate, please provide the following information:
Complete contact information including email address,
The name of the school you are attending, and
A 150-word (or shorter) statement on why you want to attend the 2014 LITA Forum
Please send this information no later than October 6, 2014 to email@example.com, with “2014 LITA Forum Student Registration Request” in the subject line.
Those selected for the student rate will be notified no later than October 10, 2014.
The following post was authored by Erin Engle, Michelle Gallinger, Butch Lazorchak, Jane Mandelbaum and Trevor Owens from the Library of Congress.
The Library of Congress held the 10th annual Designing Storage Architectures for Digital Collections meeting September 22-23, 2014. This meeting is an annual opportunity for invited technical industry experts, IT professionals, digital collections and strategic planning staff and digital preservation practitioners to discuss the challenges of digital storage and to help inform decision-making in the future. Participants come from a variety of government agencies, cultural heritage institutions and academic and research organizations.
The DSA Meeting. Photo credit: Peter Krogh/DAM Useful Publishing.
Throughout the two days of the meeting the speakers took the participants back in time and then forward again. The meeting kicked off with a review of the origins of the DSA meeting. It started ten years ago with a gathering of Library of Congress and external experts who discussed requirements for digital storage architectures for the Library’s Packard Campus of the National Audio-Visual Conservation Center. Now, ten years later, the speakers included representatives from Facebook and Amazon Web Services, both of which manage significant amounts of content and neither of which existed in 2004 when the DSA meeting started.
The theme of time passing continued with presentations by strategic technical experts from the storage industry who began with an overview of the capacity and cost trends in storage media over the past years. Two of the storage media being tracked weren’t on anyone’s radar in 2004, but loom large for the future – flash memory and Blu-ray disks. Moving from the past quickly to the future, the experts then offered predictions, with the caveat that predictions beyond a few years are predictably unpredictable in the storage world.
Another facet of time – “back to the future” – came up in a series of discussions on the emergence of object storage in up-and-coming hardware and software products. With object storage, hardware and software can deal with data objects (like files), rather than physical blocks of data. This is a concept familiar to those in the digital curation world, and it turns out that it was also familiar to long-time experts in the computer architecture world, because the original design for this was done ten years ago. Here are some of the key meeting presentations on object storage:
Several speakers talked about the impact of the passage of time on existing digital storage collections in their institutions and the need to perform migrations of content from one set of hardware or software to another as time passes. The lessons of this were made particularly vivid by one speaker’s analogy, which compared the process to the travails of someone trying to manage the physical contents of a car over one’s lifetime.
Even more vivid was the “Cost of Inaction” calculator, which provides black-and-white evidence of the costs of not preserving analog media over time, built on the sobering requirement that you pick an actual future date for the “doomsday” when all your analog media will be unreadable.
The DSA Meeting. Photo Credit: Trevor Owens
Several persistent time-related themes engaged the participants in lively interactive discussions during the meeting. One topic was the practical methods for checking the data integrity of content in digital collections. This concept, called fixity, has been a common topic of interest in the digital preservation community. Similarly, a thread of discussion on predicting and dealing with failure and data loss over time touched on a number of interesting concepts, including “anti-entropy,” a type of computer “gossip” protocol designed to query, detect and correct damaged distributed digital files. Participants agreed it would be useful to find a practical approach to identifying and quantifying types of failures. Are the failures relatively regular but small enough that the content can be reconstructed? Or are the data failures highly irregular but catastrophic in nature?
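To make the fixity concept concrete, here is a minimal sketch (the file is a hypothetical stand-in, not a tool presented at the meeting) of the basic mechanism: record a checksum when content is ingested, then re-compute and compare it on later audit passes.

```ruby
require "digest"
require "tempfile"

# Fixity check: a checksum recorded at ingest lets a later audit
# detect silent damage to the stored bytes.
def fixity(path)
  Digest::SHA256.file(path).hexdigest
end

# Temporary stand-in for a preservation master file.
master = Tempfile.new("master")
master.write("digitized video bytes")
master.flush

recorded_at_ingest = fixity(master.path)

# On a later audit pass an unchanged file yields the same value;
# a mismatch signals corruption and triggers repair from a replica.
fixity(master.path) == recorded_at_ingest  # => true
```

Distributed "anti-entropy" schemes automate exactly this comparison across replicas, repairing a damaged copy from one whose fixity value still matches.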
Another common theme that arose is how to test and predict the lifetime of storage media. For example, how would one test the lifetime of media projected to last 1000 years without having a time-travel machine available? Participants agreed to continue the discussions of these themes over the next year with the goal of developing practical requirements for communication with storage and service providers.
The meeting closed with presentations from vendors working on the cutting edge of new archival media technologies. One speaker dealt with questions about the lifetime of media by serenading the group with accompaniment from a 32-year-old audio CD copy of Pink Floyd’s “Dark Side of the Moon.” The song “Us and Them” underscored how the DSA meeting strives to bridge the boundaries placed between IT conceptions of storage systems and architectures and the practices, perspectives and values of storage and preservation in the cultural heritage sector. The song playing back from three decade old media on a contemporary device was a fitting symbol of the objectives of the meeting.
In 2012 Open Knowledge launched the Global Open Data Index to help track the state of open data around the world. We’re now in the process of collecting submissions for the 2014 Open Data Index, and we want your help!
How can you contribute?
The main thing you can do is become a Contributor and add information about the state of open data in your country to the Open Data Index Survey. More details and quickstart guide to contributing here »
We also have other ways you can help:
Become a Mentor: Mentors support the Index in a variety of ways from engaging new contributors, mentoring them and generally promoting the Index in their community. Activities can include running short virtual “office hours” to support and advise other contributors, promoting the Index with civil society organizations – blogging, tweeting etc. To apply to be a Mentor, please fill in this form.
Become a Reviewer: Reviewers are specially selected experts who review submissions and check them to ensure information is accurate and up-to-date and that the Index is generally of high-quality. To apply to be a Reviewer, fill in this form.
For twitter, keep an eye on updates via #openindex2014
Key dates for your calendar
We will kick off on September 30th, in Mexico City with a virtual and in-situ event at Abre LATAM and ConDatos (including LATAM regional skillshare meeting!). Keep an eye on Twitter to find out more details at #openindex14 and tune into these regional sprints:
Europe / MENA / Africa (October 8-10) – with a regional Google Hangout on 9/10.
Asia / Pacific (October 13-15) – with a regional Google Hangout on 13/10.
All day virtual event to wrap-up (October 17)
More on this to follow shortly, keep an eye on this space.
Why the Open Data Index?
The last few years have seen an explosion of activity around open data, and especially open government data. Following initiatives like data.gov and data.gov.uk, numerous local, regional and national bodies have started open government data initiatives and created open data portals (from a handful three years ago, there are now nearly 400 open data portals worldwide).
But simply putting a few spreadsheets online under an open license is obviously not enough. Doing open government data well depends on releasing key datasets in the right way.
Moreover, with the proliferation of sites it has become increasingly hard to track what is happening: which countries, or municipalities, are actually releasing open data and which aren’t? Which countries are releasing data that matters? Which countries are releasing data in the right way and in a timely way?
The Global Open Data Index was created to answer these sorts of questions, providing an up-to-date and reliable guide to the state of global open data for policy-makers, researchers, journalists, activists and citizens.
The first initiative of its kind, the Global Open Data Index is regularly updated and provides the most comprehensive snapshot available of the global state of open data. The Index is underpinned by a detailed annual survey of the state of open data run by Open Knowledge in collaboration with open data experts and communities around the world.
The American Library Association (ALA) today announced the launch of “Progress in the Making,” (pdf) a new educational campaign that will explore the public policy opportunities and challenges of 3D printer adoption by libraries. Today, the association released “Progress in the Making: An Introduction to 3D Printing and Public Policy,” a tip sheet that provides an overview of 3D printing, describes a number of ways libraries are currently using 3D printers, outlines the legal implications of providing the technology, and details ways that libraries can implement simple yet protective 3D printing policies in their own libraries.
“As the percentage of the nation’s libraries helping their patrons create new objects and structures with 3D printers continues to increase, the legal implications for offering the high-tech service in the copyright, patent, design and trade realms continues to grow as well,” said Alan S. Inouye, director of the ALA Office for Information Technology Policy. “We have reached a point in the evolution of 3D printing services where libraries need to consider developing user policies that support the library mission to make information available to the public. If the library community promotes practices that are smart and encourage creativity, it has a real chance to guide the direction of the public policy that takes shape around 3D printing in the coming years.”
Over the coming months, ALA will release a white paper and a series of tip sheets to help the library community better understand and adapt to the growth of 3D printers, specifically as the new technology relates to intellectual property law and individual liberties.
This tip sheet is a collaboration among the Public Library Association (PLA), the ALA Office for Information Technology Policy (OITP), and United for Libraries, coordinated by OITP Information Policy Analyst Charlie Wapner. View the tip sheet (pdf).
If you are writing configuration that takes a pattern to match against files in a file system…
You probably want Dir.glob patterns, not regexes. Dir.glob is built into Ruby. Its unix-shell-style patterns are less expressive than regexes, but they are almost certainly expressive enough for anything you need in this use case, and much simpler to deal with for the common patterns.
…I don’t even want to think about how to express, as a regexp, that you don’t want matches from child directories, only files directly in the directory itself.
Dir.glob finds matches within a directory on the local file system. But if you have a file path in a string that you want to test against a glob pattern, you can do that too, with Pathname#fnmatch, which doesn’t even require the path to exist on the local file system.
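A minimal sketch of both approaches (the `config/*.json` pattern and paths are purely illustrative): `Dir.glob` walks the real file system, while `File.fnmatch` and `Pathname#fnmatch` test a string against a glob pattern without touching disk. The `File::FNM_PATHNAME` flag keeps `*` from crossing `/` separators, which is exactly the "no child directories" behavior that is painful to express as a regexp.

```ruby
require "pathname"

# Match only JSON files directly inside config/, not in subdirectories.
direct_matches = Dir.glob("config/*.json")

# Test a string path against a glob pattern without hitting the file
# system. FNM_PATHNAME makes "*" stop at "/" separators, so paths in
# subdirectories do not match.
File.fnmatch("config/*.json", "config/app.json", File::FNM_PATHNAME)
# => true
File.fnmatch("config/*.json", "config/nested/app.json", File::FNM_PATHNAME)
# => false

# Pathname#fnmatch wraps the same check in object form:
Pathname.new("config/app.json").fnmatch("config/*.json", File::FNM_PATHNAME)
# => true
```

Without `File::FNM_PATHNAME`, `*` will happily match across `/`, so the subdirectory path above would match too; passing the flag is what makes the pattern behave like a shell glob.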
I presented a version of this talk at the Supporting Cultural Heritage Open Source Software (SCHOSS) Symposium in Atlanta, GA in September 2014. This talk was generously sponsored by LYRASIS and the Andrew Mellon Foundation.
I often feel like an Open Source failure.
I haven’t submitted 500 patches in my free time, I don’t spend my after-work hours rating html5 apps, and I was certainly not a 14-year-old Linux user. Unlike the incredible group of teenaged boys with whom I write my Mozilla Communities newsletter and hang out on IRC, I spent most of my time online at that age chatting with friends on AOL Instant Messenger and doing my homework.
I am a very poor programmer. My Wikipedia contributions are pretty sad. I sometimes use Powerpoint. I never donated my time to Open Source in the traditional sense until I started at Mozilla as a GNOME OPW intern and while the idea of data gets me excited, the thought of spending hours cleaning it is another story.
I was feeling this way the other day and chatting with a friend about how reading celebrity news often feels like a better choice after work than trying to find a new open source project to contribute to or making edits to Wikipedia. A few minutes later, a message popped up in my inbox from an old friend asking me to help him with his application to library school.
I dug up my statement of purpose and I was extremely heartened to read my words from three years ago:
I am particularly interested in the interaction between libraries and open source technology… I am interested in innovative use of physical and virtual space and democratic archival curation, providing free access to primary sources.
It felt good to know that I have always been interested in these topics, but I didn’t know what that would look like until I discovered my place in the open source community. For many of us in the cultural heritage sector, this lack of clarity about where we fit in is a major blocker, and I think the same holds for contributing to open source more generally. Douglas Atkin, Community Manager at Airbnb, claims that the two main questions people have when joining a community are “Are they like me? And will they like me?” Of course, joining a community is a lot more complicated than that, but the low visibility of open source projects in the cultural heritage sector can make even locating a project a whole lot more complicated.
As we’ve discussed in this working group, the ethics of cultural heritage and Open Source overlap considerably and
the open source community considers those in the cultural heritage sector to be natural allies.
In his article, “Who are you empowering?” Hugh Rundle writes: (I quote this article all the time because I believe it’s one of the best articles written about library tech recently…)
A simple measure that improves privacy and security and saves money is to use open source software instead of proprietary software on public PCs.
Community-driven, non-profit, and not good at making money are just some of the attributes that most cultural heritage organizations and open source projects have in common. And yet, when choosing software for their patrons, most libraries and cultural heritage organizations choose proprietary systems, and cultural heritage professionals are not the strongest open source contributors or advocates.
The main reasons for this are, in my opinion:
1. Many people in cultural heritage don’t know what Open Source is.
In a recent survey I ran of the Code4Lib and UNC SILS listservs, nearly every person surveyed could accurately respond to the prompt “Define Open Source in one sentence” though the responses varied from community-based answers to answers solely about the source code.
My sample was biased toward programmers and young people (and perhaps people who knew how to use Google, because many of the answers were lifted directly from the first line of the Wikipedia article about Open Source, which is definitely survey bias), but I think it is indicative of one of the larger questions of open source.
Is open source about the community, or is it about the source code?
Many people, librarians and otherwise, will ask: (I would argue most, but I am operating on anecdotal evidence)
Why should we care about whether or not the code is open if we can’t edit it anyway? We just send our problems to the IT department and they fix it.
Many people in cultural heritage don’t have strong feelings about open source because they simply don’t know what it is and cannot articulate the value of one model over the other. Proprietary systems don’t advertise themselves as proprietary, but open source constantly advertises itself as open source, and, as I’ll get to later, proprietary systems have cornered the market.
This movement from darkness to clarity brings to mind a story Kathy Lussier told about the Evergreen project: librarians who didn’t consider themselves “techy” jumped into IRC to tentatively ask a technical question, and thanks to the friendliness of the Evergreen community, they were soon writing the documentation for the software themselves and became a vital part of the community, participating in conferences and growing their skills as contributors.
In this story, the Open Source community engaged the user and taught her the valuable skill of technical documentation. She also took control of the software she uses daily and was able to maintain and suggest features that she wanted to see. This situation was really a win-win all around.
What institution doesn’t want to see their staff so well trained on a system that they can write the documentation for it?
2. The majority of the market share in cultural heritage belongs to closed-source, closed-access software, and its vendors are far better at advertising than open source projects.
Last year, my very wonderful boss in the cataloging and metadata department of the University of North Carolina at Chapel Hill came back from ALA Midwinter with goodies for me: pens and keychains and postits and tote bags and those cute little staplers. “I only took things from vendors we use,” she told me.
Similarly, free, open source systems for cultural heritage are unfortunately not a high percentage of the American market. Wikipedia has a great list of proprietary and open source ILSs and OPACs, the languages they’re written in, and their cost. Marshall Breeding writes that FOSS software is picking up some market share, but it is still “the alternative” for most cultural heritage organizations.
There are so many reasons for this small market share, but I would argue (as my previous anecdote did for me,) that a lot of it has to do with the fact that these proprietary vendors have much more money and are therefore a lot better at marketing to people in cultural heritage who are very focused on their work. We just want to be able to install the thing and then have it do the thing well enough. (An article in Library Journal in 2011 describes open source software as: “A lot of work, but a lot of control.”)
As Jack Reed from Stanford and others have pointed out, most of the cost of FOSS in cultural heritage is developer time, and many cultural heritage institutions believe that they don’t have those resources. (John Brice’s example at the Meadville Public Library proves that communities can come together with limited developers and resources in order to maintain vital and robust open source infrastructures as well as significantly cut costs.)
The academic publishing model is, for more reasons than one, completely antithetical to the ethics of cultural heritage work, and yet they maintain a large portion of the cultural heritage market share in terms of both knowledge acquisition and software. Megan Forbes reminds us that the platform Collection Space was founded as the alternative to the market dominance of “several large, commercial vendors” and that cost put them “out of reach for most small and mid-sized institutions.”
Open source has the chance to reverse this vicious cycle, but institutions have to put their resources in people in order to grow.
While certain companies like OCLC are working toward a more equitable future, with caveats of course, I would argue that the majority of proprietary cultural heritage systems are providing an inferior product to a resource-poor community.
3. People are tired and overworked, particularly in libraries, and to compound that, they don’t think they have the skills to contribute.
These are two separate issues, but they’re not entirely disparate so I am going to tackle them together.
There’s this conception outside of the library world that librarians are secret coders just waiting to emerge from their shells and start categorizing datatypes instead of MARC records (this is perhaps a misconception due to a lot of things, including the sheer diversity of types of jobs that people in cultural heritage fill, but hear me out.)
Learning to program computers takes time and instruction and while programs like Women who Code and Girl Develop It can begin educating librarians, we’re still faced with a workforce that’s over 80% female-identified that learned only proprietary systems in their work and a small number of technology skills in their MLIS degrees.
Library jobs, and further, cultural heritage jobs are dwindling. Many trained librarians, art historians, and archivists are working from grant to grant on low salaries with little security and massive amounts of student loans from both undergraduate and graduate school educations. If they’re lucky enough to get a job, watching television or doing the loads of professional development work they’re expected to do in their free time seems a much better choice after work than continuing to stare at a computer screen for a work-related task or learning something completely new. For reference: an entry-level computer programmer can expect to make over $70,000 per year on average. An entry-level librarian? Under $40,000. I know plenty of people in cultural heritage who have taken two jobs or jobs they hate just to make ends meet, and I am sure you do too.
One can easily say, “Contributing to open source teaches new skills!” but if you don’t know how to make non-code contributions or the project is not set up to accept those kinds of contributions, you don’t see an immediate pay-off in being involved with this project, and you are probably not willing to stay up all night learning to code when you have to be at work the next day or raise a family. Programs like Software Carpentry have proven that librarians, teachers, scientists, and other non-computer scientists are willing to put in that time and grow their skills, so to make any kind of claim without research would be a reach and possibly erroneous, but I would argue that most cultural heritage organizations are not set up in a way to nurture their employees for this kind of professional development. (Not because they don’t want to, necessarily, but because they feel they can’t or they don’t see the immediate value in it.)
In addition, many open source projects operate with a “patches welcome!” or “go ahead, jump in!” or “We don’t need a code of conduct because we’re all nice guys here!” mindset, which is not helpful to beginning coders, women, or really, anyone outside of a few open source fanatics.
I’ve identified a lot of problems, but the title of this talk is “Creating the Conditions for Open Source Community” and I would be remiss if I didn’t talk about what works.
Diversification, both in terms of types of tasks and types of people and skillsets as well as a clear invitation to get involved are two absolute conditions for a healthy open source community.
As communities grow, it’s important to be able to recognize and support contributors in ways that feel meaningful. That could be a trip to a conference they want to attend, a LinkedIn recommendation, a professional badge, or a reference; best of all, you could simply ask them what they want. Our network for contributors and staff is adding a “preferred recognition” system. Don’t know what I want? Check out my social profile. (The answer is usually chocolate, but I’m easy.)
Finding diverse contribution opportunities has been difficult for open source since, well, the beginning of open source. Even for us at Mozilla, with our highly diverse international community and hundreds of ways to get involved, we often struggle to bring a diversity of voices into the conversation, and to find meaningful pathways and recognition systems for our 10,000 contributors.
In my mind, education is perhaps the most important part of bringing in first-time contributors. Organizations like Open Hatch and Software Carpentry provide low-cost, high-value workshops for new contributors to locate and become a part of Open Source in a meaningful and sustained manner. Our Webmaker program introduces technical skills in a dynamic and exciting way for every age.
Mentorship is the last very important aspect of creating the conditions for participation. Having a friend or a buddy or a champion from the beginning is perhaps the greatest motivator according to research from a variety of different papers. Personal connection runs deep, and is a major indicator for community health. I’d like to bring mentorship into our conversation today and I hope that we can explore that in greater depth in the next few hours.
With mentorship and 1:1 connection, you may not see an immediate uptick in your project’s contributions, but a friend tells a friend tells a friend and then eventually you have a small army of motivated cultural heritage workers looking to take back their knowledge.
You too can achieve on-the-ground action. You are the change you wish to see.
A few months ago, my colleague decided to create a module and project around updating the Mozilla Wiki, a long-ignored, frequently used, and under-resourced part of our organization. As an information scientist and former archivist, I was psyched. The space that I called Mozilla’s collective memory was being revived!
We started meeting in April, and it became clear that there were other wiki-fanatics in the organization who had been waiting for this opportunity to come up. People throughout the organization were psyched to be a part of it. In August, we held a fantastically successful workweek in London, reskinned the wiki, created a regular release cycle, and wrote a manual and a best practice guide, and the effort is still going strong, with half volunteer contributors and half paid staff, as a regular working group within the organization. Our work has been generally lauded throughout the project, and we’re working hard to make our wiki the resource it can be for contributors and staff.
To me, that was the magic of open source. I met some of my best friends, and at the end of the week, we were a cohesive unit moving forward to share knowledge through our organization and beyond. And isn’t that a basic value of cultural heritage work?
I am still an open source failure. I am not a code fanatic, and I like the ease of use of my used iPhone. I don’t listen to techno and write JavaScript all night, and I would generally rather read a book than go to a hackathon.
And despite all this, I still feel like I’ve found my community.
I am involved with open source because I am ethically committed to it, because I want to educate my community of practice and my local community about what working open can bring to them.
When people ask me how I got involved with open source, my answer is: I had a great mentor, an incredible community and contributor base, and there are many ways to get involved in open source.
While this may feel like a new frontier for cultural heritage, I know we can do more and do better.
Open up your work as much as you can. Draw on the many, many intelligent people doing work in the field. Educate yourself and others about the value that open source can bring to your institution. Mentor someone new, even if you’re shy. Connect with the community and treat your fellow contributors with respect. Who knows?
You may get an open source failure like me to contribute to your project.
Join us for our next installment of CopyTalk, October 2nd at 2pm Eastern Time. It’s FREE.
In the webinar, titled “Open Licensing and the Public Domain: Tools and policies to support libraries, scholars, and the public,” Timothy Vollmer will discuss the Creative Commons (CC) licenses and public domain instruments, with a particular focus on how these tools are being used within the GLAM (galleries, libraries, archives and museums) sector. He’ll also talk about the evolving Open Access movement–including legal and technological challenges to researchers and publishers–and how librarians and copyright experts are helping address these issues. Finally, he’ll discuss the increasing role of institutional policies and funding mandates that are being adopted to support the creation and sharing of content and data in the public commons.
Timothy Vollmer is Public Policy Manager for Creative Commons. He coordinates public policy positions in collaboration with CC staff, international affiliate network, and a broad community of copyright experts. Timothy helps educate policymakers at all levels and across various disciplines such as education, data, science, culture, and government about copyright licensing, the public domain, and the adoption of open policies. Prior to CC, Timothy worked on information policy issues for the American Library Association in Washington, D.C. He is a graduate of the University of Michigan, School of Information, and helped establish the Open.Michigan initiative.
One-on-one technology help is one of the greatest services offered by the modern public library. Our ability to provide free assistance without an underlying agenda to sell a product puts us in a unique and valuable position in our communities. While one-on-one sessions are one of my favorite job duties, I must admit that they can also be the most frustrating, primarily because of passwords. It is rare that I assist a patron and we don’t encounter a forgotten password, if not several. Trying to guess the password or resetting it usually eats up most of our time. I wish that I were writing this post as an authority on how to conquer the war on passwords, but I fear that we’re losing the battle. One day we’ll look back and laugh at the time we wasted trying to guess our passwords; resetting them again and again, but it’s been 10 years since Bill Gates predicted the death of the password, so I’m not holding my breath.
The latest answer to this dilemma is password managers like Dashlane and Last Pass. These are viable solutions for some, but the majority of the patrons I work with have little experience with technology and a password manager is simply too overwhelming.
I’ve been thinking a lot about passwords lately; I’ve read countless articles about how to manage passwords, and I don’t think there’s an easy answer. That said, I think that the best thing librarians can do is change our attitude about passwords in general. Instead of considering them to be annoyances we should view them as tools. Passwords should empower us, not annoy us. Passwords are our first line of defense against hackers. If we want to protect the content we create, it’s our responsibility to create and manage strong passwords. This is exactly the perspective we should share with our patrons. Instead of griping about patrons who don’t know their email passwords, we should take this opportunity to educate our patrons. We should view this encounter as a chance to stop patrons from using one password across all of their accounts or God forbid, using 123456 as their password.
If a patron walks away from a one-on-one help session with nothing more than a stronger account password and a slightly better understanding of online security, then that is a victory for the librarian.
What’s your take on the password dilemma? Do you have any suggestions for working with patrons in one-on-one situations? Please share your thoughts in the comments.
Recent outbreaks across the globe and in the U.S. have increased public awareness of the potential public health impacts of infectious diseases. As a result, many librarians are assisting their patrons in finding credible information sources on topics such as Ebola, Chikungunya and pandemic influenza.
Siobhan Champ-Blackwell is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center. She selects material to be added to the NLM disaster medicine grey literature data base and is responsible for the Center’s social media efforts. She has over 10 years of experience in providing training on NLM products and resources.
Elizabeth Norton is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center where she has been working to improve online access to disaster health information for the disaster medicine and public health workforce. She has presented on this topic at national and international association meetings and has provided training on disaster health information resources to first responders, educators, and librarians working with the disaster response and public health preparedness communities.
So I need to talk about something on my mind but blurt it out hastily and therefore with less finesse than I’d prefer. There has been a Recent Unpleasantness in LibraryLand where a librarian sued two other librarians for libel. Normally we are a free-speechy sort of group not inclined to sue one another over Things People Said, but as noted in this post by bossladywrites (another academic library director–we are legion), we are not in normal times. And as Meredith observes in another smart post, it is hard to see the upside of any part of this. Note: I’m not going to discuss the actual details of the lawsuit; I’m more interested in the state of play that got us there. To quote my own tweet:
Not going to wade deeply into #teamharpy except to note that “thought leaders from the library community” are generally not pro-SLAPP.
But first — the context for my run-on sentences and choppy transitions, this being a personal blog and therefore sans an editor to say “stop, stick to topic.” The last two weeks have featured a fender-bender with our Honda where the other driver decided to file a medical claim, presumably for chipping a nail, as you can’t do much damage at 5 mph, even when you are passing on the right and running a stop sign; intense work effort around a mid-year budget adjustment; an “afternoon off” to do homework during which the Most Important Database I needed at that moment was erratic at best; a terrible case of last-minuting by another campus department that should really know better; and the death at home last Saturday of our 18-year-old cat Emma, which included not only the trauma of her departure, but also the mild shame of bargain-shopping for a pet crematorium early last Sunday morning after the first place I called wanted more than I felt would be reasonable for my own cremation.
Now Emma’s ashes are on the shelf with the ashes of Darcy, Dot, and Prada; I am feeling no longer so far behind on homework, though I have a weekend ahead of me that needs to feature less Crazy and more productivity; and I have about 45 minutes before I drive Sandy to a Diabetes Walk, zoom to the Alemany farmer’s market, then settle in for some productive toiling.
It will sound hypocritical for a librarian who has been highly visible for over two decades to say this, but I agree that there is a hyper-rock-stardom afoot in our profession, and I do wonder if bossladywrites isn’t correct that social media is the gasoline over its fire. It does not help when programs designed to help professionals build group project skills have “leader” in the title and become so heavily coveted that librarians publicly gnash teeth and wail if they are not selected, as if their professional lives have been ruined.
It will also sound like the most sour of grapes to say this (not being a Mover & Shaker), and perhaps it is, but there is also a huge element of Shiny in the M&S “award,” which after all is bestowed by an industry magazine and based on a rather casual referral process. There are some well-deserved names mingling with people who are there for reasons such as schmoozing a nomination from another Famous Name (and I know of more than one case of post-nomination regret). Yet being selected for a Library Journal Mover & Shaker automatically labels that person with a gilded status, as I have seen time and again on committees and elsewhere. It’s a magazine, people, not a professional committee.
We own this problem. I have participated in professional activities where it was clear that these titles — and not the performance behind them — fast-tracked librarians for nominations far too premature for their skills. (And no, I am not suggesting the person that brought the suit is an EL–I don’t know that, though I know he was an M&S.) I am familiar with one former EL (not from MPOW!) who will take decades if ever to live up to anything with “leader” in the title, and have watched him get proposed as a candidate for association-wide office–by virtue of being on the magic EL-graduate roster.
Do I think Emerging Leaders is a good program? If I didn’t, I wouldn’t have carved money out of our tiny budget to commit to supporting one at MPOW. Do I think being an EL graduate means you are qualified for just about anything the world might offer, and your poop don’t stink? No, absolutely not. I did not single out one person due to magical sparkly librarian powers; it had a lot more to do with this being a good fit for that librarian at the time, just as I have helped others at MPOW get into leadership programs, research institutes, information-literacy boot camps, and skill-honing committees. It’s just part of my job.
The over-the-top moment for me with EL was the trading cards. Really? Coronets and fanfare for librarians learning project management and group work? Couldn’t we at least wait until their work was done? Of the tens of thousands of librarians in the U.S. alone, less than one hundred become ELs every year. The vast majority of the remainder are “emerging” just fine in their own right; there are great people doing great work that you will never, ever hear of. Why not just give us all trading cards — yes, every damn librarian? And before you conclude KGS Hates EL, keep in mind I have some serious EL street cred, having not only sponsored an EL but also for successfully proposing GLBTRT’s first EL and making a modest founding donation to its effort to boot.
Then there was ALA’s “invitational summit” last spring where fewer than 100 “thought leaders from the library field” gathered to “begin a national conversation.” Good for them, but as one of the uninvited, I could not resist poking mild fun at this on Twitter, partly for its exclusivity and partly because this “national conversation” was invisible to the rest of the world. I was instantly lathered in Righteous Indignation by some of the chosen people who attended — and not even to my (social network) face, but in the worst passive-aggressive librarian style, through “vaguebook” comments on social networks. (And a la Forrest Gump, the person who brought the lawsuit against the two librarians was at this summit, too, though I give the organizers credit for blending interesting outliers along with the usual suspects.) If you take yourself that seriously, you need a readjustment — perhaps something we can discuss if that conversation is ever launched.
I have a particularly bitter taste in my mouth about the absentee rockstar librarian syndrome because I had one job, eons ago, where I succeeded an absentee leader who had been on the conference circuit for several years, and all the queen’s horses couldn’t put that department together again. There were a slew of other things that were going wrong, but above all, the poor place stank of neglect. The mark of a real rock star is the ability to ensure that no one back at the ranch ever has any reason to begrudge you your occasional Shiny Moment. Like the way so many of us learn hard lessons, it gave me pause about my own practices, and caused me to silently beg forgiveness from past organizations for any and all transgressions.
Shiny Syndrome can twist people’s priorities and make the quotidian seem unimportant (along with making them boors at dinner parties, as Meredith recounts). Someone I intensely dislike is credited with saying that 80 percent of life is showing up, a statement I grudgingly agree is spot-on. When people ask if I would run for some office or serve on some very busy board, or even do a one-off talk across country, I point out that I have a full-time job and am a full-time student (I barely have time to brew beer more than three times a year these days!). But it’s also true that I get a huge amount of satisfaction simply from showing up for work every day, as well as activities that likely sound dull but to me are very exciting, such as shared-print pilots and statewide resource sharing, as well as the interviews I am conducting for a research paper that is part of my doctoral process, a project that has big words like Antecedents in the title but is to me fascinating and rewarding.
I also get a lot of pleasure from professional actions that don’t seem terribly fun, such as pursuing the question of whether there should be a Planning and Budget Assembly, a question that may seem meaningless to some; in fact, at an ALA midwinter social, one Shiny Person belittled me for my actions on PBA to the point where I left the event in tears. Come to think of it, that makes two white men who have belittled me for pursuing the question of PBA, which brings up something Meredith and bossladywrites hint at: the disproportionate number of rockstar librarians who are young, white, and male. They left off age, but I feel that acutely; far too often, “young” is used as a synonym for forward-thinking, tech-savvy, energetic, smart, creative, and showcase-worthy.
I do work in a presentation now and then — and who can complain about being “limited” to the occasional talk in Australia and New Zealand (I like to think “I’m big, really big, in Palmerston North”), though my favorite talk in the last five years was to California’s community college library directors, because they are such a nice group and it was a timely jolt of Vitamin Colleague — but when I do, I end up talking about my work in one way or the other. And one of the most touching moments of my career happened this August when at an event where MPOW acknowledged my Futas Award — something that honors two decades of following Elizabeth Futas’ model of outspoken activism, sometimes at personal risk, sometimes wrongheadedly, sometimes to no effect, but certainly without pause — I realized that some of our faculty thought I was receiving this award for my efforts on behalf of my dear library, as if there were an award for fixing broken bathroom exhaust fans and replacing tables and chairs, activities that along with the doctoral program take up the space where shiny stuff would go. That flash of insight was one of the deepest, purest moments of joy in my professional life. I got to be two people that day: the renegade of my youth, and the macher of my maturity.
Finally, I am now venturing into serious geezer territory, but back in the day, librarians were rock stars for big stuff, like inventing online catalogs, going to jail rather than revealing their patrons’ identities, and desegregating state associations. These days you get your face, if not on the cover of Rolling Stone, then as a centerfold in a library magazine; you position yourself as a futurist or guru, go ping ping ping all over the social networks, and you’re soon at every conference dais. (In private messaging about this topic, I found myself quoting the lyrics from “You’re So Vain.”)
Name recognition has always had its issues (however convenient it is for those of us, like me, who have it). I often comment, and it is not false modesty, that I know some people vote for me for the wrong reasons. I have my areas of competence, but I know that name recognition and living in a state with a large population (as I am wont to do) play a role in my ability to get elected. (Once I get there, I like to think I do well enough, but that is beside the point. A favorite moment of mine, from back when I chaired a state intellectual freedom committee, was a colleague who remarked, clearly surprised, that “you know how to run a meeting!”) And of course, there are rock stars who rock deservedly, and sometimes being outward-facing is just part of the package (and some of us can’t help it — I was that little kid that crazy people walked up to in train stations to gift with hand-knit sweaters, and yes, that really happened). But we seem to have gone into a new space, where a growing percentage of Shiny People are famous for being shiny. It’s not good for us, and it’s not good for them, and it’s terrible for our profession.
There are three components to a functioning Diva system:
The IIP Image Server, a highly optimized image server;
A .json file containing measurement data about the image collection, used by the front-end component to determine the layout of the viewer; and
The front-end component itself, a JavaScript application (with accompanying CSS) that renders the document viewer in the browser.
This is the fourth and final post in our software development practices series. In our most recent post we discussed how Acceptance Criteria could be used to encapsulate the details of the user experience that the system should provide. This week we'll talk about how developers can use tests to determine whether or not the system is satisfying the Acceptance Criteria.
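As a concrete illustration of that mapping (the criterion, class, and method names here are hypothetical examples, not taken from the series itself), an acceptance criterion like “a patron can place a hold on an available item, and the item’s status changes to ‘on hold’” translates almost word for word into an automated test:

```python
import unittest

# Hypothetical acceptance criterion: "A patron can place a hold on an
# available item, and the item's status changes to 'on hold'."

class Item:
    def __init__(self):
        self.status = "available"

    def place_hold(self):
        # Only an available item can be placed on hold.
        if self.status != "available":
            raise ValueError("item is not available")
        self.status = "on hold"

class TestPlaceHold(unittest.TestCase):
    def test_hold_changes_status(self):
        item = Item()
        item.place_hold()
        self.assertEqual(item.status, "on hold")

    def test_cannot_hold_unavailable_item(self):
        item = Item()
        item.place_hold()
        with self.assertRaises(ValueError):
            item.place_hold()
```

When every test passes, the developer has direct evidence that the system satisfies the criterion; when one fails, the failure names exactly which part of the promised user experience is broken.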
Last week, I had the pleasure of attending a dinner to honor the National Student Poets. Each year, the National Student Poets Program recognizes five extraordinary high school students, who receive college scholarships and opportunities to present their work at writing and poetry events across the country—which includes events at libraries.
To qualify for the National Student Poets Program, students must demonstrate excellence in poetry, provide evidence of prior awards for their work, and successfully navigate a multi-level selection process. The program is sponsored and hosted by the President’s Committee on the Arts and Humanities, the Institute of Museum and Library Services, Scholastic Art & Writing Awards, and several other groups, with the dinner hosted at the fabulous, new Google Washington Office—altogether an interesting collaboration.
The students began the day at the White House, and they read their poetry in the Blue Room, hosted by the First Lady. Then they met with a group of White House speechwriters to talk about the creation of a different kind of “poetry.” At the dinner, I sat next to one of the incoming (2014) National Student Poets, Cameron Messinides, a 17-year-old from Greenville, South Carolina. He, as well as the other honorees, exhibited impressive, almost intimidating ability and poise in their presentations and informal conversation.
The advent of the digital age does not, of course, negate important forms of intellectual endeavor such as poetry, but it does raise questions about how these forms of traditional communication extend online. And for the American Library Association (ALA), there are further questions about how libraries may best participate in this extension. Then there is the question of how to convey such library possibilities to decision makers and influencers. Thus, under the rubric of our Policy Revolution! Initiative, as well as a new Office for Information Technology Policy program, we are exploring the needs and opportunities of children and youth with respect to technology and libraries, with an eye to engaging national decision makers and influencers.
Well, OK, the event was fun too. With all due deference to our Empress of E-rate (Marijke Visser, who is the associate director of the ALA Office for Information Technology Policy), one cannot spend all of one’s time on E-rate and such matters, though even so, admittedly one can see a plausible link between E-rate, libraries, and poetry. So even at this dinner, E-rate did lurk in the back of my mind… I guess there is no true escape from E-rate.
Last year’s class of Residents, along with LC staff, at the ALA Mid-winter conference
The Library of Congress Office of Strategic Initiatives, in partnership with the Institute of Museum and Library Services, has recently announced the 2015 National Digital Stewardship Residency program, which will be held in the Washington, DC area starting in June 2015.
As you may know (NDSR was well represented on the blog last year), this program is designed for recent graduates with an advanced degree who are interested in the field of digital stewardship. This will be the fourth class of residents for this program overall – the first, in 2013, was held in Washington, DC, and the second and third classes, starting in September 2014, are being held concurrently in New York and Boston.
The five 2015 residents will each be paired with an affiliated host institution for a 12-month program that will provide them with an opportunity to develop, apply and advance their digital stewardship knowledge and skills in real-world settings. The participating hosts and projects for the 2015 cohort will be announced in early December and the applications will be available shortly after. News and updates will be posted to the NDSR webpage, and here on The Signal.
In addition to providing great career benefits for the residents, the successful NDSR program also provides benefits to the institutions involved as well as the library and archives field in general. For an example of what the residents have accomplished in the past, see this previous blog post about a symposium held last spring, organized entirely by last year’s residents.
Another recent success for the program – all of the former residents now have substantive jobs or fellowships in a related field. Erica Titkemeyer, a former resident who worked at the Smithsonian Institution Archives, now has a position at the University of North Carolina at Chapel Hill as the Project Director and AV Conservator for the Southern Folklife Collection. Erica said the Residency provided the opportunity to utilize skills gained through her graduate education and put them to practical use in an on-the-job setting. In this case, she was involved in research and planning for preservation of time-based media art at the Smithsonian.
Erica notes some other associated benefits. “I had a number of chances to network within the D.C. area through the Library of Congress, external digital heritage groups and professional conferences as well,” she said. “I have to say, I am most extremely grateful for having had a supportive group of fellow residents. The cohort was, and still remains, a valuable resource for knowledge and guidance.”
This residency experience no doubt helped Erica land her new job, one that includes a lot of responsibility for digital library projects. “Currently we are researching options and planning for mass-digitization of the collection, which contains thousands of recordings on legacy formats pertaining to the music and culture of the American South,” she said.
George Coulbourne, Executive Program Officer at the Library of Congress, remarked on the early success of this program: “We are excited with the success of our first class of residents, and look forward to continuing this success with our upcoming program in Washington, DC. The experience gained by the residents along with the tangible benefits for the host institution will help set the stage for a national residency model in digital preservation that can be replicated in various public and private sector environments.”
So, this is a heads-up to graduate students and all interested institutions – start thinking about how you might want to participate in the 2015 NDSR. Keep checking our website and blog for updated information, applications, dates, etc. We will post this information as it becomes available.
One of my happier duties as a LITA Board member is reviewing Emerging Leader applications to decide whom the division should sponsor. I just finished this year’s round of review this morning, and now that my choices are safely submitted (but fresh on my mind) I can share what I’m looking for, in hopes that it’s useful to future Emerging Leader candidates as you develop your applications.
But first, a caveat: last year and this, I would have been happy with LITA sponsoring at least half of the candidates I saw, if only we could. Really the only unpleasant part of reviewing applications is that we can’t sponsor everyone we’d like to; I see so many extraordinarily talented, driven people in the application pile, and it’s actually painful not to be able to put all of them at the top of my list.
Okay! That said…
Things I want to see
People who have gotten things done.
People who haven’t just done an excellent job with duties as assigned, but who have perceived a need and initiated something to solve it.
People who have marshaled resources and buy-in, even though they are (as is the case for most EL candidates) in a junior position, or outside a formal hierarchy.
Letters of recommendation that speak to the things you can’t credibly address about yourself (communication, leadership skills), using specific examples.
Diversity is a specific (and large) part of the rubric I’m asked to use, and I’m going to give it extended treatment here. First, not gonna lie: most people in the pool are white women, and you have an uphill battle to prove your understanding of diversity if you’re one of them. (I am also a white woman, and the same goes for me.) Second, I’m not looking for evidence that you care about diversity or think it’s a good thing (of course you do. what are you, some kind of a jerk? no). I’m looking for concrete evidence that you actually get it. Tell me that you wrote a thesis on some topic that required you to grapple with primary sources and major thinkers on some diversity-related topic. Tell me about the numerous conference presentations you’ve done that required this kind of thinking. Tell me about the work, whether paid or volunteer, that you’ve done with diverse populations. Tell me about how you’ve gone out of your way, and maybe out of your comfort zone, to actually do something that deepens your awareness, develops your skills, and diversifies your network.
If you belong to a population that gives you special insight about some axis of diversity (and many white women do!), tell me about that, too. I don’t give full credit for that – I’d still like to see that you’ve theorized or worked on some sort of diversity issue – but it does give me faith that you have some sort of relevant insight and experience.
There are many kinds of diversity that have shown up in EL apps and there’s no one that matters most to me, nor do I expect any candidate to have experience with all of them. But you need to have done something. And if you really haven’t, at least acknowledge and problematize that fact; if you do this and the rest of your application is exemplary you may still be in the running for me.
Things I do not want to see
I had 20 applications to review this year. I am reviewing them as a volunteer, amidst the multiple proposals I am writing this month and the manuscript due in November and the course and webinar I’ll be teaching soon and my regular duties on two boards and helping lead a major fundraising campaign and writing code for a couple of clients and the usual housework and childcare things and occasionally even having a life and, this week, some pretty killer insomnia. Seriously, if you give me any excuse to stop reading your application, I will take it.
Do not give me the excuse to stop reading.
Some things that will make me stop reading:
If your application is in any way incomplete (didn’t answer all the questions, missing one or more references, no resume).
Significant or frequent errors of grammar, spelling, or usage.
Shallow treatment of the diversity question (see above).
I also might stop reading overly academic prose, particularly if it reads like you’re not 100% comfortable with that (admittedly pretty weird) genre. I do want to see that you’re smart and have a good command of English, but communication within associations is a different genre than journal articles. Talk to me in your voice (but get someone to proofread). Particularly if you’re a current student or a recent graduate: I give you permission to not write an academic paper. (I implore you not to write an academic paper.) My favorite EL applications sparkle with personality. They speak with humility or confidence or questioning or insight or elegance. A few even make me laugh.
I would prefer it if you spell out acronyms, at least on their first occurrence. You can assume that I recognize ALA and its divisions, but there are a lot of acronyms in the library world, and they’re not all clear outside their context. If you’re active in CLA, is that California or Colorado or Connecticut? Or Canada?
Some information about mechanics
Pulling back the curtain for a moment here: the web site where I access your application materials does not have super-awesome design or usability, and this impacts (sometimes unfairly) how I rate your answers.
Your answers to the questions are displayed in all bold letters. This makes it hard to read long paragraphs. Please use paragraph breaks thoughtfully.
Your recommenders’ text appears to be displayed without any paragraph breaks at all, if they’ve typed it directly into the site. Ow. Please ask them to upload letters as files instead.
Speaking of which: I use Pages. On a Mac. Your .docx file will probably look wrong to me. If you’ve invested time and graphic design skills in lovingly crafting a resume, I want to see! Please upload your resume as .pdf, and ask your recommenders to upload their letters as .pdf too. (On reflection I feel bad about this because it’s a famously poor format for accessibility. But seriously, your .docx looks bad.)
Whew! Glad I got to say all that. Hope this helps future EL candidates. I look forward to reading your applications next year!
Recently, there was a thread started by a frustrated Drupal user on the Code4Lib (Code for Libraries) mailing list. It drew many thoughtful and occasionally passionate responses. This was mine:
I think that it is widely conceded that it is a good idea to use the most suitable tool for a given task. But what does that mean? There is a long list of conditions and factors that go into selecting tools, some reflecting immediate needs, some reflecting long term needs and strategy, and others reflecting the availability of resources, and these interact in many ways, many of them problematic.
I have given the genesis of Cherry Hill’s tech evolution at the end of this missive. The short version is that we started focused on minimizing size and complexity while maximizing performance, and over time have moved to an approach that balances those against building and maintenance cost along with human and infrastructure resource usage.
It is one thing to have a vision (regular readers of this blog will know I have them all the time); it’s yet another to see it starting to form through the mist into a reality. Several times in the recent past I have spoken of some of the building blocks for bibliographic data to play a prominent part in the Web of Data. The Web of Data that is starting to take shape and drive benefits for everyone. Benefits that for many are hiding in plain sight on the results pages of search engines. In those informational panels with links to people’s parents, universities, and movies, or maps showing the location of mountains and retail outlets; incongruously named Knowledge Graphs.
OK, you may say, we’ve heard all that before, so what is new now?
As always it is a couple of seemingly unconnected events that throw things into focus.
Event 1: An article by David Weinberger in the DigitalShift section of Library Journal entitled Let The Future Go. An excellent article telling libraries that they should not be so parochially focused on their own domain whilst looking at how they are going to serve their users’ needs in the future. Get our data out there, everywhere, so it can find its way to those users, wherever they are. Making it accessible to all. David references three main ways to provide this access:
APIs – to allow systems to directly access our library system data and functionality
Linked Data – can help us open up the future of libraries. By making clouds of linked data available, people can pull together data from across domains
The Library Graph – an ambitious project libraries could choose to undertake as a group that would jump-start the web presence of what libraries know: a library graph. A graph, such as Facebook’s Social Graph and Google’s Knowledge Graph, associates entities (“nodes”) with other entities
(I am fortunate to be a part of an organisation, OCLC, making significant progress on making all three of these a reality – the first one is already baked into the core of OCLC products and services)
It is the 3rd of those, however, that triggered recognition for me. Personally, I believe that we should not be focusing on a specific ‘Library Graph’ but more on the ‘Library Corner of a Giant Global Graph’ – if graphs can have corners that is. Libraries have rich specialised resources and have specific needs and processes that may need special attention to enable opening up of our data. However, when opened up in context of a graph, it should be part of the same graph that we all navigate in search of information whoever and wherever we are.
ZBW contributes to WorldCat, and has 1.2 million OCLC numbers attached to its bibliographic records. So it seemed interesting to see how many of these editions link to works, and furthermore to other editions of the very same work.
The post is interesting from a couple of points of view. Firstly, the simple steps they took to get at the data, really well demonstrated by the command-line calls used to access it: get OCLC number data from WorldCat.org in JSON format; extract the schema:exampleOfWork link to the Work; get the Work data from WorldCat, also in JSON; parse out the links to other editions of the work and compare with their own data. Command-line calls that were no doubt embedded in simple scripts.
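Those steps can be sketched in a few lines of Python. The records below are inlined, heavily simplified stand-ins for what WorldCat returns (the real responses are much richer JSON-LD, and the identifiers here are illustrative); the schema:exampleOfWork and schema:workExample properties are the real schema.org vocabulary the post describes:

```python
import json

# Simplified stand-in for a WorldCat bibliographic record (edition level).
record_json = """
{
  "@id": "http://www.worldcat.org/oclc/41266045",
  "schema:exampleOfWork": "http://worldcat.org/entity/work/id/1151002411"
}
"""

# Simplified stand-in for the corresponding WorldCat Work description.
work_json = """
{
  "@id": "http://worldcat.org/entity/work/id/1151002411",
  "schema:workExample": [
    "http://www.worldcat.org/oclc/41266045",
    "http://www.worldcat.org/oclc/53095235"
  ]
}
"""

record = json.loads(record_json)
work_uri = record["schema:exampleOfWork"]   # edition -> work link

work = json.loads(work_json)
editions = work["schema:workExample"]       # work -> editions links
other_editions = [e for e in editions if e != record["@id"]]
print(other_editions)
```

In the real workflow the two `json.loads` calls would be replaced by HTTP fetches of the edition and Work URIs; the bidirectional hop (edition to work, work back to its other editions) is the whole trick.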
Secondly, there was the implicit way that the corpus of WorldCat Work entity descriptions, and their canonical identifying URIs, is used as an authoritative hub for Works and their editions. The concept is not new in the library world; we have been doing this sort of thing with names and person identities via other authoritative hubs, such as VIAF, for ages. What is new here is that it is a hub for Works and their relationships, and the bidirectional nature of those relationships – work to edition, edition to work – the beginnings of a library graph linked to other hubs for subjects, people, etc.
The ZBW Labs experiment is interesting in its own right: a simple approach with enlightening results. What is more interesting for me is that it demonstrates a baby step towards the way the Library corner of that Global Web of Data will not only naturally form (as we expose and share data in this way – linked entity descriptions), but naturally fit into future library workflows with all sorts of consequential benefits.
The experiment is exactly the type of initiative that we hoped to stimulate by releasing the Works data. Using it for things we never envisaged, delivering unexpected value to our community. I can’t wait to hear about other initiatives like this that we can all learn from.
So who is going to be doing this kind of thing – describing entities and sharing them to establish these hubs (nodes) that will form the graph. Some are already there, in the traditional authority file hubs: The Library of Congress LC Linked Data Service for authorities and vocabularies (id.loc.gov), VIAF, ISNI, FAST, Getty vocabularies, etc.
As previously mentioned Work is only the first of several entity descriptions that are being developed in OCLC for exposure and sharing. When others, such as Person, Place, etc., emerge we will have a foundation of part of a library graph – a graph that can and will be used, and added to, across the library domain and then on into the rest of the Global Web of Data. An important authoritative corner, of a corner, of the Giant Global Graph.
As I said at the start these are baby steps towards a vision that is forming out of the mist. I hope you and others can see it too.
This week, 57 years ago, was a tumultuous one for nine African American students at Central High School in Little Rock, Arkansas. Now better known as the Little Rock Nine, these high school students were part of a several year battle to integrate Little Rock School District after the landmark 1954 Brown v. Board of Education Supreme Court ruling.
From that ruling on, it was a tough uphill battle to get the Little Rock School District to integrate. On a national level, all eight congressmen from Arkansas were part of the “Southern Manifesto,” encouraging Southern states to resist integration. On a local level, white citizens’ councils, like the Capital Citizens Council and the Mothers’ League of Central High School, were formed in Little Rock to protest desegregation. They also lobbied politicians, in particular Arkansas Governor Orval Faubus, who went on to block the 1957 desegregation of Central High School.
These tensions escalated throughout September 1957—which saw the Little Rock Nine barred from entering the school by Arkansas National Guard troops sent by Faubus. Eventually, Federal District Judge Ronald Davies was successful in ordering Faubus to stop interfering with desegregation. Integration began during this week, 57 years ago.
On September 23, 1957, the nine African American students entered Central High School by a side door, while a mob of more than 1,000 people crowded the building. Local police were overwhelmed, and the protesters began attacking African American reporters outside the school building.
President Eisenhower, via Executive Order 10730, sent the U.S. Army to Arkansas to escort the Little Rock Nine into school on September 25, 1957. The students attended classes with soldiers by their side. By the end of the month, a now-federalized National Guard had mostly taken over protection of the students. While the protests eventually died down, the abuse and tension did not. The school was shut down from 1958 through fall 1959 as the struggle over segregation continued.
Through the DPLA, you can get a better sense of what that struggle and tension was like. In videos from our service hub, Digital Library of Georgia, you can view news clips recorded during this historic time in Little Rock. These videos are a powerful testament to the struggle of the Little Rock Nine, and the Civil Rights movement as a whole.
I gave a plenary talk at the 3rd EUDAT Conference's session on sustainability entitled Economic Sustainability of Digital Preservation. Below the fold is an edited text with links to the sources.
I'm David Rosenthal from the LOCKSS (Lots Of Copies Keep Stuff Safe) Program at the Stanford Libraries. We've been sustainably preserving digital information for a reasonably long time, and I'm here to talk about some of the lessons we've learned along the way that are relevant for research data.
In May 1995 Stanford Libraries' HighWire Press pioneered the shift of academic journals to the Web by putting the Journal of Biological Chemistry on-line. Almost immediately librarians, who pay for this extraordinarily expensive content, saw that the Web was a far better medium than paper for their mission of getting information to current readers. But they have a second mission, getting information to future readers. There were both business and technical reasons why, for this second mission, the Web was a far worse medium than paper:
The advent of the Web forced libraries to change from purchasing a copy of the content to renting access to the publisher's copy. If the library stopped paying the rent, it would lose access to the content.
Because in the Web the publisher stored the only copy of the content, and because it was on short-lived, easily rewritable media, the content was at great risk of loss and damage.
As a systems engineer, I found the paper library system interesting as an example of fault-tolerance. It consisted of a loosely-coupled network of independent peers. Each peer stored copies of its own selection of the available content on durable, somewhat tamper-evident media. The more popular the content, the more peers stored a copy. There was a market in copies; as content had fewer copies, each copy became more valuable, encouraging the peers with a copy to take more care of it. It was easy to find a copy, but it was hard to be sure you had found all copies, so undetectably altering or deleting content was difficult. There was a mechanism, inter-library loan and copy, for recovering from loss or damage to a copy.
The LOCKSS Program started in October 1998 with the goal of replicating the paper library system for the Web. We built software that allowed libraries to deploy a PC, a LOCKSS box, that was the analog for the Web of the paper library's stacks. By crawling the Web, the box collected a copy of the content to which the library subscribed and stored it. Readers could access their library's copy if for any reason they couldn't get to the publisher's copy. Boxes at multiple libraries holding the same content cooperated in a peer-to-peer network to detect and repair any loss or damage.
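The detect-and-repair idea can be illustrated with a toy sketch (to be clear, this is not the actual LOCKSS polling protocol, which uses a far more careful, tamper-resistant voting scheme; the peer names and the majority-wins rule here are simplifications for illustration): peers compare hashes of their copies, and a peer whose copy disagrees with the majority repairs it from a peer that agrees:

```python
import hashlib
from collections import Counter

def digest(content: bytes) -> str:
    """Fingerprint a copy of the content."""
    return hashlib.sha256(content).hexdigest()

def audit_and_repair(copies: dict) -> dict:
    """copies maps peer name -> content bytes. The majority hash wins,
    and dissenting peers are repaired from a majority copy."""
    votes = Counter(digest(c) for c in copies.values())
    winning_hash, _ = votes.most_common(1)[0]
    good_copy = next(c for c in copies.values() if digest(c) == winning_hash)
    return {peer: (c if digest(c) == winning_hash else good_copy)
            for peer, c in copies.items()}

# Three libraries hold the same article; lib-c's copy has bit-rotted.
copies = {
    "lib-a": b"article text",
    "lib-b": b"article text",
    "lib-c": b"artzcle text",
}
repaired = audit_and_repair(copies)
```

After the audit, lib-c holds an undamaged copy again; no single peer, and no publisher, had to be trusted as the sole source of truth.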
The program was developed and went into early production with initial funding from the NSF, and then major funding from the Mellon Foundation, the NSF and Sun Microsystems. But grant funding isn't a sustainable business model for digital preservation. In 2005, the Mellon Foundation gave us a grant with two conditions; we had to match it dollar-for-dollar and by the end of the grant in 2007 we had to be completely off grant funding. We met both conditions, and we have (with one minor exception, which I will get to later) been off grant funding and in the black ever since. The LOCKSS Program has two businesses:
We develop, and support libraries that use, our open-source software for digital preservation. The software is free; libraries pay for support. We refer to this as the "Red Hat" business model.
Under contract to a separate not-for-profit organization called CLOCKSS run jointly by publishers and libraries, we use our software to run a large dark archive of e-journals and e-books. This archive has recently been certified as a "Trustworthy Repository" after a third-party audit which awarded it the first-ever perfect score in the Technologies, Technical Infrastructure, Security category.
The first lesson that being self-sustaining for 7 years has taught us is "It's The Economics, Stupid". Research in two areas of preservation, e-journals and the public Web, indicates that in each area, all current efforts combined preserve less than half the content that should be preserved. Why less than half? The reason is that the budget for digital preservation isn't adequate to preserve even half using current technology. This leaves us with three choices:
Do nothing. In that case we can stop worrying about bit rot, format obsolescence, operator error and all the other threats digital preservation systems are designed to combat. These threats are dwarfed by the threat of can't-afford-to-preserve. It is going to mean that more than 50% of the stuff that should be available to future readers isn't.
Double the budget for digital preservation. This is so not going to happen. Even if it did, it wouldn't solve the problem because, as I will show, the cost per unit content is going to rise.
Halve the cost per unit content of current systems. This can't be done with current architectures. Yesterday morning I gave a talk at the Library of Congress describing a radical re-think of long-term storage architecture that might do the trick. You can find the text of the talk on my blog.
Unfortunately, the structure of research funding means that economics is an even worse problem for research data than for our kind of content. There's been quite a bit of research into the costs of digital preservation, but it isn't based on a lot of good data. Remedying that is important; I'm on the advisory board of an EU-funded project called 4C that is trying to do so. If you have any kind of cost data you can share please go to http://www.4cproject.eu/ and submit it to the Curation Cost Exchange.
As an engineer, I'm used to using rules of thumb. The one I use to summarize most of the cost research is that ingest takes half the lifetime cost, preservation takes one third, and access takes one sixth.
Research grants might be able to fund the ingest part, since it is a one-time up-front cost. But preservation and access are ongoing costs for the life of the data, so grants have no way to cover them. We've been able to ignore this problem for a long time, for two reasons. The first is that from at least 1980 to 2010 costs followed Kryder's Law, the disk analog of Moore's Law, dropping 30-40%/yr. This meant that, if you could afford to store the data for a few years, the cost of storing it for the rest of time could be ignored, because of course Kryder's Law would continue forever. The second is that as the data got older, access to it was expected to become less frequent. Thus the cost of access in the long term could be ignored.
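The arithmetic behind that assumption is a geometric series (a sketch of mine, not a formula from the talk): if storing a fixed amount of data costs c this year, and the cost declines by a fraction d each year, then storing it forever costs c/d in total. At Kryder-Law rates of 30-40% per year, "forever" costs only about 2.5 to 3.3 times the first year, which is why a few years of budget looked like enough:

```python
def endowment(first_year_cost, annual_decline):
    """Total cost of storing data forever, assuming the yearly storage
    cost declines by `annual_decline` (0 < annual_decline < 1) each year.
    Closed form of sum over n >= 0 of c * (1 - d)**n, which is c / d."""
    return first_year_cost / annual_decline

# At a 30%/yr decline, forever costs ~3.33x the first year; at 40%, 2.5x.
print(endowment(1.0, 0.30), endowment(1.0, 0.40))
```

The same formula shows why the post-2010 slowdown hurts so much: at a 10% annual decline, the forever cost jumps to 10 times the first year.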
Kryder's Law held for three decades, an astonishing feat for exponential growth. Something that goes on that long gets built into people's model of the world, but as Randall Munroe points out, in the real world exponential curves cannot continue for ever. They are always the first part of an S-curve.
This graph, from Preeti Gupta of UC Santa Cruz plots the cost per GB of disk drives against time. In 2010 Kryder's Law abruptly stopped. In 2011 the floods in Thailand destroyed 40% of the world's capacity to build disks, and prices doubled. Earlier this year they finally got back to 2010 levels. Industry projections are for no more than 10-20% per year going forward (the red lines on the graph). This means that disk is now about 7 times as expensive as was expected in 2010 (the green line), and that in 2020 it will be between 100 and 300 times as expensive as 2010 projections.
Thanks to aggressive marketing, it is commonly believed that "the cloud" solves this problem. Unfortunately, cloud storage is actually made of the same kind of disks as local storage, and is subject to the same slowing of the rate at which it was getting cheaper. In fact, when all costs are taken into account, cloud storage is not cheaper for long-term preservation than doing it yourself once you get to a reasonable scale. Cloud storage really is cheaper if your demand is spiky, but digital preservation is the canonical base-load application.
You may think that cloud storage is a competitive market; in fact it is dominated by Amazon. When Google recently started to get serious about competing, they pointed out that while Amazon's margins on S3 may have been minimal at introduction, by then they were extortionate:
cloud prices across the industry were falling by about 6 per cent each year, whereas hardware costs were falling by 20 per cent. And Google didn't think that was fair. ... "The price curve of virtual hardware should follow the price curve of real hardware."
Notice that the major price drop triggered by Google was a one-time event; it was a signal to Amazon that they couldn't have the market to themselves, and to smaller players that they would no longer be able to compete.
In fact commercial cloud storage is a trap. It is free to put data into a cloud service such as Amazon's S3, but it costs to get it out. For example, getting your data out of Amazon's Glacier without paying an arm and a leg takes about 2 years, because only a small fraction of your data can be retrieved each month without charge. If you commit to the cloud as long-term storage, you have two choices. Either keep a copy of everything outside the cloud (in other words, don't commit to the cloud), or stay with your original choice of provider no matter how much they raise the rent.
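The 2-year figure follows directly from the retrieval allowance. Assuming the free allowance is about 5% of your stored data per month (the figure Glacier advertised at the time; daily proration is ignored here):

```python
free_fraction_per_month = 0.05   # assumed free-retrieval allowance (5%/month)

# Months to drain the whole archive without retrieval fees:
months_to_retrieve_all = 1 / free_fraction_per_month   # 20 months
years_to_retrieve_all = months_to_retrieve_all / 12    # about 1.7 years
```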
The storage part of preservation isn't the only ongoing cost that will be much higher than people expect; access will be too. In 2010 the Blue Ribbon Task Force on Sustainable Digital Preservation and Access pointed out that the only real justification for preservation is to provide access. With research data this is difficult: the value of the data may not be evident for a long time. Shang dynasty astronomers inscribed eclipse observations on animal bones. About 3,200 years later, researchers used these records to estimate that the accumulated clock error was about 7 hours. From this they derived a value for the viscosity of the Earth's mantle as it rebounds from the weight of the glaciers.
But the advent of "Big Data" techniques means that, going forward, scholars increasingly want not to access a few individual items in a collection, but to ask questions of the collection as a whole. For example, the Library of Congress announced that it was collecting the entire Twitter feed, and almost immediately had 400-odd requests for access to the collection. The scholars weren't interested in a few individual tweets, but in mining information from the entire history of tweets. Unfortunately, the most the Library of Congress can afford to do with the feed is to write two copies to tape. There's no way it can afford the compute infrastructure to data-mine the feed. We can get some idea of how expensive this is by comparing Amazon's S3, designed for data-mining access patterns, with Amazon's Glacier, designed for traditional archival access. S3 is currently at least 2.5 times as expensive; until recently it was 5.5 times.
The real problem here is that scholars are used to having free access to library collections, but what scholars now want to do with archived data is so expensive that they must be charged for access. This in itself has costs, since access must be controlled and accounting undertaken. Further, data-mining infrastructure at the archive must have enough performance for the peak demand but will likely be lightly used most of the time, increasing the cost for individual scholars. A charging mechanism is needed to pay for the infrastructure. Fortunately, because the scholar's access is spiky, the cloud provides both suitable infrastructure and a charging mechanism.
For smaller collections, Amazon provides Free Public Datasets: Amazon stores a copy of the data at no charge, and charges scholars who access the data for their computation, rather than charging the owner of the data for storage.
Even for large and non-public collections it may be possible to use Amazon. Suppose that in addition to keeping the two archive copies of the Twitter feed on tape, the Library kept one copy in S3's Reduced Redundancy Storage simply to enable researchers to access it. For this year, that would have averaged about $4,100/month, or about $50K for the year. Scholars wanting to access the collection would have to pay for their own computing resources at Amazon, and the per-request charges; because the data transfers would be internal to Amazon there would be no bandwidth charges. The storage charges could be borne by the Library or charged back to the researchers. If they were charged back, each of the 400 initial requesters would need to pay about $125 for a year's access to the collection, not an unreasonable charge. If the idea turned out to be a failure it could be terminated with no further cost; the collection would still be safe on tape. In the short term, using cloud storage for an access copy of large, popular collections may be a cost-effective approach. Because the Library's preservation copy isn't in the cloud, it isn't locked in.
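The chargeback arithmetic here is simple enough to check, using only the estimates already given above:

```python
monthly_storage_cost = 4_100                       # S3 RRS estimate, dollars/month
annual_storage_cost = monthly_storage_cost * 12    # $49,200: "about $50K"
initial_requests = 400

# Annual storage cost split evenly across the initial requesters:
per_scholar = annual_storage_cost / initial_requests   # $123, i.e. "about $125"
```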
One thing it should be easy to agree on about digital preservation is that you have to do it with open-source software; closed-source preservation has the same fatal "just trust me" aspect that closed-source encryption (and cloud storage) suffers from. Sustaining open-source preservation software is an interesting problem because, unlike giants such as Linux and Apache, it is a niche market with little commercial interest.
We have managed to sustain open-source preservation software well for 7 years, but have encountered one problem. This brings me to the exception I mentioned earlier. To sustain the free-software, paid-support model you have to deliver visible value to your customers regularly and frequently. We try to release updated software every 2 months, and new content for preservation weekly. But this makes it difficult to commit staff resources to major improvements to the infrastructure, improvements needed to address problems that don't impact customers yet but will in a few years unless you work on them now.
The Mellon Foundation supports a number of open-source initiatives, and after discussing this problem with them they gave us a small grant specifically to work on enhancements to the LOCKSS system such as support for collecting websites that use AJAX, and for authenticating users via Shibboleth. Occasional grants of this kind may be needed to support open-source preservation infrastructure generally, even if pay-for-support can keep it running.
Unfortunately, economics aren't the only hard problem facing the long-term storage of data. There are serious technical problems too. Let's start by examining the technical problem in its most abstract form. Since 2007 I've been using the example of "A Petabyte for a Century". Think about a black box into which you put a Petabyte, and out of which a century later you take a Petabyte. Inside the box there can be as much redundancy as you want, on whatever media you choose, managed by whatever anti-entropy protocols you want. You want to have a 50% chance that every bit in the Petabyte is the same when it comes out as when it went in.
Now consider every bit in that Petabyte as being like a radioactive atom, subject to a random process that flips it with a very low probability per unit time. You have just specified a half-life for the bits. That half-life is about 60 million times the age of the universe. Think for a moment how you would go about benchmarking a system to show that no process with a half-life less than 60 million times the age of the universe was operating in it. It simply isn't feasible. Since at scale you are never going to know that your system is reliable enough, Murphy's law will guarantee that it isn't.
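The half-life figure can be checked with a few lines of arithmetic. The only inputs are a Petabyte taken as 8×10^15 bits and the age of the universe taken as 1.38×10^10 years:

```python
import math

bits = 8e15            # one Petabyte, in bits
years = 100
p_all_survive = 0.5    # 50% chance that *no* bit flips in a century

# Per-bit flip probability over the century. expm1 avoids the tiny
# probability rounding to zero in floating point.
p_bit = -math.expm1(math.log(p_all_survive) / bits)   # ~8.7e-17

# Exponential decay: a flip probability p over time t implies a
# half-life of t * ln(2) / p (for small p).
half_life = years * math.log(2) / p_bit               # ~8e17 years

age_of_universe = 1.38e10                             # years
ratio = half_life / age_of_universe                   # ~6e7: tens of millions
```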
Here's some back-of-the-envelope hand-waving. Amazon's S3 is a state-of-the-art storage system. Its design goal is an annual probability of loss of a data object of 10^-11. If the average object is 10K bytes, the bit half-life is about a million years, way too short to meet the requirement but still really hard to measure.
Note that the 10^-11 is a design goal, not the measured performance of the system. There's a lot of research into the actual performance of storage systems at scale, and it all shows them under-performing expectations based on the specifications of the media. Why is this? Real storage systems are large, complex systems subject to correlated failures that are very hard to model.
Worse, the threats against which they have to defend their contents are diverse and almost impossible to model. Nine years ago we documented the threat model we use for the LOCKSS system. We observed that most discussion of digital preservation focused on these threats:
but that the experience of operators of large data storage facilities was that the significant causes of data loss were quite different:
Building systems to defend against all these threats combined is expensive, and can't ever be perfectly effective. So we have to resign ourselves to the fact that stuff will get lost. This has always been true; it should not be a surprise. And loss prevention is subject to the law of diminishing returns. Coming back to the economics: how much should we spend reducing the probability of loss?
Consider two storage systems with the same budget over a decade, one with a loss rate of zero, the other half as expensive per byte but which loses 1% of its bytes each year. Clearly, you would say the cheaper system has an unacceptable loss rate.
However, each year the cheaper system stores twice as much and loses 1% of its accumulated content. At the end of the decade the cheaper system has preserved 1.89 times as much content at the same cost. After 30 years it has preserved more than 5 times as much at the same cost.
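Here is one way to model that comparison. The timing of losses within each year (applied here to everything held, including the current year's ingest) is an assumption; slightly different conventions move the ratio a little, but not the conclusion.

```python
def preserved(years, ingest_per_year, annual_loss_rate):
    """Total content held after `years`, losing a fraction of holdings annually."""
    total = 0.0
    for _ in range(years):
        total += ingest_per_year
        total *= 1 - annual_loss_rate   # loss hits everything held so far
    return total

reliable = preserved(10, 1.0, 0.00)   # loses nothing: 10 units after a decade
cheap = preserved(10, 2.0, 0.01)      # half the cost/byte, so twice the ingest

ratio = cheap / reliable              # about 1.89
```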
Why is this? Because a collection like the Internet Archive's is always a series of samples of the Web, the losses merely add a small amount of random noise to the samples. But the samples are so huge that this noise is insignificant. This isn't something peculiar to the Internet Archive; it is true of very large collections in general. In the real world they always have noise, and the questions asked of them are always statistical in nature. The benefit of doubling the size of the sample vastly outweighs the cost of a small amount of added noise. In this case more really is better.
To sum up, the good news is that sustainable preservation of digital content such as research data is possible, and the LOCKSS Program is an example.
The bad news is that people's expectations are way out of line with reality. It isn't possible to preserve nearly as much as people assume is already being preserved, nor nearly as reliably as they assume it is already being done. This mismatch is going to increase. People don't expect more resources, yet they do expect a lot more data. They expect that the technology will get a lot cheaper, but the experts no longer believe it will.
Research data, libraries, and archives are a niche market. Their problems are technologically challenging, but there isn't a big payoff for solving them, so neither industry nor academia is researching solutions. We end up cobbling together preservation systems out of technology intended to do something quite different, like backups.
Social media is something I have in common with popular library speaker Joe Murphy. We’ve both given talks about the power of social media at loads of conferences. I love the radical transparency that social media enables. It allows for really authentic connection and also really authentic accountability. So many bad products and so much bad behavior have come to light because of social media. Everyone with a cell phone camera can now be an investigative reporter. So much less can be swept under the rug. It’s kind of an amazing thing.
But what’s disturbing is what has not become more transparent. Sexual harassment for one. When a United States senator doesn’t feel like she can name the man who told her not to lose weight after having her baby because “I like my girls chubby,” then we know this problem is bigger than just libraryland.
It’s been no secret among many women (and some men) who attend and speak at conferences like Internet Librarian and Computers in Libraries that Joe Murphy has a reputation for using these conferences as his own personal meat markets. Whether it’s true or not, I don’t know. I’ve known these allegations since before 2010, which was when I had the privilege of attending a group dinner with him.
He didn’t sexually harass anyone at the table that evening, but his behavior was entitled, cocky, and rude. He barely let anyone else get a word in edgewise because apparently what he had to say (in a group with some pretty freaking illustrious people) was more important than what anyone else had to say. The host of the dinner apologized to me afterwards and said he had no idea what this guy was like. And that was the problem. This information clearly wasn’t getting to the people who needed it most; particularly the people who invited him to speak at conferences. For me, it only cemented the fact that it’s a man’s world (even in our female-dominated profession) and men can continue to get away with and profit from offering more flash than substance and behaving badly.
Why don’t we talk about sexual harassment in the open? I can only speak from my own experience of not revealing a public library administrator who sexually harassed me at a conference. First, I felt embarrassed, like maybe I’d encouraged him in some way or did something to deserve it. Second, he was someone I’d previously liked and respected, and a lot of other people liked and respected him, and I didn’t want to tarnish his reputation over something that didn’t amount to that much. Maybe the fact that he was so respected also made me scared to say something, because, in the end, it could end up hurting me.
People who are brave enough to speak out about sexual harassment and name names are courageous. As Barbara Fister wrote, they are whistleblowers. They protect other women from suffering a similar fate, which is noble. When Lisa Rabey and nina de jesus (AKA #teamharpy) wrote about behavior from Joe Murphy that many of us had been hearing about for years, they were acting as whistleblowers, though whistleblowers who had only heard about the behavior second or third-hand, which I think is an important distinction. I believe they shared this information in order to protect other women. And now they’re being sued by Joe Murphy for 1.25 million dollars in damages for defaming his character. You can read the statement of claim here. I assume he is suing them in Canada because it’s easier to sue for libel and defamation outside of the U.S.
On his blog, Wayne Bivens-Tatum wonders “whether the fact of the lawsuit might hurt Murphy within the librarian community more than any accusations of sexual harassment.” Is it the Streisand effect, whereby Joe Murphy is bringing more attention to his alleged behavior by suing these women? It’s possible that this will bite him in the ass more than the original tweets and blog post (which I hadn’t seen prior) ever could.
I fear the impact of this case will be that women feel even less safe speaking out against sexual harassment if they believe that they could be sued for a million or more dollars. In the end, how many of us really have “proof” that we were sexually harassed other than our word? If you know something that substantiates their allegations of sexual predatory behavior, consider being a witness in #teamharpy’s case. If you don’t but still want to help, contribute to their defense fund.
That said, that this information comes second or third-hand does concern me. I don’t know for a fact that Joe Murphy is a sexual predator. Do you? Here’s what I do know. Did he creep me out when I interacted with him? Yes. Did he creep out other women at conferences? Yes. Did he behave like an entitled jerk at least some of the time? Yes. Do many people resent the fact that a man with a few years of library experience who hasn’t worked at a library in years is getting asked to speak at international conferences when all he offers is style and not substance? Yes.
While all of the rumors about him that have been swirling around for at least the past 4-5 years may be 100% true, I don’t know if they are. I don’t know if anyone has come out and said they were harassed by him beyond the general “nice shirt” comment that creeped out many women. As anyone who has read my blog for a while knows, I am terrified of groupthink. So I feel really torn when it comes to this case. Part of me wonders whether my dislike of Joe Murphy makes me more prone to believe these things. Another part of me feels that these allegations are very consistent with my experience of him and with the rumors over these many years. But I’m not going to decide whether the allegations are true without hearing it from someone who experienced it first-hand.
I wish I could end this post on a positive note, but this is pretty much sad for everyone. Sad for the two librarians who felt they were doing a courageous thing (and may well have been) by speaking out and are now being threatened by a tremendously large lawsuit. Sad for the victims of harassment who may be less likely to speak out because of this lawsuit. And sad for Joe Murphy if he is truly innocent of what he’s been accused (and imagine for a moment the consequences of tarring and feathering an innocent man). I wish we lived in a world where we felt as comfortable reporting abuse and sexual harassment as we do other wrongdoing. I wish as sharp a light was shined on this as has recently been shined on police brutality, corporate misbehavior, and income inequality. And maybe the only positive is that this is shining a light on the fact that this happens and many women, even powerful women, do not feel empowered to report it.
A notion that haunts me is found in Neil Gaiman’s The Sandman: the library of the Dreaming, wherein can be found books that no earth-bound librarian can collect. Books that caught existence only in the dreams – or passing thoughts – of their authors. The Great American Novel. Every Great American Novel, by all of the frustrated middle managers, farmers, and factory workers who had their heart attack too soon. Every Great Nepalese Novel. The conclusion of the Wheel of Time, as written by Robert Jordan himself.
That library has a section containing every book whose physical embodiment was stolen. All of the poems of Sappho. Every Mayan and Olmec text – including the ones that, in the real world, did not survive the fires of the invaders.
Books can be like cockroaches. Text thought long-lost can turn up unexpectedly, sometimes just by virtue of having been left lying around until someone thinks to take a closer look. It is not an impossible hope that one day another Mayan codex may make its reappearance, thumbing its nose at the colonizers and censors who despised it and the culture and people it came from.
Books are also fragile. Sometimes the censors do succeed in utterly destroying every last trace of a book. Always, entropy threatens all. Active measures against these threats are required; therefore, it is appropriate that librarians fight the suppression, banning, and challenges of books.
Banned Books Week is part of that fight, and it is important that folks be aware of their freedom to read what they choose – and be aware that protecting that freedom is a continual struggle. Indeed, perhaps “Freedom to Read Week” better expresses the proper emphasis on preserving intellectual freedom.
But it’s not enough.
I am also haunted by the books that are not to be found in the Library of the Dreaming – because not even the shadow of their genesis crossed the mind of those who could have written them.
Because their authors were shot for having the wrong skin color.
Because their authors were cheated of an education.
Total no. of participating publishers & societies: 5,363
Total no. of voting members: 2,609
% of non-profit publishers: 57%
Total no. of participating libraries: 1,902
No. of journals covered: 36,035
No. of DOIs registered to date: 69,191,919
No. of DOIs deposited in previous month: 582,561
No. of DOIs retrieved (matched references) in previous month: 35,125,120
DOI resolutions (end-user clicks) in previous month: 79,193,741
Brazilian Journal of Internal Medicine
Brazilian Journal of Irrigation and Drainage - IRRIGA
Djokosoetono Research Center
Education Association of South Africa
Laboreal, FPCE, Universidade do Porto
Libronet Bilgi Hizmetleri ve Yazilim San. Tic. Ltd., Sti.
Open Access Text Pvt, Ltd.
Pontifical University of John Paul II in Krakow
Revista Brasileira de Quiropraxia - Brazilian Journal of Chiropractic
Scientific Online Publishing, Co. Ltd.
Symposium Books, Ltd.
Turkiye Yesilay Cemiyeti
Uniwersytet Ekonomiczny w Krakowie - Krakow University of Economics
Volgograd State University
IJNC Editorial Committee
Japanese Association of Cardioangioscopy
Lithuanian University of Educational Sciences
The Operations Research Society of Japan
Acta Medica Anatolia
Ankara University Faculty of Agriculture
Dnipropetrovsk National University of Railway Transport
English Language and Literature Association of Korea
Institute for Humanities and Social Sciences
Institute of Korean Independence Movement Studies
Journal of Chinese Language and Literature
Journal of Korean Linguistics
Knowledge Management Society of Korea
Korea Association for International Commerce and Information
Korea Research Institute for Human Settlements
Korean Academic Society for Public Relations
Korean Marketing Association
Korean Society for Art History
Korean Society for the Study of Physical Education
Korean Society of Consumer Policy and Education
Law Research Institute, University of Seoul
Research Institute Centerprogamsystem, JSC
Research Institute of Science Education, Pusan National University
Research Institute of Social Science
Silicea - Poligraf, LLC
The Altaic Society of Korea
The Hallym Academy of Sciences
The Korean Association of Ethics
The Korean Association of Translation Studies
The Korean Society for Culture and Arts Education Studies
The Korean Society for Feminist Studies in English Literature
The Korean Society for Investigative Cosmetology
The Regional Association of Architectural Institute of Korea
The Society for Korean Language and Literary Research
Ural Federal University
V.I. Shimakov Federal Research Center of Transplantology and Artificial Organs
World Journal of Traditional Chinese Medicine
Yonsei Institute for North Korean Studies
Last updated September 10, 2014
Fucape Business School
Journal Issues Limited
Revista Bio Ciencias
The Russian Law Academy of the Ministry of Justice of the RF
Japan Society for Simulation Technology
Asian Journal of Education
Center for Studies of Christian Thoughts and Culture
Contemporary Film Research Institute
Democratic Legal Studies Association
Foreign Studies Institute
Institute for English Cultural Studies
Institute for Japanese Studies
Institute for Philosophy
Institute for the Translation of Korean Classics
Institute of Humanities
International Journal of Entrepreneurial Knowledge
Korean Academy of Kinesiology
Korean Association for the Study of English Language and Linguistics (KASELL)
Korean Logistics Society
The Association of Korean Education
The Korean Philosophy of Education Society
The Korean Society for School Science
Last Thursday, the U.S. House Judiciary Subcommittee on Courts, Intellectual Property, and the Internet held a hearing to gather information about the work of the U.S. Copyright Office and to learn about the challenges the Office faces in trying to fulfill its many responsibilities. Testifying before the Committee was Maria Pallante, Register of Copyrights and Director of the Copyright Office (view Pallante’s testimony (pdf)). Pallante gave a thorough overview of the Office’s administrative, public policy and regulatory functions, and highlighted a number of ways in which the Office’s structure and position within the federal bureaucracy create inefficiencies in its day-to-day operations. Pallante described these inefficiencies as symptoms of a larger problem: The 1976 Copyright Act vested the Office with the resources and authority it needed to thrive in an analog world, but it failed to anticipate the new needs the Office would develop in adjusting to a digital world.
Although the Office’s registration system—the system by which it registers copyright claims—was brought online in 2008, Pallante describes it as nothing more than a 20th century system presented in a 21st century format. The Office’s recordation system—the process by which it records copyright documents—is still completed manually and has not been updated for decades. Pallante considers fully digitizing the registration and recordation functions of the Copyright Office a top priority:
From an operational standpoint, the Office’s electronic registration system was fully implemented in 2008 by adapting off-the-shelf software. It was designed to transpose the paper-based system of the 20th century into an electronic interface, and it accomplished that goal. However, as technology continues to move ahead we must continue to evaluate and implement improvements. Both the registration and recordation systems need to be increasingly flexible to meet the rapidly changing needs of a digital marketplace.
Despite Pallante’s commitment to updating these systems, she cited her lack of administrative autonomy within the Library of Congress and her Office’s tightening budget as significant impediments to achieving this goal. Several members of the Committee suggested that the Office would have greater latitude to update its operations for the digital age if it were moved out from under the authority of the Library of Congress (LOC). While Pallante did not explicitly support this idea, she was receptive to suggestions from members of the Subcommittee that her office carries out very specialized functions that differ from those that are carried out by the rest of the LOC. Overall, Pallante seemed open to—if not supportive of—having a longer policy discussion on the proper position of the Copyright Office within the federal government.
In addition to providing insight into the inner workings of the Copyright Office, the hearing continued the policy discussion on the statutory and regulatory frameworks that govern the process of documenting a copyright. As the Judiciary Committee continues its review of copyright law, it will be interesting to see whether it further examines statutory and regulatory changes to the authority and structure of the Copyright Office.