I’m not going to apologize for not posting for, um, almost a year. As you can see from my last post, which I really just recommend skipping and taking this lesson from: 2017 was extremely hard on me. It turns out, 2018 hasn’t gone especially well, either, in many ways that I don’t want to talk about right now. Instead, I want to talk about one specific thing that has been very good: I’m finally on a road toward using my data skills professionally!
I’ve mentioned that I’m an adjunct for the local community college, working as a reference librarian. That is not where I want to be, career-wise, for a number of reasons. (I’m not demeaning my coworkers or the college’s students. Rather, “adjunct” and “reference” both really need to be temporary appellations for me. I’m a technical person: give me your website or your metadata or, heck, even some spreadsheets, I’m not picky; have me teach people how to use technology; let me shepherd policies about shared software usage through the approval process; even have me answer chat and email questions from patrons; just, please, never make me physically sit at the reference desk, because I hate almost everything about that experience.) Really, I want to be working with metadata or data, which is pretty funny when you think about how much of my first master’s degree I did in MATLAB; I should have stuck with that, I guess, rather than … all that other stuff I did afterward.
Anyway, when the college announced a Data Analytics Certificate for [post-bachelors] professionals, you bet I used my free tuition to sign up for as many classes as I could! (Two. We get two classes per semester.)
So this semester, my Monday nights were spent in DAT-102, Introduction to Data Analytics. It was a 20,000-foot intro course, so we didn’t get into a lot of depth on anything or cover much that was new for me—the stated goal was to give students an opportunity to decide whether data analysis was a good path for them or not. I did learn some Excel tricks (and relearn how to do pivot tables, for probably the third time, because I keep going multiple years at a time without needing them—which really just points to bad life choices, doesn’t it?), and I picked up a couple of new data sources to play with. I also finally understand and can explain p values, which was not a thing my engineering-focused probability and statistics courses ever really covered. (Should I not admit that? Eh. We all learn things that “everyone knows,” every day.) It was also a fun class, because the professor broke up the three-hour(!) time block with activities to try to build up students’ intuition about things like probability distributions and experiment design. It was a good experience.
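The simulation-based way of thinking about p values is the one that finally made them click for me. As an illustration (this sketch is mine, not from the course, and the function name and numbers are invented for the example), here is how you can estimate a one-sided p value for a coin-flip experiment by repeatedly simulating the null hypothesis:

```python
import random

def p_value_coin(heads_observed, flips, trials=20_000, seed=42):
    """Estimate the one-sided p value for seeing at least `heads_observed`
    heads in `flips` tosses of a fair coin, by simulating the null
    hypothesis (a fair coin) many times and counting how often the
    simulated result is at least as extreme as the observed one."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if heads >= heads_observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials

# 60 heads in 100 flips of a supposedly fair coin: how surprising is that?
print(p_value_coin(60, 100))
```

For 60 heads in 100 flips, the estimate lands near the exact binomial answer of about 0.028: under the usual 0.05 threshold, so you’d call the result statistically significant. The point of the simulation framing is that the p value is just “how often would chance alone look at least this extreme?”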
Tuesday nights were CIT-129, Python 2. Yes, I minored in computer science, but it’s been *cough* more than 10 years since then. And yes, I’ve written Python professionally, but only for a very short time, under very bad circumstances, which didn’t really enhance my learning much. So, again, I was in a course where we didn’t really go over anything I hadn’t seen before, but we went over things formally! Which was great! I’m definitely a better programmer for having taken it. Plus, it was a small class, everyone in it was really good, and our professor (same professor for both classes) was a fantastic teacher, so we went so much faster than the original syllabus had suggested we might. I really enjoyed it! Our final meeting, where we shared our semester projects, was last night, and I’m pretty sad that it’s over. Sadder still: there is no Python 3.
One cool thing about all of this, assuming that no one comes along and bumps me from Python 1 (adjuncts can always be bumped) and that the dean signs off on my colleague and me splitting Data Analytics 1 (this is likely but has not yet happened, that I know of), is that I’ll get to have a major impact on a program that is both new and fairly unique. Not a lot of community colleges are doing data analytics, yet. Ours is offering two versions of the program: an associate’s degree (a normal enough thing for a community college to offer) and that post-bachelors certificate (a bit less common of an offering). There’s an official list of courses with descriptions on the college website, but it was put together before they hired anyone with a lot of data expertise. (Makes sense: you can’t hire the data faculty until you have a data program!) Now they have a couple of data-focused faculty members (more, if you include adjuncts like me), who I think have made some updates to the high-level program, but when it comes to really detailed planning (e.g. syllabi), the courses are still very much in development. “We are building the ship as we sail,” my colleague says. … Which is a lot of potential for me to have an impact! (For instance: we will be covering data ethics in 201. And hopefully every other course in the program. But definitely 201, if I’m helping to write the syllabus!)
I’m pretty excited about it! And, yes, I did say above that adjuncting isn’t something I want to do forever. Of course, few people do! But adjunct teaching is a better gig—both in terms of how much I can help students and, probably, in pay (adjunct instructors are in a union; adjunct librarians are not)—plus, adjunct-teaching these classes will be a lot of fun. And challenging: both classes are taught in three-hour chunks one night a week! Also, with only 1.5 classes, I’ll still be able to work at the library for a reasonable number of hours per week, which is nice, because we’re implementing most of the Springshare suite over the next few months, and there’s a lot I can contribute to that, as well. (I’m a reference librarian and just an adjunct, sure, but our web librarian has too much to do. Which means sometimes fun projects trickle down to me! I make sure to get scheduled time at-work-but-off-desk each week, so that I can do projects.)
I do have a couple of job applications open right now, and if you happen to be a potential employer who has read this far, 1) thanks(!), and 2) I want to point out: both of the classes start at 6pm, rather than taking place midday, so we can make it work. I’m also going to prepare really thoroughly during December and early January, to make the load lighter during the semester.
* I would absolutely teach online again, but most students in their first programming course need some dedicated in-person time. Also, the expectations that many students seem to have about online courses are not really compatible with something as work intensive as learning to program. (May vary by degree?)
As an intern in Digital Content and Collections (DCC), I have been working with various older digital collections and projects to address any issues that have come up since their creation. Each of these collections is unique in content and format, with digitization, description, and access processes designed within the context of grants, stakeholder goals, user needs, and technical capacity. Most recently, I completed a project to ensure access to digitized materials from the Superiorland Library Cooperative (SLC), and through this project I learned several lessons that are useful for me, and for anyone else working with older digitized materials, to keep in mind for future projects with digital collections.
This post was written by Josh Hogan, who received a DLF HBCU Fellowship to attend the 2018 Forum.
Josh Hogan is the Assistant Head of the Digital Services Department at the Atlanta University Center Robert W. Woodruff Library. His primary responsibilities include taking a leadership role in a variety of digital curation activities, including digitization, metadata creation, repository management, and digital preservation.
Prior to assuming this role, he was the Metadata & Digital Resources Librarian at AUC Woodruff Library and spent several years as a manuscript archivist at the Atlanta History Center. Josh is strongly interested in digital scholarship/humanities research as well as the potential uses of open source software in digital preservation workflows.
“This neat separation, keeping your nose to the professional grindstone and leaving politics to your left-over moments, assumes that your profession is not inherently political. It is neutral. Teachers are objective and unbiased. Textbooks are eclectic and fair. The historian is even-handed and factual. The archivist keeps records, a scrupulously neutral job.” –Howard Zinn
In reflecting on my first experience at the DLF Forum, I am sure I’m not the first or only person to liken the experience to drinking from a fire hose. The many and varied experiences I had while in Las Vegas could fill several reflection pieces and would, perhaps, make for disjointed reading for those not living in my brain. After giving it some thought, I decided that the most important facet of my experience was finding an organization to which I could wholeheartedly and enthusiastically contribute.
I’ve chosen the quote above from Howard Zinn as an illustration of what I mean. As information professionals—librarians, archivists, etc.—we are often encouraged to make neutrality a central tenet of our ethical lives. I am not sure that approach is very healthy for us as well-rounded human beings. I have no beef with objectivity, i.e., a willingness to base our conclusions on the facts or evidence as those are presented to us, but I agree with Zinn that standing by neutrally only perpetuates the status quo. In other words, you can’t really be neutral on a moving train. I say all of this because I believe that the community encouraged by DLF is one that demands not a limp neutrality but an informed engagement.
This was evident across many of the sessions I attended, from Anasuya Sengupta’s excellent and thought-provoking plenary, “Decolonizing Knowledge, Decolonizing the Internet,” to the many engaging and important sessions put together by the Labor Working Group members. The work of DLF’s practitioners is grounded in reality and facts, but it is also engaged passionately with the issues and concerns of the broader communities that we serve.
As part of my experience, I had an opportunity to serve as a moderator to a couple of excellent sessions. The first was a session organized by the Labor Working Group on “Organizing for Change, Organizing for Power” to address ways to change one’s work place, community, etc. for the better. The presenters were so well organized that I had only to be there as a backup and was able to participate in the discussion. I also (lightly) moderated a powerful session on Citizen Scholarship which aimed to “discuss how we are building communities, supporting engaged pedagogies, and transforming institutional cultures through collaborative and situated knowledge-making work.” These activities both inspire me and remind me why I wanted to pursue work as a librarian/archivist in the first place.
Finally, attending DLF gave me a great opportunity to get plugged into the emerging Digital Scholarship Working Group, formerly known as the “Miracle Workers.” I was struck by the diversity of backgrounds and interests of people working on Digital Humanities and Digital Scholarship issues, and I am excited to contribute to their activities as the group grows.
In short, being a Fellow was an excellent experience that introduced me to an organization I really believe in. I plan to be a contributor for years to come.
If you’d like to get involved with the scholarship committee for the 2019 Forum (October 13-16, 2019 in Tampa, FL), sign up to join the Planning Committee now! More information about 2019 fellowships will be posted in late spring.
Both the book Cane, and its author, Jean Toomer, resist easy classification. It’s now considered by many literary critics to be one of the major American literary works of its time, but despite admiring reviews, it did not sell widely, was out of print for notable intervals, and is not well-known to the general public. The book’s depictions of African Americans (both in the characters portrayed and in the expression of the author) didn’t fit into the expectations of many white and black readers. As Langston Hughes put it, “‘O, be respectable, write about nice people, show how good we are,’ say the Negroes. ‘Be stereotyped, don’t go too far, don’t shatter our illusions about you, don’t amuse us too seriously. We will pay you,’ say the whites. Both would have told Jean Toomer not to write Cane.” Toomer himself, who had both black and white ancestry and attended segregated schools on both sides of the color line growing up, didn’t want his publisher to emphasize his “colored blood” in the marketing of the book. His racial self-identification would vary over the course of his life.
The book’s structure was also unusual, being a series of vignettes, not always straightforwardly connected, employing prose, poetry, and dramatic dialogue. It’s often classified as part of the Harlem Renaissance that was getting increasing attention in the 1920s, but it could just as easily be grouped with the sort of experimental modernist literature that white writers like James Joyce, Virginia Woolf, and Gertrude Stein were publishing.
The book was published and copyrighted in 1923, and Toomer renewed the copyright in 1950. By then, Toomer had decided not to pursue literary fame, and was occasionally publishing pieces like this one on his adopted Quaker religion. In between, he wrote and sometimes published other literary works, but I have not found copyright renewals under his name other than the renewal for Cane, though he lived until 1967. The influence of the book lives on in other writers who have appreciated it, though. One of those writers is Alice Walker, who said in a 1973 interview that the book “has been reverberating in me to an astonishing degree. I love it passionately, could not possibly exist without it.”
Many 20th century literary works by African American authors are now in the public domain, either because they were published before 1923, or because they did not have their copyrights renewed. (For instance, many of Zora Neale Hurston’s early works were never renewed, and the influential NAACP magazine The Crisis also did not renew its copyrights, though some of its contributors renewed theirs.) Some of these works are online now, and some are not. In 20 days, I’m looking forward to seeing Cane join the public domain here, and have it and other often-overlooked works by African Americans become free to read and appreciate online.
From October 2 to November 20, 2017, a working group of volunteers representing five NDSA member institutions and interest groups conducted a survey of organizations in the United States actively involved in, or planning to start, programs to archive content from the Web. This survey builds upon and extends three earlier surveys that the working group has conducted since 2011.
The goal of these surveys is to better understand the landscape of Web archiving activities in the United States by investigating the organizations involved; the history and scope of their Web archiving programs; the types of Web content being preserved; the tools and services being used; access and discovery services being offered; and overall policies related to Web archiving programs.
A few major takeaways from the report include:
Public libraries participating in the survey increased to 13% (15 of 119) of respondents from less than 3% of respondents in each of the previous surveys.
A growing comfort with archiving without permission or notification. Seventy percent (71 of 101) of institutions in 2017 did not seek permission or attempt to notify the content owner. Also, 91% (106 of 117) of respondents reported never receiving a takedown or stop-crawling request.
A notable 51% (23 of 45) of organizations reported using Webrecorder, which was publicly launched in 2016 as a browser-based tool for capturing content that is difficult to collect via traditional link-based crawling.
Archive-It continued to be the preferred external service for harvesting Web content, with 94% (97 of 103) of respondents using this service. While the vast majority of respondents are utilizing Archive-It, few (20 of 108) organizations reported downloading their WARCs for local preservation or access, continuing a trend noted in previous surveys.
Diversification of the field, maturation of programs, and technological developments presented areas of progress for the profession, while access to archived content and institutional support for program expansion remained relatively unchanged from prior survey years.
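As a quick sanity check, the headline percentages quoted above can be reproduced from the raw counts in the report with a few lines of arithmetic (the labels here are shorthand for the findings, not the survey’s own wording):

```python
# Reproduce the rounded percentages from the NDSA survey counts above.
counts = {
    "public libraries responding": (15, 119),
    "archived without permission/notification": (71, 101),
    "never received a takedown request": (106, 117),
    "using Webrecorder": (23, 45),
    "using Archive-It": (97, 103),
}
for label, (n, total) in counts.items():
    print(f"{label}: {n}/{total} = {100 * n / total:.0f}%")
```

Each rounded figure matches the percentage reported in the takeaways (13%, 70%, 91%, 51%, and 94%, respectively).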
NDSA Web Archiving Survey Working Group
Interested in activities like this, or in joining with other organizations committed to the long-term preservation of digital information? Get involved with NDSA yourself at: http://ndsa.org/get-involved/
SharePoint is just a document management system, so why are there long threads on Reddit devoted to its inadequacy? Why do your users hate it so much? Can you make it better? Here are a few tips to improve the SharePoint experience.
Besides all of that drama, many companies create separate SharePoint domains for every minor department or subdivision of the company. Moreover, some of what you need for context in order to find things in SharePoint may not be in SharePoint at all. In other words, to do search well, you need a system designed for modern search. You should also consider using security groups rather than separate SharePoint domains, and think carefully about how to organize your documents into libraries.
It Is Too Slow
Obviously, you need adequate disk and CPU capacity for your SharePoint servers.
Use groups instead of individual user permissions: individual permissions slow down the indexing process, and they are also a pain to manage.
Out of the box, SharePoint isn’t the prettiest thing. However, it is highly customizable, especially the web layer. Brand it with your corporate look and feel, or a more attractive variant of it. Using HTML and CSS, you can change the look and feel with minimal deep technical effort.
They Can’t Share Things
You can improve your users’ ease of sharing if you:
In the end, SharePoint is a powerful tool for organizing internal content, controlling access, and sharing content externally, if used and configured correctly. Its built-in search capabilities aren’t great, but they can be tuned; you will often be better off using dedicated search software to provide more relevant, personalized, and context-sensitive results.
The makerspace at Abilene Christian University has been operational since 2015. It is unusual in that it is an academic space at a private university that offers equal service privileges to both the campus and the community. In an attempt to encourage a maker mindset within our broader region, we began offering a series of day camps for elementary and middle school students. To our surprise and delight, the homeschool community became our biggest group of participants. What started as serendipity is now a conscious awareness of this group of patrons. In this article, I outline how our camps are structured, what we discovered about the special needs and interests of homeschool families, and how we incorporate this knowledge into outreach and camp activities. I also share how we evaluate the camps for impact not only upon campers but also within the larger goals of the library and university.
Makerspaces, also known as Fablabs or hackerspaces, are a growing area of service for libraries. Makerspaces are places that combine tools, technology, and expertise to let people create physical and sometimes digital objects. Initially found mostly in engineering departments, makerspaces have grown internationally to encompass all disciplines and ages (Lou & Peek, 2016), placing them well within the mission scope of academic, school, and public libraries.
Central to the ethos of makerspaces is the idea of community. Makerspaces allow individuals from all ages, experience levels, and backgrounds to come together and collaborate in a shared space. In his formative essay, Dale Dougherty, founder of the maker movement, suggests that the ability of makerspaces to transform education is that they offer anyone a “chance to participate in communities of makers of all ages by sharing your work and expertise. Making can be a compelling social experience, built around relationships” (Dougherty, 2013, p. 9).
The library at Abilene Christian University (ACU) seeks to encourage this educational transformation through its own makerspace, which we call the Maker Lab. First opened in 2015, the ACU Library Maker Lab is an academic makerspace that is open to all areas of the campus as well as to the public. In an attempt to grow the maker mindset both in our community and among our campus families, we decided to host a series of children’s and youth day camps called Maker Academy. To our surprise and delight, the homeschool community became our biggest group of participants. What started as serendipity is now a conscious awareness of this group of patrons. This article will outline how our camps are structured, what we discovered about the special needs and interests of homeschool families, and how we incorporate this knowledge into outreach and camp activities. We will also share how we evaluate the camps for impact not only upon campers but also within the larger goals of the library and university.
College preparedness and overall learning is a concern that academic librarians share with their colleagues in public and school libraries, so involving the kindergarten to 12th grade (K-12) population at universities is not unheard of even though it may not be commonplace (Tvaruzka, 2009). The University of Nevada began holding workshops for teachers of K-12 students to familiarize them with college research assignments and to help them infuse age-appropriate library skills into their own assignments. They expanded the program to involve middle school students to introduce them to the support and services a college library can provide (Godbey, Fawley, Goodman, & Wainscott, 2015). At the University of South Alabama, librarians conduct summer enrichment programs to enhance the research literacy of high school students who are preparing for health careers (Rossini, Burnham, & Wright, 2013). Although they may not work extensively with children and teens, libraries of higher education do reach out to younger people.
Camps and summer experiences are a different environment from formal classes, and necessarily require different activities and teaching techniques suitable to the age of the campers. Academic libraries vary in the degree to which they are directly involved with K-12 summer activities. Many offer tours and scavenger hunts for children on campus as part of other events (North Carolina State University, n.d.; University of Michigan, 2017; Washington State University Libraries, 2018). Others interact with academic camps hosted by different departments to do a session on research skills, often with a more informal and problem-oriented approach. Ohio university librarians, for example, developed interactive ways of introducing high school engineering students to their discipline’s literature and the role it plays in developing new products (Huge, Houdek, & Saines, 2002), and a team of librarians at Carnegie Mellon expanded their roles to work with middle school girls pursuing engineering (Beck, Berard, Baker, & George, 2010). Each example offers a good description of adapting traditional library content, and teaching brief research workshops for youths.
The literature has fewer examples of extended camps that involve the library for the entirety of the camp experience. Temple University Library in Philadelphia involves high school students in a summer intensive initiative to do community mapping and thus build knowledge of Geographic Information Systems (GIS) coding (Masucci, Organ, & Wiig, 2016). In a collaborative model, the University of St. Thomas partners with St. Paul Public Library to host a five-session series for middle school students to implement laser cutting, circuitry, and basic engineering design skills (Haugh, Lang, Thomas, Monson, & Besser, 2016). Virginia Commonwealth University Library works with its School of Engineering to offer extensive research instruction appropriate for youths participating in a campus camp where they design and prototype their own inventions (Arendt, Hargraves, & Roseberry, 2017). In these examples, the library is more of a full partner and host of the camp experience, although they typically share the work with other units.
With respect to homeschool students, a surprisingly small body of literature addresses the educational needs of the audiences the library serves, as opposed to the programs the library offers. Nearly all of the information regarding library service to homeschoolers comes from a public library perspective. This paucity of articles, especially of recent publication, suggests a gap in the literature. The majority of articles discuss the interests of families who homeschool and the implications for library programming and collections (Blankenship, 2008; Johnson, 2012; Shinn, 2008). Paradise (2008) discusses the need for child-friendly environments, especially with computer accessibility, that accommodate families and children working for extended sessions. Others offer observations about communication and outreach, noting that families who homeschool tend to connect with each other at the community level, making it imperative that the library use these dialog channels to reach members (Hilyard, 2008; Willingham, 2008). Particularly relevant to those planning educational activities is an article on creating successful programs for homeschooled children and the need to include hands-on learning (Mishler, 2013). Although these sources have a public library orientation, they provide useful suggestions from libraries experienced in working with homeschooled youths, especially for librarians in higher education, who may be less familiar with this age group.
Brief History of the ACU Library Maker Lab
Abilene Christian University is a private university located in Abilene, TX. With an enrollment of just under 5,000 students, ACU falls within the Carnegie classifications as a “Masters Colleges and Universities: Larger Programs” institution. There is one library on campus that serves all disciplines.
The Maker Lab is an 8,000 square foot space within the library. It was established in 2013 with funding from the university Provost’s Office and the library’s own budget. It has many of the tools typically found in makerspaces: 3D printers, a laser cutter, vinyl cutter, sewing machines and fiber arts supplies, power tools and hand tools for woodworking, and electronics workbenches. It is staffed by a full-time lab manager, a librarian who divides her time between the Maker Lab and the regular reference and instructional services within the library, and a cadre of student workers who supply a total of 132 student hours a week, Monday through Saturday.
Most of what the Maker Lab offers is free. There is no membership fee or charge for tool usage or machine time. We do charge for some materials, like 3D printer filament, sign vinyl, or large sheets of plywood. The full list of materials and their prices are available via the Maker Lab Store1. Makers are welcome to bring their own materials if they want something other than what is already on the premises. We maintain a fairly robust “scrap pile” of leftover and donated material, and these are free. We have found that having materials on hand, particularly free scrap, is a good way to encourage newcomers to try something and helps overcome initial barriers to getting started.
A significant distinction of the ACU Library Maker Lab is that it is completely open to all individuals regardless of whether they are affiliated with the university. This policy was a deliberate decision from the space’s inception. The ACU Library as a whole has a rich tradition of serving anyone who is in the building and of collaborating with other libraries of all types in the area. The Maker Lab continues that tradition. Even the computers in the Maker Lab do not require a special university login. The space is designed to be as convenient as possible for all makers to use.
Since its opening, the Maker Lab has enjoyed a healthy yet somewhat narrow use among our institution’s population. We wanted to build upon our user base and encourage a maker mindset among other groups. This desire was the inspiration for Maker Academy.
Maker Academy is a series of day camps for kids. We first offered it in the summer of 2014 and have continued every summer since, refining the program each year. We presently have three camps: one for children in 4th and 5th grade, and two middle school camps for youths in 6th through 8th grades. Each camp lasts three days and runs from 9:00 a.m. to 4:00 p.m. We charge $100 per maker. This fee covers all making supplies, lunch and snacks for each day, and a camp T-shirt. We host the camps as outreach, not as a profit-making activity, and we cap registration at 20 makers per camp because that is the maximum our space and staff can accommodate.
Maker Academy introduces kids to tinkering and learning through making. From activities like building catapults, kites, and go-carts, they learn science principles as well as prototyping and fabrication skills. They might learn soldering and basic circuitry to wire a lamp or build a robot with an Arduino, a small, simplified microcontroller board designed to control mechanical devices. They experiment with various pieces of graphics software to create their own T-shirt designs or decorations for their projects. Each activity fosters curiosity, problem solving, and creativity. Activities change each summer to give makers a new experience each time.
We initially advertised the camps to those we thought would be our primary audience, namely children of faculty and youth in local schools. We made posters, sent email blasts, wrote campus newsletter articles, posted on blogs, and advertised on the library’s web page. We talked with many colleagues personally. We sent out fliers and emails to public elementary and middle schools in the area. Then we eagerly waited to see who would enroll.
For the first set of camps, only 12 kids registered. It was a very disappointing turnout for all the effort we expended. We distributed more emails and more reminders, but they made little difference.
The turnaround occurred when a parent who homeschooled heard about the camps through a source unknown to us, and volunteered to share the news on the local homeschool listserv. Within one day, all the camps filled up and overflowed to a waitlist. We were overwhelmed and amazed. We realized that we had unintentionally overlooked a significant portion of our community and sphere of influence. We needed to focus more on the homeschool community and how we could include them, not only in our marketing, but as a planned audience in our programming. But first, we needed to know more about this population.
Characteristics of Homeschoolers and Library Implications
We wanted to understand the needs of those who homeschool and how the library could speak to those needs. We found helpful published information that we mentioned earlier in this article, but most of what we learned came from speaking with parents, observing the kids, and getting feedback along the way. Over time, we noted several characteristics of local homeschool families that inform how we structure our outreach and services.
Homeschoolers are a tight community with a culture of sharing.
Those who homeschool often do so because they can choose their own curriculum and approach to education. No national group oversees homeschool education, and because of homeschooling’s independent nature, there is no single association to which homeschool families belong and from which the library can get a convenient list of members in its area. While there are broad-based companies that offer customized curriculum guides, homeschooling tends to be organized around multiple state or regional groups and small co-ops for common interests. It is incumbent upon the library to identify these groups in order to reach members.
The local homeschool groups not only connect members but they also foster information sharing. Since there is no central governing agency, families share news among themselves about homeschooling. The regional groups provide a loose organization in which the sharing takes place. If the library can identify the local homeschool groups, it can become part of the resources that will be shared with others.
Homeschool groups have specialized but very effective communication channels.
Parents exchange information, curriculum ideas, and expertise via email lists. Local chapters will announce education activities that support independent learning. These listservs and announcements often constitute the main vehicle of support for homeschooling families, so they are very active and very efficient (Turner, 2016). More importantly, they also tend to lie outside a library’s traditional outreach channels.
When the parent who homeschooled posted the news about our Maker Academy, all it took was one initial email for most of the local homeschool community to know about the camps and to respond. It was a very effective means of communication. To use it, however, we had to get on the email list. We realized that our usual outreach channels were not as widespread as we thought, and we needed to expand our reach to draw in this special yet substantial portion of our community.
Homeschoolers are eager for innovation and meaningful educational opportunities.
A national survey indicates that dissatisfaction with academic instruction at traditional schools is the second most common reason families decide to homeschool (McQuiggan & Megra, 2017). The same survey reports that 39 percent of homeschool families rank a desire for nontraditional instruction as “important.” Real-world, personalized learning is valued by many who practice homeschooling.
This is good news for libraries because innovative learning is very much what makerspaces are all about. Nearly every activity in our Maker Academy involves hands-on learning with a direct tie to real-world skills. For example, students participate in making their camp T-shirts and learn about screen printing and vinyl cutting. They learn how a basic circuit works and put that knowledge into practice by wiring their own custom-made lamps. Especially effective are activities that build upon recently acquired skills and naturally scaffold up to more advanced knowledge. We frequently start beginners with a “learn to solder” project in which they connect LEDs to a battery to make a wearable badge that lights up. Soldering, we explain, is the basis of many electronic projects. Then we introduce a switch so they can turn the light on and off. Next they solder multiple switches to connect a speaker to amplify music from a cell phone. Learning by doing lets young makers build their knowledge in ways that are effective, practical, and that both they and their parents can appreciate.
Homeschool families, like anyone else, appreciate free tools they can use at home and shared resources they can use at libraries.
When asked where they get educational material, over 70 percent of homeschool families cite free content on websites and material from the public library as their main sources of curriculum material (Redford, Battle, & Bielick, 2016). They are looking for tools that are easily accessible and affordable yet provide quality education.
This creates a rich opportunity for the library to teach with open source/open access tools. It is the nature of makerspaces, and increasingly of libraries, to embrace and even favor open source and open access. Although many of the high-tech machines in a makerspace are expensive, they run on open source software that anyone with a computer can download for free. We promote these tools in Maker Academy. We take the classes to our computer lab and introduce them to design programs like Inkscape for graphics and Tinkercad for 3D design. We tell them and their parents that this software is free, and that they can use these tools at home and then come to the library to print their designs on a laser cutter or a 3D printer. This opens enormous possibilities and leverages the library-community relationship. It creates opportunities to maintain contact as homeschool families come back after camp to use the library.
Many homeschool families look for opportunities for their children to engage socially and to work as part of a team.
Researchers define socialization differently, and the answer to the question of what it means to be properly socialized is a highly personal one. However, there is some general consensus that socialization involves a common set of abilities: (a) functional life skills that enable people to operate successfully in the “real” world; (b) social skills including the ability to listen, to interact, and to develop a strong sense of identity and values; and (c) a sense of civic engagement, or of being willing and able to give back to society (Kunzman, 2016; Neuman & Guterman, 2017). Homeschool parents are aware of the need to avoid isolationism, and they actively seek opportunities that provide socialization experiences in ways that conventional schools may not. They want their children to respect others, to develop a sense of teamwork, and to get along with people of diverse backgrounds. They use a variety of resources outside the family to provide these opportunities (Medlin, 2013).
To respond to the need for social enabling, we deliberately schedule Maker Academy activities that facilitate these softer skills. Makers may work on individual projects, helping each other as needed, but they also have group projects that they develop collaboratively. They may divide into groups to make gravity powered go-carts where they have to share design decisions, take turns using tools, and agree on a team name and logo. Sometimes they construct individual paper roller coasters, but then they combine their individual models into one big camp model. Simply having a camp experience lets participants take part in positive social interactions, while group activities encourage them to practice brainstorming, collaboration, and consensus-building skills.
As a further way to build social skills, we incorporate a grand Show and Tell on the last day of camp. We invite friends and families to see the final results of what their children have been doing. Makers show what they have made, explain how they made it, and describe what they learned along the way. They often will take their parents on a tour of the Maker Lab, explaining in detail how the machines work and how they used them. We serve refreshments, and there is lots of interaction. The whole event caps the camp experience for everyone. A fun presentation time completes the creative cycle for makers by allowing them to share their results and practice answering questions publicly. Parents see evidence of the broader, holistic social skills beyond just the technical learning.
Evaluation is an important yet often challenging part of every library initiative. It is especially crucial for a nontraditional service like makerspaces and youth day camps since university administration may question the expenditure of time and effort on a population not directly affiliated with the institution. Gibson and Dixon (2011) offer a framework that libraries can use for assessing the impact of various programs. It is particularly relevant in that it stresses degree of engagement rather than only the metrics of attendance and popularity as indicators of success.
Effectiveness encompasses not only how many people a program reached but also the depth of outreach. Were we effective in reaching people in terms of numbers, and how wide a net did we cast? Were we inclusive?
Our original outreach to faculty families and public schools was not very effective. It garnered only 12 responses for our three maker camps. It was not until we serendipitously discovered the additional segment of the homeschool community that our outreach became more effective. We went from 12 campers that we had to go to great lengths to recruit to registration that now fills up within the first day, plus a waitlist.
The key for us was learning about the homeschool segment of our community, and how they communicate. Our typical outreach channels were too narrow; they excluded this significant community. We had to broaden our promotional methods to include more than we had been doing.
We learned an important lesson. At the start of any new marketing initiative, we now try to ask ourselves, “What groups are we missing because we are defining our user base too narrowly?”
Proof of Educational Benefit
Proof of educational benefit is especially important for academic libraries. Academic libraries are, at their heart, educational institutions. Their mission is tied to learning. While gate counts and program attendance can indicate popularity, the deeper question we have to answer is whether or not anyone learned anything worthwhile.
Fortunately, educational evidence is fairly easy to gather for something like Maker Academy. The objects the campers make serve as proof of what they learned. Campers make T-shirts, build catapults, assemble speakers powered by a Raspberry Pi, create robotic drawing machines from a small motor, and make many other things. Parents see these objects displayed in the Show and Tell at the close of camp, the children can explain how their creations work, and they can take their projects home. These objects are physical evidence of learning.
Image 1: Paper roller coaster activity and whiteboard showing lessons learned from the project. From ACU Maker Academy 2017.
We have also discovered that sometimes we have to make other types of learning a little more evident. A paper roller coaster might seem like a fun construction activity, but there are some important principles of speed and motion, not to mention principles of life, that kids learn from it. Asking young makers to articulate what they discovered and then writing these on a whiteboard uncovers the deeper learning behind the activity (Image 1). Consider these examples from what young makers in our 2017 Maker Academy said they learned:
Mistakes are fixable.
Collaboration = better ideas.
Heavy goes slow. Light goes fast.
Tape is good.
Tape is bad.
Test your ideas.
For parents, we call the completed projects and accompanying responses “what the kids made” and “what they learned.” For university administrators, we use the pedagogical terms “learning artifacts” and “reflective practice.” Each audience has its own jargon, but no matter the terminology, it is proof of learning.
According to Gibson and Dixon (2011), strategic positioning involves factors like increasing the attractiveness of the library as a partner, developing friends and allies, and increasing the quality of relationships. It implies taking conscious steps that will put the library in a good place in the future. With noticeable growth of homeschooling (Ray, 2018), it is likely that a significant portion of future college students will come from a homeschool environment. Colleges and universities would benefit by cultivating relationships with this market segment.
Library programming, particularly that which involves creating something, can play a part in recruiting (Scalfani & Shedd, 2015). At ACU, Maker Academy brings many people to campus who would not otherwise come. It introduces them to the layout of the buildings. It gets them inside the library where they see the environment, meet the librarians, and hopefully learn that the campus is not totally closed off to them. Inviting them here early encourages them to think favorably about higher education, and possibly the library’s parent institution, as part of their future. Making friends with external constituencies, demonstrating how the university is a partner for success, and aiding in recruiting are all forms of social capital that administrators value.
Long-term Learning Outcomes
Outcomes refer to changed behaviors or attitudes that program participants exhibit as a result of the program. We see program outcomes in the form of voluntary extra involvement in the Maker Lab. This comes from homeschool co-ops who want to bring their group in for a tour and a hands-on activity. It also comes from a homeschool robotics group that designs robotic parts with the open source software they learned in Maker Academy and visits in person to print the parts using our 3D printers. A particularly significant outcome indicator is when we host a Maker Festival and some of the young Maker Academy graduates volunteer to exhibit new things they have made or to teach a workshop on a skill they have learned in the makerspace (Image 2). Going from student to teacher is a powerful outcome.
Image 2: ACU Library MakerFest 2017.
Using maker day camps as an outreach tool has been a very successful program for us. We learned some important lessons about our users as well as about our own practices. But these lessons are not only for us. They can speak to other libraries as well.
Libraries of all types seek to serve others. Whether providing storytime in a public library, curriculum resources for a school, or research help for a paper or dissertation, we all seek to connect our services with others’ needs. Serving our patrons starts with an awareness of them. It may be that other libraries, like ours, tend to define their patron base too narrowly. We expend more effort looking in the same places with the same outreach tools, and fail to realize who we might be missing in the process. For ACU, that group was homeschoolers. Simply asking the question, “Who are we missing?” is a good reminder for all of us to look more widely and inclusively at those we can reach.
Secondly, we in libraries can realize that we have more to offer than we may think. Pedagogical research has long advocated for participatory learning, social engagement, reflective inquiry, and technology blended with traditional methods. Libraries offer these things, and they are a natural match for homeschool families. Hands-on activities, group learning, and teaching with open source tools are what libraries do, and makerspaces are innately geared to many of the forms of teaching and learning that educational reform is calling for. Libraries can confidently come forward to show how we demonstrate solutions and learning outcomes, and we can be leaders in reaching new markets beyond the people already on campus.
Incorporating families who school at home into the library’s larger community is first a matter of communication rather than one of program content. Many of the needs and interests of homeschool families are similar to those in mainstream education. Hilyard notes, “The only real challenge libraries face in serving homeschooling families is reaching them” (2008, p. 18). Ten years later, that initial challenge remains the same. Plugging into the existing local listservs and co-ops is imperative; doing so transformed our effectiveness in reaching those around us.
Part of thinking beyond the immediate college-age population is thinking more inclusively of special interest segments of a library’s external community. For us, this special interest was those who homeschool. Parents, children, and—in the future—teachers and homeschool co-ops are all part of our expanded patron base. We remind ourselves that sometimes success depends less on fishing in the same spot than on how wide we cast our nets.
My gratitude goes out to Sarah Naper, external reviewer for this article, and to Amy Koester, internal reviewer. You gave extensively of your time and experience, and you encouraged me to go deeper by your questions. Not only this article but this librarian is better because of your involvement. Thanks, too, to Kellee Warren whose work as publishing editor guided me through the publication process, further polished the article, and kept me focused along the way. To Darren Wilson and my other colleagues in the ACU Library Maker Lab, thanks for your work every day and for making learning fun.
Arendt, J., Hargraves, R. S. H., & Roseberry, M. I. (2017). University library services to engineering summer campers. In 2017 ASEE Annual Conference & Exposition. Retrieved from https://peer.asee.org/29060
Beck, D., Berard, G., Baker, B., & George, N. (2010). Summer engineering experience for girls (SEE): An evolving hands-on role for the engineering librarian. In 2010 Annual Conference & Exposition (pp. 15.1146.1-15.1146.25). Retrieved from https://peer.asee.org/16030
Blankenship, C. (2008). “Is today a homeschool day at the library?” Public Libraries, 47(5), 24–26.
Dougherty, D. (2013). The maker mindset. In M. Honey & D. E. Kanter (Eds.), Design, Make, Play: Growing the Next Generation of STEM Innovators (pp. 7–11). New York, NY: Routledge.
Godbey, S., Fawley, N., Goodman, X., & Wainscott, S. (2015). Ethnography in action: Active learning in academic library outreach to middle school students. Journal of Library Administration, 55(5), 362–375. https://doi.org/10.1080/01930826.2015.1047262
Haugh, A., Lang, O., Thomas, A. P., Monson, D., & Besser, D. (2016). Assessing the effectiveness of an engineering summer day camp. In 2016 ASEE Annual Conference & Exposition (Vol. Paper #15045). https://doi.org/10.18260/p.26311
Hilyard, N. B. (2008). Welcoming homeschoolers to the library. Public Libraries, 47(5), 17–27.
Huge, S., Houdek, B., & Saines, S. (2002). Teams and tasks: Active bibliographic instruction with high school students in a summer engineering program. College & Research Libraries News, 63(5), 335–337.
Johnson, A. (2012). Youth matters: Make room for homeschoolers. American Libraries, 43(5/6), 86.
Kunzman, R. (2016). Homeschooler socialization: Skills, values, and citizenship. In M. Gaither (Ed.), The Wiley Handbook of Home Education. John Wiley & Sons, Inc.
Lou, N., & Peek, K. (2016, February 23). By the numbers: The rise of the makerspace. Popular Science, 288(2), 88.
Masucci, M., Organ, D., & Wiig, A. (2016). Libraries at the crossroads of the digital content divide: Pathways for information continuity in a youth-led geospatial technology program. Journal of Map & Geography Libraries, 12(3), 295–317. https://doi.org/10.1080/15420353.2016.1224795
McQuiggan, M., & Megra, M. (2017). Parent and family involvement in education: Results from the national household education surveys program of 2016. (No. NCES 2017-102). National Center for Education Statistics. Retrieved from https://eric.ed.gov/?id=ED575972
Mishler, T. (2013). Hands-on homeschool programs. Florida Libraries, 56(1), 7–8.
Neuman, A., & Guterman, O. (2017). What are we educating towards? Socialization, acculturization, and individualization as reflected in home education. Educational Studies, 43(3), 265–281. https://doi.org/10.1080/03055698.2016.1273763
Paradise, C. (2008). Our homeschool alliance is a winner. Public Libraries, 47(5), 21–22.
Ray, B. D. (2018, January 13). Research facts on homeschooling. Retrieved May 16, 2018, from https://www.nheri.org/research-facts-on-homeschooling/
Redford, J., Battle, D., & Bielick, S. (2016). Homeschooling in the United States: 2012. National Center for Education Statistics. Retrieved from https://eric.ed.gov/?id=ED569947
Rossini, B., Burnham, J., & Wright, A. (2013). The librarian’s role in an enrichment program for high school students interested in the health professions. Medical Reference Services Quarterly, 32(1), 73–83. https://doi.org/10.1080/02763869.2013.749136
Scalfani, V. F., & Shedd, L. C. (2015). Recruiting students to campus. College & Research Libraries News, 76(2), 76–91.
Shinn, L. (2008). A home away from home: Libraries & homeschoolers. School Library Journal, 54(8), 38–42.
Turner, M. L. (2016). Getting to know homeschoolers. Journal of College Admission, 233, 40–43.
Tvaruzka, K. (2009). Warning: children in the library! Welcoming children and families into the academic library. Education Libraries, 32(2), 21–26.
Thomas Nelson Page’s Washington and its Romance was a popular history of Washington, DC prior to its becoming the nation’s capital in 1800. It was one of Page’s last works to be published. He died in 1922, and the book came out in 1923, with illustrations by Walter and Emily Shaw Reese. Copyright to the book was renewed in 1950, and it will enter the public domain 21 days from now.
The book may be of more interest today as a period piece than as a historical reference. As its title suggests, the tone of the book is more sentimental than scholarly. And those sentiments tended to be partial. As a summary of his work at the University of Alabama notes, Page did much to promote a “moonlight and magnolias” view of the Old South that romanticized the perspective of white Southerners (such as Page, who was born on a slave plantation in Virginia), and largely excused slavery and lynching.
But Page’s perspective was quite popular in 1923, a year notable for the rise of the Ku Klux Klan, which had been revived after the success of D. W. Griffith’s 1915 movie The Birth of a Nation, and which would peak in influence later in the 1920s. Page was a popular and prominent author, even serving for six years as ambassador to Italy under Woodrow Wilson.
Washington and its Romance was reviewed generally positively in a 1924 issue of the Catholic Historical Review, a journal published by Washington’s Catholic University of America. (The reviewer takes particular interest in the book’s account of the founding of Georgetown College, a Jesuit institution.) The copy of the review I’m linking to is a copy in JSTOR, which I can read thanks to my institution’s subscription, but which may restrict access for readers who don’t have a JSTOR subscription. HathiTrust also has a scan of this issue, but their copy is also only available to search, and not to read, since it’s after the 1922 “bright line” where publications are known to be public domain without requiring any special research.
As I noted in my talk yesterday, though, many American scholarly journals did not renew copyrights. That includes The Catholic Historical Review. The only active renewal I have found associated with it (before the automatic renewals that apply to copyrights after 1963) is for a single article in the January 1961 issue. The review I cite above, and decades of later articles from this journal, are already in the US public domain.
I’ve created a copyright information page for this journal, and as I noted in my talk yesterday, I’m happy to do the same for other journals published in the mid-20th century that people tell me they’re interested in, if they aren’t already represented in our copyright renewals inventory. (You can use this form to suggest titles; if you say in the “anything else we should know?” blank that you’re interested in a copyright information page, that would be helpful.) Also, if folks want to add more information about journals already in the inventory, I’m happy to show them how to research periodical copyrights and create or add to the JSON files for them that my knowledge base uses.
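For a sense of what a per-journal record in such an inventory could contain, here is a minimal sketch rendered in Python. Every field name below is my own invention for illustration; it is not the actual schema the knowledge base uses.

```python
import json

# Illustrative record for a mid-20th-century journal's renewal status.
# Field names are hypothetical, not the knowledge base's real schema.
record = {
    "title": "The Catholic Historical Review",
    "renewal_exceptions": [
        # The one pre-1964 renewal found: a single article in one issue.
        {"issue": "1961-01", "scope": "single article"}
    ],
    "notes": "No blanket issue renewals found before automatic renewal applies.",
}

def to_json(rec):
    """Serialize a journal record for inclusion in the inventory."""
    return json.dumps(rec, indent=2, sort_keys=True)

print(to_json(record))
```

A record like this, kept in version control, lets others check the research behind a public-domain determination and add corrections over time.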
I hope to make the inventory, and its accompanying decision guide, trusted enough that people can rely on them to open access to content like the review I cite above. I’ll consider it a notable early success if we see a copy of this review, and the rest of its issue, become openly available before all of 1924 enters the public domain in 2020.
Marina Georgieva presented a poster at Digital Preservation 2018. Please read on for a closer look at her work, one of the many great offerings from this year’s event. For more posters, please visit https://osf.io/view/ndsa2018/.
Marina holds a Master’s degree in Library Science with an Information Technology concentration from the University of Wisconsin – Milwaukee. She is currently a Visiting Digital Collections Librarian at the University of Nevada – Las Vegas. Her passion is large-scale digitization with cutting-edge technologies. Her research interests include project management in large-scale digitization and approaches for achieving higher digitization efficiency, such as staffing and training and the development of workflows, procedures, and guidelines. Marina is also involved in metadata and authority work as well as metadata remediation projects.
The digital librarian: the liaison between digital collections and digital preservation
At UNLV Libraries, the role of the Digital Collections Librarian goes beyond the traditional routine tasks of digitization, metadata management, project management, workflow development and team management. Digital Collections Librarians serve as links between digitization and digital preservation and do everything in between to draft sustainable digital preservation workflows alongside their colleagues in the Special Collections Technical Services Department. Technical Services Librarians are responsible for the preservation of born-digital archival materials, whereas the Digital Collections Librarians’ roles entail being information architects directly engaged in the process of preparing master files of in-house and outsourced reformatted materials for digital preservation.
In recent years, the UNLV Libraries Digital Collections Department has completed numerous large-scale digitization projects that yielded hundreds of thousands of new archival digital objects that require long-term preservation. Currently all these archival files are stored on a server referred to as ‘the Digital Vault’.
One of the invisible, often overlooked, yet very important roles of the Digital Librarian is to verify that all images from completed digitization projects are properly organized in meaningful, easy-to-navigate directories and that all files are in the appropriate file format. It is common practice for folder directories (created and organized during the actual process of digitization) to remain intact and be moved to the Digital Vault for long-term storage in their original order. There they get merged into the existing collection-appropriate folders or, if necessary, a new folder is created.
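As a rough illustration of this kind of verification, a short script could walk a completed project folder and flag anything that is not in the expected archival format. This is a sketch under stated assumptions – TIFF as the sole approved format, and a file-extension check standing in for true format identification, which in practice would use a characterization tool:

```python
from pathlib import Path

# Assumption for this sketch: TIFF is the approved archival format.
APPROVED = {".tif", ".tiff"}

def flag_unexpected_files(root):
    """Return paths under `root` whose extension is not an approved archival format."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() not in APPROVED
    )
```

Run against a completed project directory before it moves to the Digital Vault, an empty result suggests the data set is format-consistent; anything flagged gets reviewed by the librarian.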
Additionally, UNLV Digital Collections has thousands of images from legacy collections stored in the Digital Vault. All of these digital objects live on the Digital Collections website, but some of the archival master folders have redundant data; others are saved in inappropriate file formats, and still others have non-normalized file naming. In recent years, there has been an effort to clean up and restructure these legacy folders in order to make the archival files easily discoverable and to optimize the storage space before the content of the Digital Vault gets migrated to a new, more robust system (UNLV Special Collections and Archives is currently building an instance of Islandora CLAW that will back up files in Amazon Glacier).
The role of the UNLV Libraries Digital Librarian as it relates directly to digital preservation is outlined in the poster presented at the 2018 NDSA DigiPres Forum. Here we will briefly touch upon a few of the major responsibilities:
File naming conventions
For current digitization projects, file naming has been normalized and it happens in a structured and logical way depending on the type of collection being digitized. During the process of preparing collections for digitization, the librarian analyzes the content, makes decisions regarding the grouping of the digital objects and assigns collection-level and item-level digital identifiers. To achieve consistency and logical arrangement, the digital librarian maintains and updates spreadsheets with assigned and available digital identifiers.
For example, if the collection consists of archival photographic materials, the assigned digital collection alias will be ‘PHO’ with sequential numeric identifiers. These identifiers logically follow the structure and numbering of all other previously digitized photo collections.
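A minimal sketch of this kind of sequential assignment might look like the following. The ‘PHO’ alias comes from the example above, but the identifier pattern and the zero-padding width are assumptions made for illustration:

```python
# Hypothetical identifier scheme: <ALIAS>-<zero-padded sequence number>.
def next_identifier(alias, assigned, width=4):
    """Return the next sequential digital identifier for a collection alias."""
    numbers = [
        int(ident.split("-")[1])
        for ident in assigned
        if ident.startswith(alias + "-")
    ]
    return f"{alias}-{(max(numbers) + 1 if numbers else 1):0{width}d}"

# e.g. with 'PHO-0001' and 'PHO-0002' already recorded in the spreadsheet:
print(next_identifier("PHO", ["PHO-0001", "PHO-0002"]))  # PHO-0003
```

In practice the librarian maintains this sequence in a shared spreadsheet, but the same logic applies: find the highest number already assigned for the alias and issue the next one.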
As mentioned earlier, most of the newly digitized collections remain in the original directory structure that was developed during the scanning process. The digital librarian ensures that the file naming on directory level and on file level is accurate and the data set is ready to be moved to the Digital Vault.
It is important to mention that digital librarians often need to manage identifiers beyond those that identify the archival structure (collection, folder) and those that identify the intellectual unit (item), so that they can accurately reflect the structure of materials. They also need to create a third type for the multiple image files that comprise a single digital object; for example, the back and front of a printed item, or multiple items on a page in a scrapbook.
Legacy collections bring more challenges and sometimes need cleanup, as their file naming may be inconsistent. Depending on the project, the digital librarian may decide to keep the file structure intact or to rearrange the folders in a more normalized way that follows current preservation practices.
Decisions on archival file formats
UNLV Libraries Digital Collections has chosen the TIFF file format for long-term preservation of archival master files. TIFF is the preferred format for in-house digitized reflective materials and transparencies.
The file format for digitized periodicals may vary depending on the project. In-house digitized periodicals and newspaper clippings are preserved in TIFF just as photographs and films are, while periodicals digitized as part of the National Digital Newspaper Program are stored in the original Library of Congress–approved data sets. These data sets include newspaper pages in JP2, PDF, and TIFF formats along with the accompanying metadata encoded in the METS/ALTO XML schema.
Legacy collections may contain files in JPG format. This usually applies to collections accessioned as already digitized materials. They usually remain in this format because UNLV Libraries Special Collections does not hold the original materials, and it is therefore impossible to re-digitize the items in the proper archival format.
Building directories in the Digital Vault
Current digitization and digital preservation efforts follow well-established practices regarding how files are nested in directories so that they have logical structure and are easy to navigate.
For example, the archival master files of a digitized photographic collection get migrated to the general folder that holds the archival files of all photo collections. This directory contains a blend of sub-folders that represent compound objects and files that represent single objects. It is nested in a higher-level Photo Collections folder.
To illustrate the scenario above, please examine the following example.
[…] Digital Vault\PHO Photo Collections
> PHO Archival Images
Legacy collections usually get reorganized, especially if the folder structure is not logical or there are redundant files and folders.
Outsourced periodicals are kept in the original directory structure as created by the vendor. Data sets arrive separately and each of them is considered a batch and is stored separately in a parent-level folder hosting only outsourced periodicals.
[…] Digital Vault\NDNP Local Backup
> Batch Aurora
– Original data set as received by vendor
> Batch Beatty
– Original data set as received by vendor
> Batch Caliente
– Original data set as received by vendor
Communication with the IT Department
The Digital Vault is a directory with limited access – librarians have “view only” mode and must communicate all data migration needs, remediation requests, and new decisions about folder structure to the IT department, which maintains the Digital Vault.
The role of the digital librarian in this communication is critical because of the limited access to the server. Usually the communication is informal: requests or updates are simply emailed along with instructions on what needs to be accomplished. Recently, for some larger and more complicated clean-up projects, the UNLV digital librarians adopted Google Sheets. The advantage of Google Sheets is that more than one person can access and edit the document simultaneously to communicate changes and project updates.
The archival files prepared for migration are stored in a temporary location and once the move is complete, the digital librarian checks if all files and directories were moved successfully, and if there are any corrupted files or other discrepancies that need attention. Upon verification, the files in the temporary location are deleted permanently.
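A verification pass like the one described can be sketched as a checksum comparison between the temporary location and the destination (a simplified illustration; the actual UNLV workflow and tooling are not specified in this post):

```python
import hashlib
from pathlib import Path

def checksum(path, algo="md5"):
    """Return the hex digest of a file, read in chunks to handle large TIFFs."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_migration(source_dir, dest_dir):
    """Compare every file under source_dir against its copy under dest_dir.
    Returns a list of relative paths that are missing or whose contents differ."""
    problems = []
    for src in Path(source_dir).rglob("*"):
        if src.is_file():
            rel = src.relative_to(source_dir)
            dst = Path(dest_dir) / rel
            if not dst.exists() or checksum(src) != checksum(dst):
                problems.append(str(rel))
    return problems
```

Only after a pass like this returns an empty list would the files in the temporary location be deleted.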
In remediation scenarios, when legacy content needs to be cleaned up (deleted, moved elsewhere, or restructured), the communication usually includes initial written instructions followed by a one-on-one meeting. The meeting walks through all the necessary changes and serves as a final review of the project, since these actions are irreversible.
The day-to-day job of digital librarians has a slightly different focus than digital preservation, and yet digital librarians play a valuable role in building structure, organizing, and cleaning up data. Digital librarians are very user-focused – not just on internal end-users of the archival masters, but on library users who may need delivery of master images via a library system (online) or directly (via image reproduction). Their outstanding organizational skills and attention to detail not only make the data easily discoverable and ready for migration, but also optimize the storage space and lay the groundwork for a smooth migration to a new, more robust system. In institutions just taking their first steps in digital preservation, the digital librarian plays a key role in advancing a step closer to a robust and efficient long-term digital preservation strategy.
Following the pattern we established with the first two Islandoracons, we're going to have one day during the main conference that's comprised of 90-minute workshops running in multiple tracks. Also following the first two, we're looking to you, the Islandora community and potential Islandoracon attendees, to tell us what kinds of workshops you want to see.
We have a table here where you can add your suggestions. A few weeks into 2019, we'll take these suggestions and turn them into a survey so the community can vote on some favorites. The Islandoracon Planning Committee will then do their level best to find the best possible instructors for the topics you choose.
Please give us a topic, a rough description of what you think the workshop would cover, and if you can, the level it should aim for and the platform (Islandora 7.x or Islandora CLAW) that you think should be covered.
Natural language search is the insane idea that maybe we can talk to computers in the same way we talk to people. Absolutely nuts, I know.
With the increasing popularity of virtual assistants like Siri and Alexa, and devices like Google Home and Apple’s HomePod, natural language search is ready for prime time in the devices in our homes, our offices, and in our pockets.
Alexa, Siri, and Google Home are Search Apps
All these devices and virtual assistants making their way into our homes and hearts have search technology at their core. Any time you query a system or database or application and the system has to decide which results to display – or say – it’s a search application. OpenTable, Tinder, and Google Maps are all search-based applications. Search technology is at the core of nearly every popular software application you use today at work, at home, at play, at your desk, or on your smartphone.
But how you interact with these systems is changing.
The Annoyance of Search
In the old days, if you were searching a database or set of files or documents for a particular word or phrase, you’d have to learn a whole arcane set of commands and operators.
Boolean in a Nutshell
You’d have to know Boolean operators and other logic so you could search this table WHERE this word AND that word OR that other word appear but NOT this other word and then SORT by this field. (Got it?)
Each system had its own idiosyncrasies that only experienced users would know and you might have to run queries and reports multiple times to make sure you got the right results back – or to make sure you got all the results possible. You’d have to know the structure of the database or data set you’re querying and which fields to look at.
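The AND/OR/NOT logic users had to master can be sketched in a few lines (a contrived illustration — the documents and terms below are invented):

```python
# Invented miniature document set for illustrating old-style Boolean filtering.
docs = [
    "annual report revenue growth",
    "quarterly revenue decline",
    "annual revenue and profit growth",
]

def matches(doc, all_of=(), any_of=(), none_of=()):
    """A document must contain every AND term, at least one OR term,
    and no NOT term -- the operator logic users had to learn."""
    words = set(doc.split())
    return (all(t in words for t in all_of)
            and (not any_of or any(t in words for t in any_of))
            and not any(t in words for t in none_of))

# "revenue AND (annual OR quarterly) NOT decline"
hits = [d for d in docs if matches(d, all_of=("revenue",),
                                   any_of=("annual", "quarterly"),
                                   none_of=("decline",))]
```

Get one operator or one field name wrong and you silently get the wrong results back, which is exactly the annoyance natural language search removes.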
These requirements put up barriers to entry for people trying to find information to do their jobs at work or to do research at a library. You’d have to ask a specialist who knew the ins and outs of each system, wait for them to run the report or query for you and print out the results (and hope the results answered the question you originally had).
Evolution of Natural Language Search
You couldn’t just say out loud what you wanted to know and then have it delivered to you instantly.
But what if you could? That’s natural language search.
“What was last year’s recognized revenue from the AIPAC region?”
“What restaurant was it Mary mentioned last week in a text message?”
“Who hosted the highest rated Oscars telecast ever and what year was it?”
And the system takes that request, whether spoken or typed into a box, takes it apart, figures out what you’re looking for, what you aren’t, where you’re searching, and what to include, and turns it into a query it can submit to a database or search system in order to return the results right back to you.
The Technology Behind Natural Language Search
On the back end, there are several bits of technology at play.
Let’s say you ask your favorite nearby listening smart device:
What band is Joe Perry in?
First the device wakes up and records an audio file. The audio file gets sent across the internet and is received by the search system.
The audio file is processed by a speech-to-text API that filters out background noise, analyzes it to find the various phonemes, matches them up to words, and converts the spoken words into a plain English sentence.
This query gets examined by the search system, which notices a two-word proper name in the sentence: Joe Perry. Picking out people, places, and things from a data set, collection of files, or body of text is called named entity recognition, and it is a standard feature of most search applications.
So the system knows that the words Joe Perry refers to a person. But there might be several notable Joe Perrys in the database so the system has to resolve these ambiguities.
There’s Joe Perry the NFL football player, Joe Perry the snooker champion (totally not kidding), Joe Perry the Maine politician, and Joe Perry the popular musician. The word band in the query alerts the system that we’re probably looking for careers associated with a band, like composers, musicians, or singers. That’s how it disambiguates which Joe Perry we’re looking for.
A database of semantic information about musicians might have information about Perry’s songs, career, and yes the bands he’s been a part of during his career.
The system takes apart the sentence, sees the user is asking for a band associated with Joe Perry. It looks at a database of musicians, performers, songs, albums, and bands. It sees that Joe Perry is semantically associated with several bands but the main one is Aerosmith.
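The lookup and disambiguation steps above can be sketched as a toy (the entity table and career hints below are invented for illustration; real systems use trained NER models and large knowledge graphs):

```python
# Invented miniature knowledge base: several Joe Perrys, each with a career.
ENTITIES = {
    "Joe Perry (musician)": {"career": "musician", "band": "Aerosmith"},
    "Joe Perry (snooker)":  {"career": "snooker player"},
    "Joe Perry (football)": {"career": "NFL player"},
}

# Invented hints mapping context words in the query to careers.
CAREER_HINTS = {"band": "musician", "album": "musician",
                "snooker": "snooker player", "touchdown": "NFL player"}

def answer(query):
    words = query.lower().rstrip("?").split()
    # "Named entity recognition": spot the known name in the query.
    hits = [k for k in ENTITIES if "joe" in words and "perry" in words]
    # Disambiguate using context words such as "band".
    hints = {CAREER_HINTS[w] for w in words if w in CAREER_HINTS}
    for name in hits:
        if ENTITIES[name]["career"] in hints:
            return f"Joe Perry is in the band {ENTITIES[name]['band']}."
    return "Sorry, I couldn't work out which Joe Perry you mean."

print(answer("What band is Joe Perry in?"))
```

The word "band" is what steers the toy toward the musician rather than the snooker champion, which is exactly the disambiguation described above.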
The query has been spoken, converted to text, turned into a query, sent to the system and it comes back with the answer that Joe Perry is in a band called Aerosmith along with related metadata that he’s been in Aerosmith since it formed in 1970. The system puts this answer together into a sentence.
It ships that sentence off to a text-to-speech API, which pieces the words together into audio that sounds like a human being, then sends that audio file back to the device, which answers:
Joe Perry has been in the band Aerosmith since 1970.
The question is asked just like you’d ask a human being — and answered in exactly the same way (and correctly).
Natural language search reduces the barriers to information and access to enhance our lives during work or play or when trying to settle a bar bet over a piece of pop culture. When users can talk to devices just like they talk to their friends, more people can get more value out of the applications and services we build.
Top image from 1986’s Star Trek IV: The Voyage Home where Scotty tries to talk to a 1980s computer through its mouse (video).
First, a little more about the Interest Group. The group includes more than 80 individuals, representing nearly 50 Research Library Partnership member institutions in nine countries. Participants are distributed across a range of RDM roles, and represent both strategic and practitioner perspectives. The Interest Group is an opportunity for participants to interact with OCLC Research staff and each other, sharing experiences about RDM services and practices, and pooling knowledge about the current state – and future evolution – of RDM. Interest Group discussions are catalyzed by the topics covered in the accompanying webinar series, but are free-ranging and flexible to accommodate participants’ interests.
Like our Interest Group discussions following the first webinar, our latest discussions drew participants from North America, Europe, and the Asia-Pacific region. The starting point for our discussion was a model from the penultimate report in the Realities of RDM report series, in which we depicted four categories of incentives driving universities to acquire RDM capacity.
We discussed how these incentives operated in different university contexts. A number of participants indicated that compliance with mandates from funders, government agencies, and national directives was a strong initial driver in incentivizing the acquisition of RDM capacity at their institutions; however, some also noted that more recently, meeting publisher requirements for data availability as a condition of publication has been a key source of researcher requests for RDM support. Participants agreed that tracking the increasingly complex web of mandates and compliance requirements from funders, the public sector, and now publishers is challenging. While the job of monitoring the appearance and ongoing evolution of data mandates from a variety of sources often falls to the university library, staff also engage with other campus units to keep abreast of the latest developments, including the Research Office and even individual academic departments. Several participants noted that their institutions track mandates through DMPTool, an open-source online tool supporting the creation of data management plans.
Institutional interest in data and data management can be a strong driver for acquiring RDM capacity. We discussed the ways institutional strategy in the RDM space was articulated in university data policies. The responses varied considerably, with many participants indicating that their university had an institutional data policy in place, while others either had a draft policy under consideration or none at all. For participants whose university did have a data policy, feelings were mixed in regard to its effectiveness. For example, one participant indicated that the data policy helped align the university’s stance on data management with that of research funders, and served notice that the university had certain expectations of its researchers in regard to data management. Similarly, another participant observed that a data policy can clarify data management expectations, as well as signal institutional interest and commitment in this area. But one participant noted that although their university had a data policy, there was little in the way of enforcement built into it; in practice, it took the form more of a set of suggestions for good data management rather than a statement of requirements. It was additionally noted that university data policies in Europe were perhaps more advanced than elsewhere, due to more extensive national reporting requirements for research outputs.
Much of our discussion focused on the class of incentives represented in the figure by researcher demand for RDM services, and in particular, what universities were doing to engage with researchers around data management needs and requirements. Several participants noted that engagement with researchers often begins by providing support for creating data management plans, while others mentioned the provision of data storage capacity as another good way to gain entry into the researcher’s data management workflow. An interesting strand of discussion took place around the idea that much of the current RDM support being offered tends to cluster around the beginning and end of the data management process – that is, with data management plans at the beginning of the research process, and then the storage and sharing of final data once the project is complete. Looking ahead, a key question for many RDM practitioners is how to extend RDM services into the “valley” between these endpoints.
We had a spirited conversation around the problem of incentivizing researchers to engage in good data management practices. For example, several participants noted that while researchers often embraced the provision of data storage capacity, it was much more difficult to motivate them to provide adequate documentation for the data sets stored there, or to indicate the period of retention needed. Helping researchers see that data management is a worthy investment of their time is crucial: for example, at one institution, staff try to engage researcher interest by providing dedicated trainers to help researchers use REDCap (an electronic data capture system used extensively in academia for collecting clinical data), and show them how the system improves their data management practices. It was generally agreed that a key obstacle to motivating researchers to optimize their data for re-use – such as providing adequate documentation and metadata to support understanding and discovery of the data – is the lack of good data re-use stories that illustrate the tangible benefits of observing good data practices.
In addition to these topic areas, participants offered many wide-ranging comments about RDM services. For example, we learned that one university initiated a pilot project with departments in the life sciences in which RDM librarians sat in on project review boards and offered feedback on data management plans. Another participant noted the potential for drawing lessons on re-use from the sharing of software in academic circles, where there is a prevailing ethos that encourages code to be shared and built upon through services like GitHub. As we have seen with software, establishing a “culture of sharing” is very important in encouraging data re-use.
Several participants noted that while scarcity of resources can sometimes put interactions with other campus units on a competitive footing, it can also facilitate collaboration based on a mutual need to stretch limited resources for RDM service provision. The intersection between research ethics, privacy, and data management was mentioned by several participants as a growing concern, with researchers sometimes required to address ethical issues around data storage and sharing in their data management plans, or as part of a project’s ethics review. The European General Data Protection Regulation (GDPR), which went into effect in 2018, has amplified the need to address privacy issues in data storage and re-use. And finally, participants noted again and again how important it was to engage closely with researchers to understand how data management aligned with their daily workflows – as one participant noted, it is difficult to understand researcher needs without getting into their office!
These are just some of the highlights from a wide-ranging and informative discussion with RDM professionals within the Research Library Partnership. We thank all who participated in the discussion!
I gave a talk at the Fall CNI meeting entitled Blockchain: What's Not To Like? The abstract was:
We're in a period when blockchain or "Distributed Ledger Technology" is the Solution to Everything™, so it is inevitable that it will be proposed as the solution to the problems of academic communication and digital preservation. These proposals typically assume, despite the evidence, that real-world blockchain implementations actually deliver the theoretical attributes of decentralization, immutability, anonymity, security, scalability, sustainability, lack of trust, etc. The proposers appear to believe that Satoshi Nakamoto revealed the infallible Bitcoin protocol to the world on golden tablets; they typically don't appreciate or cite the nearly three decades of research and implementation that led up to it. This talk will discuss the mis-match between theory and practice in blockchain technology, and how it applies to various proposed applications of interest to the CNI audience.
Below the fold, an edited text of the talk with links to the sources, and much additional material. The colored boxes contain quotations that were on the slides but weren't spoken.
It’s one of these things that if people say it often enough it starts to sound like something that could work,
I'd like to start by thanking Cliff Lynch for inviting me back even though I'm retired, and for letting me debug the talk at Berkeley's Information Access Seminar. I plan to talk for 20 minutes, leaving plenty of time for questions. A lot of information will be coming at you fast. Afterwards, I encourage you to consult the whole text of the talk and much additional material on my blog. Follow the links to the sources to get the details you may have missed.
The first comes from commercial interests where management of rights, IP and ownership is complex, hard to do, and has led to unusable systems that are driving researchers to sites like SciHub, scaring the bejesus out of publishers in the process.
The other trend is for a desire to move to a decentralised web and a decentralised system of validation and reward, in a way trying to move even further away from the control of publishers.
It is absolutely fascinating to me that two diametrically opposite philosophical sides are converging on the same technology as the answer to their problems. Could this technology perhaps be just holding up an unproven and untrustworthy mirror to our desires, rather than providing any real viable solutions?
This is not to diminish Nakamoto's achievement but to point out that he stood on the shoulders of giants. Indeed, by tracing the origins of the ideas in bitcoin, we can zero in on Nakamoto's true leap of insight—the specific, complex way in which the underlying components are put together.
More than fifteen years ago, nearly five years before Satoshi Nakamoto published the Bitcoin protocol, a cryptocurrency based on a decentralized consensus mechanism using proof-of-work, my co-authors and I won a "best paper" award at the prestigious SOSP workshop for a decentralized consensus mechanism using proof-of-work. It is the protocol underlying the LOCKSS system. The originality of our work didn't lie in decentralization, distributed consensus, or proof-of-work. All of these were part of the nearly three decades of research and implementation leading up to the Bitcoin protocol, as described by Arvind Narayanan and Jeremy Clark in Bitcoin's Academic Pedigree. Our work was original only in its application of these techniques to statistical fault tolerance; Nakamoto's only in its application of them to preventing double-spending in cryptocurrencies.
We're going to walk through the design of a system to perform some function, say monetary transactions, storing files, recording reviewers' contributions to academic communication, verifying archival content, whatever. Being of a naturally suspicious turn of mind, you don't want to trust any single central entity, but instead want a decentralized system. You place your trust in the consensus of a large number of entities, which will in effect vote on the state transitions of your system (the transactions, reviews, archival content, ...). You hope the good entities will out-vote the bad entities. In the jargon, the system is trustless (a misnomer).
Techniques using multiple voters to maintain the state of a system in the presence of unreliable and malign voters were first published in The Byzantine Generals Problem by Lamport et al in 1982. Alas, Byzantine Fault Tolerance (BFT) requires a central authority to authorize entities to take part. In the blockchain jargon, it is permissioned. You would rather let anyone interested take part, a permissionless system with no central control.
In the case of blockchain protocols, the mathematical and economic reasoning behind the safety of the consensus often relies crucially on the uncoordinated choice model, or the assumption that the game consists of many small actors that make decisions independently.
The security of your permissionless system depends upon the assumption of uncoordinated choice, the idea that each voter acts independently upon its own view of the system's state.
If anyone can take part, your system is vulnerable to Sybil attacks, in which an attacker creates many apparently independent voters who are actually under his sole control. If creating and maintaining a voter is free, anyone can win any vote they choose simply by creating enough Sybil voters.
From a computer security perspective, the key thing to note ... is that the security of the blockchain is linear in the amount of expenditure on mining power, ... In contrast, in many other contexts investments in computer security yield convex returns (e.g., traditional uses of cryptography) ... analogously to how a lock on a door increases the security of a house by more than the cost of the lock.
So creating and maintaining a voter has to be expensive. Permissionless systems can defend against Sybil attacks by requiring a vote to be accompanied by proof of the expenditure of some resource. This is where proof-of-work comes in, a concept originated by Cynthia Dwork and Moni Naor in 1992. To vote in a proof-of-work blockchain such as Bitcoin's or Ethereum's requires computing very many otherwise useless hashes. The idea is that the good voters will spend more, compute more useless hashes, than the bad voters.
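Proof-of-work can be sketched in a few lines: search for a nonce whose hash has a given number of leading zero bits. The difficulty here is trivially low for illustration; Bitcoin's is astronomically higher:

```python
import hashlib

def proof_of_work(block_data: bytes, difficulty_bits: int = 16):
    """Search for a nonce such that sha256(block_data + nonce) has
    `difficulty_bits` leading zero bits -- the 'useless hashes' a miner
    must compute to cast a vote."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
        nonce += 1

nonce, digest = proof_of_work(b"example block")
# Verification is cheap: anyone can re-hash once to check the work,
# but finding the nonce took ~2**16 hash attempts on average.
assert digest.startswith("0000")  # 16 zero bits = 4 leading hex zeros
```

The asymmetry — expensive to produce, cheap to verify — is what makes a vote costly without requiring a central authority to count the expense.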
much of the innovation in blockchain technology has been aimed at wresting power from centralised authorities or monopolies. Unfortunately, the blockchain community’s utopian vision of a decentralised world is not without substantial costs. In recent research, we point out a ‘blockchain trilemma’ – it is impossible for any ledger to fully satisfy the three properties shown in [the diagram] simultaneously ... In particular, decentralisation has three main costs: waste of resources, scalability problems, and network externality inefficiencies.
Brunnermeir and Abadi's Blockchain Trilemma shows that a blockchain has to choose at most two of the following three attributes:
Obviously, your system needs the first two, so the third has to go. Running a voter (mining in the jargon) in your system has to be expensive if the system is to be secure. No-one will do it unless they are rewarded. They can't be rewarded in "fiat currency", because that would need some central mechanism for paying them. So the reward has to come in the form of coins generated by the system itself, a cryptocurrency. To scale, permissionless systems need to be based on a cryptocurrency; the system's state transitions will need to include cryptocurrency transactions in addition to records of files, reviews, archival content, whatever.
Your system needs names for the parties to these transactions. There is no central authority handing out names, so the parties need to name themselves. As proposed by David Chaum in 1981 they can do so by generating a public-private key pair, and using the public key as the name for the source or sink of each transaction.
we created a small Bitcoin wallet, placed it on images in our honeyfarm, and set up monitoring routines to check for theft. Two months later our monitor program triggered when someone stole our coins.
This was not because our Bitcoin was stolen from a honeypot; rather, the graduate student who created the wallet kept a copy, and his account was compromised. If security experts can't safely keep cryptocurrencies on an Internet-connected computer, nobody can. If Bitcoin is the "Internet of money," what does it say that it cannot be safely stored on an Internet-connected computer?
In practice this is implemented in wallet software, which stores one or more key pairs for use in transactions. The public half of the pair is a pseudonym. Unmasking the person behind the pseudonym turns out to be fairly easy in practice.
The security of the system depends upon the user and the software keeping the private key secret. This can be difficult, as Nicholas Weaver's computer security group at Berkeley discovered when their wallet was compromised and their Bitcoins were stolen.
The capital and operational costs of running a miner include buying hardware, power, network bandwidth, staff time, etc. Bitcoin's volatile "price", high transaction fees, low transaction throughput, and large proportion of failed transactions mean that almost no legal merchants accept payment in Bitcoin or other cryptocurrency. Thus one essential part of your system is one or more exchanges, at which the miners can sell their cryptocurrency rewards for the "fiat currency" they need to pay their bills.
Who is on the other side of those trades? The answer has to be speculators, betting that the "price" of the cryptocurrency will increase. Thus a second essential part of your system is a general belief in the inevitable rise in "price" of the coins by which the miners are rewarded. If miners believe that the "price" will go down, they will sell their rewards immediately, a self-fulfilling prophecy. Permissionless blockchains require an inflow of speculative funds at an average rate greater than the current rate of mining rewards if the "price" is not to collapse. To maintain Bitcoin's price at $4K requires an inflow of $300K/hour.
In order to spend enough to be secure, say $300K/hour, you need a lot of miners. It turns out that a third essential part of your system is a small number of “mining pools”. Bitcoin has the equivalent of around 3M Antminer S9s, and a block time of 10 minutes. Each S9, costing maybe $1K, can expect a reward about once every 60 years. It will be obsolete in about a year, so only 1 in 60 will ever earn anything.
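The arithmetic behind the $300K/hour and the once-every-60-years figures is straightforward, using the talk's round numbers (12.5 BTC block reward, $4K/BTC, a 10-minute block time, and ~3M S9-equivalents of hash power):

```python
# Round numbers from the talk, not live market data.
block_reward_btc = 12.5
price_usd = 4_000
blocks_per_hour = 6        # one block every 10 minutes

# New coins minted per hour, which speculators must absorb.
reward_per_hour = block_reward_btc * price_usd * blocks_per_hour
print(reward_per_hour)     # 300000.0 -> $300K/hour

# Each S9-equivalent wins a block with probability 1/miners per block.
miners = 3_000_000
blocks_per_year = blocks_per_hour * 24 * 365   # 52,560 blocks/year
years_between_wins = miners / blocks_per_year
print(round(years_between_wins))               # ~57 years, "about 60"
```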
To smooth out their income, miners join pools, contributing their mining power and receiving the corresponding fraction of the rewards earned by the pool. These pools have strong economies of scale, so successful cryptocurrencies end up with a majority of their mining power in 3-4 pools. Each of the big pools can expect a reward every hour or so. These blockchains aren’t decentralized, but centralized around a few large pools.
Since then there have been other catastrophic bugs in these smart contracts, the biggest one in the Parity Ethereum wallet software ... The first bug enabled the mass theft from "multisignature" wallets, which supposedly required multiple independent cryptographic signatures on transfers as a way to prevent theft. Fortunately, that bug caused limited damage because a good thief stole most of the money and then returned it to the victims. Yet, the good news was limited as a subsequent bug rendered all of the new multisignature wallets permanently inaccessible, effectively destroying some $150M in notional value. This buggy code was largely written by Gavin Wood, the creator of the Solidity programming language and one of the founders of Ethereum. Again, we have a situation where even an expert's efforts fell short.
Recent game-theoretic analysis suggests that there are strong economic limits to the security of cryptocurrency-based blockchains. For safety, the total value of transactions in a block needs to be less than the value of the block reward.
Your system needs an append-only data structure to which records of the transactions, files, reviews, archival content, whatever are appended. It would be bad if the miners could vote to re-write history, undoing these records. In the jargon, the system needs to be immutable (another misnomer).
The blockchain is mutable, it is just rather hard to mutate it without being detected, because of the Merkle tree’s hashes, and easy to recover, because there are Lots Of Copies Keeping Stuff Safe. But this is a double-edged sword. Immutability makes systems incompatible with the GDPR, and immutable systems to which anyone can post information will be suppressed by governments.
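The "hard to mutate without being detected" property comes from chaining hashes. A minimal sketch (real blockchains hash Merkle tree roots over many transactions, not a flat list of strings):

```python
import hashlib

def chain(blocks):
    """Hash each block together with the previous block's hash, so
    changing any block changes every hash after it."""
    hashes, prev = [], b""
    for block in blocks:
        prev = hashlib.sha256(prev + block).digest()
        hashes.append(prev)
    return hashes

ledger = [b"alice pays bob 5", b"bob pays carol 2"]
original = chain(ledger)

# Rewriting history changes the head hash, which the copies held by
# other nodes will not match -- detection and recovery, not prevention.
tampered = chain([b"alice pays bob 500", b"bob pays carol 2"])
assert tampered[-1] != original[-1]
```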
Cryptokitties’ popularity exploded in early December and had the Ethereum network gasping for air. ... Ethereum has historically made bold claims that it is able to handle unlimited decentralized applications ... The Crypto-Kittie app has shown itself to have the power to place all network processing into congestion. ... at its peak [CryptoKitties] likely only had about 14,000 daily users. Neopets, a game to which CryptoKitties is often compared, once had as many as 35 million users.
A user of your system wanting to perform a transaction, store a file, record a review, whatever, needs to persuade miners to include their transaction in a block. Miners are coin-operated; you need to pay them to do so. How much do you need to pay them? That question reveals another economic problem: fixed supply plus variable demand equals a variable "price". Each block is in effect a blind auction among the pending transactions.
So let's talk about CryptoKitties, a game that brought the Ethereum blockchain to its knees despite the bold claims that it could handle unlimited decentralized applications. How many users did it take to cripple the network? Far fewer than non-blockchain apps handle with ease; CryptoKitties peaked at about 14K users. NeoPets, a similar centralized game, peaked at about 2,500 times as many.
The first big smart contract, the DAO or Decentralized Autonomous Organization, sought to create a democratic mutual fund where investors could invest their Ethereum and then vote on possible investments. Approximately 10% of all Ethereum ended up in the DAO before someone discovered a reentrancy bug that enabled the attacker to effectively steal all the Ethereum. The only reason this bug and theft did not result in global losses is that Ethereum developers released a new version of the system that effectively undid the theft by altering the supposedly immutable blockchain.
The loot was restored by a "hard fork", the blockchain's version of mutability. Since then it has become the norm for "smart contract" authors to make them "upgradeable", so that bugs can be fixed. "Upgradeable" is another way of saying "immutable in name only".
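The DAO's reentrancy bug can be sketched in Python (a deliberately simplified model: the real contract was Solidity, and `send` here stands in for Ethereum's external call, which hands control to the recipient's code before the caller finishes):

```python
class VulnerableVault:
    """Pays out BEFORE zeroing the balance, like the DAO contract did."""
    def __init__(self, balances):
        self.balances = dict(balances)

    def withdraw(self, who, send):
        amount = self.balances[who]
        if amount > 0:
            send(amount)               # external call: recipient code runs here
            self.balances[who] = 0     # too late -- state updated after payout

vault = VulnerableVault({"attacker": 10, "others": 90})
stolen = []

def attacker_receive(amount):
    stolen.append(amount)
    if len(stolen) < 5:                # re-enter before the balance is zeroed
        vault.withdraw("attacker", attacker_receive)

vault.withdraw("attacker", attacker_receive)
print(sum(stolen))  # 50 drained from a 10-unit balance
```

Because the balance is only zeroed after the external call returns, each re-entrant call still sees the original balance, which is the shape of the bug that drained the DAO.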
Permissionless systems trust:
The core developers of the blockchain software not to write bugs.
The developers of your wallet software not to write bugs.
The developers of the exchanges not to write bugs.
The operators of the exchanges not to manipulate the markets or to commit fraud.
The developers of your upgradeable "smart contracts" not to write bugs.
The owners of the smart contracts to keep their secret key secret.
The owners of the upgradeable smart contracts to avoid losing their secret key.
The owners and operators of the dominant mining pools not to collude.
The speculators to provide the funds needed to keep the “price” going up.
Users' ability to keep their secret key secret.
Users’ ability to avoid losing their secret key.
Other users not to transact when you want to.
So, this is the list of people your permissionless system has to trust if it is going to work as advertised over the long term.
You started out to build a trustless, decentralized system but you have ended up with:
A trustless system that trusts a lot of people you have every reason not to trust.
A decentralized system that is centralized around a few large mining pools that you have no way of knowing aren’t conspiring together.
An immutable system that either has bugs you cannot fix, or is not immutable.
A system whose security depends on it being expensive to run, and which is thus dependent upon a continuing inflow of funds from speculators.
A system whose coins are convertible into large amounts of "fiat currency" via irreversible pseudonymous transactions, which is thus an irresistible target for crime.
If the “price” keeps going up, the temptation for your trust to be violated is considerable. If the "price" starts going down, the temptation to cheat to recover losses is even greater.
Maybe it is time for a re-think.
Suppose you give up on the idea that anyone can take part and accept that you have to trust a central authority to decide who can and who can’t vote. You will have a permissioned system.
The first thing that happens is that it is no longer possible to mount a Sybil attack, so there is no reason running a node need be expensive. You can use BFT to establish consensus, as IBM's Hyperledger, the canonical permissioned blockchain system, does. You need far fewer nodes in the network, and running a node just got much cheaper; overall, the aggregate cost of the system drops by orders of magnitude.
Now that there is a central authority, it can collect "fiat currency" for network services and use it to pay the nodes. No need for cryptocurrency, exchanges, pools, speculators, or wallets, so much less temptation for bad behavior.
Permissioned systems trust:
The central authority.
The software developers.
The owners and operators of the nodes.
The secrecy of a few private keys.
This is now the list of entities you trust. Trusting a central authority to determine the voter roll has eliminated the need to trust a whole lot of other entities. The permissioned system is more trustless and, since there is no need for pools, the network is more decentralized despite having fewer nodes.
[A] Byzantine quorum system of size 20 could achieve better decentralization than proof-of-work mining at a much lower resource cost.
How many nodes does your permissioned blockchain need? The rule for BFT is that 3f + 1 nodes can survive f simultaneous failures. That's an awful lot fewer than you need for a permissionless proof-of-work blockchain. What you get from BFT is a system that, unless it encounters more than f simultaneous failures, remains available and operating normally.
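The 3f + 1 rule makes the node count concrete. A minimal sketch (`bft_nodes` is just an illustrative helper, not from any BFT library):

```python
def bft_nodes(f):
    """Minimum replicas for BFT consensus tolerating f Byzantine failures."""
    return 3 * f + 1

# Tolerating even 3 simultaneous Byzantine failures needs only 10 nodes,
# versus the thousands of mining nodes a proof-of-work network runs.
print([bft_nodes(f) for f in (1, 2, 3)])  # [4, 7, 10]
```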
The problem with BFT is that if it encounters more than f simultaneous failures, the state of the system is irrecoverable. If you want a system that can be relied upon for the long term you need a way to recover from disaster. Successful permissionless blockchains have Lots Of Copies Keeping Stuff Safe, so recovering from a disaster that doesn't affect all of them is manageable.
So in addition to implementing BFT you need to back up the state of the system each block time, ideally to write-once media so that the attacker can't change it. But if you're going to have an immutable backup of the system's state, and you don't need continuous uptime, you can rely on the backup to recover from failures. In that case you can get away with, say, 2 replicas of the blockchain in conventional databases, saving even more money.
I've shown that, whatever consensus mechanism they use, permissionless blockchains are not sustainable for very fundamental economic reasons. These include the need for speculative inflows and mining pools, security that scales linearly with cost, economies of scale, and fixed supply versus variable demand. Proof-of-work blockchains are also environmentally unsustainable: the top 5 cryptocurrencies are estimated to use as much energy as The Netherlands. This isn't to take away from Nakamoto's ingenuity; proof-of-work is the only consensus system shown to work well for permissionless blockchains. The consensus mechanism works, but energy consumption and emergent behaviors at higher levels of the system make it unsustainable.
It can be very hard to find reliable sources about cryptocurrencies because almost all cryptocurrency journalism is bought and paid for.
When cryptocurrency issuers want positive coverage for their virtual coins, they buy it. Self-proclaimed social media personalities charge thousands of dollars for video reviews. Research houses accept payments in the cryptocurrencies they are analyzing. Rating “experts” will grade anything positively, for a price.
All this is common, according to more than two dozen people in the cryptocurrency market and documents reviewed by Reuters. ... “The main reason why so many inexperienced individuals invest in bad crypto projects is because they listen to advice from a so-called expert,” said Larry Cermak, head of analysis at cryptocurrency research and news website The Block. Cermak said he does not own any cryptocurrencies and has never promoted any. “They believe they can take this advice at face value even though it is often fraudulent, intentionally misleading or conflicted.”
The boxer Floyd Mayweather and the music producer DJ Khaled have been fined for unlawfully touting cryptocurrencies.
The two have agreed to pay a combined $767,500 in fines and penalties, the Securities and Exchange Commission (SEC) said in a statement on Thursday. They neither admitted nor denied the regulator’s charges.
According to the SEC, Mayweather and Khaled failed to disclose payments from three initial coin offerings (ICOs), in which new currencies are sold to investors.
The women on this boat are polished and perfect; the men, by contrast, seem strangely cured—not like medicine, but like meat. They are almost all white, between the ages of 30 and 50, and are trying very hard to have the good time they paid thousands for, while remaining professional in a scene where many thought leaders have murky pasts, a tendency to talk like YouTube conspiracy preachers, and/or the habit of appearing in magazines naked and covered in strawberries. That last is 73-year-old John McAfee, who got rich with the anti-virus software McAfee Security before jumping into cryptocurrencies. He is the man most of the acolytes here are keenest to get their picture taken with and is constantly surrounded by private security who do their best to aesthetically out-thug every Armani-suited Russian skinhead on deck. Occasionally he commandeers the grand piano in the guest lounge, and the young live-streamers clamor for the best shot. John McAfee has never been convicted of rape and murder, but—crucially—not in the same way that you or I have never been convicted of rape or murder.
On 7th December 2018 Bitcoin's "price" was around $3,700.
Bitcoin now at $16,600.00. Those of you in the old school who believe this is a bubble simply have not understood the new mathematics of the Blockchain, or you did not cared enough to try. Bubbles are mathematically impossible in this new paradigm. So are corrections and all else
Similarly, most of what you read about blockchain technology is people hyping their vaporware. A "trio of monitoring, evaluation, research, and learning (MERL) practitioners in international development" started out enthusiastic about the potential of blockchain technology, so they did some research:
We documented 43 blockchain use-cases through internet searches, most of which were described with glowing claims like “operational costs… reduced up to 90%,” or with the assurance of “accurate and secure data capture and storage.” We found a proliferation of press releases, white papers, and persuasively written articles. However, we found no documentation or evidence of the results blockchain was purported to have achieved in these claims. We also did not find lessons learned or practical insights, as are available for other technologies in development.
We fared no better when we reached out directly to several blockchain firms, via email, phone, and in person. Not one was willing to share data on program results, MERL processes, or adaptive management for potential scale-up. Despite all the hype about how blockchain will bring unheralded transparency to processes and operations in low-trust environments, the industry is itself opaque. From this, we determined the lack of evidence supporting value claims of blockchain in the international development space is a critical gap for potential adopters.
Every time the word "price" appears here, it has quotes around it. The reason is that there is a great deal of evidence that the exchanges, operating an unregulated market, are massively manipulating the exchange rate between cryptocurrencies and the US dollar. The primary mechanism is the issuance of billions of dollars of Tether, a cryptocurrency that is claimed to be backed one-for-one by actual US dollars in a bank account, and thus whose value should be stable. There has never been an audit to confirm this claim, and the trading patterns in Tether are highly suspicious. Tether, and its parent exchange Bitfinex, are the subject of investigations by the CFTC and federal prosecutors:
As Bitcoin plunges, the U.S. Justice Department is investigating whether last year’s epic rally was fueled in part by manipulation, with traders driving it up with Tether -- a popular but controversial digital token.
While federal prosecutors opened a broad criminal probe into cryptocurrencies months ago, they’ve recently homed in on suspicions that a tangled web involving Bitcoin, Tether and crypto exchange Bitfinex might have been used to illegally move prices, said three people familiar with the matter.
John Lewis is an economist at the Bank of England. His "The seven deadly paradoxes of cryptocurrency" provides a skeptical view of the economics of cryptocurrencies that nicely complements my more technology-centric view. My comments on his post are here. Remember that a permissionless blockchain requires a cryptocurrency; if the economics don't work, neither does the blockchain.
You can find my writings about blockchain over the past five years here. In particular:
The DAO was designed as a series of contracts that would raise funds for ethereum-based projects and disperse them based on the votes of members. An initial token offering was conducted, exchanging ethers for "DAO tokens" that would allow stakeholders to vote on proposals, including ones to grant funding to a particular project.
That token offering raised more than $150m worth of ether at then-current prices, distributing over 1bn DAO tokens.
[In June 2016], however, news broke that a flaw in The DAO's smart contract had been exploited, allowing the removal of more than 3m ethers.
Subsequent exploitations allowed for more funds to be removed, which ultimately triggered a 'white hat' effort by token-holders to secure the remaining funds. That, in turn, triggered reprisals from others seeking to exploit the same flaw.
An effort to blacklist certain addresses tied to The DAO attackers was also stymied mid-rollout after researchers identified a security vulnerability, thus forcing the hard fork option.
Exit scams are rife in the ICO world. Here is a recent example:
Blockchain company Pure Bit has seemingly walked off with $2.7 million worth of investors’ money after raising 13,000 Ethereum in an ICO. Transaction history shows that hours after moving all raised funds out of its wallet, the company proceeded to take down its website. It now returns a blank page. ... This is the latest in a string of exit scams that took place in the blockchain space in 2018. Indeed, reports suggested exit scammers have thieved more than $100 million worth of cryptocurrency over the last two years alone. Subsequent investigations hint the actual sum of stolen cryptocurrency could be even higher.
More detail on the lack of decentralization in practice:
in Bitcoin, the weekly mining power of a single entity has never exceeded 21% of the overall power. In contrast, the top Ethereum miner has never had less than 21% of the mining power. Moreover, the top four Bitcoin miners have more than 53% of the average mining power. On average, 61% of the weekly power was shared by only three Ethereum miners. These observations suggest a slightly more centralized mining process in Ethereum.
Although miners do change ranks over the observation period, each spot is only contested by a few miners. In particular, only two Bitcoin and three Ethereum miners ever held the top rank. The same mining pool has been at the top rank for 29% of the time in Bitcoin and 14% of the time in Ethereum. Over 50% of the mining power has exclusively been shared by eight miners in Bitcoin and five miners in Ethereum throughout the observed period. Even 90% of the mining power seems to be controlled by only 16 miners in Bitcoin and only 11 miners in Ethereum.
"Ethereum’s smart contract ecosystem has a considerable lack of diversity. Most contracts reuse code extensively, and there are few creators compared to the number of overall contracts. ... the high levels of code reuse represent a potential threat to the security and reliability. Ethereum has been subject to high-profile bugs that have led to hard forks in the blockchain (also here) or resulted in over $170 million worth of Ether being frozen; like with DNS’s use of multiple implementations, having multiple implementations of core contract functionality would introduce greater defense-in-depth to Ethereum."
P&Ds have dramatic short-term impacts on the prices and volumes of most of the pumped tokens. In the first 70 seconds after the start of a P&D, the price increases by 25% on average, trading volume increases 148 times, and the average 10-second absolute return reaches 15%. A quick reversal begins 70 seconds after the start of the P&D. After an hour, most of the initial effects disappear. ... prices of pumped tokens begin rising five minutes before a P&D starts. The price run-up is around 5%, together with an abnormally high volume. These results are not surprising, as pump group organizers can buy the pumped tokens in advance. When we read related messages posted on social media, we find that some pump group organizers offer premium memberships to allow some investors to receive pump signals before others do. The investors who buy in advance realize great returns. Calculations suggest that an average return can be as high as 18%, even after considering the time it may take to unwind positions. For an average P&D, investors make one Bitcoin (about $8,000) in profit, approximately one-third of a token’s daily trading volume. The trading volume during the 10 minutes before the pump is 13% of the total volume during the 10 minutes after the pump. This implies that an average trade in the first 10 minutes after a pump has a 13% chance of trading against these insiders and on average they lose more than 2% (18%*13%).
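The expected-loss arithmetic at the end of that passage checks out; a quick back-of-the-envelope using the study's own numbers:

```python
# Chance that an average post-pump trade is against an insider, times the
# insiders' average gain, approximates the average outsider's expected loss.
p_against_insider = 0.13  # 13% of post-pump volume is insider selling
insider_return = 0.18     # insiders' average return

expected_loss = p_against_insider * insider_return
print(round(expected_loss * 100, 2))  # 2.34 (% per trade, "more than 2%")
```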
A summary of the bad news about vote-buying in blockchains:
The existence of trust-minimizing vote buying and Dark DAO primitives imply that users of all on-chain votes are vulnerable to shackling, manipulation, and control by plutocrats and coercive forces. This directly implies that all on-chain voting schemes where users can generate their own keys outside of a trusted environment inherently degrade to plutocracy, ... Our schemes can also be repurposed to attack proof of stake or proof of work blockchains profitably, posing severe security implications for all blockchains.
This is the briefest of take-aways from my attendance at Fantastic Futures, a conference on artificial intelligence (AI) in libraries.  From the conference announcement introduction:
The Fantastic Futures conference, which takes place in Oslo December 5th, 2018, is a collaboration between the National Library of Norway and Stanford University Libraries, and was initiated by the National Librarian at the National Library of Norway, Aslak Sira Myhre, and the University Librarian at Stanford University Libraries, Michael Keller.
First of all, I had the opportunity to attend and participate in a pre-conference workshop. Facilitated by Nicole Coleman (Stanford University) and Svein Brygfjeld (National Library of Norway), the workshop's primary purpose was to ask questions about AI in libraries and to build community. To those ends, the two dozen or so of us were divided into groups where we discussed what a few AI systems might look like. I was in a group discussing the possibilities of reading massive amounts of text and/or refining information retrieval based on reader profiles. In the end our group thought such things were feasible, and we outlined how they might be accomplished. Other groups discussed things such as metadata creation and collection development. Towards the end of the day we brainstormed next steps; at the very least, we will try to use the ai4lib mailing list to a greater degree.
The next day, the first real day of the conference, was attended by a couple hundred people. Most were from Europe, obviously, and from my perspective about as many were librarians as non-librarians. There was an appearance by Nancy Pearl, who, as you may or may not know, is a Seattle Public Library librarian embodied as an action figure. She was brought to the conference because the National Library of Norway's AI system is named Nancy. A few notable quotes from some of the speakers, at least from my perspective, included:
George Zarkadakis – “Robots ought not to pretend to not be robots.”
Meredith Broussard – “AI uses quantitative data but qualitative data is necessary also.”
Barbara McGillivray – “Practice the typical research process but annotate it with modeling; humanize the algorithms.”
Nicole Coleman – “Put the human in the loop … The way we model data influences the way we make interpretations.”
The presenters generated lively discussion, and I believe the conference was deemed a success by the vast majority of attendees. It is quite likely the conference will be repeated next year and held at Stanford.
What are some of my take-aways? Hmmm:
Machine learning is simply the latest incarnation of AI, and machine learning algorithms are only as unbiased as the data used to create them. Be forewarned.
We can do this. We have the technology.
There is too much content to process, and AI in libraries can be used to do some of the more mechanical tasks. The creation and maintenance of metadata is a good example. But again, be forewarned: we were told this same thing with the advent of word processors, and in the end, we didn't go home early because we got our work done. Instead we output more letters.
Metadata is not necessary. Well, that was sort of a debate, and (more or less) deemed untrue.
It was an honor and a privilege to attend the pre-conference workshop and conference. I sincerely believe AI can be used in libraries, and that the use can be effective. Putting AI into practice will take time, energy, and prioritization. How to do this while simultaneously "keeping the trains running" will be a challenge. On the other hand, AI in libraries can be seen as an opportunity to demonstrate the inherent worth of cultural heritage institutions. ai4lib++
P.S. Along the way I got to see some pretty cool stuff: Viking ships, a fort, “The Scream”, and a “winterfest”. I also got to experience sunset at 3:30 in the afternoon.
The New Yorker this week has a profile of the Google programmer pair Jeff Dean and Sanjay Ghemawat. If the annoying phrase "superstar programmer" applies to anyone, it's probably these two, who among other things conceived and wrote the original Google MapReduce implementation. The profile includes some comments I find unusually insightful about the craft of writing code. I was going to say "for a popular press piece," but really, even programmers talking to each other don't talk about this sort of thing much. I recommend the article, but was especially struck by this passage:
At M.I.T., [Sanjay’s] graduate adviser was Barbara Liskov, an influential computer scientist who studied, among other things, the management of complex code bases. In her view, the best code is like a good piece of writing. It needs a carefully realized structure; every word should do work. Programming this way requires empathy with readers. It also means seeing code not just as a means to an end but as an artifact in itself. “The thing I think he is best at is designing systems,” Craig Silverstein said. “If you’re just looking at a file of code Sanjay wrote, it’s beautiful in the way that a well-proportioned sculpture is beautiful.”
…“Some people,” Silverstein said, “their code’s too loose. One screen of code has very little information on it. You’re always scrolling back and forth to figure out what’s going on.” Others write code that’s too dense: “You look at it, you’re, like, ‘Ugh. I’m not looking forward to reading this.’ Sanjay has somehow split the middle. You look at his code and you’re, like, ‘O.K., I can figure this out,’ and, still, you get a lot on a single page.” Silverstein continued, “Whenever I want to add new functionality to Sanjay’s code, it seems like the hooks are already there. I feel like Salieri. I understand the greatness. I don’t understand how it’s done.”
I aspire to write code like this; it's a large part of what motivates and challenges me.
I think it’s something that (at least for most of us, I don’t know about Dean and Ghemawat), can only be approached and achieved with practice — meaning both time and intention. But I think many of the environments that most working programmers work in are not conducive to this practice, and in some cases are actively hostile to it. I’m not sure what to think or do about that.
It is most important when designing code for re-use, when designing libraries to be used in many contexts and by many people. If you are only writing code for a particular business, "seeing code not just as a means to an end but as an artifact in itself" may not be what's called for; the code really is a means to the business's ends, and spending too much time on "the artifact itself" has a lot of overlap with what is often derisively called "bike-shedding." But when creating an artifact that is intended to be used by lots of other programmers in lots of other contexts to build things that meet their own business purposes (say, a Rails… or a samvera), "empathy with readers" (which is very well said) and creating an artifact where "it seems like the hooks are already there" are pretty much indispensable to creating something that actually increases the efficiency and success of the developers using the code.
It’s also not easy even if it is your intention, but without the intention, it’s highly unlikely to happen by accident. In my experience TDD can (in some contexts) actually be helpful to accomplishing it — but only if you have the intention, if you start from developer use-cases, and if you do the “refactor” step of “red-green-refactor”. Just “getting the tests to pass” isn’t gonna do it. (And from the profile, I suspect Dean and Ghemawat may not write tests at all — TDD is neither necessary nor sufficient). That empathy part is probably necessary — understanding what other programmers are going to want to do with your code, how they are going to come to it, and putting yourself in their place, so you can write code that anticipates their needs.
I’m not sure what to do with any of this, but I was struck by the well-written description of what motivates me in one aspect of my programming work.
MCN affiliate Rachael Winter Durant is the Digital Assets Manager at the Portland Art Museum, where she implements the workflows and determines the long-term strategies for digital delivery and preservation of cultural heritage information.
She places deep value in utilizing standards and procedures that allow people and institutions to engage digitally with cultural objects and preserve these assets for the future. To promote this engagement, she has chosen three primary areas to focus her early career professional development: digital accessibility, the description of visual resources, and digital preservation.
As a member of the Museum Computer Network (MCN), she’s excited about the opportunity to attend DLF Forum 2018 and NDSA Digital Preservation 2018 and to engage with colleagues outside of the museum field who are also active practitioners working and innovating around these issues, specifically through the lens of librarianship.
As I prepared to attend DLF Forum 2018 — my first DLF Forum, which I would be attending as the Museum Computer Network (MCN) affiliate through a GLAM Cross-Pollinator Fellowship — I read the code of conduct, combed through the schedule, and attended museum cohort visioning sessions. I felt energized by the focus on social justice and ethics in our profession, but in many ways preparing for the Forum did not feel dissimilar from a courting ritual between two organizations. I found myself a little anxious, gauging what my institution and DLF could offer each other and how we could help each other grow and be supported. Coming from a mid-size art museum that functions as a 501(c)(3) nonprofit, I was determined to bring knowledge and ideas back to my institution that I could put into action, even though my department doesn't have the team of open-source software developers that many of the DLF member institutions appear to have in their libraries.
When I arrived at the Forum, what I found was a community of deeply thoughtful librarian technologists examining the very core of digital librarianship: the power structures, cultural contexts, and unacknowledged invisible labor in our profession, and their unique repercussions in the digital sphere. These foundational contexts for how knowledge repositories have developed and been sustained throughout history are as critical at the smallest institutions as they are at the largest. In a recent New York Times article titled "The Newest Jim Crow" by Michelle Alexander, the author quotes data scientist Cathy O'Neil saying, "It's tempting to believe that computers will be neutral and objective, but algorithms are nothing more than opinions embedded in mathematics." This dangerous assumption of fairness and neutrality was one of the critical issues that DLF Forum 2018 tackled, and my biggest takeaways from the week centered around strategies to combat it.
Discussions about colonialism, ethics, and power in digital libraries happened in both explicit and implicit ways throughout the Forum. But the presentation I have revisited again and again since the Forum is Rafia Mirza's and Brett D. Currier's "Towards a Praxis of Library Documentation." Admittedly, having taken an early career detour as a technical writer, I am predisposed towards calls for good documentation. Yet I had not considered the deep-seated ethical implications and power dynamics of thoughtful documentation, or a lack thereof. As Mirza and Currier explain, "Documentation is a fixation of everyone's common understanding upon a decision in written form." And the implications of ethical documentation go much deeper than just increased productivity. When we take the time to examine the power dynamics of how documentation is created (if it is created), who has access to it, and who uses it, it has the ability to build trust and even support ethical labor practices. A praxis of documentation should push us to examine where unethical practices are introduced into our processes. When expectations are made explicit, labor becomes more visible, and laborers feel empowered and even respected.
In the weeks since the Forum, I have returned to the Museum and reflected on my notes from the sessions. How am I showing up in my work for my colleagues? Student workers? Under-represented communities? Where do gaps in documentation exist for my projects, and what problems could be solved if good documentation existed? Mirza and Currier point out that “along a long enough timeline, everything becomes collaborative.” With this in mind, I pull up my wiki and begin writing.
If you’d like to get involved with the scholarship committee for the 2019 Forum (October 13-16, 2019 in Tampa, FL), sign up to join the Planning Committee now! More information about 2019 fellowships will be posted in late spring.
Washington, DC, where I’m going for a conference today, is the source of a lot of public domain material. As noted in an earlier post, “works of the United States government”, regardless of their age, are not subject to copyright protection. As explained on this government website, those works include any work by a federal employee or official that’s part of their official duties. For instance, when the President of the United States makes a speech to Congress, or issues an official proclamation, that’s a government work, and it’s not copyrighted. But if the President publishes a novel while in the White House, that novel would be copyrightable, since writing novels is not part of the President’s job.
Likewise, once Presidents leave office, anything they write is copyrightable, even if it’s the sort of thing they might have written or delivered while they occupied the White House. That’s why Woodrow Wilson’s article “The Road Away From Revolution” is still under US copyright, at least for the next 22 days. He might well have delivered it as a speech if he were President, but by 1923, when it was published, he was a private citizen. The piece was copyrighted, and his widow renewed the copyright in 1950.
The piece shows Wilson uneasy with the state of the world after he left office. “The world has been made safe for democracy,” he writes, “but democracy has not yet made the world safe against irrational revolution”. The revolution he specifically names is the 1917 revolution in Russia, which by 1923 had ended up with the Bolsheviks firmly in power, and establishing the Soviet Union at the end of 1922. But Wilson is also concerned that the revolutionaries may have had a point in their condemnation of capitalism. “Is it not,” he asks, “too true that capitalists have often seemed to regard the men whom they used as mere instruments of profit… legitimate to exploit with as slight cost to themselves as possible, either of money or sympathy?”
Looking back, I wonder if Wilson was implicitly rebuking his successor in office, who campaigned on the promise “Less government in business and more business in government,” and whose administration was facing a number of scandals involving uses of government office for personal enrichment. Wilson called for a higher standard of justice and society, which “must include sympathy and helpfulness and a willingness to forgo self-interest in order to promote the welfare, happiness, and contentment of others and of the community as a whole.” He describes these values as requirements of “Christian civilization”, but they’re also values I see many non-Christians uphold and promote today. And they’re in as much need of support now as they were in 1923.
Wilson’s exhortation ran in the August 1923 issue of the Atlantic Monthly, though it was copyrighted as a book (presumably since it was also issued as a pamphlet), and the copyright renewal doesn’t mention the Atlantic. I’ve added a note to the copyright information we have on The Atlantic Monthly to warn that the renewal for the book also presumably covers the work’s magazine publication. (Otherwise, one might presume the issue was in the public domain, since it has no other renewals that I’m aware of.)
In general, if you’re looking at a periodical that seems to be in the public domain due to nonrenewal, but has a particularly famous work or author in its content, it often doesn’t hurt to do a double-check of that work’s copyright, to avoid misunderstandings. If you find any such non-obvious renewals covering content in a periodical, let me know and I’ll add notes in appropriate places in our inventory of periodical renewals.
You may remember that in August this year, mySociety and Open Knowledge International launched a survey, looking for the sources of digital files that hold electoral boundaries… for every country in the world. Well, we are still looking!
There is a good reason for this hunt: the files are integral for people who want to make online tools that help citizens contact their local politicians; such tools need to be able to match users to the right representative. From mySociety’s site TheyWorkForYou to Surfers against Sewage’s Plastic Free Parliament campaign, to Call your Rep in the US, all these tools required boundary data before they could be built.
We know that finding this data openly licensed is still a real challenge for many countries, which is of course why we launched the survey. We encourage people to continue to submit links to the survey, and we would love it if people experienced in electoral boundary data could help by reviewing submissions: if you are able to offer a few hours of help, please email email@example.com
The EveryBoundary survey FAQs tell you everything you need to know about what to look for when boundary hunting. But we also wanted to share some top tips that we have learnt through our own experiences.
Start the search by looking at authoritative sources first: electoral commissions, national mapping agencies, national statistics bodies, government data portals.
Look for data formats (.shp, .geojson, .kml, etc.), not just a PDF.
Ask around if you can’t find the data: if a map is published digitally, then the data behind it exists somewhere!
Don’t confuse administrative boundaries with electoral boundaries — they can be the same, but they often aren’t (even when they share a name).
Don’t assume boundaries stay the same — check for redistricting, and make sure your data is current.
If you get stuck
Electoral boundaries are normally defined in legislation; sometimes this takes the form of lists of the administrative subdivisions which make up the electoral districts. If you can get the boundaries for the subdivisions you can build up the electoral districts with this information.
Make FOI requests to get hold of the data.
If needed, escalate the matter. We have heard of groups writing to their representatives, explaining the need for the data. And don’t forget: building tools that strengthen democracy is a worthwhile cause.
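The tip above about building electoral districts out of administrative subdivisions can be sketched in a few lines of code. This is a minimal, hypothetical illustration: the district and subdivision names are made up, and the “geometries” are stand-in coordinate lists rather than real shapefile polygons (a real workflow would merge the polygons with a GIS library).

```python
# Hypothetical example: legislation often defines each electoral district
# as a list of administrative subdivisions. If you can find boundary data
# for the subdivisions, you can assemble the districts from them.

district_definitions = {
    "District A": ["Subdivision 1", "Subdivision 2"],
    "District B": ["Subdivision 3"],
}

# Boundary geometries you *can* find, keyed by subdivision name.
# Stand-in coordinate lists here; in practice, polygons from .shp/.geojson.
subdivision_boundaries = {
    "Subdivision 1": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "Subdivision 2": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "Subdivision 3": [(2, 0), (3, 0), (3, 1), (2, 1)],
}

def build_districts(definitions, boundaries):
    """Collect the subdivision geometries that make up each district,
    flagging any subdivision with no boundary data."""
    districts = {}
    for district, parts in definitions.items():
        missing = [p for p in parts if p not in boundaries]
        if missing:
            raise KeyError(f"No boundary data for: {missing}")
        districts[district] = [boundaries[p] for p in parts]
    return districts

districts = build_districts(district_definitions, subdivision_boundaries)
print(sorted(districts))             # ['District A', 'District B']
print(len(districts["District A"]))  # 2 subdivision geometries
```

The useful part of the pattern is the missing-data check: it tells you exactly which subdivision boundaries you still need to hunt down.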
mySociety is asking people to share electoral boundary data as part of efforts to make information on every politician in the world freely available to all, and support the creation of a Democratic Commons. Electoral boundary files are an essential part of the data infrastructure of a Democratic Commons. A directory of electoral boundary sources is a potential benefit to many people and organisations — so let’s keep up the search!
It is no secret that library user needs are evolving quickly, and OCLC services are evolving even faster to keep up with these trends. This is why OCLC originally launched DEVCONNECT, so developers can learn about OCLC APIs and build skills that allow libraries to get more out of their OCLC services. Based on feedback from our community, DEVCONNECT 2018 workshops involved more hands-on coding and were online as opposed to in-person. Nonetheless, it was a productive, informative, and engaging series!
Titia van der Werf and I, with our colleague Shenghui Wang, organized a “mini-symposium on Linked Data” held in the OCLC Leiden office on 19 November 2018. We brought together 43 staff from a range of Dutch academic libraries, cultural heritage institutions, the National Library, the National Archive, and the Institute for Sound and Vision, along with OCLC colleagues based in Europe, to discuss how linked data meets interoperability challenges in the online information environment. Masterfully facilitated by Titia, the mini-symposium featured Art of Hosting techniques to amplify interaction among the participants. Participants convened around tables of six or seven, determined by their interest in a specific topic, and took notes on flip charts that served as the basis for reporting on their discussions.
Participants came primarily to learn more about linked data, network with others, and find out what other institutions were doing with linked data. The afternoon was devoted to two sessions: 1) Knowledge Exploration, where experts described their linked data implementations and the lessons learned, and answered questions from the other participants; and 2) Open Space, where participants selected specific topics proposed by other participants and discussed them. By the end of the day, 85% of the participants rated the mini-symposium “useful” or “valuable.”
Knowledge Exploration: The six linked data implementations discussed:
OCLC Linked Data Wikibase Prototype: Karen gave an overview of OCLC Research’s work with 16 U.S. libraries using the Wikibase entity-based platform that generates linked data. The prototype participants enjoyed working with the platform, as they could concentrate on entities and create linked data without needing to know the technical underpinnings like RDF and SPARQL. Wikibase embeds multilingualism, and it was easy to create statements in any script. We learned that differentiating presentation from the data itself was a challenge for catalogers unfamiliar with using Wikidata. We added a discovery layer so that catalogers could see the relationships they created and see the added value of interoperating with data retrieved from other linked data sources. In the absence of rules, documentation and best practices became more important. We also learned the need to add a “retriever” application, so prototype participants could ingest data already created elsewhere. Participants also valued the ability to include the source for each statement created, so people could evaluate their trustworthiness.
Tresoar linked data project ECHOES (RedBot.frl): Olav Kwakman explained that Project ECHOES provides the infrastructure for RedBot.frl, a site that aggregates digital collections describing the Frisian cultural heritage. The project aims to dissolve barriers to accessing diverse collections of different groups and languages through an integrated platform, offering a view of Europe as a whole. The project partners are: Erfgoed Leiden en Omstreken (project lead), Tresoar (secretary of the Deltaplan Digitalisering Cultureel Erfgoed in Fryslân), Diputació de Barcelona, Generalitat de Catalunya, and the Consorci de Serveis Universitaris de Catalunya (technology partner). The source data sets come from libraries, archives, and museums, but most are not available as linked data; the project transforms them into linked data using Schema.org, the Europeana Data Model, and the CIDOC Conceptual Reference Model. He noted the difficulties of transforming legacy data into linked data; “data quality is king”. On its first attempt, only 30% of the data could be converted into linked data; the remaining 70% had technical errors because of typos, inconsistent data, or misuse of fields. A key lesson is the value of enabling the community to help correct the data. The project uses the FryskeHanne platform both to improve the data quality and to expand its linked data datasets. Because the linked data is stored in a separate linked data datastore, it can be made available to the public to expand and enrich the source data. He also noted that no single presentation will fit all use cases, and suggested building presentations that showcase your data around a theme, while enabling communities to build their own presentations.
International Institute for Social History website (changing to https://iisg.amsterdam 31 January 2019): Eric de Ruijter noted that the original intention of the IISH was to improve its website and make its data re-usable and interoperable. Linked data provided the means to do this and was not the goal itself. They hired a contractor, Triple, which aggregated three sources of data: MARC records; Encoded Archival Descriptions of the IISH’s archival collections; and data sets and some articles. The data was marked up as RDF and saved in a local triple store, which is currently available only to the web site. It supports searching through all collections and provides a faceted overview of all entries extracted from the aggregation of data sources. They made use of “unifying resources” such as the Getty’s Art and Architecture Thesaurus (AAT), Library of Congress headings, and the Virtual International Authority File (VIAF) for identifiers which “turned names into people.” Lessons learned: Monitor your contractors; they know the technology but not the semantics of library data. Employ an iterative process, and just start and learn, rather than trying to address all possible eventualities. The extra work helps improve your source data.
Koninklijke Bibliotheek’s data.bibliotheken.nl: René Voorburg noted that as the KB is responsible for collecting everything published in the Netherlands and publishing the corresponding metadata, linked data would expose it to a larger audience on the web. But linked data does not yet have a “proper place” in the KB’s daily workflow. Although the KB has converted its data to RDF triple stores using Schema.org (where vagueness is considered a plus), only those who are familiar with SPARQL (an RDF query language) can take advantage of it, such as scientists who are happy with the results. They have enriched their metadata for authors with identifiers pulled from VIAF and Wikidata, and publish linked data about book titles on the edition level. But the published linked data dataset is not maintained and has not had any updates since it was first published. They are still working out how to make it easier to provide the data, and how to handle privacy issues.
Data Archiving and Networked Services (DANS): Andrea Scharnhorst focused on issues around the long-term archiving of RDF datasets. Increasingly, social sciences and humanities (SSH) produce RDF datasets, and consequently they become objects for long-term archiving with DANS services (EASY, Dataverse). What to do with specific vocabularies used in SSH? Which vocabularies should be deposited alongside the RDF datasets which use part of them? Who is responsible for the maintenance of these vocabularies and for resolving an archive’s URI when the domain name changes? How much of an “audit trail” (provenance) is necessary (who did what when and why)? What criteria should be used to decide whether a web site should be archived, and which URIs should be snapshotted for ingest? Together with the Vrije Universiteit Amsterdam, host of the LOD Laundromat—a curated version of the LOD cloud that DANS archives here—DANS is working on a Digging into the Knowledge Graph project on issues of indexing and preserving Knowledge Organization Systems to provide access to data both in the present and the future.
Rijksdienst Cultureel Erfgoed (RCE) [Cultural Heritage Agency of the Netherlands]: Joop Vanderheiden noted that the RCE is in the process of implementing a linked open dataset describing Dutch monuments and archaeology, to expose their data to a wider audience on the web. RCE will be publishing all its data about built monuments (62,000) and archaeological monuments (1,400). In addition, all data from its archaeological information system ARCHIS will be published as linked data. Its API and SPARQL endpoint will be available to everyone in January 2019. Even in the initial stages, they can demonstrate what is possible. They have garnered more interest in their work through “LOD for Dummies” presentations. Their work has benefited from the Digital Environment Act (Digitaal Stelsel Omgevingswet or DSO) and the Dutch Digital Heritage Network (NDE).
Open Space: Participants discussed three questions:
How can we explain/understand the benefits of linked data without getting lost in technical details? Among the benefits cited: pointing to specific examples of successful linked data implementations; bridging different silos of data without destroying the integrity of the respective sources; taking advantage of others’ databases so that you don’t have to replicate the work, saving labor costs; enriching your own data with the expertise provided by others; providing cross-cultural, multilingual access; giving your users a better, richer experience; and increasing the visibility of your collections and expanding your user base now and in the future.
Many individual projects—how can they be related to each other? Participants referred to Tim Berners-Lee’s original set of linked data principles and stressed the need to conform to international standards. Wikidata is an example of bringing together multiple sources. Relationships among individual projects could be more easily established if implementers reused existing ontologies rather than creating their own. We need to share best practices with each other.
How to integrate LOD into your CBS/local cataloging system? Participants recommended that source data should not be modified but linked (via link tables) to trustworthy “sources of truth.” WorldCat was viewed as a “meta-cache” for a discovery layer. Participants wondered what criteria to use to determine which data sources should be consumed. They noted that maintaining relationships among entities in linked data sources was important, and that all thesauri need to be publicly available as linked data.
Takeaways: Participants valued the discussions. One noted that they received some answers to the questions they came with, but were returning home with even more questions—“potentially a good thing?” Some found consolation that cultural heritage institutions were more or less on the same level. The brief descriptions of a variety of specific linked data projects were appreciated. If linked data is published, you have to “keep it alive” (update it). Some noted the gap between people who work with linked data and those with the technical know-how. Proper planning and funding at the institutional level are needed. “We are not alone in this! I would like to come together more often.”
“In previous years he had let the festival which for centuries had illuminated the marvel of the Maccabees with the glow of candles pass by unobserved. Now, however, he used it as an occasion to provide his children with a beautiful memory for the future.”
That’s a (translated) quote from “The Menorah”, a short story by Theodor Herzl originally published in Die Welt in 1897. It seems an apt story both for Hanukkah, which ends at sundown tomorrow, and for illustrating Herzl’s own character. Hanukkah, commemorating the rededication of the Second Temple, has themes of light and self-determination for Jews. Herzl, too, envisioned a brighter, self-determined future for the Jewish people. He was an advocate of Zionism so influential that he was mentioned by name, decades after his death, in the 1948 Declaration of the Establishment of the State of Israel as “the spiritual father of the Jewish State”.
Herzl died in 1904, so most of his writings, including “The Menorah”, have been in the public domain for a while. (This page links to online copies of many of his writings.) But US copyright law treats unpublished works differently from published ones. For instance, unpublished works have been the only ones entering the public domain here for the last 20 Public Domain Days– specifically, works that had not been published before 2003 or registered for copyright before 1978, by authors who died more than 70 years ago. Back in 1923, though, unpublished works were under indefinite “common-law” copyright. The limited-time statutory copyright terms of federal copyright law started once a work was published or registered for copyright.
Thus, the US copyright clock for Herzl’s diaries (or “Tagebücher” in German) would not start until their first publication, well after his death, in 1922 and 1923. The first volume of a three-volume edition was published and copyrighted in 1922, and is in the public domain now. Copyrights for the other two volumes were registered effective January 1, 1923, and renewed in 1950. Those volumes, then, are still in copyright in the US for another 23 days, but at the new year, the complete edition will be in the public domain here.
The first published edition of a revered figure’s private papers is not always the best. In “Theodor Herzl’s Diaries as a Bildungsroman”, published in the Spring-Summer 1999 issue of Jewish Social Studies, Shlomo Avineri writes that the compilers of the first edition “chose to erect an heroic monument, not provide a full text”. Passages unflattering to Herzl or his contemporaries in the Zionist movement were cut, with no notice of the omissions. “Such defensive strategies”, Avineri writes, “tend to diminish the stature of the person they aim to protect.”
A more complete version did not appear in print until the 1960s, says Avineri, and that was an American English translation. The Complete Diaries of Theodor Herzl, edited by Raphael Patai, was a 5-volume set translated from manuscripts in the Central Zionist Archives in Jerusalem, and published in New York by the Theodor Herzl Foundation. It was registered for copyright in 1961, with a 1960 copyright notice. Unlike the 1922-1923 edition, though, copyright for this version was not renewed, as was required for US publications of the time.
So is this better version in the public domain now? Probably not, in my view, since the work includes translations of previously published German diary entries that are still under copyright– namely, those portions first published in the 1923 volumes. But come January 1, the copyright for those 1923 volumes will expire in the US, and if there were no other earlier publications for what was first published in Patai’s edition, then we may have this edition, as well as the 1922-1923 first edition, joining the public domain in the US then.
I’m not an expert in the publishing history of Herzl’s papers, or in all the ins and outs of copyrights of unpublished and partly-published works. Comments from readers who know more about either are welcome.
Happy Hanukkah to all my readers who celebrate it!
1923 was a big year for novelty songs. We’ve already noted “Barney Google”, one of the hits of the year. On Twitter, Bill Higgins, who recalls his mother singing that song when he was young, reminded me of an even bigger novelty hit, “Yes! We Have No Bananas”. The rapid expansion of radio broadcasting quickly put the song into the ears of listeners across the country and beyond, and its sheet music reportedly sold 2 million copies within 3 months of its initial publication. Copyrighted in 1923, and renewed in 1950, the song will join the public domain 24 days from now.
How much of this resemblance was deliberate recycling, and how much was just coincidence? At least some of it was deliberate; in particular, the “old fashioned to-mah-to” phrase in “Bananas” is sung to the melodic motif of Porter’s “Old Fashioned Garden”, and an article about the song that appeared in the June 14, 1923 issue of Variety says that Porter’s publisher granted permission for that bit. On the other hand, Spaeth admits in The Common Sense of Music (1924) that the melodic phrase used for both “oh bring back my Bonnie to me” and “we have no bananas today” is a common melodic ending used in many songs.
With a limited number of tones in the western scale, and a limited number of common chord progressions, patterns, and basic rhythms, pretty much every song reuses elements from other songs. If the “Bananas” chorus could be completely derived from one previous work, one could make a credible argument for plagiarism, but having to quote five different works to reconstruct (most of) the chorus supports the argument that the chorus is mostly an original melange of generic musical patterns that live in the public domain. Indeed, after Spaeth became famous from his performances and books, he was frequently called as an expert witness to make arguments of that sort in court cases alleging musical plagiarism.
If someone had brought such a plagiarism suit against “Yes! We Have No Bananas”, it would have also helped that the songs Spaeth cites in his routine (other than the licensed “Old Fashioned Garden”) were legally in the public domain as well. Handel’s Hallelujah Chorus premiered in the 18th century; the “My Bonnie” melody is a traditional folk tune; “Marble Halls” dates from 1843; and “Seeing Nellie Home” is from the 1850s. Since US copyright terms in 1923 ran a maximum of 56 years, copyrights to any of these works had expired well before “Bananas” came out. If the 95-year maximum terms that now apply to 1923’s songs applied then, though, Silver and Cohn might have had more reason to be cautious in their composing.
Much of this article is based on what I learned from Gary A. Rosen’s book Unfair to Genius (2012) as well as Sigmund Spaeth’s Words and Music (1926). I thank Kip Williams and Bill Higgins for pointing me towards these sources. This song’s calendar entry goes out to the folks at Making Light, where the song and its predecessors were discussed some years back. If you’d like to make your own requests or dedications, you can leave them as a blog comment or contact me.
In yesterday’s advent calendar post, I noted that in the 1920s, “newspapers and the comic strips that appeared in them were considered ephemeral, and copyright renewal was very rare”. One newspaper that represents an exception to this rule is the New York Tribune, which now has active copyright renewals for all of its issues from January 1, 1923 onward. All of its 1923 issues will be joining the public domain 25 days from now.
As I see it, the Tribune‘s most distinctive additions to the public domain in January will be its columns and features. Those included regular pieces by Don Marquis, including new dispatches from his characters archy and mehitabel. The Tribune also ran serialized fiction, as did many other newspapers then. From July through September of 1923 it ran weekly installments of a new adventure of Hugh Lofting‘s Doctor Dolittle, one that would not be published in book form until 1948 (as Doctor Dolittle and the Secret Lake).
I’ll be happy to see the New York Tribune and Herald available for digitization in January. But even now we could be reading digitized news stories from 1923 in other public domain newspapers, following the further revelations in the Teapot Dome Scandal after President Harding’s sudden death in office, or reading American coverage of the end of the Irish Civil War. We could follow day-to-day the pennant races of the Yankees and the Giants, who would eventually face each other in the 1923 World Series, through New York’s other newspapers now in the public domain. Right now, though, there’s not a whole lot of newspaper content freely readable online from 1923, compared to what’s available from 1922.
The main effect of Public Domain Day 2019 for American newspapers, then, is not so much putting a lot of 1923 newspaper content in the public domain, but making it much easier to know that 1923 newspaper content is in the public domain. Once January 1 arrives, you can safely assume that the contents of any US newspaper with a 1923 dateline are free to digitize, share, and adapt. Before then, you may have to do a lot more research to be sure of that. So there’s less of it online.
The draft guide for determining the copyright status of serial issues that I announced yesterday is meant to make it easier to tell whether something of interest in a newspaper, magazine, journal, or other serial from the mid-20th century is public domain. Of those types of serials, newspapers will still be among the most challenging to copyright-clear, since many of them routinely included reprints from other publications or syndicated content. But original news articles, which are often among the most historically interesting parts of newspapers, should be significantly easier to clear. And I haven’t seen systematic renewals in the Catalog of Copyright Entries of syndicated comics or photos published before the 1930s, so many 1920s newspaper issues might not be that hard to clear in their entirety. Magazines, which tend to rely less on reprints and syndications, should be less difficult to clear, and journals, which typically contain completely original articles, will probably be easier still.
I’m glad that the upcoming Public Domain Day will give us access to more of American newspaper memory, both through the outright expiration of copyright, and through putting another year’s worth of newspapers past the easy “bright line” for determining public domain status. But I also want to make it easier to identify and bring back to light public domain newspapers and other serial content well past 1923. There’s a lot to remember over the last 95 years, and much that can be forgotten over that length of time.
Margo Padilla manages services, programs, and initiatives related to archiving and born-digital stewardship for the Metropolitan New York Library Council (METRO). She serves as the primary contact with archives in the METRO region and designs and implements archives-related services in Studio599. Before joining METRO in 2014, she was a resident in the inaugural cohort of the National Digital Stewardship Residency program. Prior to that, she worked at The Bancroft Library at the University of California, Berkeley on digital projects and initiatives. Margo received her MLIS with a concentration in Management, Digitization, and Preservation of Cultural Heritage and Records from San Jose State University and her undergraduate degree from the University of California, Berkeley.
As a first-time attendee at the DLF Forum, I was immediately struck by the diverse group of individuals that were pulled together and the welcoming environment that was created by organizers. It felt like a space where everyone could be their authentic self and speak honestly about their experiences. This provided an opportunity to engage in impactful conversations and generated an elevated awareness on a number of different topics. It felt like a particularly critical time to be in such an environment, when long-held professional practices, methods for representation in memory organizations, and workplace and labor issues are being challenged and beginning to shift.
The sessions that drew my interest were those focused on community archives. Through my own work with independent memory projects, I have observed a rejection of mainstream archival institutions and practices. Traditional archives are, in most cases, perceived as being part of the ivory tower of academia and inaccessible to marginalized groups, a place where their stories are interpreted through the lens of white culture. In the Community Archives: New Theories of Time, Access, Community, and Agency session, panelists explained that community archives are creating a space where individuals can document their own histories and ensure collections are given the right context. While there is the desire to increase representation in traditional archival collections, the panel encouraged the audience to respect the autonomy of community archives and instead seek ways to empower independent memory workers through equitable partnerships. The support, collaboration, and autonomy fostered in digital scholarship labs is one potential model for how we can transform use of traditional archival reading rooms, changing them to learning environments where memory workers are invited to develop skills to steward their own collections. Two notable projects in this area are the Community Archives Lab at UCLA and the Southern Historical Collection’s Community-Driven Archives.
I reflected further on the perception of institutional archives during Jennifer Ferretti’s talk on the Building Community and Solidarity: Disrupting Exploitative Labor Practices in Libraries and Archives panel. In addition to pointing out the lack of POC in management and leadership roles, Ferretti called for an end to “fit culture” within LAMs which, as she pointed out, implicitly expects POC to leave their culture at the door in order to fit within the dominant white culture of the profession. As we explore issues of representation in our collections and how we might collaborate with community-based archives, we must also continue to do the work and take action to address representation within our own profession. Independent memory workers’ mistrust of traditional archives and their perception of themselves as outsiders may correlate with the absence of representation on LAM staff, an issue that may be compounded by perceiving those few POC that are in administrative roles as having to modify their identities in order to be the “right fit” for the organization.
Attending the DLF Forum was a distinct conference experience and I am so grateful for the generous support that enabled me to attend a conference where involuted issues like these could be candidly discussed. I’m particularly grateful that the DLF recognizes the need to support mid-career professionals as we continue to expand our knowledge and develop our careers.
If you’d like to get involved with the scholarship committee for the 2019 Forum (October 13-16, 2019 in Tampa, FL), sign up to join the Planning Committee now! More information about 2019 fellowships will be posted in late spring.
In this “From The Field” series, we’ll explore the Query Workbench within Fusion Server and walk through helpful tips and tricks on making the most of your search results. This post discusses how to quickly (in less than five minutes) highlight search terms within search results and explore other available highlighting features. Let’s start the timer:
What Is Highlighting?
When users are presented with search results, they often see snippets of information related to their search. Highlighting reveals the keywords inside those snippets of results so the user can visually see the occurrences. This functionality enhances the user experience and usability of search results.
To get started, we’re going to use a previously built Fusion App that performed a website crawl of lucidworks.com. After logging in to Fusion, selecting our app, and opening the Query Workbench from the Querying menu, we’ll be presented with the crawled documents.
The highlighting features are driven by Solr query parameters, through the Additional Query Parameters stage. Open the Add a Stage dropdown menu and select Additional Query Parameters to add the stage to the Query Pipeline. (See the Query Pipelines documentation for details.)
On the Additional Query Parameters stage, name the stage by adding a label, such as “Highlighting.” We’ll begin by adding the two required Solr parameters (hl and hl.fl):
We give the hl parameter a value of true to enable the highlighting, and the hl.fl (field list) parameter a wildcard value of * to match all fields where highlighting is possible. In production, you will want to explicitly define the fields to match. Click Save to apply the changes. Hint: You can click the Cancel button to close out the stage panel.
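As a sketch of what this stage ultimately sends to Solr, the two required parameters can be expressed as raw query arguments. The query term and the use of Python's `urlencode` here are illustrative only, not a Fusion API:

```python
from urllib.parse import urlencode

# The two required highlighting parameters from the stage above,
# expressed as raw Solr-style query arguments.
params = {
    "q": "data",   # the user's search terms
    "hl": "true",  # enable highlighting
    "hl.fl": "*",  # match all highlightable fields; narrow this in production
}
query_string = urlencode(params)
print(query_string)
```

The wildcard in `hl.fl` is URL-encoded automatically, which is one reason to build query strings with an encoder rather than by hand.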
By default, the Query Workbench does not display highlighted results. To enable display of highlighted results, open the Format Results options at the bottom and check the Display highlighting? option. Click Save to apply the change.
Now let’s test a query to see the highlighting in action. In our query field, we’ll perform a search for data:
We can now see matches from the query being highlighted, as well as the fields that contain the matches. The highlighted fragments shown under each result in the Query Workbench come from the highlighting section of the response. To view the response, click on the URI tab and copy/paste the Working URI into a new browser tab:
This Query Pipeline API response provides a highlighting section for each document with the matching snippets per field:
"Lucidworks | Dark <em>Data</em>"
"What you know about your <em>data</em> is only the tip of the iceberg. #darkdata @Lucidworks"
"Lucidworks: The <em>Data</em> that Lies Beneath"
"Lucidworks: The <em>Data</em> that Lies Beneath"
"Dark <em>Data</em> is Power."
"00.100 THE <em>DATA</em> THAT LIES BENEATH What you know about your <em>data</em> is only the tip of the iceberg"
"Big <em>Data</em> is Failing Pharma"
"Big <em>Data</em> is Failing Pharma"
"Big <em>Data</em> is Failing Pharma"
" machine learning, and artificial intelligence. Learn more › Quickly create bespoke <em>data</em> applications for"
Using a tool such as Fusion App Studio, highlighting will be parsed and displayed automatically on the front-end UI. For custom UI integrations, the Query Pipeline API’s response with highlighting information can be easily parsed for presentation.
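For a custom integration, pulling the snippets out of that response is a small amount of work. Here is a minimal sketch in Python, assuming a Solr-style JSON body with a top-level highlighting section; the document id and field names below are invented for illustration:

```python
import json

# A hypothetical response fragment shaped like the highlighting section
# shown above; "doc-1", "title_t", and "body_t" are made-up names.
raw = """
{
  "highlighting": {
    "doc-1": {
      "title_t": ["Lucidworks | Dark <em>Data</em>"],
      "body_t": ["Big <em>Data</em> is Failing Pharma"]
    }
  }
}
"""
response = json.loads(raw)

# Flatten the per-document, per-field snippet lists for display.
for doc_id, fields in response["highlighting"].items():
    for field, snippets in fields.items():
        for snippet in snippets:
            print(f"{doc_id} [{field}]: {snippet}")
```

A front-end would typically render the `<em>` wrappers as-is (or map them to its own markup) rather than strip them.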
Additional Highlighting Parameters
Up to this point, we’ve only enabled highlighting and used default parameters to demonstrate the core functionality. When deploying in production, however, we may want to be more selective about the fields that require highlighting, the tags placed before and after a highlighted term, and which highlighter best fits our needs.
When choosing a highlighter, be conscious of the index cost of storing additional highlighting features. For example, besides the stored value, terms, and positions (where the highlighted terms begin and end), the FastVector Highlighter also requires full term vector options on the field. The chosen highlighter can therefore affect both index size and query execution time. See the Solr Highlighters section below for more information.
By default, only one snippet is returned per field. The parameter hl.snippets controls the number of snippets that will be generated. For example, the default value of 1 returns the following:
When this value is increased to 3, additional snippets within the body_t will be highlighted:
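Sketched as raw query arguments (the `body_t` field name comes from the example above; the rest of the values are illustrative):

```python
from urllib.parse import urlencode

# Raise hl.snippets from its default of 1 so that up to three matching
# fragments per field are returned.
params = {
    "q": "data",
    "hl": "true",
    "hl.fl": "body_t",
    "hl.snippets": "3",
}
print(urlencode(params))
```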
Most commonly, an HTML tag is placed before and after the highlighted term for the presentation layer. By default, the pre tag is <em> and the post tag is </em>. Depending on the chosen highlighter, the parameter prefix will be either hl.tag. (Unified Highlighter) or hl.simple. (Original Highlighter). Any string can be used for the respective pre and post parameters.
For example, if we wanted to change to a <strong> HTML tag, we configure the following parameters:
Note that the parameter value for an HTML tag must be escaped.
This would generate the following result:
The highlighting section of the Query Pipeline API response would also reflect this change:
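As a sketch, assuming the Original Highlighter’s hl.simple. prefix, the <strong> configuration might look like this when built as raw query arguments; note how URL-encoding takes care of escaping the angle brackets:

```python
from urllib.parse import urlencode

# Swap the default <em></em> wrappers for <strong></strong>.
# hl.simple. is the parameter prefix used by the Original Highlighter;
# the encoder escapes the angle brackets in the tag values for us.
params = {
    "hl": "true",
    "hl.fl": "*",
    "hl.simple.pre": "<strong>",
    "hl.simple.post": "</strong>",
}
query_string = urlencode(params)
print(query_string)
```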
Solr offers several highlighters: the Original (default) Highlighter, the Unified Highlighter (new as of Solr 6.4), and the FastVector Highlighter. Each has tradeoffs between accuracy and speed. Depending on your workload and needs, you may want to evaluate each one to see how it performs on searches for terms, phrases, and wildcards.
The study comprises an overview report summarising the overall findings and identifying opportunities for knowledge transfer and regional cooperation as well as specific reports assessing to what extent governments in Latvia, Sweden and Finland have implemented internationally agreed-upon open data principles as part of their anti-corruption regime, providing recommendations for further improvement at the national level.
The study is the outcome of a project funded by the Nordic Council of Ministers. The aim of the project was to gain a better understanding of how Nordic and Baltic countries are performing in terms of integration of anti-corruption and open data agendas, in order to identify opportunities for knowledge transfer and promote further Nordic cooperation in this field. The study assessed whether 10 key anti-corruption datasets in Latvia, Finland and Sweden are in line with international open data standards. The datasets considered in the frame of the study are:
Beneficial ownership register
Public officials’ directories
Public procurement register
Political financing register
Parliament’s voting records
In this respect, Sweden has made only 3 of the 10 key anti-corruption datasets available online and fully in line with open data standards, whereas Finland has made 8 of these datasets available online, six of which are fully in line with open data standards. As for Latvia, 5 have been found to be available and in line with the standards. When the three countries are scored on anti-corruption datasets, Sweden fares worst: it has the lowest score, 5.3 out of 9, while Finland and Latvia scored 6.1 and 6.0, respectively. There are also signals that transparency in Sweden has been worsening in recent years, despite its long tradition of efficiency and transparency in public administration, good governance, and rule of law, and despite ranking in the top 10 of Transparency International’s Corruption Perceptions Index (CPI) for several years.
The problem in Sweden stems from the fact that the government has had to cope with the high decentralization of the Swedish public administration, which seems to have resulted in little awareness of open data policies and practices and their potential for anti-corruption among public officials. Thus, engaging the new agency for digitalisation, Agency for Digital Government (DIGG), and all other authorities involved in open data could be a solution to develop a centralised, simple, and shared open data policy. Sweden should also take legal measures to formally enshrine open data principles in PSI (Public Sector Information) law such as requiring that all publicly released information be made ‘open by default’ and under an ‘open license’.
The situation in Finland and Latvia is more promising. In Finland, a vibrant tech-oriented civil society in the country has played a key role in promoting initiatives for the application of open data for public integrity in a number of areas, including lobbying and transparency of government resources.
As for Latvia, it has made considerable progress in implementing open data policies in recent years, and the government has actively sought to release data to increase public accountability in areas such as public procurement and state-owned enterprises. However, the report finds that much of this data is still not available in open, machine-readable formats, making it difficult for users to download and work with it.
Overall, in all three countries it seems that there has been little integration of open data in the agenda of anti-corruption authorities, especially with regard to capacity building. Trainings, awareness-raising and guidelines have been implemented for both open data and anti-corruption; nonetheless, these themes seem not to be interlinked within the public sector. The report also emphasizes the lack of government-funded studies and thematic reviews on the use of open data in fighting corruption. This applies both to the national and regional level.
On the other hand, there is also a considerable potential for cooperation among Nordic-Baltic countries in the use of open data for public integrity, both in terms of knowledge transfer and implementation of common policies. While Nordic countries are among the most technologically advanced in the world and have shown the way with regard to government openness and trust in public institutions, the Baltic countries are among the fastest-growing economies in Europe, with a great potential for digital innovation and development of open data tools.
Such cooperation among the three states would be easier in the presence of networks of “tech-oriented” civil society organisations and technology associations, as well as the framework of cooperation with authorities with the common goal of promoting and developing innovation strategies and tools based in open data.
I recently re-read a gem of a paper by Leonard Rapport about the process of reappraisal, or reviewing an archives’ holdings and reevaluating whether they still need to be retained (Rapport, 1981). It was written just as the deluge of data generated by computation was ramping up, but before the explosion of the Internet and the web had begun. Rapport’s argument hinges on the difference between “permanent” and “continued” preservation. It is the latter that the Federal Records Act of 1950 stipulated:
The Administrator, whenever it appears to him to be in the public interest, is hereby authorized to accept for deposit with the National Archives of the United States the records of any Federal agency or of the Congress of the United States that are determined by the Archivist to have sufficient historical or other value to warrant their continued preservation by the United States Government.
The use of “continued preservation” made it through the various amendments to the act over time, including Obama’s addition of electronic records in 2014. So it’s funny we often talk about records going to archives “permanently”, and design systems that enforce or enshrine this idea.
Obviously not all archives are run the same way as NARA in the US of A. But it is comforting to recognize that the architects of this system understood that not only could we not keep all records, but there were limits on how long they could be kept. Rapport makes a strong humanistic argument that the cost of keeping records should always be weighed as best they can against their perceived use, and that this is an iterative process.
Just a few years later Bearman (1989) argued, again pretty persuasively, that the process of appraisal was fundamentally broken in the age of electronic records because of the staggering increase in volume of records. While he is sympathetic to Rapport’s argument, Bearman says that we can’t reappraise because our value-based process for appraisal is itself broken. Reappraising without changing what it means to appraise would just make the problem many times worse. Instead Bearman says we need to shift the analysis from the value of records, to the activities that generate records, and assessing the importance of records to those activities. Of course this anticipates the work on Macro-Appraisal (Cook, 2004) and Documentation Strategies (Samuels, 1991) that was to follow.
So it was fun to follow this little thread, but I mostly wanted to blog about Rapport’s article because it closes with this really wonderful and enigmatic little thought experiment about archives:
If that does not put a troubled appraiser in a more comfortable frame of mind, share with me two apocalyptic visions. In the first it suddenly becomes possible to keep a copy of every single document created, and, for these documents, a perfect, instantaneous retrieval system. In the second, and less blissful, vision the upper atmosphere fills with reverse neutron bombs, heading toward every records repository. These are bombs that destroy records only, not people. They come down and obliterate every record of any sort.
Keeping these two events in separate parts of your mind, project forward a century. How different would the two resultant worlds be? In the first would our descendants, having all the information that it is possible to derive from documents, have, therefore, all knowledge? And if they have all knowledge would they have, therefore, all wisdom?
In the second, lacking the records we have as of this moment, would our descendants wander in a world of anarchy, in a world in which they would be doomed to repeat the errors of the past?
I leave it to you to conjecture as you please. My own guess is that between these two worlds there wouldn’t be all that much difference.
Why would there be no difference between these two scenarios? Do you agree? How prescient are they as we look at what technologies like IPFS are attempting to make available, and we consider the threat of a strategically deployed Nuclear Electromagnetic Pulse?
Back in June, I announced that we had completed an inventory of all serials with active copyright renewals made through 1977, based on listings in the Copyright Office’s Catalog of Copyright Entries. At the time, I said we’d also be releasing a draft of suggested procedures for using the information there, along with other data, to quickly identify and check public domain serial content. (If you’ve been following the Public Domain Day advent calendar I’ve been publishing this month, you’ll have seen the inventory or its records mentioned in some recent entries.)
It took a little longer than I’d hoped, but after having some librarians and IP experts have a look at it, I’m pleased to announce that the draft of “Determining copyright status of serial issues” is open for public comment. I hope this will become something that people can use or adapt to identify public domain content of interest to them, so it can be digitized, adapted, or otherwise shared with the world.
It’s challenging to come up with a guide that will work for every audience, whether that’s folks wanting to digitize a lot of stuff quickly without a lot of fuss, or dig deeply into the status of certain publications they really want to work with, or who are lawyers or serious intellectual property nerds. But I hope the document I’ve produced will have some use for all these folks, and since it’ll be licensed CC-BY once it comes out of draft status, folks should be free to adapt it for more targeted audiences and projects. (I’d also love to eventually see visually effective graphics or flow-charts based on it; graphic design of that sort isn’t really my forte.)
My aim at this point is to bring the document out of draft status at the start of next month. (Public Domain Day would be a very appropriate time to make it official.) If you want to comment on the draft, getting your comments to me by December 25 should give me enough time to make any appropriate responses or revisions. You can email them to me at (ockerblo) at (upenn) dot (edu), or post a public comment on this blog post, or get a hold of me via my other public contacts or forums I frequent.
“Barney Google, with the goo-goo-googly eyes,
Barney Google bet his horse would win the prize.
When the horses ran that day, Spark Plug ran the other way!
Barney Google, with the goo-goo-googly eyes!”
Several novelty songs introduced in 1923 have been remembered long afterwards. The one quoted above is “Barney Google”, written by Billy Rose (whose other songwriting credits include “Me and My Shadow” and “Does Your Chewing Gum Lose its Flavor on the Bedpost Overnight?”) and Con Conrad. The song’s registration with the Copyright Office calls it a “fox-trot”, the foxtrot being one of the most popular dances of the time. (The dance would stay popular up to the early days of rock’n’roll; the label on a 1953 single record for “Rock Around the Clock” describes it as a “novelty foxtrot”. But I digress.)
Originally performed by Eddie Cantor, “Barney Google” was also adopted by vaudeville acts like Olsen and Johnson, and other singers. It was still recognizable enough by the 1970s to be adapted for TV commercials I recall seeing on Saturday mornings for a brand of sweetened peanut butter (“Koogle, with the koo-koo-koogly eyes!”)
Copyrighted in 1923, and renewed in 1950, the song will join the public domain 26 days from now. It’s based on a comic strip whose beginnings are already in the public domain, and which is still running.
Barney’s comic strip was originally titled Take Barney Google, F’rinstance. It began in 1919 in the Chicago Herald Examiner, but got a much wider distribution after King Features began syndicating it later that year. It got even more popular with the introduction of Barney’s horse, Spark Plug, in 1922. So when the “Barney Google” song came out the following year, listeners were already familiar with the characters in it. Barney would also appear in numerous silent films and cartoons in the 1920s and 1930s. In the 1930s, he met a hillbilly named Snuffy Smith, who would eventually take over the comic strip. (It’s now titled “Barney Google and Snuffy Smith”, though when I read it in the papers as a kid, “Snuffy Smith” was in much bigger lettering, and I wondered who Barney Google was, since I never saw him in the strip. He’s reportedly made cameo appearances more recently.)
The early years of Barney Google’s strip are in the public domain, though you may have difficulty finding good copies. For the most part, newspapers and the comic strips that appeared in them were considered ephemeral, and copyright renewal was very rare. But after the success of Blondie and Superman, which both began in the 1930s, and the ongoing popularity of Disney’s characters, comics were increasingly seen as valuable intellectual property, and started to be renewed regularly. A large number of periodicals in my periodical renewals inventory that renewed from the very first issue are comic books.
It’s a bit tricky to identify the point at which Barney Google’s strips stop being public domain and start having active copyright. You’ll search in vain for his name in book, periodical, or artwork renewal sections of the Catalog of Copyright Entries. The first active copyright renewal covering the Barney Google strips, as far as I can tell, doesn’t mention his name at all. It’s the renewal for the May 4, 1933 issue of King Features Comic Weekly, one of a set of periodicals from King Features that I have a hard time finding in any libraries. It appears to have existed solely for distribution to newspapers, copyright protection, or both. From what I can tell, the King Features weeklies included a copy of every comic strip, newspaper column, or other feature that King Features distributed. Registering an issue with the Copyright Office would register the copyrights for the features it included. And renewing the issue would also renew the copyrights to those features.
Those renewals also mean that any newspaper from May 1933 and onward running any of those renewed King Features comic strips still has some copyrighted material in it, even if there was no renewal for the newspaper itself or any of its other contents. That may complicate cultural initiatives to put newspapers from that era online. Later today, we’ll be releasing the draft of a guide for determining the copyright status of serials, which discusses some of the relevant issues (including that of syndicated features). I’ll have a special post on that shortly. (Update: It’s out now.) And I’ll also discuss newspaper copyrights more in tomorrow’s calendar entry.
1. Most digital transactions require verified claims
Much of Tim’s narrative assumes that there is clear ownership of data, which is far from straightforward. Different entities are looking for different kinds of data:
For the majority of digital transactions and interactions (buying things online, applying for services, booking a flight, proving my age), the most valuable data is data asserted about me from an authoritative source. For example, that I have a valid driving license or verified address, bank account, passport.
For advertising, it’s what I bought and where I clicked as well as profile data (email address, demographic and interests info). This data is generated by the services I use (e.g. Facebook, Google, Twitter).
For AirBnB and Uber it’s the ratings that other users have given me that’s important, which isn’t data I obviously ‘own’.
Yes, in theory I own data on the Internet that I originate, such as this blog post. Storing and disseminating these items from a personal pod doesn't pose problems. But this kind of data isn't valuable, partly at least because it isn't objective:
Yes, some of this can be self-asserted, but organisations often want objective data based on behaviour and decisions made about us not what we say is true. Mortgage brokers don’t just want my assertion that I have income, they want proof.
The valuable data almost all comes from Web interactions that have at least two sides; logins, purchases, clicks, etc. These generate data on both sides. Why do I think I own the data that the other side of these transactions collects? So, even if my pod stores a copy of my purchase data, Amazon will also have a copy. Anyone wanting to use my purchase data will want to get it from Amazon as being more objective.
My pod could store signed attestations of data about me from third parties such as Amazon, but why would anyone want those in preference to data straight from the horse’s mouth? If the only data that has to be accessed from my pod is data I generated there’s not going to be much decentralization.
Decentralized systems will continue to lose to centralized systems until there’s a driver requiring decentralization to deliver a clearly superior consumer experience. Unfortunately, that may not happen for quite some time.
So if we can’t sell privacy as a product in social media, we need evidence of where else these priorities will bring users. Alternatively, decentralised or PDS-integrated tech must deliver novel and valued functionality or be solving major problems users have with existing centralised solutions.
For companies, service providers and app developers the value proposition is hazy. I have yet to come across a PDS provider with an impressive or long list of partners and companies. Most existing business models depend on controlling the data and using it to improve a service and provide valuable analytics to up-sell paid plans or directly monetise the data collected through advertisers and third party data marketplaces. Giving this up requires incentives or regulation.
I agree with Cory Doctorow that the key requirement for a decentralized Web is anti-trust enforcement. Otherwise even if a decentralized Web company has a viable business model and starts to be successful, it'll get bought by one of the FAANGs.
Bolychevsky gets to the heart of the problem in a different way, via the idea of "consent":
If Solid uptake is big enough to attract app developers, what stops the same data exploitation happening, albeit now with an extra step where the user is asked for ‘permission’ to access and use their data in exchange for a free or better service? Consent is only meaningful if there are genuine alternatives and as an industry we have yet to tackle this problem (see how Facebook, Apple, Google, Amazon ask for ‘consent’). What’s really going on when users are asked to agree to the terms and conditions of software on a phone they’ve already bought that won’t work otherwise? Or agreeing to Facebook’s data selling if there’s no other way for users to invite friends to events, message them or see their photos if those friends are Facebook users? I wouldn’t call this consent.
Under the new directive, every time a European's personal data is captured or shared, they have to give meaningful consent, after being informed about the purpose of the use with enough clarity that they can predict what will happen to it.
Building pods that conform to the GDPR will involve a very complex protocol.
Bolychevsky concludes with a call for public funding to build a decentralized Web infrastructure. Alas, in the absence of meaningful anti-trust enforcement this would simply subsidize the FAANGs. Overall, her post is well worth reading.
Tip of the hat to Herbert Van de Sompel for pointing me to Bolychevsky's post.
I was honored to be asked by Europeana, the indispensable, unified digital collection of Europe’s cultural heritage institutions, to write a piece celebrating the 10th anniversary of their launch. My opening words:
‘The future is already here – it’s just not very evenly distributed,’ science fiction writer William Gibson famously declared. But this is even more true about the past.
The world we live in, the very shape of our present, is the profound result of our history and culture in all of its variety, the good as well as the bad. Yet very few of us have had access to the full array of human expression across time and space.
Cultural artifacts that are the incarnations of this past, the repository of our feelings and ideas, have thankfully been preserved in individual museums, libraries, and archives. But they are indeed unevenly distributed, out of the reach of most of humanity.
Europeana changed all of this. It brought thousands of collections together and provided them freely to all. This potent original idea, made real, became an inspiration to all of us, and helped to launch similar initiatives around the world, such as the Digital Public Library of America.
Despite the ongoing discussions about the benefits of Open Access, Open Data, Open Government Information, and Open Content more generally, the extent to which such content is impacting the spread of knowledge and libraries’ role therein are yet to be understood.
It is a complex topic, as Rachel Frick recently wrote in her blog post “An open discussion”. Some predict that in an open access world, where most articles and monographs are available online and for free, the library’s role in discovery and fulfillment will further erode. Others argue that a library’s quality control role will become only more important with open access, because this content is more susceptible to fraud. The impact on library budgets is also a central topic, with growing concern that if content is freely available, libraries will lose acquisition budget. David Lewis’ article on “The 2.5% Commitment” proposes that all libraries devote at least 2.5% of their budgets to support a common infrastructure for open content. What this initiative has made clear is that no-one really knows how much libraries are actually investing in OA and open content. If libraries are not able to measure their investments and efforts in open content, how can they make planning decisions moving forward? If the benefits for the library and their users are unclear, how can libraries measure how successful they are in realizing those benefits? With all these challenges still ahead, it is timely to start the conversation about impact and use of OA and open content.
This year, OCLC Global Council expressed interest in exploring this theme. To help fuel a meaningful discussion around this very vast topic, a survey has been developed that focuses on the following questions: What are libraries’ ambitions and realities with open content? How invested are they in open content? How can OCLC assist to leverage their efforts?
This is an open invitation to all libraries of any type and any region in the world. The survey aims to bring more voices to the table and to share a baseline understanding of libraries’ commitment to open content and their ambitions in this field. The scope of the survey is intentionally broad to gather as many different perspectives as possible and to be inclusive.
If access to open content is important to your library, please consider participating!
VRA affiliate Chris Day is the Digital Services Librarian at the Flaxman Library of the School of the Art Institute of Chicago.
He started at SAIC in 2008, managing the library’s first digital collection, the Joan Flasch Artists’ Books Collection. In 2011 he oversaw the deaccession of the 35mm slide library and the start of a local repository of digital art images for teaching, which now features over 200,000 images. The past year saw the migration of 13 digital collections from ContentDM to Islandora, which now features 4 new collections and over 26,400 digital objects.
Learning to Live DLF
I’ve been a librarian for over twelve years and am still developing my conference skills. Conferences are valuable events, but can be difficult when you struggle to find the balance between learning opportunities and self-care. Last year, at my first DLF Forum, I found a community and atmosphere that made it easier to find peace in the chaos. This sense of ease drove my desire to make the most of my time there and bring what I learned back to my job. As 2017 turned into 2018, I looked back on my time in Pittsburgh and saw that I had used just a small portion of what I’d been exposed to. I had a notebook full of ideas, but hadn’t given myself the chance to follow through. My personal quest for DLF Forum 2018, declared just an hour into the start of Learn@DLF, was to absorb, process, and act on what I experienced. I want to learn to live DLF.
Forum 2018 started the same way Forum 2017 finished for me, with the excellent Metadata Analysis Workshop developed by the DLF Assessment and Metadata Working groups. I wasn’t repeating the workshop because I failed to learn, I simply wanted another opportunity to act on what I had learned. One digital collection migration and clean-up wiser, I returned refreshed and ready to absorb everything. OpenRefine, regular expressions, GREL, application profiles; I wanted all that wonderful techy metadata stuff. The rush of knowledge felt real good!
I made a vow, that when I got back to Chicago I would turn my pages of notes into real tasks and real projects. I would try every tool, visit every collection, read every article, and archive my favorite presentations for future referral. I lost a few days to jet lag and con crud, but I have taken positive steps towards my goals. I started with a Google Doc full of raw notes, which I’ll now organize into direct actions (digitizing historic course catalogs? create a faculty name authority list like Jeremy Floyd at UNV Reno), research opportunities (get Hope Olson’s “The power to name”), email contacts to follow-up (tell me more about your Syllabrowser project, University of Utah), and random thoughts (did I leave the download link on when I left?).
Three weeks have passed since our time in Las Vegas. How am I doing so far? I’ve had a very good start, thank you for asking. I’ve reached out to colleagues and had helpful, substantive follow-up conversations. I used new tools and skills to normalize accession numbers in our Artists’ Book collection (this allowed us to better search, filter, and analyze those numbers; I squealed like a child on their birthday when I got that to work). This is just the beginning; I have so many new projects to try that I am freshly enthused for the coming year.
And what of the next year? Will I continue to operationalize my experiences from the 2018 Forum, or will entropy take control? Between now and next fall I intend to find out. I will track and analyze my successes and my failures, and I hope to report back in Tampa. If this proves a success, I can add another goal for the 2019 DLF Forum: pacing my physical and mental energy and making it through the whole conference without exhaustion. I’ll be seeing you in the Meditation Room!
If you’d like to get involved with the scholarship committee for the 2019 Forum (October 13-16, 2019 in Tampa, FL), sign up to join the Planning Committee now! More information about 2019 fellowships will be posted in late spring.
In yesterday’s post, I linked to a law review article that cited a court decision as “755 F.3d 496”. That’s a kind of citation that pops up a lot both in law review articles and in court decisions and briefings themselves. It’s a reference to a decision as published in the third series of the Federal Reporter, volume 755, page 496. The Federal Reporter isn’t a government publication, though; it’s a serial issued by the private firm of West Publishing, now a part of Thomson Reuters. And it’s copyrighted.
In fact, it’s pretty emphatically copyrighted, like many other law reporters from private publishers. If you look at the inventory I’ve compiled of early 20th-century serials with renewed copyrights, you’ll see that a lot of the serials with renewals all the way back to 1923 are law reporters or related legal publications. West Publishing was particularly thorough with its renewals; for 1923, it renewed both initial issues of the Federal Reporter published shortly after the court decisions they cover, and “permanent edition” volumes, published a little bit later, that had updated and corrected copies of the ruling texts and accompanying notes.
It’s not surprising that West, and other legal publishers reporting on other jurisdictions, would be so assertive about copyright: If you publish something that’s a “must-have” for law firms, libraries, and other serious legal professionals, you can charge a lot of money for it. Pricing for Westlaw, Thomson Reuters’ online database that includes the Federal Reporter, is hard to find, but third parties note that it can easily run well over $100 per month per user. If I wanted instead to buy print copies of the Federal Reporter, the purchase options the publisher offers me include buying individual volumes at $890 apiece, or buying the whole series as far back as 1993 for the low, low price of just over $30,000. And this pricing reflects a time when readers have more choices than they had in the past for getting court decisions. Nowadays, you can enter a recent federal case citation or a set of parties into Google or another search engine and have a pretty good chance of finding copies of the court’s rulings and decisions online. In 1923, there was no Google, and most people had few or no options for obtaining authoritative copies of many court rulings or other laws other than through a private publisher’s law reporter.
This level of monopolization and pricing for law reporters may seem odd, given their contents. By long-standing custom, the law is public domain in the United States. Section 105 of US copyright law explicitly states “Copyright protection under this title is not available for any work of the United States Government”, and that includes the official rulings and opinions of the federal courts covered in the Federal Reporter. So what can the publishers claim copyright on?
Historically, the answer has been “whatever they can get away with”. That was a lot, even as late as the early days of the Web, when there were intense struggles to make the law freely available online. Gary Wolf’s article “Who Owns the Law?”, published in Wired in 1994, gives us a detailed, dark picture of the situation then. Online access to much of the law was controlled by a duopoly of West and Lexis-Nexis (the latter now a part of RELX). West claimed (and still claims) copyright to the summary headnotes and other annotations it added to cases. That seems fairly reasonable in itself, as it can represent substantial original creative work. But since those notes are interleaved with the issuances of the court in the reporters, it can be difficult for a lay person to determine what’s public domain and what’s not; it also means that people can’t simply mass-digitize the reporters and put the page images online, as has been done for millions of completely public-domain books. Hence, HathiTrust’s openly accessible run of the Federal Reporter as of today stops at volume 281, the last permanent-edition volume published in 1922.
But West went further than just claiming copyright on its annotations. It also claimed copyright on its editing of the decision texts (including incorporating corrections as they were issued), and even on the page numbering system it used for the decisions. As Wolf’s article notes, in the 1980s, the 8th Circuit supported West’s claims over its page numbering, which effectively gave it a monopoly as long as the court system used citations based on that page numbering system. (West would later license the numbering system to Lexis-Nexis, but no one else had a license in the early 1990s.)
The situation would improve not long after Wolf’s article was published. That same year, Hyperlaw sued West over its claims to copyright over its page numbering and edited versions of court decisions. In 1998, the 2nd Circuit issued two rulings (158 F.3d 674 and 158 F.3d 693) that struck down West’s more expansive copyright claims. As quoted by the Association of Research Libraries, the court ruled that “all of West’s alterations to judicial opinions involve the addition and arrangement of facts, or the rearrangement of data already included in the opinions, and therefore any creativity in these elements of West’s case reports lies in West’s selection and arrangement of this information. In light of accepted legal conventions and other external constraining factors, West’s choices on selection and arrangement can reasonably be viewed as obvious, typical, and lacking even minimal creativity.” On that basis, the Second Circuit denied copyright protection to West’s edited case reports and to its page numbering system. (A detailed contemporary analysis of the case, published after the district court ruled but before the appeals court did, can be found in Peter Thottam’s 1998 article, “Matthew Bender v. West Publishing”, published in the Berkeley Technology Law Journal.)
Since then, it’s become much easier to find court opinions online, including those derived from the Federal Reporter and other law reporters, complete with page number citations, but stripped of any external annotations. And even those annotations may not always be copyrighted. A recent decision by the 11th Circuit notes that when annotations are “an inextricable part of the official codification” of the law, they might not qualify for copyright protection. In Georgia, an annotated code was declared to be the official version of state law, and Georgia asserted a copyright over it based on the annotations. (Ironically, those annotations were made by Matthew Bender and Co., the same firm that challenged West’s expansive copyright claims to its law reporters.) After Georgia sued Public.resource.org, an organization that digitized the complete annotated code, the appeals court ruled that “no valid copyright interest can be asserted in any part” of the annotated code. (There’s a nice summary of this case and some related ongoing legal actions at the website of the Electronic Frontier Foundation, which is representing Public.resource.org.)
Could this ruling serve as a precedent to open up other annotated publications, such as, for instance, annotated court decisions, if they were effectively the official records of those decisions? I can’t say for sure, or whether that would apply to older volumes of the Federal Reporter, since I’m not a lawyer and don’t have deep knowledge of how that publication was treated back in the 1920s. But I find it an interesting question to consider. In the meantime, we’ll have some more Federal Reporter volumes entering the public domain the long way, via expiration of their copyright terms, 27 days from now.
I say above that “there was no Google” in 1923. Strictly speaking, there was a Google then: not the search and advertising company we know today, but another Google that was the brand of a media franchise that continues to this day. It’s partly in the public domain now, and more of it will be in the public domain next month. I’ll talk more about it, and the challenges in determining which parts are in the public domain, in tomorrow’s calendar entry.
Yesterday the Digital Preservation Network (DPN) announced plans to cease operations. The individual nodes that collectively provided preservation services to DPN seek to reassure the DPN membership as well as the larger academic and digital preservation communities that we remain confident about the future of digital preservation.
We continue to support long-term distributed digital preservation. The rich array of collaborative, community-driven digital preservation services in higher education offers reliable benefits to the academic community, despite DPN’s departure. Several of the services represented by DPN nodes provided robust technical infrastructure to DPN depositors from strong organizational bases that serve other constituents as well. That strength is unshaken by this turn of events. In no way does DPN’s end obviate the need for continued redundant, resilient, diverse preservation services working together.
Consistent with the values we affirmed last year when we all signed the “Digital Preservation Declaration of Shared Values,” we hold at the core of our digital preservation mission the belief that “we can accomplish these goals better together rather than separately.” We are united in our dedication to continue exploring future collaborative opportunities. Our resolve and our resilience in pursuit of our common goals remain strong.
In the immediate future we will work together with DPN to assist its depositors during DPN’s shut down process. While we continue to identify next steps, we will be moving ahead from a collaborative position of strength. We are available to work with you to support current and future preservation needs in whatever way we can. You can expect further communication from the node where you deposited content very soon.
In April-May this year we conducted the third “International Linked Data Survey for Implementers,” as we were curious what might have changed in the three years since the last survey. We wanted to learn details of new implementations that format metadata as linked data and/or make subsequent uses of it, and what might have changed in linked data implementations reported in previous surveys.
Eighty-one institutions responded to the 2018 survey, describing 104 linked data projects or services, compared to 71 institutions describing 112 linked data implementations in 2015. Of the 104 linked data implementations, only 42 had been described previously.
Among the highlights I’ve noticed and their implications:
This was the first time we received responses from service providers, which provide linked data services for their customers. This may lead to fewer individual institutions launching their own linked data projects. Among the 2018 survey respondents, 37% relied at least partially on a system vendor, corporation, or external consultants/developers to implement their linked data project or service, and several respondents were clients of providers who also responded to the survey.
40% of the linked data implementations in production that were described in the 2018 survey had been in production for more than four years. These more mature implementations can serve as exemplars for others.
42% of the “new” responses to the survey (not described in previous surveys) were outside the library domain. We see more linked data initiatives from research institutions and cultural heritage organizations, reflecting a more diverse range and usage of linked data.
More respondents reported that their linked data project or service was successful or “mostly” successful in 2018 than in 2015 (56% compared to 41%). The respondents whose linked data implementations had been in production for at least four years commented that their indicators of success were:
Usage, evidenced by substantial increases in usage and more contributors
Data re-use, shown by other applications making use of their linked data implementations and by more bulk downloads
Interoperability, providing access to other resources
User satisfaction, providing users a richer experience that is much more contextualized and inter-related. One noted their “happy users” were “probably unaware that the service is based on linked data.”
Influence, as successful implementations that gain attention have influenced other initiatives and moved the discussion on linked data forward in the library community
Professional development, as even absent metrics demonstrating linked data’s value to others, linked data projects still provide professional development for staff.
Among those publishing linked data, we observe substantial increases in the use of Schema.org and BibFrame, and decreased usage of SKOS and FOAF, in particular.
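To make the publishing side concrete, here is a minimal sketch of what a bibliographic record looks like when expressed with Schema.org vocabulary and serialized as JSON-LD, one common way survey respondents publish linked data. The record itself (title, author, identifier) is entirely made up for illustration.

```python
import json

# A minimal, hypothetical bibliographic record expressed with
# Schema.org vocabulary and serialized as JSON-LD.
record = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "An Example Artists' Book",                      # placeholder title
    "author": {"@type": "Person", "name": "A. N. Author"},   # placeholder author
    "datePublished": "1923",
    "identifier": "AB-1923-0001",                            # hypothetical accession number
}

print(json.dumps(record, indent=2))
```

Embedded in a web page, a snippet like this lets search engines and other consumers understand the record without knowing anything about library-specific formats, which is a large part of Schema.org’s appeal over vocabularies like SKOS or FOAF.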
Among the top ten linked data sources consumed, the biggest change was the surge in consuming Wikidata, more than four times that reported by respondents in 2015. The National Library of Finland observed, “Wikidata is becoming more and more significant for cultural heritage organizations, including our library.” We also saw big increases in consuming WorldCat.org and ISNI.
Most linked data implementations remain experimental or educational in nature. Few are integrated into daily workflows. Commented the Oslo Public Library, “As far as I can see, Oslo Public Library is still the first and only library with its production catalogue and original cataloguing workflows done directly with linked data.”
We have updated the Linked Data Survey section on our website which compiles summaries, articles, and presentations about the three surveys and a link to the Excel workbook with the survey responses. The responses to all three surveys (2014, 2015, and 2018), without the contact information which we promised we’d keep confidential, are publicly available in this Excel workbook so you can conduct your own analysis or focus on the institutions that most resemble yours.
My friend John Wharton died last Wednesday of cancer. He was intelligent, eccentric, opinionated on many subjects, and occasionally extraordinarily irritating. He was important in the history of computing as the instruction set architect (PDF) of the Intel 8051, Intel's highest-volume microprocessor, and probably the most-implemented instruction set of all time, and as the long-time chair of the Asilomar Microcomputer Workshop. I attended AMW first in 1987, and nearly all years since. I served on the committee with John from 2000 through 2016, when grandparent duties forced a hiatus.
On hearing of his death, I thought to update his Wikipedia page, but found none. I collected much information from fellow Asilomar attendees and drafted a page for him, which has since been published. Most of the best stories about John have no chance of satisfying Wikipedia's strict standards for sourcing and relevance, so I have collected some below the fold for posterity.
John was a founding member of the editorial board of Microprocessor Report, writing for it frequently. His opinion columns were often contrarian, and ended up being called "Oblique Perspective". A collection from August 1988 to August 1995 includes among others:
Architecture vs. Implementation in RISC Wars (8/88)
Unanswered Questions on the i860 (5/89)
The "Truth" About Benchmarks (5/18/90)
Does Microcomputer R&D Really Pay Off? (9/19/90)
Have The Marketing Gurus Gone Too Far? (5/15/91)
A Software Emulation Primer (10/2/91)
The Irrelevance Of Being Earnest (4/15/92)
Why RISC Is Doomed (8/19/92)
Brave New Worlds (12/9/92)
Breaking Moore's Law (5/8/95)
Is Intel Sandbagging on Speed? (8/21/95)
They're all worth reading; I've just hit the highlights.
One of the "Oblique Perspectives" deserves special mention. In How To Win Design Contests (10/17/90) John propounds a set of rules for winning design contests and, in a section entitled A Case-Study In Goal-Oriented Design, shows how he used them to win a red Porsche 944! The rules are:
Go ahead and enter. No-one else will, and you can't win otherwise.
Another department staged a single-board-computer design contest, with development systems and in-circuit emulators worth thousands of dollars as prizes. Their most creative entry proposed to install computers in public lavatories to monitor paper towel and toilet paper consumption and alert the janitor if a crisis was imminent. No one could tell if the proposal was a joke, but it won top honors anyway.
Consider what the sponsor really wants.
So the best way to win is to reverse engineer the sponsor's intentions: figure out what characteristics he'd most like to publicize and then put together an application (possibly contrived) with each of these characteristics.
Keep it really, really, really simple.
The judges will have a number of entries to evaluate, and those they understand will have the inside track. ... It's far better to address a real-world problem the public already understands so they can begin to grasp your solution and see the widget's advantages immediately.
Devote time to your entry commensurate with the value of the prize.
The entry form may request a five-line summary of your proposal and its advantages, but this isn't a quick-pick lottery. If the contest is worth entering, it's worth entering right, and neatness counts.
John starts by applying rule 4:
The Porsche was worth about $25,000. Based on the turn-out of previous contests, I guessed Seeq would get at best four other serious entries, which put my odds of winning at one in five. That justified an investment of up to $5,000 worth of my time - about two weeks - enough to go thoroughly overboard on my entry.
And goes on to describe designing and prototyping a "smart lock". He concludes:
It seems my expectations were overly optimistic on two fronts. I'd underestimated the number of competing designs by an order of magnitude, and while my entry's basic concepts and gimmicks were all developed in one evening, it took a week longer than I'd planned to debug the breadboard and document the design.
Even so, I, too, was happy with the results. Some months after the contest ended - on my birthday, by happy coincidence - I got the call. My lock had been judged best-of-show; I'd won the car. The award ceremony - with full media coverage - was one week later.
So, if you notice a shiny, red, no-longer-new Porsche cruising the streets of Silicon Valley, sporting vanity plates "EEPRIZ," you'll be seeing one of the spoils of design contests. Fame and fortune can be yours, too, if you simply apply a little creative effort.
The Asilomar Microcomputer Workshop
AMW started in 1975 with sponsorship from IEEE. David Laws writes in his brief history of the workshop about the first one John attended and spoke at:
The last IEEE-sponsored workshop in 1980 featured a rich program of Silicon Valley nobility. Jim Clark of Stanford University spoke on the geometry engine that kick-started Silicon Graphics. RISC pioneer Dave Patterson of UC Berkeley covered “Single Chip Computers of the Future,” a topic that evolved over the subsequent year and led to his 1981 “The RISC” talk. In 2018 Patterson shared the Turing Award with John Hennessy of Stanford for their work on RISC architecture. Gary Kildall, who both influenced and was influenced by discussions at the workshop, described his PL/I compiler. Designer of the first planar IC and the first MOS IC, Bob Norman talked about applications of VLSI. Carver Mead capped this off with a keynote talk on his design methodology.
John started chairing sessions in 1983, and became Chair of the workshop in 1985, a position he continued to hold through 1997. He was Program Chair from 1999 through 2017. The format, the eclectic content, and the longevity of AMW are all testament to John's work over three decades.
The title of John's 1980 talk was "Microprocessor-controlled carburetion", which presumably had something to do with ...
Engine Control Computers
In Found Technology (4/17/91) John recounted having a problem with his Toyota in Tehachapi, CA and attempting to impress the Master Mechanic:
"In fact, I developed Ford's very first engine computer, back in the '70s." That should impress him, I thought.
He pondered briefly, then asked: "EEC-3 or -4?"
Damn! This guy was good. "I thought it was EEC-1," I began, trying to remember the "electronic engine control" designators. "It was the first time a computer ... "
"Nah, EEC-1 and -2 used discrete parts," he interrupted. "EEC-3 was the first with a microprocessor."
"That was it, then. It had an off-the-shelf 8048."
"You mean you designed EEC-3?" the Master Mechanic asked incredulously. "Hey, George!" he shouted to the guy working under the hood. "When you're done fixing this guy's car, push it out back and torch it! He designed EEC-3!"
So much for impressing the Mechanic. "Huh?" I shot back defensively. "Did EEC-3 have a problem?"
"Reliability, mostly," he replied. "The O2-sensor brackets could break, and the connectors corroded."
I beat a hasty retreat. "Those sound like hardware problems," I said. "All I did was the software."
Stan Mazor recounts the early history of engine control computers:
While an App Engineer at Intel, GM engaged me to help them design a car that used an on board computer to do lots of stuff, even measure the tire pressure of a moving vehicle. Little did I understand at the time that auto companies HATED electronic company's components, and their motive was to prove that computer chips COULD NOT be used. I only learned that late in their project work. When VW announced and demonstrated an on board diagnostic computer, the auto industry (USA) was hugely embarrassed and tried to catch up with VW.
Due to pollution and government mandates, Ford implemented a catalytic converter with a Zirconium Oxide sensor. Their 8048 computer measured the pollution, and servo'd the car's carburetor mixture of fuel and oxygen. (recall ancient cars had a manual choke). John and his associate were Intel app engineers on the project. As I best recall their program used pulse-width duty cycle modulation to control the 'solenoid' controlling fuel mix, (Hint: too lean, too rich, too lean, too lean, etc.)
Now comes the interesting story: Cranking the starter on a cold car injected huge transient voltage into the CPU and scrambled the program counter, so the app could start at any instruction. No, that's not quite the issue.
The 8048 has 1- and 2-byte instructions, so the program counter could end up at any byte, yes, the middle of an instruction, and interpret that byte as an op code, even if it was a numeric constant or half of a jump address!!!
No that's only half the story: It turns out that one of the 8048 instructions is irreversible under program control and the only way out (of mode 2), was to hit the reset line!!!. So the poor guys (John) had to re-write their code to insure that no single byte of object code could be the same as that magical (and unwanted) instruction operation code.
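The defensive rewrite Mazor describes boils down to one check: scan every byte of the assembled object code, not just the opcode positions, for the dangerous value, since a scrambled program counter can land anywhere. Here is a toy sketch in Python; the `0x75` value is a placeholder, not the actual 8048 opcode in question.

```python
# Placeholder for the irreversible opcode; the real value isn't given
# in the story above.
FORBIDDEN_OPCODE = 0x75

def find_forbidden_bytes(object_code):
    """Return the offsets of every byte equal to the forbidden opcode.

    Because the 8048 mixes 1- and 2-byte instructions, a scrambled
    program counter can land on ANY byte, so operands and constants
    must be checked too, not just opcode positions."""
    return [i for i, b in enumerate(object_code) if b == FORBIDDEN_OPCODE]

rom = bytes([0x23, 0x75, 0x04, 0xA4, 0x75])  # toy "firmware" image
print(find_forbidden_bytes(rom))  # -> [1, 4]: bytes needing a code rework
```

Every flagged offset means rewriting the surrounding code (picking a different register, constant, or jump target) until the image is clean, which is presumably what cost John's team their extra week.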
This paper describes the implementation of model vehicles with neural network control systems based on Valentino Braitenberg's thought experiment. The vehicles provide a platform for experimentation with neural network functions in a novel format. Their operation is facilitated by the use of a commercially available neural network simulator to define the network for downloading to the vehicles.
The block diagram shows a vehicle with right and left sensor arrays feeding an 8051 running neural network code from an EPROM. The network definition is downloaded into RAM via a serial link. The 8051 drives right and left motors. Their development environment was MacBrain:
The network model implemented in the current firmware is similar to that used by MacBrain, a neural network simulator for Macintosh computers, ... MacBrain allows the user to create and edit networks on screen, load and save network definitions to disk, and run simulations of networks while observing the changing activation of the various network units.
In "Further Work" they wrote:
One area of potential interest not addressed in initial version of the vehicle's network simulation model was the dynamic alteration of network parameters during operation based on sensory input and network activation. While any of the network's parameters could be changed in this manner, the most likely candidate would be the link weights. The alteration of link weights based on some criteria is a widely used model for "learning" in neural network experimentation and, indeed, is thought to be part of the mechanism for learning and memory in real biological nervous systems.
In 1996, when the Letterman show came to San Francisco, Mr. Wharton made a calculated effort to get noticed. He figured out which seats the camera would be most likely to focus on and made sure that he was seated there. He made himself conspicuous by undoing his ponytail and donning a tie-dyed shirt. He looked “just like the sort of San Francisco hippie” the show's producers would expect to see, he said.
It worked. Mr. Letterman himself strode into the audience, asked Mr. Wharton his name, then asked if he would agree to take a shower in the host's dressing room -- an ongoing gag of Mr. Letterman's. Mr. Wharton happily obliged. The cameras followed Mr. Wharton, from his torso up, as he disrobed, stepped into the shower and lathered up. On his way back into the audience, clad in a white bathrobe, he managed to snatch a copy of the script.
Mr. Wharton derives much of his satisfaction from the thrill of the puzzle itself, like the inner workings of the Furby. Apart from Dave Hampton, the inventor and engineer who created Furby and whom Mr. Wharton reveres, Mr. Wharton may understand Furby's innards better than anyone else. Although he and Mr. Hampton are acquainted, Mr. Wharton would never think to ask Mr. Hampton for a road map.
“That would be cheating,” he said. “It would be like asking the guy who wrote the crossword puzzle for the answers.”
Dave Hampton was a frequent attendee at AMW. One year he bought a large box full of Furbies as gifts for the attendees. At John's instigation, a few of us turned them all on, replaced them carefully in the box, carried the box carefully into the room where the meeting was underway, and shook the box. Whereupon they all woke up and started to talk to their friends. The sound of about a hundred Furbies in full chatter has to be heard to be believed.
The chapter included the story of how Tim Paterson's QDOS (Quick and Dirty Operating System), reverse-engineered from CP/M, became Microsoft's 86-DOS, which led Paterson to file:
a defamation lawsuit, Paterson v Little, Brown & Co, against Sir Harry in Seattle, claiming the book's assertions had caused him "great pain and mental anguish". The court heard the detailed API evidence, and rejected Paterson's suit in 2007. US federal Judge Thomas Zilly observed that Evans' description of Paterson's software as a "rip-off" was negative, but not necessarily defamatory, and said the technical evidence justified Sir Harry's characterisation of QDOS as a "rip-off".
Much of the technical evidence came from John, who was tasked at Intel with evaluating 86-DOS and showed that it was only a partial clone of CP/M, as described in the image of a letter to Microprocessor Report in 1994. The image comes from John's obituary by Andrew Orlowski in The Register.
Mark Olson provides this annotated scan of page 9-102 of the 1983 edition of Intel's Microcontroller Handbook in which it was included (it is page 46 of the PDF linked from the reference).
Update 3: Memorial Web Site
Michael Takamoto has set up a memorial Web site for John, from which I obtained the pictures of John's 944, alas before the vanity license plate, and of the newspaper article about it. The site includes links to two other appearances John made in the New York Times:
As I noted in yesterday’s post, under US copyright law we sometimes get early works by long-lived authors in our public domain earlier than most other countries do. But we also often have to wait longer for authors’ late works. Today I consider the case of Sir Arthur Conan Doyle, and his most famous creation, the ingenious detective Sherlock Holmes.
It didn’t stick, though. Public demand for more Holmes stories led the author to publish another Holmes novel in 1901 (but set before his plunge over the Reichenbach Falls), The Hound of the Baskervilles. By 1903, Conan Doyle gave in and brought Holmes back to life in “The Adventure of the Empty House”, one of the more famous early examples of retroactive continuity now common in comics and other long-running entertainment series. Conan Doyle would continue to publish Holmes stories, at a somewhat slower pace than before, through 1927, the same year the last ones were collected in The Case-Book of Sherlock Holmes.
Conan Doyle died in 1930, so his published works entered the public domain in the UK and many other countries in 1981, after 50 years had passed from his death. In 2001, they entered the public domain under “life + 70 years” regimes (which the UK and many other countries had adopted in the interval). But while early Holmes stories have been in the public domain in the United States for a long time, his last stories are still under copyright here. Their copyright terms start with their first publication, though, and the last stories were published in magazines before they were published in the Case-Book. In particular, “The Adventure of the Creeping Man” was published in the March 1923 issues of both the Strand Magazine and Hearst’s International. The author’s children renewed the copyright to the story in 1950, and copyrights were also routinely renewed for Hearst’s issues. US copyrights for the story, and for the other content in the 1923 issues of Hearst’s, will end 28 days from today.
The character of Sherlock Holmes was well-established in his early stories, and many Holmes fans consider the later Case-Book stories relatively minor. That hasn’t stopped the Conan Doyle estate from claiming proprietary rights over Holmes stories generally, though, based on its US copyrights to the last few stories. When the estate tried to block an anthology of new Holmes stories that Leslie Klinger planned to publish, unless he got permission and paid a fee, Klinger sued for a declaratory judgment that he did not need any such permission. In 2014, the 7th Circuit agreed with him, noting that Klinger was not proposing to reuse elements introduced in the late stories, and that “the alterations [in those stories] do not revive the expired copyrights on the original characters”. In a later ruling, the same court awarded Klinger attorney’s fees from the estate, calling their practices “a form of extortion” and saying “It’s time the estate, in its own self-interest, changed its business model”.
A more detailed legal analysis of the case can be found in Jessica L. Malekos Smith’s article “Sherlock Holmes & the Case of the Contested Copyright” (2016). That article refers to the initial ruling I’ve linked above under the citation “755 F.3d 496”. That notation, cryptic as it may appear to laypeople, refers to a commonly cited legal source that has had its own copyright controversies, and that also includes material that will soon be added to the public domain. I’ll talk about it more in tomorrow’s calendar entry.
DuraSpace and the University of Toronto Libraries in collaboration with Scholars Portal and COPPUL (Council of Prairie and Pacific University Libraries) are pleased to announce a new joint project “DuraCloud Canada: Linking Data Repositories to Preservation Storage” funded by CANARIE, a vital component of Canada’s digital infrastructure supporting research, education and innovation.
The purpose of the proposed project is to connect Canadian research data repositories to preservation storage services through a common deposit layer based on DuraCloud, open source software from DuraSpace.
DuraCloud is a package of software components — some server based and others web and desktop based — that provides brokering services for cloud-based storage as well as a set of deposit tools and APIs that standardize the way users interact with cloud storage providers.
These software components are made available under open source licenses by DuraSpace and have been used to set up national services such as DuraCloud in the US and DuraCloud Europe. Using the same model and a similar deployment approach, the project will create a service called DuraCloud Canada with the goal of connecting preservation storage services to data repositories and bridging the current gap that exists between both.
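The brokering idea described above can be sketched in a few lines: clients deposit content through a single interface, and the broker fans it out to whichever storage providers sit behind it. This is purely an illustration of the pattern under assumed names; none of the classes or methods below are DuraCloud's actual API.

```python
# Illustration of the storage-brokering pattern: one deposit layer in
# front of multiple storage providers. All names here are hypothetical,
# not DuraCloud's real API.
class InMemoryProvider:
    """Stands in for a cloud storage backend."""
    def __init__(self, name):
        self.name = name
        self.objects = {}

    def store(self, key, data):
        self.objects[key] = data


class StorageBroker:
    """Single deposit interface that replicates content to every provider."""
    def __init__(self, providers):
        self.providers = providers

    def deposit(self, key, data):
        for provider in self.providers:
            provider.store(key, data)


primary = InMemoryProvider("cloud-a")
replica = InMemoryProvider("cloud-b")
broker = StorageBroker([primary, replica])

broker.deposit("dataset-001", b"research data bytes")
print(primary.objects["dataset-001"] == replica.objects["dataset-001"])  # True
```

The point of the indirection is that repositories integrate once, against the deposit layer, rather than once per storage provider.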
The proposed project will make DuraCloud available for research data preservation in Canada by contributing to the DuraCloud open source codebase in order to facilitate the integration of existing data repositories. In doing so, it will fulfill multiple goals relating to the preservation of research data in Canada. It will provide an interoperability layer between different preservation storage providers, expose a relatively easy-to-use API that data repositories may use to integrate with preservation storage options, and expose the set of pre-existing integrations that are already part of the DuraCloud system.
Work on this project will begin in November of 2018 and continue into 2020. Initial work will make use of DuraCloud within the Amazon Web Services (AWS) cloud environment by adding storage integrations to existing Canadian data repositories. Once these connectors are in place, development will continue with the goal of allowing the DuraCloud software to be run within the University of Toronto data center rather than in AWS.
CANARIE strengthens Canadian leadership in science and technology by delivering digital infrastructure that supports world-class research and innovation.
CANARIE and its twelve provincial and territorial partners form Canada’s National Research and Education Network. This ultra-high-speed network connects Canada’s researchers, educators and innovators to each other and to global data, technology, and colleagues.
Beyond the network, CANARIE funds and promotes reusable research software tools and national research data management initiatives to accelerate discovery, provides identity management services to the academic community, and offers advanced networking and cloud resources to boost commercialization in Canada’s technology sector.
Established in 1993, CANARIE is a non-profit corporation, with the majority of its funding provided by the Government of Canada.
For more information, please visit: www.canarie.ca
DuraSpace, an independent 501(c)(3) not-for-profit organization, provides leadership and innovation for open technologies that promote durable, persistent access to digital data. We collaborate with academic, scientific, cultural, technology, and research communities by supporting projects and advancing services to help ensure that current and future generations have access to our collective digital heritage. Our vision is expressed in our organizational byline, “Working together to provide enduring access to the world’s digital heritage.” DuraSpace is the organizational home of the DuraCloud open source software.
About The University of Toronto Libraries
The University of Toronto Libraries system is the largest academic library in Canada and is ranked sixth among peer institutions in North America. The system consists of 42 libraries located on three university campuses: St. George, Mississauga, and Scarborough. This array of college libraries, special collections, and specialized libraries and information centres supports the teaching and research requirements of over 280 graduate programs, more than 60 professional programs, and about 700 undergraduate degree programs. In addition to more than 15 million volumes in 341 languages, the library system currently provides access to millions of electronic resources in various forms and over 31,000 linear metres of archival material. More than 150,000 new print volumes are acquired each year. The Libraries’ data centre houses more than 500 servers with a storage capacity of 1.5 petabytes.
Scholars Portal was formed in 2002 as a service of the Ontario Council of University Libraries (OCUL) with the University of Toronto as service provider. The Scholars Portal technological infrastructure preserves and provides access to information resources collected and shared by Ontario’s 21 university libraries. Through the Scholars Portal online services, Ontario’s university students, faculty and researchers have access to an extensive and varied collection of e-journals, e-books, social science data sets, geo reference data and geospatial sets. Scholars Portal continues to respond to the research needs of Ontario universities through the creation of innovative information services and by working to ensure access to and preservation of this wealth of information.
Last year's series of posts and PNC keynote entitled The Amnesiac Civilization were about the threats to our cultural heritage from inadequate funding of Web archives, and the important content that, as a result, is never preserved. But content that Web archives do collect and preserve is also under a threat that can be described as selective amnesia. David Bixenspan's When the Internet Archive Forgets makes the important, but often overlooked, point that the Internet Archive isn't an elephant:
On the internet, there are certain institutions we have come to rely on daily to keep truth from becoming nebulous or elastic. Not necessarily in the way that something stupid like Verrit aspired to, but at least in confirming that you aren’t losing your mind, that an old post or article you remember reading did, in fact, actually exist. It can be as fleeting as using Google Cache to grab a quickly deleted tweet, but it can also be as involved as doing a deep dive of a now-dead site’s archive via the Wayback Machine. But what happens when an archive becomes less reliable, and arguably has legitimate reasons to bow to pressure and remove controversial archived material? ... Over the last few years, there has been a change in how the Wayback Machine is viewed, one inspired by the general political mood. What had long been a useful tool when you came across broken links online is now, more than ever before, seen as an arbiter of the truth and a bulwark against erasing history.
Below the fold, some commentary on the vulnerability of Web history to censorship. Bixenspan discusses, with examples, the two main techniques for censoring the Wayback Machine:
That archive sites are trusted to show the digital trail and origin of content is not just a must-use tool for journalists, but effective for just about anyone trying to track down vanishing web pages. With that in mind, that the Internet Archive doesn’t really fight takedown requests becomes a problem. That’s not the only recourse: When a site admin elects to block the Wayback crawler using a robots.txt file, the crawling doesn’t just stop. Instead, the Wayback Machine’s entire history of a given site is removed from public view.
The ability of anyone claiming to own the copyright on some content to issue a takedown notice under the DMCA in the US (and corresponding legislation elsewhere) is a problem for the Web in general, not just for archives. It is a problem that the copyright industries are constantly pushing to make worse. In almost no case is there a penalty for false claims of copyright ownership, which are frequently made by automated systems prone to false positives. The onus is on the recipient of the takedown to show that the claim is false. Given that in most cases copyright ownership is never registered, and that even when it is the registration may be fraudulent or out of date, this poses a nearly impossible barrier to contesting claims:
if someone were to sue over non-compliance with a DMCA takedown request, even with a ready-made, valid defense in the Archive’s pocket, copyright litigation is still incredibly expensive. It doesn’t matter that the use is not really a violation by any metric. If a rightsholder makes the effort, you still have to defend the lawsuit.
The Internet Archive's policy with respect to takedowns was based on a 2001 meeting which resulted in the Oakland Archive Policy. It is being reviewed, at least partly because the Internet Archive's exposure to possible litigation is now so much greater.
The fundamental problem here is that, lacking both a registry of copyright ownership, and any effective penalty for false claims of ownership, archives have to accept all but the most blatantly false claims, making it all too easy for their contents to be censored.
Mark Graham of the Internet Archive has described a change in the Wayback Machine's robots.txt policy:
A few months ago we stopped referring to robots.txt files on U.S. government and military web sites for both crawling and displaying web pages (though we respond to removal requests sent to firstname.lastname@example.org). As we have moved towards broader access it has not caused problems, which we take as a good sign. We are now looking to do this more broadly.
We see the future of web archiving relying less on robots.txt file declarations geared toward search engines, and more on representing the web as it really was, and is, from a user’s perspective.
Robots.txt files affect two stages of Web archiving. As regards collection, archival crawlers should clearly respect entries specifically excluding them. But, as Graham points out, most robots.txt files are designed for Search Engine Optimization, and don't clearly express any preference about archiving.
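The distinction matters because robots.txt rules are scoped per user-agent: a site can welcome search engines while excluding an archival crawler by name. A short sketch using Python's standard-library parser; the rules shown are hypothetical, and "ia_archiver" is the user-agent historically associated with the Internet Archive's crawler:

```python
# Demonstrates how robots.txt directives target specific user-agents:
# the archival crawler is excluded entirely, while other crawlers are
# only barred from one path. The rules themselves are made up.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The archival crawler is blocked from everything...
print(parser.can_fetch("ia_archiver", "https://example.com/article.html"))  # False
# ...while an ordinary search crawler is only blocked from /private/.
print(parser.can_fetch("Googlebot", "https://example.com/article.html"))    # True
print(parser.can_fetch("Googlebot", "https://example.com/private/x.html"))  # False
```

A robots.txt with only SEO-oriented `User-agent: *` rules, as most are, simply never states a preference about archiving one way or the other.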
A particular concern regarding dissemination is the policy of automatically preventing access to past crawls of a website that took place while its robots.txt permitted them, because its current robots.txt forbids them, perhaps because the domain name has changed hands, with the new owner having no ownership of the past content. This seems wrong. Web site owners wishing to exclude past permitted crawls should have to request exclusion, and show ownership of the past content they are asking to have redacted.
Clearly, one way to deal with the selective amnesia problem is LOCKSS, Lots Of Copies Keep Stuff Safe, especially when they are in diverse jurisdictions and have different policies about takedowns and robots.txt. Nearly two decades ago, in the very first paper about the LOCKSS system, we wrote (my emphasis):
Librarians' technique for preserving access to material published on paper has been honed over the years since 415 AD, when much of the world's literature was lost in the destruction of the Library of Alexandria. Their method may be summarized as:
Acquire lots of copies. Scatter them around the world so that it is easy to find some of them and hard to find all of them. Lend or copy your copies when other librarians need them.
But, alas, this doesn't work as well for Web archives:
A comprehensive Web archive is so large (the Internet Archive has around 20PB of Web history) that maintaining two copies in California is a major expense. The effort to establish a third copy in Canada awaits funding. Three isn't "lots".
The copyright industries have assiduously tried to align other countries' legislation with the DMCA, so the scope for different policies is limited.
The rise of OCLC's WorldCat, which aggregates library catalogs in the interests of informing readers of accessible copies, has made finding all of the paper copies in libraries much easier. Similarly, the advent of Memento (RFC 7089), which aggregates Web archive catalogs in the interests of informing browsers of accessible copies, has made finding all the copies in Web archives much easier. In practice, Memento is much more of a double-edged sword than WorldCat because the paper world has no effective equivalent of DMCA takedowns.
So, especially with the help of Memento, it is easy for malefactors to target all accessible copies of content in the small number of archives that will have them.
Libraries have a special position under the DMCA, and the Internet Archive has always positioned itself as the Web equivalent of a public library in the paper world. This is, for example, the basis of its successful lending program. But:
"Under current copyright law, although there are special provisions that give certain rights to libraries, there is no definition of a library," explained Brandon Butler, the Director of Information Policy for the University of Virginia Library.
And in the one case in point:
"The court didn’t really care that this place called itself a library; it didn’t really shield them from any infringement allegations."
So the Internet Archive and other Web archives may not in practice qualify as libraries in the legal sense.
It would be hard for a copyright owner to argue that a national library, such as the Library of Congress or the British Library, wasn't a library. National libraries typically have special rights supporting their copyright deposit programs, but these rights have to be extended via new legislation to cover Web archives, and these collections are typically inaccessible outside the national library's campus. In any case, many national libraries are under sustained budget pressure, and many of their programs would be difficult without cooperation from copyright holders, so they are in a weak negotiating position.
Jay Colbert (@SpookyColbert) is the current Resident Librarian at the University of Utah’s J. Willard Marriott Library. Their responsibilities include subject liaison work, reference, and library instruction along with metadata and cataloging duties, primarily for the digital library and digital exhibits.
Their research investigates the relationship between language and power in libraries, particularly in subject access and descriptive cataloging. Jay received their MSLIS from the University of Illinois at Urbana-Champaign and their BA in English from the College of William & Mary. They are also active within the ALCTS CaMMS division and GLBT Round Table of ALA, and OLAC. In their free time, they watch too many movies, practice Buddhism and yoga, and hang out with their bearded dragon.
This year, I was selected as one of the DLF New Professionals Fellows. I was beyond thrilled to be accepted and to attend the DLF Forum. Although I focused my graduate coursework on metadata, it was geared more toward traditional bibliographic description than toward online resources. Since I work a lot with our digital library services in my current position, I thought it would be prudent to expand my knowledge of digital libraries, and I am glad the DLF Forum helped me do so.
I attended many types of sessions on varying topics, from system migrations to linked data to privacy. One theme running through most of the sessions that I wish to highlight is that of recognizing our power as librarians over our patrons and even our coworkers. Most sessions I attended touched on this theme, whether as their focus or by implication.
Metadata librarians of all types have a long history of trying to fix the way we describe items so that the language does not replicate oppressive structures. However, simply trying to correct our practices is not enough. As the presentation on indigenous visual culture and subject access in the Decolonizing Vocabularies session stressed, we must work alongside communities and/or consult them when developing our vocabularies and describing items. Through this, we shift our power to those communities.
We do not just hold power over communities. Indeed, as the session on Disrupting Exploitative Labor demonstrated, most of our entry-level positions put new-career librarians at a power disadvantage. This session pointed out how unpaid internships and temporary contracts exploit the labor of those in those positions and contribute to a workplace culture which devalues labor, especially labor done by early-career librarians. I often see job postings for digitization and description that are project based, meaning they end when the project ends. In digital libraries, as this session emphasized, we need to create permanent positions, question and oppose temporary positions, and create workplace atmospheres where new librarians, especially those who belong to marginalized groups, feel welcome.
I feel that the keynote speaker, Anasuya Sengupta, tied this theme of power in digital libraries together. In her keynote, she spoke of the unifying power of digital libraries and how they can empower marginalized peoples. However, they can also reinforce hegemonies; most of the internet (including Wikipedia) is in English.
As digital library workers, it is our job to help make the internet and the information in it accessible to all people. At the 2018 DLF Forum, many sessions stressed this aspect of our job, challenging us to be better. Although our field is a highly technical one, the sessions at the DLF Forum reminded us that we do this for people, and that our work therefore does not exist in a vacuum made of ones and zeros.
If you’d like to get involved with the scholarship committee for the 2019 Forum (October 13-16, 2019 in Tampa, FL), sign up to join the Planning Committee now! More information about 2019 fellowships will be posted in late spring.
In recognition of their many contributions to the community and to the development of Islandora CLAW, the Islandora CLAW committers have asked Rosie Le Faive (UPEI) and Mark Jordan (SFU) to become committers, and we are pleased to announce that they have accepted!
Rosie has brought her dedication to user experience and documentation from the 7.x stack to Islandora CLAW, providing guidance on how to improve the front end and working as the convenor of the UI Interest Group (currently on hiatus) to develop a welcoming first experience for the Islandora CLAW sandbox environment. As co-convenor of the Metadata Interest Group, Rosie has also been integral to the process of plotting out our MODS to RDF mapping so that users of Islandora 7.x can make the move to Islandora CLAW with their MODS in tow.
Mark has joined the Islandora CLAW party more recently, but hit the ground running, developing tools such as RipRap, a fixity-auditing microservice that acts as a successor to Islandora Checksum Checker. Mark's focus on preservation tools fills an important gap in the CLAW ecosystem.
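Fixity auditing of the kind RipRap performs boils down to a simple idea: record a checksum when content is stored, then periodically recompute it and compare. The sketch below shows only that underlying idea, not RipRap's actual design or API:

```python
# Minimal sketch of fixity auditing: a stored digest is recomputed later
# to detect corruption or tampering. Illustrative only; RipRap itself is
# a full microservice, not these few lines.
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 digest of the stored bytes, hex-encoded."""
    return hashlib.sha256(data).hexdigest()

# At ingest time, record the digest alongside the object.
original = b"some preserved content"
recorded = checksum(original)

# During a later audit, recompute and compare.
def fixity_ok(data: bytes, expected: str) -> bool:
    return checksum(data) == expected

print(fixity_ok(original, recorded))             # True: content intact
print(fixity_ok(b"tampered content", recorded))  # False: fixity failure
```

In a real preservation system the audit runs on a schedule and failures trigger repair from a replica rather than just a report.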
Both Rosie and Mark are also Committers on Islandora 7.x, and now join the short list of dual Committers.
Further details of the rights and responsibilities of being an Islandora committer can be found here:
Few authors have had as longstanding an influence on their genres as Agatha Christie. In her more than half-century career writing detective fiction, she established (and then often went on to break) many of the expectations of the English “whodunit” puzzle story. Her novels continue to be adapted for movies and television (recent examples being 2017’s Murder on the Orient Express and 2018’s Ordeal by Innocence), and their fingerprints are all over many crime novels to this day. (For instance, Anthony Horowitz’s best-selling 2016 novel Magpie Murders, which I read while traveling last week, very explicitly both imitates and reacts to Christie’s style.)
Christie died in 1976, less than 50 years ago, so her works are still under copyright in most countries. But her career was long enough that her early work has started to enter the public domain in the United States. That’s because the United States used fixed copyright terms, rather than terms based on the author’s lifetime, until the 1970s. Christie’s first published novel, The Mysterious Affair at Styles, introduced what may be her most famous character, Hercule Poirot. The novel entered the public domain in the United States in 1996. In 1998, it was joined there by The Secret Adversary (1922), which introduced the detective couple Tommy and Tuppence. (Copies of both novels are online.)
1998 was also the year that the US Congress passed the Sonny Bono Copyright Term Extension Act, which extended most copyrights for 20 years and froze the US public domain for published works – until the end of this year. 29 days from now, though, we should see more Christie works entering the public domain here, including The Murder on the Links, in which Hercule Poirot returns to solve a murder mystery in France. The Murder on the Links was published in both the US and the UK in 1923, and its copyright was renewed in the US in 1950. That copyright, like other 1923 copyrights still in force, is set to expire here at the end of this year.
Another English author’s famous detective also made his debut in 1923: Dorothy L. Sayers’ Lord Peter Wimsey. Unlike The Murder on the Links, his debut novel, Whose Body?, will not be entering the US public domain this coming January. But that’s because it’s already in the public domain here (at least in its first edition), and has been since 1951, as I found out some years ago after a little detective work of my own.
How did this happen? Under the copyright system used in the US in 1923, copyrights were initially secured for a 28-year term, and then had to be renewed in their 28th year to remain in effect. A 1923 copyright, then, needed to be renewed in either 1950 or 1951 to get another term. (Initially that term was 28 more years, but was later extended to 47 more years, and further extended to 67 more years with the 1998 copyright term extension noted above). Later changes in copyright law (which I’ll discuss further in later calendar entries) exempted many works first published outside the US from the renewal requirement. However, when reading Barbara Reynolds’ biography Dorothy L. Sayers: Her Life and Soul, I found out that due to publisher delays, Whose Body? had come out first in the United States, before Sayers’ British publisher issued it. So if its copyright wasn’t renewed, it was no longer in effect here.
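The term arithmetic in that parenthetical works out as follows for a 1923 work whose copyright was renewed on time. This is only a sketch of the arithmetic; a work's actual status turns on renewal records, publication history, and later statutes:

```python
# US term arithmetic for a renewed 1923 copyright, as described above.
# Illustrative only: real copyright status depends on many more factors.
publication_year = 1923
initial_term = 28            # first term under the 1909 Copyright Act
renewal_term = 28 + 19 + 20  # renewal term of 28 years, extended to 47
                             # (1976 Act), then to 67 (1998 extension)

# Terms now run through the end of the calendar year, so a renewed 1923
# copyright lasts through 1923 + 28 + 67 = 2018...
last_year_protected = publication_year + initial_term + renewal_term
print(last_year_protected)      # 2018

# ...and the work joins the public domain on January 1 of the next year.
print(last_year_protected + 1)  # 2019
```

An unrenewed 1923 copyright, by contrast, simply expired after its initial 28-year term, which is what happened to the US edition of Whose Body?.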
A search through the US Copyright Office’s Catalog of Copyright Entries confirmed that no renewal had been filed for the book. So I digitized it and contributed it to my wife’s Celebration of Women Writers website, where you can read it today (if you’re in the US, or a country whose copyright terms last for 60 years or less from the author’s death).
In less than a month, American readers will have more mysteries to read online, share, or adapt – and not just from Christie (or Sayers). Tomorrow’s advent calendar entry will feature another famous detective whose stories are all in the public domain almost everywhere in the world, but not quite all there in the US, and note one of the copyright battles that situation sparked.
Lucidworks was recently recognized as a leader in the Gartner Magic Quadrant in their Insight Engines category. I wanted to write down what this means for you — and why it’s so important to your organization.
For those unfamiliar, Gartner is the gold standard of analysts with over 2,000 research experts covering various computing and IT trends and vendor categories. As such, they wield enormous influence with IT buyers in the largest companies in the world.
Insight Engines is the term Gartner uses for search that goes beyond mere keywords. An Insight Engine is search that uses AI and advanced algorithmic techniques to deliver more relevant personalized results to customers and employees alike. In other words, Gartner awards this term to search infrastructure that provides real insights. A modern search engine.
Being Named a Leader
So. What does all this mean? Three big things are top of mind to me:
1. The industry agrees with our vision. We believe that search is the single best way to activate AI in the enterprise. We want to enable people to maximize every single digital moment at work or at play. Whether they are looking for an esoteric topic to help them complete a project or looking through a camping website for the perfect cold-weather tent, we are the technology that helps people make connections to topics, insights, and experts at the exact moment when they can best use it.
2. Our customers believe in our vision. We launched Fusion four years ago by taking the most robust and reliable search technology in the world, Apache Solr, and fusing it with the popular cluster-computing framework Apache Spark. On top of that foundation we added essential enterprise-friendly AI and operational features to the stack so that some of the most influential organizations in the world could rely on our platform to solve their toughest problems.
Since that time, more than 400 of the largest organizations in the world have given us the honor of running their most important workloads on our Fusion platform. These include top 5 global banks, retailers, and energy companies.
3. We’re primed and ready for much more. As you know, search is not just a box to find things; search technology is everywhere you look and on every screen you see. Search has gone way beyond just 10 blue links. AI requires vision and a human focus to be more than hype.
Over the next 18-24 months, we will accelerate the delivery of capabilities for our customers, and humanize AI. This means more user-friendly apps for casual users, power analysts, and service and support agents.
It also means more deployment-friendly tools for system administrators and DevOps professionals, all containerized and available to use. All of these tools and apps must be worry-free on any combination of private and public cloud infrastructures.
Search might be 30 years old, but don’t confuse 30-year old technology with the search of today.
We’re thrilled about what’s to come. Your trust in our products and team continues to keep us motivated to keep innovating with you, as together we bring the power of our platform to more use cases where we can effectively humanize AI. Together we can make the most complex technology easy and accessible for anyone, where they can best make use of it — through each digital moment in their lives.
Today marks the start of the Christian season of Advent in my church, four Sundays before Christmas. It’s an important time of the year for vocal music, as choirs both sacred and secular rehearse and perform Christmas-related music in services and concerts. We like singing public domain classics– those have stood the test of time and are often well-known to our audiences, and we can copy scores cheaply and don’t have to worry about performance rights or royalties. We also like to sing newer copyrighted works– those can be fresh and exciting, and can often be easily found in print and in recordings. Older works still under copyright, though, can often fade into obscurity, even when they’re by creators who were once well-known.
Today’s Public Domain Day advent calendar entry falls into that category. Though Frances McCollin had a long career in Philadelphia, where I’ve lived and sung for nearly 20 years, I don’t recall hearing of her before doing research for this calendar. Blind from an early age, she took a synesthetic approach to music, according to her biography Frances McCollin: Her Life and Music (1990), associating various keys with different colors and moods. She published nearly 100 pieces of music, and her works have been performed by groups that include the Philadelphia Orchestra and the Warsaw Symphony. The library where I work has some of her papers in its special collections; the Free Library of Philadelphia has others. Some of her early public domain music is now freely available online, such as “The Singing Leaves” (1918). And some of it is still performed today. I’ve found online a sample of a recent recording by Canada’s Elektra Women’s Choir of her arrangement of “In the Bleak Midwinter” for women’s voices. (McCollin died in 1960, so most of her published work is in the public domain in Canada. I’m not sure of the US copyright status of this particular composition.)
I had a much harder time, however, finding traces of another Christmas vocal piece she was known for in her day, a children’s chorale setting of “‘Twas the Night Before Christmas”. Based on the famous poem by Clement Clarke Moore (which, published 100 years earlier, had been in the public domain for her to adapt), her cantata was copyrighted in 1923, and renewed in 1950. That copyright persists to this day, even as the piece has faded into obscurity. But 30 days from now, the work will enter the public domain in the US, and anyone interested in it can digitize it, make copies, perform it, or create new works based on it, without having to find a rightsholder to ask permission or pay royalties.
As Public Domain Day approaches, I’m looking forward to seeing familiar works join the public domain in the US, like the perennial international best-seller featured in yesterday’s calendar post. But for every work like that, there are many more works that are less well-known, on all kinds of topics of interest, by people who lived all over, including in one’s own hometown. They may have been largely out of sight for years, but their entry into the public domain gives them a new opportunity to be discovered, enjoyed, and shared by anyone.
If you know of any other works that will be coming into the public domain this January you’d like to share, I’d love to hear about them. I’ll post about another one tomorrow.
“Your children are not your children […]
For they have their own thoughts […]
You are the bows from which your children as living arrows are sent forth.”
The lines above are from one of the more well-known parts of The Prophet, written by Lebanese-American poet and artist Kahlil Gibran. Though they talk about one’s offspring, one could also see them as applying to one’s creations. They may originate from people (parents or authors) who start with great control and influence over them, but eventually they become independent of their origin. To quote another line in the poem, they “dwell in the house of tomorrow, which you cannot visit”.
Since The Prophet was first published in 1923, it has to a large extent been under the control and influence of its author, and then, after his death in 1931, his estate. But over time it has entered the public domain in various countries around the world. It entered the public domain in many countries at the start of 1982, after 50 years had passed from the author’s death. It entered (and in some cases, re-entered) the public domain in many others in 2002, more than 70 years after his death. And finally, 31 days from now, it will enter the public domain in the United States, as part of the first batch of published works to enter the public domain in more than 20 years.
During the month of December, this blog will feature various works from 1923 that will be joining the public domain in the US this coming January 1, Public Domain Day. The Prophet is fairly well-known and still easy to find in print. Many other interesting works from 1923 are not so well-known or easy to find, and I hope to feature a wide variety of works over the next 31 days. (I already have some works planned to feature, but have not yet filled out a full roster; if there are any in particular you’d like to suggest, let me know by commenting here or by contacting me.)
I plan to keep most of the posts short, but I’ll have links to more information on the works and authors featured, and folks are also welcome to discuss them further here in the comments if they wish. Come the new year, I’ll also go back and add live links to online copies of the works when possible. (For now, I’ll just add a link to all online books I know of by Gibran, some of which are already public domain in the US, and some of which aren’t yet but are public domain elsewhere.)
I may also sometimes take the opportunity to point out aspects of copyright relevant to the featured work. For instance, in this post I’ve directly copied material from Gibran’s book (by quoting it) even though I’ve noted that it’s still under copyright where I am. I can do this thanks to the principle of fair use, which allows such copying under certain circumstances. There’s no universal algorithm for determining whether fair use applies in a particular case, but a fair use defense would very likely prevail here if a copyright dispute arose: I’m quoting only a small portion, in a noncommercial educational context, to provide original commentary rather than a substitute for the original work. Once The Prophet enters the public domain here, though, we’ll be able to use larger portions, or the whole work, for a much wider variety of purposes.
I’ll post my next Public Domain Day advent calendar entry tomorrow. I hope you’ll come back then to read it, and welcome your comments.
by the aid of symbolism, we can make transitions in reasoning almost mechanically by the eye which would otherwise call into play the higher faculties of the brain.
…Civilization advances by extending the number of important operations that we can perform without thinking about them. Operations of thought are like cavalry charges in a battle — they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.
One very important property for symbolism to possess is that it should be concise, so as to be visible at one glance of the eye and be rapidly written.
This guest post by Grow with Google Vice President Lisa Gevelber originally appeared in Google’s blog The Keyword.
Since we launched Grow with Google a little over a year ago, we’ve traveled to cities and towns, partnering with local organizations from Kansas to Michigan to South Carolina to bring job skills to job seekers and online savvy to small businesses. No matter where we went, big cities or small towns, libraries were at the heart of these communities.
To support the amazing work of libraries throughout the country, Google and the American Library Association are launching the Libraries Ready to Code website, an online resource for libraries to teach coding and computational thinking to youth. Since we kicked off this collaboration last June, 30 libraries across the U.S. have piloted programs and contributed best practices for a “by libraries, for libraries” hub. Now, the 120,000 libraries across the country can choose the most relevant programs for their communities.
Libraries have long been America’s go-to gathering place for learning. Now more than ever, people are using libraries as resources for professional growth. And libraries are stepping up: 73% of public libraries are making free job and interview support available in their communities.
That’s why starting in January, we’ll also work hand-in-hand with libraries around the country, using technology to help ensure that economic opportunity exists for everyone, everywhere. We’ll bring Grow with Google in-person workshops for job seekers and small businesses, library staff trainings, and ongoing support to libraries in all 50 states.
We’re also announcing a $1M sponsorship of the American Library Association, creating a pool of micro-funds that local libraries can access to bring digital skills training to their communities. An initial group of 250 libraries will receive funding to support coding activities during Computer Science Education Week. Keep an eye out for a call for applications from the ALA as Grow with Google comes to your state.
Google is proud to partner with libraries all over the country to ensure economic opportunities for more Americans.