I started my Access 2018 conference experience with a meetup of library people interested in UX. There were only five of us, but we had good conversations about Research Ethics Boards and UX research, about being a UX team of one, and about some of the projects we’ll be working on in the coming year. We also chatted about how we would like to communicate more regularly but how difficult it can be to sustain virtual communities. (Canada is BIG. Heck, even Ontario is big.) It was nice to start off the conference with UX friends – old and new – and my focus stayed on the UX side of things throughout the conference so that’s what I want to write about here.
On Day 1, the first post-keynote presentation was all about UX. Eka Grguric talked about her experience one year in as UX Librarian at McGill. She gets brought into projects in her library as a UX consultant, and also supports others doing UX and user research in the library. She also offers training on UX research methods for interested library staff. Her work is a combination of operational and project-based. She gave a bit of detail about two projects and her monthly operational tests to give us a flavour of the range of methods and processes she uses.
Next up was Ken Fujiuchi and Joseph Riggie from Buffalo State College, who talked about Extended Reality, a combination of virtual reality, augmented reality, and mixed reality technologies. They covered a few different topics (slides here), but what stood out for me was their mention of how user experiences will change as new interfaces become possible and there are new ways for people to interact with materials. They specifically mentioned oral histories moving from audio-only files to users being able to interact with a holographic image of a person who can tell stories but also answer questions. What’s good UX for oral history holograms?
A few presentations also focused on what I see as UX for library staff. Juan Denzer spoke about a project being developed by a pair of students he’s supervising that aims to make it easier to manage EZproxy configuration files (which can easily run to thousands of lines). Having tried to troubleshoot stanzas in EZproxy myself, I can definitely see how this could improve the UX for staff. However, as one of my table mates said, adding an application to manage a text file also adds overhead for whoever has to maintain and update that application. Trade-offs!
Ruby Warren from University of Manitoba was fantastic in her description of a project that didn’t quite get off the ground in the six months she’d set aside to complete it. Ruby had seen that distance students weren’t learning how to use the library in the same way in-person students were (e.g. no in-class visits from a librarian). She wanted to find a way to teach some basic IL to these students and thought that an interactive fiction game would be a good thing to try. She had some great lessons learned (including “Don’t Do Everything in the Wrong Order” and “Plan for Apathy”). One of my favourite things about Ruby’s presentation was that she was upfront about her failures, including – as a UX person – not planning for user testing during development. It’s gutsy to get up in front of your peers and say that you forgot a basic tenet of your discipline because you were too excited about a project. So human but so hard. Yay Ruby! Another key takeaway was not underestimating appeal when planning this kind of project. As someone who has a hard time seeing the appeal of library games, I appreciated hearing this. (I believe it’s possible, but I think it’s extremely difficult.) Ruby’s slides are here.
Back to UX for staff (and users too, to some extent), Calvin Mah from Simon Fraser University spoke about his experience archiving their ILS when his library moved from Millennium to Alma. Some kinds of information were not migrated at all, but even the records that were migrated were not trusted by cataloguers; they wanted to be able to go back to the old records and compare. With these two situations – missing information plus untrusted information – it was decided to build an archival copy of the old system. I find this interesting. On the one hand, I can absolutely understand wanting to help staff feel comfortable with the new system by letting them know they still have the old information if they need it; the transition can be more gradual. But Calvin noted that even though the information is getting stale, staff are still relying on it. So perhaps it’s more of a security blanket, and that’s not good. Also, there was a good library nerd laugh when he said that some staff wanted the archival copy to behave like the old system: “Respect the 2nd indicator non-filing characters skip!”
Something I see as having both staff and user UX implications is having contract work in library systems (probably everywhere, but in systems for sure). Bobbi Fox from Harvard has been on many sides of this situation (as a contractor, as a person hiring the contractor, as a team member, as a person cleaning up after a contractor) and detailed many things to consider before, during, and after contract work in library IT. Too often, contract work results in projects that are difficult to maintain after the contractor has gone, if they are even completed at all. I really like that she specifically mentioned thinking about who is providing user support for the thing(s) the contractor is building, as separate from who is going to own/maintain the project going forward. And, in talking about documentation, she specified what documentation those user support people need in order to be able to support the users. This will almost always be different documentation than what is required for maintenance. Good docs are vital for maintenance but if people can’t use the thing, there’s not much point in maintaining it!
Nearing the end of the first day was a panel: “When the Digital Divides Us: Reconciling Emerging and Emerged Technologies in Libraries” that looked at disconnects that can happen on both the staff side and the user side when libraries favour emerging (“shiny”) technology. I thought there were some great points made. Monica Rettig at Brock University talked about issues when access services staff are expected to help troubleshoot technology problems; for staff used to a transactional approach to service, with a heavy reliance on policy and procedures, there is a big cultural shift in moving to a troubleshooting approach. Rebecca Laroque from North Bay Public Library wondered about providing 3D printers while she still has users asking for classes on how to use email. Monica noted the importance of core services to users even though they aren’t shiny or new; she asked who will be the champion for bathrooms or printers in the library. Krista Godfrey from Memorial University asked whether library technology should be evaluated and assessed in the same way that library collections are. Lots of questions here, but definitely an agreement that a focus on core infrastructure and services may not be exciting but it’s absolutely vital.
Day 2 was a bit lighter on the UX side. Tim Ribaric gave a great presentation on RA21 and the possible implications of it replacing IP authentication for access to electronic resources in libraries. Tim is skeptical about RA21 and believes it is not good news for libraries (one of his theorems about RA21: “We are effed”). His take was very compelling, and from a UX perspective, he is not convinced there is a clear way forward for walk-in users of academic libraries (i.e. users not affiliated with the university or college) to access our subscription-based electronic resources if we move from IP authentication to RA21. I know some academic libraries explicitly exclude walk-in users, but others are mandated to provide access to the general public so we are used to providing guest access and our users are used to having it. Tim has posted his slides if you’re interested in more on this.
Another interesting UX moment was in Autumn Mayes’ lightning talk about working in Digital Scholarship and Digital Humanities. Part of her job had been working in The Humanities Interdisciplinary Collaboration (THINC) Lab at the University of Guelph. THINC Lab is a members-only space aimed at grad students, postdocs, faculty, etc. who are doing interdisciplinary and digital humanities research. However, they also host events and programs that are open to the larger university population. So Autumn found herself having to tell non-members that they weren’t allowed to use the space, but at the same time was trying to promote events and programs to both members and non-members. She very succinctly described this as “Get out! But come back!” It’s interesting to think about spaces that are alternately exclusionary and open; what is the impact on users when you make a mostly exclusionary space occasionally welcoming? What about when a mostly welcoming space is occasionally exclusionary?
Bill Jones and Ben Rawlins from SUNY Geneseo spoke about their tool OASIS (Openly Available Sources Integrated Search), aimed at improving the discovery of Open Educational Resources (OER) for faculty at their campus and beyond. The tool allows searching and browsing of a curated collection of OER (currently over 160,000 records). It seems like a nice way to increase visibility and improve the UX of finding OER such as open textbooks.
Again in library staff UX, May Yan and MJ Suhonos from Ryerson University talked about how library-specific technologies can be difficult to use and adapt, so they decided to use WordPress as a web platform for a records management project in their library. One thing I found interesting was that the Ryerson library had a Strategic Systems Requirements Review that explicitly says that unless library-specific technology has a big value-add, the preference should be to go outside library technology for solutions. From a UX point of view, this could mean that staff spend less time fighting with clunky library software, both using it and maintaining it.
The last conference presentation of Day 2 reported on the results of UX testing of Open Badges in an institutional repository. Christie Hurrell from the University of Calgary reported that her institution uses quite a number of Open Badges. For this project, the team wondered whether having an Open Badge that demonstrated compliance with an Open Access policy would encourage faculty to deposit their work in the institutional repository. They did a survey, which didn’t show a lot of love for Open Badges in general. Then they did some user testing of their IR (DSpace), to find out whether faculty would add an Open Badge to their work if the option was there. Unfortunately, the option to add an Open Badge was completely lost in the overall process to deposit a work in the IR, which faculty found extremely time-consuming. Since faculty were frustrated with the process in general, it is very unlikely that an Open Badge would provide an incentive to use the IR again.
The conference ended with the Dave Binkley Memorial Lecture, given this year by Monique Woroniak. Monique spoke about “Doing the Work: Settler Libraries and Responsibilities in a Time of Occupation” where the Work is what non-Indigenous people and organizations need to do before trying to work with Indigenous people and organizations. She gave some clear guidelines on, essentially, how to act with empathy and these guidelines can apply to many communities. However, I definitely don’t want to “all lives matter” this. Monique was clearly speaking about Indigenous people, and specifically about her experiences with Indigenous people in Winnipeg. When she spoke of the importance of assessing our capacity before undertaking new work, she included the capacity to build respectful relationships with Indigenous people. Although it can definitely be argued that a capacity to build respectful relationships is useful for UX work, her caution to never over-promise and under-deliver when working with Indigenous people is situated in the Canadian context of settlers over-promising and under-delivering time and time and time again. Sure, we’ll respect this treaty. Sure, we’ll take care of your children. Of course we’re ready for reconciliation. Over-promising and under-delivering is never a great move, but in this context it is particularly toxic. A few other things that stood out for me in Monique’s talk:
Listen to the breadth of opinions in the community. Take the time.
This is head work and heart work, and, especially, long-haul work.
Look to shift the centre of power for not just the big decisions, but the small as well.
The award recognizes superior student writing and is intended to enhance the professional development of students. The manuscript can be written on any aspect of libraries and information technology. Examples include, but are not limited to, digital libraries, metadata, authorization and authentication, electronic journals and electronic publishing, open source software, distributed systems and networks, computer security, intellectual property rights, technical standards, desktop applications, online catalogs and bibliographic systems, universal access to technology, and library consortia.
We’re excited to announce the availability of Zotero integration with Google Docs, joining Zotero’s existing support for Microsoft Word and LibreOffice.
The same powerful functionality that Zotero has long offered for traditional word processors is now available for Google Docs. You can quickly search for items in your Zotero library, add page numbers and other details, and insert citations. When you’re done, a single click inserts a formatted bibliography based on the citations in your document. Zotero supports complex style requirements such as Ibid. and name disambiguation, and it keeps your citations and bibliography updated as you make changes to items in your library. If you need to switch citation styles, you can easily reformat your entire document in any of the over 9,000 citation styles that Zotero supports.
Google Docs support is part of the Zotero Connector for Chrome and Firefox, which adds a new Zotero menu to the Google Docs interface:
It also adds a toolbar button for one-click citing:
When you start using Zotero in a document, you’ll first need to authenticate it with your Google account. You can then begin inserting citations from the Zotero libraries on your computer, just as you can with Word and LibreOffice.
Once you’ve finished your document and are ready to submit it, use File → “Make a copy…” and, in the new document, use Zotero → “Unlink Citations” to convert the citations and bibliography to plain text. You can then download that second document as a PDF or other type of file, while keeping active citations in the original document in case you need to make further changes. Zotero will prompt you to create a copy if you try to download your original document.
Built for Collaboration
Zotero and Google Docs are a perfect combination for people writing together. Zotero groups are a great way to collect and manage materials for a shared project, and Google Docs integration allows you and your coauthors to insert and edit citations in a shared document. Groups are free and can contain an unlimited number of members, so you can collaborate with as many people as you like.
While citing from the same library allows everyone to make changes to items in Zotero and have them reflected in the document, if you don’t want to work from a group, that’s fine too: Zotero can generate correct citations and bibliography entries even for items people add from their own libraries.
Ready to try it out? Open a document in Google Docs and look for the Zotero menu. If you don’t see it, make sure you have Zotero Connector 5.0.42 for Chrome or Firefox.
See our documentation to learn more about using Zotero with Google Docs.
I have been running the Rubyland.news aggregator for two years now, as just a spare-time hobby project. Because I wanted a ruby blog and news aggregator, and wasn’t happy with what was out there then, and thought it would be good for the community to have it.
I am not planning or trying to make money from it, but it does have some modest monthly infrastructure fees that I like getting covered. So I’m happy to report that Ruby Magic has agreed to sponsor Rubyland.news for a modest $20/month for six months.
Ruby Magic is an email list you can sign up for to get occasional emails about ruby. They also have an RSS feed, so I’ve been able to include them on Rubyland.news for some time. I find their articles to often be useful introductions or refreshers to particular topics about ruby language fundamentals. (It tends not to be about Rails, and I know some people appreciate some non-Rails-focused sources of ruby info). Personally, I’ve been using ruby for years, and the way I got as comfortable with it as I am is by always asking “wait, how does that work then?” about things I run into, always being curious about what’s going on and what the alternatives are and what tools are available, starting with the ruby language itself and its stdlib.
These days, blogging, on a platform with an RSS feed too, seems to have become a somewhat rarer thing, so I’m also grateful that Ruby Magic articles are available through RSS feed, so I can include them in rubyland.news. And of course for the modest sponsorship of Rubyland.news, helping to pay infrastructure costs to keep the lights on. As always, I value full transparency in any sponsorship of rubyland.news; I don’t intend it to affect any editorial policies (I was including the Ruby Magic feed already); but I will continue to be fully transparent about any sponsorship arrangements and values, so you can judge for yourself (a modest $20/month from Ruby Magic; no commitment beyond a listing on the About page, and this particular post you are reading now, which is effectively a sponsored post).
I also just realized I am two years into Rubyland.news. I don’t keep usage analytics (I was too lazy to set it up, and it’s not entirely clear how to do that when people might be consuming it as an RSS feed itself), although it’s got 156 followers on its twitter feed (all aggregated content is also syndicated to twitter, which I thought was a neat feature). I’m honestly not sure how useful it is to anyone other than me, or what changes people might want; feedback is welcome!
Over 100 people from over 20 countries took part in the founding meeting of the MyData Global nonprofit organisation last Thursday, October 11. The purpose of MyData Global is to empower individuals by improving their right to self-determination regarding their personal data. The human-centric paradigm is aimed at a fair, sustainable, and prosperous digital society, where the sharing of personal data is based on trust as well as a balanced and fair relationship between individuals and organisations.
“We need new ground rules for the use of personal data. We currently live in a world where large companies collect unprecedented amounts of data about people and do as they please with it. This has led to numerous abuses with shocking global consequences. The MyData model is a vision for new and fair practices, design principles, and their implementation. Founding the MyData Global organisation is a huge step in the right direction,” explains MyData researcher and founding member Antti Jogi Poikola.
A milestone reached
Establishing the organisation is the result of several years’ work. Since 2016, the MyData movement has gathered personal data experts and practitioners from all over the globe at its annual conferences in Helsinki, Finland. The movement has self-organised into a network of over 20 local hubs spread over six continents, which all work together to further the cause of digital human rights in different domains of society. The MyData Global organisation formalises this network and continues the work of influencing the development of digital markets to better respect the rights of individuals.
“The time is now ripe for an organisation that seeks to enable a fairer and more balanced digital society globally. Personal data has enormous potential for making our lives easier and our societies better. Used in a way that is respectful of individuals and the standards of fairness, personal data also creates limitless opportunities for successful business. A fair trade logo on a packet of coffee is a familiar and trusted guarantee that the coffee is responsibly sourced and also a reason to favour it. Why do we not ask for the same kind of guarantee that the applications and services we use are based only on personal data that is responsibly and transparently acquired and treated with respect?” asks MyData activist and founding member Viivi Lähteenoja.
The first general meeting of MyData Global will be held on 15 November 2018 in Barcelona, Spain. During the meeting, a full board of directors will be elected. The meeting is open to all and remote participation is available. Application for membership is now open to individuals and organisations.
The theme for the 2018 LITA Library Technology Forum is Building & Leading. What are you most passionate about in librarianship and information technology? This conference is your chance to share how your passions are building the future and learn how to expand them further.
This year we’ve crafted a set of programs that diverge from the normal library conference model. Instead of only hour-long blocks, with experts on stages and slides at their back, we have planned a more interactive, hands-on-focused conference with more choices and different ways for people to interact. This mix of opportunities will challenge participants in what we hope are new and different ways.
Dr. AnnMarie Thomas and the Playful Learning Lab from the University of St. Thomas will be offering two workshops:
OK Go Sandbox: Using the videos of OK Go to engage learners with art, music, and STEM. OK Go Sandbox is a new website for educators which has content related to OK Go’s music videos. New videos, educators’ guides, and hands-on activities allow learners to explore sound, sensors, art, math, and more. (Al Gore even thinks it’s a great idea!) During this session we’ll jump into the sandbox and try out some of the activities and discuss ways that they can be used in libraries. Check out their videos!
Playtime: Not Just for Kids
The Playful Learning Lab at the University of St. Thomas is an interdisciplinary research group that works at the intersection of art, technology, and education. Whether developing Squishy Circuits, helping chefs create interactive pastry, developing a Circus Science curriculum, or helping the band OK Go build a new platform for educators, collaboration is at the heart of our work. In this session, we will look at how “play” is at the heart of all that we do, and why you should consider a playful approach for your own organization. Check out AnnMarie’s TED Talk!
Matthew Battles and Jessica Yurkofsky from Harvard University will be offering a workshop on:
MetaLab – Library Kitchen
Wifi-proof booths; study carrels for napping; digital campfires for charging devices convivially—in the “Library Test Kitchen” seminar at the Graduate School of Design, metaLAB (at) Harvard has been exploring participatory innovation for libraries through fun, creative, improvisatory projects. And what’s a test kitchen without recipes? In this session, members of metaLAB will be on hand to demo their new platform for sharing such “recipes” for playful innovation in libraries—and to invite participants to contribute recipes of their own.
[This blog entry is written to accompany the release of University Futures, Library Futures: Aligning library strategies with institutional directions. This is a collaboration between Ithaka S+R and OCLC Research, and is supported by the Andrew W. Mellon Foundation. There is a companion blog entry by Deanna Marcum and Roger Schonfeld. The report looks at …]
[IDC] forecasts that the amount of data generated globally will reach 44 zettabytes (ZBs) in 2020 and 163 ZBs in 2025. Even the estimates are increasing, as earlier it was forecast to be 35 ZBs in 2020 instead of 44.
And on Seagate's marketing materials based upon it:
Seagate ... subscribes to IDC’s estimate that around 13 ZBs of 44 ZBs generated in 2020 would be critical and should be stored. ... Seagate also anticipates that the storage capacity available in 2020 will not be able to fulfill the minimum required storage demand, and will lead to a data-capacity gap of at least 6 ZBs
Unfortunately, Bhat does not cite or seem to have read my 2016 post Where Did All Those Bits Go? in which I point out a number of flaws in IDC's reports, and in the analyses such as Seagate's based on them. The most important of these flaws is the implicit assumption that the demand for storage is independent of the price of storage:
Seagate ... subscribes to IDC’s estimate that around 13 ZBs of 44 ZBs generated in 2020 would be critical and should be stored. ... the storage capacity available in 2020 will not be able to fulfill the minimum required storage demand, and will lead to a data-capacity gap of at least 6 ZB
Note the lack of any concept of the price of storing the 13ZB. Since it is evident that neither IDC nor Seagate nor Bhat believe that the 6ZB of additional media "required" would be available at any price, something has to give. But what?
In practice big, and indeed any, data storage user compares a prediction of the value to be realized by storing the data with the cost of doing so. Data whose potential value does not justify its storage doesn't get stored, which is what will happen to the 6ZB.
IDC, Seagate and Bhat suffer from the collision of two ideas, both of which are wrong:
The "Big Data" hype, which is that the value of keeping everything is almost infinite.
The "storage is free" idea left over from the long-gone days of 40+% Kryder rates.
If storage is free and the value to be extracted from stored data is non-zero, of course the extra 6ZB "should be stored". But:
Storage is not free, and the value to be extracted from some data will be a lot more than from others. The more valuable data is more likely to be stored than the less valuable. Typically, the value of data decays with time, in most cases quite rapidly. Another flaw in the IDC analysis is that there is no concept of how long the data is to be stored, and thus how quickly the media storing it can be re-used. Again, the more valuable data is likely to be stored longer than the less valuable.
Thus the gap to which Bhat refers is between what data centers would store if storage were free, and what they will store given the actual cost of storing it. This gap would only be something new or unexpected if storage were free. This hasn't been the perception, let alone the reality, since very early in the history of Big Data centers.
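The store-or-discard comparison described above can be made concrete with a small sketch. The decay rate, storage price, and dataset values below are illustrative assumptions, not figures from IDC, Seagate, or Bhat:

```python
# Decide whether a dataset is worth storing by comparing its expected
# value (which decays over time) against the cost of keeping it.

def expected_value(initial_value, decay_rate, years):
    """Total value extracted over the retention period, with the
    dataset's value decaying each year (illustrative model)."""
    return sum(initial_value * (1 - decay_rate) ** y for y in range(years))

def storage_cost(size_tb, price_per_tb_year, years):
    """Cost of keeping size_tb terabytes for the retention period."""
    return size_tb * price_per_tb_year * years

def should_store(initial_value, decay_rate, size_tb, price_per_tb_year, years):
    """Store only if the expected value exceeds the cost of storing."""
    return expected_value(initial_value, decay_rate, years) > \
        storage_cost(size_tb, price_per_tb_year, years)

# A large dataset whose value decays quickly fails the test, even though
# "Big Data" thinking says everything should be kept:
print(should_store(initial_value=100, decay_rate=0.8, size_tb=50,
                   price_per_tb_year=10, years=5))  # prints False
```

Once a cost term appears in the decision, the "data-capacity gap" simply describes data whose value didn't justify storing it.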
While continuing to add the finishing touches to the Watch Folders – I realized I needed to provide a couple of additional contextual notes.
Watch Folder Behavior
C# provides a couple of different methods to make this work. Probably the most straightforward is the FileSystemWatcher class, which raises events for changes in a specific path. This is how I originally considered implementing this functionality – but it caused some very specific problems.
First – I would like users to keep access to their original source data. In the process, the tool will move data from the watched folder to: [watched_folder]\_originals\[filename]. The moving of the items into the _originals folder caused the FileSystemWatcher event to fire again. This was causing a lot of confusion and extra event processing.
Second – This made it very difficult to thread the processing. Within this event – once the event was thrown, it would kick off the evaluation process that, being tied to an event on the GUI, was a little more difficult to segregate into its own space (you can do it, but it makes tracking the process more difficult)
Third – these events stay active for the life of the service. I’m assuming that once running, users may add new watchers or edit existing ones. In this model, those changes wouldn’t occur until the service was restarted. This seems undesirable.
To address these issues, I’ve implemented a timer. The tool will re-evaluate all Watcher Criteria and then re-evaluate folders/files every 15 minutes, or schedule all watchers to be run at a specific time (this seems like the best option in most cases). Once the timer fires, if a watcher is run, the timer will pause and a flag will be set in the program that will notify Windows that an active process is occurring. If you try to restart Windows while this is happening, you’ll get the window that shows a program is in process and asks you to wait for its completion. The watcher will then push all watched processes into a thread pool; this way, up to 3 watchers can be running at any time. Threads are given low priority – this way, Windows and the CPU know speed isn’t important and other operations can be prioritized. The idea is that this will keep the service from impacting general system performance (this works in testing – I’m interested to see how it holds up across a wider install base and range of system types). After the watchers complete, the system is notified that the program is no longer busy (so if you are restarting, Windows will be released to continue; if not, it ensures you never see the message) and the timer is re-enabled to run either at a specific interval or time (depending on your settings).
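The timer-plus-worker-pool pattern described above can be sketched roughly as follows (in Python rather than the tool’s C#; the watcher function, the busy flag, and the folder list are stand-ins, not MarcEdit’s actual code):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

INTERVAL_SECONDS = 15 * 60   # re-evaluate all watchers every 15 minutes
MAX_WORKERS = 3              # at most 3 watchers running at any time

busy = threading.Event()     # stands in for the "notify Windows we're busy" flag
processed = []               # record of folders handled, for illustration

def run_watcher(path):
    # Placeholder for evaluating one watched folder's criteria.
    processed.append(path)

def run_all_watchers(paths):
    busy.set()               # block restart/update while watchers run
    try:
        # Worker pool: up to MAX_WORKERS watchers execute concurrently.
        with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
            list(pool.map(run_watcher, paths))
    finally:
        busy.clear()         # tell the OS we are no longer busy
        schedule(paths)      # re-arm the timer only after work completes

def schedule(paths):
    # The timer is effectively "paused" while watchers run, because it is
    # only re-armed in run_all_watchers' finally block.
    timer = threading.Timer(INTERVAL_SECONDS, run_all_watchers, args=(paths,))
    timer.daemon = True
    timer.start()
```

Re-arming the timer only after the pool drains is what avoids the re-entrancy problems the FileSystemWatcher approach had: no new evaluation cycle can start while one is in flight.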
How will I know things have happened?
If you are running on Windows 10 – the Watcher service has been tied into the notification/toast system. So you’ll see messages show up in the notifications area:
If you have an older version of Windows, these will show up as balloon tips that will display when a watcher has completed a task. As I implement this on the Mac side – I’m hoping to push results into the notification/toast area.
Additionally, if you right click on the MarcEdit Service Icon, there will be an option to load the log file. This will provide a log of each file processed and at what time. Watcher logs will rotate every 24 hours, and will be purged every 5 days. If there is interest in having these archived (zipped) rather than deleted, I can look at adding that functionality.
This is the tricky part. MarcEdit cannot update while the Watcher is running. So, one of the things that will need to be added to the installer (and this is proving to be tricky) is a messaging tool that can tell the Watcher when it needs to shut down, and a listener that allows the Watcher to tell the installer to wait until a process has completed. This is the last piece that needs to be completed before it can be made available.
Individual Awards: George Edward McCain
Organization Award: Texas Digital Library
Project Award: UC Guidelines for Born-Digital Archival Description
Educator Awards: Heather Moulaison Sandy
Future Steward Award: Raven Bishop
These awards highlight and commend creative individuals, projects, organizations, educators, and future stewards demonstrating originality and excellence in their contributions to the field of digital preservation.
The awardees will be recognized publicly during NDSA’s Digital Preservation 2018 during the Opening Plenary on Wednesday, October 17. Please join us in congratulating them for their hard work! Each of the winners will be interviewed later this year, so stay tuned to learn more about their work on our blog.
As the Digital Curator of Journalism and founder of the Journalism Digital News Archive (JDNA), George Edward McCain has been, and continues to be, a leading voice and passionate advocate for saving born-digital news. He has advanced awareness and understanding of the crisis we face through the loss of the “first rough draft of history” in digital formats. In collaboration and with support from colleagues and community members, he has led the “Dodging the Memory Hole” outreach agenda. Thus far, five “Memory Hole” forums have brought together journalists, editors, technologists, librarians, archivists, and others who seek solutions to preserving born-digital news content for future generations. By bringing together thought leaders in the news industry and information science, the forums have broadened the network of stakeholders working on this issue and helped these communities gain critical insight on the challenges and opportunities inherent in preserving content generated by a diverse array of news media, both commercial and non-profit.
Edward McCain would like to thank Dorothy Carner, Ann Riley, Jim Cogswell, Mike Holland, Jeannette Pierce, Randy Picht, Katherine Skinner, Peter Broadwell, Todd Grapone, Sharon Farb, Martin Klein, Brewster Kahle, Mark Graham, Jefferson Bailey, Brian Geiger, Anna Krahmer, Senator Roy Blunt and his staff, Clifford Lynch, Martin Halbert, Jim Kroll, Leigh Montgomery, Eric Weig, Frederick Zarndt, The Institute for Museum and Library Services, The Mizzou Advantage, and last but not least, his wife, Rosemary Feraldi.
The Texas Digital Library (TDL) is a consortium of Texas higher education institutions that builds capacity for preserving, managing, and providing access to unique digital collections of enduring value.
Accepting the award on behalf of TDL is Kristi Park. For nearly a decade, Kristi Park has led consortial Open Access and digital preservation initiatives at the state and national levels. The Executive Director of the Texas Digital Library (TDL) since 2015, Kristi oversees a portfolio of collaboratively built and managed services that enable sharing and preserving scholarship and research data. During her tenure, the Texas Digital Library has launched a statewide repository for sharing and managing research data, joined the Chronopolis digital preservation network, and grown its membership to 22 institutional members. Kristi joined the Texas Digital Library in 2009, serving in various marketing and communications roles before becoming executive director. Prior to TDL she worked in private industry as a researcher, writer, and editor for business and educational publishers. A native Texan with deep roots in the state, she earned her bachelor’s degree in English from Texas A&M University and a master’s degree in English from the University of Texas at Austin.
The UC Guidelines for Born-Digital Archival Description are a significant step in breaking down one of the biggest obstacles to making born-digital content accessible: its description. With standards for describing born-digital content, archivists and other professionals can more clearly communicate the quality, quantity, and usability of digital material to users. The UC Guidelines were the result of intensive research by a large group of practitioners and content experts who analyzed existing descriptive standards, emerging best practices for born digital materials, and archivists’ practical considerations. The resulting UC Guidelines are a comprehensive resource presented in simple terms, expanding accessibility beyond advanced professionals to include a wide range of practitioners. This project embodies a creative and inclusive approach to problem solving: tackling a hyper-local problem while contributing to larger discussions about widely shared challenges. The mapping to DACS, MARC, and EAD allows other institutions to easily incorporate the UC standards into their own. The guidelines are also useful for institutions new to born-digital descriptive practices and for graduate students learning how to write and compose finding aids.
The most up-to-date version of the UC Guidelines for Born-Digital Archival Description can be found on GitHub.
In addition, the UC Guidelines for Born-Digital Archival Description have been preserved and made permanently accessible in eScholarship, a service of the California Digital Library that provides scholarly publishing and repository services for the University of California community. The permalink to this paper series can be found on eScholarship.
Heather Moulaison Sandy is Associate Professor at the iSchool at the University of Missouri and works primarily at the intersection of the organization of information and the online environment. She studies metadata in multiple contexts, including those that support long-term preservation of digital information, as well as its access and use; she is co-author on a book on digital preservation, now in its second edition. Moulaison Sandy currently teaches classes in Digital Libraries, Metadata, Organization of Information, and Scholarly Communication. Moulaison Sandy holds a PhD in Information Science from Rutgers and an MSLIS and MA in French, both from the University of Illinois at Urbana-Champaign.
Raven Bishop is recognized for her work as Instructional Technologist on Washington College’s Augmented Archives project. This collaborative work has helped leverage emerging technologies to increase access to and engagement with primary source materials in Washington College’s Archives & Special Collections, as well as exploring ways to solve the sustainability problems institutions face in using end-user platforms to create AR content. A co-founder of the project, Raven served as resident Augmented Reality (AR) expert and visual arts educator, guiding the pedagogical considerations of the project, serving as the principal developer of the Pocket Museum app prototype, and overseeing the creation of the resource website. We would also like to make a special acknowledgement to Raven’s colleague and collaborator, Heather Calloway, for her work as Archivist and Special Collections Librarian and co-founder of the Augmented Archives project.
The annual Innovation Awards were established by the NDSA to recognize and encourage innovation in the field of digital preservation stewardship. The program is administered by a committee drawn from members of the NDSA Innovation Working Group. Learn more about the 2012, 2013, 2014, 2015, 2016, and 2017 Award recipients.
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
Student registration rate available – 50% off registration rate – $180
This offer is limited to graduate students enrolled in ALA-accredited programs and includes all the conference provisions of a full registration, a real bargain. In exchange for the lower registration cost, these graduate students will be asked to assist LITA organizers and Forum presenters with onsite operations. This is a great way to network and meet people active in the field.
The selected students will be expected to attend the full LITA Forum, all day Friday and Saturday. Attending the preconferences on Thursday afternoon is not required. While you will be assigned a variety of duties, you will also be able to attend Forum programs, which include 2 keynote sessions, more than 60 concurrent sessions, and a variety of social activities.
To apply for the student registration rate, complete and submit the form by Wednesday, October 24. You will be asked to provide the following:
Contact information, including email address and cell phone number
Name of the school you are attending
Statement of 150 words (or less) explaining why you want to attend the LITA Forum
Those selected to be volunteers registered at the discounted rate will be notified no later than Friday, October 26.
Islandoracon is coming back! The third iteration of our conference will be held October 7-11, 2019 in Vancouver, British Columbia, Canada. The main conference will be hosted by Islandora Foundation Partner Simon Fraser University, at their downtown location in Harbour Centre. Stay tuned to this blog and our listserv for updates about registration, a Call for Proposals, and more.
Thanks also to our Sprint Coaches, Andrea Bollini (4Science) and Art Lowell (Atmire). They not only help lead/plan the development of the DSpace 7 REST API and Angular UI, but also act as coaches/teachers during each of our community sprints.
Here are a few interesting facts about this Sprint. This sprint saw the largest number of institutions participating (11 total, counting DuraSpace). We had six community developers who returned after participating in a previous sprint. It’s wonderful to see each of them return and continue to learn and contribute to DSpace 7! Germany, as a country, outnumbered all others with five total participants (from five different institutions)! I’m glad to see the enthusiasm for DSpace 7 carried over from the German DSpace User Group meeting (DSpace Anwendertreffen 2018) in September.
Summary of Sprint Activities
During the two-week sprint, seven Pull Requests (PRs) were developed, approved & merged into DSpace 7, with another seven PRs still in-progress. On the Angular UI side, we saw new translations of the existing UI (Czech, Dutch and German) along with bug fixes, enhancements and an early mockup of administrative menus. On the REST API side, we saw the creation of new endpoints (for creating communities and groups), enhanced Submission functionality, OpenSearch support and numerous bug fixes.
The dates for our next sprint are still being determined. However, even between sprints, we welcome anyone to join the DSpace 7 development effort. We hold a weekly, Thursday meeting and have active Slack channels for DSpace 7 UI (#angular-ui) and DSpace 7 REST API (#rest-api) discussions. We also maintain a list of “easy” tickets which new developers can claim to learn about DSpace 7 and/or the contribution process. More information can be found at: https://wiki.duraspace.org/display/DSPACE/DSpace+7+Working+Group
Thanks again to all those who’ve contributed to DSpace 7 (in sprints or between sprints). Your support helps make DSpace 7 even better!
I’ve been spending a lot of time working on a new feature for MarcEdit 7 – Watch Folders. The idea is that users may have files that they receive regularly from vendors, staff, etc. as part of a specific workflow. Rather than having to open each file in MarcEdit to process the data, MarcEdit will essentially “watch” particular folders and then process the data according to the criteria set forth. Right now, these criteria are primarily tasks that can be run against a file – but as I work on the first release, I might try to extend this option beyond tasks to character and format conversions (though we’ll have to see).
Eventually, folder watchers will be enabled through the preferences. Users will enable the watcher, and the program will automatically place into the system a command that will restart MarcEdit’s watcher service any time the system starts. For now, while I’m working on testing, Folder Watchers will be enabled via the Help menu:
When a Watcher is enabled, MarcEdit initiates the watcher service. While testing, this service must be started manually. Once testing completes, users will be able to have the watcher start automatically.
When the watcher starts, users will see the following in their icon bar:
Right clicking on this icon will show three menu items:
Notifications [all notifications created during a session are saved]
Settings [Open the application preferences]
Exit [Stop the service]
Configuring a Watcher
Watchers will be configured through the Preferences. To configure a watcher, you enter the Preferences window and select the Configure Watcher link.
If you select this option, you will see the following options:
Any Watchers that are configured will appear in the list. From there, you can delete or edit a watcher. Notice the Enable Watch Folder Service option – eventually, this option will cause the program to install the service so that it autostarts. During testing, this option won’t do anything.
When you add a Watcher, you will see the following options:
A couple of things to notice:
A watcher can process all files in a folder, only files matching a specific pattern, or folders and their subfolders.
Currently, Tasks are the only attached processes – but I’m looking at adding others.
Once set, save the watcher; it will be stored in the Preferences and loaded for use whenever the watcher service is enabled.
Processed File Behavior
The Watcher service will evaluate the designated Folder every 15 minutes. If data is found matching one of the criteria specified in a watcher, the program will:
Move the file to be processed into the [Folder]\_originals folder. This folder is special: it will *not* be processed by the watcher, and it is where data is stored so the user still has access to the original source file.
Move the finished file to the finished folder, named filename[yyyy_mm_dd_hhmmss].[extension].
Log notifications and display them as a tooltip over the MarcEdit Service icon.
Work still to do:
As I was testing this afternoon, I realized a couple of things need to be worked out.
What happens when a user restarts their computer while a file is being processed? I can submit a message to Windows to cause the system to ask the user if they want to restart now or wait until the process is completed. I can also intercept the restart message, kill the process, and move the file being processed back into the to-process folder. I’m still working out what this behavior should be.
Installing MarcEdit updates requires the application and service to stop. This is currently a problem that I need to fix before moving forward.
If you have questions about this work, let me know.
Over the last year, I was fortunate to help guide a study of the news consumption habits of college students, and coordinate Northeastern University Library’s services for the study, including great work by our data visualization specialist Steven Braun and necessary infrastructure from our digital team, including Sarah Sweeney and Hillary Corbett. “How Students Engage with News,” out today as both a long article and accompanying datasets and media, provides a full snapshot of how college students navigate our complex and high-velocity media environment.
This is a topic that should be of urgent interest to everyone, since the themes of the report, although heightened by the more active digital practices of young people, capture how we all find and digest news today and also point to where such consumption is heading. On a personal level, I was thrilled to be a part of this study as a librarian who wants students to develop good habits of truth-seeking, and as an intellectual historian who has studied changing approaches to truth-seeking over time.
“How Students Engage with News” details how college students are overwhelmed by the flood of information they see every day on multiple websites and in numerous apps, an outcome of their extraordinarily frequent attention to smartphones and social media. Students are interested in news, and want to know what’s going on, but given the sheer scale and sources of news, they find themselves somewhat paralyzed. As humans naturally do in such situations, students often satisfice in terms of news sources—accepting “good enough,” proximate (from friends or media) descriptions rather than seeking out multiple perspectives or going to “canonical” sources of news, like newspapers. Furthermore, much of what they consume is visual rather than textual—internet genres like memes, gifs, and short videos play an outsized role in their digestion of the day’s events. (Side note: After recently seeing Yale Art Gallery’s show “Seriously Funny: Caricature Through the Centuries,” I think there’s a good article to be written about the historical parallels between today’s visual memes and political cartoons from the past.) Of course, the entire population faces the same issues around our media ecology, but students are an extreme case.
And perhaps also a cautionary tale. I think this study’s analysis and large survey size (nearly 6,000 students from a wide variety of institutions) should be a wake-up call for those of us who care about the future of the news and the truth. What will happen to the careful ways we pursue an accurate understanding of what is happening in the world by weighing information sources and developing methods for verifying what one hears, sees, and reads? Librarians, for instance, used to be much more of a go-to source for students to find reliable sources of the truth, but the study shows that only 7% of students today have consulted their friendly local librarian.
It is incumbent upon us to change this. A purely technological approach—for instance, “improving” social media feeds through “better” algorithms—will not truly solve the major issues identified in the news consumption study, since students will still be overwhelmed by the volume, context, and heterogeneity of news sources. A more active stance by librarians, journalists, educators, and others who convey truth-seeking habits is essential. Along these lines, for example, we’ve greatly increased the number of workshops on digital research, information literacy, and related topics at Northeastern University Library, and students are eager attendees at these workshops. We will continue to find other ways to get out from behind our desks and connect more with students where they are.
Finally, I have used the word “habit” very consciously throughout this post, since inculcating and developing more healthy habits around news consumption will also be critical. Alan Jacobs’ notion of cultivating “temporal bandwidth” is similar to what I imagine will have to happen in this generation—habits and social norms that push against the constant now of social media, and stretch and temper our understanding of events beyond our unhealthily caffeinated present.
Share your technology knowledge with a LITA Education proposal!
The Library Information Technology Association (LITA) invites you to share your expertise with a national audience!
Submit a proposal by November 2nd, 2018
to teach a webinar, webinar series, or online course for Spring 2019.
We seek and encourage submissions from underrepresented groups, such as women, people of color, the LGBTQ+ community, and people with disabilities.
All topics related to the intersection of technology and libraries are welcomed. Possible topics include, but are not limited to:
Privacy and analytics
Ethics and access
Augmented and virtual reality
Tech design for social justice
Diversity in library technology
Collection assessment metrics beyond CPU
Government information and digital preservation
Instructors receive a $500 honorarium for an online course or $150 for a webinar, split among instructors. View our list of current and past course offerings to see what topics have been covered recently. We will contact you no later than 30 days after your submission to provide feedback.
We’re looking forward to a slate of compelling and useful online education programs for 2019!
Questions or Comments?
For all other questions or comments related to LITA continuing education, contact us at (312) 280-4268 or firstname.lastname@example.org
Software Heritage is an active project that has already assembled the largest existing collection of software source code. At the time of writing, the Software Heritage Archive contains more than four billion unique source code files and one billion individual commits, gathered from more than 80 million publicly available source code repositories (including a full and up-to-date mirror of GitHub) and packages (including a full and up-to-date mirror of Debian). Three copies are currently maintained, including one on a public cloud.
As a graph, the Merkle DAG underpinning the archive consists of 10 billion nodes and 100 billion edges; in terms of resources, the compressed and fully de-duplicated archive requires some 200TB of storage space. These figures grow constantly, as the archive is kept up to date by periodically crawling major code hosting sites and software distributions, adding new software artifacts, but never removing anything. The contents of the archive can already be browsed online, or navigated via a REST API.
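That de-duplication works because the archive is content-addressed: every artifact is stored once, under an identifier derived from its bytes. For file contents, Software Heritage's intrinsic identifiers are computed like Git blob hashes, which takes only a few lines of Python to illustrate (the function name here is invented for this sketch):

```python
import hashlib

def swh_content_id(data: bytes) -> str:
    """Git-style content address: SHA-1 over a 'blob <length>' header,
    a NUL byte, and then the raw bytes. Identical files, wherever they
    were crawled from, collapse to a single node in the Merkle DAG."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# Two copies of the same file from different repositories share one node,
# so the archive stores the bytes only once.
assert swh_content_id(b"print('hello')\n") == swh_content_id(b"print('hello')\n")
```

The same scheme extends upward: directories and commits are hashed over the identifiers of their children, which is what makes the whole archive a Merkle DAG rather than a flat file store.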
Software, and in particular open source software is just as much a cultural production as books, music, movies, plays, TV, newspapers, maps and everything else that research libraries, and in particular the Library of Congress, collect and preserve so that future scholars can understand our society.
I'm very disappointed that national libraries haven't accepted this argument, let alone the argument that preservation and access to their other digital collections largely depend on preserving and providing access to open source software. Since they have failed in this task, it is up to the Software Heritage Foundation to step into the breach.
A quotidian concern of anybody responsible for a database is the messy data it contains. See a record about a Pedro GonzÃ¡lez? Bah, the assumption of Latin-1 strikes again! Better correct it to González. Looking at his record in the first place because you’re reading his obituary? Oh dear, better mark him as deceased. 12,741 people living in the bungalow at 123 Main St.? Let us now ponder the wisdom of the null and the foolishness of the dummy value.
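The GonzÃ¡lez glitch is classic mojibake: UTF-8 bytes decoded as Latin-1. The round trip, and its repair, fit in a few lines of Python:

```python
name = "González"

# The breakage: UTF-8 bytes read back with the wrong (Latin-1) decoder.
garbled = name.encode("utf-8").decode("latin-1")
print(garbled)   # GonzÃ¡lez

# The repair: reverse the mistake by re-encoding as Latin-1,
# then decoding as the UTF-8 it really was.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # González
```

The repair only works while the damage is a single clean round trip; once garbled text has been "corrected" by hand, or mangled twice, it may no longer be reversible.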
Library name authority control could be viewed as a grand collaborative data cleanup project without having to squint too hard.
What of the morality of data cleanup? Let’s assume that the data should be gathered in the first place; then as Patricia Hayes noted back in 2004, there is of course an ethical expectation that efforts such as medical research will be based on clean data: data that has been carefully collected under systematic supervision.
Let’s consider another context: whether to engage in batch authority cleanup of a library catalog. The decision of whether it is worth the cost, like most decisions on allocating resources, has an ethical dimension: does the improvement in the usefulness of the catalog outweigh the benefits of other potential uses of the money? Sometimes yes, sometimes no, and the decision often depends on local factors, but generally there’s not much examination of the ethics of the data cleanup per se. After all, if you should have the database in the first place, it should be as accurate and precise as you can manage consistent with its raison d’être.
Now let’s consider a particular sort of database. One full of records about people. Specifically, a voter registration database. There are many like it; after all, at its heart it’s just a slightly overgrown list of names and addresses.
An overgrown list of names and addresses around which much mischief has been done in the name of accuracy.
This is on my mind because the state I live in, Georgia, is conducting a gubernatorial election that just about doubles as a referendum on how to properly maintain a voter registration list.
On the one hand, you have Brian Kemp, the current Georgia secretary of state, whose portfolio includes the office that maintains the statewide voter database and oversees all elections. On the other hand, Stacey Abrams, who among other things founded the New Georgia Project, aimed at registering tens of thousands of new voters, albeit with mixed results.
Is it odd for somebody to oversee the department that would certify the winner of the governor’s race? The NAACP and others think so, having filed a lawsuit to try to force Kemp to step down as secretary of state. Moreover, Kemp has a history of efforts to “clean” the voter rolls; efforts that tend to depress votes by minorities—in a state that is becoming increasingly purple. (And consider the county I live in, Gwinnett County. It is the most demographically diverse county in the southeast… and happens to have the highest rate of rejection of absentee ballots so far this year.) Most recently, the journalist Greg Palast published a database of voters purged from Georgia’s list. This database contains 591,000 names removed from the rolls in 2017… one tenth of the list!
A heck of a data cleanup project, eh?
Every record removal that prevents a voter from casting their ballot on election day is an injustice. Every one of the 53,000 voters whose registration is left pending due to the exact-match law is suffering an injustice. Hopefully they won’t be put off and will vote… if they can produce ID… if the local registrar’s discretion leans towards expanding and not collapsing the franchise.
Dare I say it? Data cleanup is not an inherently neutral endeavor.
Sure, much of the time data cleanup work is just improving the accuracy of a database—but not always. If you work with data about people, be wary.
In May 2018, the DLF community nominated thirteen inspiring projects, teams, and people for our biggest honor, the DLF Community/Capacity Award—a biennial award that honors constructive, community-minded efforts to build collective capacity in digital libraries and allied fields. The award, which includes a $1,000 prize to the winner, promotes the work of those whose efforts contribute to our ability to collaborate across institutional lines and work toward something larger, together.
Each DLF member institution had one vote to cast, and at the 2018 DLF Forum in Las Vegas, Nevada, the winner was announced: Documenting the Now—a project which develops tools and builds community practices that support the ethical collection, use, and preservation of social media content.
Documenting the Now responds to the public’s use of social media for chronicling historically significant events as well as demand from scholars, students, activists, and archivists, among others, seeking a user-friendly means of collecting and preserving this type of digital content. In the first phase of the project, the University of Maryland, University of California, Riverside, and Washington University in St. Louis collaborated to build a set of tools to help researchers work with Twitter data, which has helped draw together a community to raise the visibility of discussions related to the ethics of collecting digital content. The suite of tools will enable communities to research and preserve digital content in conscientious and thoughtful ways.
The $1,000 prize that accompanies the award will be donated to Josh Williams, a protester from the Ferguson uprisings of 2014 who was sentenced to 8 years in prison.
If you’re interested in learning more and keeping up with the project, you can do so via DocNow’s Medium and Twitter accounts or their Slack channel.
With thanks to the Documenting the Now team, and all thirteen nominees for their capacity-building work:
Digital Library of the Caribbean
Documenting the Now
Dr. Melissa Nobles and Professor Margaret Burnham
Metropolitan New York Library Council (METRO) / Studio599
Santa Barbara Statement on Collections as Data
South Asian American Digital Archive
DLF will seek nominations for the next Comm/Cap Award in spring of 2020.
TL;DR: As part of reinvigorating our OpenGLAM (Open Galleries, Libraries, Archives and Museums) community, we’re evaluating the OpenGLAM principles: fill out this survey and get involved.
Several months ago, community members from Wikimedia, Open Knowledge International and Creative Commons reinvigorated the “OpenGLAM” initiative. OpenGLAM is a global network of people and organizations who are working to open up content and data held by Galleries, Libraries, Archives and Museums. As a community of practice, OpenGLAM incorporates ongoing efforts to disseminate knowledge and culture through policies and practices that encourage broad communities of participation, and integrates them with the needs and activities of professional communities working at GLAM institutions.
One of our first steps was to revitalize the @openglam twitter account, inviting contributors from different parts of the world to showcase and highlight the way in which “OpenGLAM” is being understood in different contexts. So far, the Twitter account has had contributors from Africa, Asia, Latin America, the Middle East, North America & Europe. Anyone can become a contributor or suggest someone to contribute by signing up through this form. If you want to see the content that has been shared through the account, you can check the oa.glam tag in the Open Access Tracking Project.
Now, as we move forward in planning more activities, we want to check on the continued impact of the Open GLAM Principles. Since their publication in 2013, the principles have offered a declaration of intent to build a community of practice that helps GLAMs share their collections with the world.
In the last five years, the OpenGLAM community has become more global and has adopted more tactics and strategies for integrating openness into institutions. But do the principles reflect this change?
To find out, we’re inviting people to fill in a survey about the utility of the principles. We want to understand from the broader community: Are you aware of the principles? Are they still relevant or useful? Do you use them in your institutional or local practice? What opportunities are there to improve them for the future?
The survey will run until 16th November. Your participation is greatly appreciated! To get involved with the Open GLAM working group, you can join us through email@example.com
As foreshadowed in my last post, I've now stopped publishing with Ghost and moved to the static site generator Eleventy. I plan to write about some of the more technical aspects of how I did this in a future post, but for this one I want to explain broadly what I've done, and why.
The 'classic' case for, and explanation of, modern static site generators was made by Matt Biilmann in Smashing Magazine back in 2015. For Biilmann, it's mostly about performance, but for me it's more about control and a sense that publishing on the web has become far more complicated than it needs to be. As readers of my Marginalia series may have noticed, I've been reading and listening to a lot of stuff recently about minimal technology, and a 'brutalist' approach to web design. Publishing with a static site generator allows me to control to a much greater degree what's actually getting published, and removes a bunch of technology from the stack needed to get it onto your screen. This means that there are fewer points of failure and fewer things needing to be patched or upgraded.
Ironically, whilst systems like WordPress and Ghost require more configuration and maintenance than a static site, they also provide less control over the things that matter to me. Some static site generators will be more flexible than others in this regard, but part of the reason I settled on Eleventy is that it provides a very large degree of flexibility and customisation - Zach Leatherman has done a good job of ensuring his system doesn't try to do everything itself, filling a similar niche to Metalsmith but a bit less intimidating. There were four specific things I wanted total control over:
everything in the HEAD section generally
the referrer meta tag
the meta tags for open graph and twitter images
Most web publishing systems provide users, or at least theme creators, the ability to inject tags and scripts into the head, header and/or footer of each page. This is useful, but what's usually not possible is to remove tags that the system itself creates. This may not necessarily sound like a big deal, but it has important ramifications. Firstly, if your system automatically inserts a particular meta tag that has multiple potential values, you have to go with the value chosen by the system designers. Secondly, if they decide to insert a particular script or tag you don't want there at all, there's not much you can do. In the case of Ghost, it does both (if you share my views about best practices in web publishing).
Firstly, Ghost has baked Google AMP support into the publishing system itself: you can't not publish AMP-friendly posts using Ghost. The AMP site does a good job of hiding the fact that this is a Google project, but most of what you need to know about it is in the headings under About - Who is AMP for?. The four things listed are 'AMP for Publishers', 'AMP for Advertisers', 'AMP for Ecommerce' and 'AMP for Ad Tech Platforms'. AMP is sold as a way to improve the reader experience and speed sites up, but 'Ad Tech Platforms' (like Google) are the cause of the problem AMP is allegedly trying to solve. AMP is really about Google gaining more data and more control over publishing, and I want nothing to do with it.
Secondly, Ghost uses the referrer meta tag with the content attribute set to no-referrer-when-downgrade. This means that any link from an https site to an http site won't pass on the referer header in the http request: but if I link to an https page it still will. I want my referrer tag to be set to no-referrer, for the reasons outlined in Eric Hellman's useful post about the privacy implications of the referer header and referrer meta tag. Basically, it's nobody's business if you're reading my blog posts (more on this later).
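For reference, a no-referrer policy is a single standard meta tag in the head:

```html
<meta name="referrer" content="no-referrer">
```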
I wrote a little bit about Ghost's strangely forgiving attitude to permalinks in my last post. The particular problem I had when it came to migrating my blog to a static site was that I wanted to maintain all the existing permalinks, but change the URL pattern for any new posts. In systems like WordPress and Ghost this is more or less impossible unless you start mucking around with redirects on the actual webserver. Eleventy allows me to do something pretty cool, however, and it's very simple. Each post is written in a Markdown file, and has 'front matter' at the top with basic metadata. A basic frontmatter section looks like this:
```yaml
---
title: Beginning a new approach to blogging with Eleventy
author: Hugh Rundle
tags: ['GLAM blog club','blogging','post']
---
```
With Eleventy you can optionally add a permalink value, which will override any generic rules you have in place regarding how page URLs are created. I wrote a script to extract all my old posts out of Ghost, and among other things it puts the permalink in the frontmatter of the extracted file. This allowed me to avoid breaking old permalinks which use the format YYYY/MM/DD but stop using a dated format for new posts (it seems like unnecessary and ugly cruft when the date is at the top of each post anyway).
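As an illustration (title, date and slug hypothetical, not one of my actual posts), an extracted post's front matter ends up looking something like this, with Eleventy's permalink key preserving the old URL:

```yaml
---
title: An old post migrated from Ghost
author: Hugh Rundle
tags: ['blogging','post']
# keeps the Ghost-era dated URL for this post only
permalink: /2016/05/21/an-old-post/
---
```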
Reducing bloat and trackers
Given my minor tirade about AMP and commitment to no-referrer above, you may be wondering about tracking scripts. Wordpress.com has an analytics system built in, and when I moved to Ghost I set up my own Matomo (formerly Piwik) instance. At the time I felt this was a good compromise between my desire to know which pages were most popular on my blog, and my desire not to feed the Google machine with your browsing habits. Even though the stats only go to me, however, having a tracking system is a signal that I think tracking reading habits is normal and reasonable - and also that it's useful. I'm quite doubtful about all three: I literally can't remember when I last checked the stats on my blog or the newCardigan website, which both used my Matomo analytics server until yesterday, and whenever I have looked at them they give me information I already know and can't do anything useful with: my two most popular posts ever were one that was about a lack of investment in and understanding of core technology in librarianship (widely misinterpreted as a post protesting the use of 3D printers) and a post about migrating from WordPress to Ghost. I mean, it's vaguely interesting, but is it worth keeping a PHP application and MySQL server running, and normalising surveillance? Probably not. I hope other people find my blog posts interesting, but I'm usually writing them for me as much as for anyone else.
The final thing I did to remove a tracking vector I'd inadvertently added to my Ghost site was to strip out all the script links from embedded tweets. When you 'embed' a tweet, you get some HTML like this:
```html
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Has there ever been a better reason to be divorced by your husband? A gem from the archives, December 1938 <a href="https://t.co/x5ULi25vOr">pic.twitter.com/x5ULi25vOr</a></p>— Tom D C Roberts (@TomDCRoberts) <a href="https://twitter.com/TomDCRoberts/status/1049914517579321344?ref_src=twsrc%5Etfw">October 10, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
```
Do you see that down the bottom? <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
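A minimal sketch of the stripping step (not my actual script) is a regular expression run over each post's HTML at build time:

```javascript
// Sketch only: remove Twitter's widgets.js script tag from
// embedded-tweet HTML, leaving the blockquote and its ordinary
// links intact.
function stripTweetScripts(html) {
  return html.replace(
    /<script[^>]*src="https:\/\/platform\.twitter\.com\/widgets\.js"[^>]*><\/script>/g,
    ''
  );
}
```

The blockquote still renders as a plain quotation, but readers' browsers never phone home to platform.twitter.com.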
Lastly, I removed the biggest source of bloat: images. This brings me back to my third requirement, which you may have noticed I haven't yet addressed: meta tags for open graph and twitter images. But first things first. Before migrating, I did a speed test of my homepage using Pingdom Tools. My webserver is in Singapore (the nearest point to Australia in the Digital Ocean empire), so there's inevitably going to be a bit of a lag loading pages from here, but my Ghost site was still pretty slow. From Sydney it was calculated to take 3.5 seconds to load, making 37 requests and pulling down a horrendous 3.3MB! The vast majority of that data was images. I've waxed and waned a bit with post images: they're annoying to source, and I'm a fairly text-based thinker, so I don't find images a particularly useful addition to most blog posts - particularly if it's just a header image. On the other hand, they definitely do help to catch my eye when I'm scrolling through social media. What I really wanted was a system that created images that only show up on social media. It turns out there's a way to do that.
I'll save the technical details for a future post, but suffice to say that you can put a reference to an image in a meta tag regardless of whether it's actually displayed on the page. That means you can do this:
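Concretely, that means putting the standard Open Graph and Twitter Card tags in the head, something like the following (URLs hypothetical):

```html
<meta property="og:image" content="https://example.com/images/social-card.png">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:image" content="https://example.com/images/social-card.png">
```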
The image will appear in the little cards that Twitter and Facebook create when you post a link, but the image itself doesn't need to appear anywhere else on the page.
What's still in
Rubik web font
I've relied on the system font Tahoma for base text, but to make things slightly more interesting I'm using 'Rubik' for headers. This adds a small amount of extra page load time.
Bigfoot is an amazing jQuery plugin that I use for footnotes. The reason I love it so much is that it is so considered and well thought-out. It works by allowing footnotes to function in a 'web way' - instead of having to scroll to the bottom of the screen to read the footnote, Bigfoot inserts a little ellipsis instead of the number, and a pop-over when you click or tap on the ellipsis, showing the footnote text. The really smart thing, however, is that if you print the page out, Bigfoot basically just switches itself off and the footnotes work the way that is useful in hardcopy.
I wrote a teeny little script to change all the publication dates to relative time (e.g. 'two months ago'). Initially I thought I could do this when pre-processing the page, but then I thought about it for five more seconds and realised that was possibly the dumbest idea I've ever had: it would show the time relative to whenever it was processed, not when you were reading it! The script doesn't have to be big because I'm using momentjs.
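As a sketch of the idea (my script uses momentjs; this version uses the built-in Intl.RelativeTimeFormat to stay dependency-free), the conversion might look like:

```javascript
// Sketch: render a post's publication date relative to *now*, in the
// reader's browser, so it is always relative to the moment of reading
// rather than the moment the page was built.
function relativeDate(published, now = new Date()) {
  const rtf = new Intl.RelativeTimeFormat('en', { numeric: 'auto' });
  const days = Math.round((published - now) / 86400000);
  if (Math.abs(days) < 30) return rtf.format(days, 'day');
  if (Math.abs(days) < 365) return rtf.format(Math.round(days / 30), 'month');
  return rtf.format(Math.round(days / 365), 'year');
}
```

Called on page load for each date element (e.g. replacing the text of a `<time>` tag), it produces phrases like 'two months ago'.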
So what's the upshot for you, the reader? You get:
Enhanced reader privacy, with every tracker I could think of removed from all pages
A much faster page load
Significantly less data to download
Glorious Brutalist web design
Using the Pingdom Tools test I used on the old site, the homepage now makes only 9 requests (all local i.e. requesting files from the same server), takes 1.24 seconds to load, and loads just 176kB. Nice.
An open source project takes a community effort to be successful, and the Outreach Committee recently decided to start a community acknowledgement program. We’re looking for nominations of community members who YOU think deserve a bit of extra recognition.
Every month, the Outreach Committee will select one community member to feature on the website.
Please use this form to submit your nominations. We do ask for your email in case we have any questions, but all nominations will be kept confidential.
Any questions can be directed to Andrea Buntz Neiman via firstname.lastname@example.org or abneiman in IRC.
Nominations are open for the 2019 LITA/Library Hi Tech Award, which is given each year to an individual or institution for outstanding achievement in educating the profession about cutting edge technology within the field of library and information technology. Sponsored by LITA and Library Hi Tech, the award includes a citation of merit and a $1,000 stipend provided by Emerald Publishing, publishers of Library Hi Tech. The deadline for nominations is December 31, 2018.
The award, given to either a living individual or an institution, may recognize a single seminal work or a body of work created during or continuing into the five years immediately preceding the award year. The body of work need not be limited to published texts but can include course plans or actual courses and/or non-print publications such as visual media. Awards are intended to recognize living persons rather than to honor the deceased; therefore, awards are not made posthumously.
You’ve probably used a Question Answering (QA) system. Most of them are just a FAQ turned into a horrible search interface. If you don’t ask the exact question they answered, don’t bother. Other QA systems are basically just keyword search that lets you put in questions.
So what is a proper question answering system? The answer seems obvious: “it is a system that answers your questions.” But to do it properly, it needs to recognize synonyms, close-enough answers, and other aspects of the meaning of questions specifically and language generally.
In their talk, “Enriching With Deep Learning for a Question Answering System” at this year’s Activate conference, Lucidworks data scientists Savva Kolbachev and Sanket Shahane will show you a powerful question answering system that they constructed by adding deep learning using Fusion. They’ll both show how to produce more accurate answers as well as how to scale the approach given the weights of deep learning models.
Their talk will cover techniques as well as the more technical, mathematical, and statistical details, and will include a demo of how Fusion enriches data. Additionally, they’ll detail highlighting using sentiment analysis.
If you’re trying to create an Information Retrieval system such as a QA system, or even if you’re just really interested in deep learning, you’re definitely not going to want to miss this talk. See you in Montreal next week!
Factory work exhausts the nervous system to the uttermost; at the same time, it does away with the many-sided play of the muscles, and confiscates every atom of freedom, both in bodily and in intellectual activity. Even the lightening of the labour becomes an instrument of torture, since the machine does not free the worker from the work, but rather deprives the work itself of all content. Every kind of capitalist production, in so far as it is not only a labour process but also capital’s process of valorization, has this in common, that it is not the worker who employs the conditions of his work, but rather the reverse, the conditions of work employ the worker. However it is only with the coming of machinery that this inversion first acquires a technical and palpable reality.
It’s not hard to imagine Marx talking about the work we do on the web as we answer CAPTCHAs, like and retweet posts on social media, and labor as microworkers in our browsers. It’s also not difficult to imagine Foucault saying something very similar when expounding on his idea of biopower. Marx and Foucault seem to have a lot more in common than people generally admit. The political projects that arose around them were quite different, separated as they were in time, but their thought aligns quite nicely in places.
Yes, I’m still working my way through Capital and listening to David Harvey’s online lectures as a podcast on my commute. It is slow going, but it has been fun to slowly work through it at my own pace and not for a class or some directed purpose for my own research. But there are actually quite a few points of intersection, and it’s fun to run across them when Marx is writing in such a different time and place–which I guess is not so different after all. The scope of his project is truly remarkable.
Marx, K. (1990). Capital: A critique of political economy (Vol. 1). London: Penguin.
University of Toronto Libraries opened a User Research and Usability (UX) Lab in September 2017, the first space of its kind on campus. The UX Lab is open to students, staff, and faculty by appointment or during weekly drop-in hours.
In this 90-minute webinar, our presenter will discuss:
The rationale behind building a physical usability lab and why a physical space isn’t always needed (or recommended)
Experience with community building efforts
How to raise awareness of UX as a service to staff and the University community at large
The evolution of the lab’s services
Presenter: Lisa Gayhart, User Experience Librarian, University of Toronto Libraries
Thursday November 15, 2018, 1:00 – 2:30 pm Central Time
The security of a permissionless peer-to-peer system generally depends upon the assumption of uncoordinated choice, the idea that each peer acts independently upon its own view of the system's state. Vitalik Buterin, a co-founder of Ethereum, wrote in The Meaning of Decentralization:
In the case of blockchain protocols, the mathematical and economic reasoning behind the safety of the consensus often relies crucially on the uncoordinated choice model, or the assumption that the game consists of many small actors that make decisions independently.
Another way of saying this is that the system isn't secure if enough peers collude with each other. Below the fold, I look at why this is a big problem. How can you prove that peers in your system aren't covertly colluding? In the real world, you can't prove a negative, so the security of permissionless P2P systems generally depends upon an assumption that cannot be verified.
A spreadsheet has been leaked, allegedly authored by Shi Feifei, a Huobi employee. Titled “Huobi Pool Node Account Data 20180911,” the document details mutual voting and sharing of proceeds from producing blocks in EOS.
Ownership in EOS is highly concentrated, with just 10 addresses holding some 50% of all tokens. Exchanges tend to dominate for obvious reasons, but in other public blockchains they have no say in protocol rules and take no direct part in validation.
In EOS, however, exchanges appear to effectively control the network: this very centralized blockchain has just 21 validators, some of which are seemingly controlled by a single entity, with Huobi allegedly able to act as a kingmaker.
The threat model underlying the design of the LOCKSS protocol included a powerful adversary or conspiracy controlling a large number of peers. The design had a mechanism that made it statistically likely that the adversary would be detected.
If you’re in ecommerce, you take inspiration from the leaders. One of the talks we’re really excited about at this year’s Activate conference is with Suyash Sonawane and John Castillo from online retailer Wayfair. As one of the largest online destinations for home, Wayfair knows a thing or two about what its customers shopping onsite need and what they expect. With two million users daily, and more than 10 million products in its catalog, the scale of their operations covers, “a zillion things for the home.”
When you’re in ecommerce, it’s paramount that you’re able to describe products in the written word. Describing physical dimensions or the material of a product can only do so much—oftentimes, the best words to describe an item are those used by your very own customers. Enter stage left: product reviews.
Senior Engineers Suyash Sonawane and John Castillo ingested a myriad of customer signals and processed millions of product reviews using natural language processing in order to extract useful information. By combining these techniques, they developed insights about what customers say about different products.
In their talk, Sonawane and Castillo detail how Wayfair’s Search Tech team has looked beyond the product catalog to improve search relevance onsite and influence customer experience at scale.
Imagine the possibilities: being able to augment your product descriptions and search terms with what customers actually say. A review noting, for instance, that “this barbecue grill is leak-free” yields an important characteristic.
Customers also tell you when things are going wrong, as well as about alternatives: “I’ve been ordering this for a long time but recently the quality has gone down so I switched to SuperX brand.”
These sorts of insights go beyond analytics and allow you to optimize keywords, make alternative recommendations, detect when something has gone wrong, and even learn how to resolve common problems. This is a type of customer signal that, with the right technique, you can mine to optimize your conversion rate and your relationship with your customers!
In other words, you can Activate your AI and Search capabilities like Wayfair.
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
As an organization dedicated to developing free and open-source research tools, we care deeply about open access to scholarship. With the latest version of Zotero, we’re excited to make it easier than ever to find PDFs for the items in your Zotero library.
While Zotero has always been able to download PDFs automatically as you save items from the web, these PDFs are often behind publisher paywalls, putting them out of reach of many people.
Enter Unpaywall, a database of legal, full-text articles hosted by publishers and repositories around the world. Starting in Zotero 5.0.56, if you save an item from a webpage where Zotero can’t find or access a PDF, Zotero will automatically search for an open-access PDF using data from Unpaywall.
It can do the same when you use “Add Item by Identifier” to create a new item, and a new “Find Available PDF” option in the item context menu lets you retrieve PDFs for existing items in your library.
We operate our own lookup service for these searches with no logging of the contents of requests.
Unpaywall is produced by Impactstory, a nonprofit dedicated to making scholarly research more open, accessible, and reusable, and we’re proud to support their work by subscribing to the Unpaywall Data Feed.
Zotero can also now take better advantage of PDFs available via institutional subscriptions. When you use “Add Item by Identifier” or “Find Available PDF”, Zotero will load the page associated with the item’s DOI or URL and try to find a PDF to download before looking for OA copies. This will work if you have direct or VPN-based access to the PDF. If you use a web-based proxy, only open-access PDFs will be automatically retrieved using this new functionality, but you can continue to save items with gated PDFs from the browser using the Zotero Connector.
If there are other sources of PDFs you’d like Zotero to use, you can also set up custom PDF resolvers.
Upgrade to Zotero 5.0.56 and Zotero Connector 5.0.41 today to start using these new features.
This featured sponsor post is from Heather Greer Klein, DuraSpace Services Coordinator.
The DuraCloud team from DuraSpace is excited to be a part of the Digital Library Federation Forum 2018 and NDSA’s Digital Preservation 2018. Our participation wraps up a year of focus on expanding the community participation model of the DuraCloud open source project. We hope to share what we have learned about broadening participation in an open community project, and to learn from others about what digital preservation needs remain unmet by current community initiatives and software projects.
I will present at the DLF Forum 2018 to share what we learned this year when looking to expand who participates in open source initiatives, and how to incorporate non-technical staff into focused work to move the project forward.
We began this work in earnest when the ‘Open Sourcing DuraCloud: Beyond the License’ project was chosen for the Mozilla Open Leaders program. As a result of the 12-week mentorship & training program, DuraSpace staff developed an easy introduction to what DuraCloud is and why it matters; contribution guidelines; a roadmap for future development; and detailed opportunities for contribution. The DuraCloud team believes this community contribution model will accelerate DuraCloud’s growth into a truly open source project.
We have also expanded the reach of the DuraCloud service globally by bringing on our first ever Certified DuraSpace Partner offering DuraCloud services, 4Science, to offer DuraCloud Europe. This partnership has enhanced the software to allow for storage providers in multiple regions, and meets a critical need for preservation storage located outside of the United States.
Join us Friday, October 26, 12:00-1:00pm CDT, on Twitter to learn more about the programs, events, and activities at this year’s LITA Library Technology Forum in Minneapolis, MN, November 8-10, 2018.
To participate, launch your favorite Twitter app or web browser, search for the #LITAchat hashtag, and select “Latest” to participate and ask questions. Be sure to include the hashtags #litaforum and #litachat.
WinPlay3 was the first real-time MP3 audio player for PCs running Windows, both 16-bit (Windows 3.1) and 32-bit (Windows 95). Prior to this, audio compressed with MP3 had to be decompressed before listening.
WinPlay3 was the first, but it was bare-bones. It was WinAmp that really got people to realize that the PC was a media device. But the best part was that WinAmp was mod-able. It unleashed a wave of creativity (Debbie does WinAmp, anyone?), now preserved in the Archive's collection of over 5,000 WinAmp skins!
Thanks to Jordan Eldredge and the Webamp programming community for this new and strange periscope into the 1990s internet past.
When I first clicked on the llama on The Swiss Family Robinson on my Ubuntu desktop the sound ceased. It turns out that the codec selection mechanism is different between the regular player and WinAmp, and it needed a codec I didn't have installed. The fix was to install the missing codec.
Today, in partnership, the HBCU Library Alliance (HBCU LA) and Digital Library Federation (DLF) launched a three-year “Authenticity Project.” This fellowship program, generously supported by an Institute of Museum and Library Services (IMLS) grant, will provide mentoring, learning, and leadership opportunities for 45 early- to mid-career librarians from historically black colleges and universities, as well as meaningful frameworks for conversation and collaboration among dozens of additional participants from both organizations from 2019-2021.
The Authenticity Project builds on a previous collaboration, in which the two organizations co-hosted an IMLS-funded 2017 DLF Forum pre-conference for HBCU and liberal arts college participants, alongside a conference travel fellowship program for 24 DLF HBCU Fellows. Experiences from the pre-conference and fellowship program are outlined in a report entitled “Common Mission, Common Ground.”
In each year of the Authenticity Project, fifteen Fellows will be matched with two experienced library professionals: an established mentor from an HBCU LA library or with a strong background in HBCUs, and a “conversation partner” working in an area of the Fellow’s interest, ideally within a DLF member institution. Fellows will receive full travel, lodging, and registration expenses to the annual DLF Forum and Learn@DLF workshops; access to online discussion spaces and in-person networking opportunities; and opportunities to apply for microgrant funding to undertake inter-institutional projects of strategic importance across institutions and communities. They will also participate in quarterly facilitated, online networking and discussion sessions.
“The Authenticity project will provide wonderful leadership opportunities for our members,” HBCU Library Alliance executive director Sandra Phoenix remarked. “We are excited for a second opportunity to partner with the Digital Library Federation.”
DLF’s director, Bethany Nowviskie, added, “We’re grateful to IMLS for this chance to bring the DLF community into deeper and more authentic engagement with our partners at the HBCU Library Alliance, and very excited to meet the emerging leaders who will benefit from the program as Fellows!”
The program is currently seeking volunteers to act as Authenticity Project mentors and conversation partners in its inaugural year. Volunteers who are matched with a Fellow will support their learning through quarterly discussions on provided topics, receive access to a supportive community of peers, and will be granted small annual stipends in acknowledgment of their time and commitment to the program. Mentors working in HBCU LA libraries will receive an additional travel stipend to enable free DLF Forum registration or to offset travel costs associated with meeting with fellows elsewhere.
“How fitting to begin with the mentoring component,” Phoenix reflected, “positioning the program for teaching, learning, and sharing skills that will strengthen leadership development.”
The application for the first cohort of fifteen Authenticity Project Fellows will open on October 22nd. Both application processes will close on November 16th, and 2019 calendar-year program selections will be announced in December 2018.
DLF is an international network of member institutions and a robust community of practice, advancing research, learning, social justice, and the public good through the creative design and wise application of digital library technologies. It is a program of CLIR, the Council on Library and Information Resources — an independent, nonprofit organization that forges strategies to enhance research, teaching, and learning environments in collaboration with libraries, cultural institutions, and communities of higher learning.
The HBCU Library Alliance is a consortium that supports the collaboration of information professionals dedicated to providing an array of resources to strengthen Historically Black Colleges and Universities (HBCUs) and their constituents. As the voice of advocacy for member institutions, the HBCU Library Alliance is uniquely designed to transform and strengthen its membership by developing library leaders, helping to curate, preserve and disseminate relevant digital collections, and engaging in strategic planning for the future.
This project is made possible in part by the Institute of Museum and Library Services, through grant # RE‐70‐18‐0121. The IMLS is the primary source of federal support for the nation’s libraries and museums. They advance, support, and empower America’s museums, libraries, and related organizations through grantmaking, research, and policy development. Their vision is a nation where museums and libraries work together to transform the lives of individuals and communities. To learn more, visit www.imls.gov.
The Islandora Foundation is very pleased to announce our first event of 2019: Islandora Camp Switzerland, taking place June 17 - 19 in Dübendorf, Switzerland. We are partnering with Islandora Foundation member Lib4RI to hold the camp at Eawag. While we won't be announcing the full schedule until some time next year, we expect to follow the usual camp schedule of:
Day One: General and introductory sessions about the software and the community that uses and supports it
Day Two: Hands-on workshop training with tracks for Developers and front-end Administrative users, featuring both Islandora 7.x and Islandora CLAW
Day Three: Sessions on specific sites, tools, and topics of interest to Islandora users
The camp will kick off on the Monday directly following Open Repositories 2019, taking place in Hamburg, Germany, so if you're heading to OR, why not hang around for a weekend of sightseeing and join us as well?
Looking for Islandora events in North America? Stay tuned for dates and locations for Islandoracon.
On September 25, 2018 OpenAIRE and DuraSpace signed a Memorandum of Understanding (MoU). Both OpenAIRE and DuraSpace have a shared interest in a robust, interoperable, and functional network of repositories that provide value to the research community and contribute to Open Science and Open Access in Europe and the world.
Repositories collectively act as the foundation for Open Science by collecting and providing access to research outputs, and play a key role in the emerging scholarly commons. To that end, OpenAIRE and DuraSpace aim to ensure that repositories are using up-to-date technologies and adopting international standards and protocols. Through this MoU, OpenAIRE and DuraSpace have agreed to work together on a number of aspects to support their common goals. These activities include enabling DSpace systems to comply with OpenAIRE metadata guidelines, gradual adoption of next generation repository functionalities, and working together on standardized methods for measuring and aggregating usage statistics.
This partnership will support the global community of repository users by improving repository functionality and enabling the adoption of other value-added services by repository platforms and aggregators.
The Open Repositories Steering Committee and Stellenbosch University are delighted to announce that the 15th Open Repositories Conference will be held in Stellenbosch, South Africa, from 1-4 June 2020. The conference will be organised by the Stellenbosch University Library and Information Service, which looks forward to welcoming delegates to the first Open Repositories Conference (OR) on the African continent.
Stellenbosch University, located in the historical town of Stellenbosch, approximately 50 km from Cape Town, strives to be Africa’s leading research-intensive university, globally recognised as excellent, inclusive and innovative where knowledge is advanced in service of society.
The Library and Information Service has been playing an active role in the African and international Open Access community for a number of years and one of its strategic objectives is to develop and maintain collaborative relationships with a range of external and internal stakeholders by advancing local, national and international initiatives with regards to open scholarship.
Having chosen Stellenbosch as the venue, the annual OR Conference continues its objective of bringing together practitioners working at the interface of technology and scholarship. Participants at OR come from higher education, government, libraries, archives and museums to share their experiences and knowledge about repository infrastructure, tools, services, and policies. We hope you will join us in Stellenbosch in 2020!
For the local organizing committee:
Ellen Tise & Mimi Seyffert-Wirth
Stellenbosch University Library and Information Service
That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by John Riemer of University of California, Los Angeles, Stephen Hearn of University of Minnesota, and MJ Han of University of Illinois at Urbana-Champaign. The emphasis in authority work has been shifting from construction of text strings to identity management—differentiating entities, creating identifiers, and establishing relationships between entities. Metadata managers agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.
To support linked data, there is a need to maximize the number of entities in current descriptive work that have identifiers. The latest Program for Cooperative Cataloging strategic plan includes as a strategic direction “Accelerate the movement toward ubiquitous identifier creation and identity management at the network level.” A Technicalities opinion column Unpacking the Meaning of NACO Lite has been written in response to the call in action item 4.1 of the PCC strategic plan to further define what is meant by the concept.
Part of our discussion revolved around how “identity management” work differs from what catalogers currently do for authority work. The intellectual work required to differentiate names is the same. Is it really “add-on” work or just part of what metadata specialists have always done, but in a new environment? Identity management poses a change in focus, from providing access points for the resource to describing the entities represented by the resource (the work, persons, corporate bodies, places, etc.) and establishing the relationships and links among them. An example of a focus on relationships is illustrated by the above graphic from linkedjazz.org, visualizing the influences of jazz musicians on each other.
Authority work has focused on establishing a specific text string as the controlled access point and variant strings which redirect to it. Some systems index only the controlled access point. This can be particularly problematic when dealing with names in different languages. For example, the University of Hong Kong must deal with both simplified and traditional forms of Chinese characters to represent personal names, and the transliteration is not Pinyin but another scheme for representing the Cantonese, rather than the Mandarin, pronunciation. The Australian National University (ANU) similarly struggles with names of indigenous peoples who often have multiple spellings depending on context. Indexing only the authorized access point is not enough.
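The difference between indexing one authorized string and managing an identity can be sketched in a few lines. In the toy model below, an entity is keyed by an identifier and carries several equally valid labels, so a search on any form finds the same entity (the identifier and names are invented for illustration):

```python
# Toy identity-management record: one identifier, many labels.
# Compare with a system that indexes only a single authorized string.
# The identifier and all names below are invented for illustration.
entity = {
    "id": "https://example.org/entity/n0001",
    "labels": {
        "zh-Hant": "陳大文",          # traditional Chinese characters
        "zh-Hans": "陈大文",          # simplified Chinese characters
        "yue-Latn": "Chan Tai-man",   # Cantonese romanization
        "cmn-Latn": "Chen Dawen",     # Pinyin romanization
    },
}

def find(query, entities):
    """Match a query against every label, not just an authorized form."""
    return [e["id"] for e in entities
            if query in e["labels"].values()]

print(find("Chan Tai-man", [entity]))
```

Whichever form a user searches, the match point is the identifier, and any of the labels can then be chosen for display.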
Suggestions for transitioning to identity management work included:
Develop identity management as a new type of authority work and provide a venue for this type of contribution (a potential complement to traditional authority control).
Reorient traditional NACO authority control work, redirecting the energy toward identity management.
Align library practices with those of other parties, e.g., rights management agencies, so that other centralized files can be used and shared.
A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. The British Library hopes to take advantage of identifiers from other sources, especially those created by the authors themselves. Metadata managers aspire to reuse data from other communities as much as possible, as no one institution can create all the needed identifiers alone. Valued sources may vary by discipline; for example, Getty’s Union List of Artist Names may be most useful for art materials. Since there may well be multiple identifiers pointing to the same entity, we’ll also need tools to reconcile them. Technology developers won’t build the needed functionality while the data isn’t there, and metadata creators hesitate to supply identifiers that systems can’t yet use; this chicken-and-egg problem hampers the transition.
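At its simplest, reconciling multiple identifiers for one entity is a clustering problem: records that share any identifier are assumed to describe the same entity and are merged. A minimal sketch of that idea (all identifiers invented for illustration):

```python
# Toy reconciliation: merge identifier sets that overlap.
# All identifiers below are invented for illustration.
def reconcile(records):
    """Cluster identifier sets; sets sharing any id are merged."""
    clusters = []
    for ids in records:
        ids = set(ids)
        overlapping = [c for c in clusters if c & ids]
        for c in overlapping:       # fold overlapping clusters in
            ids |= c
            clusters.remove(c)
        clusters.append(ids)
    return clusters

records = [
    {"viaf:111", "isni:0000-0001"},
    {"isni:0000-0001", "orcid:0000-0002-1825-0097"},
    {"viaf:999"},  # a different entity
]
print(reconcile(records))
```

Real reconciliation services add fuzzy name matching and human review on top, but the core move, collapsing co-referring identifiers into one cluster, is the same.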
Several RLP metadata managers have been participating in the Program for Cooperative Cataloging’s ISNI pilot, a year-long effort to understand ISNI (International Standard Name Identifier) tools and identifiers, create documentation and training, and explore the possibility of using ISNI as a cost-effective component of the PCC’s Name Authority Cooperative Program (NACO). The results have been mixed, as participants struggled to learn a new system that lacked some of the attributes usually associated with authority records. ISNI itself is looking to facilitate individual contributions and streamline its batch-loading processes. ISNI continues to explore how it might include ORCIDs (Open Researcher and Contributor Identifiers). ORCIDs have become a common identifier for researchers not represented in authority files; for instance, ANU reports that 85% of its researchers now have ORCIDs.
Where do identifiers belong in bibliographic records? There is a temporary moratorium on adding identifiers to the 024 field in LC/NACO authority records, and not all systems yet differentiate or support the $0 subfield (URIs and control numbers that refer to “records describing things”) and the $1 subfield (URIs that directly refer to “the Thing”) in bibliographic records. Nor is there a common understanding of the differentiation—do authority records “describe things” or represent “the Thing”? OCLC recently implemented the $1 subfield for bibliographic records, and UCLA is scoping a pilot project on using the $1 to identify persons not covered by the LC/NACO authority file toward calculating the effort to achieve 100% identifier coverage.
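In MARC terms, the two subfields can sit side by side on the same access point. A hypothetical example (the name and both URIs are invented, not a real authority record):

```
100 1_ $a Surname, Given, $d 1950-
       $0 http://id.example.org/authorities/names/n00000000
       $1 http://rwo.example.org/entity/e00000000
```

Here the $0 points to a record describing the person, while the $1 points to the person as a real-world entity; the unresolved question in the discussion is precisely where existing authority records fall on that line.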
Some posited Wikidata as another option for getting identifiers for names not represented in authority files, one that would also broaden the potential pool of contributors. Those who attended the 12 June 2018 OCLC Research Works in Progress Webinar: Introduction to Wikidata for Librarians rated it highly. A subsequent poll indicated that the Partnership’s strongest interest for a “deeper dive” was learning how other libraries are using Wikidata. The newly formed OCLC Research Library Partnership Wikimedia Interest Group may provide some good use cases. Because Wikidata was developed by drawing data from Wikipedia, it has focused on “works” and their authors, which could be viewed as an alternate version of the traditional author/title entries in authority files. But WikiCite, a recent effort to support the citations in Wikipedia articles, demonstrates that there is also a need to register and support identifiers for the components of those citations, including information about a specific edition or document.
Redirecting the energy devoted to traditional authority work toward identity management poses the biggest hurdle. The linchpin is whether we can reconfigure our systems to deal with identifiers as the match point, collocation point, and the key to whatever associated labels we display and index.
Open-source software is an amazing movement in today’s programming environment. By sharing the code behind programs, open-source projects empower online communities to create quality programs that are available for free. These collaborations celebrate transparency and inclusion, improving the landscape of development in many ways. Many of the programs you likely use are developed by open-source communities. The Firefox and Chrome browsers, the Android operating system, and many websites are built entirely from, or rely heavily on, open-source software.
If that sounds appealing to you, there is no better time than now to dip your toes into open-source development. Welcome to Hacktoberfest! Hacktoberfest is an annual event in which GitHub, the largest host of open-source software, encourages people to try out development and encourages its members and projects to make the barriers to entry as low as possible. Beginner projects are created for practice. Established projects tag issues that are good for new coders with a special Hacktoberfest label. Contribute enough code in the month of October and you will be sent a free shirt!
The shirts are awesome
So how do you get started? The first thing to do is learn a kind of tool called a “version control system”. If you’ve ever worked on a paper that required revisions, you already have experience with version control techniques.
Version control tools let you bookmark changes and stages of progress without making a hundred different files. They also make it easy to go back to previous versions and to see the differences between bookmarks. One of the most popular options is called git (hence GitHub).
The most important thing git allows you to do is reconcile different changes to the same files. You can “merge” changes from one version into another, so multiple people can work on the same files without conflicting with each other’s work. Teams and communities can thus delegate tasks and work on them individually. GitHub gives them a place to host code publicly, and git allows people to take projects in their own directions (called “forks”).
When someone wants to add their improvements back to the project, they create what’s called a “pull request”. A pull request shows all of the changes that have been made to a project clearly and asks the original owner if they want to include these changes in the project. It’s the center for conversation and progress on GitHub.
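You can try the basic commit-branch-merge cycle locally in a throwaway repository, no GitHub account needed. A minimal walkthrough (requires git installed; the file and messages are just examples):

```shell
# Create a throwaway repository and walk through the basic cycle:
# commit, branch, change, merge.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "First draft." > paper.txt
git add paper.txt
git commit -qm "First draft"          # bookmark the first version

git checkout -qb revisions            # start a separate line of work
echo "Second draft." > paper.txt
git commit -qam "Address reviewer comments"

git checkout -q -                     # back to the original branch
git merge -q revisions                # bring the revisions in
git log --oneline                     # both bookmarks now in history
```

A pull request is this same merge step, except the changes come from someone else’s fork and the project owner decides whether to accept them.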
Now that I’ve covered the basics, it’s time for you to dive in and make your own pull requests! Opening just five scores you an awesome shirt. Go to the Hacktoberfest website to get started, or check out this resource or this interactive tutorial for git to start making it part of your everyday routine.
Chris Hallberg is a web designer and technology developer at Falvey Library, an open-source enthusiast and part-time teacher.
In the September 2018 issue of Information Technology and Libraries (ITAL), we continue our celebration of ITAL’s 50th year with a summary of articles from the 1980s by former Editorial Board member Mark Dehmlow. In Mark’s words, “The 1980s were an exciting time for technology development and a decade that is rife with technical evolution.” As personal computers became commonplace through the decade, the Internet age was just around the corner.
As the accessibility of online resources becomes an increasingly visible priority for libraries, ensuring that our licensed content vendors meet the same standards grows more important. This article describes a method of increasing the visibility of vendor accessibility documentation for the benefit of our users.
Many library management systems are moving toward linked data storage and retrieval systems. To use this format for our cataloging metadata, libraries need ways to store, manipulate, and process RDF triples. This article proposes a distributed solution for efficiently processing and storing the large volume of library linked data stored in traditional storage systems. Apache Spark is used for parallel processing of large data sets and a column-oriented schema is proposed for storing RDF data.
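The idea behind a column-oriented layout for triples is that subjects, predicates, and objects are stored as separate columns, so a query scans only the columns it needs. That idea can be sketched without Spark; in the article’s proposal, Spark DataFrames would distribute the same operations across a cluster (the triples below are invented for illustration):

```python
# Toy column-oriented triple store: subjects, predicates, and objects
# live in parallel lists, so a predicate filter scans one column only.
# Triples are invented; a real deployment would use Apache Spark
# to partition these columns across many machines.
triples = {
    "s": ["ex:work1", "ex:work1", "ex:work2"],
    "p": ["dc:creator", "dc:title", "dc:creator"],
    "o": ["ex:person1", "A Title", "ex:person1"],
}

def query(store, predicate):
    """Return (subject, object) pairs for one predicate."""
    return [(store["s"][i], store["o"][i])
            for i, p in enumerate(store["p"]) if p == predicate]

print(query(triples, "dc:creator"))
```

Grouping by predicate like this is what makes common RDF queries (e.g., “all creators”) cheap even over very large graphs.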
This Ex Libris/LITA Student Paper Award-winning article reexamines the conclusions reached by Davis and Walters in 2011 by providing a critical review of the open access citation advantage (OACA) literature published since then, and explores how increases in OA publication trends could serve as a leveraging tool for libraries against the high costs of journal subscriptions.
In this article, the authors propose a method to use user comments and reviews of books to enhance the discovery of the items themselves. They describe the process by which they attempted to improve user satisfaction with the effects of retrieval results and visual appearance by employing users’ own information.
Recommender systems for books are frequently employed in libraries and bookstores. In this article, the authors provide a method for creating recommender systems for archival or other special collections based on user actions and collection metadata from finding aids created using Encoded Archival Description (EAD) standards.
LITA and OCLC invite nominations for the 2018 Frederick G. Kilgour Award for Research in Library and Information Technology. Submit your nomination no later than December 31, 2018.
The Kilgour Research Award recognizes research relevant to the development of information technologies, in particular research showing promise of having a positive and substantive impact on any aspect of the publication, storage, retrieval, and dissemination of information or how information and data are manipulated and managed. The winner receives $2,000 cash, an award citation, and an expense-paid trip (airfare and two nights lodging) to the 2019 ALA Annual Conference.
Nominations will be accepted from any member of the American Library Association. Nominating letters must address how the research is relevant to libraries; is creative in its design or methodology; builds on existing research or enhances potential for future exploration; and/or solves an important current problem in the delivery of information resources. A curriculum vitae and a copy of several seminal publications by the nominee must be included. Preference will be given to completed research over work in progress. The intent is to recognize a body of work probably spanning years, if not the majority of a career. More information and a list of previous winners can be found on the LITA website.
Thank you to OCLC for recognizing top researchers in our profession by sponsoring this award.
Over the last couple of years, as we retired, the program has migrated from an independent operation under the umbrella of the Stanford Library to one of the programs run by the Library's main IT operation, Tom Cramer's DLSS. The transition will shortly be symbolized by a redesigned website (its predecessor looked like this).
Thanks again to the NSF, Sun Microsystems, and the Andrew W. Mellon Foundation for the funding that allowed us to develop the system. Many thanks to the steadfast support of the libraries of the LOCKSS Alliance, and the libraries and publishers of the CLOCKSS Archive, that has sustained it in production. Special thanks to Don Waters for facilitating the program's evolution off grant funding, and to Margaret Kim for the original tortoise logo.
Vicky Reich gave a brief talk at the Building A Better Web: The Internet Archive’s Annual Bash. She followed Jefferson Bailey's talk, which reported that the Internet Archive's efforts to preserve the journals have already accumulated full text and metadata of nearly 8.7M articles, of which nearly 1.5M are from "at-risk" small journals. This is around 10% of the entire academic literature.
Below the fold, an edited text of Vicky's talk with links to the sources. I’m a librarian. I work with Jefferson and Dr. David Rosenthal. For more than 20 years, I’ve worked to preserve web journals published by small organizations. Journals published by big corporations are not disappearing, but journals published by small organizations disappear every day. Most small organizations simply don’t have the money to keep their journals online.
I want to tell you two stories about two journals: Genders and Exquisite Corpse.
Genders was first published in 1988. As the first major humanities journal to focus on theories of gender and sexuality, it was both influential and controversial. Starting in 1998, it was published online only. In 2014, the editor-in-chief retired, and she stopped paying the web hosting bills. 15 years of Genders appeared to be lost.
Andrei Codrescu edited Exquisite Corpse from 1983 through 2015. Codrescu is a Romanian-American poet, novelist, essayist, screenwriter, and NPR commentator. For 25 years, he was a Distinguished English Professor at Louisiana State University. Exquisite Corpse was a notorious anti-literary literary journal; rebellion, passion, and black humor were its trademarks.
We need $25,000 dollars to keep bringing you the Corpse. Anything from $20 upwards would be a great help!
After Hurricane Katrina, Exquisite Corpse ceased publication. It resumed in 2008 with this from Codrescu:
Welcome to the Post-Katrina Resurrection Corpse, back from a dank hiatus of one year in a formaldehyde-poisoned FEMA trailer. We festered, we raged, we contemplated suicide, and in the end, voted for life because we are a Corpse already and we hate to keep on dying, just like the ideals of the Republic.
Codrescu has declared, “The Corpse isn't dead yet”, and thanks to the Internet Archive, it’s unlikely to die. The Wayback Machine has the only complete copy of the original back issues.
Journals published by small organizations encapsulate underrepresented voices and experiences. In addition to poets and gender researchers, scientists publish in numerous small, local journals.
Here are three examples from Turkey, Nigeria, and Russia. These titles are targeted for capture in the coming year:
This kind of local environmental research has the potential to inform, for example, responses to global climate change.
I am proud to be working with the Internet Archive’s staff to ensure that the Web’s vast and deep trove of journal literature persists. This work helps move forward Brewster’s vision: a web that helps us understand our world and preserves the human record.