From Michael Guthrie, KnowledgeArc: EIFL, in association with the University of Mandalay and the University of Yangon, ran a series of seminars and meetings in March to finalize institutional open access policies and launch open access repositories. During this series of meetings, as part of the EIFL eLibrary Myanmar project, KnowledgeArc launched the first open access repository in Myanmar.
Conference early bird deadline extended to Tuesday June 28. And the conference program is available online! And we have a new keynote, Dario Taraborelli of Wikidata! Register today at http://vivoconference.org
Andy told us a couple of stories about his recent experiences on trains in Hong Kong and Melbourne. Despite the language barrier, he found the Hong Kong trains to be much easier to use, and in fact, they made the experience so enjoyable that he and his family sought out opportunities to take the train: “Hey, if we go to that restaurant across town instead of the one down the street we could take the train!” (this isn’t a direct quote)
My notes on this read:
How can we help students not feel like they’re in a foreign place in the library?
How can we help the library feel desirable?
But now that I think about it, that first point is totally unnecessary. Feeling like you’re in a foreign place isn’t the problem; it can actually be quite wonderful and exciting. Being made to feel unwelcome is the problem, regardless of whether the place is foreign or familiar. So I quite like the idea of trying to make the library feel desirable. I think my own library does this reasonably well with our physical space (we’re often full to bursting with students) but it’s a nice challenge for our virtual spaces.
Andy also talked about Ellen Isaacs’ idea of “the hidden obvious” when describing library staff reaction to his team’s user research findings. He also mentioned Dan North on uncertainty: “We would rather be wrong than be uncertain.” These two ideas returned at other times during the next two days.
Donna Lanclos: Keynote
Donna also told us stories. She told us stories about gardens and her mother’s advice that if you plant something new and it dies, you plant something else. With “Failed” as one of the conference streams, this key next step of “plant something else” is important to keep in mind. Failing and then learning from failure is great. But we must go on to try again. We must plant something else. Not just say “well, that didn’t work, let’s figure out what we learned and not do that again.” Plant something else.
Donna’s mother also said, though, that “sometimes the plant dies because of you.” So that maybe, sometimes, it’s not that you need to plant something else. You just need to plant the same thing and be more careful with it. Or maybe someone else should plant it or look after it.
Another point from this garden story was that there are always people in the library who take particular pains to keep lists of all the dead plants. People who say “we tried that before and it didn’t work.” Or who make it clear they think you shouldn’t try to plant anything at all. Or who cling too strongly to some of those dead plants; who never intend to plant again because of it. Don’t keep a list of the dead plants. Or maybe keep a list but not at the forefront of your mind.
Donna told us another story about her fieldwork in Northern Ireland. How she found it difficult to be gathering folklore when there were bigger issues; problems that needed fixing. Advice she got then and passed on to us was that just because you can’t fix problems with your ethnographic work doesn’t mean that you can’t do anything, that you aren’t doing anything. Gathering understanding – a new and different understanding – is valid and valuable work and it’s different work than solving problems.
She argued that ethnographic work is not about finding and solving problems but about meaning. Finding out what something means, or if you don’t know what it means, figuring out what you think it means. The work can help with small wins but is really about much more. This is a theme Donna and Andrew discussed further in the wrap-up panel on Friday.
Finally, I have this note that I can’t at all remember the context for, but boy do I like it anyway:
Not risk, but possibility
Jenny Morgan: UX – Small project/ high value?
Jenny’s was the first of the Nailed, Failed, Derailed sessions I attended and she was a wonderfully calm presenter – something I always admire since I often feel like a flailing goon. She spoke about a project she led, focusing on international students at her library at Leeds Beckett University. A couple of my take-aways:
They asked students how they felt about the library. I like this affective aspect and think it ties in with what Andy was talking about with making the library desirable.
Students don’t think of the whole building; despite the library making printers available in the same place on every floor, students didn’t realize there were printers on any floor other than the one they were on. As a consequence, students would stand in line to use printers on one floor instead of going to another floor where printers were available. Of course this makes sense, but library staff often think of the whole building and forget that our users only use, see, and know about a tiny portion.
The international students they spoke to found the library too noisy and were hesitant to ask the “home” students to be quiet. They didn’t like the silent study areas or the study carrels; they wanted quiet, but not silent.
International students are often on campus at times when “home” students are not (e.g. holidays, break times). They like going to the library for the community that they can’t find elsewhere, often because everywhere else is closed. This hit home for me because our campus really shuts down at the Christmas break, and even the library is closed. It made me wonder where our international students go for that feeling of community.
Carl Barrow: Getting back on the rails by spreading the load
One of the first things that struck me about Carl’s presentation was his job title – Student Engagement Manager – and that Web is included under his purview. I think I would love that job.
Carl was really open and honest in his presentation. He talked about being excited about what he learned at UXLibs and wanting to start doing user research with those methods, but feeling hesitant. And then he looked deeper into why he was feeling hesitant, and realized part of it was his own fear of failure. Hearing him be so honest about how his initial enthusiasm was almost sidetracked by fear was really refreshing. Conference presenters usually (and understandably) want to come off as polished and professional, and talking about feelings tends not to enter into it. But it makes so much sense at a UX conference – where we spend a fair bit of time talking about our users’ feelings – to talk about our own feelings as well. I really appreciated this about Carl’s talk. A few other points I noted down:
He trained staff on the ethnographic methods he wanted to use and then (this is the really good bit) he had them practice those methods on students who work in the library. This seemed to me to be a great way for staff to ease in: unfamiliar methods made less scary by using them with familiar people.
Something that made me think of Andy’s point about “the hidden obvious”: they realized through their user research that the silent reading room had services located in the space (e.g. printers, laptop loans) that made it rather useless for silent study. I personally love how user research can make us see these things, turning “the hidden obvious” to “the blindingly obvious.”
I just like this note of mine: “Found that signage was bad. (Signage is always bad.)”
They found that because people were not sure what they could do from the library’s information points (computer kiosk-type things), they simply stayed away from them. At my own library, trying to make our kiosks suck less is one of my next projects, so this was absolutely relevant to me.
Deirdre Costello: Sponsor presentation from EBSCO
Last year, Deirdre rocked her sponsor presentation and this year was no different. I was still a bit loopy from having done my own presentation and then gone right to my poster, so honestly, this was the only sponsor presentation I took notes on. My brain went on strike for a bit after this.
Deirdre talked about how to handle hard questions when you’re either presenting user research results, or trying to convince someone to let you do user research in the first place. One of those was “Are you sure about your sample?” and she said the hidden question behind this was “Are you credible?” It reminded me of a presentation I did where I (in part) read out a particularly insightful love letter from a user, and someone’s notes on that part of the presentation read “n=1”: surely meant to be a withering slam.
Other points I took away from Deirdre:
Sometimes you need to find ways for stakeholders to hear the message from someone who is not you (her analogy was that you become like a teenager’s mom: once you’ve said something once, they can’t stand to hear the same thing from you again).
One great way of doing the above is through videos with student voices. She said students like being on video and cracking jokes, and this can create a valuable and entertaining artifact to show your stakeholders.
Again related to all this, Deirdre talked about the importance of finding champions who can do things you can’t. She said that advocacy requires a mix of swagger and diplomacy, and if you’re too much on the swagger side then you need a champion who can do the diplomacy part for you.
Andrea Gasparini: A successful introduction of User Experience as a strategic tool for service and user centric organizations
Apologies to Andrea: I know I liked his session but the notes I took make almost no sense at all. I got a bit distracted when he was talking about his co-author being a product designer at his library at the University of Oslo. The day before I came to UXLibs II, I met with Jenn Phillips-Bacher who was one of my team-mates at the first UXLibs. Jenn does fabulously cool things at the Wellcome Library and is getting a new job title that includes either “product designer” or “product manager” and we had talked a bit about what that means and how it changes things for her and for the library. That discussion came back to me during Andrea’s session and took me away from the presentation at hand for a while.
The only semi-coherent note I do have is:
Openness to design methods implies testing and learning
Ingela Wahlgren: What happens when you let a non-user loose in the library?
Ingela described how a whole range of methods were used at Lund University library to get a bigger picture of their user experience. She then went into depth about a project that she and her colleague Åsa Forsberg undertook, trying to get the non-user’s perspective.
One UX method that was taught at last year’s UXLibs was “touchstone tours,” where a user takes the researcher on a tour of a space (physical or virtual). This lets the researcher experience the space from the user’s point of view and see the bits that are most useful or meaningful to them. Ingela and Åsa wanted to have a non-user of the library take them on a touchstone tour. They might see useful and meaningful parts of the library, but more importantly would see what was confusing and awful for a new user. I thought this was a brilliant idea!
Most of the presentation, then, was Ingela taking the audience along for the touchstone tour she had with a non-user. With lots of pictures of what they had seen and experienced, Ingela clearly demonstrated how utterly frustrating the experience had been. And yet, after this long and frustrating experience, the student proclaimed that it had all gone well and she was very satisfied. ACK! What a stunningly clear reminder that what users say is not at all as important as what they do, and also how satisfaction surveys do not tell us the true story of our users’ experience.
Ingela won the “best paper” prize for this presentation at the gala dinner on Thursday night. Well-deserved!
The team challenge this year focused on advocacy. There were three categories:
Marketing Up (advocating to senior management)
Collaboration (advocating to colleagues in other areas)
Recruitment (advocating to student groups)
Attendees were in groups of about 8 and there were 5 groups per category. We had less than 2 hours on Thursday and an additional 45 minutes on Friday to prepare our 7-minute pitches to our respective audiences. I was in team M1, so Marketing Up to senior management. I’m going to reflect on this in my Conference Thoughts post, but there are a few notes below from the other teams’ presentations.
I got a bit lost at some of the UK-specific vocabulary and content of Lawrie’s keynote, but he made some really rather wonderful points:
Don’t compromise the vision you have before you share it. He talked about how we often anticipate responses to our ideas before we have a chance to share them, and that this can lead to internally deciding on compromises. His point was that if you make those compromises before you’ve articulated your vision to others, you’re more likely to compromise rather than sticking to your guns. Don’t compromise before it’s actually necessary.
Incremental changes, when you make enough of them, can be transformative. You don’t have to make a huge change in order to make a difference. This was nice to hear because it’s absolutely how I approach things, particularly on the library website.
Use your external network of people to tell your internal stakeholders things because often external experts are more likely to be listened to or believed. (Deirdre Costello had said pretty much the same thing in her presentation. It can be hard on the ego, but is very often true.)
“Leadership is often stealthy.” Yes, I would say that if/when I show leadership, it is pretty much always stealthy.
Finally, Lawrie talked about the importance of documenting your failures. It’s not enough to fail and learn from your failures, you have to document them so that other people learn from them too, otherwise the failure is likely to be repeated again and again.
Team Challenge Presentations
I didn’t take as many notes as I should have during the team presentations. The other teams in my group certainly raised a lot of good points, but the only one I made special note of was from Team M5:
There are benefits to students seeing our UX work, even when they aren’t directly involved. It demonstrates that we care. Students are often impressed that the library is talking to students or observing student behaviour – that we are seeking to understand them. This can go a long way to generating goodwill and have students believe that we are genuinely trying to help them.
My team (M1) ended up winning the “Marketing Upwards” challenge, which was rather nice although I don’t think any of us were keen to repeat our pitch to the whole conference! We thought the fire alarm might get us out of it, but no luck. (Donna Lanclos – one of our judges – later said that including the student voice and being very specific about what we wanted were definitely contributing factors in our win. This feels very “real world” to me and was nice feedback to hear.)
There were a couple of points from the winning Collaboration team (C4) that I took note of:
Your networks are made up of people who are your friends, and people who may owe you favours. Don’t be afraid to make use of that.
Even if a collaborative project fails, the collaboration itself can still be a success. Don’t give up on a collaborative relationship just because the outcome wasn’t what you’d hoped.
Again, my brain checked out a bit during team R2’s winning Recruitment pitch. (I was ravenous and lunch was about to begin.) There was definitely uproarious laughter for Bethany Sherwood’s embodiment of the student voice.
Andrew Asher: Process Interviews
I chose the interviews workshop with Andrew Asher because when I was transcribing interviews I did this year, I was cringing from time to time and knew I needed to beef up my interview skills. I was also keen to get some help with coding because huge chunks of those interviews are still sitting there, waiting to be analyzed. Some good bits:
You will generally spend 3-4 hours analyzing each 1-hour interview
Different kinds of interviews: descriptive (“tell me about”), demonstration (“show me”), and elicitation (using prompts such as cognitive maps, photos)
Nice to start with a throwaway question to act as an icebreaker. (I know this and still usually forget to include it. Maybe now it will stick.)
We practiced doing interviews and reflected on that experience. I was an interviewee and felt bad that I’d chosen a situation that didn’t match the questions very well. It was interesting to feel like a participant who wanted to please the interviewer, and to reflect on what the interviewer could have said to lessen the feeling that I wasn’t being a good interviewee. (I really don’t know the answer to that one.)
We looked at an example of a coded interview and practiced coding ourselves. There wasn’t a lot of time for this part of the workshop, but it’s nice to have the example in-hand, and also to know that there is really no big trick to it. Like so much, it really just takes doing it and refining your own approach.
Andy Priestner: Cultural Probes
I had never heard of cultural probes before this, and Andy started with a description and history of their use. Essentially, cultural probes are kits of things like maps, postcards, cameras, and diaries that are given to groups of people to use to document their thoughts, feelings, behaviour, etc.
Andy used cultural probes earlier this year in Cambridge to explore the lives of postdocs. His team’s kit included things like a diary pre-loaded with handwritten questions for the participants to answer, task envelopes that they would open and complete at specific times, pieces of foam to write key words on, and other bits and pieces. They found that the participants were really engaged with the project and gave very full answers. (Perhaps too full; they’re a bit overwhelmed with the amount of data the project has given them.)
After this, we were asked to create a cultural probe within our table groups. Again, there wasn’t a lot of time for the exercise but all the groups managed to come up with something really interesting.
I loved this. In part it was just fun to create (postcards, stickers, foam!) but it was also interesting to try to think about what would make it fun for participants to participate. When I was doing cognitive maps and love letters/break-up letters with students last summer, one of them was really excited by how much fun it had been – so much better than filling out a survey. It’s easier to convince someone to participate in user research if they’re having a good time while doing it.
The next-to-last thing on the agenda was a panel discussion. We’d been asked to write down any questions we had for the panelists ahead of time and Ned Potter chose a few from the pile. A few notes:
In response to a question about how to stop collecting data (which is fun) and start analyzing it (which is hard), Matthew Reidsma recommended the book Just Enough Research by Erika Hall. Other suggestions were: finding an external deadline by committing to a conference presentation or writing an article or report, working with a colleague who will keep you to a deadline, or having a project that relies on analyzing data before the project can move forward
Responding to a question about any fears about the direction UX in libraries is taking, Donna spoke about the need to keep thinking long-term; not to simply use UX research for quick wins and problem-solving, but to really try to create some solid and in-depth understanding. I think it was Donna again who said that we can’t just keep striking out on our own with small projects; we must bring our champions along with us so that we can develop larger visions. Andrew and Donna are working on an article on this very theme for an upcoming issue of Weave.
I don’t remember what question prompted this, but Ange Fitzpatrick talked about how she and colleague were able to get more expansive responses from students when they didn’t identify themselves as librarians. However, as team M5 had already mentioned and I believe it was Donna who reiterated at this point: students like to know that the library wants to know about them and cares about knowing them.
Finally, to a question about how to choose the most useful method for a given project, there were two really good responses. Andrew said to figure out what information you need and what you need to do with that information, and then pick a method that will help you with those two things. He recommended the ERIAL toolkit (well, Donna recommended it really, but Andrew wrote the toolkit, so I’ll credit him). And Matthew responded that you don’t have to choose the most useful method, you just have to choose a useful method.
Andy Priestner: Conference Review
Andy ended the day with a nice wrap-up and call-out to the positive collaborations that had happened and would continue to happen in the UXLibs community. He also got much applause ending his review with “I am a European.”
Like last year, I left exhausted and exhilarated, anxious to put some of these new ideas into practice, and hoping to attend another UXLibs conference. Next year?
Good to remember when you embark on a project with someone, both of you full of good intentions that it will be completed soon but a bit vague on who will do what work, and how: “Collaboration is not causation.”
Mary Meeker presented her 2016 Internet Trends report earlier this month. If you want a better understanding of how tech and the tech industry are evolving, you should watch her talk and read her slides.
This year’s talk was fairly time constrained, and she did not go into as much detail as she has in years past. That being said, there is still an enormous amount of value in the data she presents and the trends she identifies via that data.
Some interesting takeaways:
The growth in total number of internet users worldwide is slowing (the year-to-year growth rate is flat; overall growth is around 7% new users per year)
However, growth in India is still accelerating, and India is now the #2 global user market (behind China; USA is 3rd)
Similarly, there is a slowdown in the growth of the number of smartphone users and number of smartphones being shipped worldwide (still growing, but at a slower rate)
Android continues to demonstrate growth in marketshare; Android devices continue to cost significantly less than Apple devices.
Overall, there are opportunities for businesses that innovate / increase efficiency / lower prices / create jobs
Advertising continues to demonstrate strong growth; advertising efficacy still has a ways to go (internet advertising is effective and can be even more so)
Internet as distribution channel continues to grow in use and importance
Brand recognition is increasingly important
Visual communication channel usage is increasing – Generation Z relies more on communicating with images than with text
Messaging is becoming a core communication channel for business interactions in addition to social interactions
Voice on mobile rapidly rising as important user interface – lots of activity around this
Data as platform – important!
So, what kind of take-aways might be most useful to consider in the library context? Some top-of-head thoughts:
In the larger context of the Internet, libraries need to be more aggressive in marketing their brand and brand value. We are, by nature, fairly passive, especially compared to our commercial competition, and a failure to better leverage the opportunity for brand exposure leaves the door open to commercial competitors.
Integration of library services and content through messaging channels will become more important, especially with younger users. (Integration may actually be too weak a term; understanding how messaging fits inherently into the digital lifestyles of our users is critical)
Voice – are any libraries doing anything with voice? Integration with Amazon’s Alexa voice search? How do we fit into the voice as platform paradigm?
One parting thought, that I’ll try to tease out in a follow-up post: Libraries need to look very seriously at the importance of personalized, customized curation of collections for users, something that might actually be antithetical to the way we currently approach collection development. Think Apple Music, but for books, articles, and other content provided by libraries. It feels like we are doing this in slices and pieces, but that we have not yet established a unifying platform that integrates with the larger Internet ecosystem.
The MPG/SFX link resolver is now also accessible via HTTPS. The secure base URL of the productive MPG/SFX instance is: https://sfx.mpg.de/sfx_local.
HTTPS support enables secure third-party sites to load or to embed content from MPG/SFX without causing mixed content errors. Please feel free to update your applications or your links to the MPG/SFX server.
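Switching an application over is usually just a matter of swapping the base URL. A minimal sketch in Python (the OpenURL parameters shown are illustrative examples, not a required set):

```python
from urllib.parse import urlencode

# Secure base URL of the productive MPG/SFX instance (from the announcement).
SFX_BASE = "https://sfx.mpg.de/sfx_local"

def sfx_link(**openurl_params):
    """Build an HTTPS link to the MPG/SFX resolver.

    The OpenURL key/value pairs are whatever your application
    already sends; only the scheme and host change.
    """
    return SFX_BASE + "?" + urlencode(openurl_params)

link = sfx_link(sid="example:app", issn="1234-5679", volume="12", spage="34")
print(link)
```

Links built this way can be embedded in HTTPS pages without triggering mixed content warnings.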
For quite some time now, Lucidworks has been hosting a community site named Search Hub (aka LucidFind) that consists of a searchable archive of a number of Apache Software Foundation mailing lists, source code repositories and wiki pages, as well as related content that we’ve deemed beneficial. Previously, we’ve had three goals in building and maintaining such a site:
Provide the community a focused resource for finding answers to questions on our favorite projects like Apache Solr and Lucene
Dogfood our product
Associate the Lucidworks brand with the projects we support
As we’ve grown and evolved, the site has done a good job on #1 and #3. However, we have fallen a bit behind on goal number two, since the site, 22 months after the launch of Lucidworks Fusion, was still running on our legacy product, Lucidworks Search. While it’s easy to fall back on the “if it ain’t broke, don’t fix it” mentality (the site has had almost no down time all these years, even while running on very basic hardware and with a very basic setup and serving decent, albeit not huge, query volume), it has always bothered me that we haven’t put more effort into porting Search Hub to run on Fusion. This post intends to remedy that situation, while also significantly expanding our set of goals and the number of projects we cover. Those goals are, including the original ones from above:
Show others how it’s done by open sourcing the code base under an Apache license. (Note: you will need Lucidworks Fusion to run it.)
While we aren’t done yet, we are far enough along that I am happy to announce we are making Search Hub 2.0 available as a public beta. If you want to cut to the chase and try it out, follow the links I just provided; if you want all the gory details on how it all works, keep reading.
Rebooting Search Hub
When Jake Mannix joined Lucidworks back in January, we knew we wanted to significantly expand the machine learning and recommendation story here at Lucidworks, but we kept coming back to the fundamental problem that plagues all such approaches: where to get real data and real user feedback. Sure, we work with customers all the time on these types of problems, but that only goes so far in enabling our team to control its own destiny. After all, we can’t run experiments on the customer’s website (at least not in any reasonable time frame for our goals), nor can we always get the data that we want due to compliance and security reasons. As we looked around, we kept coming back to, and finally settled on, rebooting Search Hub to run on Fusion, but this time with the goals outlined above to strive for.
We also have been working with the academic IR research community on ways to share our user data, while hoping to avoid another AOL query log fiasco. It is too early to announce anything on that front just yet, but I am quite excited about what we have in store and hope we can do our part at Lucidworks to help close the “data gap” in academic research by providing the community with a large corpus that includes real user interaction data. If you are an academic researcher interested in helping out and are up on differential privacy and other data sharing techniques, please contact me via the Lucidworks Contact Us form and mention this blog post and my name. Otherwise, stay tuned.
In the remainder of this post, I’ll cover what’s in Search Hub, highlight how it leverages key Fusion features and finish up with where we are headed next.
The Search Hub beta currently consists of:
26 ASF projects (e.g. Lucene, Solr, Hadoop, Mahout) and all public Lucidworks content, including our website, knowledge base and documentation, with more content added automatically via scheduled crawls.
90+ datasources (soon to be 120+) spanning email, GitHub, websites and wikis, each with a corresponding schedule defining its update rate.
A custom, scheduled Spark job that runs periodically to coalesce mail threads and help reduce the effects of email thread hijacking. (There is always more work to be done here.)
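The actual job runs in Spark and has to cope with messier cases, but the core idea (group replies under a normalized subject so a thread coalesces into one unit) can be sketched in plain Python. Everything below is an illustrative sketch, not the Search Hub implementation:

```python
import re
from collections import defaultdict

def normalize_subject(subject):
    """Strip Re:/Fwd: prefixes and lowercase, so replies key to the same thread."""
    s = subject.strip()
    while True:
        stripped = re.sub(r"^(re|fwd?)\s*:\s*", "", s, flags=re.IGNORECASE)
        if stripped == s:
            return s.lower()
        s = stripped

def coalesce_threads(messages):
    """Group messages into threads keyed by their normalized subject."""
    threads = defaultdict(list)
    for msg in messages:
        threads[normalize_subject(msg["subject"])].append(msg)
    return dict(threads)

mails = [
    {"subject": "Solr query parsing", "id": 1},
    {"subject": "Re: Solr query parsing", "id": 2},
    {"subject": "RE: re: Solr query parsing", "id": 3},
]
threads = coalesce_threads(mails)
```

Thread hijacking (someone replying to an old message to start a new topic) is exactly the case this subject-based grouping gets wrong, which is why the real job needs ongoing work.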
If you wish to run Search Hub, see the README, as I am not going to cover that in this blog post.
Next Generation Relevance
While other search engines are touting their recent adoption of search ranking functions (BM25) that have been around for 20+ years, Fusion is focused on bringing next generation relevance to the forefront. Don’t get me wrong, BM25 is a good core ranking algorithm and it should be the default in Lucene, but if that’s your answer to better relevance in the age of Google, Amazon and Facebook, then good luck to you. (As an aside, I once sat next to Illya Segalovich from Yandex at a SIGIR workshop where he claimed that at Yandex, BM25 only got relevance about ~52% of the way to the answer. Others in the room disputed this, saying their experience was more like ~60-70%. In either case, it’s got a ways to go.)
If BM25 (and other core similarity approaches) only get you 70% (at best) of the way, where does the rest come from? We like to define Next Generation Relevance as being founded on three key ideas (which Internet search vendors have been deploying for many years now), which I like to call the “3 C’s”:
Content — This is where BM25 comes in, as well as things like how you index your content, what fields you search, editorial rules, language analysis and update frequency. In other words, the stuff Lucene and Solr have been doing for a long time now. If you were building a house, this would be the basement and first floor.
Collaboration — What signals can you capture about how users and other systems interact with your content? Clicks are the beginning, not the end of this story. Extending the house analogy, this is the second floor.
Context — Who are you? Where are you? What are you doing right now? What have you done in the past? What roles do you have in the system? A user in Minnesota searching for “shovel” in December is almost always looking for something different than a user in California in July with the same query. Again, with the house analogy: this is the attic and roof.
In Search Hub, we’ve done a lot of work on the content already, but it’s the latter two we are most keen to showcase in the coming months, as they highlight how we can create a virtuous cycle between our users and our data by leveraging user feedback and machine learning to learn relevance. To achieve that goal, we’ve hooked a number of signal capture mechanisms into our UI, all of which you can see in the code. (See snowplow.js, SnowplowService.js and their usages in places like here.)
These captured signals include:
Time on page (approximated by the page ping heartbeat in Snowplow).
Queries executed, including the capture of all documents and facets displayed.
What documents were clicked on, including unique query id, doc id, position in the SERP, facets chosen, and score.
Typeahead click information, including what characters were typed, the suggestions offered and which suggestion was selected.
With each of these signals, Snowplow sends a myriad of information, including things like User IDs, Session IDs, browser details and timing data. All of these signals are captured in Fusion. Over the coming weeks and months, as we gather enough signal data, we will be rolling out a number of new features highlighting how to use this data for better relevance, as well as other capabilities like recommendations.
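As an illustration only, a single captured click signal might carry data along these lines. The field names here are invented for the sketch and are not Fusion's or Snowplow's actual schema:

```ruby
require "time"

# Hypothetical shape of one captured click signal. Every field name
# here is invented for illustration; consult the Snowplow and Fusion
# documentation for the real event schemas.
click_signal = {
  type:     "click",
  query_id: "q-3f9a",            # id of the query that produced the SERP
  doc_id:   "SOLR-1234",         # document that was clicked
  position: 3,                   # rank of the document in the result list
  score:    7.42,                # relevance score at display time
  facets:   ["project:lucene"],  # facets selected when the click happened
  session:  { user_id: "u-81", session_id: "s-907" },
  ts:       Time.now.utc.iso8601 # event timestamp
}
```

Aggregated over many users and sessions, records like this are what downstream jobs consume to learn click-based relevance boosts.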
Getting Started with Spark on Search Hub
The core of Fusion consists of two key open source technologies: Apache Solr and Apache Spark. If you know Lucidworks, then you likely already know Solr. Spark, however, is something we added to our stack in Fusion 2.0, and it opens up a host of possibilities that our customers previously had to pursue outside of Fusion, almost always in a significantly more complex application. At its core, Spark is a scalable, distributed compute engine that ships with machine learning and graph analytics libraries out of the box. We’ve been using Spark for a number of releases now to do background, large-scale processing of things like logs and system metrics. As of Fusion 2.3, we expose Spark (and the Spark shell) to our users. This means that Fusion users can now write and submit their own Spark jobs, as well as explore our Spark-Solr integration on the command line simply by typing $FUSION_HOME/bin/spark-shell. This includes the ability to take advantage of all Lucene analyzers in Spark, which Steve Rowe covered in this blog post.
All of these demos are showcased in the SparkShellHelpers.scala file. As the name implies, this file contains commands that can be cut and pasted into the Fusion spark shell (bin/spark-shell). I’m going to save the details of running this to a future post, as there are some very interesting data engineering discussions that fall out of working with this data set in this manner.
Our long term intent as we move out of beta is to support all Apache projects. Currently, the project specifications are located in the project_config folder. If you would like your project supported, please issue a Pull Request and we will take a look and try to schedule it. If you would like to see some other feature supported, we are open to suggestions. Please open an issue or a pull request and we will consider it.
If your project is already supported and you would like to add a search box similar to the one on Lucene’s home page, have it submit to http://searchhub.lucidworks.com/?p:PROJECT_NAME, passing in your project name (not label) for PROJECT_NAME, as specified in the project_config. For example, for Hadoop, it would be http://searchhub.lucidworks.com/?p:hadoop.
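Building that URL from a project name can be sketched in a line of Ruby, assuming the `?p:PROJECT_NAME` form shown above and that project names are lowercase in the project_config:

```ruby
# Build a Search Hub URL for a given project name (not label),
# following the ?p:PROJECT_NAME form described above. Assumes
# project names in project_config are lowercase.
def search_hub_url(project_name)
  "http://searchhub.lucidworks.com/?p:#{project_name.downcase}"
end
```

So `search_hub_url("hadoop")` yields the Hadoop example URL above.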
In the coming months, we will be rolling out:
Word2Vec for query and index time synonym expansion. See the Github Issue for the details.
Classification of content to indicate what mailing list we think the message belongs to, as opposed to what mailing list it was actually sent to. Think of it as a “Did you mean to send this to this list?” classifier.
The initial phases of Islandora CLAW development worked with Drupal 7 as a front-end, but Islandora CLAW has been architected with a pivot to Drupal 8 in mind from its very inception. Drupal 8 has been officially released and development has begun on Drupal 9. Drupal policy will see Drupal 7 become unsupported when Drupal 9 is released, putting it in the same end-of-life territory as Fedora 3. As of this month, Islandora CLAW development has pivoted fully to Drupal 8, ensuring that when the Islandora Community is ready to make the move, there will be a version of Islandora that functions with the latest and best-supported versions of both our front-end and repository layers by pairing Drupal 8 with Fedora 4. This pivot was approved by the Islandora Roadmap Committee, based on a Drupal 8 Prospectus put forth by the CLAW development team.
Austin, TX Are you in the process of researching different types of repository platforms? Or have you lately been trying to understand hosted cloud service options? DuraSpace just made it easier to compare apples to apples and oranges to oranges when it comes to sorting out which DuraSpace-supported repository or cloud service is right for you with two comparison tables. These tables are designed to help you match your use case to a repository or hosted cloud service that meets your needs.
I have often thought that I have been fortunate to meet a lot of great people during my time in library school and since then in the working world. While I have thanked many of them in writing and in person, I wanted to reflect on how the combination of people and their support has … Continue reading A Letter of Thanks
You may have read about our recent downtime. We thought it might be a good opportunity to let you know about some of the other behind-the-scenes things going on here. We continue to answer email, keep the FAQ updated and improve our metadata. Many of you have written about the quality of some of our EPUBs. As you may know, all of our OCR (optical character recognition) is done automatically without manual corrections, and while it’s pretty good, it could be better. Specifically, we had a pernicious bug where some books’ formatting caused the first page of chapters to be left out of the OCRed EPUB. I personally had this happen to me with a series of books I was reading on Open Library and I know it’s beyond frustrating.
To address this and other scanning quality issues, we’re changing the way EPUBs work. We’ve improved our OCR algorithm and we’re shifting from stored EPUB files to on-the-fly generation. This means that further developments and improvements in our OCR capabilities will be available immediately. This is good news and has the side benefit of radically decreasing our EPUB storage needs. It also means that we have to
remove all of our old EPUBs (approximately eight million items for EPUBs generated by the Archive)
put the new on-the-fly EPUB generation in place (now active)
do some testing to make sure it’s working as expected (in process)
We hope that this addresses some of the EPUB errors people have been finding. Please continue to give us feedback on how this is working for you. Coming soon: improvements to Open Library’s search features!
I sometimes need to do this, and always forget how. I want to see the currently loaded version of a given gem, and check whether it’s greater than a certain version X.
Mainly because I’ve monkey-patched that gem, and want to either automatically stop monkey patching it if a future version is installed, or more likely output a warning message “Hey, you probably don’t need to monkey patch this anymore.”
I usually forget the right rubygems API, so I’m leaving this partially as a note to myself.
Here’s how you do it.
# If some_gem_name is at 2.0 or higher, warn that this patch may
# not be needed. Here's a URL to the PR we're back-porting: <URL>
if Gem.loaded_specs["some_gem_name"].version >= Gem::Version.new('2.0')
  msg = "Please check and make sure this patch is still needed, " \
        "it back-ports <URL> and may be in the release you are using"
  $stderr.puts msg
end
Whenever I do this, I always include the URL of the GitHub PR that implements the fix we’re monkey-patch back-porting, in a comment right next to the check.
The `$stderr.puts` is there to make sure the warning shows up in the console when running tests.
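A variation I sometimes want is the inverse guard: only apply the monkey patch while the installed gem is older than the release that carries the fix. A sketch, where the gem name and version are placeholders:

```ruby
require "rubygems"

# Returns true only when the named gem is loaded AND its version is
# below the release that ships the fix, i.e. the patch is still needed.
# Gem name and version here are placeholders, not a real dependency.
def needs_patch?(gem_name, fixed_in)
  spec = Gem.loaded_specs[gem_name]
  return false unless spec
  spec.version < Gem::Version.new(fixed_in)
end
```

Using `Gem::Version` for the comparison (rather than string comparison) correctly handles multi-part versions like "1.10" versus "1.9".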
In the previous two parts, I explained that much of the knowledge context that could and should be provided by the library catalog has been lost as we moved from cards to databases as the technologies for the catalog. In this part, I want to talk about the effect of keyword searching on catalog context.
KWIC and KWOC
If you weren't at least a teenager in the 1960s you probably missed the era of KWIC and KWOC (neither a children's TV show nor a folk music duo). These meant, respectively, KeyWords In Context and KeyWords Out of Context. They were concordance-like indexes to texts, but the first to be produced using computers. A KWOC index was simply a list of words and pointers (such as page numbers, since hyperlinks didn't exist yet). A KWIC index showed each keyword with a few words on either side, or rotated a phrase so that each term appeared once at the beginning of the string, with the rotations then ordered alphabetically.
If you have the phrase "KWIC is an acronym for Key Word in Context", then your KWIC index display could look like:
acronym for Key Word In Context KWIC is an
Context KWIC is an acronym for Key Word In
Key Word In Context KWIC is an acronym for
KWIC is an acronym for Key Word In Context
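The rotation itself is simple enough to sketch in a few lines of Ruby. This is a toy, not any historical system's code, and it rotates on every word rather than filtering out stopwords as real KWIC indexes did:

```ruby
# Toy KWIC: rotate the phrase so each word leads once, then order
# the rotations alphabetically (ignoring case). Real KWIC indexes
# also skipped stopwords like "is" and "an"; this sketch does not.
def kwic_rotations(phrase)
  words = phrase.split
  words.each_index
       .map { |i| words.rotate(i).join(" ") }
       .sort_by(&:downcase)
end
```

For the four-word phrase "Key Word In Context" this produces four rotations, one beginning with each word, in alphabetical order.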
To us today these are unattractive and not very useful, but to the first users of computers these were an exciting introduction to the possibility that one could search by any word in a text.
It wasn't until the 1980's, however, that keyword searching could be applied to library catalogs.
Before Keywords, Headings
Before keyword searching, when users were navigating a linear, alphabetical index, they were faced with the very difficult task of deciding where to begin their entry into the catalog. Imagine someone looking for information on Lake Erie. That seems simple enough, but entering the catalog at L-A-K-E E-R-I-E would not actually yield all of the entries that might be relevant. Here are some headings with LAKE ERIE:
Boats and boating--Erie, Lake--Maps.
Books and reading--Lake Erie region.
Lake Erie, Battle of, 1813.
Erie, Lake--Navigation
Note that the lake is entered under Erie, the battle under Lake, and some instances are fairly far down in the heading string. All of these headings follow rules that ensure a kind of consistency, but because users do not know those rules, the consistency here may not be visible. In any case, the difficulty for users was knowing with what terms to begin the search, which was done on left-anchored headings.
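The difference between the two search styles can be sketched like this, using headings from the example above (toy code, not an actual catalog system):

```ruby
headings = [
  "Boats and boating--Erie, Lake--Maps",
  "Books and reading--Lake Erie region",
  "Lake Erie, Battle of, 1813",
  "Erie, Lake--Navigation",
]

# Left-anchored search: the heading must *begin* with the typed
# string, so the user has to guess the correct entry point.
left_anchored = headings.select { |h| h.start_with?("Lake Erie") }

# Keyword search: every term just has to appear somewhere
# in the heading, in any order and any position.
keyword = headings.select do |h|
  ["lake", "erie"].all? { |t| h.downcase.include?(t) }
end
```

The left-anchored search finds only the battle; the keyword search finds all four headings, which is exactly the shift in user experience described below.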
One might assume that finding names of people would be simple, but that is not the case either. Names can be quite complex with multiple parts that are treated differently based on a number of factors having to do with usage in different cultures:
De la Cruz, Melissa
Cervantes Saavedra, Miguel de
Because it was hard to know where to begin a search, see and see also references existed to guide the user from one form of a name or phrase to another. However, it would inflate a catalog beyond utility to include every possible entry point that a person might choose, not to mention that this would make the cataloger's job onerous. Other than the help of a good reference librarian, searching in the card catalog was a kind of hit or miss affair.
When we brought up the University of California online catalog in 1982, you can imagine how happy users were to learn that they could type in LAKE ERIE and retrieve every record with those terms, regardless of the order of the terms or where in the heading they appeared. Searching was, or seemed, much simpler. Because it feels simpler, we have all tended to ignore some of the downsides of keyword searching. First, words are just strings, and in a search, strings have to match (with some possible adjustment, like combining singular and plural forms). So a search on "FRANCE" for all information about France would fail to retrieve other versions of that word unless the catalog did some expansion:
Cooking, French
France--Antiquities
Alps, French (France)
French--America--History
French American literature
The next problem is that retrieval with keywords, and especially the "keyword anywhere" search which is the most popular today, entirely misses any context that the library catalog could provide. A simple keyword search on the word "darwin" brings up a wide array of subjects, authors, and titles.
Darwin, Charles, 1809-1882 — Influence
Darwin, Charles, 1809-1882 — Juvenile Literature
Darwin, Charles, 1809-1882 — Comic Books, Strips, Etc
Darwin Family
Java (Computer program language)
Rivers--Great Britain
Mystery Fiction
DNA Viruses — Fiction
Women Molecular Biologists — Fiction
Darwin, Charles, 1809-1882
Darwin, Emma Wedgwood, 1808-1896
Darwin, Ian F.
Darwin, Andrew
Teilhet, Darwin L.
Bear, Greg
Byrne, Eugene
Darwin
Darwin; A Graphic Biography : the Really Exciting and Dramatic Story of A Man Who Mostly Stayed at Home and Wrote Some Books
Darwin; Business Evolving in the Information Age
Emma Darwin, A Century of Family Letters, 1792-1896
Java Cookbook
Canals and Rivers of Britain
The Crimson Hair Murders
Darwin's Radio
It wouldn't be reasonable for us to expect a user to make sense of this, because quite honestly it does not make sense.
In the first version of the UC catalog, we required users to select a search heading type, such as AU, TI, or SU. That may have lessened the "false drops" from keyword searches, but it did not eliminate them. In this example, using a title or subject search the user still would have retrieved items with the subjects DNA Viruses — Fiction and Women Molecular Biologists — Fiction, and an author search would have brought up both Java Cookbook and Canals and Rivers of Britain. One could see an opportunity for serendipity here, but it's not clear that it would balance out the confusion and frustration.

You may right now be thinking "But Google uses keyword searching and the results are good." Note that Google now relies heavily on Wikipedia and other online reference works to provide relevant results. Wikipedia is a knowledge organization system, organized by people, and it often has a default answer for a search that is more likely to match the user's assumptions. A search on the single word "darwin" brings up the Wikipedia entry for Charles Darwin at the top of the results.
In fact, Google has always relied on humans to organize the web by following the hyperlinks that they create. Although the initial mechanism of the search is a keyword search, Google's forte is in massaging the raw keyword result to bring potentially relevant pages to the top.
The move from headings to databases to un-typed keyword searching has all but eliminated the visibility and utility of headings in the catalog. The single search box has become the norm for library catalogs and many users have never experienced the catalog as an organized system of headings. Default displays are short and show only a few essential fields, mainly author, title and date. This means that there may even be users who are unaware that there is a system of headings in the catalog.
Recent work in cataloging, from ISBD to FRBR to RDA and BIBFRAME, focuses on modifications to the bibliographic record but does nothing to model the catalog as a whole. With these efforts, the organized knowledge system that was the catalog is slipping further into the background, and yet we have no concerted effort taking place to remedy this.
What is most astonishing to me, though, is that catalogers continue to create headings, painstakingly, sincerely, in spite of the fact that they are not used as intended in library systems, and have not been used in that way since the first library systems were developed over 30 years ago. The headings are fodder for the keyword search, but no more so than a simple set of tags would be. The headings never perform the organizing function for which they were intended.
Part IV will look at some attempts to create knowledge context from current catalog data, and will present some questions that need to be answered if we are to address the quality of the catalog as a knowledge system.
Suppose you were a publisher and you wanted to get the goods on a pirate who was downloading your subscription content and giving it away for free. One thing you could try is to trick the pirate into downloading fake content loaded with spy software and encoded information about how the downloading was being done. Then you could verify when the fake content was turning up on the pirate's website.
This is not an original idea. Back in the glory days of Napster, record companies would try to fill the site with bad files which somehow became infested with malware. Peer-to-peer networks evolved trust mechanisms to foil bad-file strategies.
I had hoped that the emergence of Sci-Hub as an efficient, though unlawful, distributor of scientific articles would not provoke scientific publishers to do things that could tarnish the reputation of their journals. I had hoped that publishers would get their acts together and implement secure websites so that they could be sure that articles were getting to their real subscribers. Silly me.
In a series of tweets, Rik Smith-Unna noted with dismay that the Wiley Online Library was using "fake DOIs" as "trap URLs": URLs in links invisible to human users. A poorly written web spider or crawler would try to follow such a link, triggering a revocation of the user's access privileges. (For not obeying the website's terms of service, I suppose.)
Don't know why I'm being coy - it's Wiley. Thousands of fake DOIs buried around their site. RE https://t.co/YORhwzOZPE
Gabriel J. Gardner of Cal State Long Beach has reported his library's receipt of a scary email from Wiley stating:
Wiley has been investigating activity that uses compromised user credentials from institutions to access proxy servers like EZProxy (or, in some cases, other types of proxy) to then access IP-authenticated content from the Wiley Online Library (and other material). We have identified a compromised proxy at your institution as evidenced by the log file below.
We will need to restrict your institution’s proxy access to Wiley Online Library if we do not receive confirmation that this has been remedied within the next 24 hours.
I've been seeing these trap URLs in scholarly journals for almost 20 years now. Two years ago they reappeared in ACS journals. They're rarely well thought out, and from talking with publishers who have tried them, they don't work as intended. The Wiley trap URLs exhibit several implementation mistakes.
Spider trap URLs are useful for detecting bots that ignore robot exclusions. But Wiley's robots.txt document doesn't exclude the trap URLs, so "well-behaved" spiders, such as googlebot, are also caught. As a result, the fake Wiley page is indexed in Google, and because of the way Google aggregates the weight of links, it's actually a rather highly ranked page.
The download URLs for the fake article don't download anything; instead they return a 500 error code whenever an invalid pseudo-DOI is presented to the site. This is a site misconfiguration that can cause problems with link-checking or link-verification software.
Because the fake URLs look like Wiley DOIs, they could cause confusion if circulated. Crossref discourages this.
The trap URLs as implemented by Wiley can be used for malicious attacks. With a list of trap URLs, it's trivial to craft an email or a web page that causes a user to request the full list. When trap URLs trigger service suspensions, that means you can cut off a target's access simply by sending them an email.
It's just not a smart idea (even on April Fools!) for a reputable publisher to create fake article pages for "Constructive Metaphysics in Theories of Continental Drift." (Warning: until Wiley realizes their ineptness, this link may trigger unexpected behavior. Use Tor.) It's an insult to both geophysicists and philosophers. And how does the University of Bradford feel about hosting a fictitious Department of Geophysics?
If you visit the Wiley fake article page now, you won't get an article. You get a full dose of monitoring software. Wiley uses a service called Qualtrics Site Intercept to send you "Creatives" if you meet targeting criteria. But you'll also get that if you access Wiley's Online Library's real articles, along with sophisticated trackers from Krux Digital, Grapeshot, Jivox, Omniture, Tradedesk, Videology and Neustar.
Here's the letter I'd like libraries to start sending publishers:
[Library] has been investigating activity that causes spyware from advertising networks to compromise the privacy of IP-authenticated users of the [Publisher] Online Library, a service for we have been billed [$XXX,XXX]. We have identified numerous third party tracking beacons and monitoring scripts infesting your service as evidenced by the log file below.
We will need to restrict [Publisher]'s access to our payment processes if we do not receive confirmation that this has been remedied within the next 24 hours.
Here's another example of Wiley cutting off access because of fake URL clicking. The implication that Wiley has stopped using trap URLs seems to be false.
Some people have suggested that the "fake DOIs" are damaging the DOI system. Don't worry: they're not real DOIs and have not been registered. The DOI system is robust against this sort of thing; it's still disrespectful, though.
Historypin wins Knight News Challenge award to gather, preserve, and measure the impact of public library-led history, storytelling, and local cultural heritage in rural US communities in partnership with Digital Public Library of America
BOSTON & SAN FRANCISCO —Historypin announced today that they have been awarded $222,000 from the John S. and James L. Knight Foundation as part of its Knight News Challenge on Libraries, an open call for ideas to help libraries serve 21st century information needs. Selected from more than 615 submissions, Historypin’s “Our Story” project, a partnership with the Digital Public Library of America (DPLA), will collaborate with more than a dozen rural libraries in New Mexico, North Carolina and Louisiana to host lively events to gather and preserve community memory, and to measure the impact of these events on local communities.
“Local historical collections are some of the most viewed content in DPLA, and express the deep interest in our shared community history,” according to Emily Gore, Director for Content at DPLA. “Making cultural heritage collections from rural communities accessible to the world is extremely important to us, and this project will help us further share this rich history and the diverse stories to be found.”
“This award gives us the ability to work with small libraries to provide a toolkit–a physical box with posters, materials and guidance–to make it easy for librarians and volunteers to engage their community in memory sharing events,” said Jon Voss, Strategic Partnerships Director for Historypin. “We know through research that getting people across generations and cultures to sit together and share experiences strengthens communities, and this project will help local libraries to better measure their social impact.”
Led by national partners Historypin and DPLA, together with state and local library networks, Our Story aims to expand the national network and projects of thousands of cultural heritage collaborations that both DPLA and Historypin have established and increase the capabilities of small, rural libraries. Participating libraries in Our Story will be supplied with kits and training to guide them through a number of steps, including recruiting staff and volunteers for the project, planning for digitization and preservation, running community events and collecting stories, and measuring engagement and impact, among other important steps. The library kits and training will be based on four key areas — training, programming, preservation, and evaluation — and will pull in methodology and curriculum developed by both DPLA and Historypin in their work with cultural heritage partners throughout the US and around the world.
“The project will help promote civic engagement, while providing libraries with meaningful data, so they can better understand their impact on communities and meet new information needs,” said John Bracken, Knight Foundation vice president for media innovation.
The Knight News Challenge, an open call for ideas launched in late February 2014, asked applicants to answer the question, “How might libraries serve 21st century information needs?” Our Story aims to advance the library field in three key areas: measuring the social impact of public libraries, strengthening a national network of digital preservation and content discovery, and demonstrating the potential of open library data. The outputs of the project will be published and openly licensed for reuse in other rural libraries worldwide.
The Digital Public Library of America (https://dp.la) strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science. Since launching in April 2013, it has aggregated over 13 million items from 2,000 institutions. The DPLA is a registered 501(c)(3) non-profit.
Historypin.org is a global non-profit project that builds community through local history. Over 3,000 cultural heritage organizations and 75,000 individuals have used the site to discover, share and enrich community memory since 2010.
About Knight Foundation
Knight Foundation supports transformational ideas that promote quality journalism, advance media innovation, engage communities and foster the arts. The foundation believes that democracy thrives when people and communities are informed and engaged. For more, visit KnightFoundation.org.
Oxford, England Neil Jeffries, Oxford University, and Tom Cramer, Stanford University, from the Fedora team will hold an informal gathering joined by other Fedora community members, at The King's Arms Pub (just next to Wadham College and the Weston Library in Oxford, England) on July 5 from 5-7 PM, prior to the Jisc and CNI Conference welcome reception.
Glasgow, Scotland A new edition of the Digital Preservation Handbook was officially launched at the Guildhall in York yesterday, comprehensively updating the original version first published in 2001: http://handbook.dpconline.org/
The Library is coordinating an effort to introduce ORCID identifiers to the campus. ORCID (orcid.org) is an open, non-profit initiative founded by academic institutions, professional bodies, funding agencies, and publishers to resolve authorship confusion in scholarly work. The ORCID repository of unique scholar identification numbers aims to reliably identify and link scholars in all disciplines with their work, analogous to the way ISBN and DOI identify books and articles.
Brown is an institutional member of ORCID, which allows the University to create ORCID records on behalf of faculty and to integrate ORCID identifiers into the Brown Identity Management System, Researchers@Brown profiles, grant application processes, and other systems that facilitate identification of faculty and their works.
Please go to https://library.brown.edu/orcid to obtain an ORCID identifier OR, if you already have an ORCID, to link it to your Brown identity.
Please contact firstname.lastname@example.org if you have questions or feedback.
We’re packing up and preparing to head to Orlando for ALA Annual this week! Equinox will be in Booth 1175. Throughout the conference, you’ll find Mike, Grace, Mary, Galen, Shae, and Dale in the booth ready to answer your questions. We’d love for you to come visit and do a little crafting with us. Crafting? Yes–CRAFTING. We’ll have some supplies ready for you to make a little DIY swag. Quantities are limited, so make sure to see us early.
As usual, the Equinox team will be available in the booth to discuss Evergreen, Koha, and FulfILLment. We’ll also be attending a few programs and, of course, the Evergreen Meet-Up. Directly following the Evergreen Meet-Up, Equinox is hosting a Happy Hour for the Evergreen aficionados in attendance. Come chat with us at the Equinox booth to get more information!
The Equinox team is so proud of the proactive approach ALA has taken toward the senseless tragedy in Orlando recently. We will be participating in some of the relief events. We will be attending the Pulse Victim’s Memorial on Saturday to pay our respects and you’ll also find some of the team donating blood throughout the weekend.
We’re looking forward to the conference but most of all, we’re looking forward to seeing YOU. Stop by and say hello at Booth 1175!
Participants in the Eckerd Digitization Advisory meeting include (l-r) Nancy Schuler, Lisa Johnston, Alexis Ramsey-Tobienne, Alyssa Koclanes, Mary Molinaro (Digital Preservation Network), George Coulbourne (Library of Congress), David Gliem, Arthur Skinner, Justine Sanford, Emily Ayers-Rideout, Nicole Finzer (Northwestern University), Kristen Regina (Philadelphia Museum of Art), Anna Ruth, and Brittney Sherley.
This is a guest post by Eckerd College faculty David Gliem, associate professor of Art History, and Nancy Schuler, librarian and assistant professor of Electronic Resources, Collection Development and Instructional Services.
On June 3rd, a meeting at Eckerd College in St. Petersburg, Florida, brought key experts and College departments together to begin plans for the digitization of the College’s art collection. George Coulbourne of the Library of Congress assembled a team of advisers that included DPOE trainers and NDSR program providers from the Library of Congress, Northwestern University, the Digital Preservation Network, the Philadelphia Museum of Art and Yale University.
Advisers provided guidance on project elements including institutional repositories, collection design, metadata and cataloging standards, funding and partnership opportunities and digitization strategies. Suggestions will be used to design a digitization and preservation strategy that could be used as a model for small academic institutions.
Eckerd College is an undergraduate liberal arts institution known for its small classes and values-oriented curriculum that stresses personal and social responsibility, cross-cultural understanding and respect for diversity in a global society. As a tuition-dependent institution of 1,770 students, Eckerd is seeking ways to design the project to be cost-effective, while also ensuring long-term sustainability.
The initial goal of the project is to digitize the College’s large collection of more than 3000 prints, paintings, drawings and sculptures made by the founding faculty in the visual arts: Robert O. Hodgell (1922-2000), Jim Crane (1927-2015) and Margaret (Pegg) Rigg (1928-2011). Along with Crane (cartoonist, painter and collage artist) and Rigg (art editor of motive (always spelled with a lowercase “m”) magazine, as well as graphic designer, assemblage artist and calligrapher), Hodgell (printmaker, painter, sculptor, and illustrator) contributed regularly to motive, a progressive monthly magazine published by the Methodist Student Movement.
In print from 1941 to 1972, motive was hailed for its vanguard editorial and artistic vision and for its aggressive stance on civil rights, Vietnam, and gender issues. In 1965 the publication was runner-up to Life for Magazine of the Year and in 1966, Time magazine quipped that among church publications it stood out “like a miniskirt at a church social.” An entire generation of activists was shaped by its vision with Hodgell, Crane and Rigg playing an important role in forming and communicating that vision.
Eckerd’s position as a liberal arts college influenced by the tenets of the Presbyterian Church made it possible for these artists to converge and produce art that reflected society and promoted the emergence of activism that shaped the identity of the Methodist church at the time. Preserving these materials and making them available for broader scholarship will provide significant insight into the factors surrounding the development of the Methodist Church as it is today. Implementing the infrastructure to preserve, digitize and house the collection provides additional opportunities to add other College collections to the repository in the future.
The gathering also brought together relevant departments within Eckerd College, including representatives from the Library, Visual Arts and Rhetoric faculty, Information Technology Services, Marketing & Communications, Advancement and the Dean of Faculty. Having these key players in the room provided an opportunity to involve the broader campus community so efforts can begin to ensure the long-term sustainability of the project, while also highlighting key challenges unique to the College as seen by the external board of advisors.
Eckerd will now move forward to seek funding for the project, with hopes to integrate DPOE’s Train-the-trainer and an NDSR program to jump start and sustain the project through implementation. Potential partnerships and training opportunities with area institutions and local groups will be explored, as well as teaching opportunities to educate students about the importance of digital stewardship.
Go ahead and take a look. Study it. I’ll be here when you get back.
Do you see what he did? He took raw data and made it communicate visually. Let me reiterate this, as this lesson is too often lost in present-day “infographics”. You receive information immediately, without reading it. The minute you understand that the width of the line equals the relative number of troops, you are stunned. The depth of the tragedy has been communicated — if not fully, at least by impression.
The Minard infographic also combines several different planes of information, from troop strength, to temperature, to distance. It is, frankly, brilliant. I’m not suggesting that every library infographic needs to be brilliant, but nearly all of them can be smarter than they are. Either that, or give up the attempt. Seriously.
It’s sad, but many contemporary infographics are hardly anything more than numbers and clip art — often with only a tenuous connection between them. We really must do better.
Minard’s early infographic ably demonstrates the best qualities of an infographic presentation:
Information is conveyed at a glance. If you must read a lot of text to get the drift of the message, then you are failing.
The whole is greater than the sum of its parts. Minard deftly uses all of the dimensions of a piece of paper to convey distance, temperature, and troop strength all in one graphic. The combination puts across a message that any single element could not.
There are layers of information that are well integrated in the whole. An initial impression can be conveyed, but your graphic should also reveal more information under scrutiny.
Unfortunately, library infographics rarely, if ever, even loosely achieve these aims. Humor me, and do a Google Images search on “library infographics” and see what you get. Mostly they are simply numbers that are “illustrated” by some icon or image. They really aren’t infographics of the variety that Tufte champions. They are, unfortunately, mostly pale shadows of what is possible.
So let’s review some of the signs of a bad infographic:
Numbers are the most prominent thing you see. If you look at an infographic and it’s only numbers that leap out at you, stop wasting your time. Move on.
The numbers are not related at all. Many library infographics combine numbers that have no relation to each other. Who wants to puzzle out the significance of the number “30” next to the number “300,000”? Not me, nor anyone else.
The images are only loosely connected to the numbers. Stop putting an icon of a book next to the number of book checkouts. Just stop.
In the end, it’s clear that libraries really need professional help. Don’t think that you can simply take numbers, add an icon, and create a meaningful infographic. You can’t. It’s stupid. Just stop. If we can’t do this right, then we shouldn’t be doing it at all.
BOSTON/SALT LAKE CITY— In concert with the American Library Association national conference in Orlando, Florida, this week, the Digital Public Library of America (DPLA) and FamilySearch International, the largest genealogy organization in the world, have signed an agreement that will expand
access to FamilySearch.org’s growing free digital historical book collection to DPLA’s broad audience of users including genealogists, researchers, family historians, students, and more.
Family history/genealogy continues to be a popular and growing hobby. And FamilySearch is a leader in the use of technology to digitally preserve the world’s historic records and books of genealogical relevance for easy search and access online. With this new partnership, DPLA will incorporate metadata from FamilySearch.org’s online digital book collection that will make more than 200,000 family history books discoverable through DPLA’s search portal later this year. From DPLA, users will be able to access the free, fully viewable digital books on FamilySearch.org.
The digitized historical book collection at FamilySearch.org includes genealogy and family history publications from the archives of some of the most important family history libraries in the world. The collection includes family histories, county and local histories, genealogy magazines and how-to books, gazetteers, and medieval histories and pedigrees. Tens of thousands of new publications are added yearly.
“We’re excited to see information about FamilySearch’s vast holdings more broadly circulated to those trained to collect, catalog, and distribute useful information. Joint initiatives like this with DPLA help us to further expand access to the rich historic records hidden in libraries and archives worldwide to more curious online patrons,” said David Rencher, FamilySearch’s Chief Genealogy Officer.
Dan Cohen, Executive Director of DPLA, sees the addition of FamilySearch’s digital book collection as part of DPLA’s ongoing mission to be an essential site for family history researchers: “At DPLA, we aspire to collect and share cultural heritage materials that represent individuals, families, and communities from all walks of life across the country, past and present. The FamilySearch collection and our continued engagement with genealogists and family researchers is critical to help bring the stories represented in these treasured resources to life in powerful and exciting ways.”
FamilySearch is a global nonprofit organization dedicated to the discovery and preservation of personal and family histories and stories, introducing individuals to their ancestors through the widespread access to records, and collaborating with others who share this vision. Within DPLA, FamilySearch’s book collection will be discoverable alongside over 13 million cultural heritage materials contributed by DPLA’s growing network of over 2,000 libraries, archives, and museums across the country, opening up all new possibilities for discovery for users and researchers worldwide.
Find more about FamilySearch or search its resources online at FamilySearch.org. Learn more about Digital Public Library of America at https://dp.la.
FamilySearch is the largest genealogy organization in the world. FamilySearch is a nonprofit, volunteer-driven organization sponsored by The Church of Jesus Christ of Latter-day Saints. Millions of people use FamilySearch records, resources, and services to learn more about their family history. To help in this great pursuit, FamilySearch and its predecessors have been actively gathering, preserving, and sharing genealogical records worldwide for over 100 years. Patrons may access FamilySearch services and resources for free at FamilySearch.org or through more than 4,921 family history centers in 129 countries, including the main Family History Library in Salt Lake City, Utah.
About the Digital Public Library of America
The Digital Public Library of America strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science. Since launching in April 2013, it has aggregated more than 13 million items from 2,000 institutions. The DPLA is a registered 501(c)(3) non-profit.
We are delighted to announce that Lafayette College has become Hydra’s 100000th formal Partner. But only if you count in binary – otherwise, they’re our 32nd!
Although Lafayette is a “small liberal arts school” (their words), they have been involved with digital repository development for almost a decade and with Fedora since 2013. They write: “We have now spent over two years working with Hydra, getting to know the community and learning the Hydra way. We are firmly committed to this trajectory and wish now to become more involved, both technically and through project governance. We believe that a strengthened commitment would be of mutual benefit and, moreover, would serve as a compelling example for other liberal arts colleges that we are seeking to interest in shared Open Source development.”
Welcome Lafayette! We look forward to working with you.
The University of Michigan Library replaces roughly 1/4 of our computers every year. It is a long and complicated process when one considers the number of library staff and the number of computers (both in office and public areas where staff machines are used) involved.
This year we are using a locally developed tool to streamline the process.
Showcase the scholarship at your institution! This week, the VIVO membership drive began in earnest with an email to all VIVO community members. I hope you received the email and are considering how your institution might financially support VIVO:
This multi-part post is based on a talk I gave in June, 2016 at ELAG in Copenhagen.
Imagine that you do a search in your GPS system and are given the exact point of the address, but nothing more.
Without some context showing where on the planet the point exists, having the exact location, while accurate, is not useful.
In essence, this is what we provide to users of our catalogs. They do a search and we reply with bibliographic items that meet the letter of that search, but with no context about where those items fit into any knowledge map.
Because we present the catalog as a retrieval tool for unrelated items, users have come to see the library catalog as nothing more than a tool for known item searching. They do not see it as a place to explore topics or to find related works. The catalog wasn't always just a known item finding tool, however. To understand how it came to be one, we need a short visit to Catalogs Past.
We can't really compare the library catalog of today to the early book catalogs, since the problem that they had to solve was quite different to what we have today. However, those catalogs can show us what a library catalog was originally meant to be.
A book catalog was a compendium of entry points, mainly authors but in some cases also titles and subjects. The bibliographic data was kept quite brief as every character in the catalog was a cost in terms of type-setting and page real estate. The headings dominated the catalog, and it was only through headings that a user could approach the bibliographic holdings of the library. An alphabetical author list is not much "knowledge organization", but the headings provided an ordered layer over the library's holdings, and were also the only access mechanism to them.
Some of the early card catalogs had separate cards for headings and for bibliographic data. If entries in the catalog had to be hand-written (or later typed) onto cards, the easiest thing was to slot the cards into the catalog behind the appropriate heading without adding heading data to the card itself.
Often there was only one card with a full bibliographic description, and that was the "main entry" card. All other cards were references to a point in the catalog, for example the author's name, where more information could be found.
Again, all bibliographic data was subordinate to a layer of headings that made up the catalog. We can debate how intellectually accurate or useful that heading layer was, but there is no doubt that it was the only entry to the content of the library.
The Printed Card
In 1902 the Library of Congress began printing cards that could be purchased by libraries. The idea was genius. For each item cataloged by LC, a card was printed in as many copies as needed. Libraries could buy as many copies of the printed card as they required to create all of the entries in their catalogs, then type (or write) the desired headings onto the top of each card. Each of these cards carried the full bibliographic information - an advantage for users, who then would no longer need to follow "see" references from headings to the one full entry card in the catalog.
These cards introduced something else that was new: the card would have at the bottom a tracing of the headings that LC was using in its own catalog. This was a savings for the libraries, as they could copy LC's practice without expending their own catalogers' time. This card, for the first time, combined both bibliographic information and heading tracings in a single "record", with the bibliographic information on the card being an entry point to the headings.
Machine-Readable Card Printing
The MAchine Readable Cataloging (MARC) project of the Library of Congress was a major upgrade to card printing technology. By including all of the information needed for card printing in a computer-processable record, LC could take advantage of new technology to streamline its card production process, and even move into a kind of "print on demand" model. The MARC record was designed to have all of the information needed to print the set of cards for a book: author, title, subjects, and added entries were all included in the record, as well as some additional information that could be used to generate reports such as "new acquisitions" lists.
Here again the bibliographic information and the heading information were together in a single unit, and it even followed the card printing convention in the order of the entries, with the bibliographic description at top, followed by headings. With the MARC record, it was possible not only to print sets of cards, but to actually print the headings on the cards, so that when libraries received a set the cards were ready to go into the catalog at their respective places.
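The card-production logic is simple enough to sketch in a few lines of Python. This is an illustrative model only: the field names below are simplified stand-ins, not actual MARC tags, and the record is invented.

```python
# A simplified stand-in for a MARC record: real records use numbered
# tags and subfields, but the card-printing logic is the same.
record = {
    "author": "Twain, Mark, 1835-1910.",
    "title": "The adventures of Tom Sawyer.",
    "subjects": ["Boys -- Fiction.", "Missouri -- Fiction."],
    "added_entries": [],
}

def print_card_set(rec):
    """Produce one card per heading, each carrying the full description."""
    description = f'{rec["author"]}  {rec["title"]}'
    headings = [rec["author"]] + rec["subjects"] + rec["added_entries"]
    cards = []
    for heading in headings:
        # The heading goes at the top of the card; the full bibliographic
        # description appears on every card, so no "see" reference back
        # to a single main-entry card is needed.
        cards.append(f"{heading}\n  {description}")
    return cards

for card in print_card_set(record):
    print(card)
```

Because each card arrives with its heading already printed at the top, the set can be filed directly at the appropriate places in the catalog.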
Next, we'll look at the conversion from printed cards to catalogs using database technology.
The American Library Association’s Annual Conference kicks off later this week in Orlando, Florida and DPLA staffers are excited to hit the road, connect with a fantastic community of librarians and show our support for the city of Orlando. Here’s your guide to when and where to catch up with DPLA’s staff and community members at ALA Annual. If you’ll be following the conference from afar, connect with us on Twitter and following the conference at #alaac16.
12:00pm – 2:00pm: Ebook Working Group Project Update [S]
Location: Networking Uncommons, Orange County Convention Center
This meeting is open to all librarians curious about current issues, ongoing projects, and ways to get involved. Attendees will learn how the Ebook Working Group fits in with other library ebook groups, and explore the projects we currently work on, including the Library E-content Access Project (LEAP), SimplyE/Open eBooks, SimplyE for Consortia, Readers First and other library-created ebook projects. Current members of the working groups will have the opportunity to meet and share updates, and connect with potential new members.
In recent years, libraries have embraced their role as global participants in the Semantic Web. Developments in library metadata frameworks such as BibFrame and RDA, built on standard data models and ontologies including RDF, SKOS, and OWL, highlight the importance of linking data in an increasingly global environment. What is the status of linked data projects in libraries and other memory institutions internationally? Come hear our speakers address current projects, including RightsStatements.org, as well as opportunities and challenges.
Panelists: Gordon Dunsire, Chair, RDA Steering Committee, Edinburgh, United Kingdom; Reinhold Heuvelmann, Senior Information Standards Specialist, German National Library; Richard Urban, Asst. Professor, School of Information, Florida State University
This program will include an interactive panel discussion of the major trends in e-books and how library consortia are at the forefront of elevating libraries as a major player in the e-book market. Leading models from library consortia that showcase innovation and advocacy including shared collections using open source, commercial and hybrid platforms and the investigation of a national e-book platform for local content from self-published authors and independent publishers.
Panelists: Michelle Bickert, Digital Public Library of America; Veronda Pitchford, Director of Membership Development and Resource Sharing, Reaching Across Illinois Library System; Valerie Horton, Executive Director, Minitex; Greg Pronevitz, Executive Director, Massachusetts Library System
For their latest Knight News Challenge, the Knight Foundation asked applicants to submit their best idea answering the question: “How might libraries meet 21st century information needs?” This program will include a presentation of the newest winners of the challenge and a panel discussion on transformational change in the library field.
Panelists: Lisa Peet, Associate News Editor at Library Journal; Francesca Rodriquez, Foundation Officer at Madison Public Library Foundation; Matthew Phillips, Manager, Technology Development Team at Harvard University Library
Two innovative approaches help libraries address rights and reuse status for growing digital collections. RightsStatements.org addresses the need for standardized rights statements through international collaboration around a shared framework implemented by the Digital Public Library of America, New York Public Library, and other institutions. The Copyright Review Management System provides a toolkit for determining copyright, building off the copyright status work for materials in HathiTrust.
Panelists: Emily Gore, Director for Content, Digital Public Library of America; Greg Cram, Associate Director, Copyright and Information Policy, The New York Public Library; Rick Adler, DPLA Service Hub Coordinator at University of Michigan, School of Information
Digitization programs can be resource rich, even when institutions may be resource poor. Developing a program for the digitization of cultural heritage materials benefits from planning at the macro level, with organizational buy-in and strategic considerations addressed. Once this foundation is in place, an organization can successfully implement a digitization service aligned with organizational mission that benefits important known stakeholders and the wider community. This panel will focus on digitization programs from these two perspectives, with emphasis on the creation of a mobile digitization service and how this can be replicated to sustain small-scale digitization programs that can have a huge and positive impact – not only for the institution but for the communities it serves.
Panelists: Caroline Catchpole, Mobile Digitization Specialist at Metropolitan New York Library Council; Natalie Milbrodt, Associate Coordinator at Metadata Services at Queens Library; Jolie O. Graybill, Assistant Director at Minitex; Molly Huber, Outreach Coordinator at Minnesota Digital Library
In the previous post, I talked about book and card catalogs, and how they existed as a heading layer over the bibliographic description representing library holdings. In this post, I will talk about what changed when that same data was stored in database management systems and delivered to users on a computer screen.
Taking a very simple example, in the card catalog a single library holding with author, title and one subject becomes three separate entries, one for each heading. These are filed alphabetically in their respective places in the catalog.
In this sense, the catalog is composed of cards for headings that have attached to them the related bibliographic description. Most items in the library are represented more than once in the library catalog. The catalog is a catalog of headings.
In most computer-based catalogs, the relationship between headings and bibliographic data is reversed: the record, with both bibliographic and heading data, is stored once; access points, analogous to the headings of the card catalog, are extracted into indexes that all point to the single record.
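The reversal can be made concrete with a small sketch. The record and field names below are invented for illustration; real systems are far more elaborate, but the shape is the same.

```python
# One library holding with author, title, and one subject.
record = {
    "id": 1,
    "author": "Herriot, James",
    "title": "All creatures great and small",
    "subject": "Veterinarians -- England -- Biography",
}

# Card-catalog model: one filed entry per heading, each carrying
# (a copy of) the bibliographic description, sorted into the catalog.
card_entries = sorted(
    (record[field], record["id"]) for field in ("author", "title", "subject")
)

# Database model: the record is stored once, and each access point is
# extracted into an index that points back to the single record by id.
# The indexes are machinery the user never sees.
indexes = {
    "author": {record["author"]: [record["id"]]},
    "title": {record["title"]: [record["id"]]},
    "subject": {record["subject"]: [record["id"]]},
}
```

In the first model the headings are the catalog; in the second they are hidden plumbing that resolves a query to record ids.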
This in itself could be just a minor change in the mechanism of the catalog, but in fact it turns out to be more than that.
First, the indexes of the database system are not visible to the user. This is the opposite of the card catalog, where the entry points were what the user saw and navigated through. Those entry points, at their best, served as a knowledge organization system that gave the user a context for the headings; the headings could suggest related topics once the user found a starting point in the catalog.
When this system works well for the user, she has some understanding of where she is in the virtual library that the catalog creates. This context could be a subject area, or it could be a bibliographic context such as the editions of a work.
Most, if not all, online catalogs do not present the catalog as a linear, alphabetically ordered list of headings. Database management technology encourages searching rather than linear browsing. Even if one searches headings as a left-anchored string of characters, the search returns a retrieved set of matching entries, not a point in an alphabetical list, and there is no way to navigate to nearby entries. The bibliographic data is therefore provided in neither the context nor the order of the catalog. After a search on "cat breeds" the user sees a screen full of bibliographic records, but lacking in context, because most default displays do not show the user the headings or text that caused the item to be retrieved.
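The difference between retrieving a set and browsing a filed list can itself be sketched; the heading file below is a tiny invented sample.

```python
# A tiny invented slice of a subject-heading file, in filing order.
heading_file = [
    "Cat breeds -- Handbooks, manuals, etc.",
    "Cat breeds -- History",
    "Cat breeds -- Thailand",
    "Cats -- Behavior",
    "Cattle -- Breeding",
]

def left_anchored_search(query):
    """Return the headings matching the prefix.

    The result is a retrieved set, not a position in the file, so the
    neighboring headings ("Cats -- Behavior", "Cattle -- Breeding")
    that a browser of the card file would have seen are simply absent.
    """
    return [h for h in heading_file if h.lower().startswith(query.lower())]

print(left_anchored_search("cat breeds"))
```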
Although each of these items has a subject heading containing the words "Cat breeds", the order of the entries is not the subject order. The subject headings in the first few records read, in order:
Cat breeds - History
Cat breeds - Handbooks, manuals, etc.
Cat breeds - Thailand
Even if the catalog uses a visible and logical order, like alphabetical by author and title, or most recent by date, there is no way from the displayed list for the user to get the sense of "where am I?" that was provided by the catalog of headings.
In the early 1980's, when I was working on the University of California's first online catalog, the catalogers immediately noted this as a problem. They would have wanted the retrieved set to be displayed as:
(Note how much this resembles the book catalog shown in Part I.) At the time, and perhaps still today, there were technical barriers to such a display, mainly because of limitations on the sorting of large retrieved sets. (Large, at that time, was anything over a few hundred items.) Another issue was that any bibliographic record could be retrieved more than once in a single retrieved set, and presenting the records more than once in the display, given the database design, would be tricky. I don't know if starting afresh today some of these features would be easier to produce, but the pattern of search and display seems not to have progressed greatly from those first catalogs.
In any case, it is questionable whether a set of bibliographic items retrieved from a database query would reproduce the presumably coherent context of the catalog. This is especially true because of the third major difference between the card catalog and the computer catalog: the ability to search on individual words in the bibliographic record rather than being limited to full left-anchored headings. The move to keyword searching was both a boon and a bane, because it was a major factor in the loss of context in the library catalog.
Keyword searching will be the main topic of Part III of this series.
My library just received three Samsung S7 devices with Gear VR goggles. We put them to work right away.
The first thought I had was: Wow, this will change everything. My second thought was: Wow, I can’t wait for Apple to make a VR device!
The Samsung Gear VR experience is grainy and fraught with limitations, but you can see the potential right away. The virtual reality is, after all, working off a smartphone. There is no high-end graphics card working under the hood. Really, the goggles are just a plastic case holding the phone up to your eyes. But still, despite all this, it’s amazing.
Within twenty-four hours, I’d surfed beside the world’s top surfers on giant waves off Hawaii, hung out with the Masai in Africa and shared an intimate moment with a pianist and his dog in their (New York?) apartment. It was all beautiful.
We’ve Been Here Before
Remember when the Internet came online? If you’re old enough, you’ll recall the crude attempts to chat on digital bulletin board systems (BBS) or, much later, the publication of the first colorful (often jarringly so) HTML pages.
It’s the Hello World! moment for VR now. People are just getting started. You can tell the content currently available is just scratching the surface of potentialities for this medium. But once you try VR and consider the ways it can be used, you start to realize nothing will be the same again.
The Internet Will Disappear
So said Google executive chairman Eric Schmidt in 2015. He was talking about the rise of AI, wearable tech and many other emerging technologies that will transform how we access data. For Schmidt, the Internet will simply fade into these technologies to the point that it will be unrecognizable.
I agree. But being primarily a web librarian, I’m mostly concerned with how new technologies will translate in the library context. What will VR mean for library websites, online catalogs, eBooks, databases, and the social networking aspects of libraries?
So after trying out VR, I was already thinking about all this. Here are some brief thoughts:
Visiting the library stacks in VR could transform the online catalog experience
Library programming could break out of the physical world (virtual speakers, virtual locations)
VR book discussions could incorporate virtual tours of topics/locations touched on in books
Collections of VR experiences could become a new source for local collections
VR maker spaces and tools for creatives to create VR experiences/objects
Still, VR makes your eyes tired. It’s not perfect. It has a long way to go.
But based on my experience sharing this technology with others, it’s addictive. People love trying it. They can’t stop talking about it afterward.
So, while it may be some time before the VR revolution disrupts the Internet (and virtual library services with it), it sure feels imminent.
Re:create, the copyright coalition that includes members from industry and library associations, public policy think tanks, public interest groups, and creators, sponsored a program – How it works: understanding copyright law in the new creative economy – before a packed audience at the U.S. Capitol Visitor Center. Speakers included Alex Feerst, Corporate Counsel from Medium; Katie Oyama, Senior Policy Counsel for Google; Becky “Boop” Prince, YouTube CeWEBrity and Internet News Analyst; and Betsy Rosenblatt, Legal Director for the Organization for Transformative Works. The panel was moderated by Joshua Lamel, Executive Director of Re:create. Discussion focused on new creators and commercial businesses made possible by the Internet, fair use, and freedom of expression.
We live in a time of creative resurgence; more creative content is produced and distributed now than at any time in history. Some creators have successfully built profit-making businesses by “doing something they love,” whether it’s quilting, storytelling, applying makeup, or riffing on their favorite TV shows. What I thought was most interesting (because sometimes I get tired of talking about copyright) was hearing the stories of new creators – in particular, how they established a sense of self by communicating with people across the globe who have like-minded interests. People who found a way to express themselves through fan fiction, for example, found the process of creating and sharing with others so edifying that their lives were changed. Regardless of whether they made money, being able to express themselves to a diverse audience was worth the effort.
One story featured a quilter from Hamilton, Missouri, who started conducting quilting tutorials on YouTube. Her popularity grew to such an extent that she and her family – facing a tough economic time – bought an old warehouse and built a quilting store selling pre-cut fabrics. Their quilting store became so popular that fans from as far away as Australia travel to see it, and those visitors spend money in Hamilton. In four years, the Missouri Star Quilting Company became the biggest employer in the entire county, employing over 150 people, including single moms, retirees and students.
But enough about crafts. The panel also shared their thoughts on proposals to change “notice and take down” to “notice and stay down,” a position advocated by the content community in their comments on Section 512. This provision is supposed to help rights holders limit alleged infringement and provide a safe harbor for intermediaries – like libraries that offer open internet service – from third party liability. Unfortunately, the provision has been used to censor speech that someone does not like, whether or not copyright infringement is implicated. A timely example is Axl Rose, who wanted an unflattering photo of himself taken down even though he is not the rights holder of the photo. The speakers, however, did favor keeping Section 512 as it is. They noted that without the liability provision, it is likely they would not continue their creative work, because of the risk involved in copyright litigation.
All in all, a very inspiring group of people with powerful stories to tell about creativity and free expression, and the importance of fair use.
Wishing you’d submitted a proposal for Access? As part of this year’s program, we’re assembling a 60-minute Ignite event, and we’re looking for a few more brave souls to take up the challenge!
What’s Ignite, you ask? An Ignite talk is a fast-paced, focused, and entertaining presentation format that challenges presenters to “enlighten us, but make it quick.”
Ignite talks are short-form presentations that follow a simple format: presenters have 5 minutes, and must use 20 slides which are set to automatically advance every 15 seconds. Tell us about a project or experience, whether it’s a brilliant success story, or a dismal failure! Issue a challenge, raise awareness, or share an opportunity! An Ignite talk can focus on anything and can be provocative, tangential, or just plain fun. Best of all, it’s a great challenge to hone your presentation skills in front of one of the best audiences you’ll find anywhere!
Interested? Send your submissions to email@example.com, and tell us in 200 words or fewer what you’d like to enlighten us about. We’ll continue to accept Ignite proposals until July 15th, and accepted submitters will be notified by July 20.
Questions? Contact your friendly neighbourhood program chair at firstname.lastname@example.org!
As mentioned in my last posts, conducting a needs assessment and/or producing quantitative and/or qualitative data about the communities you serve is key to a successfully funded proposal. Once you have an idea for a project that connects with your patrons, your research into RFPs, or Requests for Proposals, begins.
Here are some RFP research items to keep in mind:
Broaden your funding opportunities. Our first instinct may be to look at “technology grants” only, but thinking of other avenues to widen your search can be helpful. As MacKellar writes in her book Writing Successful Technology Grant Proposals, “Rule #15: Use grant resources that focus on the goal or purpose of your project or on your target population. Do not limit your research to resources that include only grants for technology” (p. 71).
Build a comprehensive list of keywords that describe your project in order to conduct strong searches.
Keep in mind throughout the whole process that grants are about people, not about owning the latest devices or tools. Also, what works for one library may not work for another; each library has its own unique vibe. This is another reason why a needs assessment is essential.
Know how you will evaluate your project during and after project completion.
Sharpen your project management skills by working on a multi-step project such as a grant. It takes proper planning and key players to get the project moving and keep it afloat. It helps to slice the project into pieces, foster patience, and develop comfort working on multi-year projects.
It is helpful to have leadership that supports and aids in all phases of the grant project. Try to find support from administration or from community/department partnerships. Find a mentor or someone seasoned in writing and overseeing grants in or outside of your organization.
Read the RFP carefully and contact the funder with well-thought-out questions if needed. It is important to have your questions and comments written down to avoid multiple emails or calls. Asking the right questions tells you whether your proposal is right for a particular RFP.
Build a strong team that is invested in the project and the communities served. Sharing aspects of the project is a wonderful way to avoid burnout.
What's stopping us? That's the central question that the "open access" movement has been asking, and trying to answer, for the last two decades. Although tremendous progress has been made, with more knowledge freely available now than ever before, there are signs that open access is at a critical point in its development, which could determine whether it will ever succeed.
It is a really impressive, accurate, detailed and well-linked history of how we got into the mess we're in, and a must-read despite the length. Below the fold, a couple of comments.
In addition, academics are often asked to work on editorial boards of academic journals, helping to set the overall objectives, and to deal with any issues that arise requiring their specialised knowledge. ... these activities require both time and rare skills, and academics generally receive no remuneration for supplying them.
The skewed nature of power in this industry is demonstrated by the fact that the scientific publishing divisions of leading players like Elsevier and Springer consistently achieve profit margins between 30 percent and 40 percent—levels that are rare in any other industry. The sums involved are large: annual revenues generated from English-language scientific, technical, and medical journal publishing worldwide were about $9.4bn (£6.4bn) in 2011.
This understates the problem. It is rumored that the internal margins on many journals at the big publishers are around 90%, and that the biggest cost for these journals is the care and feeding of the editorial board. Co-opting senior researchers onto editorial boards that provide boondoggles is a way of aligning their interests with those of the publisher. This matters because senior researchers do not pay for the journals, but in effect control the resource allocation decisions of University librarians, who are the publisher's actual customers. The Library Loon understands the importance of this:
The key aspect of Elsevier’s business model that it will do its level best to retain in any acquisitions or service launches is the disconnect between service users and service purchasers.
the Council of the European Union has just issued a call for full open access to scientific research by 2020. In its statement it "welcomes open access to scientific publications as the option by default for publishing the results of publicly funded [EU] research," but also says this move should "be based on common principles such as transparency, research integrity, sustainability, fair pricing and economic viability." Although potentially a big win for open access in the EU, the risk is this might simply lead to more use of the costly hybrid open access, as has happened in the UK.
but doesn't note the degraded form of open access they provide. Articles can be "free to read" while being protected by "registration required", restrictive copyright licenses, and robots.txt. Even if gold open access eventually became the norm, the record of science leading up to that point would still be behind a paywall.
Elsevier and Wiley have been singled out as regularly failing to put papers in the right open access repository and properly attribute them with a creative commons licence. This was a particular problem with so-called hybrid journals, which contain a mixture of open access and subscription-based articles. More than half of articles published in Wiley hybrid journals were found to be “non-compliant” with depositing and licensing requirements, an analysis of 2014-15 papers funded by Wellcome and five other medical research bodies found. For Elsevier the non-compliance figure was 31 per cent for hybrid journals and 26 per cent for full open access.
An entire organization called CHORUS has had to be set up to monitor compliance, especially with the US OSTP mandate. Note that it is paid for and controlled by the publishers, kind of like the fox guarding the hen-house.
Today, OITP released “The People’s Incubator: Libraries Propel Entrepreneurship” (.pdf), a 21-page white paper that describes libraries as critical actors in the innovation economy and urges decision makers to work more closely with the library community to boost American enterprise. The paper is rife with examples of library programming, activities and collaborations from across the country, including:
classes, mentoring and networking opportunities developed and hosted by libraries;
dedicated spaces and tools (including 3D printers and digital media suites) for entrepreneurs;
collaborations with the U.S. Small Business Administration (SBA), SCORE and more;
access to and assistance using specialized business databases;
business plan competitions;
guidance navigating copyright, patent and trademark resources; and
programs that engage youth in coding and STEM activities.
One goal for this paper is to stimulate new opportunities for libraries, library professionals and library patrons to drive the innovation economy forward. Are you aware of exemplary entrepreneurial programs in libraries that were not captured in this report? If so, please let us know by commenting on this post or writing to me.
As part of our national public policy work, we want to raise awareness of the ways modern libraries are transforming their communities for the better through the technologies, collections and expertise they offer all comers.
This report also will be used as background research in our policy work in preparation for the new Presidential Administration. In fact, look for a shorter, policy-focused supplement to The People’s Incubator to be released this summer.
ALA and OITP thank all those who contributed to the report, and congratulate the many libraries across the country that provide robust entrepreneurship support services.
This is another post in a series that I’ve been doing to compare the End of Term Web Archives from 2008 and 2012. If you look back a few posts in this blog you will see some other analysis that I’ve done with the datasets so far.
One thing that I am interested in understanding is how well the group that conducted the EOT crawls did in relation to what I’m calling “curator intent”. For both EOT archives, suggested seeds were collected using instances of the URL Nomination Tool hosted by the UNT Libraries. Bulk lists of seed URLs collected by various institutions and individuals were combined with individual nominations made by users of the nomination tool. The resulting lists were used as seed lists for the crawlers that harvested the EOT archives. In 2008 there were four institutions that crawled content: the Internet Archive (IA), Library of Congress (LOC), California Digital Library (CDL), and the UNT Libraries (UNT). In 2012 CDL was not able to do any crawling, so just IA, LOC and UNT crawled. UNT and LOC had limited scope in what they were interested in crawling, while CDL and IA took the entire seed list and used it to feed their crawlers. The crawls were scoped very wide so that they would capture as much content as possible: the nominated seeds were used as starting places, and we allowed the crawlers to follow links to all subdomains and paths on those sites as well as to areas on other domains that the sites linked to.
During the capture period there wasn’t consistent quality control performed for the crawls, we accepted what we could get and went on with our business.
Looking back at the crawling that we did, I was curious about two things:
How many of the domain names from the nomination tool were not present in the EOT archives?
How many domains from .gov and .mil were captured but not explicitly nominated?
EOT2008 Nominated vs Captured Domains.
In the 2008 nominated URL list from the URL Nomination Tool there were a total of 1,252 domains, with 1,194 being either .gov or .mil. In the EOT2008 archive there were a total of 87,889 domains, and 1,647 of those were either .gov or .mil.
There are 943 domains that are present in both the 2008 nomination list and the EOT2008 archive. There are 251 .gov or .mil domains from the nomination list that were not present in the EOT2008 archive. There are 704 .gov or .mil domains that are present in the EOT2008 archive but that aren’t present in the 2008 nomination list.
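The comparison above is straightforward set arithmetic. A minimal sketch in Python, using made-up domain lists rather than the real nomination and archive data:

```python
# A minimal sketch of the set arithmetic behind the nominated-vs-captured
# comparison. The domain lists here are invented examples; in practice they
# would come from the nomination tool export and the archive's index.

def gov_mil(domains):
    """Keep only .gov and .mil domains, the focus of the EOT project."""
    return {d.lower() for d in domains if d.lower().endswith((".gov", ".mil"))}

nominated = gov_mil({"loc.gov", "nasa.gov", "navy.mil", "example.com"})
captured = gov_mil({"loc.gov", "navy.mil", "usda.gov"})

in_both = nominated & captured       # nominated and successfully captured
missed = nominated - captured        # nominated but absent from the archive
unnominated = captured - nominated   # captured but never explicitly nominated

print(sorted(in_both))      # ['loc.gov', 'navy.mil']
print(sorted(missed))       # ['nasa.gov']
print(sorted(unnominated))  # ['usda.gov']
```

Run against the actual 2008 lists, `in_both`, `missed`, and `unnominated` would come out to the 943, 251, and 704 domains reported above.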
Below is a chart showing nominated vs. captured counts for the .gov and .mil domains.
2008 .gov and .mil Nominated and Archived
Of those 704 domains that were captured but never nominated, here are the thirty most prolific.
I see quite a few state and local governments with .gov domains, which were out of scope for the EOT project, but there are also a number of legitimate domains in the list that were never nominated.
EOT2012 Nominated vs Captured Domains.
In the 2012 nominated URL list from the URL Nomination Tool there were a total of 1,674 domains, with 1,551 of those being .gov or .mil domains. In the EOT2012 archive there were a total of 186,214 domains, and 1,944 of those were either .gov or .mil.
There are 1,343 domains that are present in both the 2012 nomination list and the EOT2012 archive. There are 208 .gov or .mil domains from the nomination list that were not present in the EOT2012 archive. There are 601 .gov or .mil domains that are present in the EOT2012 archive but that aren’t present in the 2012 nomination list.
Below is a chart showing nominated vs. captured counts for the .gov and .mil domains.
2012 .gov and .mil Domains Nominated and Archived
Of those 601 domains that were captured but never nominated, here are the thirty most prolific.
Again there are a number of state and local government domains present in the list but up at the top we see quite a few URLs harvested from domains that are federal in nature and would fit into the collection scope for the EOT project.
How did we do?
The way that seed lists for the EOT2008 and EOT2012 nomination lists were collected introduced a bit of dirty data, and we would need to look a little deeper to see what the issues were. Some things that come to mind: we got seeds for domains that existed prior to 2008 or 2012 but that no longer existed when we were harvesting, and there could have been typos in the nominated URLs, so we never grabbed the suggested content. We might want to introduce a validation process for the nomination tool that lets us know what the status of a URL in a project is at a given point in time, so that we at least have some sort of record.
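A validation step like that could start with something as simple as a syntax check over the nominated URLs. A rough sketch, where the sample URLs and the `looks_valid` helper are invented for illustration (a real process would also record each URL's HTTP status over time, which this does not do):

```python
# Flag nominated URLs that are syntactically suspect before they reach a
# crawler. This only catches obvious typos, not dead or moved sites.
from urllib.parse import urlparse

def looks_valid(url):
    parsed = urlparse(url.strip())
    return (parsed.scheme in ("http", "https")
            and "." in parsed.netloc       # catches hostnames like "locgov"
            and " " not in parsed.netloc)

nominations = [
    "http://www.usda.gov/",
    "htp://www.nasa.gov/",    # scheme typo
    "http://locgov/",         # missing dot in hostname
]
flagged = [u for u in nominations if not looks_valid(u)]
print(flagged)  # the two malformed nominations
```

Recording the result of a check like this alongside each nomination would give the project the per-URL status record mentioned above.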
This is coming to the site a few days after it hit our listservs, but great news keeps. Please see below an announcement from our Islandora 7.x-1.7 Release Manager Dan Aitken about the new release:
all flagged bugfixes have been completed,
all documentation has been confirmed up-to-date,
all audits have been completed, and
no new important issues have been opened in a reasonable period of time,
we are (a little late to the gate) proud to announce the fifth community release of the 7.x-1.x branch of Islandora, 7.x-1.7!
This release represents the dedicated efforts of nearly 40 volunteers who have produced a mountain of code resulting in the most stable, feature-rich version of Islandora 7.x-1.x we've ever put out. A release VM has also been generated and will be available for download shortly.
The way derivatives are generated is now alterable by customization modules, allowing any site to override the natural method of derivative generation and replace it with their own.
A ton of iTQL queries have been rewritten in SPARQL, paving the way to swap in SPARQL-compliant triplestores.
Solr collection displays are sortable now, meaning that (finally!) the order of objects displayed in collections isn't subject to the whims of the node graph.
Many datastreams that would have formerly been created as inline XML now exist as managed datastreams; this prevents a huge number of collision issues within Fedora; it may seem minor or esoteric but it's been a long time coming, and the stability gain is appreciable.
In addition, we've fixed some long-standing issues with many modules, including our first major security issue. We've updated modules to have support for Drush 7. We've finally gotten around to documenting the crazy Drupal Rules functionality hidden away in 7.x-1.x's many modules. We've repaired documentation ranging from little typos to outright mistakes. It's been quite the undertaking, and I think in the end we have something to be incredibly proud of. A huge thanks to the entire team; honestly, it's been a wonder throughout.
Some quick notes on community and contribution and other housekeeping:
Modules now contain a .github folder; we will be using this going forward to store a template for making pull requests that should be filled out when contributing.
We're now maintaining an Awesome list at https://github.com/Islandora-Labs/islandora_awesome where users can take a look at some modules that aren't currently in the core rotation but could get added someday with enough love, or should just otherwise be checked out because they're neat and the people managing them are neat.
As Nick Ruest is resigning as component manager for the 7.x-1.x modules listed in https://groups.google.com/forum/#!topic/islandora/6r4ZnVZv2bA as of the publication of this release, I'll remind us that one of the first orders of business post-release needs to be a call to find new component managers for these modules.
Rosie Le Faive
The Linked Library Data Interest Group will hold a session at the American Library Association Annual Conference from 8:30-10:00AM, this Saturday, June 25, 2016 in W208 at Orange County Convention Center (OCCC).
Michael Conlon, PhD, VIVO Project Director and Emeritus Faculty, University of Florida, will present "OpenVIVO: a hosted platform for representing scholarly work".
There’s now a group of people taking a look at whether and how to set up some sort of ongoing fiscal entity for the annual Code4Lib conference. Of course, one question that comes to mind is why go to the effort? What makes the annual Code4Lib conference so special?
There are lots of narratives out there about how the Code4Lib conference and the general Code4Lib community have helped people, but for this post I want to focus on the conference itself. What does the conference do that is unique or uncommon? Is there anything that it does that would be hard to replicate under another banner? Or to put it another way, what makes Code4Lib a good bet for a potential fiscal host — or something worth going to the effort of forming a new non-profit organization?
A few things that stand out to me as distinctive practices:
The majority of presentations are directly voted upon by the people who plan to attend (or who are at least invested enough in Code4Lib as a concept to go to the trouble of voting).
Similarly, keynote speakers are nominated and voted upon by the potential attendees.
Each year potential attendees vote on bids by one or more local groups for the privilege of hosting the conference.
In principle, most any aspect of the structure of the conference is open to discussion by the broader Code4Lib community — at any time.
Historically, any surplus from a conference has been given to the following year’s host.
Any group of people wanting to go to the effort can convene a local or regional Code4Lib meetup — and need not ask permission of anybody to do so.
Some practices are not unique to Code4Lib, but are highly valued:
The process for proposing a presentation or a preconference is intentionally light-weight.
The conference is single-track; for the most part, participants are expected to spend most of each day in the same room.
Preconferences are inexpensive.
Of course, some aspects of Code4Lib aren’t unique. The topic area certainly isn’t; library technology is not suffering any particular lack of conferences. While I believe that Code4Lib was one of the first libtech conferences to carve out time for lightning talks, many conferences do that nowadays. Code4Lib’s dependence on volunteer labor certainly isn’t unique, although (putting aside keynote speakers) Code4Lib may be unique in having zero paid staff.
Code4Lib’s practice of requiring local hosts to bootstrap their fiscal operations from ground zero might be unique, as is the fact that its planning window does not extend much past 18 months. Of course, those are both arguably misfeatures that having fiscal continuity could alleviate.
Overall, the result has been a success by many measures. Code4Lib can reliably attract at least 400 or 500 attendees. Given the notorious registration rush each fall, it could very likely be larger. With its growth, however, come substantially higher expectations placed on the local hosts, and rather larger budgets — which circles us right back to the question of fiscal continuity.
I’ll close with a question: what have I missed? What makes Code4Lib qua annual conference special?
Tomorrow we will drive to Orlando, as next week I’m attending two conferences: the Perl Conference (YAPC::NA) and the American Library Association’s Annual 2016 conference.
A professional concern shared by my colleagues in software development and libraries is the difficult problem of naming. Naming things, naming concepts, naming people (or better yet, using the names they tell us to use).
Names have power; names can be misused.
In light of what happened in Orlando on 12 June, the very least we can do is to choose what names we use carefully. What did happen? That morning, a man chose to kill 49 people and injure 53 others at a gay bar called the Pulse. A gay bar that was holding a Latin Night. Most of those killed were Latinx; queer people of color, killed in a spot that for many felt like home. The dead have names.
Names are not magic spells, however. There is no one word we can utter that will undo what happened at the Pulse nor immediately construct a perfect bulwark against the tide of hate. The software and library professions may be able to help reduce hate in the long run… but I offer no platitudes today.
Sometimes what is called for is blood, or cold hard cash. If you are attending YAPC:NA or ALA Annual and want to help via some means identified by those conferences, here are options:
I will close with this: many of our LGBT colleagues will feel pain from the shooting at a level more visceral than those of us who are not LGBT — or Latinx — or people of color. Don’t be silent about the atrocity, but first, listen to them; listen to the folks in Orlando who know what specifically will help the most.
Between April 2015 and June 2016, members of the Open Access Network Austria (OANA) working group “Open Access and Scholarly Communication” met in Vienna to discuss [Open Science]. The main outcome of our considerations is a set of twelve principles that represent the cornerstones of the future scholarly communication system. They are designed to provide a coherent frame of reference for the debate on how to improve the current system. With this document, we are hoping to inspire a widespread discussion towards a shared vision for scholarly communication in the 21st century.
This is the fourth post in a series that looks at the End of Term Web Archives captured in 2008 and 2012. In previous posts I’ve looked at the when, what, and where of these archives. In doing so I pulled together the domain names from each of the archives to compare them.
My thought was that I could look at which domains had content in the EOT2008 or EOT2012 and compare these domains to get some very high level idea of what content was around in 2008 but was completely gone in 2012. Likewise I could look at new content domains that appeared since 2008. For this post I’m limiting my view to just the domains that end in .gov or .mil because they are generally the focus of these web archiving projects.
Comparing EOT2008 and EOT2012
There are 1,647 unique domain names in the EOT2008 archive and 1,944 unique domain names in the EOT2012 archive, an increase of 18%. Between the two archives there are 1,236 domain names in common. There are 411 domains that exist in EOT2008 that are not present in EOT2012, and 708 new domains in EOT2012 that didn’t exist in EOT2008.
Domains in EOT2008 and EOT2012
The EOT2008 dataset consists of 160,212,141 URIs and the EOT2012 dataset comes in at 194,066,940 URIs. When you look at the URLs in the 411 domains that are present in EOT2008 but missing from EOT2012, you get 3,784,308, which is just 2% of the total number of URLs. Looking at the domains present only in EOT2012, you see 5,562,840 URLs (3%) harvested from domains that exist only in the EOT2012 archive.
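As a sanity check, the percentages above work out directly from the reported counts:

```python
# Checking the post's arithmetic from the reported URI and domain counts.
eot2008_urls, eot2012_urls = 160_212_141, 194_066_940

# URLs under domains unique to one archive or the other.
only_2008_urls, only_2012_urls = 3_784_308, 5_562_840

print(round(100 * only_2008_urls / eot2008_urls))  # share of EOT2008 URLs
print(round(100 * only_2012_urls / eot2012_urls))  # share of EOT2012 URLs

# Domain-level growth between the two crawls (1,647 -> 1,944 .gov/.mil domains).
print(round(100 * (1944 - 1647) / 1647))
```

These round to the 2%, 3%, and 18% figures quoted above.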
The thirty domains with the most URLs captured for them that were present in the EOT2008 collection that weren’t present in EOT2012 are listed in the table below.
The thirty domains with the most URLs from EOT2012 that weren’t present in EOT2008.
Shared domains that changed
There were a number of domains (1,236) that are present in both the EOT2008 and EOT2012 archives. I thought it would be interesting to compare those domains and see which ones changed the most. Below are the fifty shared domains that changed the most between EOT2008 and EOT2012.
Only 11 of the 50 (22%) had more content harvested in EOT2012 than in EOT2008.
Of the eleven domains that had more content harvested in EOT2012, five (navy.mil, osd.mil, vaccines.mil, defense.gov, and usajobs.gov) increased by over 1,000% in the amount of content. I don’t know if this is a result of increased attention to these sites, more content on the sites, or a different organization of the sites that made them easier to harvest. I suspect it is some combination of all three.
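A ranking of shared domains by how much they changed could be computed along these lines; the per-domain counts below are invented placeholders, not the real EOT figures:

```python
# Rank domains shared between the two archives by magnitude of change in
# URL count. Counts are made-up examples for illustration only.
counts_2008 = {"navy.mil": 1_200, "osd.mil": 800, "epa.gov": 50_000}
counts_2012 = {"navy.mil": 15_000, "osd.mil": 9_600, "epa.gov": 42_000}

def pct_change(domain):
    old, new = counts_2008[domain], counts_2012[domain]
    return 100 * (new - old) / old

shared = counts_2008.keys() & counts_2012.keys()
ranked = sorted(shared, key=lambda d: abs(pct_change(d)), reverse=True)
for d in ranked:
    print(d, round(pct_change(d)))
```

With these toy numbers, navy.mil and osd.mil show the over-1,000% growth pattern described above, while epa.gov shrinks modestly.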
It should be expected that domains come into and go out of existence on a regular basis in a web space as large as the federal government. One of the things that I think is rather challenging is identifying the list of domains that were present at a given time within an organization. For example, “what domains did the federal government have in 1998?” Web archives seem like a way to answer that question. Based on the analysis in this post, there are 411 domains that were present in 2008 that we weren’t able to capture in 2012. Take a look at the list of the top thirty: do you recognize any of them? How many other initiatives, committees, agencies, task forces, and citizen education portals existed at one point but are now gone?
If you have questions or comments about this post, please let me know via Twitter.
“We need programs like this one to help close the digital divide. People who can’t afford broadband Internet at home are at a significant disadvantage when it comes to school, looking for jobs or accessing government services and education,” said Vickery Bowles, City Librarian. “Internet access is essential in our digital world. We’re proud to pilot this program and hopeful we can increase its reach in the future.”
Everyone should have Internet access. Absolutely. But not provided free by Google or Facebook or another company that profits by monitoring its users.
Another hotspot borrower is a university student from out-of-town who regularly uses the library’s wi-fi to study and complete his course work. Now, he’s able to access the Internet outside of library hours at home. A hotspot was also borrowed by a single mother on disability who is using the device to submit benefit forms, communicate by email with her caseworker and browse health-related information online.
Touching stories. Of course these folks should have internet access. But the Toronto Public Library shouldn’t be helping Google make money (I realize it’s Google’s “charitable arm” providing the money, but get real): it should be an active, engaged library advocating for the public welfare of all Torontonians, for example by working with the Library Freedom Project while advocating for free city-wide municipal wifi. In the meantime, instead of taking money from Google, it should be buying its own devices and net connections, and making online privacy part of its information literacy work.
If I could get one of those hotspots I’d run a Tor exit node on it.
Brian Lavoie and I have just wrapped up a project examining the collective collection of a number of important research libraries in the UK and Ireland. The key findings from our analysis are published in a new OCLC Research report, Strength in Numbers: The Research Libraries UK (RLUK) Collective Collection. We were very pleased to have an opportunity to collaborate with colleagues from the RLUK consortium on this project, our first foray into extending our collective collections research to geographies outside of North America. RLUK has a role in the UK that is comparable to that of ARL in the US, and its analogues in other places: it provides ‘coordination capacity’ for research libraries that share a number of similar challenges and operate within (broadly) comparable institutional circumstances. As in our Right-scaling Stewardship project, undertaken in partnership with research libraries in the Committee on Institutional Cooperation (CIC) a few years ago, the insights and engagement of libraries in situ provided invaluable context for our analysis of WorldCat bibliographic and holdings data.
In the US, university libraries have a long history of engagement with WorldCat as a core bibliographic utility for cataloging and resource sharing operations. In the UK, participation in WorldCat has – until recently – been more opportunistic and instrumental in nature. This has meant that coverage of UK library holdings in WorldCat is somewhat uneven. In preparation for the collective collection study, a number of RLUK libraries undertook major record-loading projects to bring their WorldCat holdings up to date. As a result, we now have a much better picture of the distribution of library resources across research libraries in the UK. The figure below, not included in our report, gives an idea of the regional concentration of RLUK library collections.
Geographic concentration of RLUK holdings in WorldCat (January 2016)
It is no surprise that the greatest concentration of resources is found in and around London – there are many RLUK institutions in the area, including the British Library. The regional concentrations shown in green (Oxford, Cambridge, Dublin, Edinburgh) reflect the depth of legal deposit collections in those locations. Edinburgh is a particularly bright spot, given that it includes both the National Library of Scotland and the University of Edinburgh. The very prominent concentration in the North of England represents holdings of the British Library that are managed in Yorkshire. Setting aside the legal deposit libraries, the median collection size of RLUK libraries in WorldCat (as of January 2016) was about 827,000 titles; the average size was about 1.1 million titles.
An interesting question that arose in the course of this project was the degree to which the existing legal deposit libraries in the UK and Ireland might serve as preservation hubs within the larger RLUK network. As more academic libraries are looking to manage down locally held print inventory (transferring materials to offsite repositories, increasing reliance on shared print agreements etc.), there is growing interest in identifying latent print preservation capacity that might be more effectively leveraged, so that the total cost of stewarding research and heritage collections can be reduced and the overall scope of the collective collection increased. In the UK context, where universities (and their libraries) are largely supported by public funding, it is reasonable to ask whether investments in national legal deposit collections, and de facto preservation collections in some university libraries, can be used to support some rationalization of print collections management across the larger higher education sector.
While a close study of bilateral duplication rates within the RLUK group (e.g., duplication between individual legal deposit libraries and other university libraries) was beyond the scope of our collective collection project, we did do some preliminary investigation. The project advisory board, composed of representatives from 11 RLUK libraries, was especially interested to know if a large share of scarce or distinctive resources in the collective collection were concentrated in the legal deposit libraries. If so, they reasoned, other libraries might relegate or even de-select local copies of those resources, with confidence that legal deposit partners would uphold preservation and access responsibilities. We looked at titles held in fewer than 5 RLUK institutions and found that 73% or more were held in at least one legal deposit library. The percentage of titles held in legal deposit libraries increased with the overall duplication rate: for titles held by 4 libraries in the RLUK group, the legal deposit duplication rate rose to 95%. Among the legal deposit libraries, the British Library – the largest of all the legal deposit collections within the RLUK – was the most frequent source of duplication, with Oxford University a close second.
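The scarce-title comparison described here boils down to a holdings lookup. A minimal sketch, with invented holdings data and an assumed set of legal deposit members (in practice this would be derived from WorldCat holdings symbols, not a hand-built dictionary):

```python
# For titles held by fewer than five RLUK members, what share is held by
# at least one legal deposit library? Holdings data here are invented.
LEGAL_DEPOSIT = {"BL", "Oxford", "Cambridge", "TCD", "NLS", "NLW"}

# title -> set of holding institutions
holdings = {
    "t1": {"BL", "Manchester"},
    "t2": {"Leeds"},
    "t3": {"Oxford", "Leeds", "York"},
    "t4": {"Manchester", "York"},
}

scarce = {t: h for t, h in holdings.items() if len(h) < 5}
covered = [t for t, h in scarce.items() if h & LEGAL_DEPOSIT]
rate = 100 * len(covered) / len(scarce)
print(round(rate))  # share of scarce titles backed by a legal deposit copy
```

In this toy example half the scarce titles have a legal deposit backup; the report's finding was 73% or more across the real RLUK data.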
What this suggests is that while the legal deposit network in the UK represents a significant source of preservation and access infrastructure, the capacity of individual legal deposit libraries to contribute to shared RLUK preservation goals will vary. Put another way, the collective capacity of these libraries is greater than the sum of their individual institutional capacities. What the optimal allocation of preservation responsibility across the RLUK group will ultimately look like will depend on a number of factors. Rick Lugg, Ruth Fisher and colleagues on the OCLC Sustainable Collection Services will be working with subsets of the RLUK group (including members of the White Rose Libraries consortium) in the coming year to examine how print preservation responsibilities can be shared among UK research libraries.
From a research perspective, we are interested in delineating patterns that reveal where library collaboration can increase system-wide efficiency and maximize institutional benefit, without necessarily prescribing specific choices or courses of action for individual libraries. This recent collaboration with RLUK libraries provided a wealth of opportunities to explore how WorldCat data can be used to support group-scale approaches to collection management. Not all of the research it motivated is reflected in the final report, but it has helped to illuminate some lines of inquiry that we hope to explore in the future.
Constance Malpas is a Research Scientist at OCLC. Her work focuses on data-driven analysis of library collections and services, with a special emphasis on strategic planning and managing institutional change. She has a particular interest in the organization of knowledge and research practices in the sciences.