His Juno acceptance speech for Best Indigenous Music Album was badass: he thanked his family and team, he asked the other nominees to stand and praised their work for creating space and defying a single genre, then he called out the Canadian Prime Minister for supporting pipelines, for sending militarized police forces into unceded territory, and for the boil water advisories that exist in many First Nations communities. He was interrupted by the music playing him off.
Later the Arkells, who won Rock Album of the Year, said a quick thank you, stepped back, and invited Jeremy Dutcher to finish what he was saying. Before yesterday it was outside my imagination that a rock band would step back and give a two-spirit Indigenous opera singer their time and space on the stage.
I think of allyship as a verb, not a noun, and this was a beautiful example of that. All of it is such an inspiration for me to speak truth to power, to use some of my time to hold up my colleagues’ work on the stage, and to think about where I can step back and literally create time and space for others.
Visitors to newsstands in early 1923 encountered a number of significant American magazines for the first time. They could pick up the first issues of Time, with brief, breezy dispatches relating the week’s news from around the world. (Issues of that magazine from 1923 and some years afterward are already in the public domain, due to a lack of copyright renewals.) Or they might find dispatches from stranger, creepier worlds in another new magazine: Weird Tales, a pulp fiction magazine featuring stories of horror, fantasy, and what in a few years would be called “science fiction”. A number of iconic genre characters such as Robert E. Howard’s Conan, C. L. Moore’s Jirel of Joiry, and H. P. Lovecraft’s Cthulhu made their mass-media debut in the magazine.
It took a while for Weird Tales to hit its stride, but there are some notable stories in its first issues, many of which will be joining the public domain five days from now. (Much of the content published in Weird Tales was not copyright-renewed, but most of the 1923 issues were.) One tale that caught my interest, as much for its circumstances as for its content, is a story by Sonia Greene titled “The Invisible Monster” in the November 1923 issue (and called “The Horror of Martin’s Beach” in some later reprints). The story features a strange sea-beast that sailors find, kill, and bring to shore, unknowingly incurring the wrath of the beast’s bigger and fiercer mother. Able to hypnotize humans so that they both fail to see her and lose control over their body movements, the mother-beast exacts her revenge on the sailors and nearby beachgoers, dragging them to watery deaths.
The story may remind a reader today of stories with similar elements like Beowulf and Jaws. But it also reads like a Lovecraft story. That’s not a coincidence: Greene and Lovecraft, who were both active in the world of amateur journalism, had met not long before. In 1922, Greene visited Lovecraft in New England and suggested the idea for the story while they walked along the beach. According to L. Sprague de Camp’s biography of Lovecraft, Greene wrote up an outline of the story that night, and Lovecraft was so enthusiastic about the story that Greene spontaneously kissed him, the first kiss he had had since infancy.
Thus began a romance that would eventually result in the marriage of Greene and Lovecraft in 1924, as well as the publication of the story in Weird Tales in 1923. It’s pretty clear that Lovecraft had some hand in the story that ran there: at the time, a fair bit of his income came from unsigned editing and revising of others’ stories; he appears to have shepherded this one into print at Weird Tales; and some of the vocabulary in the story is distinctly Lovecraftian. Some commenters have therefore not only added Lovecraft as an author of the story, but also credited him as the primary author, or even speculated, as de Camp does, that he wrote the whole thing from Greene’s “mere outline”. However, both Greene and Lovecraft were experienced writers, and knowing both the tendency of attributions to gravitate to more famous writers and the tendency of women’s writing contributions to be marginalized, I’m inclined to keep crediting Greene as the author of this story, as she is credited in the Weird Tales issue.
Sadly, Greene and Lovecraft’s relationship would soon grow strained. Beset with financial woes and health problems after their marriage, they spent much of their time apart, were living in different cities by 1925, and by the end of the 1920s had divorced. Lovecraft’s relationship with his genre has also grown increasingly strained. He was deeply racist, and while his stories have had a significant influence on fantasy and horror literature, many of them are also infused with fear and contempt for non-white races and foreigners. That eventually led the administrators of the World Fantasy Awards, which had used his likeness on their trophies since their establishment in 1975, to redesign the award without him in 2017.
Some writers, though, have found ways to recognize the contributions of Lovecraft and other early horror writers to the genre while still engaging unflinchingly with their racism. One work doing this that I particularly like is Matt Ruff’s 2016 novel Lovecraft Country, where the main characters have to deal with both the forces of supernatural horror and the forces of Jim Crow, and the latter are often scarier than the former.
Ruff manages to rework flaws in Lovecraft’s 20th-century work into strengths for the story he wants to tell, and combines them with other Lovecraftian elements to make a sort of narrative alloy well suited to the 21st century. Once “The Invisible Monster” and other stories from the first year of Weird Tales join the public domain next week, other writers will also have the chance to take their flaws and strengths and make other wonderful things with them. I don’t know what will result, but I’d love to see what people try.
On Open Data Day 2019, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. The Women Economic and Leadership Transformation Initiative (WELTI) and Safety First for Girls Outreach Foundation (SAFIGI) received funding through our mini-grant scheme, supported by the Foreign and Commonwealth Office of the United Kingdom, to organise events under the Equal Development theme. The event report below was written by Ifeoma Okonji and Hadassah Louis; their biographies are included at the bottom of this post.
We are living in times where the spotlight is on women. Gender equality and equal development are a running theme; however, this is not yet translated into the daily lives of women. The mantra has to be turned into actionable steps if we are to achieve SDG5 by 2030. Open data is the best defense for women because data does not discriminate, especially when it is accessible to and produced by underserved communities.
Women take the lead
Two female-led and focused organizations in the Africa region leveraged Open Data Day 2019 to showcase how open data is crucial to improving the socio-economic conditions of women in developing communities. Women Economic and Leadership Transformation Initiative (WELTI) in Nigeria and Safety First for Girls Outreach Foundation (SAFIGI) in Zambia, both use open data as a response tool on issues affecting their respective communities.
WELTI leads open data initiatives in Nigeria by ensuring that the young women whose lives are being impacted leverage technology to make their businesses thrive, to grow as leaders, and to support their education, since such opportunities are not part of the normal school curriculum, and that they have access to data that can help them in their daily lives. This involves leading stakeholder engagement strategies to drive the work.
SAFIGI taps into the power of working open, putting young women in leadership positions, and strategic collaboration to pursue research, create safety courses, and execute social campaigns in order to improve safety for girls.
Open data often goes hand in hand with open working cultures and open business practices. While this culture lends itself to diversity, it is crucial that those who are involved in open data take a bottom-up and inclusive approach so that marginalized communities do not continue to be sidelined in research spaces.
Cancer, data and female health
WELTI’s Open Data Day event was themed Cancer, Data and Female Health. In partnership with CEAFON Nigeria, an organization of doctors spreading awareness about cancer, the Women Economic and Leadership Transformation Initiative hosted the event at Girls Senior Academy Secondary School on Simpson Street in Lagos.
The ODD Nigeria event started with beneficiaries being asked what they understood by open data and what sort of data they look for when trying to access data. Young Nigerian women were shown statistics, preventive measures, and care in regards to cancer. In the course of the training, they were shown how information can be sought openly, including pre- and post-event surveys regarding cancer, data and female health and what open sources are available to them for information. Before the training, 45% of these young women knew what cancer, data and female health were all about; after the program, awareness and knowledge had increased by 70%, which was quite an impact. The event reached over a hundred female beneficiaries, who were also introduced to sites and to data collection and retrieval regarding the subject matter.
At the clinical, population, and research data level, opening up medical data and sharing and linking large healthcare datasets make it possible to semantically relate and enrich data on symptoms, diseases, diagnoses, treatments, and prescriptions, offering the potential for improvements in care for individuals and populations as well as more efficient semantic access to the evidence base.
Safer communities with open data
SAFIGI Outreach Foundation in Lusaka, Zambia hosted Open Data Day with the goal of increasing understanding of the benefits of open data in creating safer grassroots communities. The event was hosted at Global Platform in Lusaka. 80% of the attendees were female; prior to the event, 1 in 10 did not understand open data, and 40% of participants had only a rough idea about it. The Open Data Day event by SAFIGI was structured to respond to this gap and to share strategies which the participants could use to improve and solve issues in their communities through open data.
In 2019, Safety First for Girls is working on a campaign called Equality Culture, in which they are engaging community members to address both positive and negative aspects of tradition in line with gender equality. This was founded on the youth-led organization’s open research, titled the Safety Report paper, which studied how culture, traditions and beliefs help maintain the status quo and inequality. Through this campaign, SAFIGI is using open data to improve the safety conditions of girls in local communities through safety education, research, and advocacy.
Open Data Day hosted by SAFIGI in Lusaka highlighted the gap in comprehensive research about women from grassroots communities. UN online volunteers who worked on the open research and data analysis through SAFIGI were part of a panel at the event, showcasing a good example of how open data can bring positive change by sharing the SAFIGI Foundation’s Open Data Analysis, which is accessible here.
An equal future is possible with open data
In a continent like Africa, where strong patriarchal systems create communities rife with gender inequality, open data initiatives can be a tool for the socio-economic empowerment of women. The strides made by SAFIGI and WELTI in using open data and open practices for equal development are creating communities within the continent that address inequality with evidence-based approaches.
While open data is gender neutral, a gendered approach is necessary for equal development in underserved and developing communities. This can only be accomplished when women take the lead in analysing core issues affecting their communities, sharing this through open data and using best practices to solve gender inequality.
The capacity strengthening of female-led initiatives creates a ripple effect in the movement for a more equal world in which women are safer, healthier, and economically sound, a movement which affirms the human dignity of marginalized girls and in turn promotes their human rights. Open Data Day was more than just a celebration; it was a milestone toward creating a more equal world through data, one girl at a time.
Ifeoma Okonji is a social entrepreneur and customer experience professional with over ten years’ experience in both the for-profit and non-profit sectors. She is an astute young woman with a passion for empowering young women and a knack for smart work, dedication and teamwork. She is the founder of the Women Economic and Leadership Transformation Initiative (WELTI), a non-profit that advocates for equality for young women in leadership, technology, health and education. She is also a Mozilla Open Leader, an associate member of Women in Management, Business and Public Service (WIMBIZ), an open knowledge thought leader and advocate, and a member of GlobalGiving International. She loves to travel, sustain useful acquaintances, and enjoy music and dancing.
Hadassah Louis is a youth leader passionate about gender, digital literacy, and grassroots advocacy. She is founder of the SAFIGI Outreach Foundation and President of Digital Grassroots. She is also a 2019 IFF Community Development fellow, a 2019 Engineers Without Borders Canada Kumvana fellow, a Mozilla Open Leader and expert, an Internet Society 2017 Youth@IGF fellow, an open knowledge advocate, and a champion for capacity building of youth and girls. Hadassah graduated summa cum laude in multimedia journalism, and is a contributor on Impakter.com and Africa.com. She is a 2019 Women Deliver scholarship recipient. Learn more about her work at www.hadassahlouis.com.
I recently attended a workshop, organised by the excellent team of the Turing Way project, on a tool called BinderHub. BinderHub, along with public hosting platform MyBinder, allows you to publish computational notebooks online as "binders" such that they're not static but fully interactive. It's able to do this by using a tool called repo2docker to capture the full computational environment and dependencies required to run the notebook.
What is the Turing Way?
The Turing Way is, in its own words, "a lightly opinionated guide to reproducible data science." The team is building an open textbook and running a number of workshops for scientists and research software engineers, and you should check out the project on GitHub. You could even contribute!
3. Add some extra metadata describing the packages and versions your code relies on
4. Go to mybinder.org and tell it where to find your repository
5. Open the URL it generates for you
Other than step 5, where building the binder can take some time, this is a remarkably quick process. It supports a number of different languages too, including built-in support for R, Python and Julia, and the ability to configure pretty much any other language that will run on Linux.
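For reference, the generated link follows mybinder.org's URL scheme, which for a GitHub repository looks like this (the placeholders are yours to fill in):

https://mybinder.org/v2/gh/<username>/<repository>/<branch>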
However, the Python support currently requires you to have either a requirements.txt or Conda-style environment.yml file to specify dependencies, and I commonly use a Pipfile for this instead. Pipfile allows you to specify a loose range of compatible versions for maximal convenience, but then locks in specific versions for maximal reproducibility. You can upgrade packages any time you want, but you're fully in control of when that happens, and the locked versions are checked into version control so that everyone working on a project gets consistency.
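For illustration only, a minimal Pipfile along these lines might look like the following (the package names and version ranges here are hypothetical examples, not taken from any particular project):

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[packages]
requests = ">=2.20,<3.0"
pandas = "*"

[requires]
python_version = "3.7"

Running pipenv lock (or pipenv install) resolves these loose ranges into exact versions recorded in Pipfile.lock, which is the file you check into version control for reproducibility.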
Since Pipfile is emerging as something of a standard, I thought I'd see if I could use one in a binder, and it turns out to be remarkably simple. The reference implementation of Pipfile is a tool called pipenv, by the prolific Kenneth Reitz. All you need to use it in your binder is two files of one line each.
requirements.txt tells repo2docker to build a Python-based binder, and contains a single line to install the pipenv package:

pipenv
Then postBuild is used by repo2docker to install all other dependencies using pipenv:
pipenv install --system
The --system flag tells pipenv to install packages globally (its default behaviour is to create a Python virtualenv).
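Putting it all together, a binder-ready repository using this approach would contain something like the following (the notebook name is just an example):

analysis.ipynb     (the notebook you want people to run)
Pipfile            (loose version ranges)
Pipfile.lock       (exact pinned versions, checked in)
requirements.txt   (just: pipenv)
postBuild          (just: pipenv install --system)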
I’m a librarian at York University Libraries in Toronto. Let’s call me Librarian Bill when I’m there. At home I’m Civilian Bill, and last month Civilian Bill put in a freedom of information request to York University for the amounts the Libraries spent on electronic resources in fiscal years 2017 and 2018. Civilian Bill knew the information exists because Librarian Bill prepared a spreadsheet with precisely those costs.
York refused to release the data. Its response was to “withhold in full.”
Civilian Bill appealed. Seven months later, Civilian Bill and Librarian Bill are both very happy to report that the data will be released.
My appeal was handled by the same mediator who had handled my request for communications between the chairs of York’s Senate and Board of Governors, which made me happy. She was excellent: helpful, informative, quick to act, expert on all aspects of the legislation, and a fine example of the civil service at its best. If you’re in Ontario and have an idea for a freedom of information request but are worried things are stacked against you, don’t be. My mediator—I assume they’re all equally good—was everything I’d hoped for.
Last fall I had a number of phone conversations with the mediator, and she in turn talked to York quite a bit. The mediator said (I paraphrase—any misunderstandings are mine) that the Freedom of Information and Protection of Privacy Act (FIPPA) didn’t recognize non-disclosure agreements, so that part of York’s argument didn’t hold up. The adjudicator might look favourably on my request … but it would take two or three years to get to adjudication because there’s a big backlog.
It emerged that the problem, from York’s point of view, was that the information about NDAs in the big list of eresources I’d helped create was unverified. My impression was that York thought it needed to be reviewed and double-checked. At least that was a potentially justifiable reason, unlike the unexplained blanket “denied in full” that was the initial response.
We were at an impasse. Mediation was unsuccessful. York wasn’t going to release the data. I informed the mediator I wanted to go to adjudication.
Soon after that I got a call from the mediator saying I should expect a letter from York. This is what it said:
As a result of mediation with [the mediator] at the Information and Privacy Commission York University would like to suggest as a possible resolution to Appeal PA-18-403 a schedule for the release of the costs you have requested. York University would be able to commit the resources necessary to schedule the release of this information by, or in advance of, April 30, 2020.
That was a surprise! But “would be able to commit the resources necessary” was too vague. I “would be able to commit” to bringing my lunch to work every day for a month but that doesn’t mean I’m actually going to do it. The mediator followed up with York, and last week I got this:
As a result of mediation with [the mediator] at the Information and Privacy Commission York University would like to suggest a possible resolution to Appeal PA-18-403. York University is committing the resources necessary to schedule the release of this information with a goal of April 30, 2020 for the completion of this project. It is hoped that this will resolve the appeal.
Is committing the resources necessary. That satisfied me. I talked to the mediator one last time and said I was now willing to drop the appeal. She immediately did her report, which says in part:
The university issued a decision denying access to the responsive records pursuant to sections 17 and 18 of the Act.
The requester, now appellant, appealed the university’s decision.
RESULTS OF MEDIATION:
During mediation, the mediator had discussions with the appellant and the university.
The mediator provided the appellant with information about the exemptions applied to the records at issue.
The appellant clarified that he is not seeking access to the costs of resources that have been protected under a non disclosure agreement with the university. In regards to the remainder of the resources, the appellant further narrowed the request to include only the name of each publication and total cost per year for 2017 and 2018. The appellant also advised that he is aware of one document in the possession of the university that he believed contained the information that he is seeking relating to this narrowed request.
The university located the document that the appellant was seeking access to and stated that it is not prepared to release it due to economic and other interests.
The university then answered some of the appellant’s questions and sent him a letter confirming its commitment to schedule the release of the information at issue in this appeal with a goal of April 2020 for the completion of the project.
The appellant advised that he is satisfied with the university’s response and wishes to withdraw his appeal.
Accordingly, this appeal has been closed.
What made York change its mind? I don’t know. There were two bodies involved: the Information and Privacy Officer and the Libraries. I assume the two were able to work together to arrive at this result. I have no idea. No one in the Libraries has told me anything about my appeal—not brought it up in conversation, not so much as alluded to it.
Now, this data is going to be released to me personally. Whatever I get, I’ll make it public, but of course my goal is for York to release this information itself, in a good data set, like other universities do. If York can give me the information, it seems to me there’s nothing preventing it from doing a proper full release.
And I hope that with F2017 and F2018 done there will be nothing to prevent York from releasing F2019 and onwards, because each year we only add a few handfuls of new eresources and their license agreements will be easy to check.
This all began “early in 2017 [when] Librarian Bill was part of a group at York University Libraries that resolved to make public the costs YUL spent on electronic resources.” That didn’t happen, so in 2018 Civilian Bill filed a FIPPA request, was denied, and appealed. In 2019 a satisfactory plan was proposed for releasing the data; as a result the appeal was dropped. Civilian Bill should have the data in early 2020. When he does, both Civilian Bill and Librarian Bill will be happy.
The parties agree to continue their practice of upholding, protecting, and promoting academic freedom as essential to the pursuit of truth and the fulfilment of the University’s objectives. Academic freedom includes the freedom of an employee to examine, question, teach, and learn; to disseminate his/her opinion(s) on any questions related to his/her teaching, professional activities, and research both inside and outside the classroom; to pursue without interference or reprisal, and consistent with the time constraints imposed by his/her other University duties, his/her research, creative or professional activities, and to freely publish and make public the results thereof; to criticize the University or society at large; and to be free from institutional censorship. Academic freedom does not require neutrality on the part of the individual, nor does it preclude commitment on the part of the individual. Rather, academic freedom makes such commitment possible.
I’m privileged to have academic freedom and I’m happy to use it.
Some people see deadlines as guidelines to aim for, not absolute dates by which a deliverable is expected.
This view of deadlines as flexible guidelines can be seen throughout western culture, as exemplified by the ongoing, oft-delayed Brexit negotiations. However, deadlines also compete against other factors in any project. Consider the three constraints in the project management triangle:
Time
Budget
Scope

Those three constraints are part of a humorous phrase that almost everybody who’s worked on a project of any sort is familiar with: “pick two: good, fast, or cheap,” traditionally illustrated as a triangle diagram.
[the book] starts in an entirely appropriate place.
Dr Dao - a doctor with patients to serve the next day - was "selected" by United Airlines to be removed from an overbooked plane.
As he had patients to tend the next day, he did not think he should leave the plane. So the airline sent thugs to bash him up and forcibly remove him.
The video (truly sickening) went viral. But the airline did not apologise. The problem, it seems, was caused by customer intransigence.
They eventually apologised after what Tepper and Hearn take to have been genuine public revolt, but what I think was more likely the realistic threat of United being banned from China because of the racial undertones of the incident.
If a "normal" company sent thugs to brutalise its customers it would go out of business. But United went from strength to strength.
The reason, the authors assert, was that United has so much market power that you have no choice but to fly them anyway - and by demonstrating they had the power to kick your teeth in, they also demonstrated that they had the power to raise prices. The stock went up pretty sharply in the end.
Oligopoly - extreme market power - not only makes airlines super-profitable. It gives them the licence to behave like complete jerks.
But what is true of airlines is true of industry after industry in the United States. Hospital mergers have left many towns with one or two hospitals. Health insurance is consolidated to the point where in most states there are only one or two realistic choices. Even the chicken-farming industry is consolidated to the point where this relatively unskilled and non-technical industry makes super-normal profits.
How did we get to the point where a company can hike its stock price by assaulting its customers? It wasn't that anti-trust law changed; it is that the Chicago school changed the way the law was interpreted to focus on "consumer welfare", defined as low prices, thereby hamstringing its enforcement. As we see in the contrast between the Savings and Loan crisis and the Global Financial Crisis, a law isn't effective simply because it is on the books, but only if it is effectively enforced.
At $25 billion [in annual revenue to trigger a breakup], you’re not anticipating that the local supermarket is going to stop having to do house brands.
Exactly. And no one’s looking for that. You’re getting into the nuance, that actually this is a two level regulation. The one that’s caught all the headlines is that for everybody above $25 billion, you got to break off the platform for many of the ancillary or affiliated businesses.
But between 90 million and 25 billion [in annual global revenue], the answer is to say if you run a platform, you have an obligation of neutrality, so you can’t engage in discriminatory pricing. Obviously, it’s like the net neutrality rule: you can’t speed up some folks and slow down other folks, which is another way of pricing. So there’s an obligation of neutrality.
The advantage to breaking them up at the top [tier] rather than just simply saying, “gosh, girl, why didn’t you just go for obligation of neutrality all the way through?” is that it actually makes regulation far easier. When you’ve just got a bright-line rule, you don’t need the regulators. At that point, the market will discipline itself. If Amazon the platform has no economic interest in any of the formerly-known-as-Amazon businesses, you’re done. It takes care of itself. ... So you’re articulating a bright-line rule. A lot of conversations I’ve had with antitrust people like the Tim Wus and Lina Khans of the world, they’re saying we need to change the standard. We need to go from the consumer welfare antitrust standard to a European-style competition standard. Are you advocating that we change the antitrust standard?
I just think it’s a lot harder to enforce that against a giant that has huge political power.
So you’re in favor of leaving the consumer welfare standard alone?
Look, would I love to have [that changed] as well? Sure. I have no problem with that.
My problem is in the other direction: there are times when hard, bright-line rules are the easiest to enforce, and therefore you’re sure you’ll get the result you want.
Let me give you an example of that: I’ve been arguing for a long time now for reinstatement of [the] Glass-Steagall [Act]. And my argument is basically, don’t tell me that the Fed and the Office of the Comptroller of the Currency can crawl through Citibank and JPMorgan Chase and figure out whether or not they’re taking on too much risk and whether they’ve integrated and cross-subsidized businesses. Just break off the boring banking part — the checking accounts, the savings accounts, what you and I would call commercial banking — from investment banking, where you go take a high flyer on this stock or that new business.
When you break those two apart, you actually need fewer regulators and less intrusion on the business.
You also get more assurance it really happened. We live in an America where it’s not only economic power that we need to worry about from the Amazons and Facebooks and Googles and Apples of the world — we have to worry about their political power as well. There’s a reason that the Department of Justice and the Federal Trade Commission are not more aggressive. There was a time, long ago, when they were more aggressive, a golden age of antitrust enforcement.
These big companies exert enormous influence in the economy and in Washington, DC. We break them apart, that backs up the influence a little bit, and it makes absolutely sure that they’re not engaged in these unfair practices that stomp out every little business that’s trying to get a start, every startup that’s trying to get in there.
Senator Warren is clearly right about the importance of bright lines for enforceable anti-trust laws when she says:
When you’ve just got a bright-line rule, you don’t need the regulators. At that point, the market will discipline itself.
But in my view she doesn't go far enough, for two reasons:
In her vision, what happens when a company exceeds $25B/yr in revenue is that a conversation starts between the company and the regulators. Given the resources available on both sides, this is a conversation that (a) will go on for a long time, and (b) will be resolved in some way acceptable to the company.
Her vision seems narrowly tailored to the FAANGs, ignoring the real oligopoly of the online world: the telcos. But her arguments apply equally to oligopoly and monopoly in other areas. John Hempton uses the example of Lamb Weston, the dominant player in french fries:
French fries it seems are absurdly profitable. The return on assets is in the teens (which seems kind-of-good in this low return world). Margins keep rising and yet there is no obvious emerging competition.
It may be a good investment even though it looks pretty expensive. But if competition comes Lamb Weston could be a terrible stock.
There has been plenty of consolidation in this industry. Sure many of the mergers shouldn't have been approved by regulators - but they were - and the industry has become oligopolistic.
But this is not a complicated industry - it is not obvious why competition doesn't come.
I think Robinson was on to a better alternative. Although it is never spelled out explicitly, one key aspect of the transformation in Pacific Edge is that there are hard limits on both personal incomes and the size of corporations. There is a very simple way to implement such hard limits, via the tax code:
Corporations should be subject to a 100% tax rate on revenue above the cap.
There should be no need for anti-trust regulators to argue with the company about what it should do. It is up to the company, as always, to decide how to minimize their tax liability. They can decide to break themselves up, to lower prices, to stop selling product for the year, whatever makes sense in their view. It isn't up to the government to tell them how to structure their business. Basing the cap on revenue, as opposed to profit, prevents most of the ways companies manipulate their finances to avoid tax. Basing enforcement on the tax code leverages existing mechanisms rather than inventing new ones. And, by the way:
Individuals should be subject to a 100% tax rate on income above a similar cap.
In both cases the 100% rate should be supplemented by a small wealth tax, a use-it-or-lose-it incentive for cash hoards to be put to productive use instead of imitating Smaug's hoard.
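To see how mechanically simple such bright-line rules would be, here is a minimal sketch in Python; the cap and wealth-tax figures are placeholders chosen for illustration, not numbers anyone has proposed:

# Sketch of hard caps enforced through the tax code (illustrative numbers).
REVENUE_CAP = 25_000_000_000   # hypothetical corporate revenue cap, $25B/yr
WEALTH_TAX_RATE = 0.02         # hypothetical small supplementary wealth tax

def corporate_tax(revenue: float) -> float:
    # 100% marginal rate: every dollar of revenue above the cap is taxed away.
    return max(0.0, revenue - REVENUE_CAP)

def wealth_tax(net_assets: float) -> float:
    # The use-it-or-lose-it supplement on accumulated hoards.
    return WEALTH_TAX_RATE * net_assets

# A company with $30B in revenue owes $5B: nothing above the cap is kept.
assert corporate_tax(30e9) == 5e9

There is no regulator in the loop: the company reads the same rule as everyone else and restructures, cuts prices, or splits itself up to stay under the cap.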
“We need to make it easy to manage data throughout its lifecycle and ensure it can be easily and reliably retrieved by people who want to reuse and repurpose it. We developed Data Curator to help publishers define certain characteristics to improve data and metadata quality” – Dallas Stower, Assistant Director-General, Digital Platforms and Data, Queensland Government – Project Sponsor
Data Curator allows you to create data from scratch or open an Excel or CSV file. Data Curator requires that each column of data is given a type (e.g. text, number). Data can be defined further using a format (e.g. text may be a URL or email). Constraints can be applied to data values (e.g. required, unique, minimum value, etc.). This definition process can be accelerated by using the Guess feature, which guesses the data types and formats for all columns.
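For context, Data Curator builds on the Frictionless Data specifications, where these column definitions are expressed as a Table Schema inside a data package (see the export step below). A minimal, hypothetical schema for two columns, showing a type, a format, and some constraints, might look like:

{
  "fields": [
    {
      "name": "contact_email",
      "type": "string",
      "format": "email",
      "constraints": { "required": true, "unique": true }
    },
    {
      "name": "age",
      "type": "integer",
      "constraints": { "minimum": 0 }
    }
  ]
}

The field names here are invented for illustration; Data Curator generates and validates this kind of schema for you through its interface.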
Data can be validated against the column type, format and constraints to identify and correct errors. If it’s not appropriate to correct the errors, they can be added to the provenance information to help people understand why and how the data was collected and determine if it is fit for their purpose.
Often a set of codes used in the data is defined in another table. Data Curator lets you validate data across tables. This is really useful if you want to share a set of standard codes across different datasets or organisations.
Data Curator lets you save data as a comma, semicolon, or tab separated value file. After you’ve applied an open license to the data, you can export a data package containing the data, its description, and provenance information. The data package can then be published to the Internet. Some open data platforms support uploading, displaying, and downloading data packages. Open data consumers can then confidently access and use quality open data.
When Roy Rosenzweig and I wrote Digital History 15 years ago, we spent a lot of time thinking about the overall tone and approach of the book. It seemed to us that there were, on the one hand, a lot of our colleagues in professional history who were adamantly opposed to the use of digital media and technology, and, on the other hand, a rapidly growing number of people outside the academy who were extremely enthusiastic about the application of computers and computer networks to every aspect of society.
For the lack of better words—we struggled to avoid loaded ones like “Luddites”—we called these two diametrically opposed groups the “technoskeptics” and the “cyberenthusiasts” in our introduction, “The Promises and Perils of Digital History“:
Step back in time and open the pages of the inaugural issue of Wired magazine from the spring of 1993, and prophecies of an optimistic digital future call out to you. Management consultant Lewis J. Perelman confidently proclaims an “inevitable” “hyperlearning revolution” that will displace the thousand-year-old “technology” of the classroom, which has “as much utility in today’s modern economy of advanced information technology as the Conestoga wagon or the blacksmith shop.” John Browning, a friend of the magazine’s founders and later the Executive Editor of Wired UK, rhapsodizes about how “books once hoarded in subterranean stacks will be scanned into computers and made available to anyone, anywhere, almost instantly, over high-speed networks.” Not to be outdone by his authors, Wired publisher Louis Rossetto links the digital revolution to “social changes so profound that their only parallel is probably the discovery of fire.”
Although the Wired prophets could not contain their enthusiasm, the technoskeptics fretted about a very different future. Debating Wired Executive Editor Kevin Kelly in the May 1994 issue of Harper’s, literary critic Sven Birkerts implored readers to “refuse” the lure of “the electronic hive.” The new media, he warned, pose a dire threat to the search for “wisdom” and “depth”—“the struggle for which has for millennia been central to the very idea of culture.”
Reading passionate polemics such as these, Roy and I decided that it would be the animating theme of Digital History to find a sensible middle position between these two poles. Part of this approach was pragmatic—we wanted to understand how history could, and likely would, be created and disseminated given all of this new digital technology—but part of it was also temperamental and even a little personal for the two of us: we both loved history, including its very analog and tactile aspects of working with archives and printed works, but we were also both avid computer hobbyists and felt that the digital world could do some uncanny, unparalleled things. So we sought a profoundly humanistic, but also technologically sophisticated, position on which to base the pursuit of knowledge.
* * *
Robin Sloan is a novelist who has published two books, Mr. Penumbra’s 24-Hour Bookstore and Sourdough, that are very much about this intersection between the humanistic and the technological. Beyond his very successful work as an author, he has had a career at new media companies that are often associated with cyberenthusiasm, including Twitter and Current TV, and he has also spent considerable time engaging in crafts often associated with technoskepticism, including the production of artisanal olive oil, old-school printing, and 80s-era music-making. In this larger context of his vocations and avocations, his novels seem like an attempt to find that very same, if elusive, via media between the incredible power and potential of modern technology and the humanizing warmth of our prior, analog world.
Unlike some other contemporary novelists and nonfiction writers who work in the often tense borderlands between the present and future, Sloan neither can bring himself to buy fully into the utopian dreams of Silicon Valley—although he’s clearly tickled and even wowed by the way it constantly produces unusual, boundless new tech—nor can he simply conclude that we should throw away our smartphones and move off the grid. Although he clearly loves the peculiar, inventive shapes and functions of older technology, he doesn’t badger us with a cynical jeremiad to return to some imagined purity inherent in, say, vinyl records, nor will he overdo it with an uncritical ode to our augmented-reality, gene-edited future.
Instead, his helpful approach is to put the old and new into lively conversation with each other. In his first novel, Mr. Penumbra’s 24-Hour Bookstore, Sloan set the magic of an old bookstore in conversation with the full power of Google’s server farm. In his latest novel, Sourdough, he set the organic craft of the farmer’s market and the culinary artisanry of Chez Panisse in conversation with biohacked CRISPRed food and the automation of assembly robots.
But this was in the published version of the novel. In a revealing abandoned first draft of Sourdough that Sloan made available (as a Risograph printing, of course) to those who subscribe to his newsletter, he started the novel rather differently. In the introduction to this discarded draft, titled Treasured Subscribers, Sloan briefly notes that “these were not the right characters doing the right things.” I think he’s absolutely right about that, but it’s worth unpacking exactly why, because in doing so we can understand a bit better how Sloan pursues that elusive via media, and how in turn we might discover and promote humane technology in a rapidly changing world.
[Spoiler alert: If you haven’t read Sourdough yet, I’ve kept the plot twists mostly hidden, but as you’ll see, the following contains one critical character revelation. Please stop what you’re doing, read the book, and return here.]
Treasured Subscribers begins with the same overarching narrative concept as Sourdough: a capable, intelligent young woman moves to the Bay Area and becomes part of a mysterious underground organization that focuses on artisanal food, and that is orchestrated by a charismatic leader. Mina Fisher, a writer, lands a new marketing job at Intrepid Cellars, led by one Wolfram Wild, who refuses to carry a smartphone or use a laptop. Wild barks text and directions for his newsletter on craft food and wine offerings over what we can only assume is an aging Motorola flip phone as he travels to far-flung fields and vineyards. In short, Wild appears to be a kind of gastronomic J. Peterman, globetrotting for foodie finds. The only hint of future tech in Treasured Subscribers is a quick mention of “Chernobyl honey,” although it’s framed as just another oddball discovery rather than—as Sourdough makes much more plain—an intriguing exercise in modding traditional food through science-fiction-y means. Wild seems too busy tracking down a cider mentioned by Flaubert to think about, or articulate, the significance of irradiated apiaries.
By itself, this seems like not such a bad setup for a novel, but the problem here is that if one wishes to explore, maximally, the intersection and possibilities of human craft and high tech, one can’t have a flattened figure like Wolfram Wild, who sticks with Windows 95 on an aging PC tower. (Given the implicit nod to Stephen Wolfram in Wild’s name, I wonder if Sloan planned to eventually reveal other computational layers to the character, but it’s not there in the first chapter.) In order for Sloan’s fiction to consider the tension between technoskepticism and cyberenthusiasm, and to find some potential resolution that is both excitingly technological and reassuringly human, he can’t have straw men at either pole. Had Sloan continued with Treasured Subscribers, it would have been all too easy for the reader to dismiss Wild, cheer for Mina, and resolve any artisanal/digital divide in favor of an app for aged Bordeaux. To generate some real debate in the reader’s mind, you need more multidimensional, sophisticated characters who can speak cogently and passionately about the advantages of technology, while also being cognizant of the impact of that technology on society. A clamshell cellphone-brandishing foodie J. Peterman won’t do.
Sloan solved this problem in multiple ways in the production version of Sourdough. In the published novel, the protagonist is the young Lois Clary, a software developer who gets a job automating robot arms at General Dexterity, and learns baking at night from two lively undocumented immigrants and their equally animated starter dough. General Dexterity is led by a charismatic tech leader, Andrei, who can articulate the remarkable features of robotic hands and their potential role in work. Also hanging out at the unabashed cyberenthusiast pole, ready for conversation and debate, is the founder and CEO of Slurry Systems, the maker of artificial, nutritious, and disgusting foods of the future, Dr. Klamath. And Clary ends up working at—yes, here it returns from Treasured Subscribers, but in a different form—an underground craft food market, which is chockablock with artisanal cheeses and beverages made by off-duty scientists and a librarian who maintains a San Francisco version of the New York Public Library’s menu collection. Tech and craft are in rich, helpful collision.
The most important character, however, for our purposes here, is the delightfully named Charlotte Clingstone, who is the head of the legendary Café Candide, and the stand-in for Alice Waters of Chez Panisse fame. Chez Panisse, in Berkeley, pioneered the locavore craft food movement, and normally a fictional Waters would be a novel’s unrelenting resident technoskeptic. But in a key twist, it turns out at the end of Sourdough that Clingstone also underwrites futuristic high-tech foodie endeavors—including that “Chernobyl honey” that is a carryover from Treasured Subscribers. Clingstone both defends the craft of the farm-to-table kitchen while seeing it as important to explore the next phase of food through robotics, radiation, and RNA.
As Sourdough develops with these characters, it can thus ask in a deeper way than Treasured Subscribers whether and how we can fuse tech know-how with humanistic values; whether it’s possible to exist in a world in which a robotic hand kneads dough but the process also involves an organic, magical yeast and well-paid workers; whether that starter dough should be gene sequenced to produce artificial, nutritious, and delicious food at scale; and how craft-worthy human labor and creativity can exist in the algorithmic, technological society that is quickly approaching. The only way to find out is to experiment with the technical and digital while keeping one’s heart in the mode of more traditional human pursuits. Sloan’s protagonist, Lois, thus follows an emotional arc between developing code and developing bread.
* * *
I suppose we shouldn’t make that much of an abandoned first draft of a novel (he says 1,000 words into an exploratory blog post), but reading Treasured Subscribers has made me think again about the right middle way between technoskepticism and cyberenthusiasm that we tried to find in Digital History. Certainly the skepticism side has been on the sharp ascent as Silicon Valley has continually been tone-deaf and inhumane in important areas like privacy. Certainly we need a good healthy dose of that criticism, which is valid. But at the end of the day, when it’s time to put down the newspaper and pick up the novel, Robin Sloan holds out hope for some forms of sophisticated technology that are attuned to and serve humanistic ends. We need a bit of that hope, too.
Robin Sloan is willing to give both the artisanal and the technical their own proper limelight and honest appraisal. Indeed, much of what makes his writing both fun and thoughtful is that rather than toning down cyberenthusiasm and technoskepticism to find a sensible middle, he instead uses fiction to turn them up to 11 and toward each other, to see what new harmonious sounds, if any, emerge from the cacophony. Sloan looks for the white light from the overlapping bright colors of the analog and digital worlds. Like the synthesizers he also loves—robotic computer loops intertwined with the soul of music—he seeks the fusion of the radically technological and the profoundly human.
Islandora Camp is heading to Dübendorf, near Zürich, Switzerland, from June 17 - 19, 2019. It is hosted by Lib4RI - Library for the Research Institutes within the ETH Domain (Eawag, Empa, PSI & WSL) - and located at Eawag. We'll be holding our usual style of camp, with two days of sessions and one day of hands-on training from experienced Islandora instructors. You can register for the camp and find out more here. Both workshop tracks will contain content exploring Islandora's next major version, which pairs Drupal 8 and Fedora 4, alongside more traditional training in the current release with Drupal 7 and Fedora 3. We are very pleased to announce that the instructors for our training workshop will be:
Melissa Anez has been working with Islandora since 2012 and has been the Community and Project Manager of the Islandora Foundation since it was founded in 2013. She has been a frequent instructor in the Admin Track and developed much of the curriculum, refining it with each new Camp. Lately she has been expanding the iCamp workshop to cover new ground with Islandora 8.
Sarah Last is the newest member of the Lib4RI DORA Group, which she joined in June 2018. She works as a project assistant and is responsible for the good behaviour of the DORA Repository, mainly dealing with issues regarding the form builder, export options, and the help pages. This will be her first iCamp, but she is looking forward to helping with the Admin Track exercises.
Mark Jordan has taught at four other Islandora Camps and at the Islandora Conference. He is the developer of numerous custom Islandora modules, including Islandora Context, Islandora Simple Map, Islandora Datastream CRUD, and the XML Solution Pack. Mark is also one of the co-developers of the Move to Islandora Kit. Recently, he developed Riprap, a fixity checking service, and a module to integrate it into Islandora 8. He is also an Islandora committer and is currently serving as Chair of the Islandora Foundation Board of Directors. His day job is as Associate Dean of Libraries, Digital Strategy at Simon Fraser University.
Marcus Emmanuel Barnes is an active participant in the Islandora community and has presented workshops and talks at Islandora events. He develops and maintains several custom Islandora modules with the team at The Digital Scholarship Unit (DSU) at the University of Toronto Scarborough Library, including the Islandora Oral Histories Solution Pack and the Islandora Web Annotations module. Marcus is excited to share his knowledge and insights learned through many years of developing digital collections and looks forward to helping empower participants to be successful with their own Islandora-based projects.
But most of our time was spent in collaborative dialogue. OCLC Program Officers utilized a World Café format to facilitate small group conversations around one of the most important issues for libraries and research institutions today:
How do you successfully collaborate?
We also encouraged participants to address the challenges of transnational collaboration in particular — especially relevant given the international makeup of our group.
In discussion groups of 4-6 people, participants examined a multitude of opportunities, challenges, and barriers to effective collaboration, distilling and sharing a few relevant takeaways:
Collaborative efforts should be TANGIBLE. People are more willing to engage when an initiative solves real world, practical problems. It’s not enough that something seems like a “worthy effort” or “something we should support.”
Collaboration can be HARD. That’s why, as participants acknowledged, so many of our efforts are institutionally-focused. Collaborating across national borders is particularly difficult, involving challenges that are linguistic and cultural, often compounded by asymmetrical resources.
Collaboration is slow. That’s one of the reasons people and institutions don’t collaborate. But. . .
Cross-institutional efforts provide greater pay-offs than local, institutionally-based efforts. It’s hard to innovate and have significant impact by making changes at just the institution level.
Communication is imperative and there are many elements to this. First, it’s essential to know and understand the interests of fellow stakeholders–but it’s also important for librarians to hone their soft skills in order to be effective leaders and collaborators. And the engagement of a diverse group of participants and stakeholders is essential for sound decision making.
Funding is, of course, a perennial challenge.
Participants identified several examples of successful transnational collaboration, particularly citing programs like the Erasmus+ program, which provides opportunities for librarians to make extended visits to other institutions; the leadership development programs offered by LIBER; OPERAS (Open Access in the European Research Area through Scholarly Communications); and also some of OCLC’s efforts like WorldCat, VIAF, and OCLC Research.
Transnational cooperation leads to accelerated cross-pollination of innovations and practices as well as rich learning experiences. Facilitating transnational cooperation is a primary effort of the OCLC Research Library Partnership, and events like this one offer participants opportunities to connect and learn from others at similar research-focused institutions.
We are looking forward to hosting RLP members in Dublin, Ohio on the OCLC campus for a Research Retreat on 23-24 April 2019, where we will focus on the myriad challenges facing library leaders today: shifts in the library workforce, scholarly communication and research workflows, collecting practices, and stakeholder relationships. This event will offer information and models to support strategic conceptualization as well as in-depth, small group discussions to make sense of it all.
It will be immediately followed by ResearchWorks, in which the community can help shape an applied research agenda that charts engagement with data science and a range of computational methods. It promises to be an exciting and collaborative week.
Life sciences experts see great potential for artificial intelligence to improve health care. A recent Accenture survey found that 90 percent of life sciences executives recognized AI as important in driving innovation and achieving outcomes such as hyper-personalized experiences, new sources of growth, and new levels of efficiency. However, some challenges remain.
Regulations have yet to catch up to the technology, data management continues to consume valuable time, there’s organizational resistance to change, and it can be difficult to understand why AI has come to a given conclusion.
The stakes are high — lives are literally in the balance. A deeper understanding of diseases and therapies will continue to personalize treatments, but to achieve the biggest impact, we’ll need to apply AI across data — no matter where or how it is stored — and strive for global cooperation and data sharing. Five experts weigh in.
1. Opportunities in Disease Management Are Significant
Because patients respond to treatments in various ways, physicians use a trial and error approach to treat some diseases. As methods improve to collect large volumes of patient-drug feedback, AI can help assess these trial and error clinical pathways — looking for markers and the different parameters that have led to success. We can try to answer questions about the impact of social and environmental factors, as well as physical responses to therapies. Finding that data and looking for patterns can help us find the right prescription in the right amount at the right time to treat diseases. — Steven Gerhardt, CEO, Managing Partner of Element Blue
2. Opening the Black Box Will Increase Adoption
Explainable AI will become a requirement — especially for the medical industry. If AI makes a medical recommendation for an individual’s health or treatment, the doctor must be able to explain what logic and data was used to reach that conclusion. We are not yet at a point in our relationship with AI where many people are willing to take medication or have surgery because of a recommendation by AI, especially if the involved medical professional can’t explain the why of its recommendation. — Candace Worley, VP and Chief Technical Strategist, McAfee
3. AI Can Help Bring Therapies to Market
It takes billions of dollars and more than 10 years to bring a new drug to market. The process generates massive amounts of data, and hidden in that data are the insights that could start a promising new drug program — or halt one otherwise doomed to fail expensively or one that is a possible risk to patients’ lives. Identifying drug candidates for repurposing in rare disease treatment or rigorously analyzing the safety and efficacy profiles of compounds in early R&D are all potential uses of AI in drug development. Overcoming obstacles in data cleanup and management will leave researchers free to focus on what the data is telling them in relation to their study. — Tim Miller, VP, Life Sciences Platform Solutions, Elsevier R&D Solutions
4. AI Will Drive Changes in FDA Regulations
Companies pursuing AI technologies must realize that while the health care industry is embracing this technology, the regulatory landscape is still finding its footing. With AI being implemented across the health care continuum, the FDA and other agencies find themselves contending with the prospect of regulating a moving target. Use of AI systems promises better health care management for patients and faster, more accurate diagnoses for doctors, but FDA’s traditional regulatory framework will require major changes in preparation for the advances on the horizon. — John J. Smith, partner at Hogan Lovells
5. AI Will Help Find Knowledge in Text
AI will help scientists use literature to make decisions, mine real-world evidence for insights, and match patients to trials. This helps address the expense of bringing treatments to market, where efficiency gains are badly needed: any repeatable but time-consuming manual task can likely be automated. This frees up people to focus on the harder and more creative tasks. But challenges remain around transparency and organizational resistance to change.
— Matthew Michelson, CEO of Evid Science
Lucidworks Expert Roundups are invitation-only insights from leading C-Suite executives who share expertise, predictions, and observations about their industries. Email us if you are interested in contributing.
Atlanta, GA – March 13, 2019 – LYRASIS and DuraSpace are pleased to announce that by July 1, 2019, they will officially merge to create one dynamic organization empowered to help drive scalable change, new technologies and vital services.
This merger will create a new model for collaboration, innovation and development in the landscape of academic, research, and public libraries, galleries, archives, and museums. The merged organization will leverage its expertise, reach, and capacity to create and build new programs, services and technologies that enable durable, persistent access to data and services. The LYRASIS and DuraSpace communities will continue to benefit from the existing programs and services that they receive from each organization.
The newly merged organization will be an on-ramp to a worldwide collaborative community of more than 4,000 institutions and nine open source, community-supported technology programs across six continents. In addition, it will build capacity in the scholarly ecosystem through open technologies, services, funding opportunities, expertise, training, and support.
Both organizations bring world class experts and a common vision for the future of knowledge as it is used across their overlapping memberships. They share a common drive to provide end-to-end solutions that reflect the governance, fair pricing, technical road map, and community expectations of their memberships.
Erin Tripp, DuraSpace’s Executive Director, explains: “We found that our two organizations were working more and more toward similar goals, such as stewarding community-supported and open source initiatives in the scholarly ecosystem. We each observed a growing need for innovation, research and development, and thought leadership. I feel strongly that by working together, we can have a greater impact on our communities.”
Robert Miller, CEO of LYRASIS, says of the merger, “Over the past three years we have tested and launched a thought leadership and program development initiative that combines collective risk mitigation and community engagement with milestone-based funding. The rapid adoption of this by 10% of our membership has confirmed to us that this technology platform and service focus is more critical now than ever before. Leveraging DuraSpace’s global institution base with our strategic end-to-end solution focus will be a substantial win for our members.”
LYRASIS and DuraSpace have traditionally delivered high-quality content and cultural heritage solutions to their diverse memberships and will continue to do so. All services, programs, purchases and subscriptions through LYRASIS and DuraSpace will continue without interruption. No action needs to be taken at this time by any members to ensure continuity of service.
This merger strengthens each organization’s core competencies and offers new opportunities for expansion. LYRASIS will be the public name of the merged organization and led by Robert Miller. Within LYRASIS, a newly-created DuraSpace Community Supported Programs division led by Erin Tripp will be the future home of all existing DuraSpace Open Source Software projects and all LYRASIS community supported programs including ArchivesSpace and CollectionSpace, as well as DSpace, Fedora, and VIVO.
LYRASIS, a not-for-profit membership organization of more than 1,000 libraries, museums, and archives, supports enduring access to our shared academic, scientific and cultural heritage through leadership in open technologies, content services, digital solutions and collaboration with archives, libraries, museums and knowledge communities worldwide.
DuraSpace is an independent 501(c)(3) not-for-profit organization founded in 2009 providing leadership and innovation for open technologies that promote durable, persistent access to digital data. We collaborate with academic, scientific, cultural, technology, and research communities by supporting projects and advancing services to help ensure that current and future generations have access to our collective digital heritage.
I’m running “Phasers on Satie (Long Phase)” on STAPLR for a while. It uses the left-hand riff from Vexations by Erik Satie, normalized so all the notes are the same length. There are eighteen notes, and it’s running at 54 beats per minute, so it plays three times a minute.
Every time there’s an interaction at one of the York University Libraries desks, the piece begins to play for as many minutes as the interaction was long, but after the first time the riff is played it starts to go out of phase, and gets a little more behind every repetition. The phase length is such that the first note is just about to run into the second note by the time the repetitions are done, but it stops one repetition before that happens. Because the interactions at the desks start at different times and last for different durations, many different types of patterns can crop up.
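For the curious, here is a minimal sketch in Python of how the timing could work, assuming my reading of the description above is right; this is an illustration, not the actual STAPLR code.

```python
# A sketch of the phasing logic described above: not the actual STAPLR
# code, just a reconstruction from the prose.
RIFF_NOTES = 18           # notes in the normalized Vexations riff
BPM = 54                  # one note per beat
NOTE_LEN = 60 / BPM       # ~1.11 seconds per note
RIFF_LEN = RIFF_NOTES * NOTE_LEN   # 20 seconds, hence 3 plays a minute

def repetition_start_times(interaction_minutes):
    """Start times of the riff repetitions for one desk interaction.

    The piece plays for as many minutes as the interaction lasted
    (three repetitions per minute).  Every repetition slips a little
    further behind, and the slip is sized so that the first note
    would collide with the second exactly one repetition after the
    piece stops.
    """
    reps = interaction_minutes * 3
    slip = NOTE_LEN / reps           # cumulative slip approaches one note
    return [i * (RIFF_LEN + slip) for i in range(reps)]

# A five-minute reference interaction yields fifteen repetitions:
for start in repetition_start_times(5):
    print(f"riff starts at {start:6.2f} s")
```

If that reconstruction is right, shorter interactions slip faster, which would be one source of the variety of patterns.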
The title is a tribute to the Canadian prog band FM and the late, great Nash the Slash. FM did “Phasors on Stun” on their first album, Black Noise.
This week, after 10 years of working at VCU Libraries, I have been letting my colleagues know that I’m nonbinary. Response from my boss, my team, and my colleagues has been so positive, and has made this process so incredibly easy.
I didn’t really have a template for a coming-out message, so I ended up writing this post for our staff intranet. I’m sharing it here in hopes that it helps some folks. Mileage certainly varies depending on where you work, but this FAQ may be helpful not only for folks coming out, but for people working alongside them.
My letter is below.
Disclaimer: Many of the answers in this FAQ won’t be true for all nonbinary folks, but it’s a jumping-off point if people want to start their own docs.
Thank you to the out trans and nonbinary librarians before me who helped me along the way, specifically Stephen Krueger, Max Bowman, char booth, Mark Matienzo, and Wen Nie Ng.
Good morning! I’m coming out as nonbinary
Y’all have made VCU feel like home for me for the past 10 years. I wanted to share with you today that I am nonbinary, and use they/them pronouns. I have been out as nonbinary in my personal life for a while and I’m ready to bring that part of myself to my work life.
I have been a member of the VCU community for a long time, I love working here, and I know this is a place where I can bring my whole self to work. I think my work and VCUL community are enriched when employees are authentically present. I think that all you kind folks at VCUL are open to welcoming me. I also think it’s important to be visible to folks in the community, especially students, who are trans or nonbinary.
What does that mean for me, your colleague?
I’m asking you to change how you talk to me and how you refer to me. Instead of using she or her pronouns to refer to me, you can use they and them. “Erin sent that message about their pronouns.” It’s kind of awkward at first but it gets easier with practice.
What can I call you?
– Addressing me: Erin, you, friend, colleague, erwhite, E-dubs, Mx. White (pronounced “mix”)…
– Referring to me: Erin, they, them, theirs, that person, friend, colleague, talented IT professional…
What shouldn’t I call you?
– Addressing me: Ms., Miss, lady, girl, woman, ma’am…
– Referring to me: she, her, he, him, it, Ms., Miss, lady, girl, woman…
What if I get it wrong?
It’s okay! If you catch yourself, correct and move on. What’s important is to try.
Will you correct me if I get it wrong?
It depends on the situation. If I remind you, it’s because I know we respect each other and both care about our relationship.
I don’t agree that I should use they/them pronouns for you.
I hope that you can respect me and honor how I am asking to be addressed, recognizing that inclusion is a core value at VCU, so we can work together. Another option is to just use my name instead of my pronouns.
One of the "defects" of RDF for data management is that it does not support business rules. That's a generality, so let me explain a bit.
Most data is constrained - it has rules for what is and what is not allowed. These rules can govern things like cardinality (is it required? is it repeatable?), value types (date, currency, string, IRI), and data relationships (if A, then not B; either A or B+C). This controlling aspect of data is what many data stores are built around; a bank, a warehouse, or even a library manages its activities through controlled data.
RDF has a different logical basis. RDF allows you to draw conclusions from the data (called "inferencing") but there is no mechanism of control that would do what we are accustomed to with our current business rules. This seems like such an obvious lack that you might wonder just how the developers of RDF thought it would be used. The answer is that they were not thinking about banking or company databases. The main use case for RDF development was using artificial intelligence-like axioms on the web. That's a very different use case from the kind of data work that most of us engage in.
RDF is characterized by what is called the "open world assumption" which says that:
- at any moment a set of data may be incomplete; that does not make it illegitimate - anyone can say anything about anything; like the web in general there are no controls over what can and cannot be stated and who can participate
However, RDF is being used in areas where data with controls was once employed; where data is validated for quality and rejected if it doesn't meet certain criteria; where operating on the data is limited to approved actors. This means that we have a mis-match between our data model and some of the uses of that data model.
This mis-match was evident to people using RDF in their business operations. W3C held a preliminary meeting on "Validation of Data Shapes" in which there were presentations over two days that demonstrated some of the solutions that people had developed. This led to the Data Shapes working group in 2014, which produced the shapes validation language SHACL (SHApes Constraint Language) in 2017. Of the interesting ways that people had developed to validate their RDF data, the use of SPARQL searches to determine whether expected patterns were met became the basis for SHACL. Another RDF validation language, ShEx (Shape Expressions), is independent of SPARQL but has essentially the same functionality as SHACL. There are other languages as well (SPIN, StarDog, etc.) and they all assume a closed world rather than the open world of RDF.
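To make the closed-world flavor of these languages concrete, here is a minimal validation sketch using the Python rdflib and pySHACL libraries. The tiny vocabulary (ex:Book, ex:title) and the shape are invented for illustration; they do not come from any of the specifications discussed here.

```python
# Minimal closed-world validation with rdflib + pySHACL.  The tiny
# ex:Book vocabulary and the shape are invented for illustration.
from rdflib import Graph
from pyshacl import validate

shapes = Graph().parse(format="turtle", data="""
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

ex:BookShape a sh:NodeShape ;
    sh:targetClass ex:Book ;
    sh:property [
        sh:path ex:title ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;    # required...
        sh:maxCount 1 ;    # ...and not repeatable
    ] .
""")

data = Graph().parse(format="turtle", data="""
@prefix ex: <http://example.org/> .
ex:moby a ex:Book .    # no ex:title
""")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)   # False: a plain RDF processor happily accepts this
print(report)     # graph, but the shape closes the world and reports
                  # the missing title as a violation
```

A plain RDF processor would accept the data graph without complaint; it is only the shape that declares the missing title to be an error.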
My point in all this is to note that we now have a way to validate RDF instance data but no standard way(s) to define our metadata schema, with constraints, that we can use to produce that data. It's kind of a "tail wagging the dog" situation. There have been musings that the validation languages could also be used for metadata definition, but we don't have a proof of concept and I'm a bit skeptical. The reason I'm skeptical is that there's a certain human-facing element in data design and creation that doesn't need to be there in the validation phase. While there is no reason why the validation languages cannot also contain or link to term definitions, cataloging rules, etc., these would be add-ons. The validation languages also do most of their work at the detailed data level, while some guidance for humans happens at the macro definition of a data model - What is this data for? Who is the audience? What should the data creator know or research before beginning? What are the reference texts that one should have access to? While admittedly the RDA Toolkit used in library data creation is an extreme form of the genre, you can see how much more there is beyond defining specific data elements and their valid values. Using a metadata schema in concert with RDF validation - yes! That's a winning combination, but I think we need both.
Note that there are also efforts to use the validation languages to analyze existing graphs (PDF). These could be a quick way to get an overview of data for which you have no description, but the limitations of this technique are easy to spot. They have basically the same problem that AI training datasets do: you only learn what is in that dataset, not the full range of possible graphs and values that can be produced. If your data is very regular then this analysis can be quite helpful; if your data has a lot of variation (as, for example, bibliographic data does) then the analysis of a single file of data may not be terribly helpful. At the same time, exercising the validation languages in this way is one way to discover how we can use algorithms to "look at" RDF data.
Another thing to note is that there's also quite a bit of "validation" that the validation languages do not handle, such as the reconciliation work that is often done in OpenRefine. The validation languages take an atomistic view of the data, not an overall one. I don't see a way to ask the question "Is this entry compatible with all of the other entries in this file?" That the validation languages don't cover this is not a fault, but it must be noted that there is other validation that may need to be done.
WOL, meet WVL
We need a data modeling language that is suitable to RDF data, but that provides actual constraints, not just inferences. It also needs to allow one to choose a closed world rule. The RDF suite of standards has provided the Web Ontology Language, which should be WOL but has been given the almost-acronym name of OWL. OWL does define "constraints", but they aren't constraints in the way we need for data creation. OWL constrains the axioms of inference. That means that it gives you rules to use when operating over a graph of data, and it still works in the open world. The use of the term "ontology" also implies that this is a language for the creation of new terms in a single namespace. That isn't required, but that is becoming a practice.
What we need is a web vocabulary language. WVL. But using the liberty that went from WOL to OWL, we can go from WVL to VWL, and that can be nicely pronounced as VOWEL. VOWEL (I'm going to write it like that because it isn't familiar to readers yet) can supply the constrained world that we need for data creation. It is not necessarily an RDF-based language, but it will use HTTP identifiers for things. It could function as linked data but it also can be entirely in a closed world. Here's what it needs to do:
- describe the things of the metadata
- describe the statements about those things and the values that are valid for those statements
- give cardinality rules for things and statements
- constrain values by type
- give a wide range of possibilities for defining values, such as lists, lists of namespaces, ranges of computable values, classes, etc.
- for each thing and statement have the ability to carry definitions and rules for input and decision-making about the value
- can be serialized in any language that can handle key/value pairs or triples
- can (hopefully easily) be translatable to a validation language or program
Obviously there may be more. This is not fully-formed yet, just the beginning. I have defined some of it in a github repo. (Ignore the name of the repo - that came from an earlier but related project.) That site also has some other thoughts, such as design patterns, a requirements document, and some comparison between existing proposals, such as the Dublin Core community's Description Set Profile, BIBFRAME, and soon Stanford's profile generator, Sinopia.
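Just to give a feel for the idea, here is a purely hypothetical sketch of what a VOWEL description might look like serialized as key/value pairs (a Python dict here). None of these property names come from any existing specification; they simply exercise the requirements listed above.

```python
# A purely hypothetical sketch of a VOWEL description serialized as
# key/value pairs.  Every property name here is invented; none of
# this comes from an existing specification.
import json

book_profile = {
    "thing": "Book",
    "definition": "A published monograph described for a catalog.",
    "guidance": "Consult the title page before transcribing.",   # human-facing
    "statements": {
        "title": {
            "definition": "Title proper, transcribed from the resource.",
            "min": 1, "max": 1,            # cardinality rules
            "valueType": "string",         # constrain values by type
        },
        "subject": {
            "definition": "Topic of the book.",
            "min": 0, "max": None,         # optional and repeatable
            "valueType": "IRI",
            # values constrained to a list of namespaces:
            "valuesFrom": ["http://id.loc.gov/authorities/subjects"],
        },
    },
}

print(json.dumps(book_profile, indent=2))
```

A description like this could, in principle, be mechanically translated into SHACL or ShEx shapes for the validation phase, while keeping the definitions and guidance that validation doesn't need.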
One of the ironies of this project is that VOWEL needs to be expressed as a VOWEL. Presumably one could develop an all-new ontology for this, but the fact is that most of what is needed exists already. So this gets meta right off the bat, which makes it a bit harder to think about but easier to produce.
There will be a group starting up in the Dublin Core space to continue development of this idea. I will announce that widely when it happens. I think we have some real possibilities here, to make VOWEL a reality. One of my goals will be to follow the general principles of the original Dublin Core metadata, which is that simple wins out over complex, and it's easier to complex-ify simple than to simplify complex.
In the past year at Lucidworks we’ve doubled our team, moved into a new (and bigger) HQ office in San Francisco, and begun our expansion into new emerging markets. Today I’m honored to announce Robert Lau as one of our new key hires who will be leading the way as Chief Operating Officer in APAC and Global Emerging Markets.
Lau has over two decades of experience in engineering, marketing, sales, and operations in the software industry, most recently leading APAC operations at Splunk and Elastic. His experience building out successful international operations from scratch makes us confident that he’ll lead the company towards the same success we’ve had in North America, and we’ll be looking to him for guidance as we continue to grow.
“The Asia-Pacific region is experiencing a shift from web 2.0 to 3.0 with the proliferation of 5G networks led by regional operators creating a fresh need for intelligent applications and enterprise solutions,” says Lau. “A majority of consumers are now choosing to shop online for their daily wants and needs, and they expect a highly personalized user experience. Employees are also beginning to expect that high level of ease and personalization when navigating data and information at work. Lucidworks’ Fusion will provide great value to businesses that are capitalizing on this digital market. I’m honored to be a part of the Lucidworks executive team as we continue to share our commitment to innovation with our future partners and customers.”
We see the Asia-Pacific region as incredibly important, with a growing need for digital commerce and digital workplace solutions to help organizations better understand and serve their customers and employees. We’ve established teams in Australia, Hong Kong, India, Thailand, Japan, Korea and Singapore. The next step? Building out a robust partner ecosystem to best serve new customers in these markets.
2019 is a big year for the company: we have a lot to do, a lot to learn, and a lot of room to grow. Welcome to the team, Robert! Excited to get to work.
Lucene/Solr 8 is about to be released. Among a lot of other things it brings LUCENE-8585, written by yours truly with a heap of help from Adrien Grand. LUCENE-8585 introduces jump-tables for DocValues. It is all about performance and brings speed-ups ranging from worse than baseline to 1000x, depending heavily on index and access pattern.
This is a follow-up post to Faster DocValues in Lucene/Solr 7+. The previous post contains an in-depth technical explanation of the DocValues mechanisms, while this post focuses on the final implementation.
Whenever the content of a field is to be used for grouping, faceting, sorting, stats or streaming in Solr (or Elasticsearch or Lucene, where applicable), it is advisable to store it using DocValues. It is also used for document retrieval, depending on setup.
DocValues in Lucene 7: Linked lists
Lucene 7 shifted the API for DocValues from random access to sequential. This meant a smaller storage footprint and cleaner code, but it also caused worst-case single-value lookup to scale linearly with document count: getting the value of a DocValued field for the last document in a segment required visiting all the preceding value blocks.
The linear access time was not a problem for small indexes or for requests covering values for a lot of documents, where most blocks need to be visited anyway. Thus the downside of the change was largely unnoticeable, or at least unnoticed. For some setups with larger indexes it was very noticeable, and for some of them it was also noticed. For our netarchive search setup, where each segment holds 300M documents, there was a severe performance regression: 5-10x for common interactive use.
Text book optimization: Jump-tables
The Lucene 7 DocValues structure behaves as a linked list of data-nodes, with the specializations that it is built sequentially and that it is never updated after the build has finished. This makes it possible to collect the node offsets in the underlying index data during the build and to store an array of these offsets along with the index data.
With the node offsets quickly accessible, worst-case access time for a DocValue entry becomes independent of document count. Of course, there is a lot more to this: See the previously mentioned Faster DocValues in Lucene/Solr 7+ for details.
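To illustrate the difference, here is a toy sketch in Python rather than Lucene's Java; everything is simplified beyond recognition (real blocks hold thousands of values and live on disk), but the contrast in access patterns is the point.

```python
# Toy illustration of sequential vs. jump-table DocValues access.
# Real Lucene blocks hold thousands of values and live on disk; these
# lists are simplified stand-ins.
BLOCK_SIZE = 4
blocks = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]

def lookup_sequential(doc_id):
    """Lucene 7 style: walk every block up to the target, O(n)."""
    target = doc_id // BLOCK_SIZE
    visited = 0
    for block_idx, block in enumerate(blocks):
        visited += 1
        if block_idx == target:
            return block[doc_id % BLOCK_SIZE], visited

# The jump-table is just an array of block offsets, collected while
# the structure is built and stored alongside the index data.
jump_table = list(range(len(blocks)))   # stand-in for byte offsets

def lookup_jump(doc_id):
    """Lucene 8 style: one table read plus one block read, O(1)."""
    block = blocks[jump_table[doc_id // BLOCK_SIZE]]
    return block[doc_id % BLOCK_SIZE], 2

print(lookup_sequential(10))  # (32, 3): every block visited
print(lookup_jump(10))        # (32, 2): independent of document count
```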
One interesting detail of jump-tables is that they can be built both as a cache on first access (see LUCENE-8374) and baked into the index data (see LUCENE-8585). I would have much preferred having both options available in Lucene, to get an instant speed-up with existing indexes and a technically superior implementation for future indexes. Alas, only LUCENE-8585 was deemed acceptable.
Best case test case
Our netarchive search contains 89 Solr collections, each holding 300M documents in 900GB of index data. Each collection is 1 shard, merged down to 1 segment and never updated. Most fields are DocValues and they are heavily used for faceting, grouping, statistics, streaming exports and document retrieval. The impact of LUCENE-8585 should be significant.
In netarchive search, all collections are searched together using an alias. For the tests below only a single collection was used for practical reasons. There are three contenders:
Unmodified Solr 7 collection, using Solr 8.0.0 RC1. Codename Solr 7. In this setup, jump-tables are not active as Solr 8.0.0 RC1, which includes LUCENE-8585, only supports index-time jump-tables. This is the same as Solr 7 behaviour.
Solr 7 collection upgraded to Solr 8, using Solr 8.0.0 RC1. Codename Solr 8r1. In this setup, jump-tables are active and baked into the index data. This is the expected future behaviour when Solr 8 is released.
Solr 7 collection, using Lucene/Solr at git commit point 05d728f57a28b9ab83208eda9e98c3b6a51830fc. Codename Solr 7 L8374. During LUCENE-8374 (search-time jump-tables) development, the implementation was committed to master. This was later reverted, but the checkout allows us to see what the performance would have been if this path had been chosen.
Test hardware is a puny 4-core i5 desktop with 16GB of RAM, a 6TB 7200RPM drive and a 1TB SSD. About 9GB of RAM free for disk cache. Due to time constraints only the streaming export test has been done on the spinning drive, the rest is SSD only.
Premise: Solr’s export function is used by us to extract selected fields from the collection, typically to deliver a CSV-file with URLs, MIME types, file sizes etc for a corpus defined by a given filter. It requires DocValues to work.
DV-Problem: The current implementation of streaming export in Solr does not retrieve the field values in document order, making the access pattern extremely random. This is absolute worst case for sequential DocValues. Note that SOLR-13013 will hopefully remedy this at some point.
The test performs a streaming export of 4 fields for 52,653 documents in the 300M index. The same export is done 4 times, to measure the impact of caching.
Observation: Both Solr 8r1 and Solr 7 L8374 vastly outperform Solr 7. On a spinning drive there is a multi-minute penalty for run 1, after which the cache has been warmed. This is a well known phenomenon.
Premise: Faceting is used everywhere and it is a hard recommendation to use DocValues for the requested fields.
DV-Problem: Filling the counters used when faceting is done in document order, which works well with sequential access as long as the jumps aren't too long: small result sets are penalized relatively more heavily than large result sets.
Reading the graphs: All charts in this blog post follows the same recipe:
- X-axis is hit count (aka result set size), y-axis is response time (lower is better)
- Hit counts are bucketed by order of magnitude and for each magnitude, boxes are shown for the three contenders: blue boxes are Solr 7, pink are Solr 8r1 and green are Solr 7 L8374
- The bottom of a box is the 25 percentile, the top is the 75 percentile. The black line in the middle is the median. Minimum response time for the bucket is the bottom spike, while the top spike is the 95 percentile
- Maximum response times are not shown as they tend to jitter severely due to garbage collection
Observation: Modest gains from jump-tables with both Solr 8rc1 and Solr 7 L8374. Surprisingly the gains scale with hit count, which should be investigated further.
Premise: Grouping is used in netarchive search to collapse multiple harvests of the same URL. As with faceting, using DocValues for grouping fields is highly recommended.
DV-Problem: As with faceting, group values are retrieved in document order and follow the same performance/scale logic.
Observations: Modest gains from jump-tables, similar to faceting.
Premise: Sorting is a basic functionality.
DV-Problem: As with faceting and grouping, the values used for sorting are retrieved in document order and follow the same performance/scale logic.
This test performs simple term-based searches with sorting on the high-cardinality field content_length.
Observations: Modest gains from jump-tables. Contrary to faceting and grouping, performance for high hit counts is the same for all 3 setups, which fits the theoretical model. Positively surprising is that the theoretical overhead of the jump-tables does not show for higher hit counts.
Premise: Content intended for later retrieval can either be stored explicitly or as docValues. Doing both means extra storage, but also means that everything is retrieved from the same stored (and compressed) blocks, minimizing random access to the data. For the netarchive search at the Royal Danish Library we don’t double-store field data and nearly all of the 70 retrievable fields are docValues.
DV-Problem: Getting a search result is a multi-step process. Early on, the top-X documents matching the query are calculated and their IDs are collected. After that, the IDs are used for retrieving document representations. If this is done from DocValues, it means random access linear in the number of documents requested.
Observations: Solid performance improvement with jump-tables.
Premise: The different functionalities are usually requested in combination. At netarchive search a typical request uses grouping, faceting, cardinality counting and top-20 document retrieval.
DV-Problem: Combining functionality often means that separate parts of the index data are accessed. This can cause cache thrashing if there is not enough free memory for disk cache. With sequential DocValues, all intermediate blocks need to be visited, increasing the need for disk cache. Jump-tables lower the number of storage requests and are thus less reliant on cache size.
Observations: Solid performance improvement with jump-tables. As with the previous analysis of search-time jump-tables, combining multiple DocValues-using functionalities has a cocktail effect where the combined impact is larger than the sum of the parts. This might be due to disk cache thrashing.
Overall observations & conclusions
The effect of jump-tables, both with Solr 8.0.0 RC1 and LUCENE-8374, is fairly limited, except for export and document retrieval, where the gains are solid.
The two different implementations of jump-tables perform very similarly. Do remember that these tests do not involve index updates at all: as LUCENE-8374 builds its tables at search time, it has a startup penalty whenever indexes are updated.
For the large-segment index tested above, the positive impact of jump-tables is clear. Furthermore, there is no significant slowdown for the higher hit counts with faceting/grouping/statistics, where the jump-tables have no positive impact.
Before running these tests, my suspicion was that the search-time jump-tables in LUCENE-8374 would perform better than the baked-in version. That turned out not to be the case. As such, my idea of combining the approaches by creating in-memory copies of some of the on-storage jump-tables has been shelved.
Performance testing is never complete, it just stops. Some interesting things to explore could be…
Sir Tim Berners-Lee’s invention of the world wide web has transformed modern life, but more work must be done to ensure it continues to be a force for good, writes Catherine Stihler.
At the giant research laboratory in a suburb of Geneva, the innovative ideas produced by the scientists were stored on multiple, incompatible, computers.
It was the year 1989, and one British worker at CERN decided to write a short document called “Information Management: A Proposal”.
Tim Berners-Lee wrote: “Many of the discussions … end with the question – ‘Yes, but how will we ever keep track of such a large project?’ This proposal provides an answer to such questions.”
In simpler terms, his theory addressed this idea: “Suppose all the information stored on computers everywhere were linked.”
This vision of universal connectivity was produced 30 years ago today, and by 1991 it became the World Wide Web.
Within just a few years, the web became something that wasn’t restricted to computer scientists alone, reaching computers in libraries, universities and eventually people’s homes, and fundamentally changing our lives.
Over three decades, there has been a long list of extraordinary achievements, culminating in a world where we can now access the web from phones in our pockets, the TVs in our living rooms and the watches on our wrists.
To mark the 30th anniversary, web founder Sir Tim Berners-Lee is taking a 30-hour trip, starting at CERN in Switzerland, travelling via London and finishing in Lagos. Throughout, he will be participating in a #web30 Twitter feed that will highlight significant moments in the web’s history.
Former Vice-President Al Gore will recall the passing of the High Performance Computing Act in 1991, also called the Gore Bill, which promoted cooperation between government, academia, and industry.
It helped fund work which led to the creation of the Mosaic web browser – a key moment as browsers are how we access the World Wide Web. In 1995, Microsoft launched Internet Explorer – a platform still familiar to millions of people around the world.
There will also be a fun side to the celebrations, such as the moment the world was first introduced to ‘grumpy cat’.
For me, as chief executive of Open Knowledge International, there are several key moments that I believe deserve to be remembered.
Our role is to help governments, universities, and civil society organisations reach their full potential by providing them with skills and tools to publish, use, and understand data.
We deliver technology solutions, enhance data literacy, provide cutting-edge research and mobilise communities to provide value for a wide range of international clients.
In 2005 we created the Open Definition, the gold standard for open data which remains in place to this day.
Two years later, our founder Rufus Pollock launched the Comprehensive Knowledge Archive Network, or CKAN as it is known.
It’s a registry of open knowledge packages and projects — be that a set of Shakespeare’s works, the voting records of MPs, or 30 years of US patents.
It is now used across the world, including the data.gov.uk site where you can find data published by central government, local authorities and public bodies in the UK to help designers build products and services.
Another key moment which deserves to be celebrated came in July 2009 when a set of principles to promote open science were written down in a pub called the Panton Arms in Cambridge – the Panton Principles. Among those present was Rufus Pollock.
When open data becomes useful, usable and used – when it is accessible and meaningful and can help someone solve a problem – that’s when it becomes open knowledge.
It can make powerful institutions more accountable, while vital research can help us tackle challenges such as poverty, disease and climate change.
All this would not have been possible without the invention of the World Wide Web.
Today, however, we are at a crossroads. While the web has been a force for good, it has also allowed for the spread of fake facts and disinformation. Political earthquakes around the globe have led to the rise of populism, and people are uncomfortable about the amount of power held by some giant tech companies like Facebook and Google.
The challenge for the next 30 years is to build a digital economy for the many, based on the principles of fairness and freedom.
The web provides the opportunity to empower communities, and we must seize that opportunity and ensure that digital advances are used for the public good.
So attempts to build a more closed society must be addressed. One example of that will come later this month when the European Parliament votes on a controversial copyright crackdown that threatens the future of the internet.
We also want candidates to support improved transparency measures at social media companies like Facebook to prevent the spread of disinformation and fake news; champion ‘responsible data’ to ensure that data is used ethically and legally; back efforts to force governments and organisations to use established and recognised open licences when releasing data or content; and push for greater openness in their country, including committing to domestic transparency legislation.
Sir Tim Berners-Lee’s invention has transformed our world, but the task is to ensure that it continues to transform our world for the better – and that falls to all of us.
Let’s make the next 30 years of the digital era one of fairness, freedom and openness for all.
Having data for budgets and spending can allow us to track public money flows in our communities. It can give us insights into how governments plan and focus on programmes, public works, and services. So the Global Initiative for Fiscal Transparency (GIFT), along with Open Knowledge International (OKI), have been working on new tools to make this data more useful and easier to understand.
Two of these are the [Open] Fiscal Data Package (OFDP) specification and the OpenSpending platform.
The OFDP is a data specification that allows publishers to create a literal package of data. This package includes fiscal data mapped onto either standardised or bespoke functional, economic and administrative classifications. The different stages of the budget can also be mapped, along with any other fields that are relevant to the publisher. This seeks to reduce the barriers to accessing and interpreting fiscal open data.
One of the main benefits of the OFDP is that data publishers can adopt it no matter how they generate their databases. The flexibility of this specification allows publishers to improve the quality incrementally. There is no need to develop new software. Having this structured data allows us to build tools and services over it for visualization, analysis or comparison.
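As a rough sketch of what such a package's descriptor involves, here is a minimal example built as a Python dict. The property names follow my reading of the specification and may not match the normative version; the data itself is invented.

```python
# A rough sketch of an (Open) Fiscal Data Package descriptor, built
# here as a Python dict.  Property names follow my reading of the
# specification and may differ from the normative version; the data
# is invented.
import json

descriptor = {
    "name": "example-municipal-budget-2019",
    "resources": [{
        "name": "budget",
        "path": "budget-2019.csv",
        "schema": {"fields": [
            {"name": "amount",   "type": "number"},
            {"name": "ministry", "type": "string"},
            {"name": "phase",    "type": "string"},  # approved, executed...
        ]},
    }],
    # The model maps raw columns onto fiscal concepts, whether the
    # classifications are standardised or bespoke:
    "model": {
        "measures": {"amount": {"source": "amount", "currency": "MXN"}},
        "dimensions": {
            "administrative": {"attributes": {"ministry": {"source": "ministry"}}},
            "phase": {"attributes": {"phase": {"source": "phase"}}},
        },
    },
}

# Tools read this descriptor (datapackage.json) to aggregate and
# visualize the CSV without custom code.
print(json.dumps(descriptor, indent=2))
```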
The second tool is actually a set of tools called OpenSpending.
This is an open-source and a community-driven project. It reflects the valuable contributions of an active, passionate and committed community.
OpenSpending enables analysis, dissemination, and debate for more efficient budgets and public spending. It allows anyone to create, use, and visualize fiscal data using the Open Fiscal Data Package in a centralized place with little effort.
As part of this collaboration, OKI and GIFT have been working with different government partners to publish using OFDP. But we want to see the adoption of the Open Fiscal Data Package grow even more. This is why we have set up the Fiscal Data Helpdesk to help you in the publication process!
How to engage with the Fiscal Data Helpdesk
Maybe you are already publishing fiscal data through an open data portal? Or maybe you have a platform and want to make it more useful for a larger number of users? Perhaps you have heard about standardization but it sounds complex and you think it might not be for your office? The Helpdesk is around to answer all your questions and support you through the process of getting data up and running in OpenSpending.
There are a few good examples of the kind of thing we want to help you do. We’ve worked with the Mexican federal government to publish their data from 2008 to 2019 using the OFDP and OpenSpending to make it easier to access. You can navigate their data here.
We’ve also worked to get datasets from many countries in the World Bank BOOST initiative onto OpenSpending. Currently, there is data from countries like Burkina Faso, Guatemala, Paraguay, and Uruguay.
In the coming weeks, we will publish some resources and a series of blog posts to give you more information about publishing your data in OFDP and using OpenSpending.
From the broken links form, I began to cull some data on the problem. I can tell you, for instance, which destination databases experience the most problems or what the character of the most common problems is. The issue is the sample bias—are the problems that are reported really the most common? Or are they just the ones that our most diligent researchers (mostly our librarians, graduate students, and faculty) are likely to report? I long for quantifiable evidence of the issue without this bias.
How I classify the broken links that have been reported via our form. N = 57
Select Searches & Search Results
So how would one go about objectively studying broken links in a discovery layer? The first issue to solve is what searches and search results to review. Luckily, we have data on this—we can view in our analytics what the most popular searches are. But a problem becomes apparent when one goes to review those search terms:
Of course, the most commonly occurring searches tend to be single words. These searches all trigger “best bet” or database suggestions that send users directly to other resources. If their result lists do contain broken links, those links are unlikely to ever be visited, making them a poor choice for our study. If I go a little further into the set of most common searches, I see single-word subject searches for “drawing” followed by some proper nouns (“suzanne lacy”, “chicago manual of style”). These are better since it’s more likely users actually select items from their results but still aren’t a great representation of all the types of searches that occur.
Why are these types of single-word searches not the best test cases? Because search phrases necessarily have a long-tail distribution: the most popular searches aren’t that popular in the context of the total quantity of searches performed 2. There are many distinct search queries that were only ever executed once. Our most popular search of “artstor”? It was executed 122 times over the past two years. Yet we’ve had somewhere near 25,000 searches in the past six months alone. This supposedly popular phrase has a negligible share of that total. Meanwhile, just because a search for “How to Hack it as a Working Parent. Jaclyn Bedoya, Margaret Heller, Christina Salazar, and May Yan. Code4Lib (2015) iss. 28” has only been run once doesn’t mean it doesn’t represent a type of search—exact citation search—that is fairly common and worth examining, since broken links during known-item searches are more likely to be frustrating.
Even our 500 most popular searches evince a long tail distribution.
So let’s say we resolve the problem of which searches to choose by creating a taxonomy of search types, from single-word subjects to copy-pasted citations. 3 We can select a few real world samples of each type to use in our study. Yet we still haven’t decided which search results we’re going to examine! Luckily, this proves much easier to resolve. People don’t look very far down in the search results 4, rarely scrolling past the first “page” listed (Summon has an infinite scroll so there technically are no pages, but you get the idea). Only items within the first ten results are likely to be selected.
Once we have our searches and know that we want to examine only the first ten or so results, my next thought is that it might be worth filtering out results that are unlikely to have problems. But does skipping the records from our catalog, institutional repository, LibGuides, etc. make other problems abnormally more apparent? After all, these sorts of results are likely to work since we’re providing direct links. Also, our users do not heavily employ facets—they would be unlikely to filter out results from the library catalog. 5 In a way, by focusing a study on search results that are the most likely to fail and thus give us information about underlying linking issues, we’re diverging from the typical search experience. In the end, I think it’s worthwhile to stay true to more realistic search patterns and not apply, for instance, a “Full Text Online” filter which would exclude our library catalog.
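For what it’s worth, the mechanical part of such a study could look something like the sketch below: run one sample query per search type, take the top ten results, and record which links appear broken. The API URL and JSON shape are hypothetical stand-ins for whatever your discovery layer actually exposes.

```python
# Sketch of the mechanical part of the study.  The search endpoint
# and response shape are hypothetical; substitute your discovery
# layer's real API.
import requests

SEARCH_TYPES = {
    "single-word subject": "drawing",
    "proper noun": "suzanne lacy",
    "exact citation": "How to Hack it as a Working Parent. Code4Lib (2015) iss. 28",
}

def top_ten_links(query):
    """Return links for the first ten results of a search."""
    resp = requests.get("https://discovery.example.edu/api/search",
                        params={"q": query, "limit": 10}, timeout=30)
    return [doc["link"] for doc in resp.json()["docs"]]

for search_type, query in SEARCH_TYPES.items():
    for url in top_ten_links(query):
        try:
            status = requests.get(url, timeout=30).status_code
        except requests.RequestException:
            status = None
        if status != 200:
            print(f"[{search_type}] possibly broken ({status}): {url}")
```

An HTTP status is only a crude first pass, of course: plenty of “broken” links resolve cheerfully to a 200 page that lacks the full text, which is exactly the kind of failure worth enumerating.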
Next Time on Tech Connect—oh how many ways can things go wrong?!? I’ll start investigating broken links and attempt to enumerate their differing natures.
This script was largely copied from Robert Hoyt of Fairfield University, so all credit due to him. ↩
For instance, see: Beitzel, S. M., Jensen, E. C., Chowdhury, A., Frieder, O., & Grossman, D. (2007). Temporal analysis of a very large topically categorized web query log. Journal of the American Society for Information Science and Technology, 58(2), 166–178. “… it is clear that the vast majority of queries in an hour appear only one to five times and that these rare queries consistently account for large portions of the total query volume” ↩
Ignore, for the moment, that this taxonomy’s constitution is an entire field of study to itself. ↩
Pan, B., Hembrooke, H., Joachims, T., Lorigo, L., Gay, G., & Granka, L. (2007). In google we trust: Users’ decisions on rank, position, and relevance. Journal of Computer-Mediated Communication, 12(3), 801–823. ↩
In fact, the most common facet used in our discovery layer is “library catalog” showing that users often want only bibliographic records; the precise opposite of a search aimed at only retrieving article database results. ↩
Buried in the recent debates (New York Times, Chicago Tribune, The Public Historian) about the nature, objectives, and location of the Obama Presidential Center is the inexorable move toward a world in which virtually all of the documentation about our lives is digital.
To make this decades-long shift—now almost complete—clear, I made the following infographic comparing three representative presidential libraries, each a generation apart: LBJ’s, Bill Clinton’s, and Barack Obama’s. Each square represents the relative overall size of these presidential archives—roughly 46 million pages for LBJ, 100 million for Clinton, and 360 million for Obama—as well as the basic categories of archival material: paper documents, photographs and audiovisual media, and, starting with Clinton, email.
The LBJ Presidential Library has 45 million pages of paper documents and a million photographs, recordings, and other media. The Clinton Presidential Library contains 78 million pages of documents, 20 million emails, 2 million photographs, and 12,500 videotapes. (Note that contrary to all of the recent coverage of Obama as “the first digital president,” given the Clinton administration’s rapid adoption of email in the 1990s, Clinton really should hold that title, as I’ve discussed elsewhere.)
We are still in the process of assessing all that will go into the Obama Presidential Library (other libraries have added considerable new caches of documents over time), but the rough initial count from the U.S. National Archives and Records Administration is that there are about 300 million emails from Obama’s eight years in the White House, and about 30 million pages of paper documents. The chart above would be even more email-centric for Obama’s library if I used NARA’s calculation of a few paper pages per email, which would equal over a billion pages in printed form. In other words, using that more rigorous comparison, at best only 3% of the Obama record is print rather than digital.
More vaguely estimated above are the millions of “pages” associated with the many other digital forms the Obama administration used, including websites, apps, and social media (you can already download the entirety of the latter as .zip files here). Most of the photos (many of which were uploaded to Flickr) and videos were of course also born digital. (Update, 3/11/19: The Obama Foundation came out with a new fact sheet that says that “an estimated 95 percent of the Obama Presidential Records were created digitally and have no paper equivalents.” It also says that there are roughly 1.5 billion pages in the collection, including everything I’ve detailed here.)
It’s unfortunate that it’s still relatively expensive and time-consuming to digitize analog materials. Nearly two decades on, the Clinton Presidential Library has only digitized about 1% of their paper holdings (about 700,000 pages). The Reagan Presidential Library charges $.80 to digitize one page of his archives. The Obama Presidential Center’s commitment to funding the complete digitization of those 30 million paper pages, in what seems like a more rapid fashion and with open access to the public, seems rather laudable in this context.
Ultimately, I suppose it’s best to say that Obama was “the first almost fully digital president,” and with the digitization of the remaining paper record, will become “the first fully machine-readable and -indexed president.” (Part of the debate in academic and library circles about this shift in the Obama Presidential Center/Library has to do with the role of archivists and historians to create good metadata for, and more thorough searches through, administration documents, but with a billion+ pages, I don’t see how this can be done without serious computational means.)
Meanwhile, all of us have more quietly followed the same path, with only a very small percentage of our overall record now existing in physical formats rather than bits. How we will preserve this heterogeneous and perhaps ephemeral digital record when we don’t have our own presidential libraries and the resources of NARA is a different and more worrisome story.
I've found it increasingly difficult to make time to blog, and it's not so much not having the time — I'm pretty privileged in that regard — but finding the motivation. Thinking about what used to motivate me, one of the big things was writing things that other people wanted to read.
Rather than try to guess, I thought I'd ask!
Those who know what I'm about, what would you read about, if it was written by me?
I'm trying to break through the blog-writer's block and would love to know what other people would like to see my ill-considered opinions on.
Sometimes I forget that my background makes me well-qualified to take some of these technical aspects of the job and break them down for different audiences. There might be a whole series in this...
Carrying on our conversation last week I'd love to hear more about how you've found moving from an HE lib to a national library and how you see the BL's role in RDM. Appreciate this might be a bit niche/me looking for more interesting things to cite :)
This is so frustrating as an end user, but at the same time I get that endpoint security is difficult and there are massive risks associated with letting end users have admin rights. This is particularly important at the BL: as custodians of a nation's cultural heritage, the risk for us is bigger than for many, and for this reason we are now Cyber Essentials Plus certified. At some point I'd like to do some research and have a conversation with someone who knows a lot more about InfoSec to work out the proper approach to this, maybe involving VMs and a demilitarized zone on the network.
I'm always looking for more inspiration, so please leave a comment if you've got anything you'd like to read my thoughts on. If you're not familiar with my writing, please take a minute or two to explore the blog; the tags page is probably a good place to get an overview.
- Academic publishes mathematical theory for conformance among hipsters: https://arxiv.org/pdf/1410.8001.pdf
- MIT Tech Review covers it, with a fancy photo illustration using a stock photo of a hipster-looking male: https://www.technologyreview.com/s/613034/the-hipster-effect-why-anti-conformists-always-end-up-looking-the-same/
- A hipster-looking male contacts MIT Tech Review to loudly complain about their using a picture of him without asking: https://twitter.com/glichfield/status/1103040764794363904
- It turns out the hipster-looking male in the photo isn’t the same as the one who complained: https://twitter.com/glichfield/status/1103044630134882305
Apply for a mini-grant to build an open source tool for reproducible research using Frictionless Data tooling, specs, and code base
Today, Open Knowledge International is launching the Frictionless Data Tool Fund, a mini-grant scheme offering grants of $5,000 to support individuals or organisations in developing an open source tool for reproducible science or research built using the Frictionless Data specifications and software. We welcome submissions of interest until the 30th of April 2019.
The Tool Fund is part of the Frictionless Data for Reproducible Research project at Open Knowledge International. This project, funded by the Sloan Foundation, applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts. At its core, Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The core specification, the Data Package, is a simple and practical “container” for data and metadata.
With this announcement we are looking for individuals or organizations of scientists, researchers, developers, or data wranglers to build upon our existing open source tools and code base to create novel tooling for reproducible research. The fund will be accepting submissions from now until the end of April 2019 for work which will be completed by the end of the year.
This builds on the success of the first tool fund in 2017 which funded the creation of libraries for Frictionless Data specifications in a range of additional programming languages.
For this year’s Tool Fund, we would like the community to work on tools that can make a difference to researchers and scientists.
Applications can be submitted by filling out this form by 30 April 2019 at the latest.
The Frictionless Data team will notify all applicants whether they have been successful or not by the end of May at the very latest. Successful candidates will then be invited for interviews before the final decision is given. We will base our choice on evidence of technical capabilities and will also favour applicants who demonstrate an interest in practical use of the Frictionless Data specifications. Preference will also be given to applicants who show an interest in working with and maintaining these tools going forward.
A new report out this week by the carbon emission think-tank The Shift Project highlights that not much has changed since its earlier report. ICT still contributes about 4 per cent of global greenhouse gas emissions, which is still twice that of civil aviation. What is worse, its contribution is growing more quickly than that of civil aviation.
At the current pace of digital GHG emissions, the total over the same period of the additional digital emissions in comparison to 2018 will be about 2.1 GtCO2eq, which would cancel about 20% of the necessary effort made to reduce them.
So the growth in IT energy consumption is a big problem. Where do the kWh go? The pie chart shows the breakdown. The slices marked P are for the production of various types of IT hardware; the others are for the use of the hardware once it has been sold.
45% of the total energy use by IT goes into manufacturing new hardware, 55% into running the installed base. Presumably some energy is used in recycling, or more likely landfilling, the obsolete equipment, but that isn't apparent from the chart.
The Moore's-Law-driven 5-year life of IT equipment is thus a really big problem. And this is another area where cryptocurrency mining makes a significant contribution. Mining hardware has a much shorter service life than other types of IT hardware, typically 12 months. So if the split between production and use costs is the same 45/55% as for regular IT equipment, and the top 5 cryptocurrencies' usage consumes as much electricity as the Netherlands (about 109 billion kWh/yr), producing their mining equipment with a 5-year life would add more than another Belgium (about 90 billion kWh). Thus with a 1-year life it would add about 450 billion kWh, or a France. This is an over-estimate. Cryptocurrency mining hardware is run flat-out continuously for its service life, whereas other IT equipment has a duty cycle. So mining's usage proportion will be greater, and thus the effect of the 1-year life is somewhere between a Belgium and a France.
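Spelled out, the arithmetic looks like this; the 45/55 split and the 109 billion kWh/yr figure come from the discussion above, and everything else follows from them.

```python
# Back-of-the-envelope check of the arithmetic above.
usage = 109                          # billion kWh/yr, a Netherlands
prod_share, use_share = 0.45, 0.55   # IT-wide production/use energy split

# If mining rigs lasted 5 years like other IT gear, the amortized
# production energy per year would be:
prod_5yr = usage * prod_share / use_share
print(f"5-year life: ~{prod_5yr:.0f} billion kWh/yr (about a Belgium)")

# A 1-year service life burns that production energy five times as often:
prod_1yr = 5 * prod_5yr
print(f"1-year life: ~{prod_1yr:.0f} billion kWh/yr (about a France)")
```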
It is easy for manufacturers to design for a much longer service life, but so far it hasn't been economic. For an example, see Dave Anderson's presentation on Seagate's analysis of extended hard drive service life. Perhaps once people figure out that Moore's Law is dead, and Kryder's Law is dying they will insist on keeping their equipment longer. IT manufacturers, like US automakers in the 60s and 70s, have become hooked on planned obsolescence, so they are likely to respond by adding fins and chrome to their hardware rather than making it last longer.
Overall, the improvement in GDP from each increased dollar of IT spending is dropping. The reasons are likely to be similar to those I discussed in Falling Research Productivity, because the underlying mechanisms aren't specific to research.
The 16% of the energy devoted to running networks is likely to increase, because traffic, primarily Internet video, is growing at 24% CAGR but technology isn't improving the energy efficiency of network gear that fast.
So don't think that by indulging in "Netflix and chill" rather than driving round to the pub you are helping solve the climate crisis. You need to curl up with a good Kindle book on a really old smartphone instead.
Emoji are showing up as evidence in court more frequently with each passing year. Between 2004 and 2019, there was an exponential rise in emoji and emoticon references in US court opinions, with over 30 percent of all cases appearing in 2018, according to Santa Clara University law professor Eric Goldman, who has been tracking all of the references to “emoji” and “emoticon” that show up in US court opinions.
In January I joined OCLC Research as a Practitioner Researcher in Residence. Over the next few months, I will lead the development of an applied research agenda that charts a path for library community engagement with data science and a range of computational methods. The agenda will be the product of a diverse set of community engagements.
The possibilities for libraries and their users in this space are many. Machine learning multiplies connections between research outputs, allowing for enhanced demonstration of impact; computer vision surfaces structured data from collections, allowing for expanded discoverability; new tools increase access to collections, allowing progress on global information equity; and a range of methods are used to analyze collections at scale, allowing for actionable insights that support sustainability and the realization of core values. To move forward as a community, key challenges need to be identified. Challenges must be matched with questions, questions must be matched with methods, and actions must be matched with contexts for collaboration. All of this needs to be grounded by carefully considered ethical commitments.
Applied research agenda development is guided by the following working group:
DuraSpace is pleased to announce that Heather Greer Klein has taken on temporary, part-time responsibilities as the DSpace 7 Product Manager. In this role, Heather will work closely with the DSpace Technical Lead, DSpace Governance, and DSpace 7 working groups to manage the Preview and Beta releases of DSpace 7.
Ms. Greer Klein will also continue as the Services Coordinator on the DuraSpace hosted services team, the role she has held since 2016. Heather manages DuraSpace hosted services (DuraCloud, DSpaceDirect, ArchivesDirect) including customer service, product pricing, new account setup, onboarding, and training.
Heather came to DuraSpace with six years of experience as the Member Services Coordinator at NC LIVE, a consortium of North Carolina institutions providing digital content and services to support 200 member libraries and their communities. In that role she developed community and project management skills by working closely with member libraries and other stakeholders to develop a responsive and innovative member services program.
DuraSpace extends a warm welcome to Heather in her new role as the DSpace 7 Product Manager.
Libraries are haunted houses. As our patrons move through scenes and illusions that took years of labor to build and maintain, we workers are hidden, erasing ourselves in the hopes of providing a seamless user experience, in the hopes that these patrons will help defend Libraries against claims of death or obsolescence. However, ‘death of libraries’ arguments that equate death with irrelevance are fundamentally mistaken. If we imagine that a collective fear has come true and libraries are dead, it stands to reason that library workers are ghosts. Ghosts have considerable power and ubiquity in the popular imagination, making death a site of creative possibility. Using the scholarly lens of haunting, I argue that we can experience time creatively, better positioning ourselves to resist the demands of neoliberalism by imagining and enacting positive futurities.
Thinkpieces on the death of libraries are abundant and have been for quite some time. In 2005, the MIT Technology Review identified Google Books’ mass digitization effort as the driving force that “could reduce today’s libraries to musty archives.” Despite some sensational language, the article is just saying that digitization will change the scope of library collections and services. A controversial Forbes article offered a less benign take, suggesting that Amazon should replace public libraries. The piece was taken down shortly after its publication in summer 2018, in part because the author was writing outside the scope of their expertise.
While the authors of death of libraries articles are usually not affiliated with libraries, library workers are quick to debunk and challenge death of libraries content. As a profession constantly asked to justify the existence of our institutions and quantify the value of our labor, defensive impulses are a normal response. In this article, however, I ask library workers to engage with a rather different stance: that being dead might not actually be a bad thing after all.
Mortician and educator Caitlin Doughty explains the sentiment well:
Do not be afraid to delight in death. Of course I do not mean you are happy when someone dies, or happy to see someone in pain or mourning. But the vast majority of your life isn’t spent in mourning. It’s spent living. And while you’re living, it will not hurt you to have a fun, positive relationship with Death. Death is fascinating. Chaotic and ordered at the same time. There are strange rituals and art to be explored. The never-ending cultural entertainment of what death does to people, to relationships, to society. I don’t just pretend to love death. I really do love death. I bet you would too if you got to know him (2011).
In her advocacy for the death positive movement, Doughty helps expose and unpack the extent to which fear — specifically the fear of death — informs the choices we make, including the way we care for (or delegate caring for) our dead. Her advice to break a general sense of fear into specific concerns that can be addressed is a good approach for tackling any “nebula of unknown fear” (2017). What are we really afraid of when we talk about the alleged death of libraries?
Claiming that libraries are dying as a matter of course overlooks the choices and structures that led to those circumstances in the first place. Library workers must assert not only the value of our labor, but the very existence of it. I suggest that part of the underlying concern is not being seen, or being seen only to be replaced or forgotten.1 However, ‘death of libraries’ arguments that equate death with irrelevance are fundamentally mistaken.
Death is as relevant as ever in 2019, occupying a prominent place in the popular imagination. The past decade has seen the proliferation of surrealist and nihilist memes that humorously embrace mortality and, more recently, a resurgence of affection for cryptozoology and the occult. Faced with a world that at best doesn’t make sense and at worst is violently oppressive, the desire to seek connection beyond ourselves and our circumstances is understandable. There is comfort to be found in aligning with creatures who thrive despite being misunderstood, dismissed as outsiders, or having their existence constantly called into question, and this is especially true for people who hold marginalized identities. For those who move through the world as outsiders, or who struggle to feel hope in a crushing capitalist ecosystem, it can be meaningful and positive to envision a world beyond the present, to think of future lives or afterlives.
What could it look like to approach death and haunting from a place of openness and creativity; to demystify death by exploring the mystical? What if instead of nothingness, we imagine an afterlife where anything is possible? Let’s embrace this moment and see where supernatural connections might take us.
If we imagine that a collective fear has come true and libraries are dead, it stands to reason that library workers are ghosts. Since ghosts have considerable power and ubiquity, this frees us to rethink our position in and beyond the neoliberal library and linear time. Ghosts demand attention when there is “something-to-be-done,” which means “we will have to learn to talk to and listen to ghosts, rather than banish them, as the precondition for establishing our scientific and humanistic knowledge” (Gordon 2008, 22). What insights might emerge from “ongoing conversation with ghosts, real or imagined, dead or very much alive,” whether we are haunted, haunting, or both? (Ballif 2013, 139).
The landscape of the academic library is shaped by and reproduces the conditions that persist in the academy and in society more broadly. Thinking about the ways in which capitalism necessitates that bodies and labor be rendered invisible reveals additional layers of haunting, of which we are simultaneously subjects and objects. As Avery Gordon reminds us, “It is essential to see the things and the people who are primarily unseen and banished to the periphery of our social graciousness. At a minimum, it is essential because they see you and address you” (Gordon 2008, 196). Gordon’s provocative use of the second person assuages fears of not being seen while demanding accountability from readers. The ghost sees you, and it addresses you. You are here, so how will you remedy the “something-to-be-done?”
In libraries, there is much to be done. Library and Information Science (LIS) scholarship has been invested in identifying and challenging stereotypes of living librarians in popular culture, but an exploration of death and libraries would be remiss not to include library ghosts. Perhaps our concerns with the death of libraries are exacerbated by the rather limiting extant representations of library ghosts and haunted libraries in popular culture and professional literature. Even trade publications like American Libraries and School Library Journal have profiled real libraries with haunted reputations. While there are certainly exceptions, many library ghosts seem to be women.
The Willard Library, a public library in Evansville, Indiana, has a reputation as one of the more famous haunted libraries in the United States. Their hallmark specter is “the Grey Lady,” an apparition of a woman first spotted in 1937 and last seen in 2010. The Willard has a website of live camera feeds dedicated to recording her presence, which links out to local ghost-hunting resources. Enthusiastic community members engage in ghost tours of the library each Halloween, hoping to encounter the Grey Lady. While the Grey Lady’s identity is not entirely agreed upon, she manifests in specific, recognizable ways: “moving books, adjusting lights, and turning faucets on and off” in order to “let the world know she is here.” (“Willard Library Ghost Cams,” n.d.) Her presence seems to have reignited interest in local history, and in the library as a space full of possibilities and stories.
Not all library ghosts are so positive, however. The apparition in the 1984 film Ghostbusters also exemplifies the trope of the library ghost, but in a more fearsome manner (Reitman, 1984). Before morphing into a ghoulish entity who attacks the Ghostbusters, she appears as an elderly woman in turn-of-the-century dress, reading a book, reflecting a cultural stereotype of library workers that is stuck in the age of Dewey. Like the Grey Lady, this ghost can levitate and move books: the disturbance of physical collections signals that a spirit is at work. Images of female ghosts who haunt the stacks in order to safeguard or speak through their collections visually reinforce the connection between library workers, collections, and gendered (here, feminized) labor.
In such examples, books are a necessary component of the aesthetic of librarianship, juxtaposing the material (books and physical space) with the immaterial (ghosts). Juxtaposition is central to Michel Foucault’s concept of heterotopias, places he describes as “capable of juxtaposing in a single real place several spaces, several sites that are in themselves incompatible” (1984, 6). Foucault identifies cemeteries, libraries, and museums among his examples of heterotopias, as they are linked by unique relationships to time and memory. Cemeteries juxtapose life and death, loss (of life) and creation (of monuments), history and modernity as their grounds become increasingly populated. Similarly, libraries and museums embody “a sort of perpetual and indefinite accumulation of time in an immobile place,” organizing and enclosing representations of memory and knowledge (Foucault 1984, 7).
Disney’s Haunted Mansion ride is a particularly illustrative heterotopia, accumulating time and juxtaposing seemingly opposing concepts. Visitors to the attraction explore the home of “999 happy haunts” from varied time periods and regions of the world. The dead are lively as they dance, sing, and joke. One of the first destinations within the Haunted Mansion ride is the library.2 There (as in the stacks at the Willard Library or in Ghostbusters), books spontaneously fly from the shelves (Surrell et al. 2015, 88). In a dissertation on modern Gothic narratives (of which Disney’s Haunted Mansion is one), Katherine Bailey notes that “books are portals into other worlds themselves,” further describing the library’s significance in the context of the ride’s narration where a mention of ghost writers “serves as an obvious reference to unseen hands at work” (2012, 92).
While the “unseen hands” in the Haunted Mansion’s library are ostensibly those of a ghost, there is another layer of unseen hands: the hands of Disney’s Imagineers, the workers who crafted the Mansion’s story, infrastructure, and illusions. References to their existence are hidden in ‘Easter eggs’ throughout the attraction: an inscription on a tombstone, a character’s likeness, etc. Visitors’ attention is directed toward the Mansion as an experience or singular magical entity rather than the creative work of many laborers. This directly parallels libraries, where doing one’s job successfully often requires the deliberate erasure of one’s existence.
In popular culture, the haunted library is a space with books: it is an aesthetic constructed to represent a fantasy. As such, it is noteworthy that death of libraries discourse centers specifically on libraries as spaces and institutions. Libraries become the haunted mansion, the singular magical entity inhabited by ghosts (library workers) who may or may not be visible. This privileging of the institution overlooks the reality of library workers, actual people whose material and emotional needs are denied or compromised in the service of neoliberal capitalism (Cronk, 2019).
Silvia Federici’s Caliban and the Witch presents a feminist historical analysis of how women–and in particular their bodies–have been subjugated and subject to violence within capitalist relations in Europe. Her analysis of witch hunts and efforts to “make visible hidden structures of domination and exploitation” are especially relevant to conversations about hidden, haunted labor within the feminized profession of libraries (2009, 13). Gordon’s take on haunting in Argentina also refers to “state-sponsored systems of disappearance” where ghosts help to reveal the structures and interests behind oppressive systems (2008, 67-70). There is a difference, however, between bringing to light the infrastructure of institutions and valuing the institution more than the workers who sustain it.
Persistent references to libraries instead of library workers are a manifestation of vocational awe, which Fobazi Ettarh describes as the notion that libraries are inherently good and therefore exempt from critique (2018). Vocational awe is a foil to death of libraries discourse, vehemently asserting the permanence of libraries. Rather than assuming inevitable death or irrelevance, this perspective insists that libraries will continue to exist simply because they are good and important and therefore must exist. Such a mindset suggests that it is acceptable for administrators to make decisions that harm workers as long as those decisions will aid the presumed greater good of libraries and keep the institution ‘alive.’ Lauren Berlant might call this a relation of cruel optimism, “the condition of maintaining an attachment to a significantly problematic object” even when the attachment has damaging consequences (2011, 24).
The concept of haunted futurity can help us to better understand the troubling relationship between libraries and labor under neoliberalism. For Debra Ferreday and Adi Kuntsman, “The future may be both haunted and haunting: whether through the ways in which the past casts a shadow over (im)possible futures; or through horrors that are imagined as ‘inevitable’; or through our hopes and dreams for difference, for change” (2011, 6). Haunted futurity invites us to think of haunting as potential: a collective experience and call to action in response to ghosts.
Listening to ghosts requires effort, just as haunting requires effort. As Kevin Seeber writes, “It’s not the heavens smiling on you when you browse the stacks and find a relevant item, it’s the labor of a bibliographer, a cataloger, and a shelver. This stuff ends up where it does because people are doing the work of putting it there” (2018). By that logic, books fly from the shelves of the haunted library because ghosts are doing the work of moving them. When these ghostly occurrences happen, living people have been conditioned to reshelve the books as quickly as possible: there is an organization scheme to follow, a workflow that has been interrupted, and an image of the library that must be restored. Work under neoliberal capitalism has specific time-bound demands and prioritizes results (especially the accumulation of capital) above all else.
By disrupting space and time, ghosts simultaneously reveal their presence and the presence of structures that are supposed to remain hidden. In an interview for Jacobin, Marxist scholar and anthropologist David Harvey describes neoliberalism as a “political project” taken up by “the corporate capitalist class” lashing out against labor (2016). One way in which this manifests is the obfuscation of labor, as seen in the narratives of serendipity Seeber critiques so well. The experience of finding the perfect book in the stacks becomes decidedly less magical when one considers the labor (and the material circumstances of laborers) behind the encounter. These slippages into visibility, however, can be an opportunity to learn. What might happen if we paused to ask the ghost what harm brought it to this place rather than immediately assessing whether the library’s materials were harmed during flight? Avery Gordon offers one suggestion: “If you let it, the ghost can lead you toward what has been missing, which is sometimes everything” (2008, 58).
In this case, part of what has been missing is concern for humanity. Considering human beings in particular rather than libraries generally will reveal structures and truths that may be hard to reckon with, but this work is necessary. Neoliberalism emerged as a movement because of collective fear felt by the ruling class, and requires a single understanding and experience of time. Privileging a white, Western, cis-hetero-patriarchal viewpoint further marginalizes anyone who moves through the world differently. Haunting offers meaningful opportunity to critique this dehumanizing rigidity by interrogating and experimenting with structures of time: “haunting raises specters, and it alters the experience of being in time, the way we separate the past, the present, and the future” (Gordon 2008, xvi). The idea that haunting changes how we experience and understand time is critical when brought into conversation with scholars whose creative theoretical interventions also challenge dominant constructs of time and labor.
In academia (and by extension academic libraries), time is weaponized to extract as much labor as possible. Adjunct, contract, and term-limited positions based on temporary funding force workers to perform at unsustainable levels while minimizing the financial expenditure required of the institution. Even in an alleged best case scenario where one obtains a tenure-track position, the imposing tenure clock and the prospect of losing permanent, stable employment necessitate stress that is comparable to that of contingent work.
As a result, commodification of time and valorization of overwork are particularly acute problems. Given a future that is uncertain at best and threatening at worst, workers are simply trying to get by. Riyad Shahjahan compellingly argues that “time is a key coercive force in the neoliberal academy,” because colonial logics privilege frequent intellectual output over embodied knowledge which can look different or take more time (2015, 491). “Amid deadlines and reviews,” he observes, “these non-productive parts of our bodies are rendered invisible” (2015, 494). Federici also points to the changing role of the body under capitalism, where primitive accumulation “required the transformation of the body into a work-machine, and the subjugation of women to the reproduction of the work-force” (2009, 63). Thus, productivity is integral to job performance, workers are only of value if they produce specific, visible outputs in designated time-frames, and bodies are only of value in relation to their ability to maintain productivity.
However, it does not have to be this way. David Mitchell and Sharon Snyder also take up the questions of embodiment and productivity, examining through a disability studies lens the ways in which disabled people have historically been positioned as outside the laboring masses due to their “non-productive bodies” (2010, 186). They posit that this distinction transforms as the landscape of labor shifts toward digital and immaterial outputs from work in virtual or remote contexts, establishing the disabled body as a site of radical possibility. Alison Kafer’s crip time is similarly engaged in radical re-imagining, challenging the ways in which “‘the future’ has been deployed in the service of compulsory able-bodiedness and able-mindedness” (2013, 26-27). That is, one’s ability to exist in the future, or live in a positive version of the future is informed by the precarity of their social position. The work of theorists like Mitchell, Snyder, and Kafer is significant because it insists on a future in which disabled people not only exist, but also thrive despite the pressures of capitalism. Death of libraries rhetoric instills fear because it threatens a future without libraries, which vocational awe would have us believe is no future at all.
Perhaps there is reassurance to be found in the connection between haunting and queer time. Gordon’s claim that haunting “mediates between institution and person, creating the possibility of making a life, of becoming something else, in the present and for the future,” is reminiscent of the way Jack Halberstam theorizes queer time (Gordon 2008, 142). There are many reasons why queer people do not or cannot conform to heteronormative temporal and familial expectations; thus, queer time is a way of creating positive futurity where one is not expected, and of resisting a sense of inevitability (2005). In essence, queer time is about utilizing time differently to open oneself to new possible experiences whether or not those experiences conform to boundaries of linear time.
While queerness and disability are states of being that necessitate different experiences of time, haunting and slowing down are useful frameworks because they offer ways to think about time that apply to all modes of embodiment. Arguments for slow scholarship contend that “to enable slow motion is to open for a state of intense awareness: an intake of ‘more’– not of ‘the same’ at a slower pace” (Juelskjær and Rogowska-Stangret 2017, 6). Haunting asks for the same kind of embodied response, for increased connection to one’s senses in receiving ghostly messages: “to be haunted is to be in a heightened state of awareness; the hairs on our neck stand up: being affected by haunting, our bodies become alert, sensitive” (Ferreday and Kuntsman 2011, 9). If we rethink what it means for someone to ‘look like they have seen a ghost,’ these physical responses do not have to result in fear; rather we can interpret them as reminders to pause, reflect, observe what we are feeling, and listen to what the ghost has to say. Like slowing down, haunting is a way to experience time irrespective of normative productivity. Becoming attuned to one’s senses and listening to ghosts can be transformative, enabling the creation and sharing of more, unique information.
Google Books is at the intersection of information sharing, transforming labor, massive digital production, and ghostly hidden labor. Moreover, academic libraries have been key players in the digitization and consumption of these books. The workers–primarily women of color–who digitize texts for Google Books occupy space at the intersection of digital work, (im)materiality, and ghostliness. In contrast to the intangible but visible apparition who floats through the stacks, these workers are physical beings who are often invisibilized or only partially corporeal: doing their job correctly requires eliminating evidence of their physical existence. We see traces of this disappearing process when fingers or hands appear in the scanned pages of text, “becoming spectral additions to the Google Books library and permanently altering the viewer’s perception of the content” if they slow down enough to notice and ponder (Soulellis, 2013). Sometimes a hand will obscure text in a book’s table of contents, changing a reader’s roadmap to the text. Sometimes a modern-day hand will become part of the image of a book that is hundreds of years old, forming a juxtaposition of time and knowledge. Sometimes all that is captured is the blur of fingers turning the next page to scan.
In the introduction to her zine Hand Job, Aliza Elkin points out that Google’s first logo was co-founder Larry Page’s hand: “Perhaps there is some irony that Google at its scale today is so invested in hiding the fingers and hands (and, following from that, the evidence of manual labor and human intervention) of its employees (or, more probably, contractors) in one of its best known products” (2018). Works like Andrew Norman Wilson’s ScanOps and Paul Soulellis’ Apparition of a Distance However Near it May Be also capture these slippages into visibility, revealing glimpses of the labor behind a massive system of organizing knowledge (Wilson 2012 and Soulellis 2013).
Found-image art of Google Books pages contrasts with the sanitized or humorous depictions of haunting in pop culture. Each hand or ring or painted fingernail is a reminder of the individuality and humanity of workers usually depicted as a monolith. Each image is a reminder to look beyond the entity presented and ask what structures lie beneath, and at what cost to laborers.
Libraries are haunted houses, constructed sites of possibility inhabited by ghosts. As our patrons move through scenes and illusions that took years of labor to build and maintain, we workers are hidden, erasing ourselves in the hopes of providing a seamless user experience, in the hopes that these patrons will help defend libraries when the time comes. But I ask that we think deeply about what it means for libraries to be under attack, and why the attachment to that narrative persists. Institutions may or may not die, but all humans do. Library workers at all levels, but especially those who have institutional power, must care for one another and prioritize community wellbeing. Individual actions will not solve structural problems, but they can improve people’s immediate material conditions: that’s something to start with.
Haunting is a complex and rich lens through which we can explore what it might be like to be fearless, or to harness fear in a way that is creatively powerful. If we think like ghosts, we can experience time creatively and less urgently, better positioning ourselves to resist the demands of neoliberalism; to imagine and enact positive futurities.
When a ghost speaks, those around it are compelled to listen.
So then the question is, what kind of ghosts do we want to be?
The idea for this article began as a talk I gave at the 2018 Gender & Sexuality in Information Studies Colloquium. I would like to thank my mentors and co-panelists Leah Richardson and Dolsy Smith for their support and for inspiring me with their own work.
It is an exciting professional accomplishment to publish in In the Library with the Lead Pipe. I am grateful to peer reviewer Samantha Alfrey, internal reviewer Kellee Warren, and publishing editor Annie Pho for their insights and for helping me through the editorial process. I would also like to thank my inimitable friend and colleague Dianne N. Brown for her encouragement, willingness to listen, and her feedback on drafts of this article. Finally, I am beyond thankful to Faith Weis for her unwavering support in this and all things: she’s a true partner with a keen eye and a kind heart.
Ballif, M. (2013). Historiography as Hauntology: Paranormal Investigations into the History of Rhetoric. In Theorizing Histories of Rhetoric (pp. 139–153). Carbondale, IL: Southern Illinois University Press. Retrieved from https://muse.jhu.edu/book/22041
Cronk, L. (2019, January 24). I’ve been considering if the base issue of @ALALibrary is what its name tells its membership- that the org is about institutions rather than workers. Imagine if we stopped defending the idea of libraries & started to defend one another/stand together. That’s my big #alamw19 mood. [Tweet]. Retrieved January 25, 2019, from https://twitter.com/linds_bot/status/1088570042390900736
Ettarh, F. (2018). Vocational Awe and Librarianship: The Lies We Tell Ourselves – In the Library with the Lead Pipe. In the Library with the Lead Pipe. Retrieved from http://www.inthelibrarywiththeleadpipe.org/2018/vocational-awe/
Ferreday, D., & Kuntsman, A. (2011). Haunted Futurities. Borderlands, 10(2), 1–14.
Juelskjær, M., & Rogowska-Stangret, M. (2017). A Pace of Our Own? Becoming Through Speeds and Slows – Investigating Living Through Temporal Ontologies of The University. Feminist Encounters: A Journal of Critical Studies in Culture and Politics, 1(1), 06. https://doi.org/10.20897/femenc.201706
Kafer, A. (2013). Feminist, queer, crip. Bloomington, Indiana: Indiana University Press.
Mitchell, D., & Snyder, S. (2010). Disability as Multitude: Re-working Non-Productive Labor Power. Journal of Literary & Cultural Disability Studies, 4(2), 179–194. https://doi.org/10.3828/jlcds.2010.14
Reitman, I. (1984). Ghostbusters. Columbia Pictures.
1. There is also a very real fear of unemployment and financial insecurity looming over what is already a precarious situation for many. My goal in this article is not to oversimplify or ignore that reality; rather I hope to offer space for thinking creatively about our work, how it is complicit in systems of oppression, and what we might do differently. I recognize that being in a position to write this article is a place of relative privilege.
2. Each Disney park has its own version of the attraction. Though the haunted house premise remains the same, storylines, characters, and decor were adapted for Tokyo Disneyland’s Haunted Mansion, Disneyland Paris’ Phantom Manor, and Hong Kong Disneyland’s Mystic Manor. All five mansions contain some version of a library.
Following on from the excitement of having built a functioning keyboard myself, I got a parcel on Monday. Inside was something that I've been waiting for since September: an Ultimate Hacking Keyboard! Where the custom-built Laplace is small and quiet for travelling, the UHK is to be my main workhorse in the study at home.
Here are my first impressions:
I went with Kailh blue switches from the available options. In stark contrast to the quiet blacks on the Laplace, blues are NOISY! They have an extra piece of plastic inside the switch that causes an audible and tactile click when the switch activates. This makes them very satisfying to type on and should help as I train my fingers not to bottom out while typing, but does make them unsuitable for use in a shared office! Here are some animations showing how the main types of key switch vary.
This keyboard has what's known as a 60% layout: no number pad, arrows or function keys. As with the more spartan Laplace, these "missing" keys are made up for with programmable layers. For example, the arrow keys are on the Mod layer on the I/J/K/L keys, so I can access them without moving from the home row. I actually find this preferable to having to move my hand to the right to reach them, and I really never used the number pad in any case.
This is a split keyboard, which means that the left and right halves can be separated to place the hands further apart which eases strain across the shoulders. The UHK has a neat coiled cable joining the two which doesn't get in the way. A cool design feature is that the two halves can be slotted back together and function perfectly well as a non-split keyboard too, held together by magnets. There are even electrical contacts so that when the two are joined you don't need the linking cable.
The board is fully programmable, and this is achieved via a custom (open source) GUI tool which talks to the (open source) firmware on the board. You can have multiple keymaps, each of which has a separate Base, Mod, Fn and Mouse layer, and there's an LED display that shows a short mnemonic for the currently active map. I already have a customised Dvorak layout for day-to-day use, plus a standard QWERTY for not-me to use and an alternative QWERTY which will be slowly tweaked for games that don't work well with Dvorak.
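To illustrate the layer concept, here is a purely hypothetical sketch in Python; the UHK is actually configured through the Agent GUI, so the structure and names below are invented for illustration, not the firmware's format:

    # One keymap: each layer rebinds the same physical keys.
    keymap = {
        "base":  {"i": "i", "j": "j", "k": "k", "l": "l"},
        "mod":   {"i": "UP", "j": "LEFT", "k": "DOWN", "l": "RIGHT"},  # arrows on the home row
        "fn":    {"j": "MEDIA_PREV", "l": "MEDIA_NEXT"},
        "mouse": {"i": "CURSOR_UP", "j": "CURSOR_LEFT"},
    }

    def resolve(key, layer):
        # Fall back to the base layer when the active layer leaves a key unbound.
        return keymap.get(layer, {}).get(key, keymap["base"].get(key))

Holding Mod and pressing I would thus resolve to UP, exactly as described above.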
One cool feature that the designers have included in the firmware is the ability to emulate a mouse. There's a separate layer that allows me to move the cursor, scroll and click without moving my hands from the keyboard.
Not much to say about the palm rests, other than they are solid wood, and chunky, and really add a little something.
I have to say, I really like it so far! Overall it feels really well designed, with every little detail carefully thought out, excellent build quality, and a really solid feel.
Seagate was caught out by an unexpectedly deep drop in disk drive demand and saw its revenues fall 7 per cent. Along with the rest of the tech world, it talked about a recovery mid-year, and promised world+dog at least one more lousy quarter.
Reported revenues (PDF) for its second quarter of fiscal '19, ended 28 December, were $2.7bn, down 6.8 per cent on last year's $2.9bn. The company said cost cuts helped profits rise from $159m last year to $384m. ... The company shipped 87.4EB of capacity, down from the 87.5EB shipped a year ago, and averaging out at 2.4TB/drive. The big fall was in nearline drive demand, as a Seagate chart shows. It expected a fall two quarters ago but not this deep.
Western Digital is about to go into cost cutting mode to carve out $800m in savings, after reporting shrinking revenues of $4.23bn for its second fiscal 2019 quarter, down by a fifth compared to the year ago period.
Losses were almost halved from $823m to $487m for the three months ended 28 December 2018. Gross margin was 31.3 per cent and operating cash flow stood at $469m. It built 30.3 million disk drives in the quarter, compared to 42.3 million.
Western Digital now makes both flash and disk products: flash revenues came in at $2.2bn, down 18 per cent year-over-year, and disk was down 23.5 per cent to $2.1bn. Disk exabytes shipped in the quarter declined 17 per cent on the year.
Findings from the Africa Open Data Index and Africa Data Revolution Report
Today, we are pleased to announce the results of Open Knowledge International’s Africa Open Data Index. This regional version of our Global Open Data Index collected baseline data on open data publication in 30 African countries to provide input for the second Africa Data Revolution Report.
Based on an adaptation of the methodology for the Global Open Data Index, this project mapped out to what extent African public institutions make key datasets available as open data online. Beyond scrutinising data availability, digitisation degree, and openness of national datasets, we considered the broader landscape of actors involved in the production of government data such as private actors.
Key datasets and methodology were developed in collaboration with the United Nations Development Programme (UNDP), the International Development Research Centre (IDRC), and the World Wide Web Foundation. We focused on national key datasets such as:
Data describing processes of government bodies at the highest administrative level (e.g. federal government budgets);
Data produced by sub-national actors but collected by a national agency (e.g. certain statistical information).
We also captured if data was available on sub-national levels or by private companies but did not assign scores to these sets. You can find the detailed methodology here.
Ultimately, the key datasets we considered are:
Administrative records: budgets, procurement information, company registers
Figure 1: Screenshot of the Africa Open Data Index Interface
Understanding who produces government data
Many government agencies produce at least parts of the key datasets we assessed. Some key datasets, such as environmental data, are rarely produced. For instance, air pollution and water quality data are sometimes produced in individual administrative zones, but not on national levels. Some initiatives, such as REDD+ or the Congo Basin Forest Atlases, help produce data on deforestation with support from the World Resources Institute (WRI) and USAID.
Multiple search strategies may be required to identify agencies producing and publishing official records. Some agencies develop public databases, search interfaces and other dedicated infrastructure to facilitate search and retrieval. Statistical yearbooks are another useful access point to several information groups, including economic and social statistics as well as figures on environmental degradation or markets. In several cases it was necessary to consult third-party literature, such as the World Bank’s Land Governance Assessment Framework (LGAF) and reports issued by the Extractives Industries Transparency Initiative (EITI), to identify which public institutions hold the remit to collect data.
Sometimes, private companies provide data infrastructure to aggregate and host data centrally. For instance, the company Trimble develops data portals for the extractives sector in 15 countries in Africa. These data portals are used to publish data on mining concessions, including geographic boundaries, the size of territory, concession types, licensees, or contract start and duration.
Procuring data infrastructure from private organisations
An alternative information aggregator using open licence terms is called African Legal Information Institute (AfricanLII), gathering national legal code from several African countries. It is a programme of the Democratic Governance and Rights Unit at the Department of Public Law at the University of Cape Town.
Sometimes stark differences in what data gets published
To test what data gets published online, we defined crucial data points to be included in every key data category (see here). If at least one of these data points was found online, we considered the data category for assessment. This means that we assessed datasets whose completeness can differ across countries. Figure 2 shows how often each data point is provided across our sample of 30 countries.
Figure 2: Percentages of data points found across key datasets. Percentage relative to the total amount of countries (100% = data point available in 30 countries). Source: Africa Data Revolution Report, pp. 19-20.
Budget and procurement data most often contain the relevant data points we assessed. Several key statistical indicators are provided fairly commonly, too. Agricultural data, environmental data and land ownership data are least commonly provided. For a more thorough analysis we recommend reading the Africa Data Revolution Report, pages 16-22.
One third of the data is provided in a timely manner
To assess timeliness, our research considered whether governments publish data at a particular update frequency. Figure 3 shows a clear difference in timely data provision across different data types. The y-axis indicates the percentage of countries publishing updated information; a score of 100 would indicate that the entire sample of 30 countries publishes a data category in a timely fashion.
Figure 3: Data provision across the various datasets
We found significant differences across individual data categories and countries. Roughly three-quarters of countries or more update their budget data (80% of all countries), national laws (73%) and procurement information (70%) in a timely manner. Approximately half of all countries publish updated election records (50%), or keep their company registers up-to-date (47%). All other data categories are published in a timely manner by only a fraction of the assessed countries. For instance, the majority of countries do not provide updated statistical information.
We strongly advise interpreting these findings as trends rather than precise measures of timely data publication, for several reasons. In some data categories, we included considerably more, and more diverse, data points. For instance, the agricultural data category includes not only statistics on crop yields but also short-term weather forecasts. If one of these data types was not provided in a timely manner, the whole data category was considered not to be updated. Furthermore, if a country did not provide timestamps and metadata, we did not consider the data to be updated, as we were unable to prove otherwise.
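The two rules just described fit in a few lines; here is a minimal sketch in Python, assuming an invented per-data-point record structure rather than the survey tool's actual schema:

    def category_assessed(points):
        # A data category enters the assessment if at least one
        # of its data points was found online.
        return any(p["found"] for p in points)

    def category_timely(points):
        # A category counts as timely only if every data point was updated
        # on schedule; records lacking timestamps or metadata default to
        # not updated, since we could not prove otherwise.
        return all(p.get("timely", False) for p in points)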
Open licensing and machine-readability
Only 6% of all data (28 out of 420 datasets assessed) is openly licensed in compliance with the criteria laid out by the Open Definition. Open licence terms are used by statistical offices in Botswana, Senegal, Rwanda, and Somalia, as well as open data portals in Côte d’Ivoire, Eritrea, Kenya, and Mauritius. Usually, websites provide copyright notices but do not apply licence terms dedicated to the website’s data. In rare cases we found a Creative Commons Attribution (CC-BY) licence being used. More common are bespoke terms that are compliant with the Open Definition.
14.5% of all data (61 out of 420 datasets assessed) is provided in at least one machine-readable format. Most data, however, is provided in printed reports, digitised as PDFs, or embedded on websites in HTML. Importantly, some types of data, such as land records, may still be in the process of digitisation. If we found that governments hold paper-based records, we tested if our researchers may request the data. If this was not the case, we did not consider the data for our assessment.
On the basis of our findings we recommend that public institutions:
Communicate clearly on their agency websites what data they are collecting about different government activities.
Clarify which data has authoritative status in case multiple versions exist: Metadata must be available clarifying the provenance and authoritative status of data. This is important in cases where multiple entities collect data, or whenever governments gather data with the help of international organisations, bilateral donors, foreign governments, or others.
Make data permanently accessible and findable: Data should be made available at a permanent internet location and in a stable data format for as long as possible. Avoid broken links and provide links to the data whenever you publish data elsewhere (for example via a statistical agency). Add metadata to ensure that data can be understood by citizens and found via search engines.
Provide data in machine-readable formats: Ensure that data is processable. Raw data must be published in machine-readable formats that are user friendly.
Use standard open licences: Use CC0 for public domain dedication or standardised open licences, preferably CC BY 4.0. These can be reused by anyone, which helps ensure compatibility with other datasets. Clarify if data falls under the scope of copyright or similar rights. If information is in the public domain, apply legally non-binding notices to your data. If you opt for a custom open licence, ensure compatibility with the Open Definition. It is strongly recommended to submit the licence for approval under the Open Definition.
Avoid confusion around licence terms: Attach the licence clearly to the information to which it applies. Clearly separate a website’s terms and conditions from the terms of open licences. Maintain stable links to licences so that users can access licence terms at all times.
We have gathered all raw data in a summary spreadsheet. Browse the results and use the links we provide to reach a dataset of interest directly.
If you are interested in specific country assessments, you can find our research diaries here.
The Open Data Survey tool, powering this project as well as our Global Open Data Index is open to be reused. If you are interested in setting up a regional or national version, get in touch with us at firstname.lastname@example.org.
We would like to thank the experts at Local Development Research Institute (LDRI), the Communauté Afrique Francophone pour les Données Ouvertes (CAFDO) and the Access to Knowledge for Development Center (A2K4D) at the American University, Cairo for advising on the methodology and their support throughout the research process. Furthermore, we would like to thank our 30 country researchers, as well as our expert reviewers Codrina Maria Ilie, Jennifer Walker, and Oscar Montiel. Finally, we would like to thank our partners at the United Nations Development Programme, the International Development Research Centre and the Web Foundation, without whose support this project would not have been possible.
On 24th February 2019, Nepal’s first Women in Data Conference was organized with the theme डाटा शक्ति नारी शक्ति (‘data power, women power’) – ‘where two superpowers meet’. It brought together inspiring female speakers, influential panelists, data professionals, and aspiring young women in a one-day event to celebrate women working in data. We (Open Knowledge Nepal) were one of the partners of the event.
The program started with registration and a data mart in the calm ambience of Hotel Himalaya. We were impressed by the organizing team lineup, and the participants comprised more than 250 women. From registration to photography, everyone was female. It was all well organized, and the conference hall quickly filled as the clock turned nine. The emcee, Nikki Sharma, Program Officer (Consultant) for the Data for Development in Nepal Program, formally opened the event and highlighted the importance of data and the meaningful involvement of women in it.
Ms. Meghan Nalbo, Country Representative of The Asia Foundation, gave a keynote speech on “Why Data is the Future and Women should be part of it”. She highlighted the possibilities that open up when women enter a field that has been predominantly male, citing how the same data viewed by a man can be viewed differently by a woman, opening up different avenues. Starting the conversation about women in data is of paramount importance, and this conference has certainly helped in doing so.
Then the panel discussion on “How women are breaking through the glass ceiling by breaking down the numbers” was held. The panelists were Dr. Pranita Upadhyay (Programme Leader, MSc IT and Research Coordinator, The British College), Ms. Jyoti U. Devkota, PhD (Professor of Statistics, School of Science, Kathmandu University), Dr. Sambriddhi Kharel (Sociologist), Dr. Prativa Pandey (Founder and CEO, Catalyst Technology Pvt. Ltd.) and Ms Avasna Pandey (Editorial Page Editor, The Kathmandu Post), which was moderated by Ms Shuvechha Ghimire (Research Manager, Interdisciplinary Analysts).
They shared their journeys as professionals in their respective fields and cited the conscious and unconscious stereotypical behaviors that exist in the workplace. They also emphasized that when a person puts forward an idea, it should be taken as an idea presented by a professional rather than as a woman giving a suggestion. Structural biases have to be confronted and dealt with creatively.
One of the added elements of the event was that the speakers were given yellow roses, which signify appreciation.
A sense of excitement was in the air. As the sessions moved forward, the conference hall filled with enthusiasts, almost 95% of whom were female; such a turnout has rarely happened at an event in Nepal.
Shortly afterwards, the panel discussion on “Counting women in by making them visible in statistics in Nepal” followed. It was moderated by Ms. Srijana Nepal (Program Officer, The Asia Foundation). The panelists were Ms. Rosy Shakya (Statistics Expert), Ms. Bhumika Shrestha (Transgender Activist), Dr. Meena Acharya (Gender Expert and Economist) and Mr. Bivek Joshi (Monitoring, Evaluation, and Strategic Partnership Officer, UN Women Nepal). They highlighted the need to institutionalize deeper interaction between the users and producers of gender data. There are many gaps in the production and use of gender data and in its enabling policy environment, so the need to develop its fundamentals from the grassroots level was emphasized.
Then an interactive presentation on ‘Factfulness and Gender Statistics’ was given by Ms. Fernanda Drumond, Head of Operations, Gapminder Foundation. It opened up the concept of how data can be used and interpreted to separate ground reality from assumptions.
Then, the panel discussion on “When technology, women and data meet” was carried out. The panelists were Ms. Sushma Giri (WLiT), Ms. Binita Shrestha (Women in STEM), Ms. Samita Kapali (Green Growth Group Pvt. Ltd.) and Ms. Sweta Karna (Deerwalk). It was moderated by Ms. Sumana Shrestha. The discussion focused on how social change has been created through technology: the field of STEM is much more than coding, and policymaking and decision-making can be better informed through the use of data.
Finally, the Open Data Fellowship – Women Edition was announced by Open Knowledge Nepal (OKN). The representatives from OKN, Mr. Nikesh Balami (CEO) and Ms. Dipti Gautam (Fellowship Lead), described the phases of the fellowship and its importance. You can find the details of the fellowship here: http://fellowship.oknp.org
After the lunch break, the data sessions followed; they included a brief introduction to the different types of workshops that would follow the conference. There were eight such sessions in total, with two sessions taking place simultaneously. These included:
A Soft Skills/Professional Development Session by Ms. Sweta Karna, Director of Data Operations, Deerwalk.
Comfortable with Numbers – Statistics by Ms. Rosy Shakya, Statistics Expert.
How to Prepare Your Paper Using Social Research Methodology by Ms. Shuvechha Ghimire, Interdisciplinary Analysts.
Data Analysis with SPSS (using PSPP) by Ms. Alina Chhantel, The Open Institute.
The Basics on How to Use Data Analysis with Excel by Ms. Sunita Shakya, Data for Development in Nepal Program.
Mapping Gender Statistics Using Open Street Maps by Ms. Sweta Khanal, NAXA.
Visualizing Gender Data & Statistics by Ms. Anusha Thapa, Ms. Sajani Lama, and Ms. Aarya Bhandari, Bikas Udhyami.
The conference will be followed up by two phases of training, which will help prepare the right human resources for working with data.
Key Learnings and Highlights
Data is not always about numbers. It is about the area or subject of interest that it represents.
Something as simple as keeping a record can make people feel visible, especially in a world where women are overshadowed in almost every aspect.
What we think is mostly clouded by our perception on a topic, but the reality can be surprising at times.
Aggregate percentages can sometimes cloud our judgment; looking at the underlying unit values can reveal the influence of outliers. Paying close attention to data is necessary.
Data can be very useful for the common good. Just a simple mark in the map about the availability of public toilets in the area can save us hours of tension. Same goes for petrol stations, clinics, and so on.
A multidimensional approach to data, inclusive of information of any kind and not restricted to numbers.
Interactive panel discussions with experts from various sectors discussing their life experiences, data, the importance of gender statistics, and so on.
An understanding of the problems and hindrances women working with data face in a workspace.
A reminder that though the news often portrays a discouraging scenario for women, statistics in most cases show them making progress.
An insight into the fact that oftentimes data isn’t accessible at all, or if accessible, isn’t up to date.
An insight into the contribution women have been making in the data sector, be it through research companies and institutions or budding startups.
A brief outlook on different data-related applications like OpenStreetMap and SPSS that were also featured in a detailed training session that followed the conference.
Lectures from professionals on research essentials, writing CVs, and various related skills.
Frequent tea breaks and a closing program to facilitate fruitful interactions with experts and fellow participants.
For all events, we welcome submissions from members and nonmembers alike. Students, practitioners, and others from any related field are invited to submit for one conference or all three (though, different proposals for each, please).
Archana Kesavan of ThousandEyes, speaking at NANOG75, reports that network traffic between AZs within a single region is generally “reliable and consistent,” and that the tested cloud providers offer a “robust regional backbone [suitable] for redundant, multi-AZ architectures.”
ThousandEyes ran tests at ten minute intervals over 30 days, testing bidirectional loss, latency, and jitter. Kesavan reported the average inter-AZ latency for each tested cloud:
[Table: average inter-AZ latency for AWS, Azure, and GCP]
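For illustration, here is one way such periodic probes could be reduced to the reported metrics, as a Python sketch; the function and the jitter definition are assumptions, not ThousandEyes' published method:

    import statistics

    def summarize(rtts_ms):
        # rtts_ms: one round-trip time per 10-minute probe; None marks a lost probe.
        delivered = [r for r in rtts_ms if r is not None]
        loss = 1 - len(delivered) / len(rtts_ms)
        latency = statistics.mean(delivered)
        # One common definition of jitter: the mean absolute difference
        # between consecutive delivered probes.
        jitter = statistics.mean(abs(b - a) for a, b in zip(delivered, delivered[1:]))
        return loss, latency, jitter

At ten-minute intervals, 30 days of testing yields 4,320 probes per direction for each AZ pair.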
In a world where it is so very easy to spend money in any form, consumers can be forgiven for thinking that accessing any and all financial services would be painless. And in a time when financial institutions have a wealth of information on customers, banks should be able to make it so. That’s where artificial intelligence (AI), and machine learning in particular, is making great gains.
One of the great frustrations of the financial services business has been the inability to recognize important pain points in customer relationships. Frustrating interactions — unclear instructions on loan applications, or unnecessarily onerous requirements to cash checks — can cause customers to give up on looking for a new product or to consider looking for a new financial services provider altogether.
Even when banks do identify barriers for customers, understanding their impact is difficult, if not nearly impossible.
Not so long ago, it was virtually impossible to track individual customer behaviors. A frustrating paper loan application landed in a trash can, offering no evidence to the bank of why it was abandoned. (In fact, the bank may never even have known that a customer was considering a loan in the first place.) Banks had no way of tracking how many times a customer tried to cash a check but left a branch in annoyance because a second form of ID was sitting at home. These behaviors were offline, effectively hidden.
Hoarded Data Was Dark Data
But let’s be real — banks have had lots of other types of customer data because it has been part of the very nature of the business to collect and retain vast amounts of information. According to the Washington Post, the oldest piece of writing discovered in London is from 65-80 AD, and it was — wait for it — an IOU.
Since the time when accounts were kept in huge handwritten ledger books, through the day of the passbook savings account and now, a few decades deep into the digitization of virtually all customer data, financial firms have necessarily been the repositories of enormous amounts of data about their clients.
While the industry indubitably has scads of data, it has had little ability to use it.
Oh sure, very customer-specific queries can be handled with a fair amount of ease. For example, if Ms. Jones was asking for a line of credit, the bank could look at its records to see how reliable she had been about paying her mortgage.
But they couldn’t easily look across customer accounts to see how people similar to Ms. Jones had fared with a line of credit. Or, crucially, to identify customers like her who might respond positively to a proactive offer of a credit line.
More Data, Few Insights
That’s changing. “Credit card companies have been collecting all this stuff about us since there were credit cards. The information has been available. It has just been really, really difficult for most organizations to pore through it and actually make sense of it,” said Greg Kihlstrom, senior vice president of experience at Yes&, a digital marketing agency in Alexandria, Va. “I think that’s the whole big data thing, what that gave us was just … mountains of data. Even then, most organizations didn’t really have a plan for it,” said Kihlstrom.
Then the digitalization of workflow began — and an influx of even more data became available.
Indeed, the amount of data financial firms can accumulate is an order of magnitude larger than it was just 10 years ago.
When a customer begins a loan application these days, it’s probably online. It’s a relatively simple matter to figure out precisely where in the process a potential borrower abandoned it. When a bank loses customers today, it’s far easier to search transaction histories and try to understand why they might have been dissatisfied.
Until recently though, as Kihlstrom notes, banks’ abilities to collect ever larger amounts of data on customers had largely outpaced their ability to analyze it. But things have changed.
“Now we’re actually at a point where companies’ data mining is mature enough — because of AI — where we can actually start using big data,” he said.
Machine Learning Allows Easier Data Mining
Advances in artificial intelligence and machine learning have made it possible for banks to take the data they already possess, and mine it for insights that can help them address customer pain points in a systematic way.
Customer experiences — both good and bad — unfold over what is commonly called the customer journey. The term highlights the fact that most clients’ interactions with a financial institution take place over time, involve different departments within the company, and may produce data points that sit in separate silos with little or no intercommunication. In other words, banks act like almost every other enterprise — they store data in all sorts of places, and one hand may not know what the other is doing.
According to Forrester Research, the ability to map the customer journey is essential to a financial institution looking to improve its customer experience, and it necessarily involves a heavy reliance on data analysis.
Journey Mapping Key to Next Best Action
In a recent report, the firm recommended that financial services firms use analytics to take a more data-driven approach to mapping customer journeys.
“Customer-led companies don’t just collect information on customers — they also use it to proactively engage them,” the report said. “Use data such as web analytics, customer feedback surveys, complaint and customer satisfaction data, or contact-center call log verbatims to analyze customer interaction data across silos, fix cross-touchpoint issues, uncover operational bottlenecks, build optimal cross-touchpoint customer experiences, and drive desired behaviors.”
In the same report, Christopher Cox, chief digital officer at USAA, said that “too often, the creation of new digital capabilities are treated as projects, with a beginning, a middle, and an end.” And Grant Ingersoll, chief technology officer of Lucidworks, said one of the challenges is that insight initiatives have often been projects instead of a new way of doing business — and this leaves gaps. But now “we see our banking customers leveraging AI-driven search and analytics to power customer 360 views that encompass all the touch points a customer has with the bank.”
AI-driven search is a nebulous marketing term that virtually every vendor uses. Ingersoll prefers the more specific “machine learning.”
“Banks are using machine learning systems to integrate a large number of data sources — from transactional information to a client’s preferences for coffee — and make the data accessible and actionable,” Ingersoll said. “Machine learning algorithms and other statistical techniques are constantly scouring the data, classifying it, relating it, and examining it for trends and anomalies at the individual level.”
This process also enables smarter recommendations for users and better reporting for employees on core business objectives.
Perhaps most importantly, the “most advanced banking customers leverage the system to enable ‘next best action’ analysis, which proactively informs users with information learned by the system,” Ingersoll said.
AI Drives Automation
Advanced analytics also make it possible to automate processes that were once thought beyond the reach of AI. “Robotic Process Automation (RPA) is beginning to automate the multitude of repeatable processes across banking, for example, a customer requests a change of address,” writes Tony Farnfield, the London-based country lead for BearingPoint.
“Because of the many legacy systems traditionally found within a firm, this change request often results in the required update of six or more records from the CRM system, with a lengthy process ensuing for the customer,” he explained. “The automation of this process would contribute to significant savings of time and effort for both the firm and the customer, ultimately leading to an improved customer experience.”
This extends to the other elements of the customer journey as well. It is becoming clear that AI-enhanced chatbots are going to play an increasingly major role in the customer service operation of large financial services firms in the near future.
“It is important for digital solutions to be focused on building great experiences as opposed to simply being used to reduce costs or increase ROI,” Farnfield writes. “While consumers are increasingly aware and satisfied by AI-enabled experiences, they also expect a humanized engagement and a human presence to better enable interactions across the entire customer journey.”
In any case, there can be little doubt that as financial services firms increasingly move to streamline the customer journey, artificial intelligence and analytics will play a key — perhaps the key — role. And the question of how to handle it may confront industry leaders sooner than they think.
“Things are going to get a lot more automated a lot more quickly,” said Kihlstrom, of Yes&. “Now that we actually have the ability to make use of all this big data that people have been talking about for at least a decade, I think we’re going to make really, really quick progress on automating a lot of the jobs,” he added.
“The world is going to look very different in five years even for such a risk-averse industry.”
Thanks to Open Knowledge Belgium for inviting me to speak today.
It is great to be with you all in what is my fourth week in my new role as Chief Executive of Open Knowledge International.
This is the first time I have been in Brussels since serving for 20 years as an MEP for Scotland.
During that time, I worked on copyright reform and openness, with a key focus on intellectual property rights and freedom of expression.
Digital skills and data use have always been a personal passion, and I’m excited to meet so many talented people using those skills to fight for a more open world.
It is a privilege to be part of an organisation and movement that have set the global standard for genuinely free and open sharing of information.
There have been many gains in recent years that have made our society more open, with experts – be they scientists, entrepreneurs or campaigners – using data for the common good.
But I join OKI at a time when openness is at risk.
The acceptance of basic facts is under threat, with many expert views dismissed and a culture of ‘anti-intellectualism’ from those on the extremes of politics.
Facts are simply branded as ‘fake news’.
The rise of the far right and the far left brings with it an authoritarian approach that could return us to a closed society.
The way forward is to resuscitate the three foundations of tolerance, facts and ideas, to prevent the drift to the extremes.
I want to see a fairer and more open society, where we help harness the power of open data and unleash its potential for the public good.
We at Open Knowledge International want to see enlightened societies around the world, where everyone has access to key information and the ability to use it to understand and shape their lives; where powerful institutions are held accountable; and where vital research that can help us tackle challenges – such as inequality, poverty and climate change – is available to all.
To reach these goals, we need to work to raise the profile of open knowledge and instil it as an important value in the organisations and sectors we work in.
In order to achieve this, we will need to change cultures, policies and business models of organisations large and small to make opening up and using information possible and desirable.
This means building the capacity to understand, share, find and use data, across civil society and government.
We need to create and encourage collaborations across government, business and civil society to use data to rebalance power and tackle major challenges.
Last year, CIVICUS found that nearly six in ten countries are seriously restricting people’s fundamental freedoms of association, peaceful assembly and expression.
And, despite some governments releasing more data than before, our most recent Global Open Data Index found that only 11% of the data published in 2017 was truly open, down from 16% of the data surveyed in 2013.
Our fear is that these trends towards closed societies will exacerbate inequality in many countries as declining civic rights, the digital divide, ‘dirty data’ and restrictions on the free and open exchange of information combine in new and troubling ways.
Opaque technological approaches – informed by both public and, more often, private data – are increasingly being suggested as solutions to some of the world’s toughest issues from crime prevention to healthcare provision and from managing welfare or food aid projects to policing border security, most recently evidenced in the debate around the Northern Irish border and Brexit.
Yet if citizens cannot understand, trust or challenge data-driven decisions taken by governments and private organisations, whether because of a lack of transparency or the absence of a right of redress over the data held on individuals or businesses, then racist, sexist and xenophobic biases risk being baked into public systems and the right to privacy will be eroded.
We need to act now and ensure that legislation emphasising open values keeps pace with technological advances so that they can be harnessed in ways which protect – rather than erode – citizens’ rights.
And we need people in future to be able to have an open and honest exchange of information, with details, context and metadata helping to make any potential biases more transparent and rectifiable.
As Wafa Ben Hassine, policy counsel for Access Now, said recently, “we need to make sure humans are kept in the loop … [to make sure] that there is oversight and accountability” of any systems using data to make decisions for public bodies.
Moving on to another pressing issue, I am very concerned about the EU’s deal on copyright reform – which is due to go before the European Parliament for a vote this month – and the effects that this will have on society.
The agreement will require platforms such as YouTube, Twitter or Google to take down user-generated content that could breach intellectual property rights, and to install filters to prevent people from uploading copyrighted material.
That means memes, GIFs and music remixes may be taken down because the copyright does not belong to the uploader. It could also restrict the sharing of vital research and facts, allowing ‘fake news’ to spread.
This is an attack on openness and will lead to a chilling effect on freedom of speech across the EU.
It does not enhance citizens’ rights and could lead to Europe becoming a more closed society – restricting how we share research that could lead to medical breakthroughs or how we share facts.
I know that there is a detailed session focused on copyright reform at 12:30pm in this room so please join that if you want to learn more.
So what can we do about these issues?
First, we are calling on all candidates in May’s European Parliament elections to go to pledge2019.eu to make a public pledge that they will oppose Article 13 of the EU’s chilling copyright reforms. This is an issue that is not going to go away, regardless of the plenary vote this spring. When the new Parliament sits, in July, the MEPs representing voters for the next five years will have an opportunity to take action.
Second, in coordination with our colleagues at Mozilla and other organisations, we want tech companies like Facebook to introduce a number of improved transparency measures to safeguard against interference in the coming European elections, and I have written to Facebook’s vice-president of global affairs and my former MEP colleague Sir Nick Clegg to request more openness from the social media platform.
Facebook have responded but you can add your voice to Mozilla’s ongoing campaign to keep up the pressure and make sure change happens.
Third, we encourage you to visit responsibledata.io to join the Responsible Data community which works to respond to the ethical, legal, social and privacy-related challenges that come from using data in new and different ways.
This community was first convened by our friends at the Engine Room – who have done great work on this issue – alongside our School of Data who were one of the founding partners.
Fourth, encourage everyone to use established, recognised open licences when releasing data or content.
This should be a simple ask for governments and organisations across the world, but our research has found that legally cumbersome custom licences strangle innovation and the reuse of data.
Fifth, when you are choosing MEP candidates to vote for in May, ask yourself: what have they done to push for openness in our country? Have they signed up to key transparency legislation? Voiced support for access to information and freedom of expression? If you’re not sure, email and ask them.
We need a strong cohort of open advocates at the European Parliament to address the coming issues around privacy, transparency and data protection.
At Open Knowledge International, we will help fight the good fight by continuing our work to bring together communities around the world to celebrate and prove the value of being open in the face of prevailing winds.
Two days ago, with support from OKI, Open Data Day saw hundreds of events take place all over the world.
Our next big event is the fourth iteration of csv,conf, a community conference for data makers featuring stories about data sharing and data analysis from science, journalism, government, and open source. By popular demand, this year will see the return of the infamous comma llama.
We are also very proud of the fantastic work by the Open Knowledge network teams around the globe to nurture open communities from Open Knowledge Finland’s creation of the MyData conference and movement to the investigations by journalists and developers enabled by Open Knowledge Germany and OpenCorporates’ recent release of data on 5.1 million German companies.
And here in Belgium, it’s fantastic to hear about the hundreds of students who participated in Open Knowledge Belgium’s Open Summer of Code last year to create innovative open source projects as well as to be inspired by the team’s work on HackYourFuture Belgium, a coding school for refugees.
To finish my speech, I want to echo Claire Melamed of the Global Partnership for Sustainable Development Data: “People’s voices turned into numbers have power … and data has a power to reveal the truth about people’s lives even when words and pictures have failed.”
So whether you’re interested in open government, open education or any of the other fascinating topics being explored today, I hope that you connect with people who will help you fight for openness, fight for the truth and fight for the rights of people in this country and beyond.
I am not aware of a consensus definition of knowledge graph. I’ve been discussing this for a while with Liliana Giusti Serra, and the topic came up again with my fellow organizers of the knowledge graph session at US2TS as we prepare for a panel.
I’ve proposed the following main features (a toy example in code follows the list):
RDF-compatible, has a defined schema (usually an OWL ontology)
items are linked internally
may be a private enterprise dataset (e.g. not necessarily openly available for external linking) or publicly available
covers one or more domains
Below are some quotes.
I’d be curious to hear of other definitions, especially if you think there’s a consensus definition I’m just not aware of.
“A knowledge graph consists of a set of interconnected typed entities and their attributes.”
Jose Manuel Gomez-Perez, Jeff Z. Pan, Guido Vetere and Honghan Wu. “Enterprise Knowledge Graph: An Introduction.” In Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer. Part of the whole book: http://link.springer.com/10.1007/978-3-319-45654-6
“A knowledge graph is a structured dataset that is compatible with the RDF data model and has an (OWL) ontology as its schema. A knowledge graph is not necessarily linked to external knowledge graphs; however, entities in the knowledge graph usually have type information, defined in its ontology, which is useful for providing contextual information about such entities. Knowledge graphs are expected to be reliable, of high quality, of high accessibility and providing end user oriented information services.”
Boris Villazon-Terrazas, Nuria Garcia-Santa, Yuan Ren, Alessandro Faraotti, Honghan Wu, Yuting Zhao, Guido Vetere and Jeff Z. Pan. “Knowledge graphs: Foundations”. In Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer. Part of the whole book: http://link.springer.com/10.1007/978-3-319-45654-6
“The term Knowledge Graph was coined by Google in 2012, referring to their use of semantic knowledge in Web Search (“Things, not strings”), and is recently also used to refer to Semantic Web knowledge bases such as DBpedia or YAGO. From a broader perspective, any graph-based representation of some knowledge could be considered a knowledge graph (this would include any kind of RDF dataset, as well as description logic ontologies). However, there is no common definition about what a knowledge graph is and what it is not. Instead of attempting a formal definition of what a knowledge graph is, we restrict ourselves to a minimum set of characteristics of knowledge graphs, which we use to tell knowledge graphs from other collections of knowledge which we would not consider as knowledge graphs. A knowledge graph
mainly describes real world entities and their interrelations, organized in a graph.
defines possible classes and relations of entities in a schema.
allows for potentially interrelating arbitrary entities with each other.
covers various topical domains.”
Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic web, 8(3), 489-508.
“ISI’s Center on Knowledge Graphs research group combines artificial intelligence, the semantic web, and database integration techniques to solve complex information integration problems. We leverage general research techniques across information-intensive disciplines, including medical informatics, geospatial data integration and the social Web.”
Its Table 1, “Selected definitions of knowledge graph”, has the following definitions (for citations, see that paper):
“A knowledge graph (i) mainly describes real world entities and their interrelations, organized in a graph, (ii) defines possible classes and relations of entities in a schema, (iii) allows for potentially interrelating arbitrary entities with each other and (iv) covers various topical domains.” Paulheim 
“Knowledge graphs are large networks of entities, their semantic types, properties, and relationships between entities.” Journal of Web Semantics 
“Knowledge graphs could be envisaged as a network of all kind things which are relevant to a specific domain or to an organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets.” Semantic Web Company 
“We define a Knowledge Graph as an RDF graph. An RDF graph consists of a set of RDF triples where each RDF triple (s, p, o) is an ordered set of the following RDF terms: a subject s ∈ U ∪ B, a predicate p ∈ U, and an object o ∈ U ∪ B ∪ L. An RDF term is either a URI u ∈ U, a blank node b ∈ B, or a literal l ∈ L.” Färber et al.
“[…] systems exist, […], which use a variety of techniques to extract new knowledge, in the form of facts, from the web. These facts are interrelated, and hence, recently this extracted knowledge has been referred to as a knowledge graph.” Pujara et al. 
“A knowledge graph is a graph that models semantic knowledge, where each node is a real-world concept, and each edge represents a relationship between two concepts”
That which computation sets out to map and model, it eventually takes over. Google sets out to index all human knowledge and becomes the source and the arbiter of that knowledge: it became what people think. Facebook set out to map the connections between people – the social graph – and became the platform for those connections, irrevocably reshaping societal relationships. Like an air control system mistaking a flock of birds for a fleet of bombers, software is unable to distinguish between the model of the world and reality – and, once conditioned, neither are we.
I am here to bring your attention to two developments that are making me worried:
The Social Graph of Scholarly Communications is becoming more tightly bound into institutional metrics that have an increasing influence on institutional funding
The publishers of the Social Graph of Scholarship are beginning to enclose the Social Graph, excluding the infrastructure of libraries and other independent, non-profit organizations
Normally, I would try to separate these ideas into two dedicated posts, but in this case I want to bring them together in writing because, if these two trends converge, things will become very bad, very quickly.
Let me start with the first trend:
1. The social graph that binds
When I am asked to explain how to achieve a particular result within scholarly communication, more often than not, I find myself describing four potential options:
a workflow of Elsevier products (BePress, SSRN, Scopus, SciVal, Pure)
a workflow of Clarivate products (Web of Science, InCites, Endnote, Journal Citation Reports)
a workflow of Springer-Nature products (Dimensions, Figshare, Altmetrics)
a DIY workflow from a variety of independent sources (the library’s institutional repository, ORCiD, Open Science Framework)
(My apologies for not sharing the text that goes with the slides. Since January of this year, I have been the Head of the Information Services Department at my place of work. In addition to this responsibility, much of my time this year has been spent covering the work of colleagues currently on leave. Finding time to write has been a challenge.)
In Ontario, each institution of higher education must negotiate a ‘Strategic Mandate Agreement’ with its largest funding body, the provincial government. Universities are currently in the second iteration of these types of agreements and are preparing for the third round. These agreements are considered fraught by many, including Marc Spooner, a professor in the faculty of education at the University of Regina, who wrote the following in an opinion piece in University Affairs:
The agreement is designed to collect quantitative information grouped under the following broad themes: a) student experience; b) innovation in teaching and learning excellence; c) access and equity; d) research excellence and impact; and e) innovation, economic development and community engagement. The collection of system-wide data is not a bad idea on its own. For example, looking at metrics like student retention data between years one and two, proportion of expenditures on student services, graduation rates, data on the number and proportion of Indigenous students, first-generation students and students with disabilities, and graduate employment rates, all can be helpful.
Where the plan goes off-track is with the system-wide metrics used to assess research excellence and impact: 1) Tri-council funding (total and share by council); 2) number of papers (total and per full-time faculty); and 3) number of citations (total and per paper). A tabulation of our worth as scholars is simply not possible through narrowly conceived, quantified metrics that merely total up research grants, peer-reviewed publications and citations. Such an approach perversely de-incentivises time-consuming research, community-based research, Indigenous research, innovative lines of inquiry and alternative forms of scholarship. It effectively displaces research that “matters” with research that “counts” and puts a premium on doing simply what counts as fast as possible…
Even more alarming – and what is hardly being discussed – is how these damaging and limited terms of reference will be amplified when the agreement enters its third phase, SMA3, from 2020 to 2023. In this third phase, the actual funding allotments to universities will be tied to their performance on the agreement’s extremely deficient metrics.
The measure by which citation counts for each institution are going to be assessed has already been decided: the Ontario government has stated that it is going to use Elsevier’s Scopus (although I presume they really meant SciVal).
What could possibly go wrong? To answer that question, let’s look at the second trend: enclosure.
2. Enclosing the social graph
The law locks up the man or woman
Who steals the goose from off the common
But leaves the greater villain loose
Who steals the common from off the goose.
As someone who spends a great deal of time ensuring that the scholarship of the University of Windsor’s Institutional Repository meets the stringent restrictions set by publishers, it’s hard not to feel a slap in the face when reading “Springer Nature Syndicates Content to ResearchGate”.
ResearchGate has been accused of “massive infringement of peer-reviewed, published journal articles.”
The accusing publishers say that the networking site is illegally obtaining and distributing research papers protected by copyright law. They also suggest that the site is deliberately tricking researchers into uploading protected content.
It is not uncommon to find selective enforcement of copyright within the scholarly communication landscape. Publishers have cast a blind eye to the copyright infringement of ResearchGate and Academia.edu for years, while targeting course reserve systems set up by libraries.
Any commercial system that is part of the scholarly communication workflow can be acquired for strategic purposes.
One of the least understood, and thus least appreciated, functions of calibre is that it uses the Open Publication Distribution System (OPDS) standard (opds-spec.org) to let one easily share e-books (at least those without Digital Rights Management software installed) with e-readers on the same local network. For example, on my iPod Touch I have the e-reader program Stanza (itunes.apple.com/us/app/stanza/id284956128) installed, and from it I can access the calibre library catalogue on my laptop from within my house, since both are on the same local WiFi network. And so can anyone else in my family from their own mobile device. It’s worth noting that Stanza was bought by Amazon in 2011 and, according to those who follow the digital e-reader market, it appears that Amazon may have done so solely for the purpose of stunting its development and sunsetting the software (Hoffelder, 2013).
And sometimes companies acquire products to provide a tightly integrated suite of services and seamless workflow.
And, indeed, whatever model the university may select, if individual researchers determine that seamlessness is valuable to them, will they in turn license access to a complete end-to-end service for themselves or on behalf of their lab? So, the university’s efforts to ensure a more competitive overall marketplace through componentization may ultimately serve only to marginalize it.
The repository must be registered in the Directory of Open Access Repositories (OpenDOAR) or in the process of being registered.
In addition, the following criteria for repositories are required:
Automated manuscript ingest facility
Full text stored in XML in JATS standard (or equivalent)
Quality assured metadata in standard interoperable format, including information on the DOI of the original publication, on the version deposited (AAM/VoR), on the open access status and the license of the deposited version. The metadata must fulfil the same quality criteria as Open Access journals and platforms (see above). In particular, metadata must include complete and reliable information on funding provided by cOAlition S funders. OpenAIRE compliance is strongly recommended.
Open API to allow others (including machines) to access the content
QA process to integrate full text with core abstract and indexing services (for example PubMed)
“Automated manuscript ingest facility” probably gives me the most pause. Automated means a direct pipeline from publisher to institutional repository that could be based on a publisher’s interpretation of fair use/fair dealing, and we don’t know what the ramifications of that decision-making might be. I’m feeling trepidation because I believe we are already experiencing the effects of a tighter integration between manuscript services and the IR.
Many publishers – including Wiley, Taylor and Francis, IEEE, and IOP – already use a third-party manuscript service called ScholarOne. ScholarOne integrates the iThenticate service, which produces reports of what percentage of a manuscript has already been published. Journal editors have the option to set to what extent a paper can make use of a researcher’s prior work, including their thesis. Manuscripts that exceed these thresholds can be automatically rejected without human intervention from the editor. We are only just starting to understand how this workflow is going to impact the willingness of young scholars to make their theses and dissertations open access.
It is also worth noting that ScholarOne is owned by Clarivate Analytics, the parent company of Web of Science, InCites, Journal Citation Reports, and others. On one hand, having a non-publisher act as a third party to the publishing process is probably ideal, since it reduces the chances of a conflict of interest. On the other hand, I’m very unhappy with Clarivate Analytics’s product Kopernio, which provides “fast, one-click access to millions of research papers” and “integrates with Web of Science, Google Scholar, PubMed” and 20,000 other sites (including ResearchGate and Academia.edu, natch). There are prominent links to Kopernio within Web of Science that essentially position the product as a direct competitor to a university library’s link resolver service and, in doing so, remove the library from the scholarly workflow – other than the fact that the library pays for the product’s placement.
The winner takes it all
The genius — sometimes deliberate, sometimes accidental — of the enterprises now on such a steep ascent is that they have found their way through the looking-glass and emerged as something else. Their models are no longer models. The search engine is no longer a model of human knowledge, it is human knowledge. What began as a mapping of human meaning now defines human meaning, and has begun to control, rather than simply catalog or index, human thought. No one is at the controls. If enough drivers subscribe to a real-time map, traffic is controlled, with no central model except the traffic itself. The successful social network is no longer a model of the social graph, it is the social graph. This is why it is a winner-take-all game.
My latest Pocket project is called pocket-snack and was born out of a conversation I had with my comrade and fellow Pocket-lover lissertations. We both faced the same dilemma a gore-loving Netflix account holder has: overwhelmed by so much choice, it's difficult to choose any one item. So our Pockets continued to fill up, the dread of opening them grew, and the anxiety caused by all that unread material hovered. This is a familiar problem for those with over-large physical 'to be read' piles (I have one of those too), or sheds full of junk that 'might be useful one day'.
I came up with a solution using a Python script that tags everything in the list with tbr, archives the lot, and then re-adds just a small number of randomly-chosen items back into the list. Now instead of having a list of literally hundreds of unread articles from which to choose, I have a dozen or fewer: a sensible number that can easily be read, or at least processed. The list is refreshed daily, weekly, or on demand. (A rough sketch of the core logic appears after the list below.) Two things became evident once I started using pocket-snack:
Dealing with a large group of articles by chunking it into smaller groups is surprisingly effective, both at getting any traction at all and at significantly speeding up the process. It forces you to focus on just what is in front of you. I feel that this has also helped me to focus on the thing I'm reading - I've tricked my brain into thinking it's only one of 8 items rather than one of 308, so there's no need to rush or be thinking about all the others.
Randomly choosing from a large list of articles I have consciously bookmarked for future reading over the last several months sometimes creates serendipitous sets, or serendipitous timing. Things I bumped into online months apart but on the same topic will sometimes appear in the same 'snack'. At other times, I've been talking with friends or colleagues about a topic and a relevant article I'd forgotten about will appear. I didn't mean to make a little serendipity machine, but it seems that whilst I was just trying to make something to keep my brain a bit quieter, that's exactly what I built.
Yes, we decided to go ahead with a rewrite of our digital collections app, with the new app based not on Hyrax or Valkyrie but on a persistence layer built on ActiveRecord (making use of postgres-specific features where appropriate), exposing ActiveRecord models to the app as a whole.
No, we are not going forward with trying to make that entire “toolkit”, with all the components mentioned there.
But yes, unlike Alberta, we are taking some functionality and putting it in a gem that can be shared between institutions and applications. That gem is kithe. It includes some sharable modeling/persistence code, like Valkyrie (but with a very different approach), and also some additional fundamental components.
Scaling back the ambition—and abstraction—a bit
The total architecture outlined in my original post was starting to feel overwhelming to me. After all, we also need to actually produce and launch an app for ourselves, on a “reasonable” timeline, with fairly high chance of success. I left my conversation with U Alberta (which was quite useful, thank you to the Alberta team!), concerned about potential over-reach and over-abstraction. Abstraction always has a cost and building shared components is harder and more time-consuming than building a custom app.
But, then, also informed by my discussion with Alberta, I realized we basically just had to build a Rails app, and this is something I knew how to do; we could, as we progressed, jettison anything that didn’t seem actually beneficial for that goal or feasible at the moment. And, after discussion with a supportive local team, my anxiety about the project went down quite a bit — we can do this.
Even when writing the original proposal, I knew that some elements might be traps. Building a generalized ACL permissions system in an rdbms-based web app… many have tried, many have fallen. :) Generalized controllers are hard, because they are very tightly tied to your particular app’s UI flows, which will vary.
So we’ve scaled back from trying to provide a toolkit which can also be “scaffolding” for a complete starter app. The goals of the original thought-experiment proposal — a toolkit which provides pieces developers put together when building their own app — are better approached, for now, by scaling back and providing fewer shared tools, which we can make really solid.
After all, building shared code is always harder than building code for your app. You have more use cases to figure out and meet, and crucially, shared code is harder to change because it’s (potentially) got cross-institutional dependents, which you have to not break. For the code I am putting into kithe, I’m trying to make it solidly constructed and well-polished. In purely local code, I’m more willing to do something experimental and hacky — it’s easy enough (comparatively!) to change local app code later. As with all software, get something out there that works, iterating, using what you learn. (It’s just that this is a lot harder to do with shared dependencies without pain!)
So, on October 1st, we decided to embark on this project. We’re willing to show you our fairly informal sketch of a work plan, if you’d like to look.
But we’re not just building a local app, we are also trying to create some shareable components. While the costs and risks of shared code and abstractions are real, I ultimately decided that “just Rails” would not get us to the most maintainable code after all. (And of course nothing is really just Rails, you are always writing code and using non-Rails dependencies; it’s a matter of degree, how much your app seems like a “typical” Rails app to developers).
It’s just too hard to model the data we ourselves already needed (including nested/compound/repeated models) in “just” ActiveRecord, especially in a way that lets you work with it sanely as “just” ActiveRecord, and is still performant. (So we use attr_json, which I also developed, for a No-SQLy approach without giving up rdbms or ActiveRecord benefits including real foreign-key-based associations). And in another example, ActiveStorage was not flexible/powerful enough for our file-handling needs (which are of course at the core of our domain!), and I wasn’t enthused about CarrierWave either — it makes sense to me to make some solid high-quality components/abstractions for some of our fundamental business/domain concerns, while being aware of the risks/costs.
So I’ve put into kithe the components that seemed appropriate based on several considerations:
Most valuable to our local development effort
Handling the “trickiest” problems, most useful to share
Handling common problems, most likely to be shareable; and it’s hard to build a suite of things that work together without some modelling/persistence assumptions, so got to start there.
I had enough understanding of the use-cases (local and community) that I thought I could, if I took a reasonable amount of extra time, produce something well-polished, with a good developer experience, and a relatively stable API.
That already includes the following, which is maybe not 1.0-production-ready, but is used in our own in-progress app and released (well-tested and well-documented) in kithe:
A modeling and persistence layer tightly coupled to ActiveRecord, with some postgres-specific features, and recommending use of attr_json, for convenient “NoSQL”-like modelling of your unique business data (in common with existing samvera and valkyrie solutions, you don’t need to build out a normalized rdbms schema for your data). With models that are samvera/PCDM-ish (also like other community solutions).
Including pretty slick handling of “representatives”, dealing with the performance issues of figuring out the representative to display in constant query time (using some pg-specific SQL to look up and set the “leaf” representative on save).
Including UUIDs as actual DB pk/fks, but also a friendlier_id feature for shorter public URL identifiers, with logic to automatically create such if you wish.
Along with a new derivatives architecture, which seems to me to have the right level of abstraction and affordances to provide a “polished” experience.
All file-handling support based on assuming expensive things happen in the background, and “direct upload” from browser pre-form-submit (possibly to cloud storage)
It will eventually include some solr/blacklight support, including a traject-based indexing setup, and I would like to develop an intervention in blacklight so that after solr results are returned, it immediately fetches the “hit” records from ActiveRecord (with specified eager-loading), so you can write your view code in terms of your actual AR models and not need to duplicate data to solr, plus the logic for dealing with it. This latter idea is taken from the design of sunspot.
But before we get there, we’re going to spend a little bit of time on purely local features, including export/import routines (to get our data into the new app; with some solid testing/auditing to be confident we have), and some locally bespoke workflow support (I think workflow is something that works best just writing the Rails).
We do have an application deployed as demo/staging, with a basic more-than-just-MVP-but-not-done-yet back-end management interface (note: it does not use Solr/Blacklight at all which I consider a feature), but not yet any non-logged-in end-user search front-end. If you’d like a guest login to see it, just ask.
Technical Evaluation So Far
We’ve decided to tie our code to Rails and ActiveRecord. Unlike Valkyrie, which provides a data-mapper/repository pattern abstraction, kithe expects the dependent code to use ActiveRecord APIs (along with some standard models and modelling enhancements kithe gives you).
This means that, unlike Valkyrie, our solution is not “persistence-layer agnostic”. Our app, and any potential kithe apps, are tied to Rails/ActiveRecord, and can’t use fedora or other persistence mechanisms. We didn’t have much need/interest in that; we’re happy tying our application logic and storage to ActiveRecord/postgres, and perhaps later focusing on regularly exporting our data to be stored for preservation purposes in another format, perhaps in OCFL.
It’s worth noting that the data-mapper/repository pattern itself, along the lines valkyrie uses, is favored by some people for reasons other than persistence-swapability. In the Rails and ruby web community at large, there is a contingent that thinks the data-mapper/repository pattern is better than what Rails gives you, and gives you a better architecture for maintainable code. Many in this contingent are big on hanami and the dry-rb suite. (I have never been fully persuaded by this contingent.)
And to be sure, in building out our approach over the last 4 months, I sometimes ran right into the architectural issues with Rails “model-based” architecture and some of what it encourages like dreaded callbacks. But often these were hypothetical problems, “What if someone wanted to do X,” rather than something I actually needed/wanted to do now. Take a breath, return to agility and “build our app”.
And a Rails/ActiveRecord-focused approach has huge advantages too. ActiveRecord associations and eager-loading support are very mature and powerful, and when exposed to the app as an API they give you time-tested tools to build your app flexibly and performantly (at least for the architectures our community is used to, where avoiding n+1 queries still sometimes seems like an unsolved problem!). You have a whole Rails ecosystem to rely on, which kithe-dependent apps can just use, making whatever choices they want (use reform or not?) as with most any Rails app, without having to work out as many novel approaches or APIs. (To be sure, kithe still provides some constraints, choices and novelty — it’s a question of degree.)
Trying to build up an alternative based on data-mapper/repository, whether in hanami or valkyrie, I think you have a lot of work to do to be competitive with Rails’ mature solutions, sometimes reproducing features already in ActiveRecord or its ecosystem. And it’s not just work that’s “time implementing”; it’s work figuring out the right APIs and patterns. Hanami, for instance, is probably still not as mature as Rails, or as easy for a newcomer to use.
By not having to spend time re-inventing things that Rails already has solutions for, I could spend time on our actual (digital collections) domain-specific components that I wasn’t happy with existing solutions for. Like spending time on creating shareable file handling and derivatives solutions that seem to me to be well-polished, and able to be used for flexible use-cases without feeling like you’re fighting the system or being surprised by it. Components that hopefully can be re-used by other apps too.
I think schneem’s thoughts on “polish” are crucial reading when thinking about the true costs of shared abstractions in our community. There is a cost to additional abstractions: in initial implementation, ongoing maintenance, developer on-boarding, and just figuring out the right architectures and APIs to provide that polish. Sometimes these costs are worthwhile in delivered benefits, of course.
I’d consider our kithe-based approach to be somewhere in between U Alberta’s approach and Valkyrie’s, on the dimension of “how closely do we stick to, and tie our line to, ‘standard’ Rails”.
Unlike Hyrax, we are building our own app, not trying to use a shared app or “solution bundle”. I would suggest we share that aspect with both the U Alberta approach and the several institutions building valkyrie-not-hyrax apps. But if you’ve had good experiences with the over-time maintenance costs of Hyrax, if you have a use case/context where Hyrax has worked well for you, then that’s great, and there’s never anything wrong with doing what has worked for you.
Overall, 4 months in, while some things have taken longer to implement than I expected, and some unexpected design challenges have been encountered — I’m still happy with the approach we are taking.
If you are considering a based-on-valkyrie-no-hyrax approach, I think you might be in a good position to consider a kithe approach too.
How do we evaluate success?
We want to have a replacement app launched in about a year.
I think we’re basically on target. Although we might not hit it on the nose, I feel confident at this point that we’re going to succeed with a solid app in around that timeline. (Knock on wood.)
When we were considering alternate approaches before committing to this one, we of course tried to compare how long this would take to various other approaches. This is very hard to predict, because you are trying to compare multiple hypotheticals, but we had to make some ballpark guesses (others may have other estimates).
Is this more or less time than it would have taken to migrate our sufia app to current hyrax? I think it’s probably taking more time to do it this new way, but I think migrating our sufia app to current hyrax (with all its custom functionality for current features) would not have been easy or quick — and we weren’t sure current hyrax was a place we wanted to end up.
Is it going to take more or less time than it would have taken to write an app on valkyrie, including any work we might contribute to valkyrie for features we needed? It’s always hard to guess these things, but I’d guess in the same ballpark, although I’m optimistic the “kithe” approach can lead to developer time-savings in the long-run.
(Of course, we hope if someone else wants to follow our path, they can re-use what’s now worked out in kithe to go quicker).
We want it to be an app whose long-term maintenance and continued development costs are good
In our sufia-based app, we found it could be difficult and time-consuming to add some of the features we needed. We also spent a lot of time trying to performance-tune to acceptable levels (and we weren’t alone), or figure out and work towards a manageable and cost-efficient cloud deployment architecture.
I am absolutely confident that our “kithe” approach will give us something with a lower TCO (“total cost of ownership”) than we had with sufia.
Will it be a lower TCO than if we were on the present hyrax (ignoring how to get there), with the custom features we needed? I think so, and that current hyrax isn’t different enough from the sufia we are used to — but again this is necessarily a guess, and others may disagree. In the end, technical staff just has to make their best predictions based on experience (individual and community). Hyrax probably will continue to improve under @no-reply’s steady leadership, but I think we have to make our decisions based on what’s there now, and that potential rosy future also requires continued contribution by the community (like us) if it is to come to fruition, which is real time to be included in TCO too. I’m still feeling good about the “write our own app” approach vs the “solution bundle”.
Will we get a lower TCO than if we had a non-hyrax valkyrie-based app? Even harder to say. Valkyrie has more abstractions and layers that have real ongoing maintenance costs (that someone has to do), but there’s an argument that those layers will lower your TCO over the long term. I’m not totally persuaded by that argument myself, and when in doubt am inclined to choose the less-new-abstraction path, but it’s hard to predict the future.
One thing worth noting is that the main thing that forced our hand in doing something with our existing sufia-based app was that it was stuck on an old version of Rails that will soon be out of support, and we thought it would have been time-consuming to update, one way or another. (When Rails 6.0 is released, probably in the next few months, Rails’ maintenance policy says nothing before 5.2 will be supported.) Encouragingly, both kithe and its attr_json dependency (also by me) are testing green on Rails 6.0 beta releases — and, I was gratified to see, didn’t take any code changes to do so; they just passed. (Valkyrie 1.x requires Rails 5.1, but a soon-to-be-released 2.0 is planned to work fine up to Rails 6; latest hyrax requires Rails 5.1 as well, but the hyrax team would like to add 5.2 and 6 soon.)
We want easier on-boarding of new devs for succession planning
All developers will leave eventually (which is one reason I think if you are doing any local development, a one-developer team is a bad idea — you are guaranteeing that at some point 100% of your dev team will leave at once).
We want it to be easier to on-board new developers. We share U Alberta’s goal that what we could call a “typical Rails developer” should be able to come on and maintain and enhance the app.
Are we there? Well, while our local app is relatively simple rails code (albeit using kithe API’s), the implementation of kithe and attr_json, which a dev may have to delve into, can get a bit funky, and didn’t turn out quite as simple as I would have liked.
But when I get a bit nervous about this, I reassure myself remembering that:
a) When we last posted a position, we got no qualified applicants with samvera, or even Rails, experience. We did make a great hire though: someone who knew back-end web dev and knew how to learn new tools; it’s that kind of person that we ideally need our codebase to be accessible to, and the sufia-based one was not.
b) Recruiting and on-boarding new devs is always a challenge for any small dev shop, especially if your salaries are not seen as competitive. It’s just part of the risk and challenge you accept when doing local development as a small shop on any platform. (Whether that is the right choice is out of scope for this post!)
I think our code is going to end up more accessible to actually-existing newly onboarded devs than a customized hyrax-based solution would be. More accessible than Valkyrie? I do think so myself; I think we have fewer layers of “specialty” stuff than valkyrie, but it’s certainly hard to be sure, and everyone must judge for themselves.
I do think any competent Rails consultancy (without previous LAM/samvera expertise) could be hired to deal with our kithe-based app no problem; I can’t really say if that would be true of a Valkyrie-based app (it might be); I do not personally have confidence it would be true of a hyrax-based app at this point, but others may have other opinions (or experience?).
Evaluating success with the community?
Ideally, we’d of course love it if some other institutions eventually developed with the kithe toolkit, with the potential for sharing future maintenance of it.
Even if that doesn’t happen, I don’t think we’re in a terrible place. It’s worth noting that there has been some non-LAM-community Rails dev interest in attr_json, and occasional PRs; I wouldn’t say it’s in a confidently sustainable place if I left, but I also think it’s code someone else could step into and figure out. It’s just not that many lines of code, it’s well-tested and well-documented, and I’ve tried to be careful with its design — but take a look and decide for yourself! I cannot emphasize enough my belief that if you are doing local development at all (and I think any samvera-based app has always been such), you should have local technical experts doing evaluation before committing to a platform — hyrax, valkyrie, kithe, entirely homegrown, whatever.
Even if no-one else develops with kithe itself, we’d consider it a success if some of the ideas from kithe influence the larger samvera and digital collections/repository communities. You are welcome to copy-paste-modify code that looks useful (It’s MIT licensed, have at it!). And even just take API ideas or architectural concepts from our efforts, if they seem useful.
We do take seriously participating in and giving back to the larger community, and think trying a different approach, so we and others can see how it goes, is part of that. Along with taking the extra time to do it in public and write things up, like this. And we also want to maintain our mutually-beneficial ties to samvera and LAM technologist communities; even if we are using different architectures, we still have lots of use-cases and opportunities for sharing both knowledge and code in common.
Take a look?
If you are considering development of a non-Hyrax valkyrie-based app, and have the development team to support that — I believe you have the development team to support a kithe-based approach too.
I would be quite happy if anyone took a look, and happy to hear feedback and have conversations, regardless of whether you end up using the actual kithe code or not. Kithe is not 1.0, but there’s definitely enough there to check it out and get a sense of what developing with it might be like, and whether it seems technically sound to you. And I’ve taken some time to write some good “guide” overview docs, both for potential “onboarding” of future devs here, and to share with you all.
We have a staging server for our in-development app based on kithe; if you’d like a guest login so you can check it out, just ask and I can share one with you.
I had mentioned on twitter that an option exists in the MarcEdit preferences that automatically configures the tool to keep a 10-day backup of one’s configuration settings. This led to a couple of questions, like: what is being backed up, how do I turn this on, and when was it added?
This made me realize that I probably didn’t do a good job pointing out when this was integrated into the application. So, let’s answer those questions now.
When was this added to MarcEdit?
This was added to MarcEdit 7 around version 7.1.75, and will be added to MarcEdit Mac 3.1.60.
What is being backed up?
Essentially, I wanted to make sure that if something wonky happened, there would be a place where a last good backup could be found. This is especially true for protecting a user’s investment in creating tasks. To that end, the backup function automatically creates a copy of a user’s config, macros, and xslt directories. This preserves a backed-up copy of all configuration data and task data found within the program.
How are backups managed?
The program will keep the 10 most recent backups, with backups time stamped. After 10 backups have been accumulated, the program will drop the oldest and rotate a new backup into the backup folder. If this function is enabled, this evaluation occurs the first time the program is opened for the day.
If I need them, how do I recover my backups?
Right now – backups are stored in the User Data Directory (generally c:\users\[username]\appdata\marcedit7) inside the backup_settings folder.
Backups are stored as plain zip files – this means that to recover, users simply need to extract the zip file and replace the current folders with the data found in the zip file. Longer term, I may add a one-click restore to the application to streamline the process.
How do I turn the backup functionality on?
This is turned on by default. But users can enable or disable it by going to the application Preferences, selecting Other, and unchecking/checking the Backup Settings option.
That’s it. The idea behind this functionality was to try and be unobtrusive while at the same time providing users with a way to keep copies of important data.
I learned about serif and sans serif typefaces, about varying the amount of space between different letter combinations, about what makes great typography great. It was beautiful, historical, artistically subtle in a way that science can’t capture, and I found it fascinating.
Generosity and thoughtfulness are not in abundance right now, and so Kathleen Fitzpatrick‘s important new book, Generous Thinking: A Radical Approach to Saving the University, is wholeheartedly welcome. The generosity Kathleen seeks relates to lost virtues, such as listening to others and deconstructing barriers between groups. As such, Generous Thinking can be helpfully read alongside Alan Jacobs’s How to Think, as both promote humility and perspective-taking as part of a much-needed, but depressingly difficult, re-socialization. Today’s polarization and social media only make this harder.
Fitzpatrick’s analysis of the university’s self-inflicted wounds is painful to acknowledge for those of us in the academy, but undoubtedly true. Scholars are almost engineered to cast a critical eye on all that passes before them, and few articulate their work well to broader audiences. Administrators are paying less attention than in the past to the communities that surround their campuses. Perhaps worst of all, the incentive structures of universities, such as the tenure process and college rankings, strongly reinforce these issues.
I read Generous Thinking in draft form last year and thought an appropriate alternate title might be The Permeable University. Many of Fitzpatrick’s prescriptions involve dissolving the membrane of the academy so that it can integrate in a mutually beneficial way with the outside world, on an individual and institutional level. You will be unsurprised to hear that I agree completely with many of her suggestions, such as open access to scholarly resources and the importance of scholars engaging with the public. Like Fitzpatrick, I have had a career path that has alternated between the nonprofit and academic worlds in the pursuit of platforms and initiatives that try to maximize those values.
With universities currently receiving withering criticism from both the right and left, it is critical for all of us in the academy to take Generous Thinking seriously, and to think about other concrete steps we can take to open our doors and serve the wider public. The deep incentive structures will be very hard to change, but we can all take more modest steps, such as thinking about how new media like podcasts can play a role in a more publicly approachable and helpful university, or how we might be able to provide services (e.g., archival services) to local communities. Fitzpatrick’s Humanities Commons, a site for scholars to connect not just with each other but with the public, is another venue for making the generosity she seeks a reality.