Planet Code4Lib

2021 Evergreen International Online Conference / Evergreen ILS

We are just a couple weeks away from the 2021 Evergreen International Online Conference! The conference is taking place Tuesday, May 25th through Thursday, May 27th 2021. A separate preconference event will be held Monday, May 24th and a documentation and developer hackfest is planned for Friday, May 28th. We encourage you to use the conference hashtag #evgils21 in any social media posts.

Registration for preconference & main conference events will be open until Monday, May 17th. Registration for the hackfests, which are free, will remain open through the conference.

There’s a great lineup of sessions planned, so please see the full Conference Schedule and Program Descriptions pages. All preconference and regular conference sessions will be recorded and made available within a few weeks after the conference. Recordings will be posted on the community YouTube page. All sessions except interest groups & roundtables will also have live captioning available.

As a reminder to all attendees, this event is subject to the community Code of Conduct and Photography/Audio/Video Policy. There are designated Responders for this event as well as a form to report Code of Conduct violations.

Last but not least, a huge thanks to our Sponsors and Exhibitors for making this event possible! We are so grateful for your support of both the conference and the larger Evergreen community.

Venture Capital Isn't Working: Addendum / David Rosenthal

I didn't find Nicholas Colin's Bill Janeway on Who Should Be in Control in time, or it would have been a significant part of Venture Capital Isn't Working. So, below the fold, an addendum discussing legendary VC Bill Janeway's views, and an interesting paper that he cites, The Rise of Dual-Class Stock IPOs by Dhruv Aggarwal et al.

Janeway and Aggarwal et al are both discussing another trend that, over the last couple of decades, has also impaired the effectiveness and the returns of startups. This is the way that founders have been able to retain control of their company even after an IPO by creating two classes of stock, one that they sell to the public and another, with vastly greater voting rights, that they retain. Successful examples include Google and Facebook, but there are many more recent contrary ones, including Theranos and WeWork.

Janeway starts by identifying four types of risk that startups encounter:
both the entrepreneur and the investor are confronted with four dimensions of risk, which have different degrees of uncertainty:
  • Technology Risk: “When I plug it in, will it light up?”
  • Market Risk: “Who will buy it if it does work?”
  • Financing Risk: “Will the capital be there to fund the venture to positive cash flow?”
  • Business (or Management) Risk: “Will the team manage the transition from startup to sustainable business, especially given the challenge of building an effective channel to the market?”
Janeway then, as both Funk and I did, points out that there is "too much money" flooding the market for startups:
Today, we happen to be in a regime where there is an overwhelming supply of risk-seeking capital, most of which is coming from so-called “unconventional investors” (private equity firms, hedge funds, mutual funds, SoftBank, etc.). ... over the last five years, more than half of the amount of venture capital deployed in the US entrepreneurial ecosystem has been coming from these unconventional investors.
Too much money from inexperienced investors chasing too few good startups has not merely reduced the returns to VC investors, but also tilted the bargaining table against them:
Such investors are accepting illiquidity and paying very high prices, all with no ability to change their minds. These are typically public market investors who tend to take liquidity for granted, and yet in this case, even though they are giving up liquidity, by and large, they are not gaining any governance rights! As a result, the balance of power between entrepreneurs and investors has shifted.

Actually the balance of power was already shifting thanks to the concept of the entrepreneur-friendly venture capital firm, of which, of course, my dear friends at Andreessen Horowitz are the most visible advocates. This trend began well before the flood of unconventional capital into venture.
When founders have the upper hand, they will choose VCs offering them the most advantageous terms, which likely include flattering their vanity by entrenching their control over "their company". To maintain their "deal flow", VCs have to become increasingly "entrepreneur-friendly". Aggarwal et al's abstract reads:
We create a novel dataset to examine the nature and determinants of dual-class IPOs. We document that dual-class firms have different types of controlling shareholders and wedges between voting and economic rights. We find that the founders' wedge is largest when founders have stronger bargaining power. The increase in founder wedge over time is due to increased willingness by venture capitalists to accommodate founder control and technological shocks that reduced firms' needs for external financing. Greater founder bargaining power is also associated with a lower likelihood of sunset provisions that eliminate dual-class structures within specified periods.
Janeway argues that the start of this process was:
Google opting for a founder-friendly governance and later a dual-class structure as a public company seems to be the foundational event that inspired others, including Mark Zuckerberg at Facebook. I wasn’t present, so I don’t know how Larry Page and Sergey Brin did it. They had no revenue, and yet they managed to negotiate their entrenchment with two of the best venture capitalists in the history of capitalism, John Doerr at Kleiner Perkins and Mike Moritz at Sequoia.
Janeway summarizes Aggarwal et al's conclusions thus:
the entrenchment of founder control all the way to the IPO is due to two things: “Increased willingness by venture capitalists to accommodate founder control”  at the earlier stages, and “technological shocks that reduced firms’ needs for external financing” (the rise of open source software and cloud computing, which reduces the amount of capital required to start and grow a company). In other words, founders raise less capital relative to previous generations, and they raise it from more accommodating venture capitalists.
How does the recent rise in founders whose control is entrenched make failure or fraud more likely? Janeway writes:
In my own 35-year experience ... the adequacy of management (Business Risk) is the dominant risk to which a startup is exposed.

I have this phrase I used to use: It’s amazing what first-class management can do with a second-class idea, but there is no idea so good that it can’t be destroyed by inadequate management. And it is the case in more than several salient examples, ... that changing management was the essential step towards creating a really successful, sustainable, valuable business.
There have been plenty of recent CEOs who should have been changed (e.g. Elizabeth Holmes) or were changed only belatedly (e.g. Adam Neumann, Justin Zhu). Janeway makes a point with which I completely agree, the importance of:
the experience of stress. In BEA Systems, a company we invested in and became a huge success, and where we didn’t fire the founders, all of the founders had been through a failed startup or a near-death experience of an established company.

By the way, this is usually omitted in the mythic histories of Silicon Valley, but circa 1990, both Sun Microsystems and Oracle each almost went bankrupt. The truth is, building a really substantial business at a fast pace is extremely stressful, and understanding how people have been able to learn from that explains why some founders succeed against all odds.
One of the BEA founders they didn't fire was Bill Coleman, for whom I worked at Sun. The other founders were also ex-Sun, so they all had been through the near-death experience from which Sun escaped thanks to the 1987 deal with AT&T.

Among the major companies that went through the near-death experience are:
  • Intel, during the transition from making memories to making processors.
  • Nvidia, both when their first chip was a technical success but a market failure, and again before 3DFX's patent lawsuit was stymied by Nvidia's countersuit.
A striking feature of the early days at Sun was the emphasis the company's first CFO, the redoubtable Bob Smith, placed on ensuring everyone was aware of the financials each quarter. Jen-Hsun Huang continued this in the early days of Nvidia, constantly repeating "cash is king"! Janeway agrees:
when you have obligations that you have to meet in cash: you can pay those obligations out of operating revenues, out of the sale of assets, or out of the issuance of new securities. And when you run out of those three alternatives, that’s when you call in the lawyers!
...
this is my concern with the entrenchment of founders: in my experience, unless they have previously been through a failed startup or at least the near-death experience of an established business (that is, flirted with bankruptcy), they are insensitive to those stressors. And that’s the problem in the current environment: there’s no need for them to be sensitive! But I find it extraordinarily unlikely that the current environment will last indefinitely.
It is a common misperception that now, when there is a flood of money available, is a great time to start a company. This is precisely backwards. The best time to do a startup is when no-one else is. We started Nvidia in one of Silicon Valley's periodic downturns - we were the only hardware company to get money that quarter. The downturn meant we could get to the best lawyers, the best real estate people, the best CAD software and, crucially, the best VCs. Six months later, when we knew of over 30 other companies starting to attack our market, we would have been competing for attention from these best-in-class resources. The difference between the best-in-class VCs and the rest is huge, as Janeway points out:
It’s really up to entrepreneurs to do their due diligence on the investors they get on board. Venture capitalists have track records. Some of them are ignorant or stupid, some of them are bullies, some of them behave badly. This should not be a surprise. It’s possible that a naive first-time entrepreneur might not know how or have the network to do effective due diligence. But it’s out there waiting to be done.

About that, let me just say one thing about venture capital that’s really different. It’s not the extremely skewed returns: we see that across various asset classes. Rather, it’s the persistence of a firm’s returns over several decades, as seen in the US and documented with the data provided not by venture capitalists themselves, but by their limited partners!
Aggarwal et al conclude:
We show that the main factor that predicts dual-class structures and a greater wedge is the amount of available private financing for startups. The more outside opportunities the founders have, the greater their bargaining power when raising capital. Therefore, they are better able to retain control of the firm after the IPO. We also document a decrease in VC firms’ aversion to dual-class structures and attribute it–at least in part–to the reduction in the costs of doing business due to technological advances in the software and service industries. This finding is consistent with the idea that when there is a lower need for financing, founders are less likely to relinquish their power to VC firms.
...
The literature to date has shown that increases in private financing have led to a decrease in the number of firms in public markets, and therefore more concentrated ownership in the economy as a whole. We further show that greater private financing may also cause public markets themselves to change by allowing founders to retain greater power to pursue their visions, without keeping an economic stake in the firm.

Our findings provide critical input for evaluating policy proposals that affect the nature of public and private markets. They suggest that policies to liberalize private markets by loosening the restrictions on selling and trading in private securities (SEC, 2019) may not only make public issuances less desirable, but may also increase the likelihood that the firms that do ultimately go public will be controlled by their founders.
Who could have predicted that the flood of money would lead to looser discipline of entrepreneurs, leading to lower returns, fraud and irresponsible behavior by founders, greater inequality, and reduced influence for holders of public equity?

twarc2 demo on TwitterDev / Ed Summers

I wrote recently about twarc2 and the support for Twitter’s v2 API, but I also just wanted to share this session that I did on April 22, 2021 with Suhem Parack on the TwitterDev Twitch about installing twarc2 and using it to collect data from the Twitter v2 API. We focused our discussion primarily on researchers who have access to the Academic Product Track, which allows searching of the full archive of tweets.
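
For anyone who wants to reproduce this kind of collection, here is a minimal sketch using twarc's Python API rather than the command line; it is an assumption-flagged illustration, not the session's exact steps. The bearer token and the query are placeholders, and search_all() is, as I understand the twarc documentation, the client method that pages through the full-archive search endpoint available on the Academic Product Track.

from twarc.client2 import Twarc2

# Placeholder credentials: substitute a bearer token from your own
# Twitter developer project (Academic Research track for full-archive search).
client = Twarc2(bearer_token="YOUR_BEARER_TOKEN")

# search_all() yields one page of API results (a dict) at a time.
for page in client.search_all("from:TwitterDev lang:en"):
    for tweet in page["data"]:
        print(tweet["id"], tweet["text"][:80])
    break  # one page is plenty for a quick smoke test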

This video is hosted on the Nocturlab’s PeerTube instance, which is nice because, if you watch it while other people happen to be looking at it too, you will download parts of it from them using the magic of PeerTube and WebTorrent.

But, in case that fails for whatever reason, there’s also a backup at the Internet Archive.

‘Reimagine Descriptive Workflows’ in libraries and archives / HangingTogether

In the OCLC Research Library Partnership, we have a practice of “Learning Together” – we listen for issues that form a shared challenge space within our global partnership, then we synthesize and share findings, and seek to define a path forward, when appropriate.

What we have heard, for some time, is that describing collections in respectful and inclusive ways is a challenge. We saw this reflected in a survey we conducted in 2017 on Equity, Diversity, and Inclusion (EDI) efforts at OCLC RLP institutions. Although institutions responding to that survey had active efforts in many different aspects of EDI, they struggled to gain traction in the area of describing collections in respectful and appropriate ways. This explains why webinars we have organized on these topics have been so popular, and why so many people view the recordings and slides of these sessions. People are hungry for tools and models to advance their work in this area. This is also a topic that our Metadata Managers Focus Group has returned to, time and again. To learn more, we undertook a series of interviews last year to better understand the specific challenges that are faced by those who seek to implement respectful and inclusive descriptive practices centered around Indigenous people and materials related to them.

Over the last few years, we have adjusted our own practices and approaches to research, using a more consultative and community-based approach than we had previously engaged in. We are also actively working as a team to educate ourselves so we can better advance racial equity.

I’m delighted to share with you the great news that we are undertaking a new project in this area, funded in part by the Andrew W. Mellon Foundation. You can read the official announcement below. Several of your OCLC RLP team members, including our Executive Director, Rachel Frick, along with Mercy Procaccini, Chela Scott Weber, and myself are engaged in the project. We are doing this work because of what we’ve learned from you, and in response to the needs of the OCLC RLP. Thank you for the inspiration, and thank you for your support through Partnership dues which helps make our work possible!

I hope you will share news about this project in your networks. We are ALWAYS happy to answer your questions, and will be sharing more about this work here so stay tuned!

Reimagine Descriptive Workflows, a new project from OCLC underway

OCLC has been awarded a grant from The Andrew W. Mellon Foundation to convene a diverse group of experts, practitioners, and community members to determine ways to address systemic biases and improve racial equity in descriptive practices, tools, infrastructure and workflows in libraries and archives. The multi-day virtual convening is part of an eight-month project, Reimagine Descriptive Workflows. Read the press release.

Working in consultation with Shift Collective, a nonprofit consulting group that helps cultural institutions build stronger communities through lasting engagement, along with an advisory group of community leaders, OCLC will:

  • Convene a conversation of community stakeholders about how to address the systemic issues of bias and racial equity within our current collection description infrastructure.
  • Share with libraries the need to build more inclusive and equitable library collections.
  • Develop a community agenda to help clarify issues for those who do knowledge work in libraries, archives, and museums; prioritize areas for attention from these institutions; and provide guidance for national agencies and suppliers.

Learn more about the initiative at https://oc.lc/reimagine-workflows

The post ‘Reimagine Descriptive Workflows’ in libraries and archives appeared first on Hanging Together.

PTPBio and the Reader / Eric Lease Morgan

[The following missive was written via an email message to a former colleague, and it is a gentle introduction to Distant Reader “study carrels”. –ELM]

On another note, I see you help edit a journal (PTPBio), and I used it as a case-study for a thing I call the Distant Reader.

The Distant Reader takes an arbitrary amount of text as input, does text mining and natural language processing against it, saves the result as a set of structured data, writes a few reports, and packages the whole thing into a zip file. The zip file is really a data set, and Distant Reader data sets are affectionately called “study carrels”. I took the liberty of applying the Reader process to PTPBio, and the result has manifested itself in a number of ways. Let me enumerate them. First, there is the cache of the original content;

Next, there are plain text versions of the cached items. These files are used for text mining, etc.:

The Reader does many different things against the plain text. For example, the Reader enumerates and describes each and every token (“word”) in each and every document. The descriptions include the word, its lemma, its part-of-speech, and its location in the corpus. Each of these description files is really a tab-delimited file easily importable into your favorite spreadsheet or database program:
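
As a hedged illustration of how importable these files are, here is a minimal Python sketch that tallies noun lemmas from one such file. The file path and the column names (token, lemma, pos) are hypothetical stand-ins; the header row of an actual carrel's files is authoritative.

import csv
from collections import Counter

# Hypothetical path to one of the tab-delimited token files in a study carrel;
# the real directory layout and column names may differ.
POS_FILE = "carrel/pos/article-001.pos"

lemmas = Counter()
with open(POS_FILE, newline="", encoding="utf-8") as handle:
    for row in csv.DictReader(handle, delimiter="\t"):
        # Count lemmas of nouns only, as a small example of what the
        # tab-delimited structure makes easy.
        if row.get("pos", "").startswith("N"):
            lemmas[row.get("lemma", "")] += 1

print(lemmas.most_common(10))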

Similar sets of files are created for named entities, URLs, email addresses, and statistically significant keywords:

All of this data is distilled into a (SQLite) database file, and various reports are run against the database. For example, a very simple and rudimentary report as well as a more verbose HTML report:
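
Because the distilled file is ordinary SQLite, it can also be queried directly. The sketch below is only an assumption-flagged illustration: the database path, table name, and column names are hypothetical, so check the carrel's actual schema (for example with .schema in the sqlite3 shell) before adapting it.

import sqlite3

# Hypothetical database path and table/column names; inspect the real schema first.
connection = sqlite3.connect("carrel/etc/reader.db")
tokens, distinct_lemmas = connection.execute(
    "SELECT COUNT(*), COUNT(DISTINCT lemma) FROM pos"
).fetchone()
print(f"tokens: {tokens}, distinct lemmas: {distinct_lemmas}")
connection.close()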

All of this data is stored in a single directory:

Finally, the whole thing is zipped up and available for downloading. What is cool about the download is that it is 100% functional on your desktop as well as on the ’Net. Study carrels do not require the ’Net to be operational; study carrels are manifested as plain text files, are stand-alone items, and will endure the test of time:

“But wait. There’s more!”

It is not possible for me to create a Web-based interface empowering students, researchers, or scholars to answer any given research question. There are too many questions. On the other hand, since the study carrels are “structured”, one can write more sophisticated applications against the data. That is what the Reader Toolbox and Reader (Jupyter) Notebooks are for. Using the Toolbox and/or the Notebooks the student, researcher, or scholar can do all sorts of things:

  • download carrels from the Reader’s library
  • extract ngrams
  • do concordancing
  • do topic modeling
  • create a full text index
  • output all sentences containing a given word
  • find all people, use the ‘Net to get birth date and death dates, and create a timeline
  • find all places, use the ‘Net to get locations, and plot a map
  • articulate an “interesting” idea, and illustrate how that idea ebbed & flowed over time
  • play hangman, do a crossword puzzle, or play a hidden word search game
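
To make one of these operations concrete, here is a minimal sketch of bigram extraction from a carrel's plain text, written against the standard library rather than the Reader Toolbox itself; the file path is a hypothetical example and the tokenization is deliberately crude.

import re
from collections import Counter

# Hypothetical path to one of the plain text files in a study carrel.
TEXT_FILE = "carrel/txt/article-001.txt"

with open(TEXT_FILE, encoding="utf-8") as handle:
    words = re.findall(r"[a-z']+", handle.read().lower())

# Pair each word with its successor to form bigrams, then count them.
bigrams = Counter(zip(words, words[1:]))

for (first, second), count in bigrams.most_common(10):
    print(first, second, count)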

Finally, the Reader is by no means perfect. “Software is never done. If it were, then it would be called ‘hardware’.” Ironically though, the hard part about the Reader is not interpreting the result. The hard part is two other things. First, in order to use the Reader effectively, a person needs to have a (research) question in mind. The question can be as simple as “What are the characteristics of the given corpus?” Or, they can be as sublime as “How does St. Augustine define love, and how does his definition differ from Rousseau’s?”

Just as difficult is the creation of the corpus to begin with. For example, I needed to get just the PDF versions of your journal, but the website (understandably) is covered with about pages, navigation pages, etc. Listing the URLs of the PDF files was not difficult, but it was a bit tedious. Again, that is not your fault. In fact, your site was (relatively) easy. Some places seem to make it impossible to get to the content. (Sometimes I think the Internet is really one huge advertisement.)

Okay. That was plenty!

Your journal was a good use-case. Thank you for the fodder.

Oh, by the way, the Reader is located at https://distantreader.org, and it is available for use by anybody in the world.

Dogecoin Disrupts Bitcoin! / David Rosenthal

Two topics I've posted about recently, Elon Musk's cult and the illusory "prices" of cryptocurrencies, just intersected in spectacular fashion. On April 14 the Bitcoin "price" peaked at $63.4K. Early on April 15, the Musk cult saw this tweet from their prophet. Immediately, the Dogecoin "price" took off like a Falcon 9.

A day later, Jemima Kelly reported that If you believe, they put a Dogecoin on the moon. That is to say:
Dogecoin — the crypto token that was started as a joke and that is the favourite of Elon Musk — is having a bit of a moment. And when we say a bit of a moment, we mean that it is on a lunar trajectory (in crypto talk: it is going to da moon).

At the time of writing this, it is up over 200 per cent in the past 24 hours — more than tripling in value (for those of you who need help on percentages, it is Friday afternoon after all). Over the past week it’s up more than 550 per cent (almost seven times higher!).
The headlines tell the story — Timothy B. Lee's Dogecoin has risen 400 percent in the last week because why not and Joanna Ossinger's Dogecoin Rips in Meme-Fueled Frenzy on Pot-Smoking Holiday.

The Dogecoin "price" graph Kelly posted was almost vertical. The same day, Peter Schiff, the notorious gold-bug, tweeted:
So far in 2021 #Bitcoin has lost 97% of its value verses #Dogecoin. The market has spoken. Dogecoin is eating Bitcoin. All the Bitcoin pumpers who claim Bitcoin is better than gold because its price has risen more than gold's must now concede that Dogecoin is better than Bitcoin.
Below the fold I look back at this revolution in crypto-land.

I'm writing on April 21, and the Bitcoin "price" is around $55K, about 87% of its peak on April 14. In the same period Dogecoin's "price" peaked at $0.37, and is now around $0.32, or 267% of its $0.12 "price" on April 14. There are some reasons for Bitcoin's slump apart from people rotating out of BTC into DOGE in response to Musk's tweet. Nivesh Rustgi reports:
Bitcoin’s hashrate dropped 25% from all-time highs after an accident in the Xinjiang region’s mining industry caused flooding and a gas explosion, leading to 12 deaths with 21 workers trapped since.
...
The leading Bitcoin mining data centers in the region have closed operations to comply with the fire and safety inspections.

The Chinese central authority is conducting site inspections “on individual mining operations and related local government agencies,” tweeted Dovey Wan, partner at Primitive Crypto.
...
The accident has reignited the centralization problems arising from China’s dominance of the Bitcoin mining sector, despite global expansion efforts.
The drop in the hash rate had the obvious effects. David Gerard reports:
The Bitcoin hash rate dropped from 220 exahashes per second to 165 EH/s. The rate of new blocks slowed. The Bitcoin mempool — the backlog of transactions waiting to be processed — has filled. Transaction fees peaked at just over $50 average on 18 April.
The average BTC transaction fee is now just short of $60, with a median fee over $26! The BTC blockchain did around 350K transactions on April 15, but on April 16 it could only manage 190K.

It is also true that DOGE had upward momentum before Musk's tweet. After being nearly flat for almost a month, it had already doubled since April 6.

Kelly quotes David Kimberley at Freetrade:
Dogecoin’s rise is a classic example of greater fool theory at play, Dogecoin investors are basically betting they’ll be able to cash out by selling to the next person wanting to invest. People are buying the cryptocurrency, not because they think it has any meaningful value, but because they hope others will pile in, push the price up and then they can sell off and make a quick buck.

But when everyone is doing this, the bubble eventually has to burst and you’re going to be left short-changed if you don’t get out in time. And it’s almost impossible to say when that’s going to happen.
Kelly also quotes Khadim Shubber explaining that this is all just entertainment:
Bitcoin, and cryptocurrencies in general, are not directly analogous to the fairly mundane practice of buying a Lottery ticket, but this part of its appeal is often ignored in favour of more intellectual or high-brow explanations.

It has all the hallmarks of a fun game, played out across the planet with few barriers to entry and all the joy and pain that usually accompanies gambling.

There’s a single, addictive reward system: the price. The volatility of cryptocurrencies is often highlighted as a failing, but in fact it’s a key part of its appeal. Where’s the fun in an asset whose price snoozes along a predictable path?

The rollercoaster rise and fall and rise again of the crypto world means that it’s never boring. If it’s down one day (and boy was it down yesterday) well, maybe the next day it’ll be up again.
Note the importance of volatility. In a must-read interview that New York Magazine entitled BidenBucks Is Beeple Is Bitcoin, Prof. Scott Galloway also stressed the importance of volatility:
Young people want volatility. If you have assets and you’re already rich, you want to take volatility down. You want things to stay the way they are. But young people are willing to take risks because they can afford to lose everything. For the opportunity to double their money, they will risk losing everything. Imagine a person who has the least to lose: He’s in solitary confinement in a supermax-security prison. That person wants maximum volatility. He prays for such volatility, that there’s a revolution and they open the prison.

People under the age of 40 are fed up. They have less than half of the economic security, as measured by the ratio of wealth to income, that their parents did at their age. Their share of overall wealth has crashed. A lot of them are bored. A lot of them have some stimulus money in their pocket. And in the case of GameStop, they did what’s kind of a mob short squeeze.
...
I see crypto as a mini-revolution, just like GameStop. The central banks and governments are all conspiring to create more money to keep the shareholder class wealthy. Young people think, That’s not good for me, so I’m going to exit the ecosystem and I’m going to create my own currency.
This all reinforces my skepticism about the "price" and "market cap" of cryptocurrencies.

Update 5th May 2021: Jemima Kelly returns to the subject with Dogecoin really is man’s best friend:
Doge’s performance over the past few months has made the bitcoin maximalists and other earnest crypto types who try to convince the world that crypto is a serious asset class very mad, though they want to pretend it hasn’t. Because, it does make them look a little silly doesn’t it?

But given that markets have been a joke for the past year or so, why shouldn’t a joke coin come out on top? In a world where nothing makes sense, this actually . . . kinda makes sense.

Documentation Interest Group Sprint Wrap Up / Islandora


The Islandora Documentation Interest Group (DIG) held a sprint from April 19-30, 2021 in order to audit the current Islandora documentation in preparation for a second sprint aimed at editing and improving those docs for the upcoming release. After two weeks of work, almost all of the docs have been audited and reviewed, and the DIG has plans for the sections that remain incomplete. 

Big thanks to our team of volunteers, especially the University of Texas at Austin, which was very well represented!

  • Melissa Anez (LYRASIS)
  • Amy Blau (Whitman College)
  • Melanie Cofield (University of Texas Libraries)
  • Katie Coldiron (UT Austin School of Information)
  • Morgan A. Colbert (University of Texas Libraries)
  • Cary Gordon (Cherry Hill Company)
  • Mirko Hanke (University of Texas Libraries)
  • David Kwasny (University of Toronto Scarborough)
  • Mandy Ryan (University of Texas Libraries)
  • Yamil Suarez (Berklee College)

Next up will be a sprint focused on writing and editing the audited docs, which will run from May 19-30, 2021. Please sign up now to join us!

Code Sprint Wrap Up / Islandora


This past week we had the pleasure of wrapping up a 2-week code sprint where we accomplished a massive list of things and made great headway toward release! I'd like to take a moment to thank everyone who took part. Y'all turned on the fire hose for contributions and we made out like bandits. 

Here's a breakdown of everything we received:

And as if that weren't enough amazing contributions, we're still sitting on a pile of more very valuable ones that haven't been merged yet, specifically:

I know we were supposed to freeze the code on Monday, but I'd like to give folks some more time to review and respond to feedback on these outstanding issues. We will revisit at the end of the week to wrap things up. It feels foolish to leave such great contributions on the table when they could be part of the software in the upcoming release.

Regardless, when you step back and take a look at the volume and quality of contributions that we received in such a short time frame, it's pretty staggering. I'd like to give a BIG thanks to everyone involved and their employers who graciously allowed them to work on Islandora while on the clock.

  • Eli Zoller (Arizona State University)
  • Willow Gillingham (Arizona State University)
  • Seth Shaw (University of Nevada Las Vegas)
  • Alan Stanley
  • Don Richards (Born Digital)
  • Mark Jordan (Simon Fraser University)
  • Alexander O'Neill (University of Prince Edward Island)
  • Nigel Banks (Lyrasis)
  • Kristina Spurgin (Lyrasis) and the Metadata Interest Group, whose recommendations are being implemented

Thanks a million!

Join us for the Total Cost of Stewardship / #OCLC_TCoS Twitter Chat! / HangingTogether

[Image: Talking Black Birds, from the Noun Project]

On May 20th from noon-1pm PDT / 3-4pm EDT / 8-9pm BST, we are hosting a Twitter chat inspired by our recent report Total Cost of Stewardship: Responsible Collection Building in Archives and Special Collections. We’ll discuss your thoughts on the ideas presented in the report, resource-sensitive collecting in archives and special collections, and how you might approach conversations or strategies in your own institution to collect in a way that is resource-sensitive. We really hope you will join us!

I’ll offer details here about the plans for our Twitter chat, as well as some information about how Twitter chats generally work, in case you’re joining one for the first time or could use a refresher.

Questions for Discussion

Over the course of an hour, participants will introduce themselves and then discuss four questions. I’ll be abbreviating Total Cost of Stewardship to TCoS and we’ll be using the hashtag #OCLC_TCoS. I’ll also be hosting the chat, so I’ll tweet out each of the questions, including the question number, and the #OCLC_TCoS hashtag.

These are the questions we’ll discuss over the course of the hour:

  • Question 1: What does collecting look like where you work? Are there clear guiding principles for collecting? Who makes decisions? Are you a part of the decision-making process, and if so, how? #OCLC_TCoS
  • Question 2: After reading the TCoS report, does working in a way that is aligned with the TCoS framework make sense to you? Where do you think the greatest areas of need are at your institution to start working toward that goal? #OCLC_TCoS
  • Question 3: Have you had any conversations with colleagues about TCoS since reading the report, or would you like to? Tell us about the reaction — what you talked about or what you want to talk about? #OCLC_TCoS
  • Question 4: What challenges do you think you might have, and what strategies are you thinking about employing, to move toward a more resource-sensitive approach to collection building and stewardship at your institution? #OCLC_TCoS

How Does a Twitter Chat Work?

Twitter chats are planned conversations that happen via the Twitter platform. The host tweets out a series of questions using an agreed upon hashtag, in our case #OCLC_TCoS. Participants then answer those questions, using the answer number and hashtag. By using the hashtag, everyone can follow along in real time, seeing both questions and answers from anyone participating, and replying to others in the conversation.

Our chat will last an hour. I’ll welcome everyone to the chat and ask those participating to introduce themselves. We’ll then move into the questions outlined above. I’ll pose them one at a time, about 10-12 minutes apart, so you have time to reply and read and respond to others before another question rolls in. I will number the questions and include the hashtag, and anyone responding can number their corresponding answer to make it easier to follow along. So an example would be:

  • Q1: Would you please tell us about your favorite cardigan? #OCLC_TCoS
  • A1: My very favorite is a vintage navy cashmere cardigan that is extra soft and warm. #OCLC_TCoS

You’ll be able to follow along with the chat in real-time by clicking this link. The results there will include all tweets with the #OCLC_TCoS hashtag, allowing you to see our questions and responses by other users as they happen. Alternatively, you can search the #OCLC_TCoS hashtag on the Twitter interface to follow along – just make sure you are looking at “Most Recent” tweets to see the full conversation.

You can answer some or all of the questions, or just follow along via the hashtag to see the conversation. The value of the conversation comes from the experience and ideas of the group, so we invite you to jump in and participate as you feel comfortable. This will be my first time hosting a Twitter chat, so if it is your first time participating in one, you will be in good company!

I am really looking forward to this conversation and hope you will join in! If you have any questions, feel free to contact me using the contact information in my profile below.

The post Join us for the Total Cost of Stewardship / #OCLC_TCoS Twitter Chat! appeared first on Hanging Together.

AMIA+DLF Cross-Pollinator Reflection: Justine Thomas / Digital Library Federation

This post was written by Justine Thomas, who received the AMIA+DLF Cross-Pollinator Registration Award to attend the virtual AMIA Spring 2021 conference.

Justine (@JustineThomasM) is currently a Digital Programs Contractor at the National Museum of American History (NMAH) focusing on digital asset management and collections information support. Prior to graduating in 2019 with a Master’s in Museum Studies from the George Washington University, Justine worked at NMAH as a collections processing intern in the Archives Center and as a Public Programs Facilitator encouraging visitors to discuss American democracy and social justice issues.


 

“Restoration is the rejuvenation of the soul.” This powerful quote from AMIA’s opening plenary set the tone for the conference’s engaging lessons on restoration, preservation, and sustainability in the archival field. Over the course of the conference, AMIA, the Association of Moving Image Archivists, hosted several informative sessions about the successes and challenges facing the audiovisual field, as well as AMIA’s organizational role in supporting institutions. These sessions included discussions about audiovisual preservation, digitization projects during the pandemic, the state of the archival world ten years in the future, and even virtual film archive tours.

I attended AMIA’s conference as a DLF Cross-Pollinator and was excited to see what other institutions are doing to enable access to digital audiovisual collections, reach diverse audiences, and facilitate meaningful conversations. As an aspiring digital archivist and emerging professional in the field, it was impactful to hear from a wide breadth of moving image archivists at different points in their careers, sharing successful preservation projects. I especially liked learning more about projects such as the Merry Pranksters Audiovisual collection, which was a ten-year endeavor preserving films from the wild road trips and partying days of Ken Kesey, author of One Flew Over the Cuckoo’s Nest. The presenters showed the complete lifecycle of the Merry Pranksters archival collection – from initial preservation efforts, to cataloging description models and the completion of a finding aid, commercial licensing requests, and the Magic Trip documentary produced in 2011 using footage from the collection. While this collection had its share of preservation woes working with fading and rapidly degrading materials, the collections staff’s thoughtful and deliberate conservation efforts resulted in a well-documented and restored film collection that is publicly accessible for generations to come.

Aside from discussing the essential and timely process of restoring moving images, one of the most poignant points the conference emphasized was the lack of diversity in the archival field and how organizations like AMIA can and should encourage meaningful change in the profession. From the opening plenary until the closing session of the conference, it was clear that AMIA wanted to highlight the importance of working through uncomfortable and difficult conversations to encourage true dialogue supporting diversity, equity, inclusion, and access. Speakers like Pamela Vizner and Michael Pazmino led the powerful opening discussion about confronting the widespread issues of inequity in the underfunded field of audiovisual preservation, which puts restrictions on who can afford to be in the archival profession. This inequity creates borders in moving image archiving, which only adds to the difficulty underprivileged individuals face when breaking into the field. To facilitate discussion about these problematic practices, AMIA created Borderlands, specific sessions to showcase conversations on how the field of moving image archiving can operate across borderlands, in hopes of becoming a more collaborative and equitable profession in the future.

One of the most notable examples of collaboration across borderlands was the screening of the film Star Wars: A New Hope, dubbed in the Navajo language. Manuelito Wheeler of the Navajo Nation and former director of the Navajo Nation Museum discussed the process of creating the dubbed film to promote and preserve the Navajo culture and language. He described the critical moment indigenous communities are facing with the rapid extinction of native languages and the immense cultural loss the Navajo Nation would face if the language is not saved. The ability to showcase a film as culturally significant as Star Wars in the Navajo language was an immense success and taught generations of Navajo individuals the fundamentals of the Navajo language and the necessity of preservation efforts.

These imperative presentations about borderlands and inequity in the profession were the most powerful aspects of my conference experience. They showed that even in a virtual environment, organizations could actively attempt to highlight the problematic and systemic issues in the field and offer spaces to discuss the changes professional organizations can make and projects institutions can take on to support diverse communities. I hope to attend subsequent AMIA conferences and look forward to hearing more about future borderlands conversations and opportunities for collaboration.

The post AMIA+DLF Cross-Pollinator Reflection: Justine Thomas appeared first on DLF.

Next-generation metadata and the semantic continuum / HangingTogether

OCLC metadata discussion series

Many thanks to Titia van der Werf, OCLC, for writing the majority of this blog post.  

On 13 April 2021, the Closing Plenary webinar took place to synthesize the OCLC Research Discussion Series on Next Generation Metadata. Senior OCLC colleagues Andrew Pace (Technical Research), Rachel Frick (Research Library Partnership), and John Chapman (Metadata Services) were invited to respond and react to the findings. 

The series 

The “Transitioning to the Next Generation of Metadata” report served as background reading and inspiration for the eight virtual round table discussions that were held in six different European languages, throughout the month of March. The main discussion question: “How do we make the transition to the next generation of metadata happen at the right scale and in a sustainable manner, building an interconnected ecosystem, not a garden of silos?” struck a chord with most participants and made for lively conversations. 

In total, 86 participants from 17 countries – mostly Europe, but also a few from the Middle East and North Africa – contributed to the round table discussions. They represented 71 different institutions, of which 48 came from university libraries. Hosting the round tables in local languages was conducive to having deep and meaningful interactions, and that was much appreciated. Attendees found the discussions inspiring, they liked comparing experiences, the rich reflections, and the free format of the conversations. Many found the sessions of 90 minutes too short, and asked for follow up sessions. Some thought it would be good to have more decision makers at the table, and other stakeholders. A summary of each discussion has been shared via posts on HangingTogether.org (the blog of OCLC Research), and all are available in the language of the session as well: 

  1. First English round table: Towards a critical mass of interoperable library data
  2. Italian round table: Interoperability, Sustainability and More
  3. Second English round table: Silos and other challenges
  4. French round table: The challenge lies in managing multiple, co-existing ‘right scales’
  5. German round table: Formats, contexts and deficits
  6. Spanish round table: Managing researcher identities is top of mind
  7. Third English round table: Investing in the utility of authorities and identifiers
  8. Dutch round table: Think bigger than NACO and WorldCat

Predominant conversation threads 

All the sessions started with a mapping exercise. We asked attendees to place next generation metadata projects they were aware of on a 2×2 matrix characterizing three different application areas: 1) bibliographic data and the supply chain, 2) cultural heritage data and 3) research information management (RIM) data and scholarly communications. The collage of the eight maps that came out of the round tables shows that cultural heritage data projects and bibliographic data projects were predominant, reflecting the focus and expertise of the attendees. There were few RIM projects on the maps. All the maps showed interesting clusters of post-it notes relating to persistent identifiers (PIDs) – such as ISNI, ORCID, VIAF, DOI – which demonstrated the importance attributed to them by participants. 

Image: Collage of the next generation metadata maps from the 8 round table discussions

Collaborating to produce and publish authority data 

From the conversations, we learned that libraries are strongly investing in the transformation and publication of their authority files (both the name authorities and the subject headings) to leverage them as next generation metadata. The open government data policies are driving this focus. We also heard that these policies are inciting collaboration at the national level to maximize the benefits of centralization, normalization, and efficiency of data production and publication. The collaboration between the two largest producers of library data in France, BnF and Abes, as mentioned in the French session, is an exemplar of this. 

Achieving a critical mass of authority data 

In an effort to achieve critical mass, libraries are intentionally feeding external systems with their authority data – e.g., the University’s Research Portal, the ORCID database, Wikidata, etc. They are also embedding PIDs in the value chain – this is particularly true for libraries and bibliographic agencies that act as registration agencies for identifiers, such as ISNI, and who operate in the context of their national bibliography and legal deposit tasks. Some of them, like the British Library, pro-actively encourage the adoption of ISNIs upstream in publishers’ metadata records and downstream through reconciliation with VIAF and LC/NACO files – so that the ISNIs become part of the libraries’ cataloging workflow.

Where to let go of control 

At several sessions, we heard concern about the large numbers of cultural heritage projects that create separate and dedicated ontologies and vocabularies, which then remain isolated and are of limited value to others. There were many observations about duplication of effort and a reluctance to refer to data that has already been defined by others. The key question is: where to let go of control, and where to focus and retain it? The control issue is also one of organization and governance. There is a growing sense of the need to negotiate with the different bibliographic data stakeholders and parties to agree on who does what in this newly emerging ecosystem.

How to participate in the connecting game?  

We heard that harvesting, cleaning, normalizing, reconciling, and transforming data at scale – what aggregators do – is still important during the transition period, but libraries want the enriched data, or at least the identifiers, to flow back to them so they can participate in the connecting game. They also believe that decentralizing the workflow would allow them to better leverage local expertise. There was much enthusiasm about the many opportunities of linked data, like the ability to connect different languages, to link to more granular information than authority files provide, and to automate subject indexing and named entity disambiguation with the help of AI technologies.

Managing multiple scales into a semantic continuum 

Finally, there is no such thing as one “right scale” for doing linked data. There are many different reasons that justify the choices institutions make for the scale of their workflows: local expertise, efficiency, convenience, national policies, consortial economies of scale, differences between humanities, social science, and hard science data, etc. 

Image: The semantic continuum

Andrew Pace described the challenge of managing these multiple scales as “bridging the effort between the short tail and the long tail”, in other words, between scaled effort and localized domain and collection expertise. He explained that to achieve a ‘semantic continuum’:  

“We balance large, shared, homogenous collections and data, while accounting for a myriad of de-centralized and heterogenous collections. We improve machine-learning and scaled reconciliation with the necessary tools for the dedicated knowledge work that happens in libraries. We can start in the big spaces involving person names and bibliographic works, while acknowledging and preparing for the more difficult work ahead like concepts and aboutness. And we can prepare for the pending paradigm shift that comes with blending bibliographic and authority work together and the challenges of balancing object description with an increase in contextual description. And across this continuum, we know that a large centralized infrastructure is needed and that custom applications will enhance the effort.”

Ongoing professional development and training needs 

With the paradigm shift, we need to prepare for a new kind of knowledge work in metadata management, discovery, and access in libraries. During the round table discussions, one thread running through all the conversations was transitioning from the old to the new, or rather, the question of “How to build the new while still having to maintain the old and the established?” We know that the systems and services required are not ready and we know that there will be ongoing professional development and training needs.  

Rachel Frick addressed the skills question and distinguished between the need for 1) practitioners skilled to implement, 2) managers understanding the opportunities, and 3) leaders recognizing the priority. She pointed to OCLC’s programs that support library metadata upskilling needs and active learning, namely:

  • WebJunction Course Catalog, which offers library specific courses and webinar recordings, for free, to all library workers and volunteers; 
  • OCLC Research Library Partnership Metadata Managers Focus Group, which offers an opportunity to engage with peers who are responsible for creating and managing metadata; and 
  • OCLC Community Center, which offers a community space for exchanging on cataloging and metadata issues and practices. 

Q&A on the OCLC Shared Entity Management Infrastructure 

During the round table discussions and the closing plenary webinar, participants shared their expectations, interest and questions about the Shared Entity Management Infrastructure (SEMI).  

John Chapman took the opportunity to provide some additional insights on aspects of interest to the participants. The goal of SEMI is to address infrastructure needs identified by libraries during past efforts such as Project Passage and the CONTENTdm Linked data pilot, and in conversations with the OCLC Research Library Partnership.  

To make library linked data workflows more effective, and to deliver on both sides of the “semantic continuum” that Andrew Pace described, OCLC has been building a new infrastructure. This effort is funded in part by a grant from the Andrew W. Mellon Foundation.  A first version will be operational by the end of 2021, with plans to explore integration with other OCLC services and applications next.  

To respond to questions concerning the business model, John explained that, in their grant award, the Mellon Foundation specified that OCLC provide free access to data, while also providing valuable services that earn the revenue required to keep the infrastructure sustainable. To that end, OCLC will be publishing entity data as linked open data, while also providing subscription access to user interfaces and APIs to work with the data. As with VIAF, there will be public-facing information on each identifier, so libraries can have a common reference for the entity URIs.

Continuing the conversation 

It has been delightful to organize the OCLC Research Discussion Series on next generation metadata with such inspiring participation. We received invitations to organize follow-up conversations on this topic regionally. You can also revisit all the content from the series by going to the event page.

We also plan to repeat the OCLC Research Series in the EMEA region next year, on another topic. So, stay tuned and thank you all for your contributions! 

The post Next-generation metadata and the semantic continuum appeared first on Hanging Together.

Sandwich 🥪 / Ed Summers

Last night I noticed Darius Kazemi post about a little web page that displays a random sandwich from a list of Sandwiches on Wikipedia. It tickled me, probably because the page is so simple (view the source), and also because the results were so delightfully varied, within such a narrow domain. Also, I guess I was hungry…

It just so happens that for some things at work I’ve been thinking about Wikidata a bit more recently. For the past year or so I’ve been lurking on the mailing list and in various get-togethers put on by the LD4-Wikidata Affinity Group–who are doing some great work connecting Linked Data to the actual work of libraries, museums and archives. I also attended a really inspiring event from Rhizome this week about the relaunch of their Artbase, which uses Wikibase, the open source software that powers Wikidata. They are usually really good about putting their online events up on their Vimeo channel, so keep an eye out for it there if you are interested.

So, naturally, it seemed like a fun little Friday evening project to adapt Darius’ JavaScript to pull the sandwiches from Wikidata using their query service. You can see the result here.

https://edsu.github.io/sandwich/

My initial motivation was to see if the query service’s CORS headers were set up properly to allow HTTP traffic from the browser (they are) and to test my ability to craft a good enough SPARQL query. But it also turned out to be an interesting exercise in how to send people over to Wikidata to improve things, since the descriptions of sandwiches in Wikidata are not as good as the ones found in Wikipedia proper, and you can link directly to a page to edit them in Wikidata.

Crafting the SPARQL query was quite a bit easier than I thought it would be because they have a useful Examples tab, where I typed sandwiches and up came the start of a query I could work with:

There are currently 345 of these example queries and you can even add your own by editing this wiki page. After a few minutes of noodling around I arrived at this query:

# Find kinds of sandwich: items that are a subclass of sandwich (Q28803),
# along with their country of origin and an image.
SELECT ?item ?itemLabel ?itemDescription ?countryLabel ?image
WHERE
{
  ?item wdt:P279 wd:Q28803.  # subclass of sandwich
  ?item wdt:P495 ?country.   # country of origin
  ?item wdt:P18 ?image.      # image
  # fill in English labels and descriptions for ?item and ?country
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
}
LIMIT 1000

This wasn’t my first dance with SPARQL, so I won’t pretend this was easy. But it’s really nice to be able to tweak the query examples and see what happens. It seems obvious that a query by example query language would have, well, examples. But Wikidata have done this in a very thoughtful way. It is really nice to be able to query information like this live from the client, so that users of your app can see the latest data that’s available, not some static snapshot of it.
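
The page itself does this with browser-side JavaScript, but as a hedged illustration here is a minimal Python sketch of running the same query against the public query service endpoint and picking a random sandwich; it assumes only the requests library and the standard SPARQL JSON results format, and is not the code behind the page.

import random
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?item ?itemLabel ?itemDescription ?countryLabel ?image WHERE {
  ?item wdt:P279 wd:Q28803.
  ?item wdt:P495 ?country.
  ?item wdt:P18 ?image.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1000
"""

# Ask the query service for SPARQL JSON results; a descriptive User-Agent
# is polite when hitting Wikimedia services.
response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "sandwich-demo (example script)"},
)
response.raise_for_status()

rows = response.json()["results"]["bindings"]
sandwich = random.choice(rows)
print(sandwich["itemLabel"]["value"])
print(sandwich.get("itemDescription", {}).get("value", "(no description)"))
print(sandwich["image"]["value"])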

But of course that comes with risks too, if the service is offline, if the metadata structure changes, if the entity is defaced or deleted. None of these things really matter for this toy page, but could for something more solid. I’ve got a bit more to say about this, but I’m going to save it for another post.

Fedora Migration Paths and Tools Project Update: April 2021 / DuraSpace News

This is the seventh in a series of monthly updates on the Fedora Migration Paths and Tools project – please see last month’s post for a summary of the work completed up to that point. This project has been generously funded by the IMLS.

Born Digital has set up both staging and production servers for the Whitman College pilot team, including the new theme and configurations. This has allowed the grant team to begin running sample ingests using Islandora Workbench. The samples are composed of representative content selected from the Whitman College Islandora 7 instance, which will allow the team to verify that everything is working as intended before proceeding to production migrations. Based on the sample content the team is also able to proceed with a review of the functional requirements noting where there are any gaps.

The University of Virginia pilot team is ready to begin validating their migrated content using the migration validation utility. However, the current migration is taking longer than expected due to additional checksum validation introduced in the Beta version of migration-utils, along with slow transfer speeds to their AWS instance. The team will run a parallel migration with checksum validation disabled to benchmark the performance differences for the community. Fortunately, checksums can still be validated after a migration completes by using the migration validation utility.

Based on the grant project work so far, an online Fedora 3.x to Fedora 6.0 migration workshop has been scheduled for mid-June. This workshop, pitched at an intermediate technical level, will provide an overview of migration-utils and the migration validation utility, along with an opportunity to try these tools hands-on. The workshop will be free, but Fedora members will have an opportunity to register in advance of the public registration announcement. A recording of the workshop will be made publicly available afterward. Join the Fedora mailing list to get announcements about this and other events, and join as a member to support our work.

Next month we hope to complete test migrations on the Whitman College staging server and move on to migrating collections onto the production server. We’ll also conduct a review of functional requirements and identify any significant gaps. Meanwhile, the University of Virginia pilot team will use the validation utility to verify that their Fedora 6.x content has been faithfully migrated from their Fedora 3.x repository before loading the data into a Fedora 6.0 Beta instance to test against their functional requirements.

Stay tuned for future updates!

The post Fedora Migration Paths and Tools Project Update: April 2021 appeared first on Duraspace.org.

Venture Capital Isn't Working / David Rosenthal

I was an early employee at three VC-funded startups from the 80s and 90s. All of them IPO-ed and two (Sun Microsystems and Nvidia) made it into the list of the top 100 US companies by market capitalization. So I'm in a good position to appreciate Jeffrey Funk's must-read The Crisis of Venture Capital: Fixing America’s Broken Start-Up System. Funk starts:
Despite all the attention and investment that Silicon Valley’s recent start-ups have received, they have done little but lose money: Uber, Lyft, WeWork, Pinterest, and Snapchat have consistently failed to turn profits, with Uber’s cumulative losses exceeding $25 billion. Perhaps even more notorious are bankrupt and discredited start-ups such as Theranos, Luckin Coffee, and Wirecard, which were plagued with management failures, technical problems, or even outright fraud that auditors failed to notice.

What’s going on? There is no immediately obvious reason why this generation of start-ups should be so financially disastrous. After all, Amazon incurred losses for many years, but eventually grew to become one of the most profitable companies in the world, even as Enron and WorldCom were mired in accounting scandals. So why can’t today’s start-ups also succeed? Are they exceptions, or part of a larger, more systemic problem?
Below the fold, some reflections on Funk's insightful analysis of the "larger, more systemic problem".

Funk introduces his argument thus:
In this article, I first discuss the abundant evidence for low returns on VC investments in the contemporary market. Second, I summarize the performance of start-ups founded twenty to fifty years ago, in an era when most start-ups quickly became profitable, and the most successful ones rapidly achieved top-100 market capitalization. Third, I contrast these earlier, more successful start-ups with Silicon Valley’s current set of “unicorns,” the most successful of today’s start-ups. Fourth, I discuss why today’s start-ups are doing worse than those of previous generations and explore the reasons why technological innovation has slowed in recent years. Fifth, I offer some brief proposals about what can be done to fix our broken start-up system. Systemic problems will require systemic solutions, and thus major changes are needed not just on the part of venture capitalists but also in our universities and business schools.

Is There A Problem?

Funk's argument that there is a problem can be summarized thus:
  • The returns on VC investments over the last two decades haven't matched the golden years of the preceding two decades.
  • In the golden years startups made profits.
  • Now they don't.

VC Returns Are Sub-Par

This graph from a 2020 Morgan Stanley report shows that during the 90s the returns from VC investments greatly exceeded the returns from public equity. But since then the median VC return has been below that of public equity. This doesn't reward investors for the much higher risk of VC investments. The weighted average VC return is slightly above that of public equity because, as Funk explains:
a small percentage of investments does provide high returns, and these high returns for top-performing VC funds persist over subsequent quarters. Although this data does not demonstrate that select VCs consistently earn solid profits over decades, it does suggest that these VCs are achieving good returns.
It was always true that VC quality varied greatly. I discussed the advantages of working with great VCs in Kai Li's FAST Keynote:
Work with the best VC funds. The difference between the best and the merely good in VCs is at least as big as the difference between the best and the merely good programmers. At nVIDIA we had two of the very best, Sutter Hill and Sequoia. The result is that, like Kai but unlike many entrepreneurs, we think VCs are enormously helpful.
One thing that was striking about working with Sutter Hill was how many entrepreneurs did a series of companies with them, showing that both sides had positive experiences.

Startups Used To Make Profits

Before the dot-com boom, there used to be a rule that in order to IPO, a company had to be making profits. This was a good rule, since it provided at least some basis for setting the stock price at the IPO. Funk writes:
There was a time when venture capital generated big returns for investors, employees, and customers alike, both because more start-ups were profitable at an earlier stage and because some start-ups achieved high market capitalization relatively quickly. Profits are an important indicator of economic and technological growth, because they signal that a company is providing more value to its customers than the costs it is incurring.

A number of start-ups founded in the late twentieth century have had an enormous impact on the global economy, quickly reaching both profitability and top-100 market capitalization. Among these are the so-called FAANMG (Facebook, Amazon, Apple, Microsoft, Netflix, and Google), which represented more than 25 percent of the S&P’s total market capitalization and more than 80 percent of the 2020 increase in the S&P’s total value at one point—in other words, the most valuable and fastest-growing compa­nies in America in recent years.
Funk's Table 2 shows the years to profitability and years to top-100 market capitalization for companies founded between 1975 and 2004. I'm a bit skeptical of the details because, for example, the table says it took Sun Microsystems 6 years to turn a profit. I'm pretty sure Sun was profitable at its 1986 IPO, 4 years from its founding.

Note Funk's stress on achieving profitability quickly. An important Silicon Valley philosophy used to be:
  • Success is great!
  • Failure is OK.
  • Not doing either is a big problem.
The reason lies in the Silicon Valley mantra of "fail fast". Most startups fail, and the costs of those failures detract from the returns of the successes. Minimizing the cost of failure, and diverting the resources to trying something different, is important.

Unicorns, Not So Much

What are these unicorns? Wikipedia tells us:
In business, a unicorn is a privately held startup company valued at over $1 billion. The term was coined in 2013 by venture capitalist Aileen Lee, choosing the mythical animal to represent the statistical rarity of such successful ventures.
Back in 2013 unicorns were indeed rare, but as Wikipedia goes on to point out:
According to CB Insights, there are over 450 unicorns as of October 2020.
Unicorns are breeding like rabbits, but the picture Funk paints is depressing:
In the contemporary start-up economy, “unicorns” are purportedly “disrupting” almost every industry from transportation to real estate, with new business software, mobile apps, consumer hardware, internet services, biotech, and AI products and services. But the actual performance of these unicorns both before and after the VC exit stage contrasts sharply with the financial successes of the previous generation of start-ups, and suggests that they are dramatically overvalued.

Figure 3 shows the profitability distribution of seventy-three unicorns and ex-unicorns that were founded after 2013 and have released net income and revenue figures for 2019 and/or 2020. In 2019, only six of the seventy-three unicorns included in figure 3 were profitable, while for 2020, seven of seventy were.
Hey, they're startups, right? They just need time to become profitable. Funk debunks that idea too:
Furthermore, there seems to be little reason to believe that these unprofitable unicorn start-ups will ever be able to grow out of their losses, as can be seen in the ratio of losses to revenues in 2019 versus the founding year. Aside from a tiny number of statistical outliers ... there seems to be little relationship between the time since a start-up’s founding and its ratio of losses to revenues. In other words, age is not correlated with profits for this cohort.
Funk goes on to note that startup profitability once public has declined dramatically, and appears inversely related to IPO valuation:
When compared with profitability data from decades past, recent start-ups look even worse than already noted. About 10 percent of the unicorn start-ups included in figure 3 were profitable, much lower than the 80 percent of start-ups founded in the 1980s that were profitable, according to Jay Ritter’s analysis, and also below the overall percentage for start-ups today (20 percent). Thus, not only has profitability dramatically dropped over the last forty years among those start-ups that went public, but today’s most valuable start-ups—those valued at $1 billion or more before IPO—are in fact less profitable than start-ups that did not reach such lofty pre-IPO valuations.
Funk uses electric vehicles and biotech to illustrate startup over-valuation:
For instance, driven by easy money and the rapid rise of Tesla’s stock, a group of electric vehicle and battery suppliers—Canoo, Fisker Automotive, Hyliion, Lordstown Motors, Nikola, and QuantumScape—were valued, combined, at more than $100 billion at their listing. Likewise, dozens of biotech firms have also achieved billions of dollars in market capitalizations at their listings. In total, 2020 set a new record for the number of companies going public with little to no revenue, easily eclipsing the height of the dot-com boom of telecom companies in 2000.
The Alphaville team have been maintaining a spreadsheet of the EV bubble. They determined that there was no way these companies' valuations could be justified given the size of the potential market. Jamie Powell's April 12th Revisiting the EV bubble spreadsheet celebrates their assessment:
At pixel time the losses from their respective peaks from all of the electric vehicle, battery and charging companies on our list total some $635bn of market capitalisation, or a fall of just under 38 per cent. Ouch.

What Is Causing The Problem

This all looks like too much money chasing too few viable startups, and too many me-too startups chasing too few total available market dollars.

Funk starts his analysis of the causes of poor VC returns by pointing to the obvious one, one that applies to any successful investment strategy. Its returns will be eroded over time by the influx of too much money:
There are many reasons for both the lower profitability of start-ups and the lower returns for VC funds since the mid to late 1990s. The most straightforward of these is simply diminishing returns: as the amount of VC investment in the start-up market has increased, a larger proportion of this funding has necessarily gone to weaker opportunities, and thus the average profitability of these investments has declined.
But the effect of too much money is even more corrosive. I'm a big believer in Bill Joy's Law of Startups — "success is inversely proportional to the amount of money you have". Too much money allows hard decisions to be put off. Taking hard decisions promptly is key to "fail fast".

Nvidia was an example of this. The company was founded in one of Silicon Valley's recurring downturns. We were the only hardware company funded in that quarter. We got to working silicon on a $2.5M A round. Think about it — each of our VCs invested $1.25M to start a company currently valued at $380,000M. Despite delivering ground-breaking performance, as I discussed in Hardware I/O Virtualization, that chip wasn't a success. But it did allow Jen-Hsun Huang to raise another $6.5M. He down-sized the company by 2/3 and got to working silicon of the highly successful second chip with, IIRC, six weeks' money left in the bank.

Funk then discusses a second major reason for poor performance:
A more plausible explanation for the relative lack of start-up successes in recent years is that new start-ups tend to be acquired by large incumbents such as the FAAMNG companies before they have a chance to achieve top 100 market capitalization. For instance, YouTube was founded in 2004 and Instagram in 2010; some claim they would be valued at more than $150 billion each (pre-lockdown estimates) if they were independent companies, but instead they were acquired by Google and Facebook, respectively. In this sense, they are typical of the recent trend: many start-ups founded since 2000 were subsequently acquired by FAAMNG, including new social media companies such as GitHub, LinkedIn, and WhatsApp. Likewise, a number of money-losing start-ups have been acquired in recent years, most notably DeepMind and Nest, which were bought by Google.
But he fails to note the cause of the rash of acquisitions, which is clearly the total Lack Of Anti-Trust Enforcement in the US. As with too much money, the effects of this lack are more pernicious than at first appears. Again, Nvidia provides an example.

Just like the founders and VCs of Sun, when we started Nvidia we knew that the route to an IPO and major return on investment involved years and several generations of product. So, despite the limited funding and with the full support of our VCs, we took several critical months right at the start to design an architecture for a family of successive chip generations based on Hardware I/O Virtualization. By ensuring that the drivers in application software interacted only with virtual I/O resources, the architecture decoupled the hardware and software release cycles. The strong linkage between them at Sun had been a consistent source of schedule slip.

The architecture also structured the implementation of the chip as a set of modules communicating via an on-chip network. Each module was small enough that a three-person team could design, simulate and verify it. The restricted interface to the on-chip network meant that, if the modules verified correctly, it was highly likely that the assembled chip would verify correctly.

Laying the foundations for a long-term product line in this way paid massive dividends. After the second chip, Nvidia was able to deliver a new chip generation every 6 months like clockwork. Six months after we started Nvidia, we knew of over 30 other startups addressing the same market. Only one, ATI, survived the competition with Nvidia's 6-month product cycle.

It would now be hard to persuade VCs that the return on the initial time and money needed to build a company that could IPO years later is worth it, compared to lashing together a prototype and using it to sell the company to one of the FAANMGs. In many cases, simply recruiting a team that could credibly promise to build the prototype would be enough for an "acqui-hire", where a FAANMG buys a startup not for the product but for the people. Building the foundation for a company that can IPO and make it into the top-100 market cap list is no longer worth the candle.

But Funk argues that the major cause of lower returns is this:
Overall, the most significant problem for today’s start-ups is that there have been few if any new technologies to exploit. The internet, which was a breakthrough technology thirty years ago, has matured. As a result, many of today’s start-up unicorns are comparatively low-tech, even with the advent of the smartphone—perhaps the biggest technological breakthrough of the twenty-first century—fourteen years ago. Ridesharing and food delivery use the same vehicles, drivers, and roads as previous taxi and delivery services; the only major change is the replacement of dispatchers with smartphones. Online sales of juicers, furniture, mattresses, and exercise bikes may have been revolutionary twenty years ago, but they are sold in the same way that Amazon currently sells almost everything. New business software operates from the cloud rather than onsite computers, but pre-2000 start-ups such as Amazon, Google, and Oracle were already pursuing cloud computing before most of the unicorns were founded.
Remember, Sun's slogan in the mid 80s was "The network is the computer"!

Virtua Fighter on NV1
In essence, Funk argues that successful startups out-perform by being quicker than legacy companies to exploit the productivity gains made possible by a technological discontinuity. Nvidia was an example of this, too. The technological discontinuity was the transition of the PC from the ISA to the PCI bus. It wasn't possible to do 3D games over the ISA bus, because it lacked the necessary bandwidth. The increased bandwidth of the first version of the PCI bus made it just barely possible, as Nvidia's first chip demonstrated by running Sega arcade games at full frame rate. The advantages startups have against incumbents include:
  • An experienced, high-quality team. Initial teams at startups are usually recruited from colleagues, so they are used to working together and know each other's strengths and weaknesses. Jen-Hsun Huang was well-known at Sun, having been the application engineer for LSI Logic on Sun's first SPARC implementation. The rest of the initial team at Nvidia had all worked together building graphics chips at Sun. As the company grows it can no longer recruit only colleagues, so usually experiences what at Sun was called the "bozo invasion".
  • Freedom from backwards compatibility constraints. Radical design change is usually needed to take advantage of a technological discontinuity. Reconciling this with backwards compatibility takes time and forces compromise. Nvidia was able to ignore the legacy of program I/O from the ISA bus and fully exploit the Direct Memory Access capability of the PCI bus from the start.
  • No cash cow to defend. The IBM-funded Andrew project at CMU was intended to deploy what became the IBM PC/RT, which used the ROMP, an IBM RISC CPU competing with Sun's SPARC. The ROMP was so fast that IBM's other product lines saw it as a threat, and insisted that it be priced not to under-cut their existing product's price/performance. So when it finally launched, its price/performance was much worse than Sun's SPARC-based products, and it failed.
Funk concludes this section:
In short, today’s start-ups have targeted low-tech, highly regulated industries with a business strategy that is ultimately self-defeating: raising capital to subsidize rapid growth and securing a competitive position in the market by undercharging consumers. This strategy has locked start-ups into early designs and customer pools and prevented the experimentation that is vital to all start-ups, including today’s unicorns. Uber, Lyft, DoorDash, and GrubHub are just a few of the well-known start-ups that have pursued this strategy, one that is used by almost every start-up today, partly in response to the demands of VC investors. It is also highly likely that without the steady influx of capital that subsidizes below-market prices, demand for these start-ups’ services would plummet, and thus their chances of profitability would fall even further. In retrospect, it would have been better if start-ups had taken more time to find good, high-tech business opportunities, had worked with regulators to define appropriate behavior, and had experimented with various technologies, designs, and markets, making a profit along the way.
But, if the key to startup success is exploiting a technological discontinuity, and there haven't been any to exploit, as Funk argues earlier, taking more time to "find good, high-tech business opportunities" wouldn't have helped. They weren't there to be found.

How To Fix The Problem?

Funk quotes Charles Duhigg skewering the out-dated view of VCs:
For decades, venture capitalists have succeeded in defining themselves as judicious meritocrats who direct money to those who will use it best. But examples like WeWork make it harder to believe that V.C.s help balance greedy impulses with enlightened innovation. Rather, V.C.s seem to embody the cynical shape of modern capitalism, which too often rewards crafty middlemen and bombastic charlatans rather than hardworking employees and creative businesspeople.
And:
Venture capitalists have shown themselves to be far less capable of commercializing breakthrough technologies than they once were. Instead, as recently outlined in the New Yorker, they often seem to be superficial trend-chasers, all going after the same ideas and often the same entrepreneurs. One managing partner at SoftBank summarized the problem faced by VC firms in a marketplace full of copycat start-ups: “Once Uber is founded, within a year you suddenly have three hundred copycats. The only way to protect your company is to get big fast by investing hundreds of millions.”
VCs like these cannot create the technological discontinuities that are the key to adequate returns on investment in startups:
we need venture capitalists and start-ups to create new products and new businesses that have higher productivity than do existing firms; the increased revenue that follows will then enable these start-ups to pay higher wages. The large productivity advantages needed can only be achieved by developing breakthrough technologies, like the integrated circuits, lasers, magnetic storage, and fiber optics of previous eras. And different players—VCs, start-ups, incumbents, universities—will need to play different roles in each in­dustry. Unfortunately, none of these players is currently doing the jobs required for our start-up economy to function properly.

Business Schools

Success in exploiting a technological discontinuity requires understanding of, and experience with, the technology, its advantages and its limitations. But Funk points out that business schools, not being engineering schools, need to devalue this requirement. Instead, they focus on "entrepreneurship":
In recent decades, business schools have dramatically increased the number of entrepreneurship programs—from about sixteen in 1970 to more than two thousand in 2014—and have often marketed these programs with vacuous hype about “entrepreneurship” and “technology.” A recent Stanford research paper argues that such hype about entrepreneurship has encouraged students to become entrepreneurs for the wrong reasons and without proper preparation, with universities often presenting entrepreneurship as a fun and cool lifestyle that will enable them to meet new people and do interesting things, while ignoring the reality of hard and demanding work necessary for success.
One of my abiding memories of Nvidia is Tench Coxe, our partner at Sutter Hill, perched on a stool in the lab playing the "Road Rash" video game about 2am one morning as we tried to figure out why our first silicon wasn't working. He was keeping an eye on his investment, and providing a much-needed calming influence.

Focus on entrepreneurship means focus on the startup's business model not on its technology:
A big mistake business schools make is their unwavering focus on business model over technology, thus deflecting any probing questions students and managers might have about what role technological breakthroughs play and why so few are being commercialized. For business schools, the heart of a business model is its ability to capture value, not the more important ability to create value. This prioritization of value capture is tied to an almost exclusive focus on revenue: whether revenues come from product sales, advertising, subscriptions, or referrals, and how to obtain these revenues from multiple customers on platforms. Value creation, however, is dependent on technological improvement, and the largest creation of value comes from breakthrough technologies such as the automobile, microprocessor, personal computer, and internet commerce.
The key to "capturing value" is extracting value via monopoly rents. The way to get monopoly rents is to subsidize customer acquisition and buy up competitors until the customers have no place to go. This doesn't create any value. In fact, once the monopolist has burnt through the investors' money, they find they need a return that can only be obtained by raising prices and holding the customers to ransom, destroying value for everyone.

It is true a startup that combines innovation in technology with innovation in business has an advantage. Once more, Nvidia provides an example. Before starting Nvidia, Jen-Hsun Huang had run a division of LSI Logic that traded access to LSI Logic's fab for equity in the chips it made. Based on this experience on the supplier side of the fabless semiconductor business, one of his goals for Nvidia was to re-structure the relationship between the fabless company and the fab to be more of a win-win. Nvidia ended up as one of the most successful fabless companies of all time. But note that the innovation didn't affect Nvidia's basic business model — contract with fabs to build GPUs, and sell them to PC and graphics board companies. A business innovation combined with technological innovation stands a chance of creating a big company; a business innovation with no technology counterpart is unlikely to.

Research

Funk assigns much blame for the lack of breakthrough technologies to Universities:
University engineering and science programs are also failing us, because they are not creating the breakthrough technologies that America and its start-ups need. Although some breakthrough technologies are assembled from existing components and thus are more the responsibility of private companies—for instance, the iPhone—universities must take responsibility for science-based technologies that depend on basic research, technologies that were once more common than they are now.
Note that Funk accepts as a fait accompli the demise of corporate research labs, which certainly used to do the basic research that led not just to Funk's examples of "semiconductors, lasers, LEDs, glass fiber, and fiber optics", but also, for example, to packet switching, and operating systems such as Unix. As I did three years ago in Falling Research Productivity, he points out that increased government and corporate funding of University research has resulted in decreased output of breakthrough technologies:
Many scientists point to the nature of the contemporary university research system, which began to emerge over half a century ago, as the problem. They argue that the major breakthroughs of the early and mid-twentieth century, such as the discovery of the DNA double helix, are no longer possible in today’s bureaucratic, grant-writing, administration-burdened university. ... Scientific merit is measured by citation counts and not by ideas or by the products and services that come from those ideas. Thus, labs must push papers through their research factories to secure funding, and issues of scientific curiosity, downstream products and services, and beneficial contributions to society are lost.
Funk's analysis of the problem is insightful, but I see his ideas for fixing University research as simplistic and impractical:
A first step toward fixing our sclerotic university research system is to change the way we do basic and applied research in order to place more emphasis on projects that may be riskier but also have the potential for greater breakthroughs. We can change the way proposals are reviewed and evaluated. We can provide incentives to universities that will encourage them to found more companies or to do more work with companies.
Funk clearly doesn't understand how much University research is already funded by companies, and how long attempts to change the reward system in Universities have been crashing into the rock composed of senior faculty who achieved their positions through the existing system.

He is more enthusiastic but equally misled about how basic research in corporate labs could be revived:
One option is to recreate the system that existed prior to the 1970s, when most basic research was done by companies rather than universities. This was the system that gave us transistors, lasers, LEDs, magnetic storage, nuclear power, radar, jet engines, and polymers during the 1940s and 1950s. ... Unlike their predecessors at Bell Labs, IBM, GE, Motorola, DuPont, and Monsanto seventy years ago, top university scientists are more administrators than scientists now—one of the greatest mis­uses of talent the world has ever seen. Corporate labs have smaller administrative workloads because funding and promotion depend on informal discussions among scientists and not extensive paperwork.
Not understanding the underlying causes of the demise of corporate research labs, Funk reaches for the time-worn nostrums of right-wing economists, "tax credits and matching grants":
We can return basic research to corporate labs by providing much stronger incentives for companies—or cooperative alliances of companies—to do basic research. A scheme of substantial tax credits and matching grants, for instance, would incentivize corporations to do more research and would bypass the bureaucracy-laden federal grant process. This would push the management of detailed technological choices onto scientists and engineers, and promote the kind of informal discussions that used to drive decisions about technological research in the heyday of the early twentieth century. The challenge will be to ensure these matching funds and tax credits are in fact used for basic research and not for product development. Requiring multiple companies to share research facilities might be one way to avoid this danger, but more research on this issue is needed.
In last year's The Death Of Corporate Research Labs I discussed a really important paper from a year earlier by Arora et al, The changing structure of American innovation: Some cautionary remarks for economic growth, which Funk does not cite. I wrote:
Arora et al point out that the rise and fall of the labs coincided with the rise and fall of anti-trust enforcement:
Historically, many large labs were set up partly because antitrust pressures constrained large firms’ ability to grow through mergers and acquisitions. In the 1930s, if a leading firm wanted to grow, it needed to develop new markets. With growth through mergers and acquisitions constrained by anti-trust pressures, and with little on offer from universities and independent inventors, it often had no choice but to invest in internal R&D. The more relaxed antitrust environment in the 1980s, however, changed this status quo. Growth through acquisitions became a more viable alternative to internal research, and hence the need to invest in internal research was reduced.
Lack of anti-trust enforcement, pervasive short-termism, driven by Wall Street's focus on quarterly results, and management's focus on manipulating the stock price to maximize the value of their options killed the labs:
Large corporate labs, however, are unlikely to regain the importance they once enjoyed. Research in corporations is difficult to manage profitably. Research projects have long horizons and few intermediate milestones that are meaningful to non-experts. As a result, research inside companies can only survive if insulated from the short-term performance requirements of business divisions. However, insulating research from business also has perils. Managers, haunted by the spectre of Xerox PARC and DuPont’s “Purity Hall”, fear creating research organizations disconnected from the main business of the company. Walking this tightrope has been extremely difficult. Greater product market competition, shorter technology life cycles, and more demanding investors have added to this challenge. Companies have increasingly concluded that they can do better by sourcing knowledge from outside, rather than betting on making game-changing discoveries in-house.
It is pretty clear that "tax credits and matching grants" aren't the fix for the fundamental anti-trust problem. Not to mention that the idea of "Requiring multiple companies to share research facilities" in and of itself raises serious anti-trust concerns. After such a good analysis, it is disappointing that Funk's recommendations are so feeble.

We have to add inadequate VC returns and a lack of startups capable of building top-100 companies to the long list of problems that only a major overhaul of anti-trust enforcement can fix. Lina Khan's nomination to the FTC is a hopeful sign that the Biden administration understands the urgency of changing direction, but Biden's hesitation about nominating the DOJ's anti-trust chief is not.

Watch Samvera Virtual Connect 2021 / Samvera

Recordings are now available for all sessions from Samvera Virtual Connect 2021.

Videos from each day are available on YouTube with bookmarks to take you directly to each session. You can also find links to individual presentations and accompanying slides on the program page. The sessions and materials will be added to the Samvera Community Repository in the coming days.

A big thanks to all of our excellent planners, presenters, hosts, and attendees for an informative two days of sessions!

The post Watch Samvera Virtual Connect 2021 appeared first on Samvera.

The Total Cost of Stewardship Tool Suite / HangingTogether

“Total cost of stewardship” by OCLC Research, from Total Cost of Stewardship: Responsible Collection Building in Archives and Special Collections (https://doi.org/10.25333/zbh0-a044), CC BY 4.0

I recently shared our excitement about a new publication from OCLC Research, Total Cost of Stewardship: Responsible Collection Building in Archives and Special Collections. The publication is not just a report; it is a body of materials that includes a report, an annotated bibliography of sources relevant to the ideas shared in the report, and a suite of practical tools that you can adapt and implement at your own institution. Today I’d like to share details about the tool suite, along with some resources and upcoming opportunities for advice and support in implementing the tools at your own institution.

Central to all of the work is the idea of Total Cost of Stewardship, which we define as: All of the costs associated with building, managing, and caring for collections so they can be used by and useful to the public. Underlying the definition is the understanding that research libraries and cultural heritage institutions hold their archives and special collections in trust for the public, that we uphold a professional value of providing broad and equitable access to rare and unique collections, and the idea that for our collections to be truly valuable, they must be available for use.

Total Cost of Stewardship Framework

In the report, we propose a Total Cost of Stewardship framework, a holistic approach to understanding the resources needed to responsibly acquire and steward archives and special collections. The framework aims to bring together collection management and collection development considerations, support communication between colleagues in curatorial, administrative, and technical services roles, and ultimately help you make informed, shared collection building decisions. There are four elements to the framework:

  • Document Collecting Priorities: this element makes clear to everyone what you want to collect
  • Determine Stewardship Capacity: this element advocates a clear assessment of the time, skills, and monetary resources you have available to allocate to collection needs
  • Gather and Share Information: this element supports activities to gather and share information about the impact an acquisition will have on repository staff and operations
  • Make Decisions Together: this element focuses on making decisions together, combining a shared understanding of the value a collection might bring with a clear view of the resources that will be necessary to realize that value
Diagram of the Total Cost of Stewardship Framework “Total cost of stewardship framework” by OCLC Research, from Total Cost of Stewardship: Responsible Collection Building in Archives and Special Collections (https://doi.org/10.25333/zbh0-a044), CC BY 4.0

The Tool Suite

The Total Cost of Stewardship Tool Suite is intended to operationalize this framework. The tools support assessment of the cost and capacity impact of any potential acquisition, consistent discussion of potential value of an acquisition, and communication across roles to share information and responsibility in the collection building process. You can see in this graphic that we’ve listed specific tools that correspond with each element of the framework.

We have produced two kinds of tools:

  • Cost Estimation tools, to facilitate estimation of the tangible costs of addressing collection needs. The Quick Cost Estimator uses existing time estimation models and turns them into an actionable spreadsheet; you can use it to get a quick estimate of the time required to catalog or process materials in archives and special collections (a minimal sketch of this kind of estimate follows this list). The Operational Impact Estimator allows you to lay out institutional staffing and budgetary capacity for collection stewardship activities, and then assess how work on a specific collection or potential acquisition might impact that capacity.
  • Communication tools, to facilitate discussion of both tangible (labor, supply, and other costs) and intangible (research value, community and other relationships, etc.) factors that are weighed in collection decisions. They include templates for a Collection Development Policy; an Operational Impact Report, which assesses and outlines the cost, time, labor, skills, and other resources that will need to be dedicated to a collection to steward it effectively and responsibly; an Acquisition Proposal; a Processing Plan; and a Digitization Project Proposal.
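As a rough illustration of the kind of estimate the Quick Cost Estimator produces, here is a minimal sketch in TypeScript. The names, processing rate, and staffing figures below are hypothetical placeholders, not the values built into the OCLC spreadsheet:

// Hypothetical figures for illustration only; substitute your own local rates.
interface CollectionEstimate {
  extentLinearFeet: number;   // physical extent of the collection
  hoursPerLinearFoot: number; // assumed processing rate for the chosen level of processing
  hourlyStaffCost: number;    // loaded hourly cost of the staff doing the work
}

function quickCostEstimate(c: CollectionEstimate) {
  const hours = c.extentLinearFeet * c.hoursPerLinearFoot;
  const cost = hours * c.hourlyStaffCost;
  return { hours, cost };
}

// Example: a 40-linear-foot collection, processed at an assumed 4 hours per
// foot, by staff costing $35/hour: 160 hours and $5,600 of staff time.
console.log(quickCostEstimate({ extentLinearFeet: 40, hoursPerLinearFoot: 4, hourlyStaffCost: 35 }));

The actual spreadsheet is richer than this, but the point of the sketch is the shape of the calculation: extent times an assumed rate gives hours, and hours times staffing cost gives a budget figure you can weigh against capacity.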

All of the tools are intended to be flexible and broadly applicable to many collecting institutions. To support this, we provide a detailed manual for the cost estimation tools and a usage guide for the communication tools that offers guidance on considerations for each template and links to examples of similar tools from different institutions. The templates are maximal in nature and include many factors that may or may not be relevant in different collecting institutions. You will likely want to tailor them to fit your institutional needs, resources, priorities, and workflows.

Support for Using the Tool Suite

We hope that people will adapt the tools for use in their own institutions. We are offering up a number of resources to support you in exploring, adapting, and implementing the tools.

Office Hours: We are hosting four open, virtual office hour sessions over the next few weeks. These will be unstructured time for you to ask any questions and get advice on using and implementing any of the tools in the tool suite. Two of the sessions are open to RLP members only, and two are open to everyone, as indicated below. Please share with colleagues and join us!

  • Thursday, May 6: 8-9am PDT/11am-noon EDT/4-5pm BST. Open to RLP members only *Most convenient time for UK* register here
  • Thursday, May 6: 1-2pm PDT/4-5pm EDT. Open to all register here
  • Tuesday, May 11: 11am-noon PDT/2-3pm EDT. Open to RLP members only register here
  • Wednesday, May 12: 4-5pm PDT/7-8pm EDT/ Thursday May 13 9-10am AEST. Open to all *Most convenient time for Australia and New Zealand* register here

Tutorial Videos: We have created video tutorials for all the tools in the tool suite, which can be viewed at your convenience. 

  • Quick Cost Estimator – includes an overview, step by step instructions, and a walk through of a sample collection.
  • Operational Impact Estimator – includes an overview, step by step instructions, and a walk through of a sample collection. 
  • Communication Tools – includes an overview of each tool and considerations for implementation. 

Webinar: View our recent webinar which introduces the Total Cost of Stewardship framework, gives a brief overview of the tool suite, and offers insight on how they might be implemented and adapted to various institutional contexts, including concrete examples of how these tools have been used at Emory University’s Rose Library. 

Twitter Chat: Join our Twitter chat on May 20 at noon-1pm PDT / 3-4pm EDT / 8-9pm BST #oclc_tcos. Look for a post with more details about questions and how to participate here soon!

The post The Total Cost of Stewardship Tool Suite appeared first on Hanging Together.

856 / Ed Summers

Coincidence?


The #DLFteach Toolkit: Recommending EPUBs for Accessibility / Digital Library Federation

This post was written by Hal Hinderliter, as part of Practitioner Perspectives: Developing, Adapting, and Contextualizing the #DLFteach Toolkit, a blog series from DLF’s Digital Library Pedagogy group highlighting the experiences of digital librarians and archivists who utilize the #DLFteach Toolkit and are new to teaching and/or digital tools.

The Digital Library Pedagogy working group, also known as #DLFteach, is a grassroots community of practice, empowering digital library practitioners to see themselves as teachers and equipping teaching librarians to engage learners in how digital library technologies shape our knowledge infrastructure. The group is open to anyone interested in learning about or collaborating on digital library pedagogy. Join our Google Group to get involved.

 


For this blog post, I’ve opted to provide some background information on the topic of my #DLFteach Toolkit entry: the EPUB (not an acronym) format, used for books and other documents. Librarians, instructors, instructional designers and anyone else who needs to select file formats for content distribution should be aware of what EPUB has to offer!

Electronic books: the fight over formats

The production and circulation of books, journals, and other long-form texts has been radically impacted by the growth of computer-mediated communication. Electronic books (“e-books”) first emerged nearly half a century ago as text-only ASCII files, but are now widely available in a multitude of file formats. Most notably, three options have been competing for market dominance: PDF files, KF8 files (for Amazon’s Kindle devices), and the open-source EPUB format. The popularity of handheld Kindle devices has created a devoted fan base for KF8 e-books, but in academia the ubiquitous PDF file remains the most common way to distribute self-contained digital documents. In contrast to these options, a growing movement is urging that libraries and schools eschew Kindles and abandon their reliance on PDFs in favor of the EPUB electronic book format.

The EPUB file format preserves documents as self-contained packages that manage navigation and presentation separately from the document’s reflowable content, allowing users to alter font sizes, typefaces, and color schemes to suit their individual preferences. E-books saved in the EPUB format are compatible with Apple’s iPads and iPhones as well as Sony’s Reader, Barnes & Noble’s Nook, and an expansive selection of software applications for desktop, laptop, and tablet computers. Increasingly, that list includes screen reader software such as Voice Dream and VitalSource Bookshelf, meaning that a single file format – EPUB 3 – can be readily accessed by both sighted and visually impaired audiences.
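To make the “self-contained package” idea concrete, an EPUB file is essentially a ZIP archive with a conventional internal layout, roughly like the sketch below. Only the mimetype file and META-INF/container.xml have fixed names; the OEBPS folder and the content file names are illustrative:

my-book.epub (a ZIP archive)
  mimetype                 contains the string "application/epub+zip"
  META-INF/container.xml   points to the package document below
  OEBPS/content.opf        package document: metadata, a manifest of every file, and the spine (reading order)
  OEBPS/nav.xhtml          navigation document, the machine-readable table of contents
  OEBPS/chapter-01.xhtml   reflowable content, stored as XHTML
  OEBPS/styles.css         presentation, kept separate from the content so reading systems can override it

Because navigation, content, and presentation live in separate files, a reading system or screen reader can restyle or re-voice the text without losing the document’s structure.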

The lineage of EPUB can be traced back to the Digital Audio-based Information System (DAISY), developed in 1994 under the direction of the Swedish Library of Talking Books and Braille. Today, EPUB is an open-source standard that is managed by the International Digital Publishing Forum, part of the W3C. In contrast to the proprietary origins of both PDF and KF8 e-books, modifications to the open EPUB standard have always been subject to public input and debate.

Accessibility in Academia: EPUB versus PDF

Proponents of universal design principles recommend the use of documents that are fully accessible to everyone, including users of assistive technologies, e.g., screen readers and refreshable braille displays. The DTBook format, a precursor to EPUB, was specifically referenced by Rose et al. (2006) in their initial delineation of Universal Design for Learning (UDL) as part of UDL’s requirement for multiple means of presentation. At the time, the assumption was that DTBooks would be distributed only to students who needed accessible texts, with either printed copies or PDF files for sighted learners. Today, however, it is no longer necessary to provide multiple formats, since EPUB 3 (the accessibility community’s preferred replacement for DTBooks) can be used with equal efficacy by all types of students.

In contrast, PDF files can range from completely inaccessible to largely accessible, depending on the amount of effort the publisher expended during the remediation process. PDF files generated from word processing programs (e.g., Microsoft Word) are not accessible by default, but instead require additional tweaks that necessitate the use of Adobe’s Acrobat Pro software (the version of Acrobat that retails for $179 per year). Users of assistive technologies have no recourse but to attempt opening a PDF file, often only to find that the document lacks structure (needed for navigation), alt tags, metadata, or other crucial features. Even for sighted learners, PDFs downloaded from their university’s online repository can be difficult to view on smartphones, since PDF’s fixed page dimensions require endless zooming and scrolling to display each column of text at an adequate font size.

The superior accessibility of EPUB has inspired major publishers to establish academic repositories of articles in EPUB format, e.g., ABC-CLIO, ACLS Humanities, EBSCO E-Books, Proquest’s Ebrary, Elsevier’s ScienceDirect, Taylor & Francis. Many digital-only journals offer their editions as EPUBs. For example, Trude Eikebrokk, editor of Professions & Professionalism, investigated the advantages of publishing in the EPUB format as described in this excerpt from the online journal Code{4}lib:

There are two important reasons why we wanted to replace PDF as our primary e-journal format. PDF is a print format. It will never be the best choice for reading on tablets (e.g. iPad) or smartphones, and it is challenging to read PDF files on e-book readers … We wanted to replace or supplement the PDF format with EPUB to better support digital reading. Our second reason for replacing PDF with EPUB was to alleviate accessibility challenges. PDF is a format that can cause many barriers, especially for users of screen readers (synthetic speech or Braille). For example, Excel tables are converted into images, which makes it impossible for screen readers to access the table content. PDF documents might also lack search and navigation support, due to either security restrictions, a lack of coded structure in text formats, or the use of PDF image formats. This can make it difficult for any reader to use the document effectively and impossible for screen reader users. On the other hand, correct use of XHTML markup and CSS style sheets in an EPUB file will result in search and navigation functionalities, support for text-to-speech/braille and speech recognition technologies. Accessibility is therefore an essential aspect of publishing e-journals: we must consider diverse user perspectives and make universal design a part of the publishing process.

The Future of EPUB

A robust community of accessibility activists, publishers, and e-book developers continues to advance the EPUB specification. The update to EPUB3 added synchronized audio narration, embedded video, MathML equations, HTML5 animations, and Javascript-based interactivity to the format’s existing support for metadata, hyperlinks, embedded fonts, text (saved as XHTML files) and illustrations in both Scalable Vector Graphic (SVG) and pixel-based formats. Next up: the recently announced upgrade to EPUB 3.2, which embraces documents created under the 3.0 standard while improving support for Accessible Rich Internet Applications (ARIA) and other forms of rich media. If you’re ready to join this revolution, have a run through the #DLFteach Toolkit’s EPUB MakerSpace lesson plan!

The post The #DLFteach Toolkit: Recommending EPUBs for Accessibility appeared first on DLF.

Nederlandse ronde tafel sessie over next generation metadata: Denk groter dan NACO en WorldCat / HangingTogether

Met dank aan Ellen Hartman, OCLC, voor het vertalen van de oorspronkelijke Engelstalige blogpost.

Op 8 maart 2021 werd een Nederlandse ronde tafel discussie georganiseerd als onderdeel van de OCLC Research Discussieserie over Next Generation metadata

OCLC metadata discussion series

Bibliothecarissen, met achtergronden in metadata, bibliotheeksystemen, de nationale bibliografie en back-office processen, namen deel aan deze sessie. Hierbij werd een mooie variatie aan academische en erfgoed instellingen in Nederland en België vertegenwoordigd. De deelnemers waren geëngageerd, eerlijk en leverden met hun kennis en inzicht constructieve bijdragen aan een prettige uitwisseling van kennis. 

In kaart brengen van initiatieven 

Kaart van next-gen metadata initiatieven (Nederlandse sessie)

Net als in de andere ronde tafel sessies werden de deelnemers gevraagd om in kaart te helpen brengen wat voor next generation metadata initiatieven er in Nederland en België worden ontplooid. De kaart die daarmee werd gevuld laat zien dat in deze regio een sterke vertegenwoordiging is van bibliografische en erfgoed projecten (zie de linker helft van de matrix). Verschillende next-generation metadata projecten van de Koninklijke Bibliotheek Nederland werden omschreven, zoals:

  • Automatische metadata creatie, waarbij tools voor het taggen en catalogiseren van naam authority records worden geïdentificeerd en getest. 
  • De Entity Finder, een tool die wordt ontwikkeld om RDA entities (personen, werken en expressies) te helpen ontlenen vanuit authorities en bibliografische records. 

De Digitale Erfgoed Referentie Architectuur (DERA) is ontwikkeld als onderdeel van een nationale strategie voor digitaal erfgoed in Nederland. Het is een framework voor het beheren en publiceren van erfgoed informatie als linked open data (LOD), op basis van overeengekomen conventies en afspraken. Het van Gogh Worldwide platform is een voorbeeld van de applicatie van DERA, waar metadata gerelateerd aan de kunstwerken van van Gogh, die in bezit zijn van Nederlandse erfgoed instellingen en in privé bezit worden geaggregeerd.   

Een noemenswaardig in kaart gebracht initiatief op het gebied van Research Informatie Management (RIM) en Scholarly Communications was de Nederlandse Open Knowledge Base. Een in het afgelopen jaar opgestart initiatief binnen de context van de deal tussen Elsevier en VSNU, NFU en NWO om gezamenlijk open science services te ontwikkelen op basis van RIM systemen, Elsevier databases, analytics oplossingen en de databases van de Nederlandse onderzoeksinstellingen. De Open Knowledge Base zal nieuwe applicaties kunnen voeden met informatie, zoals een dashboard voor het monitoren van de sustainable development goals van de universiteiten. Het uitgangspunt van de Knowledge Base is het significant kunnen verbeteren van de analyse van de impact van research. 

Wat houdt ons tegen? 

Ondanks dat er tijdens de sessie innovatieve projecten in kaart werden gebracht, werd er net als in sommige andere sessies, onduidelijkheid gevoeld over hoe we nu verder door kunnen ontwikkelen. Ook was er sprake van enig ongeduld met de snelheid van de transitie naar next generation metadata. Sommige bibliotheken waren gefrustreerd over het gebrek aan tools binnen de huidige generatie systemen om deze transitie te versnellen. Zoals de integratie van Persistant Identifiers (PID), lokale authorities of links met externe bronnen. Meerdere tools moeten gebruiken voor een workflow voelt als een stap terug in plaats van vooruit.  

Buiten praktische belemmeringen werd de discussie vooral gedomineerd door de vraag wat ons tegenhoudt in deze ontwikkeling. Met zoveel bibliografische data die al als LOD gepubliceerd wordt, wat is er dan verder nodig om deze data te linken? Zouden we niet op zoek moeten naar partners om samen een kennis-ecosysteem te ontwikkelen? 

Vertrouwen op externe data 

Een deelnemer gaf aan dat bibliotheken voorzichtig of terughoudend zijn met de databronnen waarmee ze willen linken. Authority files zijn betrouwbare bronnen, waarvoor er nog geen gelijkwaardige alternatieven bestaan in het zich nog ontwikkelende linked data ecosysteem. Het gebrek aan conventies voor de betrouwbaarheid is misschien een reden waarom bibliotheken misschien wat terughoudend zijn in het aangaan van linked data partnerschappen of terug deinzen voor het vertrouwen op externe data, zelfs van gevestigde bronnen als Wikidata. Want, het linken naar een databron is een indicatie van vertrouwen en een erkenning van de datakwaliteit. 

Het gesprek ging vervolgens verder over linked datamodellen. Welke data creëer je zelf? Hoe geef je je data vorm en link je met andere data? Sommige deelnemers gaven aan dat er nog steeds een gebrek aan afspraken en duidelijkheid is over concepten zoals een “werk”. Anderen gaven aan dat het vormgeven van concepten precies is waar linked data om draait en dat meerdere onthologieën naast elkaar kunnen bestaan. In andere woorden, het is misschien niet nodig om de naamgeving in harde standaarden te vatten.

“Er is geen uniek semantisch model. Wanneer je verwijst naar gegevens die al door anderen zijn gedefinieerd, geef je de controle over dat stukje informatie op, en dat kan een mentale barrière zijn tegen het op de juiste manier werken met linked data. Het is veel veiliger om alle data in je eigen silo op te slaan en te beheren. Maar op het moment dat je dat los kunt laten, kan de wereld natuurlijk veel rijker worden dan je in je eentje ooit kunt bereiken.” 

Oefenen met denken in linked data 

Het gesprek ging verder met een discussie over wat we kunnen doen om bibliotheekmedewerkers die catalogiseren te trainen. Een van de deelnemers vond dat het handig zou zijn om te beginnen met ze te leren te denken in linked dataconcepten en om te oefenen met het opbouwen van een knowledge graph en het experimenteren met het bouwen van verschillende structuren. Net als dat een kind dat doet door met LEGO te spelen. De deelnemers waren het erover eens dat we op dit moment nog te weinig kennis hebben van de mogelijkheden en de consequenties van het gebruik van linked data.

“We moeten leren onszelf te zien als uitgevers van metadata, zodat anderen het kunnen vinden – maar we hebben geen idee wie de anderen zijn, we moeten zelfs groter denken dan de NACO van de Library of Congress of WorldCat. We hebben het niet langer over de records die we maken, maar over stukjes records die uniek zijn, want veel komt al van elders. We moeten ons dit realiseren en onszelf afvragen: wat is onze rol in het grotere geheel? Dit is erg moeilijk om te doen!” 

De deelnemers gaven aan dat het erg belangrijk was om deze discussie binnen hun bibliotheek op gang te brengen. Maar hoe doe je dat precies? Het is een groot onderwerp en het zou mooi zijn als daar vanuit het management ook aandacht voor is. 

Not relevant for my library

A manager in the group of participants responded:

“It strikes me that the number of libraries that really still deal with this is getting smaller. (…) [In my library] we hardly produce any metadata ourselves anymore. (…) If we look at what we still produce ourselves, it is, for example, describing photographs of a student association – so practically nothing. Metadata is really only a topic for a small group of specialists now.”

However provocative this observation was, it reflects a reality that we need to acknowledge and, at the same time, put into perspective. Unfortunately there was no time for that, as the session was coming to an end. It was certainly a conversation we could have continued for quite a while!

About the OCLC Research Discussion Series on Next Generation Metadata

In March 2021, OCLC Research held a discussion series focused on two reports:

  1. “Transitioning to the Next Generation of Metadata”
  2. “Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project”

The round table discussions were held in several European languages. Participants were able to share their own experiences, gain a better understanding of the topic, and pick up practical guidance for making plans for the future with confidence.

The opening plenary session opened the floor for discussion and exploration and introduced the theme and its related topics. Summaries of all round table discussions are being published on the OCLC Research blog Hanging Together.

At the closing plenary session on 13 April, the various round table discussions were summarized.

The post Nederlandse ronde tafel sessie over next generation metadata: Denk groter dan NACO en WorldCat appeared first on Hanging Together.

Announcing the winner of the Net Zero Challenge 2021 / Open Knowledge Foundation

Net Zero Challenge logo

Three months ago, the team at Open Knowledge Foundation launched a new project – the Net Zero Challenge – a global competition to identify, promote and support innovative, practical and scalable uses of open data that advance climate action.

We received over 80 applications from 40 different countries. Many of them were excellent.

From the applications, we chose a shortlist of five teams to compete in a live pitch contest event. Each team had three minutes to explain their project or concept, and up to seven minutes to take questions from the Panel of Experts and the live audience.

Watch the recording of the live event here.

Today we are pleased to announce the winner of the Net Zero Challenge 2021 is CarbonGeoScales – a framework for standardising open data for GHG emissions at multiple geographical scales. The project is built by a team of volunteers from France – supported by Data for Good and Open Geo Scales. You can check out their Github here.

The winning prize for the Net Zero Challenge is USD$1000 – which, we are told, will be spent on cloud storage.

A few days ago, we caught up with the team from CarbonGeoScales to learn about who they are, what they are doing and how their project uses open data to advance climate action.

Watch our interview with the CarbonGeoScales team here.

We would like to thank all the teams who competed in the Net Zero Challenge 2021 – especially the four other teams who were all shortlisted for the pitch contest. We are grateful to the Panel of Experts for taking the time to judge the contest – especially to Bonnie Lei from Microsoft’s AI for Earth – who joined at the last minute. Thanks also to Open Data Charter and the Open Data & Innovation Team at Transport for New South Wales for their strategic advice during the development of this project.

The Net Zero Challenge has been supported by our partners Microsoft and the UK Foreign, Commonwealth & Development Office.

Working with Ghost / Lorcan Dempsey

The choice


In looking at building a personal site, there seemed to be some very clear tradeoffs, between prefabrication and control, between time spent writing and time spent building. Especially since a secondary goal for me in starting a new web presence is to learn a little more about front-end web technologies, while starting from a low knowledge base (CSS, Handlebars, JavaScript). Some options:

  • WordPress is the most obvious choice. WordPress powers a surprisingly large part of today's web and of course there is a massive ecosystem - hosting sites, website builder layers, themes, plugins and so on. There are multiple choices, options, and additions for everything. I have used WordPress for a long time, admittedly in a simple and work-supported way. And I also had a simple personal WP deployment as a placeholder on this URL for some years. However, I found the sheer volume of WP choices and opportunities a little oppressive and bloated. Of course, I understand one could constrain this and that the ecosystem is evolving but I was interested in looking at more modern approaches. And I just felt like a change.
  • One could use a website builder like the very popular Squarespace. However, I resisted the packaged and closed nature of this option.
  • The most attractive option was a static site builder, like Jekyll, Gatsby or Hugo. The most attractive philosophically, I should say, but maybe not practically. I thought the technical lift might be too great for me, unless I was prepared to invest quite a bit of time, and it did seem to me that I would need pretty active support. In retrospect, maybe I should have spent some time with Jekyll or Hugo given the growing ecosystem available.
  • I wanted to do more than one can do on Medium or Tumblr.
  • I have been interested to see the growing popularity of Substack, but I was not sure I would have the discipline to send out a regular newsletter, at least not now.

In this context, Ghost looked more and more interesting, for a variety of reasons. I was attracted by the non-profit status and culture of Ghost, its commitment to open source, its vision of enabling creativity and writing on the web, and its modern design. I confess I was overinfluenced by these factors, perhaps irrationally so. But most importantly, I was interested in a good publishing experience coupled with an opportunity to be that bit closer to the building process.

A few things about Ghost:

  1. In describing itself, Ghost foregrounds its role as a writing platform and very much ties itself to an independent writing/journalism/publishing movement. This is perhaps most closely associated in more general discussion with the popularity of Substack, although it was also interesting to see the recent acquisition of Revue by Twitter. (Craig Mod writes: 2020: the year “Substack” entered the realm of “Kleenex.”) Ghost supports this role by providing a very engaging editing interface, the capacity to manage membership and subscription newsletters, and support for paid subscriptions. This was attractive, as it provides a framework for sharing, even when one doesn't implement the paid subscription options. While it might be interesting to experiment with the paid newsletter model, we operate in a very small community and I am not very confident that this site would aggregate very much in the way of paid membership :-) Anyway, this is not planned here, although I am interested in the membership functionality to facilitate mail updates and comments. Ghost also emphasizes its support for startups developing a web presence, and there are some nice examples on the list below.
  2. Ghost is also a small and very focused organization (21 employees according to the website), and is supported by voluntary code contributions. The Ghost Foundation is registered in Singapore, but I did not see anything immediately about a board or governance. It seems still to have that startup energy and culture.
  3. Ghost is a headless CMS, a growing category of provider. Simply put, it focuses on backend content management, exposing content through APIs. It does not focus on frontend presentation. If you have the dev capacity you can build your own front end. Or you can rely on themes. Ghost ships with Handlebars.js, a templating language which provides a theming layer. There is a default theme, and a small industry of theme creators. More about this below.
Headless CMS - Top Content Management Systems | Jamstack
Check out this showcase of some of the best, open source headless CMSes. This is community-driven so be sure to submit your favorite CMS today!
Ghost Customers – A showcase of real sites built with Ghost
Learn about Ghost customers and why thousands of businesses rely on Ghost to build professional publishing websites. #1 CMS for privacy, security and speed.

The execution

So, how has it gone?  My remarks here are confined to the hosted version, Ghost(Pro), and also rest on my experiences as a non-initiate single user, rather than somebody with more development experience or an organization with some development capacity. I feel it is important to emphasize this, as I fully understand that the system may not be optimized for my level of expertise. That said, I remain reasonably positive about the overall experience.


Ghost emphasizes writing. And it has a very nice editor that really focuses you on the words in front of you. Markdown can be used 'in the flow' and there is unobtrusive elegant support for embedding video, images, raw HTML code, and so on (although it doesn't have the extensive options of, say, Notion). It hides the clutter of functionality. On one side of the page, you can open up a dashboard that gives access to general Ghost features, and, on the other, to a dashboard that gives access to publishing, SEO and other options for that page. It is nicely designed.

It usefully includes some things:

  • SSL - although you have to register your domain with a third party, Ghost provides SSL at no additional cost.
  • SEO, CDN, security and speed - reading reviews I was reassured that Ghost was making good design and engineering choices more connected to the future than some other options. See the review below for a brief overview of these areas from a more knowledgeable perspective than mine. SEO is integral, not provided through a plugin. You can configure for social media, add keywords, Google snippet text, and so on. I have not experienced how well this works yet, as I have only just published the site. It also sits on a content distribution network, has good security options, and says it does backup (although, apparently you cannot roll the site back to a previous version). It provides a nice option to do a JSON dump of the entire site in a few seconds.
  • Membership and paid options, as noted above. Mail distribution is built into Ghost, although you need to provide your own email to receive notifications/replies. Here is an example of a site with premium membership options deployed.

Ghost suggests that it can be used out of the box by the non-technical writer. This may be mostly true if you are prepared to accept what comes out of the box, and do not need to change the theme. But in practice, there are some areas where you may need to do a little more which exposes you to CSS or Handlebars. There are two design choices that mean that starting up with Ghost is a little more challenging than starting up with, say, Medium or Squarespace. The first is the Ghost development focus on the functional core, creating a need either for third party integrations or for custom theme work to implement desired features. And the second is the reliance on themes rather than any inbuilt design layer.

Third party integrations

The clear focus on core capabilities means that functionality seen as more peripheral becomes literally so, as it is passed to third-party integrations. This is the case with comments, for example, although of course this is now common as the widespread use of Disqus shows. I am experimenting with Cove here, which is built specifically for Ghost and handily sits on top of its membership model. Similarly, contact forms are provided through third parties. I am using Formspree here. Depending on choices, in some cases this will add subscription cost, especially if one wants to avoid the perennial spam problem. Ghost is not a large organization and this discipline does mean that there is a welcome focus on enhancing the core (blogging, membership, subscription). However, it is surprising that search is not natively provided. Typically, this is managed in different ways on a per-theme basis. And in fact, given the volume of posts I have imported, I have run into issues with the theme-provided search - it was hanging when it was trying to search the full text, so I have limited it to titles for the moment. I will need to look at other options - it would be better if this were natively provided.

Some integrations are built in: Unsplash for images, Slack, Stripe for payments, and some more. Others can be managed through Zapier, a third-party integration app. Integrations supported in this way are described here:

Ghost Integrations – Connect your favourite Tools & Apps to your site
Keep your stack aligned and integrate your most used tools & apps with your Ghost site: automation, analytics, marketing, support and much more! 👉

There is no support for plugins, so, while you are not tempted to load up on unnecessary plugins, you also may be short of readymade and basic functionality. Philosophically, I find this quite pleasing, focusing on simplicity and the task in hand. However, again from a practical point of view, it means that things that might be simple to do in another environment (pulling out tweetable quotes, for example, or having an archive, organized by month or some other period, or creating a table of contents) have to be custom crafted here. And of course, in turn, this requires some technical knowledge.

Themes

You can build your own theme or website if you are able, and there are some 'starter' themes around for those who wish to develop their own. It also comes with a default theme (Casper - geddit?). I need to go with an existing theme.

My experience with themes has been very variable. The first I tried turned out not to be developed or supported any more, so I got a refund. I tried a couple for which I had to chase for documentation and responses were delayed or non-existent. However, I also had a very good experience with another developer who was interested in how his theme was being used or could be improved. I settled on Krabi which is more expensive than most (and I notice the price has gone up since I subscribed). It was used in some nice sites, and, at that price, I expected it to be well-supported. Given the subscription price of any individual theme, development needs to be a volume business, if it is not a labor of love. There is naturally a line between implementation support on one side, and customization or development advice on the other, and you are not really paying enough for the latter.

This means that you have to settle for being constrained by the capacities of your chosen theme, or you have to have enough knowledge to make adaptations yourself if you want to change design or features. Of course, if you are happy with theme choices or do not have additional functional requirements, then it is not a big issue. Although it does mean that you don't have the control you might have hoped for. The irony here is that while you may have been attracted to Ghost by the promise of openness and control, you may end up being limited by the options supported in the theme you have chosen.

There are occasions when you do need to know a little more. Membership is one. Ghost has a newly developed membership Portal which natively provides elegant membership management functionality. However, this is recent, so many themes still build in similar functionality of their own. On this site, I have a signup/in button (the Portal) and signup forms (the theme). I can also link to Portal functionality (e.g. signup here). I could of course do away with one approach but it took me a while to figure out how to hide all pay options, what to show where, and so on. This requires some editing of theme files.

Given its design, Ghost is very flexible, but you do need the development skills to take fuller advantage of its possibilities. If you do have those skills, I can see that it could be quite powerful and rewarding.

Incidentally, the business ecosystem around themes is fascinating: I would be very interested to read an analysis of the economics and distribution of theme development. Themes are created by all sorts, by small companies, solo operators, and sometimes as a sideline to other work. It is very much a global activity. And of course, the WordPress theme building ecosystem is enormous.

Ghost has acquired a small theme developer, and it is bundling free themes with its new lowest subscription tier which does not allow you to customise themes or to use premium (i.e. paid-for) themes.

The learning curve is in the wrong place

Of course, the well-known problem here is that the learning curve is in the wrong place. When you are setting things up is precisely when you need to know what you are doing. And things that become simple with familiarity can be tedious, frustrating or time-consuming when you don't know what you are doing.

Making adaptations does require some knowledge of Handlebars and CSS. Copying from other parts of the theme or from other themes, trial and error, and some learning will help move things along. Becoming familiar with the structure of the theme, the classes used, the syntax of Handlebars and CSS, and so on, comes with experience. And of course depending on interest and time, you can engage in some more structured learning. And such learning is truly accompanied by many small pleasures.

I was ridiculously pleased, for example, to implement the table of contents and even to get a 'sticky' version working. However, it would have taken me some time to figure out how to prevent it from floating over the header when scrolling to the top of the post. And then I would have to figure out how to switch it off in mobile versions. So I disabled it. I also did some styling, placing it in the white-box widget characteristic of the theme. I wanted to place it both in the sidebar and at the head of the post (as in the mobile version, sidebar content goes to the bottom). But it would not show in two places and I could not immediately figure out how to make that happen - did I need to run the generating script twice (it is based on a third party script), or something else, or was it just not possible? Also, you may notice that if there are no headings in the post, the white TOC box continues to appear but is empty. The logic of what I need to do to prevent that is obvious (has no heading, do not include TOC) but after a few guesses at the syntax I moved on. I will fix it some time in the future, no doubt after stumbling over the answer in another context (leave a comment if you can help :-)).
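
For what it is worth, here is a rough sketch of the "has no headings, hide the TOC box" logic as a small script that could live in the theme or in the code injection footer. The class names are assumptions about a theme like this one, not real selectors from Krabi, and the approach simply hides the box client-side rather than preventing it from rendering.

```typescript
// Rough sketch: hide a (hypothetical) .toc-box widget when the post body
// contains no headings, so an empty white box is not shown to readers.
document.addEventListener('DOMContentLoaded', () => {
  const tocBox = document.querySelector<HTMLElement>('.toc-box'); // assumed TOC container class
  const headings = document.querySelectorAll('.post-content h2, .post-content h3'); // assumed content selector
  if (tocBox && headings.length === 0) {
    tocBox.style.display = 'none';
  }
});
```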

I did end up wanting to do quite a bit of reformatting. But given that I was learning as I was going, it is somewhat inelegant, to say the least. A glance at the Inspect Panel will quickly show this! I have probably overloaded the code injection facility (the ability to adapt styling etc at the point of page/post creation). Although I have certainly made changes to the theme, I did not want to make too many, as, when it is updated, I have to reconcile my changes and the new version. And indeed, deploying an updated theme is somewhat tedious if you have edited existing theme files. Reconciliation is a time-consuming manual process. There is certainly a tradeoff between how much you want to implement your own style choices, and how much you are happy to go with the theme.

And of course, if one makes local changes to the theme without being very familiar with its overall structure, then predictably enough one may create unanticipated issues elsewhere.

With more time and experience I would hope to clean things up, and to do things more efficiently and elegantly.

There are a couple of things in particular I would like to see:

  • The ability to present archive pages for blog entries - not individually listed (this won't scale with the number of entries I have, over many years), but rather a chronological list of months and a link to a page for each month which shows the list of posts for that month.
  • The ability to send out a monthly digest, which gives title/excerpt of each post in that month. The default behavior is that it is possible to send each post to your members. A digest would be nice.

I did not see a theme which did these things for me. And I understand that Ghost is certainly flexible enough to support that functionality. However, it is currently beyond my capability.

I should note that Ghost does point to some experts who can provide for-fee development services, and there are occasional enquiries on the Forum looking for freelance design or development input.

One drawback of the general themes environment is that you have to buy before you try. Indeed, one of the reasons I began using Krabi was that it offered a refund if things did not work out in early stages. But this is not common. This means that the initial choice may be based on surface style. One does begin to develop better cues - is there online documentation and what does it look like, for example. However, it would also be good to be able to assess the clarity of the file structure of the theme, check whether there is helpful commenting, and so on.

All that said, I have enjoyed working on it. And I understand that if I had only a small bit of additional knowledge, the story might be quite different.

The curse of knowledge

Ghost support is very responsive and friendly. There is a forum for discussion of technical questions, which does not get hyper-active. There is extensive documentation, a little confusingly divided between developer and general guidance. There is a refreshingly personal nature to some of this, given the scale and culture of Ghost. It is a welcome contrast to the scripted customer support we have become used to, sometimes tied to an upsell agenda.

However, the curse of knowledge is a very real factor here, which is not really surprising - maybe? - given the demographic of Ghost users, who, I imagine, lean more technical than, say, a general blogging audience?

Nor are theme developers really set up to hand-hold their unsophisticated subscribers through the learning journey. Ditto in the case of some third-party applications (e.g. Cove, where my chat questions about implementation have gone unanswered).

You are suffering from the curse of knowledge when you know things that the other person does not and you have forgotten what it’s like to not have this knowledge. This makes it harder for you to identify with the other person’s situation and explain things in a manner that is easily understandable to someone who is a novice. // Lifehack

The signal of the curse of knowledge is that little word 'just', which often crops up in answers to questions. "Just insert this line in the theme." "Just run this script." "Just adapt the CSS."

As I describe in the next section, I have imported almost two thousand blog entries from another environment. I would like to be able to provide a chronological overview archive of these, showing the entries by month, for example. This is not a very unusual feature in blogging environments but is not available in the theme I am using, or natively in Ghost. I did ask about it in the Forum and the response was that it was straightforward to build using the Ghost API. Well, yes ....
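
For the record, here is a sketch of what "build it using the Ghost API" might look like: fetch all posts through the Content API and bucket them by month, which could then feed a set of archive pages. The site URL and API key are placeholders, it assumes an environment where fetch is available, and it uses the unversioned Content API path (older Ghost versions expose a versioned path such as /ghost/api/v3/content/), so treat it as a starting point rather than a working feature.

```typescript
// Sketch: group published posts by month using the Ghost Content API.
// SITE and KEY are placeholders; the key would come from a custom
// integration created in the Ghost admin.
interface Post {
  title: string;
  url: string;
  published_at: string; // ISO 8601 date
}

async function monthlyArchive(site: string, key: string): Promise<Map<string, Post[]>> {
  const endpoint =
    `${site}/ghost/api/content/posts/?key=${key}&limit=all&fields=title,url,published_at`;
  const res = await fetch(endpoint);
  const { posts } = (await res.json()) as { posts: Post[] };

  // Bucket posts by "YYYY-MM" so each month can become its own archive page.
  const byMonth = new Map<string, Post[]>();
  for (const post of posts) {
    const month = post.published_at.slice(0, 7);
    const bucket = byMonth.get(month) ?? [];
    bucket.push(post);
    byMonth.set(month, bucket);
  }
  return byMonth;
}

// Example with hypothetical values:
// monthlyArchive('https://example.ghost.io', 'CONTENT_API_KEY')
//   .then((archive) => console.log([...archive.keys()]));
```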

The future

I have moved 15 years or so of work over from my work blog (over 1,900 entries). Even with some help, there were glitches. I have cosmetically upgraded entries that were widely referenced or influential, and tagged them classics. I may add more. And time permitting, I will fix formatting in other posts that might get some attention. (I do have a redirection issue - internal links in the WP blog resolve to an archive entry of the post, which has a different URL. With some help I am working on that.)

Even though it may continue to exist elsewhere, I wanted to bring the archive under my own control. I have created a separate 'collection' for them so that they are under their own particular path and will redirect from the original. A 'record' of the major contributions exists as the book of the blog which does ensure some durability.

The book of the blog
Not very many blogs end up as books. This one did!

Ghost is an evolving system, so some things will get easier. However, there is something very attractive about its spareness, and the design choices it has made. I do feel that a little more help could be provided though - if it is to be more attractive to the individual blogger. And remember, that has been my perspective here, understanding that there are other types of user. I mentioned the table of contents and chronological archive above. I was aware in the transfer of entries from the other site that Ghost did not have some of the bulk editing tools of WordPress - the ability to merge tags for example.

I have enjoyed my Ghost experience. And after many years it has been interesting to peek under the hood again, being exposed to brackets in curly and angled varieties. It has been nice, if occasionally frustrating, figuring out how to do things and seeing the changes ramify through the web version. Even if it takes a while for things to work or you have to unpick mistakes, or you can't figure out what to do. And you know that you are tangling things up a bit.

Maybe there is something a little sentimental about this, but if you want to focus on writing, are prepared to do without bells and whistles, and if you want to learn, it is quite satisfying.  

Coda - after Ghost 4 release

So, the above was largely written a little while ago, before the release of Ghost 4. This introduced evolved functionality, some design changes, a rebranding, and was generally greeted positively. It clearly represents a major commitment by the small team. Developments are described here:

Ghost 4.0
Almost exactly 8 years ago, we announced the first prototype of Ghost on Kickstarter. Today, over 20,000 commits later we’re releasing Ghost 4.0 [https://ghost.org/features/], the latest major version of the product, as well as a small refresh of our brand. If you’re short on time, here are the abbre…

The design refresh has changed the simple arrangement which appealed to me above, changing how things are presented, adding links to Ghost materials, moving some functionality to a settings section, and so on. However, the editing experience remains engaging and they have added a very useful preview feature, showing what the post will look like on different devices. They have added a cheaper hosting option for those that are prepared to use one of a set of free themes without additional customization.

They have also moved the membership and subscription Portal UI into production making it a core part of the platform. There has been some discussion about this from those that do not want to use the functionality and would like easily to switch it off. I am sure something will be done to address this in time.

What I found most interesting in the discussion was the reaffirmation of the strategic centrality to Ghost of the paid subscription model, and the focus on the "creator economy." From the Changelog:

Memberships, subscriptions and the creator economy as a whole is a noisy space right now. Many companies building products in this space weren't around 8 years ago, and most won't be around 8 years from now. We will be.
Decentralised, open source technology is methodical, powerful, and inevitable.

I don't have any insight into Ghost's numbers. Or into how their users break down across the range from personal to startup/corporate (without membership) to unpaid and paid independent creators.

The future of that independent creator economy is certainly pretty interesting and something I might talk about again. At the same time, I hope that Ghost continues to be an effective platform for the rest of us. I admire the achievement of the Ghost team. I have grown to like it and enjoy working with it ... even in my limited way!

Picture: I took the feature picture at Ruskin Park in Camberwell, London.

Open Data Day 2021 – it’s a wrap / Open Knowledge Foundation

Open Data Day 2021 event flyers

On Saturday 6th March 2021, the eleventh Open Data Day took place with people around the world organising over 300 events to celebrate, promote and spread the use of open data.

Thanks to the generous support of this year’s mini-grant funders – Microsoft, UK Foreign, Commonwealth and Development Office, Mapbox, Global Facility for Disaster Reduction and Recovery, Latin American Open Data Initiative, Open Contracting Partnership and Datopian – the Open Knowledge Foundation offered more than 60 mini-grants to help organisations run online or in-person events for Open Data Day.

We captured some of the great conversations across Asia/the Pacific, Europe/Middle East/Africa and the Americas using Twitter Moments.

Below you can discover all the organisations supported by this year’s scheme as well as seeing photos/videos and reading their reports to help you find out how the events went, what lessons they learned and why they love Open Data Day:

Environmental data

  • Code for Pakistan
    • A hack day to open and publish the block coordinates of the plantation conducted during the billion tree tsunami in Pakistan
    • Read event report
  • DRM Africa (Democratic Republic of the Congo)
    • Preventing vulnerable communities from river floods through risk data collection, analysis and communication
    • Read event report
  • Escuela de Fiscales (Argentina)
    • Our goal is to show the community and other civil society organizations the importance of open data in preserving and caring for the environment, and the urgency of taking action against climate change and pollution, and how open data can improve public policies with the participation of citizens
    • Read event report
  • Government Degree College Bemina, J and K Higher Education (India)
    • Make the community aware about the availability and benefits of environmental data for addressing environmental concerns in Kashmir Valley
    • Read event report
  • Future Lab (Mexico)
    • Engage with the local community and enable citizen participation through the use of open data for the proposal of cleaner and more sustainable public policies
    • Read event report
  • IUCN (Switzerland)
    • We will talk about the PANORAMA initiative and web platform, which allows conservation practitioners to share and reflect on their experiences, increase recognition for successful work, and to learn with peers how similar challenges have been addressed around the globe
    • Read event report
  • Mijas Multimedia (Democratic Republic of the Congo)
    • Strengthen the community resilience to the rapid rise of Lake Tanganyika through the use of open data
    • Read event report
  • Niger Delta SnapShots (Nigeria)
    • Use open data to uncover hidden threats damaging Nigerian mangrove and demonstrate the necessity for urgent action to save Nigerian Mangrove
    • Read event report
  • Open Knowledge Nepal
    • Organise a datathon that will bring open data enthusiasts to work on the real-time air quality data and Twitter bot enhancement, so that people can use the service and get informed with the recent situations of air quality in their surroundings
    • Read event report
  • PermaPeople (Germany)
    • Present and discuss the importance and challenges of collecting and sharing open source data on plants and growing to assist in the growth of the regenerative movement
    • Read event report
  • Zanzibar Volunteers for Environmental Conservation (Tanzania)
    • The main goal is to contribute to open data initiatives by helping the students understand more about open data and environmental issues
    • Read event report

Tracking public money flows

Open mapping

  • DIH Slovenia
    • Disseminating existing open mapping solutions, sharing best practices and discussion of possibilities for improving life in communities through open mapping
    • Read event report
  • Federal University of Bahia (Brazil)
    • Strengthen a global network of community data collectors from communities, organisations, as well as academic institutions by 1) focusing on sharing experiences from specific cases where particular mapping tools were used as part of strategies of community empowerment and 2) using the insights to subsequently co-design a platform to empower data collectors globally
    • Read event report
  • Geoladies PH (Philippines)
    • Since March is International Women’s Month and 31st March is International Transgender Day of Visibility, we would like to hold an event that empowers and engages women (cisgender and transgender) to map out features and amenities (women support desks, breastfeeding stations, gender-neutral comfort rooms, and LGBT safe spaces) and feature lightning talks to highlight women in mapping
    • Read event report
  • GEOSM (Cameroon)
    • Host a “geo-evangelisation” workshop in the use of JOSM (Java OpenStreetMap) and GEOSM (the first 100% African open source geolocation platform)
    • Read event report
  • iLabs@Mak Project (Uganda)
    • To understand and value the need of Farmers’ Live Geo Map across food value chain in Africa to better food traceability and security
    • Read event report
  • LABIKS – Latin American Bike Knowledge Sharing
    • To promote and stimulate the sharing of open data about the bike-sharing systems in Latin America and to promote and discuss our online open map, aiming to improve it
    • Read event report
  • Monitor de Femicidios de UTOPIX (Venezuela)
  • Periféria Policy and Research Center
    • Learn about the relevance of open data in collective/critical mapping of gentrification in Hungary
    • Read event report
  • PoliMappers (Italy)
    • Host an introductory mapping event on OpenStreetMap so that students and people interested in collaborating gain the basic skills needed to tackle more advanced tools later in the year
    • Read event report
  • SmartCT (Philippines)
    • Launch the MapaTanda Initiative (a portmanteau of Mapa — which means a map — and Tanda — which can mean an older adult but can also mean remember); which is an initiative that seeks to improve the number and quality of data in OpenStreetMap that are important and relevant to older adults (senior citizens) and the ageing population (60+ years old) in the Philippines
    • Read event report
  • SUZA Youthmappers
    • Create awareness on open data use, and how the students can use the data in developing innovative web and mobile applications to solve existing challenges in society
    • Read event report
  • TuTela Learning Network in collaboration with local activists and researchers
    • Start a debate on alternative, community-managed forms of housing in the city of Lisbon based on the model of grant of use and raising awareness on the importance of accessible data on available real estate resources owned by the city
    • Read event report
  • Unificar Ações e Informações Geoespaciais – UAIGeo – Universidade Federal de São João del-Rei (UFSJ) 
    • Disseminate the use and importance of open data to support the solution of territorial tension points, the use of water and the preservation of cultural heritage, as well as providing participants with contacts with collaborative mapping applications
    • Read event report

Data for equal development

  • 254 Youth Policy Cafe (Kenya)
    • Undertake a webinar via the Zoom Platform themed “Leveraging Open Data as an Asset for Inclusive & Sustainable Development in Kenya”
    • Read event report
  • ACCESA (Costa Rica)
    • Explore, map, visualize and disseminate key data about the projects being implemented by the Territorial Councils of Rural Development, the main participatory bodies for fostering rural development in Costa Rica, and assess their progress, the money being spent on them, the results obtained, and their impact in narrowing the many social gaps that currently affect the different rural regions of the country
    • Read event report
  • Afroimpacto
    • Discuss the importance of the open data discussion to the black community
    • Read event report
  • CoST Honduras
    • Present how we can promote sustainable infrastructure by using data disclosed under the Open Contracting for Infrastructure Data Standard and engage citizens and civil society organisations to demand government accountability by using a tool called InfraS
    • Read event report
  • Dados Abertos de Feira (Brazil)
    • Promote and discuss the open data knowledge to our local community (city of Feira de Santana, countryside of Brazil), bringing together the academy, government agents and the society itself
    • Read event report
  • DataFest Tbilisi (Georgia)
    • Highlight and promote the use of data and data-driven products as an effective way to tackle pressing social issues and inequality
    • Read event report
  • Demokrasya (Democratic Republic of the Congo)
    • Raise awareness of the Congolese community especially the women’s rights community on the use of open data in defending the women’s accessibility to employment
    • Read event report
  • Fundación Eduna (Colombia)
    • Develop activities to address the issue of strengthening the capacity for creative thinking of children and young people in the central region of Colombia making use and taking advantage of open data
    • Read event report
  • Gênero e Número (Brazil)
    • Explore open data to get a comprehensive landscape on the labour market for women in Brazil during the pandemic
    • Read event report
  • Girls’ Tech-Changer Community (Cameroon)
    • Show the benefits of open data (such as an increase in efficiency, transparency, innovation, and economic growth) and to encourage the adoption of open data policies in various government bodies, businesses, and civil societies
    • Read event report
  • Hawa Feminist Coalition (Somalia)
    • Advance the production, dissemination and openness of sex-disaggregated data in Somalia in support of evidence-based planning and policy-making as well as tracking of progress by the government and other stakeholders to achieve the Sustainable Development Goals (SDGs)
    • Read event report
  • Hope for Girls and Women Tanzania
    • Teaching community about the benefit of using data for development
    • Read event report
  • International Youth Alliance for Family Planning- TOGO (IYAFP-TOGO)
    • Develop an open map of contraceptive methods and service availability in Agbalepedo area
    • Read event report
  • IPANDETEC (Panama)
    • Train Panamanian women on their current position, role and future in the world of open data
    • Read event report
  • iWatch Africa
    • Demonstrate how equal development within the digital ecosystem in Africa can be improved by leveraging data on online abuse and harassment of female journalists
    • Read event report
  • Kiyita Foundation
    • Encourage local women to get access to data about economic development
    • Read event report
  • Madagascar Initiatives for Digital Innovation
  • Nepal Open Source Klub
    • We will create a glossary of technical terms and words that are commonly used on websites/in software and translate those into Nepali
    • Read event report
  • Nukta Africa (Tanzania)
    • Maximizing the use of open data to increase accountability through data journalism
    • Read event report
  • Programming Historian (Chile)
    • Walk participants through the process of visualising qualitative and quantitative development open data for equal development in Latin America, using open access tools
    • Read event report
  • Punch Up (Thailand)
    • Emphasise what would be lost if we don’t have open data in our country
    • Read event report
  • Rausing Zimbabwe
    • Create a platform and outlet for information distribution, updates and discussion with communities on the issues surrounding peace and security in the age of the pandemic
    • Read event report
  • Vilnius Legal Hackers (Lithuania)

Thanks to everyone who organised or took part in these celebrations and see you next year for Open Data Day 2022!

Need more information?

If you have any questions, you can reach out to the Open Knowledge Foundation’s Open Data Day team by emailing opendataday@okfn.org or on Twitter via @OKFN.

The #DLFteach Toolkit: Participatory Mapping In a Pandemic / Digital Library Federation

This post was written by Jeanine Finn (Claremont Colleges Library), as part of Practitioner Perspectives: Developing, Adapting, and Contextualizing the #DLFteach Toolkit, a blog series from DLF’s Digital Library Pedagogy group highlighting the experiences of digital librarians and archivists who utilize the #DLFteach Toolkit and are new to teaching and/or digital tools.

The Digital Library Pedagogy working group, also known as #DLFteach, is a grassroots community of practice, empowering digital library practitioners to see themselves as teachers and equip teaching librarians to engage learners in how digital library technologies shape our knowledge infrastructure. The group is open to anyone interested in learning about or collaborating on digital library pedagogy. Join our Google Group to get involved.


See the original lesson plan in the #DLFteach Toolkit.

Our original activity was designed around using a live GoogleSheet in coordination with ArcGIS Online to collaboratively map historic locations for an in-class lesson to introduce students to geospatial analysis concepts. In our example, a history instructor had identified a list of cholera outbreaks with place names from 18th-century colonial reports.

In the original activity, students were co-located in a library classroom, reviewing the historic cholera data in groups. A Google Sheet was created and shared with everyone in the class for students to enter “tidied” data from the historic texts collaboratively. The students then worked with a live link from Google Sheets, allowing the outbreak locations to be served directly to the ArcGIS Online map. It was successful and a useful tool for encouraging engagement and for getting familiar with GIS.
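
As a rough illustration of the data-tidying step, the sketch below pulls a shared Google Sheet as CSV (using Google's standard CSV export URL pattern) and flags rows with missing coordinates before the sheet feeds the ArcGIS Online layer. The sheet ID and the latitude/longitude column names are placeholders, and the comma-splitting is deliberately naive, so this is a sanity-check idea rather than a finished tool.

```typescript
// Sketch: sanity-check a shared Google Sheet before it feeds an ArcGIS Online layer.
// SHEET_ID is a placeholder; columns are assumed to be named "latitude" and "longitude".
async function checkSheet(sheetId: string): Promise<void> {
  const url = `https://docs.google.com/spreadsheets/d/${sheetId}/export?format=csv`;
  const csv = await (await fetch(url)).text();

  // Naive CSV parsing: adequate for simple sheets without quoted commas.
  const [header, ...rows] = csv.trim().split('\n').map((line) => line.split(','));
  const latIdx = header.indexOf('latitude');
  const lonIdx = header.indexOf('longitude');

  const missing = rows.filter((row) => !row[latIdx]?.trim() || !row[lonIdx]?.trim());
  console.log(`${missing.length} of ${rows.length} rows are missing coordinates`);
}
```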

Then COVID-19 arrived in 2020. Instead of a centuries-distant disease outbreak, students learning digital mapping this past year were thrust into socially-distant instructional settings driven by a contemporary pandemic that radically altered their modes of learning. The collaborative affordances of tools like ArcGIS Online were pressed into service to help students collaborate effectively and meaningfully in real-time while learning from home.

As an example, one geology professor at Pomona College encouraged her students to explore the geology of their local environment. Building on shared readings and lectures on geologic history and rock formations, students were encouraged to research the history of the land around them, and include photographs, observations, and other details to enrich the ArcGIS StoryMap. The final map included photographs and geology facts from students’ home locations around the world.

Geology of places we live: Group projects for Module 1 "Geology of the solid Earth" in GEOL 20E.1, Pomona College, September 29, 2020 (header for the Geology class group StoryMap at Pomona College, Fall 2020)

 

A key feature of the ArcGIS StoryMap platform that appealed to the instructor was the ability for the students to work collaboratively on the platform itself — not across shared files and folders on Box, GSuite, the LMS, etc. While this functioned reasonably well, there were several roadblocks to effective collaboration that we encountered along the way. Most of the challenges related to permissions settings in ArcGIS Online administration, as the “shared update” features are not set as default permissions. Other challenges included file size limitations for images the students wished to upload, the inability of more than one user to edit the same file simultaneously, and potential security issues (including firewalls) in nations with more restrictive internet laws.

Reflecting on these uses of StoryMaps over this past semester, we encourage instructors and library staff interested in this kind of collaborative mapping to:

  1. Review user license permissions and best practices for ArcGIS StoryMap collaboration from Esri (some links below).
  2. Plan ahead to help students with collecting appropriate images, including discussions of file size and copyright.
  3. Encourage the instructor to coordinate student groups with defined roles and responsibilities to lessen the likelihood of multiple editors working on the same StoryMap at once (which can cause corruption of the files).
  4. Get clarity from IT and other support staff as needed to determine if students are working remotely from countries that may have restrictions on internet use.

 

Resources:

Participatory Mapping with Google Forms, Google Sheets, and ArcGIS Online (Esri community education blog): https://community.esri.com/t5/education-blog/participatory-mapping-with-google-forms-google-sheets-and-arcgis/ba-p/883782

Optimize group settings to share stories like never before (Esri ArcGIS blog): https://www.esri.com/arcgis-blog/products/story-maps/constituent-engagement/optimize-group-settings-to-share-stories-like-never-before/

Teach with Story Maps: Announcing the Story Maps Curriculum Portal (University of Minnesota, U-Spatial): https://research.umn.edu/units/uspatial/news/teach-story-maps-announcing-story-maps-curriculum-portal

Getting Started with ArcGIS StoryMaps (Esri): https://storymaps.arcgis.com/stories/cea22a609a1d4cccb8d54c650b595bc4

Conclusion and recommendations

Gather materials ahead of time: photographs from digital archives, maps, and so on. Expect that there may be data cleaning issues.

The post The #DLFteach Toolkit: Participatory Mapping In a Pandemic appeared first on DLF.

What Is The Point? / David Rosenthal

During a discussion of NFTs, Larry Masinter pointed me to his 2012 proposal The 'tdb' and 'duri' URI schemes, based on dated URIs. The proposal's abstract reads:
This document defines two URI schemes.  The first, 'duri' (standing
for "dated URI"), identifies a resource as of a particular time.
This allows explicit reference to the "time of retrieval", similar to
the way in which bibliographic references containing URIs are often
written.

The second scheme, 'tdb' ( standing for "Thing Described By"),
provides a way of minting URIs for anything that can be described, by
the means of identifying a description as of a particular time.
These schemes were posited as "thought experiments", and therefore
this document is designated as Experimental.
As far as I can tell, this proposal went nowhere, but it raises a question that is also raised by NFTs. What is the point of a link that is unlikely to continue to resolve to the expected content? Below the fold I explore this question.

I think there are two main reasons why duri: went nowhere:
  • The duri: concept implies that Web content in general is not static, but it is actually much more dynamic than that. Even the duri: specification admits this:
    There are many URIs which are, unfortunately, not particularly
    "uniform", in the sense that two clients can observe completely
    different content for the same resource, at exactly the same time.
    Personalization, advertisements, geolocation, watermarks, all make it very unlikely that either several clients accessing the same URI at the same time, or a single client accessing the same URI at different times, would see the same content.
  • When this proposal was put forward in 2012, it was competing with a less elegant but much more useful alternative that had been in use for 16 years. The duri: specification admits that:
    There are no direct resolution servers or processes for 'duri' or
    'tdb' URIs. However, a 'duri' URI might be "resolvable" in the sense
    that a resource that was accessed at a point in time might have the
    result of that access cached or archived in an Internet archive
    service. See, for example, the "Internet Archive" project
    But the duri: URI doesn't provide the information needed to resolve to the "cached or archived" content. The Internet Archive's Wayback Machine uses URIs which, instead of the prefix duri:[datetime]: have the prefix https://web.archive.org/web/[datetime]/. This is more useful, both because browsers will actually resolve these URIs, and because they resolve to a service devoted to delivering the content of the URI at the specified time.
The competition for duri: was not merely long established, but also actually did what users presumably wanted, which was to resolve to the content of the specified URL at the specified time.
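
To make the comparison concrete, here is a small sketch that turns the (URL, time) pair a duri: reference encodes into the Wayback Machine form that browsers can actually resolve. The helper name is mine, and the timestamp format follows the Wayback Machine's usual YYYYMMDDhhmmss prefix described above.

```typescript
// Sketch: the practical counterpart of duri:[datetime]:<url> - a Wayback
// Machine URL that resolves to the archived content at that time.
function waybackUrl(url: string, asOf: Date): string {
  // Wayback timestamps are YYYYMMDDhhmmss in UTC.
  const timestamp = asOf.toISOString().replace(/[-:T]/g, '').slice(0, 14);
  return `https://web.archive.org/web/${timestamp}/${url}`;
}

// Example:
// waybackUrl('https://example.com/', new Date('2012-06-01T00:00:00Z'))
//   -> 'https://web.archive.org/web/20120601000000/https://example.com/'
```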

It is true that a user creating a Wayback Machine URL, perhaps using the "Save Page Now" button, would preserve the content accessed by the Wayback Machine's crawler, which might be different from that accessed by the user themselves. But the user could compare the two versions at the time of creation, and avoid using the created Wayback Machine URL if the differences were significant. Publishing a Wayback Machine URL carries an implicit warranty that the creator regarded any differences as insignificant.

The history of duri: suggests that there isn't a lot of point in "durable" URIs lacking an expectation that they will continue to resolve to the original content. NFTs have the expectation, but lack the mechanism necessary to satisfy the expectation.

Recognizing bias in research data – and research data management / HangingTogether

As the COVID pandemic grinds on, vaccinations are top of mind. A recent article published in JAMA Network Open examined whether vaccination clinical trials over the last decade adequately represented various demographic groups in their studies. According to the authors, the results suggested they did not: “among US-based vaccine clinical trials, members of racial/ethnic minority groups and older adults were underrepresented, whereas female adults were overrepresented.” The authors concluded that “diversity enrollment targets should be included for all vaccine trials targeting epidemiologically important infections.”

Dr. Tiffany Grant

My colleague Rebecca Bryant and I recently enjoyed an interesting and thought-provoking conversation with Dr. Tiffany Grant, Assistant Director for Research and Informatics with the University of Cincinnati Libraries (an OCLC Research Library Partnership member) on the topic of bias in research data. Dr. Grant neatly summed up the issue by observing that data collected should be inclusive of all the groups who are impacted by outcomes. As the JAMA article illustrates, that is clearly not always the case – and the consequences can be significant for decision- and policy-making in critical areas like health care.

The issue of bias in research data has been acknowledged for some time; for example, the launch of the Human Genome Project in the late 1990s/early 2000s helped raise awareness of the problem, as did observed differences in health care outcomes across demographic groups. And efforts are underway to help remedy some of the gaps. One initiative, the US National Institutes of Health’s All of Us Research Program, aims to build a database of health data collected from a diverse cohort of at least one million participants. The rationale for the project is clearly laid out: “To develop individualized plans for disease prevention and treatment, researchers need more data about the differences that make each of us unique. Having a diverse group of participants can lead to important breakthroughs. These discoveries may help make health care better for everyone.”

Extrapolation of findings observed in one group to all other groups often leads to poor inferences, and researchers should take this into account when designing data collection strategies. The peer review process should act as a filter for identifying research studies that overlook this point in their design – but how well is it working? As in many other aspects of our work and social lives, unconscious bias may play a role here: lack of awareness of the problem on the part of reviewers means that studies with flawed research designs may slip through.

And that leads us to what Dr. Grant believes is the principal remedy for the problem of bias in research data: education. Researchers need training that helps them recognize potential sources of bias in data collection, as well as understand the implications of bias for interpretation and generalization of their findings. The first step in solving a problem is to recognize that there is a problem. Some disciplines are further along than others in addressing bias in research data, but in Dr. Grant’s view, there is still ample scope for raising awareness across campus about this topic.

Academic libraries can help with this, by providing workshops and training programs, and gathering relevant information resources. At the University of Cincinnati, librarians are often embedded in research teams, providing an excellent opportunity to share their expertise on this issue. Raising awareness about bias in research data is also an opportunity to partner with other campus units, such as the office of research, colleges/schools, and research institutes (for more information on how to develop and sustain cross-campus partnerships around research support services see our recent OCLC Research report on social interoperability).

Many institutions are currently implementing Equality, Diversity, and Inclusion (EDI) training, and modules addressing bias in research data might be introduced as part of EDI curricula for researchers. This could also be an area of focus for professional development programs supporting doctoral, postdoctoral, and other early-career researchers. It seems that many EDI initiatives focus on issues related to personal interactions or recruiting more members of underrepresented groups into the field. For researchers, it may be useful to supplement this training with additional programs that focus on EDI issues as they specifically relate to the responsible conduct of research. In other words, how do EDI-related issues manifest in the research process, and how can researchers effectively address them? A great example is the training offered by We All Count, a project aimed at increasing equity in data science.

Funders can also contribute toward mitigating bias in research data, by issuing research design guidelines on inclusion of underrepresented groups, and by establishing criteria for scoring grant proposals on the basis of how well these guidelines are addressed. The big “carrots and sticks” wielded by funders are a powerful tool for both raising awareness and shifting behaviors.

Bias in research data extends to bias in research data management (RDM). Situations where access to and the ability to use archived data sets are not equitable constitute another form of bias. While it is good to mandate that data sets be archived under “open” conditions, as many funders already do, the spirit of the mandate is compromised if the data sets are put into systems that are not accessible and usable to everyone. It is important to recognize that the risk of introducing bias into research data exists throughout the research lifecycle, including curation activities such as data storage, description, and preservation.

Our conversation focused on bias in research data in STEM fields – particularly medicine – but the issue also deserves attention in the context of the social sciences, as well as the arts and humanities. Our summary here highlights just a sample of the topics worthy of discussion in this area, with much to unpack in each one. We are grateful to Dr. Grant for starting a conversation with us on this important issue and look forward to continuing it in the future as part of our ongoing work on RDM and other forms of research support services.

Like so many other organizations, OCLC is reflecting on equity, diversity, and inclusion, as well as taking action. Check out an overview of that work, and explore efforts being undertaken in OCLC’s Membership and Research Division. Thanks to Tiffany Grant, Rebecca Bryant, and Merrilee Proffitt for providing helpful suggestions that improved this post!

The post Recognizing bias in research data – and research data management appeared first on Hanging Together.

Distributing DEI Work Across the Organization / Tara Robertson

I enjoyed being a guest on Seed&Spark‘s first monthly office hours session where Stefanie Monge, Lara McLeod and I talked about distributing diversity, equity and inclusion work across organizations.

Here’s some of the work that I mentioned:

The post Distributing DEI Work Across the Organization appeared first on Tara Robertson Consulting.

Thoughts on NACO’s proposed process for updating CJK records / Terry Reese

I would like to take a few minutes to share my thoughts about an updated best practice recently posted by the PCC and NACO related to CJK records. The update is found here: https://www.loc.gov/aba/pcc/naco/CJK/CJK-Best-Practice-NCR.docx. I’m not certain whether this is active policy or simply a proposal, but I’ve been having a number of private discussions with members at the Library of Congress and the PCC as I’ve been trying to understand the genesis of this policy change. I personally believe that formally adopting a policy like this would be exceptionally problematic, and I wanted to flesh out my thoughts on why, along with some better options that could fix the issue this proposal is attempting to solve.

But first, I owe some folks an apology. In chatting with some folks at LC (because, let’s be clear, this proposal was created specifically because there are local, limiting practices at LC that are artificially complicating this work), it came to my attention that the individuals who spent a good deal of time considering and creating this proposal have received some unfair criticism – and I think I bear a lot of responsibility for that. I have done work creating best practices and standards, and it’s thankless, difficult work. Because of that, in cases where I disagree with a particular best practice, my preference has been to address those concerns privately and attempt to understand and share my issues with a set of practices. This is what I have been doing related to this work. However, on the MarcEdit list (a private list), when a request was made related to a feature request in MarcEdit to support this work, I was less thoughtful in my response, because the proposed change could fundamentally undo almost a decade of work: I have dealt with thousands of libraries stymied by these kinds of best practices that have significant unintended consequences. My regret is that I’ve been told that my thoughts shared on the MarcEdit list have been used by others in more public spaces to take this committee’s work to task. This is unfortunate and disappointing, and something I should have been more thoughtful about in my responses on the MarcEdit list, especially given that every member of that committee is doing this work as a service to the community. I know I forget that sometimes. So, to the folks that did this work – I’ve not followed (or seen) any feedback you may have received, but in as much as I’m sure I played a part in any pushback you may have received, I’m sorry.

What does this proposal seek to solve?

If you look at the proposal, I think that the writers do a good job identifying the issue. Essentially, this issue is unique to authority records. At present, NACO still requires that records created within the program only utilize UTF8 characters that fall within the MARC-8 repertoire. OCLC, the pipeline for creating these records, enforces this rule by invalidating records with UTF8 characters outside the MARC8 range. The proposal seeks to address this by encouraging the use of NRC (Numeric Character Reference) data in UTF8 records, to work around these normalization issues.

So, in a nutshell, that is the problem, and that is the proposed solution. But before we move on, let’s talk a little bit about how we got here. This problem currently exists because of what I believe to be an extremely narrow and unproductive reading of what the MARC8 repertoire actually means. For those not in libraries, MARC8 is essentially a made-up character encoding, used only in libraries, that has long outlived its usefulness. Modern systems have largely stopped supporting it outside of legacy ingest workflows. The issue is that for every academic library or national library that has transitioned to UTF8, hundreds of small libraries or organizations around the world have not. MARC8 continues to exist because the infrastructure that supports these smaller libraries is built around it.

But again, it is worth asking what the MARC8 repertoire actually is today. Previously, this had been a hard set of defined values. But really, that changed around 2004, when LC updated its guidance and introduced the concept of NRCs to preserve lossless data transfer between systems that were fully UTF8 compliant and older MARC8 systems. NRCs in MARC8 were workable because they left local systems the ability to handle (or not handle) the data as they saw fit, and finally provided an avenue for the library community as a whole to move on from the limitations MARC8 was imposing on systems. They facilitated the flow of data into non-MARC formats that were UTF8 compliant and provided a pathway for data from other metadata formats to be reused in MARC records. I would argue that today, the MARC8 repertoire includes NRC notation – and to assume or pretend otherwise is shortsighted and revisionist.
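To make the mechanism concrete, here is a minimal sketch of the kind of lossless NRC escaping that guidance describes. It uses ASCII as a stand-in for the MARC-8 repertoire purely for illustration; a real implementation would test membership in the actual MARC-8 character sets:

def escape_to_ncr(text: str) -> str:
    # Replace characters outside a "safe" repertoire with &#xXXXX; numeric
    # character references, so the original Unicode can be recovered losslessly.
    # ASCII stands in for the MARC-8 repertoire here for illustration only.
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append(f"&#x{ord(ch):04X};")
    return "".join(out)

print(escape_to_ncr("梅蘭芳"))  # -> &#x6885;&#x862D;&#x82B3;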

But why is all of this important? Well, it is at the heart of the problem that we find ourselves in. For authority data, the Library of Congress appears to have adopted this very narrow view of what MARC8 means (against their own stated recommendations), and as a result NACO and OCLC place artificial limits on the pipeline. There are lots of reasons why LC does this; I recognize they are moving slowly because any changes they make are often met with some level of resistance from members of our community – but in this case, this paralysis is causing more harm to the community than good.

Why is this proposal problematic?

So, this is the environment that we are working in and the issue this proposal sought to solve. The problem, however, is that the proposal attempts to solve it by adopting a MARC8 solution and applying it within UTF8 data – essentially making the case that NRC values can be embedded in UTF8 records to ensure lossless data entry. And while I can see why someone might think that, the assumption is fundamentally incorrect. When LC developed its guidance on NRC notation, that guidance was specifically directed at the lossless translation of data to MARC8. UTF8 data has no need for NRC notation. This does not mean that it does not sometimes show up – and as a practical matter, I’ve spent thousands of hours working with libraries dealing with the issues this creates in local systems. Aside from the problems this creates in MARC systems around indexing and discovery, it makes data almost impossible to use outside of that system and at times of migration. In thinking about the implications of this change in the context of MarcEdit, I had the following specific concerns:

  1. NRC data in UTF8 records would break existing workflows for users with current generation systems that would have no reason to expect this data as being present in UTF8 MARC records
  2. It would make normalization functionally impossible and potentially re-introduce a problem I spent months solving for organizations related to how UTF8 data is normalized and introduced into local systems.
  3. It would break many of the transformation options.  MarcEdit allows for the flow of data to many different metadata formats – all are built on the concept that the first thing MarcEdit does is clean up character encodings to ensure the output data is in UTF8.
  4. MarcEdit is used by ~20k active users and ~60k annual users.  Over 1/3 of those users do not use MARC21 and do not use MARC-8.  Allowing the mixing of NRCs and UTF8 data potentially breaks functionality for broad groups of international users.

While I very much appreciate the issue that this is attempting to solve, I’ve spent years working with libraries where this kind of practice would introduce a long-term data issue that is very difficult to identify and fix, and that often shows up unexpectedly when it comes time to migrate or share this information with other services, communities, or organizations.

So what is the solution?


I think that we can address this issue on two fronts. First, I would advise NACO and OCLC to stop limiting data entry to this very narrow notion of the MARC8 repertoire. In all other contexts, OCLC provides the ability to enter any valid UTF8 data. This current limit within the authority process is artificial and unnecessary. OCLC could easily remove it, and NACO could amend their process to allow record entry to utilize any valid UTF8 character. This would address the problem that this group was attempting to solve for catalogers creating these records.

The second step could take two forms. If LC continues to ignore its own guidance and cleave to an outdated concept of the MARC8 repertoire, OCLC could provide to LC, via their pipeline, a version of the records where the data includes NRC notation for use in LC’s own systems. It would mean that I would not recommend using LC as a trusted system for downloading authorities if this were the practice, unless I had an internal local process to remove any NRC data found in valid UTF8 records. Essentially, we would treat LC’s requirements as a disease and quarantine their influence in this process. What would be more ideal, of course, is LC making the decision to accept UTF8 data without restrictions and to rely on applicable guidance and MARC21 best practice by supporting UTF8 data fully – and, for those still needing MARC8 data, providing that data using the lossless process of NRCs (per their own recommendations).

Conclusion

Ultimately, this proposal is a recognition that the current NACO rules and process are broken, and broken in a way that is actively undermining other work in the PCC around linked data development. And while I very much appreciate the thoughtful work that went into the consideration of a different approach, I think the unintended side effects would cause more long-term damage than any short-term gains. Ultimately, what we need is for the principals to rethink why these limitations are in place and, honestly, really consider ways that we start to deemphasize the role LC plays as a standards holder if, in that role, LC’s presence continues to be an impediment to moving libraries forward.

Accomplishments and priorities for the OCLC Research Library Partnership / HangingTogether

With 2021 well underway, the OCLC Research Library Partnership is as active as ever. We are heartened by the positive feedback and engagement our Partners have provided in response to our programming and research directions. Thank you to those who have shared your stories of success and challenge; listening to your voices is what guides us and drives us forward. We warmly welcome the University of Notre Dame, University of Waterloo, and OCAD University into the Partnership and are pleased to see how they have jumped right into engagement with SHARES and other activities.

The SHARES resource sharing community

Photo by Caleb Chen on Unsplash

The SHARES community has been a source of support and encouragement as resource sharing professionals around the world strive to meet their communities’ information needs during COVID-19. Over the last year, Dennis Massie has convened more than 50 SHARES town halls to learn how SHARES members are changing practice to adapt to quickly evolving circumstances. Dennis has documented how resource sharing practices have changed.

Inspired by the SHARES community, we are also excited to have launched the OCLC Interlibrary Loan Cost Calculator. For library administrators and funders to evaluate collection sharing services properly, they need access to current cost information, as well as benchmarks against which to measure their own library’s data. The Cost Calculator is a free online tool that has the potential to act as a virtual real-time ILL cost study. Designed in collaboration with resource sharing experts and built by OCLC Research staff, the calculator has been in the hands of beta testers and early adopters since October 2019. A recorded webinar gives a guided tour of what the tool does (and does not do), what information users need to gather, how developers addressed privacy issues, and how individual institutions and the library community can benefit.

Total cost of stewardship: responsible collection building in archives and special collections

A big thanks to our Partners who contributed to the Total Cost of Stewardship: Responsible Collection Building in Archives and Special Collections. This publication addresses the ongoing challenge of descriptive backlogs in archives and special collections by connecting collection development decisions with stewardship responsibilities. The report proposes a Total Cost of Stewardship framework for bringing together these important, interconnected functions. Developed by the RLP’s Collection Building and Operational Impacts Working Group, the Total Cost of Stewardship Framework is a model that considers the value of a potential acquisition and its alignment with institutional mission and goals alongside the cost to acquire, care for, and manage it, the labor and specialized skills required to do that work, and institutional capacity to care for and store collections.

This publication includes a suite of communication and cost estimation tools to help decision makers assess available resources, budgets, and timelines to plan with confidence and set realistic expectations to meet important goals. The report and accompanying resources provide special collections and archives with tools to support their efforts to meet the challenges of contemporary collecting and to ensure they are equitably serving and broadly documenting their communities.

Transitioning to the next generation of metadata

In December, we had a bittersweet moment celebrating Senior Program Officer Karen Smith-Yoshimura’s retirement. As Mercy Procaccini and others take over the role of coordinating the stalwart Metadata Managers Focus Group, we are taking time to refine how this dynamic group works and plans future discussions together to better support their efforts. A synthesis of this group’s discussions from the past six years traces how metadata services are transitioning to the “next generation of metadata.”

Transforming metadata into linked data

The RLP’s commitment to advancing learning and operational support for linked data continues with the January publication of Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project. The report details a pilot project that investigated methods for—and the feasibility of—transforming metadata into linked data to improve the discoverability and management of digitized cultural materials and their descriptions. Five institutions partnered with OCLC to collaborate on this linked data project, representing a diverse cross-section of different types of institutions: The Cleveland Public Library; The Huntington Library, Art Museum, and Botanical Gardens; The Minnesota Digital Library; Temple University Libraries; and University of Miami Libraries.

OCLC has invested in pathbreaking linked data work for over a decade, and it is wonderful to add the publication to this knowledge base.

Social interoperability in research support  

In the area of research support, Rebecca Bryant developed a robust series of webinars as a follow-on to the 2019–2020 OCLC Research project, Social Interoperability in Research Support. The resulting report, Social Interoperability in Research Support: Cross-campus Partnerships and the University Research Enterprise, synthesizes information about the highly decentralized, complex research support ecosystem at US research institutions. The report additionally offers a conceptual model of campus research support stakeholders and provides recommendations for establishing and stewarding successful cross-campus relationships. The social interoperability webinar series complements this work by offering in-depth case studies and “stakeholder spotlights” from RLP institutions, demonstrating how other campus units are eager to collaborate with the library. This is a great example of the type of programming you can find in our Works in Progress Webinar Series.

Equity, diversity, and inclusion

Our team has been digging into issues of equity, diversity, and inclusion: we’ve developed a “practice group” to help our team be better situated to engage in difficult conversations around race, and we have also been learning and engaging in conversations about the difficulty of cataloging topics relating to Indigenous peoples in respectful ways.

This work has helped to prepare the way for important new work that I’m pleased to share with you today. OCLC will be working in consultation with Shift Collective on The Andrew W. Mellon-funded convening, Reimagine Descriptive Workflows. The project will bring together a wide range of community stakeholders to interrogate the existing descriptive workflow infrastructure to imagine new workflows that are inclusive, equitable, scalable, and sustainable. We are following an approach developed in other work we have carried out, such as the Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, and more recently, in Responsible Operations: Data Science, Machine Learning, and AI in Libraries. In that vein, we will host a virtual convening later this year to inform a Community Agenda publication. 

Reimagine Descriptive Workflows is the next stage of a journey that we’ve been on for some time, informed by numerous webinars, surveys, and individual conversations. I am very grateful to team members and the RLP community for their contributions and guidance. We are truly “learning together.”

Looking forward

If you are at an OCLC RLP affiliated institution and would like to learn more about how to get the most out of your RLP affiliation, please contact your staff liaison (or anyone on our energetic team) and we will be happy to set up a virtual orientation or refresher on our programs and opportunities for active learning.

It is with deep gratitude that I offer my thanks to our Partners for their investment in the Research Library Partnership. We are committed to offering our very best to serve your research and learning needs.

The post Accomplishments and priorities for the OCLC Research Library Partnership appeared first on Hanging Together.

Watch the Net Zero Challenge pitch contest / Open Knowledge Foundation

This week, five shortlisted teams took part in the final stage of the Net Zero Challenge – a global competition to identify, promote and support innovative, practical and scalable uses of open data that advance climate action.

The five teams presented their three-minute project pitches to the Net Zero Challenge Panel of Experts, and a live audience. Each pitch was followed by a live Q&A.

The winner of the pitch contest will be announced in the next few days.

If you didn’t have the chance to attend the event in person, watch the event here (46.08 min) or see below for links to individual pitches.

A full unedited video of the event is at the bottom of this page.

Introduction – by James Hamilton, Director of the Net Zero Challenge

Watch video here (4.50min) // Introduction Slide Deck 

Pitch 1 – by Matt Sullivan from Snapshot Climate Tool which provides greenhouse gas emission profiles for every local government region (municipality) in Australia.

Watch pitch video here (10.25min) // Snapshot Slide Deck

Pitch 2 – by Saif Shabou from CarbonGeoScales which is a framework for standardising open data for green house gas emissions at multiple geographical scales (built by a team from France).

Watch pitch video here (9.07min) // CarbonGeoScales Slide Deck

Pitch 3 – by Jeremy Dickens, who presents Citizen Science Avian Index for Sustainable Forests, a new biomonitoring tool that uses open data on bird observations to provide crucial information on forest ecological conditions (from South Africa).

Watch pitch video here (7.03min)  // Avian Index – Slide Deck

Pitch 4 – by Cristian Gregorini from Project Yarquen which is a new API tool and website to organise climate relevant open data for use by civil society organisations, environmental activists, data journalists and people interested in environmental issues (built by a team from Argentina).

Watch pitch video here (8.20min)

Pitch 5 – by Beatriz Pagy from Clima de Eleição which analyses recognition of climate change issues by prospective election candidates in Brazil, enabling voters to make informed decisions about whom to vote into office.

Watch pitch video here (5.37min) // Clima de Eleição – Slide Deck

Concluding remarks – by James Hamilton, Director of the Net Zero Challenge

Watch video here (0.46min)


A full unedited video of the Net Zero Challenge is here (55.28min)


There are many people who collaborated to make this event possible.

We wish to thank both Microsoft and the UK Foreign, Commonwealth & Development Office for their support for the Net Zero Challenge. Thanks also to Open Data Charter and the Open Data & Innovation Team at Transport for New South Wales for their strategic advice during the development of this project. The event would not have been possible without the enthusiastic hard work of the Panel of Experts who will judge the winning entry, and the audience who asked such great questions. Finally – to all the pitch teams. Your projects inspire us and we hope your participation in the Net Zero Challenge has been – and will continue to be – supportive of your work as you use open data to advance climate action.

A new web presence: transitions and towers / Lorcan Dempsey

A brief tour


There are pages about writing, career and life, the usual stuff.  I especially enjoyed providing an overview of my writing, thinking about continuity and evolution. As I note, I hope to write more about social and institutional aspects of libraries in the future.

Writing
I have written widely over the years. Topics have evolved, but follow a general chronological arc from the technical, through the description of evolving digital systems and services, to organizational and institutional issues.
Career stuff
The career to date .. a bit more detail.
LorcanDempsey.net
Welcome! This is a place to bring together my work on library services and directions.

I have ported over the content from Lorcan Dempsey’s Weblog, and say a little more about that below. Earlier manifestations will redirect here, which is now where I will blog.

Join

I have implemented a membership feature. This is of course optional but you should sign up if you are interested in receiving updates or if you would like to comment on any entries. More details here:

Join?
Membership – spread the love. Why join? Only join if you wish to enjoy some additional features. Specifically, members can comment on posts and members can receive updates when significant new content is added. Signup to join. You can then opt in to receive updates. Once you are a member, you n…

Lorcan Dempsey’s Weblog: Orweblog

Over many years, I accumulated a lot of content at this venue. Over one thousand nine hundred posts I discovered when we moved it. I am very proud of that achievement. Many posts are referenced and linked to in the literature and elsewhere. A variety of concepts was first introduced and explicated here, before being amplified in presentations, other writing and in my work generally. It stands as a record of many years of thinking and contribution.

Blogging has become much less frequent in recent years. In some ways, my trajectory has followed the trajectory of blogging generally, peaking some years ago. However, more recently, it has been interesting seeing the growth of Substack, Revue, Patreon and other manifestations of the independent creator trend, this time often accompanied by a revenue model.

I observed some blogging dynamics in this post a while ago:

Because I don’t blog all that often any more, I find that when I sit down to do an entry I have too much stuff to say. I end up writing a short article rather than a blog entry. Indeed, I have a couple of pieces that are lying around and have grown to several thousand words.

I do hope to recapture some of that blogging fluency here.

One handed writing - the blog
Recently @mishdalton [https://twitter.com/mishdalton] pointed me at an article [https://www.nytimes.com/2017/08/25/opinion/tips-for-aspiring-op-ed-writers.html?mcubz=1] about writing op-eds for the New York Times. I was immediately struck by how applicable much of what was said was to blog writing.…

Some notes about the Lorcan Dempsey Weblog content on this site:

  • To facilitate redirection and also to identify it here, I have given the imported content its own path .. /orweblog/{entry}. It can also be seen in the navigation at the top of the page.
  • I have also tagged some important or influential posts as ‘*Classics’ to elevate them in the new site.
  • I have mapped categories in the older blog to tags here, which have a more prominent role in navigation. The mapping isn’t perfect, and I have lost some definition. But it works well.
  • There will be occasional glitches with formatting, links, or missing images, but overall the transition has worked reasonably well once various issues were identified and addressed.
  • I am grateful to Brian Pichman of libchalk (who host some OCLC blogs) for the patient and expert support he provided as we moved.
*Classics - LorcanDempsey.net
Significant or influential posts. This selection is seeded by select posts from Lorcan Dempsey’s Weblog, my blog at OCLC.

The Poolbeg Towers

I have always felt a touch of nostalgia when visiting Lorcan Dempsey’s weblog – nothing to do with the content, but rather the masthead of the webpage, which carries a panoramic photograph of his home city, Dublin, resting on Sandymount Strand in the foreground with its two Pigeon House towers and sweeping across the sea to Howth Head at the other side of Dublin Bay. //  Rónán O’Beirne, Library and Information Research, Volume 39 Number 121, 2015.

Continuous with my blogging identity over so many years, I do carry over the twin towers at Poolbeg, an iconic Dublin landmark. These featured in the banner picture of Lorcan Dempsey’s Weblog for many years, and are the feature picture of this post.

I am grateful to my son Eoghan for creating a mark for this blog which incorporates the towers.


Here is the picture from the original blog .. somewhat low resolution.


The towers have always been special, standing in the arc of the bay. At the foot of the South Great Wall, which stretches out into the sea. They look out over the city, the traffic of the port, and the emigrant ships. And in turn, they look back at the towers. In the time since we have left Dublin, they have become more pervasive as an icon and emblem of the city, despite also being threatened with demolition.

And as a special bonus here is Lisa Hannigan singing Snow. The towers are in the background.

Picture: I took the feature picture on Sandymount Strand in Dublin. Maybe I should find a better picture, but somehow the focus here seemed right. These are now much photographed.

A barbaric yawp / Hugh Rundle

Over the Easter break I made a little Rust tool for sending toots and/or tweets from a command line. Of course there are dozens of existing tools that enable either of these, but I had a specific use in mind, and also wanted a reasonably small and achievable project to keep learning Rust.

For various reasons I've recently been thinking about the power of "the Unix philosophy", generally summarised as:

  • Write programs that do one thing and do it well.
  • Write programs to work together.
  • Write programs to handle text streams, because that is a universal interface.

My little program takes a text string as input, and sends the same string to the output, the intention being not so much that it would normally be used manually on its own (though it can be) but more that it can "work together" with other programs or scripts. The "one thing" it does (I will leave the question of "well" to other people to judge) is post a tweet and/or toot to social media. It's very much a unidirectional, broadcast tool, not one for having a conversation. In that sense, it's like Whitman's "Barbaric yawp", subject of my favourite scene in Dead Poets Society and a pretty nice description of what social media has become in a decade or so. Calling the program yawp therefore seemed fitting.

yawp takes text from standard input (stdin), publishes that text as a tweet and/or a toot, and then prints it to standard output (stdout). Like I said, it's not particularly complex, and not even all that useful for your daily social media posting needs, but the point is for it to be part of a tool chain. For this reason yawp takes the configuration it needs to interact with the Mastodon and Twitter APIs from environment (ENV) variables, because these are quite easy to set programmatically and a fairly "universal interface" for setting and getting values to be used in programs.

Here's a simple example of sending a tweet:

yawp 'Hello, World!' -t

We could also send a toot by piping from the echo program (the - tells yawp to use stdin instead of looking for an argument as it does above):

echo 'Hello again, World!' | yawp - -m

In bash, you can send the contents of a file to stdin, so we could do this too:

yawp - -mt <message.txt

But really the point is to use yawp to do something like this:

app_that_creates_message | yawp - -mt | do_something_else.sh >> yawping.log
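Because the configuration lives in environment variables, another program can supply it when it calls yawp. Here is a minimal Python sketch of that idea; the variable names are invented for illustration, and yawp's own documentation defines the real ones:

import os
import subprocess

# Hypothetical variable names -- check yawp's documentation for the real ones.
env = dict(os.environ)
env["MASTODON_ACCESS_TOKEN"] = "..."
env["TWITTER_ACCESS_TOKEN"] = "..."

message = "Hello from a script"

# '-' tells yawp to read the message from stdin; '-mt' toots and tweets it.
subprocess.run(["yawp", "-", "-mt"], input=message, text=True, env=env, check=True)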

Anyway, enjoy firing your barbaric yawps into the cacophony.


I haven’t failed, I’ve just tried a lot of ML approaches that don’t work / Andromeda Yelton

“Let’s blog every Friday,” I thought. “It’ll be great. People can see what I’m doing with ML, and it will be a useful practice for me!” And then I went through weeks on end of feeling like I had nothing to report because I was trying approach after approach to this one problem that simply didn’t work, hence not blogging. And finally realized: oh, the process is the thing to talk about…

Hi. I’m Andromeda! I am trying to make a neural net better at recognizing people in archival photos. After running a series of experiments — enough for me to have written 3,804 words of notes — I now have a neural net that is ten times worse at its task. 🎉

And now I have 3,804 words of notes to turn into a blog post (a situation which gets harder every week). So let me catch you up on the outline of the problem:

  1. Download a whole bunch of archival photos and their metadata (thanks, DPLA!)
  2. Use a face detection ML library to locate faces, crop them out, and save them in a standardized way (a sketch of this step follows the list)
  3. Benchmark an off-the-shelf face recognition system to see how good it is at identifying these faces
  4. Retrain it
  5. Benchmark my new system
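The post doesn’t say which face detection library was used, so purely as a sketch of what step 2 might look like, here is one way to do it with the face_recognition library (the file names are invented):

import os

import face_recognition
from PIL import Image

os.makedirs("faces", exist_ok=True)

# Hypothetical input file; the real pipeline iterates over photos downloaded from DPLA.
image = face_recognition.load_image_file("archival_photo.jpg")

# face_locations returns one (top, right, bottom, left) box per detected face.
for i, (top, right, bottom, left) in enumerate(face_recognition.face_locations(image)):
    face = Image.fromarray(image[top:bottom, left:right])
    face = face.resize((160, 160))  # save crops at a standardized size
    face.save(f"faces/archival_photo_{i}.png")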

Step 3: profit, right? Well. Let me also catch you up on some problems along the way:

Alas, metadata

Archival photos are great because they have metadata, and metadata is like labels, and labels mean you can do supervised learning, right?

Well….

Is he “Du Bois, W. E. B. (William Edward Burghardt), 1868-1963” or “Du Bois, W. E. B. (William Edward Burghardt) 1868-1963” or “Du Bois, W. E. B. (William Edward Burghardt)” or “W.E.B. Du Bois”? I mean, these are all options. People have used a lot of different metadata practices at different institutions and in different times. But I’m going to confuse the poor computer if I imply to it that all these photos of the same person are photos of different people. (I have gone through several attempts to resolve this computationally without needing to do everything by hand, with only modest success.)

What about “Photographs”? That appears in the list of subject labels for lots of things in my data set. “Photographs” is a person, right? I ended up pulling in an entire other ML component here — spaCy, to do some natural language processing to at least guess which lines are probably names, so I can clear the rest of them out of my way. But spaCy only has ~90% accuracy on personal names anyway and, guess what, because everything is terrible, in predictable ways, it has no idea “Kweisi Mfume” is a person.
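For what it’s worth, the filtering step looks roughly like this with spaCy; a minimal sketch, assuming the small English model is installed (results will vary by model):

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model has been downloaded

subject_labels = [
    "Du Bois, W. E. B. (William Edward Burghardt), 1868-1963",
    "Photographs",
    "Kweisi Mfume",
]

for label in subject_labels:
    doc = nlp(label)
    # Keep a label only if the named-entity recognizer thinks it contains a person.
    people = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]
    print(label, "->", people or "no person found")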

Is a person who appears in the photo guaranteed to be a person who appears in the photo? Nope.

Is a person who appears in the metadata guaranteed to be a person who appears in the photo? Also nope! Often they’re a photographer or other creator. Sometimes they are the subject of the depicted event, but not themselves in the photo. (spaCy will happily tell you that there’s personal name content in something like “Martin Luther King Day”, but MLK is unlikely to appear in a photo of an MLK day event.)

Oh dear, linear algebra

OK but let’s imagine for the sake of argument that we live in a perfect world where the metadata is exactly what we need — no more, no less — and its formatting is perfectly consistent. 🦄

Here you are, in this perfect world, confronted with a photo that contains two people and has two names. How do you like them apples?

I spent more time than I care to admit trying to figure this out. Can I bootstrap from photos that have one person and one name — identify those, subtract them out of photos of two people, go from there? (Not reliably — there’s a lot of data I never reach that way — and it’s horribly inefficient.)

Can I do something extremely clever with matrix multiplication? Like…once I generate vector space embeddings of all the photos, can I do some sort of like dot-product thing across all of my photos, or big batches of them, and correlate the closest-match photos with overlaps in metadata? Not only is this a process which begs the question — I’d have to do that with the ML system I have not yet optimized for archival photo recognition, thus possibly just baking bad data in — but have I mentioned I have taken exactly one linear algebra class, which I didn’t really grasp, in 1995?
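For the record, the “dot-product thing” usually amounts to cosine similarity between embedding vectors; a minimal sketch, assuming embeddings is a dict mapping photo IDs to vectors produced by whatever face recognition model is in play:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest_matches(query: np.ndarray, embeddings: dict, top_n: int = 5):
    # Rank stored embeddings by similarity to the query embedding.
    scores = {photo_id: cosine_similarity(query, vec) for photo_id, vec in embeddings.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]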

What if I train yet another ML system to do some kind of k-means clustering on the embeddings? This is both a promising approach and some really first-rate yak-shaving, combining all the question-begging concerns of the previous paragraph with all the crystalline clarity of black box ML.

Possibly at this point it would have been faster to tag them all by hand, but that would be admitting defeat. Also I don’t have a research assistant, which, let’s be honest, is the person who would usually be doing this actual work. I do have a 14-year-old and I am strongly considering paying her to do it for me, but to facilitate that I’d have to actually build a web interface and probably learn more about AWS, and the prospect of reading AWS documentation has a bracing way of reminding me of all of the more delightful and engaging elements of my todo list, like calling some people on the actual telephone to sort out however they’ve screwed up some health insurance billing.

Nowhere to go but up

Despite all of that, I did actually get all the way through the 5 steps above. I have a truly, spectacularly terrible neural net. Go me! But at a thousand-plus words, perhaps I should leave that story for next week….

Talk: Using light from the dumpster fire to illuminate a more just digital world / Erin White

This February I gave a lightning talk for the Richmond Design Group. My question: what if we use the light from the dumpster fire of 2020 to see an equitable, just digital world? How can we change our thinking to build the future web we need?

Presentation is embedded here; text of talk is below.

Hi everybody, I’m Erin. Before I get started I want to say thank you to the RVA Design Group organizers. This is hard work and some folks have been doing it for YEARS. Thank you to the organizers of this group for doing this work and for inviting me to speak.

This talk isn’t about 2020. This talk is about the future. But to understand the future, we gotta look back.

The web in 1996

Travel with me to 1996. Twenty-five years ago!

I want to transport us back to the mindset of the early web. The fundamental idea of hyperlinks, which we now take for granted, really twisted everyone’s noodles. So much of the promise of the early web was that with broad access to publish in hypertext, the opportunities were limitless. Technologists saw the web as an equalizing space where systems of oppression that exist in the real world wouldn’t matter, and that we’d all be equal and free from prejudice. Nice idea, right?

You don’t need to’ve been around since 1996 to know that’s just not the way things have gone down.

Pictured before you are some of the early web pioneers. Notice a pattern here?

These early visions of the web, including Barlow’s declaration of independence of cyberspace, while inspiring and exciting, were crafted by the same types of folks who wrote the actual declaration of independence: the landed gentry, white men with privilege. Their vision for the web echoed the declaration of independence’s authors’ attempts to describe the world they envisioned. And what followed was the inevitable conflict with reality.

We all now hold these truths to be self-evident:

  • The systems humans build reflect humans’ biases and prejudices.
  • We continue to struggle to diversify the technology industry.
  • Knowledge is interest-driven.
  • Inequality exists, online and off.
  • Celebrating, rather than diminishing, folks’ intersecting identities is vital to human flourishing.

The web we have known

Profit first: monetization, ads, the funnel, dark patterns
Can we?: Innovation for innovation’s sake
Solutionism: code will save us
Visual design: aesthetics over usability
Lone genius: “hard” skills and rock star coders
Short term thinking: move fast, break stuff
Shipping: new features, forsaking infrastructure

Let’s move forward quickly through the past 25 years or so of the web, of digital design.

All of the web we know today has been shaped in some way by intersecting matrices of domination: colonialism, capitalism, white supremacy, patriarchy. (Thank you, bell hooks.)

The digital worlds where we spend our time – and that we build!! – exist in this way.

This is not an indictment of anyone’s individual work, so please don’t take it personally. What I’m talking about here is the digital milieu where we live our lives.

The funnel drives everything. Folks who work in nonprofits and public entities often tie ourselves in knots to retrofit our use cases in order to use common web tools (google analytics, anyone?)

In chasing innovation™ we often overlook important infrastructure work, and devalue work — like web accessibility, truly user-centered design, care work, documentation, customer support and even care for ourselves and our teams — that doesn’t drive the bottom line. We frequently write checks for our future selves to cash, knowing damn well that we’ll keep burying ourselves in technical debt. That’s some tough stuff for us to carry with us every day.

The “move fast” mentality has resulted in explosive growth, but at what cost? And in creating urgency where it doesn’t need to exist, focusing on new things rather than repair, the end result is that we’re building a house of cards. And we’re exhausted.

To zoom way out, this is another manifestation of late capitalism. Emphasis on LATE. Because…2020 happened.

What 2020 taught us

Hard times amplify existing inequalities
Cutting corners mortgages our future
Infrastructure is essential
“Colorblind”/color-evasive policy doesn’t cut it
Inclusive design is vital
We have a duty to each other
Technology is only one piece
Together, we rise

The past year has been awful for pretty much everybody.

But what the light from this dumpster fire has illuminated is that things have actually been awful for a lot of people, for a long time. This year has shown us how perilous it is to avoid important infrastructure work and to pursue innovation over access. It’s also shown us that what is sometimes referred to as colorblindness — I use the term color-evasiveness because it is not ableist and it is more accurate — a color-evasive approach that assumes everyone’s needs are the same in fact leaves people out, especially folks who need the most support.

We’ve learned that technology is a crucial tool and that it’s just one thing that keeps us connected to each other as humans.

Finally, we’ve learned that if we work together we can actually make shit happen, despite a world that tells us individual action is meaningless. Like biscuits in a pan, when we connect, we rise together.

Marginalized folks have been saying this shit for years.
More of us than ever see these things now.
And now we can’t, and shouldn’t, unsee it.

The web we can build together

Current state:
– Profit first
– Can we?
– Solutionism
– Aesthetics
– “Hard” skills
– Rockstar coders
– Short term thinking
– Shipping

Future state:
– People first: security, privacy, inclusion
– Should we?
– Holistic design
– Accessibility
– Soft skills
– Teams
– Long term thinking
– Sustaining

So let’s talk about the future. I told you this would be a talk about the future.

Like many of y’all I have had a very hard time this year thinking about the future at all. It’s hard to make plans. It’s hard to know what the next few weeks, months, years will look like. And who will be there to see it with us.

But sometimes, when I can think clearly about something besides just making it through every day, I wonder.

What does a people-first digital world look like? Who’s been missing this whole time?

Just because we can do something, does it mean we should?

Will technology actually solve this problem? Are we even defining the problem correctly?

What does it mean to design knowing that even “able-bodied” folks are only temporarily so? And that our products need to be used, by humans, in various contexts and emotional states?

(There are also false binaries here: aesthetics vs. accessibility; abled and disabled; binaries are dangerous!)

How can we nourish our collaborations with each other, with our teams, with our users? And focus on the wisdom of the folks in the room rather than assigning individuals as heroes?

How can we build for maintenance and repair? How do we stop writing checks for our future selves to cash – with interest?

Some of this here, I am speaking of as a web user and a web creator. I’ve only ever worked in the public sector. When I talk with folks working in the private sector I always do some amount of translating. At the end of the day, we’re solving many of the same problems.

But what can private-sector workers learn from folks who come from a public-sector organization?

And, as we think about what we build online, how can we also apply that thinking to our real-life communities? What is our role in shaping the public conversation around the use of technologies? I offer a few ideas here, but don’t want them to limit your thinking.

Consider the public sector

I don’t have a ton of time left today. I wanted to talk about public service like the very excellent Dana Chisnell here.

Like I said, I’ve worked in the public sector, in higher ed, for a long time. It’s my bread and butter. It’s weird, it’s hard, it’s great.

There’s a lot of work to be done, and it ain’t happening at civic hackathons or from external contractors. The call needs to come from inside the house.

Working in the public sector


I want you to consider for a minute how many folks are working in the public sector right now, and how technical expertise — especially in-house expertise — is something that is desperately needed.

Pictured here are the old website and new website for the city of Richmond. I have a whole ‘nother talk about that new Richmond website. I FOIA’d the contracts for this website. There are 112 accessibility errors on the homepage alone. It’s been in development for 3 years and still isn’t in full production.

Bottom line, good government work matters, and it’s hard to find. Important work is put out for the lowest bidder and often external agencies don’t get it right. What would it look like to have that expertise in-house?

Influencing technology policy

We also desperately need lawmakers and citizens who understand technology and ask important questions about ethics and human impact of systems decisions.

Pictured here are some headlines as well as a contract from the City of Richmond. Y’all know we spent $1.5 million on a predictive policing system that will disproportionately harm citizens of color? And that earlier this month, City Council voted to allow Richmond and VCU PD’s to start sharing their data in that system?

The surveillance state abides. Technology facilitates.

I dare say these technologies are designed to bank on the fact that lawmakers don’t know what they’re looking at.

My theory is, in addition to holding deep prejudices, lawmakers are also deeply baffled by technology. The hard questions aren’t being asked, or they’re coming too late, and they’re coming from citizens who have to put themselves in harm’s way to do so.

Technophobia is another harmful element that’s emerged in the past decades. What would a world look like where technology is not a thing to shrug off as un-understandable, but is instead deftly co-designed to meet our needs, rather than licensed to our city for 1.5 million dollars? What if everyone knew that technology is not neutral?

Closing

This is some of the future I can see. I hope that it’s sparked new thoughts for you.

Let’s envision a future together. What has the light illuminated for you?

Thank you!

NFTs and Web Archiving / David Rosenthal

One of the earliest observations of the behavior of the Web at scale was "link rot". There were a lot of 404s, broken links. Research showed that the half-life of Web pages was alarmingly short. Even in 1996 this problem was obvious enough for Brewster Kahle to found the Internet Archive to address it. From the Wikipedia entry for Link Rot:
A 2003 study found that on the Web, about one link out of every 200 broke each week,[1] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.[2]
One might have thought that academic journals were a relatively stable part of the Web, but research showed that their references decayed too, just somewhat less rapidly. A 2013 study found a half-life of 9.3 years. See my 2015 post The Evanescent Web.

I expect you have noticed the latest outbreak of blockchain-enabled insanity, Non-Fungible Tokens (NFTs). Someone "paying $69M for a JPEG" or $560K for a New York Times column attracted a lot of attention. Follow me below the fold for the connection between NFTs, "link rot" and Web archiving.

Kahle's idea for addressing "link rot", which became the Wayback Machine, was to make a copy of the content at some URL, say:
http://www.example.com/page.html
keep the copy for posterity, and re-publish it at a URL like:
https://web.archive.org/web/19960615083712/http://www.example.com/page.html
What is the difference between the two URLs? The original is controlled by Example.Com, Inc.; they can change or delete it on a whim. The copy is controlled by the Internet Archive, whose mission is to preserve it unchanged "for ever". The original is subject to "link rot"; the copy, one hopes, is not. The Wayback Machine's URLs have three components (a small parsing sketch follows the list):
  • https://web.archive.org/web/ locates the archival copy at the Internet Archive.
  • 19960615083712 indicates that the copy was made on 15th June, 1996 at 8:37:12.
  • http://www.example.com/page.html is the URL from which the copy was made.
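A minimal sketch of splitting a Wayback Machine URL into those components (the pattern is illustrative; real snapshot URLs can also carry modifiers after the timestamp):

import re

WAYBACK = re.compile(r"^https://web\.archive\.org/web/(\d{14})/(.+)$")

def parse_wayback(url):
    # Split an archival URL into its timestamp and the original URL it captured.
    m = WAYBACK.match(url)
    if m is None:
        return None
    timestamp, original = m.groups()
    return {"timestamp": timestamp, "original": original}

print(parse_wayback("https://web.archive.org/web/19960615083712/http://www.example.com/page.html"))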
The fact that the archival copy is at a different URL from the original causes a set of problems that have bedevilled Web archiving. One is that, if the original goes away, all the links that pointed to it break, even though there may be an archival copy to which they could point to fulfill the intent of the link creator. Another is that, if the content at the original URL changes, the link will continue to resolve but the content it returns may no longer reflect the intent of the link creator, although there may be an archival copy that does. Even in the early days of the Web it was evident that Web pages changed and vanished at an alarming rate.

The point is that the meaning of a generic Web URL is "whatever content, or lack of content, you find at this location". That is why URL stands for Uniform Resource Locator. Note the difference with URI, which stands for Uniform Resource Identifier. Anyone can create a URL or URI linking to whatever content they choose, but doing so provides no rights in or control over the linked-to content.

In People's Expensive NFTs Keep Vanishing. This Is Why, Ben Munster reports that:
over the past few months, numerous individuals have complained about their NFTs going “missing,” “disappearing,” or becoming otherwise unavailable on social media. This despite the oft-repeated NFT sales pitch: that NFT artworks are logged immutably, and irreversibly, onto the Ethereum blockchain.
So NFTs have the same problem that Web pages do. Isn't the blockchain supposed to make things immortal and immutable?

Kyle Orland's Ars Technica’s non-fungible guide to NFTs provides an over-simplified explanation:
When NFT’s are used to represent digital files (like GIFs or videos), however, those files usually aren’t stored directly “on-chain” in the token itself. Doing so for any decently sized file could get prohibitively expensive, given the cost of replicating those files across every user on the chain. Instead, most NFTs store the actual content as a simple URI string in their metadata, pointing to an Internet address where the digital thing actually resides.
NFTs are just links to the content they represent, not the content itself. The Bitcoin blockchain actually does contain some images, such as this ASCII portrait of Len Sassaman and some pornographic images. But the blocks of the Bitcoin blockchain were originally limited to 1MB and are now effectively limited to around 2MB, enough space for small image files. What’s the Maximum Ethereum Block Size? explains:
Instead of a fixed limit, Ethereum block size is bound by how many units of gas can be spent per block. This limit is known as the block gas limit ... At the time of writing this, miners are currently accepting blocks with an average block gas limit of around 10,000,000 gas. Currently, the average Ethereum block size is anywhere between 20 to 30 kb in size.
That's a little out-of-date. Currently the block gas limit is around 12.5M gas per block and the average block is about 45KB. Nowhere near enough space for a $69M JPEG. The NFT for an artwork can only be a link. Most NFTs are ERC-721 tokens, providing the optional Metadata extension:
/// @title ERC-721 Non-Fungible Token Standard, optional metadata extension
/// @dev See https://eips.ethereum.org/EIPS/eip-721
/// Note: the ERC-165 identifier for this interface is 0x5b5e139f.
interface ERC721Metadata /* is ERC721 */ {
/// @notice A descriptive name for a collection of NFTs in this contract
function name() external view returns (string _name);

/// @notice An abbreviated name for NFTs in this contract
function symbol() external view returns (string _symbol);

/// @notice A distinct Uniform Resource Identifier (URI) for a given asset.
/// @dev Throws if `_tokenId` is not a valid NFT. URIs are defined in RFC
/// 3986. The URI may point to a JSON file that conforms to the "ERC721
/// Metadata JSON Schema".
function tokenURI(uint256 _tokenId) external view returns (string);
}
The Metadata JSON Schema specifies an object with three string properties:
  • name: "Identifies the asset to which this NFT represents"
  • description: "Describes the asset to which this NFT represents"
  • image: "A URI pointing to a resource with mime type image/* representing the asset to which this NFT represents. Consider making any images at a width between 320 and 1080 pixels and aspect ratio between 1.91:1 and 4:5 inclusive."
Note that the JSON metadata is not in the Ethereum blockchain, it is only pointed to by the token on the chain. If the art-work is the "image", it is two links away from the blockchain. So, given the evanescent nature of Web links, the standard provides no guarantee that the metadata exists, or is unchanged from when the token was created. Even if it is, the standard provides no guarantee that the art-work exists or is unchanged from when the token is created.
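Purely as an illustration (every value here is invented), metadata conforming to that schema might look like this when serialized:

import json

# Invented values for illustration only; "image" typically points at IPFS or a web server.
token_metadata = {
    "name": "Example Artwork #1",
    "description": "A hypothetical digital artwork referenced by an ERC-721 token.",
    "image": "ipfs://QmExampleContentIdentifier/artwork.jpg",
}
print(json.dumps(token_metadata, indent=2))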

Caveat emptor — Absent unspecified actions, the purchaser of an NFT is buying a supposedly immutable, non-fungible object that points to a URI pointing to another URI. In practice both are typically URLs. The token provides no assurance that either of these links resolves to content, or that the content they resolve to at any later time is what the purchaser believed at the time of purchase. There is no guarantee that the creator of the NFT had any copyright in, or other rights to, the content to which either of the links resolves at any particular time.

There are thus two issues to be resolved about the content of each of the NFT's links:
  • Does it exist? I.e. does it resolve to any content?
  • Is it valid? I.e. is the content to which it resolves unchanged from the time of purchase?
These are the same questions posed by the Holy Grail of Web archiving, persistent URLs.

Assuming existence for now, how can validity be assured? There have been a number of systems that address this problem by switching from naming files by their location, as URLs do, to naming files by their content by using the hash of the content as its name. The idea was the basis for Bram Cohen's highly successful BitTorrent — it doesn't matter where the data comes from provided its integrity is assured because the hash in the name matches the hash of the content.
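The core idea can be sketched in a few lines; IPFS CIDs are multihash- and base-encoded rather than bare hex digests, but the principle is the same:

import hashlib

def content_address(data: bytes) -> str:
    # Name content by a digest of its bytes; any copy, fetched from anywhere,
    # can then be verified against the name itself.
    return hashlib.sha256(data).hexdigest()

artwork = b"...the bytes of the JPEG..."
name = content_address(artwork)

# On retrieval, integrity is checked by re-hashing, not by trusting the source.
assert hashlib.sha256(artwork).hexdigest() == name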

The content-addressable file system most used for NFTs is the Interplanetary File System (IPFS). From its Wikipedia page:
As opposed to a centrally located server, IPFS is built around a decentralized system[5] of user-operators who hold a portion of the overall data, creating a resilient system of file storage and sharing. Any user in the network can serve a file by its content address, and other peers in the network can find and request that content from any node who has it using a distributed hash table (DHT). In contrast to BitTorrent, IPFS aims to create a single global network. This means that if Alice and Bob publish a block of data with the same hash, the peers downloading the content from Alice will exchange data with the ones downloading it from Bob.[6] IPFS aims to replace protocols used for static webpage delivery by using gateways which are accessible with HTTP.[7] Users may choose not to install an IPFS client on their device and instead use a public gateway.
If the purchaser gets both the NFT's metadata and the content to which it refers via IPFS URIs, they can be assured that the data is valid. What do these IPFS URIs look like? The (excellent) IPFS documentation explains:
https://ipfs.io/ipfs/<CID>
# e.g
https://ipfs.io/ipfs/Qme7ss3ARVgxv6rXqVPiikMJ8u2NLgmgszg13pYrDKEoiu
Browsers that support IPFS can redirect these requests to your local IPFS node, while those that don't can fetch the resource from the ipfs.io gateway.

You can swap out ipfs.io for your own http-to-ipfs gateway, but you are then obliged to keep that gateway running forever. If your gateway goes down, users with IPFS aware tools will still be able to fetch the content from the IPFS network as long as any node still hosts it, but for those without, the link will be broken. Don't do that.
Note the assumption here that the ipfs.io gateway will be running forever. Note also that only some browsers are capable of accessing IPFS content without using a gateway. Thus the ipfs.io gateway is a single point of failure, although the failure is not complete. In practice NFTs using IPFS URIs are dependent upon the continued existence of Protocol Labs, the organization behind IPFS. The ipfs.io URIs in the NFT metadata are actually URLs; they don't point to IPFS, but to a Web server that accesses IPFS.
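Because the metadata typically stores a gateway URL rather than a bare CID, a purchaser who wants to reduce their dependence on any single gateway has to extract the CID and re-address it themselves. Below is a minimal sketch that assumes the common /ipfs/<CID> path layout used by ipfs.io and by a local node's gateway (which defaults to port 8080):

from urllib.parse import urlparse

def extract_cid(gateway_url: str) -> str:
    # Pull the CID out of a gateway-style URL such as
    # https://ipfs.io/ipfs/Qme7ss3ARVgxv6rXqVPiikMJ8u2NLgmgszg13pYrDKEoiu
    parts = [p for p in urlparse(gateway_url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "ipfs":
        raise ValueError("not an /ipfs/<CID> gateway URL")
    return parts[1]

def readdress(gateway_url: str, new_gateway: str = "http://127.0.0.1:8080") -> str:
    # Point the same CID at a different gateway, e.g. a local IPFS node.
    return f"{new_gateway}/ipfs/{extract_cid(gateway_url)}"

print(readdress("https://ipfs.io/ipfs/Qme7ss3ARVgxv6rXqVPiikMJ8u2NLgmgszg13pYrDKEoiu"))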

Pointing to the NFT's metadata and content using IPFS URIs assures their validity but does it assure their existence? The IPFS documentation's section Persistence, permanence, and pinning explains:
Nodes on the IPFS network can automatically cache resources they download, and keep those resources available for other nodes. This system depends on nodes being willing and able to cache and share resources with the network. Storage is finite, so nodes need to clear out some of their previously cached resources to make room for new resources. This process is called garbage collection.

To ensure that data persists on IPFS, and is not deleted during garbage collection, data can be pinned to one or more IPFS nodes. Pinning gives you control over disk space and data retention. As such, you should use that control to pin any content you wish to keep on IPFS indefinitely.
To assure the existence of the NFT's metadata and content, both must not only be written to IPFS but also be pinned to at least one IPFS node.
To ensure that your important data is retained, you may want to use a pinning service. These services run lots of IPFS nodes and allow users to pin data on those nodes for a fee. Some services offer free storage-allowance for new users. Pinning services are handy when:
  • You don't have a lot of disk space, but you want to ensure your data sticks around.
  • Your computer is a laptop, phone, or tablet that will have intermittent connectivity to the network. Still, you want to be able to access your data on IPFS from anywhere at any time, even when the device you added it from is offline.
  • You want a backup that ensures your data is always available from another computer on the network if you accidentally delete or garbage-collect your data on your own computer.
Thus, to assure the existence of the NFT's metadata and content, pinning must be rented from a pinning service, which is yet another single point of failure.
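For a purchaser or platform running their own node, pinning is a one-line operation against the IPFS command-line tool. A minimal sketch, assuming a local IPFS daemon and the ipfs CLI are installed; a pinning service exposes the equivalent over an HTTP API:

import subprocess

def pin(cid: str) -> None:
    # Ask the local IPFS daemon to pin the content so that it survives
    # garbage collection on this node; raises CalledProcessError on failure.
    subprocess.run(["ipfs", "pin", "add", cid], check=True)

def is_pinned(cid: str) -> bool:
    # "ipfs pin ls <cid>" exits non-zero if the CID is not pinned here.
    return subprocess.run(["ipfs", "pin", "ls", cid],
                          capture_output=True).returncode == 0

cid = "Qme7ss3ARVgxv6rXqVPiikMJ8u2NLgmgszg13pYrDKEoiu"  # example CID from above
pin(cid)
print(is_pinned(cid))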

In summary, it is possible to take enough precautions and pay enough ongoing fees to be reasonably assured that your $69M NFT, its metadata, and the JPEG it refers to will remain accessible. In practice, these precautions are definitely not always taken. David Gerard reports:
But functionally, IPFS works the same way as BitTorrent with magnet links — if nobody bothers seeding your file, there’s no file there. Nifty Gateway turn out not to bother to seed literally the files they sold, a few weeks later. [Twitter; Twitter]
Anil Dash claims to have invented, with Kevin McCoy, the concept of NFTs referencing Web URLs in 2014. He writes in his must-read NFTs Weren’t Supposed to End Like This:
Seven years later, all of today’s popular NFT platforms still use the same shortcut. This means that when someone buys an NFT, they’re not buying the actual digital artwork; they’re buying a link to it. And worse, they’re buying a link that, in many cases, lives on the website of a new start-up that’s likely to fail within a few years. Decades from now, how will anyone verify whether the linked artwork is the original?

All common NFT platforms today share some of these weaknesses. They still depend on one company staying in business to verify your art. They still depend on the old-fashioned pre-blockchain internet, where an artwork would suddenly vanish if someone forgot to renew a domain name. “Right now NFTs are built on an absolute house of cards constructed by the people selling them,” the software engineer Jonty Wareing recently wrote on Twitter.
My only disagreement with Dash is that, as someone who worked on archiving the "old-fashioned pre-blockchain internet" for two decades, I don't believe that there is a new-fangled post-blockchain Internet that makes the problems go away. And neither does David Gerard:
The pictures for NFTs are often stored on the Interplanetary File System, or IPFS. Blockchain promoters talk like IPFS is some sort of bulletproof cloud storage that works by magic and unicorns.

Evergreen 3.7.0 released / Evergreen ILS

The Evergreen Community is pleased to announce the release of Evergreen 3.7.0. Evergreen is highly-scalable software for libraries that helps library patrons find library materials and helps libraries manage, catalog, and circulate those materials, no matter how large or complex the libraries.

Evergreen 3.7.0 is a major release that includes the following new features of note:

  • Support for SAML-based Single Sign On
  • Hold Groups, a feature that allows staff to add multiple users to a named hold group bucket and place title-level holds for a record for that entire set of users
  • The Bootstrap public catalog skin is now the default
  • “Did you mean?” functionality for catalog search focused on making suggestions for single search terms
  • Holdings on the public catalog record details page can now be sorted by geographic proximity
  • Library Groups, a feature that allows defining groups of organizational units outside of the hierarchy that can be used to limit catalog search results
  • Expired staff accounts can now be blocked from logging in
  • Publisher data in the public catalog display is now drawn from both the 260 and 264 fields
  • The staff catalog can now save all search results (up to 1,000) to a bucket in a single operation
  • New opt-in settings for overdue and predue email notifications
  • A new setting to allow expired patrons to renew loans
  • Porting of additional interfaces to Angular, including Scan Item as Missing Pieces and Shelving Location Groups

Evergreen admins installing or upgrading to 3.7.0 should be aware of the following:

  • The minimum version of PostgreSQL required to run Evergreen 3.7 is PostgreSQL 9.6.
  • The minimum version of OpenSRF is 3.2.
  • This release adds a new OpenSRF service, open-ils.geo.
  • The release also adds several new Perl module dependencies: Geo::Coder::Google, Geo::Coder::OSM, String::KeyboardDistance, and Text::Levenshtein::Damerau::XS.
  • The database update procedure has more steps than usual; please consult the upgrade section of the release notes.

The release is available on the Evergreen downloads page. Additional information, including a full list of new features, can be found in the release notes.

Unveiling the new Frictionless Data documentation portal / Open Knowledge Foundation

Have you used Frictionless Data documentation in the past and been confused or wanted more examples? Are you a brand new Frictionless Data user looking to get started learning? 

We invite you all to visit our new and improved documentation portal.

Thanks to a fund that the Open Knowledge Foundation was awarded from the Open Data Institute, we have completely reworked the guides of our Frictionless Data Framework website according to the suggestions from a cohort of users gathered during several feedback sessions throughout the months of February and March. 

We cannot stress enough how precious those feedback sessions have been to us. They were an excellent opportunity to connect with our users and reflect together with them on how to make all our guides more useful for current and future users. The enthusiasm and engagement that the community showed for the process was great to see and reminded us that the link with the community should be at the core of open source projects.

We were amazed by the amount of extremely useful input we received. While we are still digesting some of the suggestions and working out how best to implement them, we have already made many changes to make the documentation a smoother, Frictionless experience.

So what’s new?

A common theme from the feedback sessions was that it was sometimes difficult for novice users to understand the whole potential of the Frictionless specifications. To help make this clearer, we added a more detailed explanation, user examples and user stories to our Introduction. We also added some extra installation tips and a troubleshooting section to our Quick Start guide.

The users also suggested several code changes, like more realistic code examples, better explanations of functions, and the ability to run code examples in both the Command Line and Python. This last suggestion came about because most of the guides use a mix of Command Line and Python syntax, which was confusing to our users. We have clarified that by adding a switch to the code snippets that allows users to work with pure Python syntax or pure Command Line syntax (when possible), as you can see here. We also put together an FAQ section based on questions that were often asked on our Discord chat. If you have suggestions for other common questions to add, let us know!
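For readers who have not yet tried the framework, here is a minimal sketch of the kind of Python snippet the reworked guides walk through, using the frictionless-py package to describe and validate a tabular file; the file name is a placeholder, and the same operations are available from the frictionless command-line tool:

# pip install frictionless
from frictionless import describe, validate

# Infer a schema and metadata for a (hypothetical) CSV file.
resource = describe("books.csv")
print(resource.to_yaml())

# Check the file against the inferred schema and report any problems.
report = validate("books.csv")
print(report.valid)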

The documentation revamping process also included the publication of new tutorials. We worked on two new Frictionless tutorials, which are published under the Notebooks link in the navigation menu. While working on those, we were inspired by the feedback sessions and realised that it made sense to give our community the opportunity to contribute to the project with some real-life examples of Frictionless Data use. The user selection process has started and we hope to get the new tutorials online by the end of the month, so stay tuned!

What’s next?

Our commitment to continually improving our documentation is not over with this project coming to an end! Do you have suggestions for changes you would like to see in our documentation? Please reach out to us or open a pull request to contribute. Everyone is welcome to contribute! Learn how to do it here.

Thanks, thanks, thanks!

Once again, we are very grateful to the Open Data Institute for giving us the chance to focus on this documentation in order to improve it. We cannot thank enough all our users who took part in the feedback sessions. Your contributions were precious.

More about Frictionless Data

Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation.

Cryptocurrency's Carbon Footprint / David Rosenthal

“China’s bitcoin mines could derail carbon neutrality goals, study says” and “Bitcoin mining emissions in China will hit 130 million tonnes by 2024”: the headlines say it all. Excusing this climate-destroying externality of Proof-of-Work blockchains requires a continuous flow of new misleading arguments. Below the fold I discuss one of the more recent novelties.

In Bitcoin and Ethereum Carbon Footprints – Part 2, Moritz Seibert claims the reason for mining is to get the mining reward:
Bitcoin transactions themselves don’t cause a lot of power usage. Getting the network to accept a transaction consumes almost no power, but having ASIC miners grind through the mathematical ether to solve valid blocks does. Miners are incentivized to do this because they are compensated for it. Presently, that compensation includes a block reward which is paid in bitcoin (6.25 BTC per block) as well as a miner fee (transaction fee). Transaction fees are denominated in fractional bitcoins and paid by the initiator of the transaction. Today, about 15% of total miners’ rewards are transactions fees, and about 85% are block rewards.
So, he argues, Bitcoin's current catastrophic carbon footprint doesn't matter because, as the reward decreases, so will the carbon footprint:
This also means that the power usage of the Bitcoin network won’t scale linearly with the number of transactions as the network becomes predominantly fee-based and less rewards-based (which causes a lot of power to be thrown at it in light of increasing BTC prices), and especially if those transactions take place on secondary layers. In other words, taking the ratio of “Bitcoin’s total power usage” to “Number of transactions” to calculate the “Power cost per transaction” falsely implies that all transactions hit the final settlement layer (they don’t) and disregards the fact that the final state of the Bitcoin base layer is a fee-based state which requires a very small fraction of Bitcoin’s overall power usage today (no more block rewards).
Seibert has some vague idea that there are implications of this not just for the carbon footprint but also for the security of the Bitcoin blockchain:
Going forward however, miners’ primary revenue source will change from block rewards to the fees paid for the processing of transactions, which don’t per se cause high carbon emissions. Bitcoin is set to become be a purely fee-based system (which may pose a risk to the security of the system itself if the overall hash rate declines, but that’s a topic for another article because a blockchain that is fully reliant on fees requires that BTCs are transacted with rather than held in Michael Saylor-style as HODLing leads to low BTC velocity, which does not contribute to security in a setup where fees are the only rewards for miners.)
Let's leave aside the stunning irresponsibility of arguing that it is acceptable to dump huge amounts of long-lasting greenhouse gas into the atmosphere now because you believe that in the future you will dump less. How realistic is the idea that decreasing the mining reward will decrease the carbon footprint?


The graph shows the history of the hash rate, which is a proxy for the carbon footprint. You can see the effect of the "halvening", when on May 11th 2020 the mining reward halved. There was a temporary drop, but the hash rate resumed its inexorable rise. This experiment shows that reducing the mining reward doesn't reduce the carbon footprint. So why does Seibert think that eliminating it will reduce the carbon footprint?

The answer appears to be that Seibert thinks the purpose of mining is to create new Bitcoins, that the reason for the vast expenditure of energy is to make the process of creating new coins secure, and that it has nothing to do with the security of transactions. This completely misunderstands the technology.

In The Economic Limits of Bitcoin and the Blockchain, Eric Budish examines the return on investment in two kinds of attacks on a blockchain like Bitcoin's. The simpler one is a 51% attack, in which an attacker controls the majority of the mining power. Budish explains what this allows the attacker to do:
An attacker could (i) spend Bitcoins, i.e., engage in a transaction in which he sends his Bitcoins to some merchant in exchange for goods or assets; then (ii) allow that transaction to be added to the public blockchain (i.e., the longest chain); and then subsequently (iii) remove that transaction from the public blockchain, by building an alternative longest chain, which he can do with certainty given his majority of computing power. The merchant, upon seeing the transaction added to the public blockchain in (ii), gives the attacker goods or assets in exchange for the Bitcoins, perhaps after an escrow period. But, when the attacker removes the transaction from the public blockchain in (iii), the merchant effectively loses his Bitcoins, allowing the attacker to “double spend” the coins elsewhere.
Such attacks are endemic among the smaller alt-coins; for example there were three successful attacks on Ethereum Classic in a single month last year. Clearly, Seibert's future "transaction only" Bitcoin must defend against them.

There are two ways to mount a 51% attack, from the outside or from the inside. An outside attack requires more mining power than the insiders are using, whereas an insider attack only needs a majority of the mining power to conspire. Bitcoin miners collaborate in "mining pools" to reduce volatility of their income, and for many years it would have taken only three or so pools to conspire for a successful attack. But assuming insiders are honest, outsiders must acquire more mining power than the insiders are using. Clearly, Bitcoin insiders are using so much mining power that this isn't feasible.

The point of mining isn't to create new Bitcoins. Mining is needed to make the process of adding a block to the chain, and thus adding a set of transactions to the chain, so expensive that it isn't worth it for an attacker to subvert the process. The cost, and thus in the case of Proof of Work the carbon footprint, is the whole point. As Budish wrote:
From a computer security perspective, the key thing to note ... is that the security of the blockchain is linear in the amount of expenditure on mining power, ... In contrast, in many other contexts investments in computer security yield convex returns (e.g., traditional uses of cryptography) — analogously to how a lock on a door increases the security of a house by more than the cost of the lock.
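Budish's linearity point can be made concrete with a rough back-of-the-envelope sketch in Python: the cost of out-mining the honest network for the roughly one-hour (six-block) window conventionally used for finality scales directly with what honest miners are spending. The figures are assumptions for illustration, taken from the revenue charts discussed next:

# Rough illustration of Budish's linearity argument (assumed figures).
daily_miner_revenue_usd = 60_000_000  # ~$60M/day, a proxy for honest mining spend
attack_window_hours = 1               # ~6 blocks, the usual finality heuristic

# To out-mine the honest network for the window, an attacker must roughly
# match the honest spend over that window.
attack_cost = daily_miner_revenue_usd * attack_window_hours / 24
print(f"Approximate cost of a one-hour majority attack: ${attack_cost:,.0f}")

# Security is linear in mining expenditure: halve the honest spend and
# the cost of the attack halves with it.
print(f"At half the mining spend: ${attack_cost / 2:,.0f}")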
Let's consider the possible futures of a fee-based Bitcoin blockchain. It turns out that fee revenue is currently a smaller proportion of total miner revenue than Seibert claims. Here is the chart of total revenue (~$60M/day):

And here is the chart of fee revenue (~$5M/day):

Thus the split is about 8% fee, 92% reward, which leaves three options (a rough back-of-the-envelope calculation follows the list):
  • If security stays the same, blocksize stays the same, fees must increase to keep the cost of a 51% attack high enough.

    The chart shows the average fee hovering around $20. Since fees would have to replace the roughly 92% of miner revenue that currently comes from block rewards, they would need to rise roughly twelve-fold, so the average cost of a single transaction would be over $240. This might be a problem for Seibert's requirement that "BTCs are transacted with rather than held".
  • If blocksize stays the same, fees stay the same, security must decrease because the fees cannot cover the cost of enough hash power to deter a 51% attack. In this case it would be about 12 times cheaper to mount a 51% attack, which would greatly increase the risk of delivering anything in return for Bitcoin. Users are already advised to wait 6 blocks (about an hour) before treating a transaction as final; with the security budget cut twelve-fold they would need many more confirmations (on the order of twelve times as many) for comparable assurance. Waiting nearly half a day before finality would probably be a disincentive.
  • If fees stay the same, security stays the same, blocksize must increase to allow enough transactions for their fees to cover the cost of enough hash power to deter a 51% attack. Since 2017 Bitcoin blocks have been effectively limited to around 2MB, and the blockchain is now over one-third of a terabyte, growing at over 25%/yr. Increasing the size limit to, say, 22MB would solve the long-term problem of a fee-based system at the cost of reducing miners' income in the short term by reducing the scarcity value of a slot in a block. Doubling the effective size of the block caused a huge controversy in the Bitcoin community for precisely this short- vs. long-term conflict, so a much larger increase would be even more controversial. Not to mention that the size of the blockchain a year from now would be some 3 times bigger, imposing additional storage costs on miners.

    That is just the supply side. On the demand side, it is an open question whether there would be 12 times the current demand for transactions that cost $20, take an hour, and, at least in the US, must each be reported to the tax authorities.
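Here is the back-of-the-envelope arithmetic behind these three options, using the approximate figures cited above ($60M/day total miner revenue, $5M/day of it from fees, and a ~$20 average fee); all numbers are illustrative rather than measurements:

# Illustrative arithmetic for a purely fee-based Bitcoin (assumed figures).
total_revenue_per_day = 60_000_000  # ~$60M/day, today's security budget
fee_revenue_per_day = 5_000_000     # ~$5M/day from transaction fees
average_fee = 20                    # ~$20 per transaction today

shortfall_factor = total_revenue_per_day / fee_revenue_per_day  # ~12x

# Option 1: same security, same blocksize -> fees rise ~12x.
print(f"Required average fee: ${average_fee * shortfall_factor:,.0f}")

# Option 2: same fees, same blocksize -> the security budget falls ~12x,
# so a 51% attack becomes ~12 times cheaper.
print(f"Attack cost falls to 1/{shortfall_factor:.0f} of today's")

# Option 3: same fees, same security -> ~12x the transactions per block,
# so blocks (and chain growth) become roughly 12x larger.
print(f"Blocksize multiplier: ~{shortfall_factor:.0f}x")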
[Chart: Short vs. Long Bitcoin positions]
None of these alternatives look attractive. But there's also a second type of attack in Budish's analysis, which he calls "sabotage". He quotes Rosenfeld:
In this section we will assume q < p [i.e., that the attacker does not have a majority]. Otherwise, all bets are off with the current Bitcoin protocol ... The honest miners, who no longer receive any rewards, would quit due to lack of incentive; this will make it even easier for the attacker to maintain his dominance. This will cause either the collapse of Bitcoin or a move to a modified protocol. As such, this attack is best seen as an attempt to destroy Bitcoin, motivated not by the desire to obtain Bitcoin value, but rather wishing to maintain entrenched economical systems or obtain speculative profits from holding a short position.
Short interest in Bitcoin is currently small relative to the total stock, but much larger relative to the circulating supply. Budish analyzes various sabotage attack cases, with a parameter Δ_attack representing the proportion of the Bitcoin value destroyed by the attack:
For example, if Δ_attack = 1, i.e., if the attack causes a total collapse of the value of Bitcoin, the attacker loses exactly as much in Bitcoin value as he gains from double spending; in effect, there is no chance to “double” spend after all. ... However, Δ_attack is something of a “pick your poison” parameter. If Δ_attack is small, then the system is vulnerable to the double-spending attack ... and the implicit transactions tax on economic activity using the blockchain has to be high. If Δ_attack is large, then a short time period of access to a large amount of computing power can sabotage the blockchain.
The current cryptocurrency bubble ensures that everyone is making enough paper profits from the golden eggs to deter them from killing the goose that lays them. But it is easy to create scenarios in which a rush for the exits might make killing the goose seem like the best way out.

Seibert's misunderstanding illustrates the fundamental problem with permissionless blockchains. As I wrote in A Note On Blockchains:
If joining the replica set of a permissionless blockchain is free, it will be vulnerable to Sybil attacks, in which an attacker creates many apparently independent replicas which are actually under his sole control. If creating and maintaining a replica is free, anyone can authorize any change they choose simply by creating enough Sybil replicas.

Defending against Sybil attacks requires that membership in a replica set be expensive.
There are many attempts to provide less environmentally damaging ways to make adding a block to a blockchain expensive, but attempts to make adding a block cheaper are self-defeating because they make the blockchain less secure.
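A toy illustration of why free membership is fatal: if creating a replica costs nothing, an attacker simply outnumbers the honest replicas and wins any majority vote. The sketch below is a deliberately simplified model, not a description of any real protocol:

# Toy model: majority voting among replicas when creating a replica is free.
honest_replicas = 100
cost_per_replica = 0  # the defining property of a free-to-join system

# An attacker who wants a majority just creates honest_replicas + 1 Sybils.
sybil_replicas = honest_replicas + 1
print(f"Attack cost: {sybil_replicas * cost_per_replica}")            # 0
print(f"Attacker wins the vote: {sybil_replicas > honest_replicas}")  # True

# Make membership expensive and the same majority acquires a real price.
cost_per_replica = 1_000
print(f"Attack cost with expensive membership: {sybil_replicas * cost_per_replica}")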

There are two reasons why the primary use of a permissionless blockchain cannot be transactions as opposed to HODL-ing:
  • The lack of synchronization between the peers means that transactions must necessarily be slow.
  • The need to defend against Sybil attacks means either that transactions must necessarily be expensive, or that blocks must be impractically large.

Islandora Open Meeting: April 27, 2021 / Islandora

We are happy to announce the date of our next Open Meeting! Join us on April 27, 2021, any time between 10:00 am and 2:00 pm EDT. The Open Meetings are drop-in style sessions where users of all levels and abilities gather to ask questions, share use cases, and get updates on Islandora. There will be experienced Islandora 8 users on hand to answer questions or give demos. We would love for you to join us at any point during the 4-hour window, so feel free to pop by!

More details about the Open Meeting, and the Zoom link to join, are in this Google doc.

Registration is not required. If you would like a calendar invite as a reminder, please let us know at community@islandora.ca.

Call for Proposals open for NDSA Digital Preservation 2021! / Digital Library Federation

The NDSA is very pleased to announce the Call for Proposals is open for Digital Preservation 2021: Embracing Digitality (#DigiPres21) to be held ONLINE this year on November 4th, 2021 during World Digital Preservation Day.

Submissions from members and nonmembers alike are welcome, and you can learn more about session format options through the CFP. The deadline to submit proposals is Monday, May 17, at 11:59pm Eastern Time.

Digital Preservation 2021 (#DigiPres21) is held in partnership with our host organization, the Council on Library and Information Resources’ (CLIR) Digital Library Federation. Separate calls are being issued for CLIR+DLF’s 2021 events, the 2021 DLF Forum (November 1-3) and associated workshop series Learn@DLF (November 8-10). NDSA strives to create a safe, accessible, welcoming, and inclusive event, and adheres to DLF’s Code of Conduct.

We look forward to seeing you online on November 4th,

~ 2021 DigiPres Planning Committee

The post Call for Proposals open for NDSA Digital Preservation 2021! appeared first on DLF.

Dutch round table on next generation metadata: think bigger than NACO and WorldCat / HangingTogether

OCLC metadata discussion series

As part of the OCLC Research Discussion Series on Next Generation Metadata, this blog post reports back from the Dutch language round table discussion held on March 8, 2021. (A Dutch translation is available here).

Librarians – with backgrounds in metadata, library systems, reference work, national bibliography, and back-office processes – joined the session, representing a nice mix of academic and heritage institutions from the Netherlands and Belgium. The participants were engaged, candid, and thoughtful and this stimulated constructive knowledge exchange in a pleasant atmosphere.  

Mapping exercise

Map of next-gen metadata projects (Dutch session)

As in all the other round table discussions, participants started with taking stock of next generation metadata projects in their region or initiatives they were aware of elsewhere. The resulting map shows a strong representation of bibliographic and cultural heritage data-projects (see upper- and lower-left quadrants of the matrix). Several next-generation metadata research projects of the National Library of the Netherlands were listed and described, such as:

  • Automatic Metadata Generation, which identifies and tests tools to support subject tagging and cataloging of name authority records;
  • The Entity Finder, a tool being developed to help extract RDA entities (persons, works, expressions) from both authority and bibliographic records.

The Digital Heritage Reference Architecture (DERA) was developed as part of the national strategy for digital heritage in the Netherlands. It is a framework for managing and publishing heritage information as Linked Open Data (LOD), according to agreed practices and conventions. The Van Gogh Worldwide platform is an exemplar of the application of DERA – where metadata, relating to the painter’s art works residing at 17 different Dutch heritage institutions and private collectors, have been pulled from source systems by API.

A noteworthy initiative listed in the RIM/Scholarly Communications quadrant of the matrix is the NL-Open Knowledge Base, an initiative arising from last year’s deal between Elsevier and the Dutch research institutions to jointly develop open science services based on their RIM systems, Elsevier’s databases and analytics solutions, and the Dutch funding organizations’ databases. The envisaged Open Knowledge Base could potentially feed new applications – for example, a dashboard to monitor the achievement of the universities’ Sustainable Development Goals – and make it possible to significantly improve the analysis of research impact.

What is keeping us from moving forward?

Notwithstanding the state-of-the-art projects mentioned during the mapping exercise, the participants were impatient about the pace of the transition to the next generation of metadata. One participant expressed frustration at having to use multiple tools for a workflow that supports the transition, namely the integration of PIDs, local authorities, and links to and from external sources. Another participant noted that there is still a lot of efficiency to be gained in the value chain:

 “When we look at the supply chain, it is absurd to start from scratch because there is already so much data. When a book comes out on the market, it must already have been described. There should not be a need to start from scratch in the library.”

The group also wondered – with so many bibliographic datasets already published as Linked Open Data – what else needs to be done to interconnect them in meaningful ways?

The question of what is keeping us from moving forward dominated the discussion.

Trusting external data

One participant suggested that libraries are cautious about the data sources they link up with. Authority files are persistent and reliable data sources, which have yet to find their counterparts in the newly emerging linked data ecosystem. The lack of conventions around reliability and persistence might be one reason why libraries are hesitant to enter into linked data partnerships or hold back from relying on external data – even from established sources, such as Wikidata. After all, linking to a data source is an indication of trust and a recognition of data quality.

The conversation moved to data models: which linked data do you create yourself? How will you design it and link it up to other data? Some participants found there was still a lack of agreement and clarity about the meaning of key concepts such as a “work”. Others pointed out that defining the meaning of concepts used is exactly what linked data is about and this feature allows the co-existence of multiple ontologies – in other words, there is no need any longer to fix semantics in hard standards.

“There is no unique semantic model. When you refer to data that has already been defined by others, you relinquish control over that piece of information, and that can be a mental barrier against doing linked data the proper way. It is much safer to store and manage all the data in your own silo. But the moment you can let go of that, the world can become much richer than you can ever achieve on your own.”

Thinking in terms of linked data

The conversation turned to the need to train cataloging staff. One participant thought it would be helpful to get started by learning to think in terms of linked data, to mentally practice building linked data graphs and play with different possible structures, as one does with LEGO bricks. The group agreed there is still too little understanding of the possibilities and of the consequences of practicing linked data.

“We have to learn to see ourselves as publishers of metadata, so that others can find it – but we have no idea who the others are, we have to think even bigger than the Library of Congress’s NACO or WorldCat. We are no longer talking about the records we create, but about pieces of records that are unique, because a lot already comes from elsewhere. We have to wrap our minds around this and ask ourselves: What is our role in the bigger picture? This is very hard to do!”

The group thought it was very important to start having that discussion within the library. But how exactly do you do that? It’s a big topic and it must be initiated by the library’s leadership team.

Not relevant for my library

One university library leader in the group reacted to this and said:

“What strikes me is that the number of libraries faced with this challenge is shrinking. (…) [In my library] we hardly produce any metadata anymore. (…) If we look at what we still produce ourselves, it is about describing photos of student fraternities (…). It’s almost nothing anymore. Metadata has really become a topic for a small group of specialists.”

The group objected that this observation was overlooking the importance of the discovery needs of the communities libraries serve. However provocative this observation was, it reflects a reality that we need to acknowledge and at the same time put in perspective. Alas, there was no time for that, as the session was wrapping up. It had certainly been a conversation to be continued!

About the OCLC Research Discussion Series on Next Generation Metadata

In March 2021, OCLC Research conducted a discussion series focused on two reports: 

  1. “Transitioning to the Next Generation of Metadata”
  2. “Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project”.

The round table discussions were held in different European languages, and participants were able to share their own experiences, get a better understanding of the topic area, and gain confidence in planning ahead.

The Opening Plenary Session opened the forum for discussion and exploration and introduced the theme and its topics. Summaries of all eight round table discussions are published on the OCLC Research blog, Hanging Together. This is the last post and it is preceded by the posts reporting on the first English session, the Italian session, the second English session, the French session, the German session, the Spanish session and the third English session.

The Closing Plenary Session on April 13 will synthesize the different round table discussions. Registration is still open for this webinar: please join us!

The post Dutch round table on next generation metadata: think bigger than NACO and WorldCat appeared first on Hanging Together.

2021 AMIA Cross-Pollinator: Justine Thomas / Digital Library Federation

The Association of Moving Image Archivists (AMIA) and DLF will be sending Justine Thomas to attend the 2021 virtual DLF/AMIA Hack Day and AMIA spring conference! As this year’s “cross-pollinator,” Justine will enrich both the Hack Day event and the AMIA conference, sharing a vision of the library world from her perspective.

About the Awardee

Justine Thomas (@JustineThomasM) is currently a Digital Programs Contractor at the National Museum of American History (NMAH) focusing on digital asset management and collections information support. Prior to graduating in 2019 with a Master’s in Museum Studies from the George Washington University, Justine worked at NMAH as a collections processing intern in the Archives Center and as a Public Programs Facilitator encouraging visitors to discuss American democracy and social justice issues.

About Hack Day and the Award

The seventh AMIA+DLF Hack Day (online April 1-15) will be a unique opportunity for practitioners and managers of digital audiovisual collections to join with developers and engineers to collaborate remotely on developing solutions for digital audiovisual preservation and access.

The goal of the AMIA + DLF Award is to bring “cross-pollinators” to the conference: developers and software engineers who can provide unique perspectives to moving image and sound archivists’ work with digital materials, share a vision of the library world from their perspective, and enrich the Hack Day event.

Find out more about this year’s Hack Day activities here.

The post 2021 AMIA Cross-Pollinator: Justine Thomas appeared first on DLF.