Wikidata for Catalogers: Workshop Materials and Design

2019-12-04 cataloging, linked data

Recently, I conducted a workshop with catalogers at Penn State on how we might use our skills to create or enhance records about people on Wikidata. I’m sharing a zip file download of the slides with speaker notes and handouts¹ from the workshop for folks who want to learn or adapt these for local use.

This post provides details on the exercise I devised for the workshop. I could not include it in the handouts because it was very specific and we’d already done the work. Instead, I’ll walk through the process I used to create the exercises so you can replicate it too.

The Wikidata Exercise in Brief

The short form:

I used our university’s websites to identify a number of faculty who have published books and who have LC identifiers but do not have a Wikidata record.
I documented major facts about each person, along with URLs for citation.
I put together a QuickStatements batch job which would create basic records for each person.
I created handouts with additional statements to add for each person’s record.
Just before the workshop, I verified that nobody had been added in the meantime. I then ran the QuickStatements batch add. During the workshop, each person took a handout of three records and added additional statements. We walked through the first as a group, then each person did the rest on their own.

Finding and Preparing the Data

I used Penn State’s faculty pages to locate promising faculty members (based on field, position held, etc.). For example, an associate professor of linguistics is more likely to have published a book (and thus have an LC identifier) than one in mathematics. Women also still seem less likely to have a Wikidata record.

I made lists and checked each name against the Library of Congress authority file and Wikidata. If the person existed in LC but didn’t exist in Wikidata (or had only a stub record), I added them to the working list. I then gathered as much as I could find of the following information for each person:

preferred form of name
other forms of name used
a short description of their work
sex or gender (based on their sites)
occupation and/or field of work

I put this information into a QuickStatements-formatted batch file which I could use to create all the records at once. I also gathered the following information, which I put into a Word document:

Library of Congress Authority ID (P244)
VIAF ID (P214)
Goodreads ID (P2963)
Employer (Penn State University)
Position Held
reference URL for citation

Sample Data

The example below uses a fake person, although the Wikidata properties and entities are all real. First, I put together a QuickStatements import record for the person. This person is a professor of sociology and gender studies. She appears to publish with her middle initial but does not use it in other documents. I tended to use field of work here because it often presented more variety than “ist” occupations.

CREATE
LAST|Len|"Mabel J. Fakeworthy"
LAST|Aen|"Mabel Fakeworthy"
LAST|Den|"professor of sociology and gender studies"
LAST|P31|Q5
LAST|P21|Q6581072
LAST|P101|Q21201
LAST|P101|Q1662673
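
For a longer working list, a batch like the one above could be generated rather than typed by hand. The following is a minimal sketch, not the process used for the workshop: the function name `quickstatements_create` and its parameters are made up for illustration, and it assumes the pipe-separated command format shown in the sample.

```python
def quickstatements_create(label, aliases, description, claims, lang="en"):
    """Return QuickStatements lines that create one new item.

    `claims` is a list of (property, value) pairs, e.g. ("P31", "Q5").
    This helper is hypothetical; it only formats text and does not
    contact Wikidata or QuickStatements.
    """
    lines = ["CREATE"]
    lines.append(f'LAST|L{lang}|"{label}"')          # label
    for alias in aliases:
        lines.append(f'LAST|A{lang}|"{alias}"')      # other forms of name
    lines.append(f'LAST|D{lang}|"{description}"')    # short description
    for prop, value in claims:
        lines.append(f"LAST|{prop}|{value}")         # item-valued statements
    return "\n".join(lines)


batch = quickstatements_create(
    label="Mabel J. Fakeworthy",
    aliases=["Mabel Fakeworthy"],
    description="professor of sociology and gender studies",
    claims=[
        ("P31", "Q5"),         # instance of: human
        ("P21", "Q6581072"),   # sex or gender: female
        ("P101", "Q21201"),    # field of work: sociology
        ("P101", "Q1662673"),  # field of work: gender studies
    ],
)
print(batch)
```

Calling the function once per person and joining the results with newlines would give a single file to paste into QuickStatements.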

Next, I put together information about her identifiers and employment. I gave the class this data both as a physical handout they could use to track what they were doing and as a digital document so they could copy-paste where appropriate (especially for identifiers):

Mabel J. Fakeworthy (Q999999999)
 Library of Congress Authority ID (P244): n201099999
 VIAF ID (P214): 170999999
 Goodreads ID (P2963): 999999
 Employer: Pennsylvania State University
   Position Held: associate professor (Q9344260)
   Source: https://altoona.psu.edu/person/mabel-fakeworthy

One issue which came up during the class was that Goodreads ID expects people who have one to have a statement of “occupation”. In a case like Dr. Fakeworthy’s I recommended using something like “sociologist.” In other cases, only “professor” seemed to fit (e.g. “professor of women’s studies” or “professor of French”).

Adding the Data

About an hour before the workshop, I double-checked Wikidata to ensure no records had been created for the people I was about to add. I then used QuickStatements to do a batch ingest and checked the outcome.
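
The double-check can be done by searching Wikidata by hand. As one possible alternative, here is a minimal sketch that builds a query for the Wikidata Query Service looking for items that already carry one of the LC identifiers on the working list. The function names are hypothetical, and the check only catches items that already have P244 filled in; a record without the identifier would still need a search by name.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_lc_check_query(lc_ids):
    """Build a SPARQL query for items holding any of the given LC IDs (P244)."""
    values = " ".join(f'"{lc_id}"' for lc_id in lc_ids)
    return (
        "SELECT ?item ?lc WHERE {\n"
        f"  VALUES ?lc {{ {values} }}\n"
        "  ?item wdt:P244 ?lc .\n"
        "}"
    )


def build_request_url(query):
    """Return a GET URL for the query; no request is sent here."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# The identifier below is the fake one from the sample handout.
query = build_lc_check_query(["n201099999"])
url = build_request_url(query)
print(query)
```

An empty result for this query would suggest that none of the listed identifiers is attached to an existing item yet.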

The workshop presentation (see PowerPoint in download package) introduced Wikidata, related it to work already done in cataloging, showed examples of how to add information, explained constraints, and demonstrated ways in which data could be queried. We then moved on to the hands-on portion.

We held the workshop in a computer lab, so everyone could participate in hands-on exercises. Each person had created a Wikidata account before the workshop.² I handed out the worksheets and shared a Bitly link to the digital copy, which could be used for copy-paste.

Each person got three Wikidata records to update. We worked through the first person together. I demonstrated how to add a property and check the result. We added identifiers to the record, then checked to be sure that the link it generated took us to the correct page.

Next, I demonstrated how to add a statement (employer) with a qualifier (position held) and a Reference URL. One issue which came up a few times was the inclination to paste the URL into the wrong field when creating a reference. The issue made sense. One clicks “add reference,” a box appears, and one pastes in the URL. The correct process is to type “Reference URL,” select the field, and then paste the URL into the next box which appears. Otherwise, the web interface seemed fairly intuitive.
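
In the workshop these edits were made by hand in the web interface. For comparison, the same structure (statement, then qualifier, then reference) can be written as QuickStatements commands. The sketch below is illustrative only: the helper names are hypothetical, `Q000000` is a placeholder to replace with the employer's real QID, and the item and identifiers are the fake values from the sample handout.

```python
def identifier_line(item, prop, value):
    """Format an external-identifier statement; the value is a quoted string."""
    return f'{item}|{prop}|"{value}"'


def employer_line(item, employer_qid, position_qid, source_url):
    """Format an employer statement with a qualifier and a reference.

    P108 = employer, P39 = position held (used here as a qualifier),
    S854 = reference URL (the "S" prefix marks a source in QuickStatements).
    """
    return f'{item}|P108|{employer_qid}|P39|{position_qid}|S854|"{source_url}"'


item = "Q999999999"  # fake item from the handout
lines = [
    identifier_line(item, "P244", "n201099999"),   # Library of Congress authority ID
    identifier_line(item, "P214", "170999999"),    # VIAF ID
    identifier_line(item, "P2963", "999999"),      # Goodreads ID, as on the handout
    employer_line(
        item,
        "Q000000",    # placeholder: substitute the employer's QID
        "Q9344260",   # associate professor
        "https://altoona.psu.edu/person/mabel-fakeworthy",
    ),
]
print("\n".join(lines))
```

Writing the reference as its own property and value pair mirrors the web interface, where one first selects “Reference URL” and only then pastes the URL.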

After folks had finished doing the other record updates on their own, I demonstrated a Wikidata query for people employed at Penn State and showed how everyone we’d worked on now appeared in the results. I also showed how one might expand the query to see their LC identifiers, see which employees don’t have LC identifiers, or see fields of work represented.
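
The exact queries from the demonstration are not reproduced here. The following is a rough sketch of what such a query might look like, built as a string that can be pasted into the Wikidata Query Service. The function name is hypothetical and `Q000000` is a placeholder for the employer's QID.

```python
def build_employee_query(employer_qid, only_missing_lc=False):
    """Build a SPARQL query for people with the given employer (P108).

    The LC identifier (P244) is optional, so people without one still
    appear. With only_missing_lc=True, only those lacking it are kept.
    """
    lines = [
        "SELECT ?person ?personLabel ?lc WHERE {",
        f"  ?person wdt:P108 wd:{employer_qid} .",
        "  OPTIONAL { ?person wdt:P244 ?lc . }",
    ]
    if only_missing_lc:
        lines.append("  FILTER(!BOUND(?lc))")
    # The label service fills in ?personLabel with the English label.
    lines.append('  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }')
    lines.append("}")
    return "\n".join(lines)


everyone = build_employee_query("Q000000")
missing = build_employee_query("Q000000", only_missing_lc=True)
print(everyone)
```

Swapping P244 for P101 in the optional clause would be one way to list fields of work instead of LC identifiers.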

Moving Forward

Most participants stayed late to discuss ways we might work this into the department’s work. One of the most promising options was the possibility of putting in data related to ECIP work. We might also review all folks affiliated with Penn State and see who doesn’t have library identifiers and where they might be added.

Coming up, I plan to hold a second workshop on creating entire records for people. I will likely gather similar kinds of information and have participants create the whole record in the browser. In the meantime, I’ve shared the materials included in the download package, such as a reference sheet for properties related to people, a worksheet for gathering information about a person one plans to add to Wikidata, etc.

Footnotes

¹ Contents of the package are:
Wikidata-Intro-Catalogers.pptx: slides with speaker notes.
Wikidata_Fields_People.pdf: a reference document with property names and examples. Provides guidance on fields one might add to a person’s record.
Wikidata_People_Worksheet.docx: a worksheet based on the fields above. Provides field names and blank spaces so one can gather information about a person, either in a working document or in a printed copy, before creating the record.
Finding_Identifiers.pdf: a guide I created for finding identifiers in international library authority files (using VIAF as a central source) as well as places like Goodreads and ORCID.
Last updated 2019-12-04.
² NB: one doesn’t need an account to edit Wikidata; the IP address would be recorded instead.