Weighing Fields in Library Catalog Search, or, The Hillbilly Elegy Problem
In our work on the library’s catalog, we’ve got a habit of naming known issues after an item that exemplifies them particularly well. The “Hillbilly Elegy Problem” is the case where a book appears in search results, including title search results, after one or more books about it. While those books may also be of interest to a patron, or may even be what they’re actually looking for, we prioritize the extremely common user story of “when I search for a book by its title, I expect to see books with that title first.”1
This post is less groundbreaking than it is view-from-the-ground about what it looks like to work on a catalog’s index. It doesn’t assume that you understand Blacklight, Traject, or Solr but does reference MARC fields without defining them.
Ranking Search Results
We create our catalog’s index using a piece of software called Traject. We use Traject to group2 MARC fields into Solr fields that support different kinds of search (keyword, title, subject, author, identifier, series, publisher, etc.). For example, our subject fields (MARC 650, 600, 651, etc.) are all grouped for subject search. Sometimes we index only parts of fields: one can tell Traject to output just the 505a and 505t, along with subfield “t” from other fields, into a particular field and then add the resulting field to title search.
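To make the grouping idea concrete, here is a minimal Python sketch. It is not the catalog’s Traject configuration (Traject itself is configured in Ruby). The search-field names, the group_fields helper, and the choice of which MARC specs feed which field are all hypothetical. The spec strings are only loosely modeled on the tag-plus-subfields notation (such as 245ab) that Traject’s extract_marc accepts.

```python
from typing import Dict, List, Tuple

# A MARC field modeled as (tag, [(subfield_code, value), ...]); a record is a list of fields.
Field = Tuple[str, List[Tuple[str, str]]]

# Hypothetical search-field definitions. "505at" means tag 505, subfields a and t;
# a bare tag such as "600" means every subfield of that field.
SEARCH_FIELDS: Dict[str, List[str]] = {
    "title": ["245ab"],
    "title_contents": ["505at", "600t"],
    "subject": ["600", "650", "651"],
}


def group_fields(record: List[Field], search_fields: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Collect text from each MARC spec into the search field that groups it."""
    grouped: Dict[str, List[str]] = {name: [] for name in search_fields}
    for name, specs in search_fields.items():
        for spec in specs:
            tag, codes = spec[:3], spec[3:]  # no codes listed = take every subfield
            for field_tag, subfields in record:
                if field_tag != tag:
                    continue
                values = [value for code, value in subfields if not codes or code in codes]
                if values:
                    grouped[name].append(" ".join(values))
    return grouped


# Two fields from the Appalachian Reckoning record, trimmed for brevity.
record: List[Field] = [
    ("245", [("a", "Appalachian reckoning :"), ("b", "a region responds to Hillbilly Elegy")]),
    ("600", [("a", "Vance, J. D."), ("t", "Hillbilly elegy.")]),
]
grouped = group_fields(record, SEARCH_FIELDS)
print(grouped["title_contents"])  # ['Hillbilly elegy.']
```

The same 600 field lands in two search fields here: whole in subject, and just its $t in a title-related field.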
Next, we assign weights to each grouped field. Subjects weigh more than series titles in the keyword index (though we also have a dedicated series title field in advanced search). Within an author search, the 1xx field weighs slightly more than the 7xx. Similarly, in title search, the 505s weigh significantly less than the 245. Of all the weights, the standalone 245 (title) is the highest.
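One common way to express per-field weights like these in Solr is the qf (query fields) parameter of the dismax and edismax query parsers, written as field^boost pairs. The sketch below assumes that approach and is not the catalog’s actual Blacklight or Solr configuration. The field names and the author weights are made up. The 100000 and 50 echo the original weights listed later for the 245 and 505.

```python
from typing import Dict

# Illustrative weights. Field names and the author numbers are hypothetical.
TITLE_WEIGHTS: Dict[str, int] = {"title_245ab": 100000, "title_contents_505": 50}
AUTHOR_WEIGHTS: Dict[str, int] = {"author_1xx": 1100, "author_7xx": 1000}


def build_qf(weights: Dict[str, int]) -> str:
    """Render weights as a Solr (e)dismax qf string: 'field^boost field^boost ...'."""
    # Highest-weighted fields first; the order is cosmetic and doesn't affect scoring.
    ordered = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return " ".join(f"{field}^{boost}" for field, boost in ordered)


print(build_qf(TITLE_WEIGHTS))   # title_245ab^100000 title_contents_505^50
print(build_qf(AUTHOR_WEIGHTS))  # author_1xx^1100 author_7xx^1000
```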
Records appear in search results based on a combination of:
- what fields the search terms appear in
- whether they appear in the same field
- their proximity to each other when they appear in the same field
- how many fields they appear in3
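Solr’s real relevance scoring is more involved than this. Still, a toy report can show the raw inputs the list above describes: which fields match, whether all the terms land in the same field, whether they sit side by side, and how many fields match. In this sketch, match_report is hypothetical, and exact adjacency stands in crudely for proximity.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_report(query: str, fields: Dict[str, str]) -> Tuple[Dict[str, Dict[str, bool]], int]:
    """Per field: do all query terms appear, and do they appear side by side in order?"""
    terms = tokenize(query)
    report: Dict[str, Dict[str, bool]] = {}
    for name, text in fields.items():
        tokens = tokenize(text)
        all_terms = all(term in tokens for term in terms)
        # Crude stand-in for proximity: the terms occur as an exact adjacent phrase.
        adjacent = any(tokens[i:i + len(terms)] == terms
                       for i in range(len(tokens) - len(terms) + 1))
        report[name] = {"all_terms": all_terms, "adjacent": adjacent}
    fields_matched = sum(1 for result in report.values() if result["all_terms"])
    return report, fields_matched


report, fields_matched = match_report(
    "hillbilly elegy",
    {"245": "Appalachian reckoning : a region responds to Hillbilly Elegy",
     "505": "Social capital / Jeff Mann"},
)
print(report, fields_matched)
```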
Normally, looking at a search term, a record, and our configuration can give me a good idea of why the record appeared and even why it appeared in relation to the other records. In more confusing cases, actually looking at Solr may provide some insight.
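When the configuration alone doesn’t explain a ranking, Solr can report how it scored each document: adding debugQuery=true to a request returns a debug section whose explain entry breaks down each result’s score. Here is a minimal sketch of building such a request URL. The core URL and field names are hypothetical, and in practice Blacklight builds its Solr requests itself.

```python
from urllib.parse import urlencode


def explain_url(solr_core_url: str, query: str, qf: str) -> str:
    """Build a Solr select URL that asks for a per-document score explanation."""
    params = {
        "q": query,
        "defType": "edismax",  # query parser that understands qf field^boost lists
        "qf": qf,
        "fl": "id,score",      # return each document's id and relevance score
        "debugQuery": "true",  # adds a debug section with an 'explain' entry per document
    }
    return f"{solr_core_url}/select?{urlencode(params)}"


# Hypothetical core URL and field names.
print(explain_url("http://localhost:8983/solr/catalog", "hillbilly elegy",
                  "title_245ab^100000 title_contents_505^50"))
```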
The Hillbilly Elegy Problem
The problem we were experiencing was that Appalachian reckoning : a region responds to Hillbilly Elegy was appearing higher in the title and keyword search results for hillbilly elegy than the record for Hillbilly Elegy itself. Because we have both items in print and as ebooks, I’m going to compare the print MARC records.6
Where do the words appear in each record, and how is each field weighted? (Because the words hillbilly elegy occur next to each other at least once in each field, one doesn’t also have to take proximity into account.) I’ll reproduce the MARC below using $a syntax to indicate subfields but am leaving out indicators. I will bold the term where it occurs.
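The $a notation is easy to parse mechanically. Parsing it also shows the distinction between a field’s 245a and its full 245ab, which matters later. The helpers here are hypothetical and handle only the display notation used in this post. They are not a real MARC parser.

```python
import re
from typing import List, Tuple


def parse_subfields(field: str) -> List[Tuple[str, str]]:
    """Split '$a ... $b ...' display notation into (code, value) pairs."""
    return [(code, value.strip()) for code, value in re.findall(r"\$(\w)\s*([^$]*)", field)]


def subfield_text(field: str, codes: str) -> str:
    """Join the values of the requested subfield codes, e.g. codes='ab' for the 245ab."""
    return " ".join(value for code, value in parse_subfields(field) if code in codes)


ar_245 = "$a Appalachian reckoning : $b a region responds to Hillbilly Elegy"
print(subfield_text(ar_245, "a"))   # Appalachian reckoning :
print(subfield_text(ar_245, "ab"))  # Appalachian reckoning : a region responds to Hillbilly Elegy
```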
Fields in Appalachian Reckoning
- 245: $a Appalachian reckoning : $b a region responds to Hillbilly Elegy (original weight: 100000)
- 600: $a Vance, J. D. $t Hillbilly elegy. (original weight: 1000)
- 600: $a Vance, James David, $d 1984- $t Hillbilly elegy. $2 idsbb (original weight: 1000)
- 505: $a Part I. Considering Hillbilly Elegy. Interrogating. Hillbilly elitism / T.R.C. Hutton – Social capital / Jeff Mann – Once upon a time in “Trumpalachia”: Hillbilly Elegy, personal choice, and the blame game / Dwight B. Billings – Stereotypes on the syllabus: exploring Hillbilly Elegy’s use as an instructional text at colleges and universities / Elizabeth Catte – Benham, Kentucky, coal miner / Wise County, Virginia, landscape / Theresa Burriss – Panning for gold: A reflection of life from Appalachia / Ricardo Nazario y Colón – Will the real hillbilly please stand up? Urban Appalachian migration and culture seen through the lens of Hillbilly Elegy / Roger Guy – What Hillbilly Elegy reveals about race in twenty-first-century America / Lisa R. Pruitt – Prisons are not innovation / Lou Murrey – Down and out in Middletown and Jackson: drugs, dependency, and decline in J.D. Vance’s Capitalist Realism / Travis Linnemann and Corina Medley. Responding. Keep your “elegy”: the Appalachia I know is very much alive / Ivy Brashear – HE said/SHE said / Crystal Good – The hillbilly miracle and the fall / Michael E. Maloney – Elegies / Dana Wildsmith – In defense of J.D. Vance / Kelli Hansel Haywood – It’s crazy around here, I don’t know what to do about It, and I’m just a kid / Allen Johnson – “Falling in love,” Balsam Bald, the Blue Ridge Parkway, 1982 / Danielle Dulken – Black hillbillies have no time for elegies / William H. Turner. (original weight: 50)
- 505: $a Part II. Beyond Hillbilly Elegy. Nothing familiar / Jesse Graves – History / Jesse Graves – Tether and plow / Jesse Graves – On and on: Appalachian accent and academic power / Meredith McCarroll – Olivia’s ninth birthday party / Rebecca Kiger – Kentucky, coming and going / Kirstin L. Squint – Resistance, or our most worthy habits / Richard Hague – Notes on a mountain man / Jeremy B. Jones – These stories sustain me: the wyrd-ness of my Appalachia / Edward Karshner – Watch children / Luke Travis – The mower-1933 / Robert Morgan – Consolidate and salvage / Chelsea Jack – How Appalachian I am / Robert Gipe – Aunt Rita along the King Coal Highway, Mingo County, West Virginia / Roger May – Holler / Keith S. Wilson – Loving to fool with things / Rachel Wise – Antebellum cookbook / Kelly Norman Ellis – How to make cornbread, or thoughts on being an Appalachian from Pennsylvania who calls Virginia home but now lives in Georgia / Jim Minick – Tonglen for my Mother / Linda Parsons – Olivia at the intersection / Meg Wilson – Appalachian apophenia, or the psychogeography of home / Jodie Childers – Canary dirge / Dale Marie Prenatt – Poet, priest, and “poor white trash” / Elizabeth Hadaway. (original weight: 50)
- 520: $a “With hundreds of thousands of copies sold, a Ron Howard movie in the works, and the rise of its author as a media personality, J.D. Vance’s Hillbilly Elegy: A Memoir of a Family and Culture in Crisis has defined Appalachia for much of the nation. What about Hillbilly Elegy accounts for this explosion of interest during this period of political turmoil? Why have its ideas raised so much controversy? And how can debates about the book catalyze new, more inclusive political agendas for the region’s future? Appalachian Reckoning is a retort, at turns rigorous, critical, angry, and hopeful, to the long shadow Hillbilly Elegy has cast over the region and its imagining. But it also moves beyond Hillbilly Elegy to allow Appalachians from varied backgrounds to tell their own diverse and complex stories through an imaginative blend of scholarship, prose, poetry, and photography. The essays and creative work collected in Appalachian Reckoning provide a deeply personal portrait of a place that is at once culturally rich and economically distressed, unique and typically American. Complicating simplistic visions that associate the region almost exclusively with death and decay, Appalachian Reckoning makes clear Appalachia’s intellectual vitality, spiritual richness, and progressive possibilities."–Back cover. (original weight: 50)
- 520: $a In Hillbilly elegy, J.D. Vance described how his family moved from poverty to an upwardly mobile clan while navigating the collective demons of the past. The book has come to define Appalachia for much of the nation. This collection of essays is a retort, at turns rigorous, critical, angry, and hopeful, to the long shadow cast over the region and its imagining. But it also moves beyond Vance’s book to allow Appalachians to tell their own diverse and complex stories of a place that is at once culturally rich and economically distressed, unique and typically American. – adapted from back cover (original weight: 50)
So we have one occurrence in a field that weighs 100000, one each in two fields that weigh 1000, and one or more in each of four fields that weigh 50.
Fields in Hillbilly Elegy
- 245: $a Hillbilly elegy : $b a memoir of a family and culture in crisis (original weight: 100000)
- 520: $a “Hillbilly Elegy is a passionate and personal analysis of a culture in crisis–that of poor, white Americans. The disintegration of this group, a process that has been slowly occurring now for more than forty years, has been reported with growing frequency and alarm but has never before been written about as searingly from the inside. In Hillbilly Elegy, J.D. Vance tells the true story of what a social, regional, and class decline feels like when you were born with it hanging around your neck."–Page 4 of cover. (original weight: 50)
So here we have one match in a field weighted at 100000 and two matches in a single field weighted at 50.
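Adding each matching field’s weight once gives a rough sense of the gap between the two records. This is a deliberately crude model: Solr’s actual scoring also considers term frequency, field length, and how the edismax parser combines per-field scores. It does, however, follow the reasoning here:

$$\underbrace{100000 + 50}_{\text{Hillbilly Elegy}} = 100050 \;<\; 102200 = \underbrace{100000 + 2 \times 1000 + 4 \times 50}_{\text{Appalachian Reckoning}}$$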
Even without doing complex math based on proximity, number of matches, etc., it’s pretty clear why Appalachian Reckoning returned first. But as this also demonstrates, the characteristics that cause a book about the book to rank higher are due entirely to a) the book and chapter titling decisions made in creating that book, b) the cataloging decisions (e.g., is the table of contents present? what about the abstracts?), and c) our decision to purchase the book. None of these tell us whether the book about the book should break one of our top conventions for search results.
What we ended up doing was taking the 245a field, which we were already indexing for other uses, and adding it to our weighting separately from the 245ab. We then weighted it higher, at 150000. This means that Hillbilly Elegy now gets the following weights for its title:
- 245a: 150000
- 245ab: 100000 (because it’s also in the 245ab)
So its combined title weight jumps to 250000, plus the 50 from its 520, while Appalachian Reckoning keeps its title weight of 100000, because “Hillbilly Elegy” appears in its 245b rather than its 245a. Now, a book about the book can no longer pull ahead on a full (245ab) title match plus a few subject fields (and miscellaneous fields that might also tip the balance): those are overshadowed by the original’s 245a match, if there is one. Because we still apply the 245ab weighting as well, a match in the 245a earns more than double the weight of a match that appears only in the 245b.
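The before-and-after effect of the extra 245a weight can be checked with the same back-of-the-envelope arithmetic used in this post. In this minimal sketch, each field occurrence containing the phrase adds its weight once. That additive model, the weight-table keys, and the helper names are assumptions for illustration. This is not how Solr computes relevance.

```python
from typing import Dict, List

# Crude additive model (an assumption, not Solr's scoring): each field occurrence
# containing the phrase adds its weight once, however often the phrase appears in it.
OLD_WEIGHTS: Dict[str, int] = {"245ab": 100000, "600": 1000, "505": 50, "520": 50}
NEW_WEIGHTS: Dict[str, int] = {**OLD_WEIGHTS, "245a": 150000}

# Field occurrences containing "hillbilly elegy", taken from the record listings above.
MATCHES: Dict[str, List[str]] = {
    "Appalachian reckoning": ["245ab", "600", "600", "505", "505", "520", "520"],
    "Hillbilly elegy": ["245a", "245ab", "520"],
}


def score(fields: List[str], weights: Dict[str, int]) -> int:
    # Fields missing from the weight table (the 245a, before the change) contribute nothing.
    return sum(weights.get(field, 0) for field in fields)


def rank(weights: Dict[str, int]) -> List[str]:
    return sorted(MATCHES, key=lambda title: score(MATCHES[title], weights), reverse=True)


print(rank(OLD_WEIGHTS))  # ['Appalachian reckoning', 'Hillbilly elegy']
print(rank(NEW_WEIGHTS))  # ['Hillbilly elegy', 'Appalachian reckoning']
```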
When we tested this on this book and a few others where similar title and subject weights were outweighing the original, the catalog showed them in the expected order. The books about the book are still the next-highest ranked, so they’re right there for someone who wasn’t looking for the work whose title they searched. Finding the original book first is also unlikely to make anyone think we don’t have books about it, whereas we had at least one report of the inverse happening.
There will still be outliers, particularly when both have the same title value in the 245a, but this didn’t make those cases any worse.
As I said in the introduction, this isn’t anything particularly groundbreaking. I hope, however, that it’s an interesting example of troubleshooting and the iterative development of a library catalog.
Because I’m a librarian, let me close with some reading suggestions. First, let me suggest that any consideration you give the actual book Hillbilly Elegy be done with an awareness of the kind of person Vance has shown himself to be (even years ago)—give it the same consideration you’d give a book by Marjorie Taylor Greene or Mitch McConnell.7 Scholars of Appalachia from Appalachia roundly criticized him at the time of the book’s publication, when he appeared at the Appalachian Studies Association’s Annual Meeting, and ever since.
If you’re interested in reading more nuanced books about the region or responses to it, consider Elizabeth Catte’s What You Are Getting Wrong About Appalachia (or an interview with her here) and check out Belt Publishing generally. While I haven’t read Appalachian Reckoning, it is written by dozens of regionally, racially, and sexually/gender diverse Appalachian authors, which means you’re going to be getting a lot more stories, ideas, and perspectives than you would from Vance.
One might ask “but shouldn’t a book about a book rank higher if it has all the characteristics which would rank it higher?” I’ll get to that. ↩︎
There are a few search fields that are standalone rather than grouped, but even the author search groups multiple 1xx and 7xx fields together. Display fields are indexed separately and much more granularly. ↩︎
I believe this is particularly the case if these are weighted fields like something appearing in both a 650 and 505t, whereas there’s less impact if they appear several times in fields that just go into the general keyword index (like a 583 and 586). ↩︎
Stemming would match both “quilt” and “quilting” for “quilts,” and vice-versa, hence the importance of ranking “quilts” slightly higher. ↩︎
Our current configuration treats searches of up to 5 words/terms like an AND search, then searches of up to 9 words/phrases as an AND -1 search, so that we also return results if we find 5 words of your 6-word search in a record, 6 of your 7, etc. Beyond that, we return records where 90% or more of your terms were found in the record. This can be overridden with Boolean searching, and records containing all words/phrases will always rank higher. (We called some early challenges with this “The Because of Winn-Dixie Problem.”) ↩︎
So was an ebook record for the same book, which ranks equal to but sorts above the print in a keyword search, though it ranks below in a title search (because the many 505t fields in the print record boost it way up). ↩︎
And don’t give me “but they could never write something so eloquent” — one’s ability to write well is not dependent on one’s worldview. Similarly, one’s choices of words on Twitter, etc., don’t preclude one’s ability to write thoughtfully in other settings. Very few monstrous people are monstrous at all times. ↩︎
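The matching rule described in the footnote on AND searching resembles what Solr’s mm (minimum should match) parameter does. Assume an mm value of 5<-1 9<90%. That value is a guess at an equivalent spec, not the catalog’s confirmed configuration. Under it, the number of terms a record must contain works out as follows (required_terms is a hypothetical helper):

```python
def required_terms(n_terms: int) -> int:
    """How many of n_terms must match under a hypothetical mm spec of '5<-1 9<90%'."""
    if n_terms <= 5:
        return n_terms             # 1 to 5 terms: every term required (plain AND)
    if n_terms <= 9:
        return n_terms - 1         # 6 to 9 terms: all but one required ("AND -1")
    return n_terms * 90 // 100     # 10 or more: 90% required, rounded down


for n in (5, 6, 9, 10, 11, 20):
    print(n, required_terms(n))
```

Integer arithmetic is used for the percentage so the rounding down happens exactly, without floating-point surprises.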