Tuesday, April 08, 2008

Where the MARC meets the XML

TEI (Text Encoding Initiative) is not new. At all. It was born in 1987, although it wasn't put into XML format until this century, I believe. Mostly, English nerds love it. It lets them find patterns in literature that's been encoded, and spot differences across editions and versions. It warms their nerdy little hearts, and at the same time allows for the writing of more literary criticism than was ever possible before. This also fuels the academic cataloging departments, of course, so I'm not complaining. Much.

Anyway, for this big project we're working on, we're having scholars do TEI markup on printed works and handwritten manuscripts of all kinds, and then everything will be searchable by keyword, and so on. I think that tomorrow I'm going to write about TEI, and put in links and things, because I think that not enough librarians know a lot about TEI. But today, I'm going to talk about the relationship of TEI to MARC.

Yes. They have a relationship.

I noticed it right away while we were talking about the capabilities of TEI. The thing about it is--if a document is encoded correctly, there's no need for a separate cataloging record. The TEI will have captured title, author, format, genre, extent, publisher information, year published, translators, as well as chapter and section titles. The search mechanisms then pull that information out directly from the digital document.
The caveat, of course, is that the document has to be digital. But think about it--you could easily have a catalog that pulls not only MARC records, but also TEI document information, and have both types of things in one catalog. There's not even really a need for a search portal--you could write a fairly simple program to pull information out of a TEI document and automatically generate a MARC record from it, and then import that record into your catalog. You could even add an 856 field (the one that holds the URL) and link the whole thing together, and you could do it all with minimal effort on the part of the cataloger.
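
To make that "fairly simple program" a little more concrete, here is a rough Python sketch of the idea. It is only an illustration, not a real converter: the sample TEI header, the function names (header_text, tei_to_marc_fields, format_field), the example URL, and the choice of which header elements feed which MARC tags are all my own assumptions. It assumes a TEI P5 file in the standard TEI namespace with title, author, publisher, and date sitting in the usual places under fileDesc, and it maps them to 100, 245, 260, and 856. A real record would also need a leader, an 008, proper punctuation, and a cataloger's eye.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace. Assumption: older or home-grown TEI files may not use it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A made-up TEI document, just enough header to demonstrate the mapping.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A Sample Manuscript</title>
        <author>Doe, Jane</author>
      </titleStmt>
      <publicationStmt>
        <publisher>Example Press</publisher>
        <date>1850</date>
      </publicationStmt>
    </fileDesc>
  </teiHeader>
  <text><body><p>Transcribed text goes here.</p></body></text>
</TEI>"""


def header_text(root, path):
    """Return the stripped text of the first match under fileDesc, or None."""
    node = root.find("tei:teiHeader/tei:fileDesc/" + path, TEI_NS)
    if node is None or not node.text:
        return None
    return node.text.strip()


def tei_to_marc_fields(tei_xml, url=None):
    """Build a list of (tag, indicators, subfields) tuples from a TEI header.

    This is a hypothetical, minimal mapping: author -> 100, title -> 245,
    publisher/date -> 260, and an optional link -> 856.
    """
    root = ET.fromstring(tei_xml)
    title = header_text(root, "tei:titleStmt/tei:title")
    author = header_text(root, "tei:titleStmt/tei:author")
    publisher = header_text(root, "tei:publicationStmt/tei:publisher")
    date = header_text(root, "tei:publicationStmt/tei:date")

    fields = []
    if author:
        # First indicator 1 = surname entry (assumes "Surname, Forename").
        fields.append(("100", "1 ", [("a", author)]))
    if title:
        # First indicator 1 when there is a main entry, otherwise 0.
        fields.append(("245", "10" if author else "00", [("a", title)]))
    imprint = []
    if publisher:
        imprint.append(("b", publisher))
    if date:
        imprint.append(("c", date))
    if imprint:
        fields.append(("260", "  ", imprint))
    if url:
        # 856 with indicators 4 and 0: HTTP link to the resource itself.
        fields.append(("856", "40", [("u", url)]))
    return fields


def format_field(tag, indicators, subfields):
    """Render one field as a display line, showing blank indicators as #."""
    body = "".join(f"${code}{value}" for code, value in subfields)
    return f"{tag} {indicators.replace(' ', '#')} {body}"
```

Calling tei_to_marc_fields(SAMPLE_TEI, url="http://example.org/sample.xml") and passing each result through format_field gives four display lines, one per field. Getting from there to something an ILS will actually import would mean writing the fields out as MARCXML or binary MARC, which is where an existing tool or library would earn its keep.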

If there were other librarians at my desk with me right now, they'd all be screaming about subject headings, and yeah, this model doesn't do a thing for subject headings, but we don't create subject headings for manuscripts, anyway, really. It's too hard. And for printed books--subject analysis is a heck of a lot less of a time commitment than doing an entire record from scratch.

And when I think about it, MODS and EAD are the same way. Terry Reese at Oregon State has written a conversion program for EAD to MARC21, and I know that MarcEdit is the perfect platform for such things... but it's also not terribly intuitive all the time, and it only deals with EAD. We're fast approaching a time when having programs for conversion of other XML formats will be really, really useful. Why hasn't a little program been written yet? It's times like these that I wish I were a programmer. Unfortunately for me (but fortunately for humanity), I am not.

I just feel like we've given up on MARC, as a profession, when in reality it's still pretty useful for parsing information and making it searchable. And these other metadata schemas all still take the same stuff out of the original and put it into machine-readable format... so why not use the structures we have in place (like our ILSes) and put them to work?

