Thursday, December 20, 2007

Networking and Subject Headings

I got to meet some new people yesterday. All of them were technical services librarians/digital librarians. And twice I heard the same comment/question: "What do you think about folksonomies?"
I think that they're a fad? I think that people only use them because they have no idea that other subject searching is available? I think that LCSH needs to stop being a browsing list?
I got the feeling, though, that the idea of controlled language "death" is very scary for librarians.
And then TODAY, I see this:
University of Chicago Libraries

Which is EXACTLY what I've been thinking about! Do a search in the UC catalog now, and you get not only the list of things the catalog thinks you might want, but the ability to refine that search within the LC classification schema. We have the classification scheme already laid out for us, which roughly corresponds to LCSH, so why not use it to help make LCSH more hierarchical? I've been gushing over AAT for as long as I can remember, because it takes the headings and makes them hierarchical. I can actually use the headings to help me find more headings! What a concept!
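The kind of refinement I mean can be sketched in a few lines: group search results under the top of their LC call numbers so a user can narrow by class. This is just an illustration with invented records and a tiny slice of the LC outline, not anything the UC catalog actually runs.

```python
# Sketch: facet a result set by the first letter of the LC call number.
# The records and the three-class outline here are invented for illustration.
from collections import defaultdict

LC_CLASSES = {
    "Z": "Bibliography. Library Science",
    "Q": "Science",
    "P": "Language and Literature",
}

results = [
    {"title": "Cataloging rules",   "call_number": "Z693 .C37"},
    {"title": "Organic chemistry",  "call_number": "QD251 .O74"},
    {"title": "Library automation", "call_number": "Z678.9 .L53"},
]

facets = defaultdict(list)
for rec in results:
    class_letter = rec["call_number"][0]
    facets[LC_CLASSES.get(class_letter, "Other")].append(rec["title"])

# The facet list a user could click to refine the search:
for heading, titles in sorted(facets.items()):
    print(f"{heading} ({len(titles)})")
```

Even something this crude gives the user a hierarchy to climb around in, which is the whole point.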
Now, obviously LCSH hasn't always been this way. The books are actually pretty useful when it comes to finding other headings that might be useful. But when everything went online, we really lost that ability. There aren't nearly as many cross-references anymore, or see alsos.
We need to reclaim that heritage, and make our LCSH work FOR us again, instead of against us, and I think that folksonomies will end up following. All people really need is a way to understand a system for them to use it.
I mean, if enough people adopt it, everyone knows what "h8r" means, right? Why not understand subject headings?

Monday, December 17, 2007

Functions of Cataloging

What do I believe a cataloging department is charged to do?

Call me a product of my environment (and people do!), but I do not think that new technologies are dragging cataloging departments away from their primary responsibilities. As an information organizer, I see my role in any place to be one of facilitation. Although cataloging departments are not traditionally known for their social and outgoing ways, cataloging itself is about serving the user. Of course, all departments of a library are about serving the user: we are a service industry. And I think that cataloging is no different. All of our systems, all of our rules and notations, are about serving the user and helping them find what they need with as little trouble as possible.

Now, that is not to say that the library catalog is good at this. In fact, I think that in many ways it's not that good at all. When even reference librarians complain about the Library of Congress Subject Headings, something is definitely wrong. When I would rather use Google than a library catalog, something has to be wrong. So we're at an intersection—the intersection between traditional cataloging tools, users, and emerging technologies. Because I do not think that our mandate as catalogers has changed; rather, I think that the user has always been at the center of what we do. It’s the technologies that are starting to fall in our laps that will really make a difference in the next few years, and how flexible we can be in response to those.

Some scholars say that libraries have already missed the boat. RDA (Resource Description and Access) is dead before it ever lived, because it is going to be too much like AACR2 and not enough like Vannevar Bush’s Memex. Some of the same people have given in to quiet resignation over LCSH, which, because of its basic opposition to clustering, should have died a long time ago. MARC is too clunky, authority records are useful but may not be widely known enough to make the leap from libraries to other users in the digital world who might find them useful, too.

Other problems also arise as we start to imagine how a library might better use the new resources at hand in the form of metadata schemas. One is that there aren’t that many people in the library world who are thoroughly familiar with all the available resources for digital organization. Unlike traditional cataloging, which has produced thousands of people conversant in AACR2 and MARC, there are so many different metadata standards and technologies blooming all the time that there is very little knowledge transfer in the typical mentor-mentee model. This leads to a lot of reinventing the wheel. Listservs have become the standby community for many of us (I subscribe to at least 7), but so many social networking utilities are clunky and not conducive to the kind of substantive conversation that workshops and classroom environments offer with ease.

Another issue is that the Web has simply not developed along the lines of creating machine-readable documentation. Tim Berners-Lee said “The web has developed most rapidly as a medium of documents for people rather than data and information that can be processed automatically,” and he is absolutely right. Even now, some 4 years after his statement, the semantic web is still growing. Most users notice it only when they type “real estate” into Google and get maps of all the real estate in a given area, or when they search for a person and get their phone number. These are the beginnings of the semantic web, but so much is left to be done that it seems, at times, insurmountable, especially when one thinks of all the published resources that are out there, unused because they are not accessible via the web. The Principle of Least Effort is alive and well in the Interwebs, and it is not going away anytime soon. The mandate to librarians is to make the numerous diverse collections of materials into one coherent and searchable whole. Although daunting, the institutions that do this (like NCSU with its catalog) will find they have happier and better informed users (not to mention MORE users).

I do think that the role of catalogers is changing, though, even though the mandate remains the same. As we inevitably move away from books and move past other forms of media into more raw data, our means of making it available are changing.

Friday, December 14, 2007

Metadata Standards

I don't think of myself as a complete novice when it comes to metadata schemas. But I've never really taken the time to make a concerted effort to learn about them, either. I take them as they come. MARC, EAD, Dublin Core--all of these I learned through practice, not a class.
However, I am taking a class right now! On metadata of all things! And while sometimes it's boring, many times it's....enlightening? Edifying? Anyway, it's pretty cool.
Something I've learned: even though I've always thought MARC and EAD were the same thing, they actually aren't. MARC is "discovery" metadata and EAD is "structural" metadata. Although they both facilitate use (which is why I thought they did the same thing), they come from different places. MARC is for helping users search, and EAD is for...well, helping users search. But searching different aspects of the collection, not the subject of the collection itself.
I also learned about PREMIS (administrative metadata for preservation), and rights management metadata (also administrative). Having actual, bona fide metadata standards is pretty cool.
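The distinction is easier to see with one item described all three ways. Here's an illustrative sketch, with an invented digitized letter and made-up field names loosely in the spirit of MARC, EAD, and PREMIS (this is not any real schema):

```python
# One digitized letter, three kinds of metadata. All names and values
# are invented for illustration; none of this is a real schema.
item = {
    # Descriptive ("discovery") metadata, MARC-like: what is it, who made it?
    "descriptive": {
        "title": "Letter to the county historical society",
        "creator": "Smith, Jane",
        "date": "1923",
        "subjects": ["Local history", "Correspondence"],
    },
    # Structural metadata, EAD-like: where does it sit in the collection?
    "structural": {
        "collection": "Smith Family Papers",
        "series": "Correspondence, 1920-1930",
        "box": 4,
        "folder": 12,
    },
    # Administrative/preservation metadata, PREMIS-like: how do we keep it usable?
    "administrative": {
        "format": "image/tiff",
        "fixity_md5": "d41d8cd98f00b204e9800998ecf8427e",
        "events": ["scanned 2007-11-01", "checksum verified 2007-12-01"],
    },
}

# Discovery metadata answers "find me letters about local history";
# structural metadata answers "which box is it in?"
print(item["structural"]["box"])
```

Same letter, three different questions answered.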
When I was in grad school (lo these many 3 years ago), there really weren't any metadata "standards" per se. People were trying to pretend that there were standards, but no one was using them. Archivists weren't comfortable enough with digital anything, and librarians were still too invested in paper. My "digital archives" professor felt like she was banging her head against the wall when it came to getting archivists to start preserving digital materials. She would always, at every conference, stand up and tell archivists to start preserving their own born-digital records, in order to get experience in preserving other people's born-digital records, but I think that people thought she was just crazy. And she kind of was, but in a really great way.
Because now, not that far along in the future, people really ARE starting to preserve born-digital things, and to use the metadata standards that OCLC and ISO were creating back then. I used to feel kind of awash in fake-standards, but now, I feel very good about the tools that are out there, waiting to be used for creating records for born-digital items.
I even heard the term "digital archaeologist" yesterday. Yes, a person who uses the preservation metadata to figure out what a digital object was, and tries to bring it back to its former glory. This is particularly useful, these days, for things like 5.25-inch floppy disks and 8-bit files. At any rate, I love it. When are archaeology departments going to start offering digital classes? Get out your brushes!

Tuesday, December 11, 2007

Integrated Library Systems

There are several ILSes out there....and my library should know, since we're probably going to purchase one of them soon.
I think I've talked a little about Koha and Evergreen, the new kids on the block. They're both open source, which means that a couple of American companies have moved in to provide support for these free systems, for a nominal (!) fee of course. I've heard mixed things about the Koha support company, LibLime, mostly because they purport to support Evergreen, but in reality try to keep people from getting it. So all of their users use Koha. Equinox, which I believe was started by the people who originally built Evergreen, is the main supporter of Evergreen. There are a few other companies that support these open source systems, one in Canada, I think, one in France, a couple in Australia.
There are also the proprietary systems. From conversations I've had and overheard, people think it's a foregone conclusion that proprietary ILSes are going the way of the dodo. I don't know that I believe this. The systems themselves might eventually become obsolete, but these companies can just follow the path of LibLime or Equinox, and start their own open source support (LibLime certainly charges as much as a proprietary vendor).
The PVs (proprietary vendors) are: III (Innovative), Ex Libris, AGent (Auto-Graphics), Polaris, SirsiDynix, and Liberty3 (Softlink).
There are a lot of proprietary systems, although the numbers do seem to be dwindling. I was talking to my husband about the apparently anti-competitive behavior on the part of SirsiDynix in relation to Horizon, and he said "ah, but it's really just good business practice."
And he's right. All of these companies just do what they have to do in order to make money. Lots of library people (and the library vendors) like to talk about how they're all just librarians at heart, and they really want what's best for us. But in actuality, they're just a business, and they want to make money. That's their real mandate. All the rest is just fluff. And I think that the third-party vendors who are riding the wave of open source software will end up the same way--in it to make a buck.
But hey, this is America! That's why we're all here.

Tuesday, December 04, 2007

Learning Metadata

I've been going through "the literature", as the kids say nowadays, on metadata creation. Reading snippets of books published 3-4 years ago, reading blogs, reading articles, reading PowerPoint presentations, watching webcasts (tangentially, have you noticed how many ways there are to disseminate information these days? whew!).
A question has been put to me "describe how cataloging departments can balance traditional cataloging functions with emerging technologies." Ok, that's not a question, really, but you get the idea.
And the answer is---I'm not sure. I actually think the question has a lot more to do with the idea of a "traditional" cataloging function than it does the emerging technologies. Traditional implies "old", doesn't it? Maybe "quaint". Something that's been around the block a few times, at least. But I don't think that, at core, cataloging functions are changing at all. And I certainly don't think it's about striking a balance. Because technologies are just tools.
I know that many catalogers (and computer scientists) think that the technology IS the function. MARC is what we do in cataloging! EAD is what we do in archives now! But that's just not true. In reality, we SERVE. That's what we do.
As I've said before, I can be what many catalogers would call "lax" about AACR2. The only reason I am, though, is because I don't see how it benefits users, in all cases. If a user needs to see something in order to understand the work better, then I give it to them, in any way I can. This can lead to bending or breaking of the rules, but so be it. I'm not a cataloging slave (that's reserved for my student-workers).
When we were in graduate school, I remember the students in the cataloging course who needed to know the exact right way to catalog everything. My cataloging professor told them over and over that the "rules" are not really rules at all; that there exist many different ways to catalog any given work/manifestation/item.
I think that the question I've been posed could benefit from this advice. Why do I need to balance my duties with technology? Don't I use my technologies to do my duties? Isn't that the point? I think maybe the question was designed to make me think about how cataloging is changing. And it is, I know it is. But I think that all these "monster" changes that are taking place are just semantics. If I start using DSpace instead of Horizon, the only thing that has changed is that I'm now focused on making electronic resources available rather than paper resources. And my goal is the same--give the user everything I possibly can to help them get their information. The balancing act is how do I serve, not how do I remember to put a colon after the title statement.

Monday, November 12, 2007

A short history of Horizon, part II

Turns out, by "tomorrow" I meant Monday. Psych!
At any rate, back to Horizon. So the stage was set for Horizon to die off, all the libraries to go stomping off in anger to open source systems or the other big library companies. The people who bought SirsiDynix obviously hadn't thought about this eventuality. A lot of librarians like to make the investors out to be complete jerks, who think that librarians are all lambs to the slaughter. I don't think that's exactly right. I think they just underestimated the market. Lots of libraries had been with Horizon for many, many years (since it was called Dynix Classic; we're talking pre-1995, at least). And library systems are a lot like cars--if you've had the same 1995 Corolla for the past 12 years, you're at least open to the idea of trading it in. And once the transmission goes out, you're definitely in the market. You might buy the new Corolla, but then again, Nissan just came out with the new Altima and it's pretty hot-looking.
This is how libraries reacted, initially. Horizon was dying, they started looking at Symphony, but then--to stretch the analogy a little further--the Toyota salesperson turned out to be really insincere and kind of pushy. The SD people also turned out to be insincere, to many people. There was obviously a backlash from the people whose contracts were nullified, and what did SD do? They issued a press release (you should really read it) where they basically said that everyone who didn't like the new system should just learn to live with it, because that's how things are.
Of course, almost everyone jumped up and said "I'm buying an Altima!" (or a Ford Fusion, or maybe even a VW Jetta). Once it became clear the libraries would just take their toys and go home, SD did an almost exact 180. Fast forward a year, and we're at the conference of users of Horizon (this wrapped over the weekend). Now the tune is completely different.
Upgrading to Symphony has become just that--an upgrade. Not a migration costing almost $100K, but an upgrade, which costs nothing, plus a 40% discount on support if you sign a three-year contract. I do believe that someone has changed their mind about librarians. Of course, even with that kind of financial incentive, my library is still thinking of going to open source software. The tide has officially turned, and I don't think that SD was really the cause. Just a push in a more egalitarian direction.

Friday, November 09, 2007

A short history of Horizon, part 1

As the Technical Services librarian, my job entails not only cataloging, but also database administration. We use a system called Horizon, which is supported by a company called SirsiDynix. I haven't talked much about Horizon on this blog, but it's dying. A brief history:
Sirsi and Dynix were two separate entities, and decided to merge in 2005, I believe, or 2006. They brought their two systems into the relationship: Sirsi had Unicorn, and Dynix had Horizon. Most people agree that Horizon is the more modern system, with more bells and whistles and such.
In 2007 at some point, SD was bought by a company called Vista, who immediately decided that having two Integrated Library Systems (ILS) was a really bad idea for business, and it would be so much more cost-effective if everyone just used one system. This is sound business practice, actually, but the whole thing was a public relations nightmare. Horizon was about to be updated to Horizon 8.0, which was to be this totally new system, a rebuild. Customers had already signed contracts with SD to go to Horizon 8. But....Vista decided to pull the plug on Horizon, and instead of informing the customers who already had contracts privately, just put out a big press release about it so they could find out along with everyone else.
Vista also decided to scrap Horizon entirely, and to tout a "new" system called Symphony. Thing is, apparently it looks just like Unicorn, but with some minor cosmetic changes. It's generally agreed all over the library community that Unicorn is old. Old old old. Like, its core system was created in 1985 and has never been changed kind of old. Who wants a system like that? No one. Plus it's a turnkey system, which usually means that you have little to no control over how your system looks. Horizon, on the other hand, is one of the most customizable databases I've ever seen, and you don't have to call customer support in order to customize it. You can do it within the system if you want to. You can also create custom SQL queries, from any of the tables. It's really pretty awesome when you get down to the nuts and bolts of it.
But I digress. The other thing that we all found out when we were told about the death of Horizon was that going to Symphony would be a migration, not an upgrade. The difference? You have to pay for a migration, and don't pay for an upgrade. Um, what? we all said. We have to pay to go to the system that you're making us go to if we want to stay with your company? (Keep in mind that Horizon, 10 years ago, had an upfront cost of $75K.)
This pushed a lot of people to rethink their systems. The explosion of open source, of course, just happened to coincide with this announcement, and the other library vendors in the business started salivating when thinking of how SD was about to go under and that left 20,000 libraries looking for new systems. Jackpot.

(continued tomorrow)

Monday, November 05, 2007

Metadata for Manuscripts

There are lots of different kinds of metadata out there for library and archival materials. Some might say, too many kinds. There are different metadata schemes for every kind of material. There is Dublin Core, METS, MODS, even MARCXML. All of these metadata schemes are trying to solve an age-old problem with archival materials: they're too unique to have a standard applied to them. Books are easy; they all have title pages and authors and they're all wrapped up in neat packages that lend themselves to cataloging. Archival material, on the other hand, can be anything--and usually is. Paintings, bills of sale, letters, buttons, book manuscripts, musical instruments. The list goes on and on. And, especially with paper things like letters and bills, there is the problem of having so much paper on your hands that you cannot simply describe every single piece of paper as a single entity. So we group things, and then we catalog the groups. More or less. It's an inexact science.
At least, it was until the standards started getting made. Dublin Core and METS and MODS are all designed to help archivists catalog the things that are uncatalog-able. And now there's a new(ish) metadata standard: PREMIS.
PREMIS stands for something fancy, but at core, it is preservation metadata. Yes, now archivists can code not only information about the creator and the date and the content of a piece of paper, but also the physical condition, history of the physical condition, and any repairs that have been made to the paper.
Are we getting over-metadata'ed? Do we really need metadata, encoded into XML, that tells us if something is fragile and old? Can't we go to the paper itself and see what condition it's in? Or is this strictly for statistical purposes?
I've made myself a copy of the PREMIS manual--all 283 pages of it. So, we'll see what this is supposed to do/be. It can't be for the users of the material (which is generally who I consider metadata to be for), so it's not something that should be displayed to the user, probably. And a lot of archives don't even have the time to go through and process basic information, let alone preservation data. Maybe this is one of those things that is reserved strictly for libraries where everything is already processed and they have lots of extra time on their hands to go through and put in fuller, more complete information on each and every collection. Good on them, I say.
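For the curious, here's what the encoding might look like in miniature: a toy XML fragment built with Python's standard library. The element names are loosely modeled on PREMIS concepts (object, event), not a schema-valid PREMIS record, and the letter and repair are invented.

```python
# A toy fragment in the spirit of PREMIS preservation metadata.
# Element names are loosely modeled on PREMIS concepts; this is NOT
# a schema-valid record, and the content is invented.
import xml.etree.ElementTree as ET

obj = ET.Element("object")
ET.SubElement(obj, "objectIdentifier").text = "smith-letter-1923"
ET.SubElement(obj, "format").text = "image/tiff"

# Preservation events: the history of what's been done to the thing.
event = ET.SubElement(obj, "event")
ET.SubElement(event, "eventType").text = "repair"
ET.SubElement(event, "eventDateTime").text = "2007-10-01"
ET.SubElement(event, "eventDetail").text = "Mended tear along left margin"

xml = ET.tostring(obj, encoding="unicode")
print(xml)
```

So the repair that used to live in a conservator's paper file now travels with the object's record. Whether anyone has time to key it in is another matter.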

Friday, October 19, 2007

Archives Deathmatch

For years, there have been no real archival management systems built that are specifically for manuscript archivists. There are museum systems, and obviously library systems.
And in the last year, TWO archival management systems have been released: Archon and Archivists' Toolkit.

Since I am on the Archivists' Toolkit listserv, I see all the posts that get made about that particular product. AT comes from a group of universities (The University of California, San Diego, New York University, and the Five Colleges), and is supposed to revolutionize the way that archivists look at data. And I think that it does. Although to me, it looks an awful lot like a cataloging system with a really easy user interface. But I would never tell an archivist that. Archivists are notoriously touchy about being compared to catalogers (except for me, since I'm both). I've actually written about this before, in the guise of the "uniqueness" argument.

But, at any rate, archivists are trying stuff out, and it's not going so well, methinks. Lots of buggy issues with Archivists' Toolkit. Some people can never get their computer to let them install the software, some can never figure out how to publish the information that they put into the database, and I don't want to think about the people who may never be able to extract the information that they put in. But of course, Archivists' Toolkit is just one of the products; the other is Archon.

Archon was developed by the U of Illinois Urbana-Champaign. Archon is web-based, instead of software-based. I haven't played with it as much as AT, and since I'm not on the listserv I can't speak to its usefulness now that it's hit version 2.01. But from the descriptions---it just sounds so much better. Web-based? Automatic publication onto the web? Automatic search functionality? Yes, please! And they even appease archivists--"With Archon, there is no need to encode a finding aid, input a catalog record, or program a stylesheet." See that? NO NEED to catalog. Just like archivists like it.

We'll see which one wins. I actually put my money on AT, but only because everyone's touting it as a wonderful piece of wonderfullness that will revolutionize archival work. And they only say that because UC and NYU are involved. Not that I'm jaded or anything.

Tuesday, October 16, 2007

Cataloging in the Digital Age

I've never found much about how our cataloging has changed with the rise of electronic cataloging, and I've been looking around quite a bit for it in the past few months. Maybe it's so intrinsic to our everyday lives that no one bothers to write about it. Or maybe catalogers just don't have the time, and the researchers never catalog.
The point is this: we catalog a lot more fully now than we did before 1990. I never understood why this was. Did catalogers just not care? Was the rise of the internet and search engines creating a demand for better records? Or was it something else?
I decided that it was definitely something else, and that is the catalog card. Not user perceptions or demands, but the nature of the catalog card.
In the years of the paper card catalog, the main entry "card" was usually more like 2-3 cards. A good catalog record is complex, and has subjects, and author, and title and variant titles and numbers and extent and everything else we have in catalog records today. That can be a lot of cards. When LC made your cards for you and sent them to you, they also did the cataloging, in effect, because they were making the cards. However, when OCLC came online in the 1970s, suddenly it was possible to have a much longer record. The computer didn't care how long your record was. But--here's the catch--you still had a card catalog, and when you "cataloged" in OCLC, you still ordered your cards from LC.
I have ranted and raved over the seeming incompetence of the cataloging librarian who came before me, who was in the position for 30 years before retiring, and who, in her "wisdom", deleted content note fields. 505? Gone! Anything beyond information on indexes and bibliographies? Gone! This made me angry to no end, because of course nowadays we positively love for our books to have content fields; the users demand it!
And one day it dawned on me: she was deleting these fields because they ate up space--on a paper card. Catalogers, in the 1970s, 80s and some of the 90s, were still ordering paper from LC, and no one wants a 6-page card. So what do you get rid of? The fat. Anything in the 5xx fields was fair game. Getting a book's record down to the absolute minimum of cards was the goal. Unfortunately, the mentality (without the reason) kept on for a long time after it was unnecessary. And some of it was never fixed. Find a record in OCLC for a book published before 1980, that isn't really important enough to have been reprinted. The record is woefully inadequate by today's standards. Why? Because that was the most efficient record that could be created. Efficiency of SPACE overruled efficiency of SEARCH. Which also explains why searching an online catalog used to be so very annoying. A catalog full of small records? How can you find anything when you're limited to only 3 subject headings and no content field? What kind of keyword searching is THAT?
Today, of course, it's common for records to be very long--a table of contents field, a summary field, 6 or 7 or 10 subject heading fields. Space? Who cares about space? Space doesn't exist.
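The space-versus-search trade can be shown in a few lines. Here's a sketch with the same invented book as a card-era minimal record and as a modern full record, and a naive keyword search over both (the records and the "505" field name are illustrative, loosely after the MARC contents note):

```python
# Why "efficiency of space" hurt keyword search: the same book, twice.
# Records are invented; contents_505 loosely stands in for a MARC 505 note.
minimal = {
    "title": "Essays on rural life",
    "subjects": ["Country life"],
}

full = dict(minimal)
full["contents_505"] = ("Beekeeping for beginners -- The one-room school -- "
                        "Canning and preserving")
full["subjects"] = ["Country life", "Beekeeping", "Canning and preserving"]

def keyword_hit(record, term):
    """Naive keyword search: does the term appear anywhere in the record?"""
    text = " ".join(str(v) for v in record.values()).lower()
    return term.lower() in text

print(keyword_hit(minimal, "beekeeping"))  # the trimmed record misses it
print(keyword_hit(full, "beekeeping"))     # the contents note catches it
```

A patron looking for the beekeeping essay finds nothing in the trimmed record, not because the book lacks it, but because someone saved a card in 1978.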

Thursday, May 17, 2007

The Rise of the "Web 2.0"

I hate to start talking about the "generation gap", but sometimes it becomes increasingly obvious. I'm not an undergrad anymore, but I still use the tools that lots of undergrads use: blogs, facebook, text messaging, online document handlers, etc etc. I like technology, and I like knowing about the newest things to come out and how people are using them.
But many people insist on using software to do things that could be done so much more effortlessly through the web. They call it "web 2.0" and seem not to understand that it's the same thing as it always was: social interaction. People find the path of easiest communication and then use it until something even more useable comes along.
Why use Blackboard technology when you could be using blogs? Forget emails; use RSS feeds, or even pinging products to send out text messages. This stuff isn't hard; in fact, it's ridiculously easy, because people are thinking of things all the time. Why use a paper or email survey when you can just put a poll into your website that generates automatic results that users can see? Or why use Java-enabled chat rooms when you can use an embedded widget?
The opportunities that are out there, and are free, are amazing, and yet I feel like many people aren't seeing that these are awesome solutions. I blame Windows operating systems for this, because people believe firmly that software is designed to crash. It really isn't, you know. It's supposed to be designed NOT to do that, but Windows probably WAS designed to mess up a lot, so you'd continue to buy the new, "better" version (Java, anyone?).
At any rate, I think it's kind of sad that people feel like they're trapped in boxes of software and ownership, when the web is exploding with things that make ownership and licenses basically irrelevant.

Monday, April 30, 2007

The Problem with Blogs by Catalogers/Techies

I really like reading blogs by other catalogers and by technology people. They can be informative, and insightful, and give me links to more good things and more NEW things.
However, they also tend to use tech terms and cataloging terms to the extreme. This is related to the problem I have with FRBR (the "new" cataloging standard). If I am not an experienced cataloger, or a tech guru, some, nay, MANY of these blogs are completely unintelligible to me. Thus, they are useless! I think that one reason Lawrence Lessig has enjoyed such ridiculous success with his articles, and blogs, and books, is that he is accessible to almost everyone. He doesn't dive into minutiae; he keeps it general and smart and, most importantly, relevant to a broad audience. Jim over in your cataloging department who has "the coolest blog about cataloging!" cites people you've never heard of, terms you've never used, and programs you would never want.
Now, I'm a fairly experienced cataloger and organizer. I know what they're getting at, most of the time. But sometimes I feel like a kindergartener, and I don't think that is my fault.
I'm going to use the example of FRBR again. I use AACR2, and am comfortable with the terminology of AACR2. I'm even somewhat experienced in FRBR terminology, since my cataloging professor has been part of the movement to change our terminology to encompass all kinds of materials (books, web sites, journals, antelopes, etc).
But when I go to almost any website that talks about FRBR, I'm lost almost immediately. Who is that person they're touting? What article? What the hell do you mean by manifestation? How do I apply that very abstract term to my own concrete stack of books and cds and multi-volume treatises that are sitting on my desk? Does anyone know? Does anyone really care?
I know how whiny this all sounds...gee, what a freaking complainer, why not just find the definitions of terms for FRBR and get it figured out?

Well, I would, if I could find anything that acts as a crosswalk between FRBR and AACR2. As it stands, I merely tread water and do things all the old-fashioned way. Which for a 27-year-old database administrator/cataloger, is saying something.
"Wicked people never have time for reading. It's one of the reasons for their wickedness." —Lemony Snicket, The Penultimate Peril.