Open source software in libraries

This short essay, originally prepared for a presentation at the 2001 American Library Association Annual Conference in San Francisco, describes my personal experience with open source software and enumerates a number of ways open source software can be used in libraries to provide better library service. The essay does this in three ways. First, it reflects on the similarities of gift cultures, open source software development, and librarianship. Second, it describes the present evolution of email.cgi, an open source software application I support, and MyLibrary@NCState, a portal application designed for libraries. Third, it summarizes very recent comments from the OSS4Lib mailing list calling for more proactive activities in the library community.

Gift Cultures Revisited

I originally got this gig because I wrote an essay entitled "Gift Cultures, Librarianship, and Open Source Software Development". See: http://www.infomotions.com/musings/gift-cultures/

Since then the essay has been published in Information Technology and Libraries 19(2), March 2000, as a review of Eric S. Raymond's The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary.

In the article I drew four similarities between open source software development and librarianship:

Both open source software development and librarianship put a premium on open access. Both camps hope the shared information will be used to improve our place in the world. Just as Jefferson's informed public is a necessity for democracy, open source software is a necessity for the improvement of computer applications.
Human interactions are a necessary part of the mix. Open source development requires an understanding of the problem the computer application is trying to solve, and the maintainer must assimilate patches into the application. Librarians understand that information-seeking behavior is a human process. While databases and many "digital libraries" house information, these collections are really "data stores" until the data is given value and put to use, whereby the stores become libraries.
It has been stated that open source development will remove the necessity for programmers. Yet Raymond posits that no such thing will happen. If anything, there will be an increased need for programmers. Similarly, many librarians feared the advent of the Web because they believed their jobs would be in jeopardy. Ironically, librarianship is flowering under new rubrics such as information architects and knowledge managers.
Both institutions use peer-review, a process where "given enough eyeballs all bugs are shallow".

Since the article was published more than a year ago, I have been politely labeled an idealist and a utopian. This is not the first time I have been labeled such, and it probably won't be the last, but for the most part I still believe my original propositions. In general, librarianship is an honorable profession, and people are drawn to it because of a sense of purpose, a desire to provide service to the community. While many open source software developers create applications to solve local, real-world problems, they share their efforts because they desire to give back to the community. Do you remember the Internet saying from about ten years ago? "Give back to the 'Net." That saying lives on in open source software and is manifested in the principles of librarianship.

Blake Carver, editor of LIS News, modified Ranganathan's five laws of library science for open source software. I think the result makes a lot of sense:

Software is for use
Every computer its users
Every reader his source code
Save the time of the user
A system is a growing organism

Email.cgi and MyLibrary@NCState

Email.cgi

I have been giving away my software ever since Steve Cisler welcomed me into the Apple Library Of Tomorrow (ALOT) fold in the very late 1980s. Through my associations with Steve and ALOT I came to write a book about Macintosh-based HTTP servers as well as an AppleScript-based CGI script called email.cgi: http://www.infomotions.com/email-cgi/ .

This simple little script was originally developed for two purposes. First and foremost, it was intended to demonstrate how to write an AppleScript CGI application. Second, it was intended to fill a gap in the Web browsers of the time, namely MacWeb's inability to support mailto URLs. Since then the script has evolved into an application that takes the contents of an HTML form, formats them, and sends the result to one or more email addresses. It works very much like a C program called cgiemail. As TCP utilities have evolved over the years, so has email.cgi. To this day I still get requests for technical support from all over the world, and almost invariably the messages start out something like this: "Thank you so very much for email.cgi. It is a wonderful program, but..." That's okay. The program works, and it has helped many people in many ways.
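To make the form-to-email idea concrete, here is a minimal sketch in Python; it is not email.cgi's actual code, which is AppleScript. It assumes the form arrives as a URL-encoded query string, writes one "name: value" line per field, and addresses the result to a list of recipients. The function name form_to_email, the default sender address, and the subject line are hypothetical, and the sketch only builds the message; it does not send it.

```python
from email.message import EmailMessage
from urllib.parse import parse_qsl


def form_to_email(query_string, recipients, sender="webmaster@example.org",
                  subject="Form submission"):
    """Format URL-encoded HTML form data as a plain-text email message.

    Hypothetical helper for illustration; it builds the message but does not send it.
    """
    # parse_qsl decodes "name=value&..." pairs (including "+" and %XX escapes)
    # and keeps them in the order the browser submitted them.
    fields = parse_qsl(query_string, keep_blank_values=True)

    # One "name: value" line per form field.
    body = "\n".join(f"{name}: {value}" for name, value in fields)

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)  # one or more recipients
    msg["Subject"] = subject
    msg.set_content(body + "\n")
    return msg


if __name__ == "__main__":
    data = "name=Ann+Reader&email=ann%40example.org&comments=Great+site%21"
    message = form_to_email(data, ["librarian@example.org", "webmaster@example.org"])
    print(message)
    # Delivery is left out; with a local mail server one could use
    # smtplib.SMTP("localhost").send_message(message).
```

Keeping formatting separate from delivery makes the formatting step easy to test without a mail server.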

MyLibrary@NCState

MyLibrary@NCState is a more formal open source project supported by the NCSU Libraries, where I am employed. This portal application grew out of a set of focus group interviews in which faculty at NC State University said they were suffering from "information overload." In late 1997, when these interviews were taking place, services like My Yahoo, My Excite, My Netscape, and My DejaNews were making their initial appearance. In the Digital Library Initiatives Department where I work, Keith Morgan, Doris Sigl, and I thought a similar application based on library content (bibliographic databases, electronic journals, and Internet resources) organized by subjects (disciplines) might be a solution to the information overload problem. By prescribing sets of resources to specific groups of people, we (the Libraries) could offer focused content as well as provide access to the complete world of available information. For more information about MyLibrary@NCState see: http://dewey.library.nd.edu/mylibrary/ .

Since I relinquished my copyright to the University and the software was distributed under the GNU General Public License, it has been downloaded about 350 times, mostly by academic libraries. The number of active developers is unknown, but many institutions that have downloaded the software have used it as a model for their own purposes. In most cases these institutions have taken the system's database structure and experimented with various interfaces and alternative services. Such institutions include, but are not limited to, the University of Michigan, the California Digital Library, Wheaton College, Los Alamos Laboratory, Lund University (Sweden), the University of Cattaneo (Italy), and the University of New Brunswick. Numerous presentations about MyLibrary@NCState have been given at venues such as Harvard University, Oxford University, the Alberta Library, the Canadian Library Association, the ACRL Annual Meeting, and ASIS.

As I see it, three impediments restrict greater success of the project: system I/O, database restructuring, and technical expertise. MyLibrary@NCState is essentially a database application with a Web front-end. In order to distribute content, the data must first be saved in the database. The question then is, "How will the data be entered?" Right now it must be done by content providers (librarians), but the effort is tedious, and as the number of bibliographic databases and electronic journals grows, so does the tedium. Lately I have been experimenting with the use of RDF as an import/export mechanism. By relying on an XML standard, the system will be able to divorce itself from any particular database application such as an OPAC, and it will be better able to share its data with other portal applications such as uPortal, My Netscape, or O'Reilly's Meerkat through something like RSS. Yet the problem still remains, "Who is going to do the work?" This is a staffing issue, not necessarily a technical one.
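As an illustration of the RSS route, the following sketch exports a few resource records as an RSS 1.0 (RDF Site Summary) document using Python's standard library. It is not MyLibrary@NCState's own export code, which would be Perl; the function name resources_to_rss and the record fields url, title, and description are assumptions made for this example. It needs Python 3.8 or later for the default_namespace argument.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF_NS)


def rdf(tag):
    """Qualified name in the RDF namespace, in ElementTree's {uri}tag form."""
    return f"{{{RDF_NS}}}{tag}"


def rss(tag):
    """Qualified name in the RSS 1.0 namespace."""
    return f"{{{RSS_NS}}}{tag}"


def resources_to_rss(channel_url, channel_title, resources):
    """Serialize resource records (dicts with url, title, optional description) as RSS 1.0."""
    root = ET.Element(rdf("RDF"))

    # The channel describes the feed and lists its items in an rdf:Seq.
    channel = ET.SubElement(root, rss("channel"), {rdf("about"): channel_url})
    ET.SubElement(channel, rss("title")).text = channel_title
    ET.SubElement(channel, rss("link")).text = channel_url
    ET.SubElement(channel, rss("description")).text = f"Resources from {channel_title}"
    seq = ET.SubElement(ET.SubElement(channel, rss("items")), rdf("Seq"))
    for res in resources:
        ET.SubElement(seq, rdf("li"), {rdf("resource"): res["url"]})

    # Each item repeats its URL in rdf:about so it matches the Seq entry.
    for res in resources:
        item = ET.SubElement(root, rss("item"), {rdf("about"): res["url"]})
        ET.SubElement(item, rss("title")).text = res["title"]
        ET.SubElement(item, rss("link")).text = res["url"]
        if res.get("description"):
            ET.SubElement(item, rss("description")).text = res["description"]

    # default_namespace writes RSS elements without a prefix (Python 3.8+).
    return ET.tostring(root, encoding="unicode", default_namespace=RSS_NS)


if __name__ == "__main__":
    records = [
        {"url": "http://example.org/resources/1", "title": "Sample bibliographic database",
         "description": "A hypothetical record"},
        {"url": "http://example.org/resources/2", "title": "Sample electronic journal"},
    ]
    print(resources_to_rss("http://example.org/mylibrary/rss", "MyLibrary", records))
```

Because the output is a standard format rather than a dump of one database's tables, any RSS-aware portal can consume it without knowing how the records are stored.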

To meet the needs of a wider audience, the underlying database needs to be restructured. For example, the database contains tables for bibliographic databases, electronic journals, and "reference shelf" items. Each of the items in these tables is classified using a set of controlled vocabulary terms called disciplines. Many institutions want to create alternative data types such as images, associations, or Internet resources. Presently, to accomplish this task, oodles of code must be duplicated, bloating the underlying Perl module. Instead, a new table needs to be created to contain a new controlled vocabulary called "formats". Once this table is created, all the information resources could be collapsed into a single table and classified with the new controlled vocabulary as well as the disciplines. Furthermore, a third controlled vocabulary, intended audience, could be created so the resources could be classified even further. Given such a structure, the system could be more exact when it comes to initially prescribing resources and allowing users to customize their selections. Again, the real problem here is not necessarily technical but intellectual. Librarians make judgments about resources in terms of their aboutness, intended audience, and format all the time, but rarely on such a large-scale, systematic basis. Our present cataloging methods do not accommodate this sort of analysis. How will such analysis get institutionalized in our libraries?
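The proposed restructuring can be sketched as a relational schema: one resources table, a formats vocabulary referenced by each resource, and many-to-many links to the disciplines and intended-audience vocabularies. The sketch below uses SQLite through Python's sqlite3 module so that it runs on its own; MyLibrary@NCState itself uses MySQL, and every table, column, and function name here (including prescribe and build_demo_db) is hypothetical, as are the sample records.

```python
import sqlite3

# Hypothetical schema: one resources table plus three controlled vocabularies.
SCHEMA = """
CREATE TABLE disciplines (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE formats     (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE audiences   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);

-- A single table for every kind of resource; format_id replaces the
-- separate tables for databases, journals, and reference shelf items.
CREATE TABLE resources (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    format_id INTEGER NOT NULL REFERENCES formats(id)
);

-- Many-to-many links to the discipline and audience vocabularies.
CREATE TABLE resource_disciplines (
    resource_id   INTEGER NOT NULL REFERENCES resources(id),
    discipline_id INTEGER NOT NULL REFERENCES disciplines(id),
    PRIMARY KEY (resource_id, discipline_id)
);
CREATE TABLE resource_audiences (
    resource_id INTEGER NOT NULL REFERENCES resources(id),
    audience_id INTEGER NOT NULL REFERENCES audiences(id),
    PRIMARY KEY (resource_id, audience_id)
);
"""


def prescribe(conn, discipline, audience):
    """Return (title, format) for resources tagged with both a discipline and an audience."""
    sql = """
        SELECT r.title, f.name
        FROM resources r
        JOIN formats f               ON f.id = r.format_id
        JOIN resource_disciplines rd ON rd.resource_id = r.id
        JOIN disciplines d           ON d.id = rd.discipline_id
        JOIN resource_audiences ra   ON ra.resource_id = r.id
        JOIN audiences a             ON a.id = ra.audience_id
        WHERE d.name = ? AND a.name = ?
        ORDER BY r.title
    """
    return conn.execute(sql, (discipline, audience)).fetchall()


def build_demo_db():
    """In-memory database populated with a few made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO formats VALUES (?, ?)",
                     [(1, "bibliographic database"), (2, "electronic journal"), (3, "image")])
    conn.execute("INSERT INTO disciplines VALUES (1, 'Chemistry')")
    conn.executemany("INSERT INTO audiences VALUES (?, ?)",
                     [(1, "faculty"), (2, "undergraduates")])
    conn.executemany("INSERT INTO resources VALUES (?, ?, ?)",
                     [(1, "Chemistry abstracts database", 1),
                      (2, "Organic chemistry e-journal", 2),
                      (3, "Molecule image collection", 3)])
    # Every sample resource belongs to discipline 1 (Chemistry).
    conn.executemany("INSERT INTO resource_disciplines VALUES (?, 1)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO resource_audiences VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1), (3, 2)])
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(prescribe(db, "Chemistry", "faculty"))
```

Under this design, supporting a new kind of resource such as images means adding one row to formats rather than another table and another copy of the code that manages it.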

The comparatively low level of technical expertise in libraries is also a barrier to wider acceptance of the system. MyLibrary@NCState runs. It doesn't crash or hang. It does not output garbage data. It works as advertised, but installing the program initially requires technical expertise beyond the scope of most libraries. It requires the installation of a database program. MySQL is the current favorite, but all sorts of things can go wrong with a MySQL installation. Similarly, MyLibrary@NCState is written in Perl. Installing Perl from source usually requires answering a host of questions about your computer's environment, and in nine or ten years of compiling Perl I still don't know what some of those questions mean, so I simply go with the defaults. Then there are all the Perl modules MyLibrary@NCState requires. They are a real pain to install, and unless you have done these sorts of installs before, the process can be quite overwhelming. In short, installing MyLibrary@NCState is not like running a Microsoft wizard; you have to know a lot about your host computer before you can even get it up and running, and most libraries do not employ enough people with this sort of expertise to make the process comfortable. Consequently, actual development of MyLibrary@NCState code is minimal beyond the NCSU Libraries.

State of Open Source Software in Libraries

This brings me to the state of open source software in libraries today. Daniel Chudnov has been the profession's evangelist, the original author of jake (jointly administered knowledge environment), and the maintainer of the www.oss4lib.org domain as well as its mailing list. Dan has done a lot to raise awareness of open source software in libraries. To that end he and Gillian Mayman maintain a list of open source software projects. These projects include lots o' software designed for libraries, such as, but not limited to:

document delivery applications (Prospero by Eric Schnell)
Z39.50 clients and servers (Yaz and SimpleServer by Sebastian Hammer, Zeta Perl by Rocco Carbone, and JZKit by Knowledge Integration, Ltd.)
systems to manage collections (Catalog by Senga, Greenstone by Ian H. Witten, et al., ROADS funded by JISC via the eLib Programme, OSCR by Wally Grotophorst)
MARC record readers and writers (MARC.pm by Chuck Bearden, et al., m[n]m by Robert McDonald, et al., and XMLMARC by Lane Medical Library)
integrated library systems (Avanti by Peter Schlumpf, Koha by Rosalie Blake and Rachel Hamilton-Williams, and OSDLSP by Jeremy Frumkin and Art Rhyno)
systems to read and write bibliographies (bib2html by Stephane Galland, bp by Dana Jacobsen, gBib by Alejandro Sierra and Felipe Bergo, Pybliographer by Frederic Gobry)

Given the current networked environment, the affinity of open source software development to librarianship, and the sorts of projects enumerated above, what can the library profession do to best take advantage of the current milieu? I posed this question to the OSS4Lib mailing list a few weeks ago, and it generated a lively discussion. A number of themes presented themselves:

national leadership
mainstreaming, workshops, and training
usability and packaging
economic viability
redefining the ILS
open source data

National leadership

One of the strongest themes was the need for a national leader. David Dorman first articulated it as the OSLN (Open Source Library Network). Karen Coyle and Aaron Trehub elaborated on this idea by suggesting that organizations such as ALA/LITA, the DLF, OCLC, or RLG help fund and facilitate methods for providing credibility, publicity, stability, and coordination to library-based open source software projects.

Mainstreaming, workshops, and training

Along these same lines, Carol Erkens, Rachel Cheng, and Peter Schlumpf expressed a desire for the mainstreaming of open source software. This mainstreaming process would include presentations, workshops, and training sessions on local, regional, and national levels. These activities would describe and demonstrate open source software for libraries. They would enumerate the advantages and disadvantages of open source software. They would provide extensive instructions on the staffing, installation, and maintenance issues of open source software.

Usability and packaging

As Blake Carver stated, open source software in its present state is much like microcomputing in the 1970s. It is very much a build-it-yourself enterprise; the systems are not very usable when it comes to installation. This point was echoed by Cheng, who recently helped facilitate a NERCOMP workshop on open source software. Peter Schlumpf points to the need for easier installation methods so maintainers of a system can focus on managing content and not software. Using open source software should not be like owning an automobile in the 1920s; I shouldn't necessarily need to know how to fix it in order to make it go.

Economic viability

Open source software needs to be demonstrated as an economically viable method of supporting software and systems, as Eric Schnell and David Dorman pointed out. Libraries have spent a lot of time, effort, and money on resource sharing. Why not pool these same resources to create software satisfying our professional needs? Open source software is not like the "homegrown" systems of the past. Spaghetti code and GOTO statements should be a thing of the past. More importantly, a globally networked computer environment provides a means of sharing expertise in a manner not feasible twenty-five years ago. We need to demonstrate to administrators and funding sources that money spent developing software empowers our collective whole. It is an investment in personnel and infrastructure. Open source software is not a fad, yet it will not necessarily replace commercial software. On the other hand, open source software offers opportunities not necessarily available from the commercial sector.

Redefining the ILS

There are many open source library applications available today. Each satisfies a particular need. Maybe these individual applications can be brought together into a collective, synergistic whole, as described by Jeremy Frumkin, and we could redefine the "integrated library system" (ILS). Presently our ILSs manage things like books pretty well. With the addition of 856 fields to MARC records, they are beginning to assist in the management of networked resources as well, but libraries are more than books and networked resources. Libraries are about services too: reserves, reading lists, bibliographies, readers' advisory services of many types, current awareness, reference, etc. Maybe the existing open source software can be glued together to form something more holistic, resulting in a sum greater than its parts. This is also an opportunity, as described by Schnell, for vendors to step in and provide such integration, including installation, documentation, and training.

Open source data

Open source software relates to data as well as systems, as described by Krichel. The globally networked computer environment allows us to share data as well as software. Why not selectively feed URLs to Internet spiders to create our own subject-specific indexes? Why not institutionalize services like the Open Directory Project, or build on the strength of INFOMINE to share records in a manner similar to OCLC's?

The Week It All Came Together

I am always excited about libraries and librarianship. The recent discussion on the OSS4Lib mailing list exemplifies some of the opportunities for our profession. As Ben Ostrowsky put it, "Years from now, this will be known as The Week It All Came Together." Let's hope so. Let's hope the momentum can be sustained. Let's build on our strengths, continue to pool our resources, and spend our time, money, and energy on ways to improve our situation instead of bemoaning the perceived limitations. As Gordon s said, "These are social problems, rather than technical." Let's explore our alternatives.

Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Source: Prepared for a presentation at the 2001 American Library Association Annual Conference in San Francisco.
Date created: 2001-06-08
Date updated: 2004-12-06
Subject(s): open source software; presentations;
URL: http://infomotions.com/musings/ossnlibraries/