Cataloging the Web

Making the WWW More Accessible

Recent Postings

Subject: Re: Cataloging the Web
Date: Sat, 03 May 1997 07:38:30 -0700
From: "Eyler Coates, Sr."
CC: Michelle Martin Robertson
Michelle Martin Robertson wrote:

> What concerns me most about the keyword approach is that, the way the
> web is growing, simply having "subjects" and even "forms" as types of
> keywords to identify web sites will become grossly insufficient.  If
> web sites ever begin a trend towards higher degrees of specificity, I
> think we'll end up with a similar problem to the current one.

In fact, isn't that what we have now?  The tendency seems to be toward
websites with very narrowly defined content.  Those covering very broad
subjects ("Philosophy") seem more often to be collections of URLs for
other sites.

> Take
> this example:  We find a web site that addresses "The Effect of Man on
> the Environment."  The webmaster has assigned the subjects "man" and
> "environment."  But how can we differentiate between this website and
> the ones on "The Effects of the Environment on Man?"

For one thing, there is the title itself.  But yours is an excellent
point: although a perusal of entries that a search engine might pull up
would allow a person to pick out what was wanted by examining titles and
abstracts, the idea of an improved system is to narrow the search
sufficiently so that there is not so much irrelevant material to sift
through.  That's what we have now.  Indeed, your point focuses on the
very heart of the problem of improving access.

> It seems to me
> that it would be best to have a variety of possible types of "subject"
> entries.  An "effect of" subject-tag would be very useful in this
> situation.  Of course, most sites wouldn't use it, but to some it
> would be essential for proper identification.   The result for the
> first site would be something like <subject = environment> and <effect
> of = man>.

As this whole discussion thread develops, it is becoming apparent
that there will need to be two approaches to "cataloging the web," which
are roughly analogous to Books in Print on one hand, and a standard
library catalog on the other.  This means a rather "rough" access to the
entire World Wide Web for persons wanting the broadest possible approach
to *all* the material on the Web, and a more focused and selective
approach to those "materials that have (at some level) been evaluated as
useful for our users or germane to our missions," in the words of Steve
Shadle.  Cataloging for the former will mean something that will be
confusing, perhaps, without closer examination of the Website itself;
cataloging for library purposes will be something with greater
specificity, having all the elements you identify clearly spelled out.
We might hope that both types of cataloging will be complementary.

One important difference not to be overlooked in the analogy between BIP
and libraries on one hand, and the WWW and its Search Engines on the
other, is that far more ordinary citizens will be using and dependent on
Search Engines than on BIP.  Adequate access, therefore, becomes a vital

In considering cataloging for the entire WWW, I think the important thing
for us to do is to put ourselves in the mind of both the Surfer and the
Webmaster.  It is not likely that they will look up a proper term in a
list, though it is more likely in the case of the Webmaster.  In truth,
the topics you use as an example are broad ones.  But assuming that the
Webmaster has a page that falls within, say,  "The Effect of Man on the
Environment," I believe we are going to need to be dependent on however
the Webmaster describes the Website and just face philosophically the
fact that the descriptive tags will be, ultimately, inadequate.

> A "form" example (off the top of my head):  You come across a site
> dedicated to Marie Antoinette's fictional appearances in literature.
> How do you differentiate between this site and one that discusses her
> as a historical character?   According to Library of Congress
> practice, her name would be followed by "in literature."  But if you
> simply provide a subject for her name and for "literature" on the web,
> that sounds like it might be a site that presents literature written
> by her, which is very misleading.
> With the "form" heading/tag the last example would be solved.  Sites
> that *contain* literature could have the tag <form = literature>.
> Sites that are *about* literature would have tag <subject =
> literature>.   Sites that focus on both would have both.  If there is
> literature by Marie Antoinette at the site, there could be a tag
> <author = Marie Antoinette>.  This would need to be distinguished from
> the tag for the author of the web page, though... <grin>

I believe your examples ably illustrate the need for expert cataloging
for the best, library-standard results.  They also demonstrate that this
level of cataloging will be beyond what we could expect from the
vast majority of Webmasters.  Hence, the necessity of the two
complementary approaches.
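
A minimal sketch of the typed tags Michelle proposes, assuming a
hypothetical record structure in which each site carries separate
"subject", "form", and "author" tags. The names SiteRecord and
find_sites, and the sample sites, are illustrative assumptions only, not
part of any proposal made in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class SiteRecord:
    """One Web site described by typed tags (hypothetical structure)."""
    title: str
    # Maps a tag type such as "subject", "form", or "author" to its values.
    tags: dict[str, set[str]] = field(default_factory=dict)


def find_sites(records, **wanted):
    """Return titles of records carrying every requested tag type/value pair."""
    matches = []
    for record in records:
        if all(value.lower() in record.tags.get(tag_type, set())
               for tag_type, value in wanted.items()):
            matches.append(record.title)
    return matches


# Hypothetical sample sites, invented for illustration.
SITES = [
    SiteRecord("An Archive of Literature", {"form": {"literature"}}),
    SiteRecord("Essays on Literature", {"subject": {"literature"}}),
    SiteRecord("Marie Antoinette in Fiction",
               {"subject": {"marie antoinette", "literature"}}),
    SiteRecord("Letters of Marie Antoinette",
               {"form": {"literature"}, "author": {"marie antoinette"}}),
]

# Sites that *contain* literature versus sites that are *about* it.
print(find_sites(SITES, form="literature"))
print(find_sites(SITES, subject="literature"))
print(find_sites(SITES, author="marie antoinette"))
```

With a single untyped "literature" keyword, all four sites would come
back for the same search. Keeping the tag type next to the value is what
lets a searcher ask for one kind of site without wading through the other.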

> I think that keywords are definitely the way to go on the web, for
> simplicity's sake.  But I am sure that users would benefit from
> greater specificity within the keyword framework.  If the subject tags
> are well-documented, webmasters should have plenty of  incentives to
> use them.

Eyler Coates

Please indicate if your response may be included in a redaction,
omitting repetitions and irrelevant matter, at:

                     Cataloging the Web
                Making the WWW More Accessible


Subject: Re: Cataloging the Web
Date: Sat, 3 May 1997 01:29:46 -0400
From: "Eyler Coates, Sr."
Newsgroups: bit.listserv.autocat

Please note that Mr. Coates, who originated this thread, wishes to have
a comprehensive compilation, with slight editing, of the discussion, to
be available on his web site.  If you participate in the discussion and
have no objection to this you need say nothing.  If you DO object,
please participate if you wish but CLEARLY state in EACH item you send
on this topic that your material is NOT to be included in his
compilation.  He may assume tacit consent if you do not explicitly deny
it.  Mr. Coates assures us this is strictly nonprofit.  Your tacit
consent for him to include your material on his website does not
constitute consent for any other use.  If you wish to know what he does
in the way of editing please contact him directly, or take a look at the
website (see the item at the end of his message, below).  Please keep
this in mind for as long as this discussion continues.  If needed,
reminders of this will be posted periodically.

Douglas

*********************************************************************

Steve Shadle wrote:

> But currently we only provide access (through library and other
> bibliographic information services) to a portion of available
> materials, but these are materials that have (at some level) been
> evaluated as useful for our users or germane to our missions.
> (Contrary to popular belief, the Library of Congress does *not*
> contain every book ever published ;-)
>
> I wholeheartedly agree that user-supplied metadata can bring order to
> the Internet universe.  But if the Web is growing as exponentially as
> presumed, then it's even *more* important that we lay the groundwork
> for the ability to enable a person to:
>
> * find a work by a given author (Where's *my* John Smith?  Who are
>   Carr/Holt/Kellow/Plaidy/Tate?  Which one is Bill Clinton: William E.,
>   William J. or William R.?)
> * identify the intellectual work (vs. the manifestation) (Hamlet, the
>   Apocrypha or Beethoven's Eroica by any other name)
> * provide information about the bibliographic relations between works
>   (editions, revisions)
> * identify the genre/format of the work (web site vs. text document;
>   review article vs. research report).
>
> Our current catalogs attempt these functions with some degree of
> success.  For those items which are significant to our users,
> shouldn't we provide this same level of identification/control?  I'm
> not saying we need to do it in the same way.  I think the Dublin Core
> comes closer to providing the elements necessary for us to provide
> this level of description than Eyler's suggested metadata structure
> (although his elements can serve as a basis for basic description; the
> Dublin Core is expansible after all).
>
> I applaud Eyler's suggestions for metadata subject authority and think
> the use of user-supplied data for the Web universe is better than
> we've ever been able to do in print.  But I can't help but think that
> there's more to it than just subjects and that we can still provide a
> selection (not everything that's submitted to Yahoo gets in) and
> identification role as we do with print resources.
>
> --Steve

Steve's response illustrates an interesting difference of perspective.
Although a former librarian (having served time as a cataloger) and now
retired, I am right now above all a Webmaster and Internet user.  The
Web exists in my view as an entity unto itself, further apart from
libraries than are books and publishing.  In another sense, it is like a
separate genre from books, such as periodicals, music recordings, and
film.  Including Web resources in a library main catalog strikes me,
offhand, as strange; something like including journal articles.
Cataloging the Web for itself might be considered similar to the
"cataloging" in Books in Print.

Would libraries include a complete catalog of every book in print if,
through some mechanism, every one of them were available on a TV screen
for library users?  Probably not.  In fact, maybe that is a good
analogy.  Would a library catalog then be a selected guide?  Would it
exist alongside this other, perhaps less detailed, catalog that provides
access to *everything*, just as it now exists alongside BIP and CBI?  In
that case, it is perfectly reasonable to assume that a library might
selectively include certain unique Web resources in its catalog, just as
Steve suggests, since they are so much the equivalent of a book.
Indeed, I consider my main website, Thomas Jefferson on Politics &
Government (forgive the plug ;-), as something that probably should
exist in book form.  There is no print book available that has assembled
together as extensive a collection of Jefferson's ideas on politics and
government, in his own words, as that website.  Isn't it reasonable that
libraries might want to include that type of resource in their catalog?

Therefore, what we end up with is a dual set of needs.  There is a need
for an adequate access tool to the entire Web, just as there is a need
for periodical indexes or BIP.  There is also a need for an amplified
access tool to certain Web resources that would be appropriate for
libraries and their computerized catalogs.  Ideally, the latter should
build upon the former, rather than being an entirely separate approach.

The analogy with BIP breaks down here, however, because the Web catalog
is not simply a list.  The resources are readily at hand.  Researchers
will want to access the whole, big, messy thing, and they will want an
access tool that is the best that is reasonably available, given the
level of technology.  Moreover, the whole computer thing introduces new
elements: besides new ways of storing the data, there are new ways of
accessing the data, i.e., of forming searches.

If libraries are tied to the "pre-coordinate" subject heading system,
however, this might pose real problems.  If anything has emerged from
the present discussion, it is that LCSH and DDC are not right for the
*entire* Web.  The access tool suitable for the *select* Web will not be
practicable for the *whole* Web, just as it would be impracticable to
include complete cataloging data for every item in BIP.

The Web promises to be a giant resource in the future, deserving of a
cataloging system that would be adequate for its nature.  That is one
job.  Libraries may need to provide selective, detailed access to Web
resources.  That is another job.  But the two ought to be related in
some rational way.

Eyler Coates

Subject: Re: Cataloging the Web
Date: Thu, 01 May 1997 19:55:31 GMT
From: Michelle Martin Robertson
Organization: University of Tennessee
Newsgroups: bit.listserv.autocat
References: 3368

"Eyler Coates, Sr." wrote:

[snip]

>> Sorry, I was suggesting that we need to categorise Web pages by the
>> form the information is in (eg "Images") as well as the subject of
>> the information (eg "Hale-Bopp").  Systems like LCSH mix the two
>> functions but if you're using postcoordinate indexing you really
>> need to separate them.

>Is it not true, however, that the keywords (which can also be phrases)
>need not be in a separate META category?  As long as the terms
>designating the form, subject, or whatever, are distinctive indicators,
>could they not be intermixed?  Thus, a Webmaster who uses the keywords
>"Marie Antoinette, biography" would, with two keywords, establish his
>work in three subject categories.

I don't see how keyword language can be controlled to that extent
successfully, without asking Webmasters to research their subject
terminology and essentially do the same work catalogers do when
assigning subject headings.  It is very easy to add more terms as "see"
or "see also" references; it will be impossible to divide up multiple
meanings of the same word *after* it has been used by Webmasters to
define their sites.  Since the proposal seems contingent on their being
able to use the vocabulary they choose, I think an attempt to separate
terms after their assignment to a site would be ill-advised.

The webmaster for a site that consists of an extensive collection of
American literature will want to use the "literature" keyword.  The
webmaster for a site that consists of information about American
literature (historical, critical, whatever) will want to use the
"literature" keyword as well.  People who only want to find one of these
two things will be enormously frustrated to have to wade through a large
selection of both kinds of sites.

If you control the vocabulary to the point that people have to use
unnatural words to get what they want (especially on the web), they will
be discouraged from using that search strategy.  The abovementioned
metaliterature site could include "literary criticism" "literary
history" "book reviews" "literature bibliography" etc., but this is
putting an undue demand both on the site to provide all these specific
terms that differentiate its content from literature itself, and on the
searcher to come up with all the terms he needs.

The literature/criticism problem is among a host of similar conflicts.
We just don't use different words to distinguish between form and
content in English, so any attempt to force the vocabulary to do so will
be stilted and confuse the user.

A tongue-in-cheek proposal: we could just add "meta" to all the terms to
describe "aboutness."  "Metaliterature" for criticism and
"metametaliterature" for a bibliography of literary criticism... and
dare I mention "metametametaliterature" for a catalog record of the
bibliography of critical works... these are just not attractive to look
at, and they don't really mean anything.

Unless there is another proposal that would solve this problem, having
different "META" tags for content and form is essential if any sense is
to be made out of the keywords I've mentioned.

- Michelle

---------------------------------------------------------
Michelle Martin Robertson
University of Tennessee, Knoxville Libraries

Subject: Re: Cataloging the Web
Date: Thu, 01 May 1997 10:31:51 -0700
From: "Eyler Coates, Sr."
Newsgroups: bit.listserv.autocat,schl.sig.lmnet

Robert Cunnew wrote:

> In article, "Eyler Coates, Sr." writes
>
> >> Given the undesirability of precoordination in subject indexing, I
> >> wonder whether there is a need for (5) Forms, taken from a short
> >> list of appropriate terms, eg information service, promotional
> >> material, images, sounds, software (form not subject, ie
> >> downloadable), news, directory, discussion forum ...
> >
> >I regrettably must confess that I am unfamiliar with your frame of
> >reference here and am uncertain of your meaning.  I had suggested
> >five subject headings as a maximum, though if the Keyword option were
> >selected, it might seem that more than five would be required.  Are
> >we talking about the same thing?
>
> Sorry, I was suggesting that we need to categorise Web pages by the
> form the information is in (eg "Images") as well as the subject of the
> information (eg "Hale-Bopp").  Systems like LCSH mix the two functions
> but if you're using postcoordinate indexing you really need to
> separate them.

Is it not true, however, that the keywords (which can also be phrases)
need not be in a separate META category?  As long as the terms
designating the form, subject, or whatever, are distinctive indicators,
could they not be intermixed?  Thus, a Webmaster who uses the keywords
"Marie Antoinette, biography" would, with two keywords, establish his
work in three subject categories.

Subject: Re: Cataloging the Web
Date: Wed, 30 Apr 1997 21:55:08 -0400
From: Kate Bowers
Newsgroups: bit.listserv.autocat

Regarding cataloging the web:

> A standard that is implemented by non-experts will have different
> results, perhaps better results than the current chaos that is the
> Web, but very different from library catalogs.

Let us remember that we are experts.  We have become very skilled at
deriving meaningful data from efficient inspection of an item.  The
records we create adhere not only to rigid standards of encoding, but
also to intelligent rules for !-content-!.  International standards for
library cataloging have been in place since 1908.  Professional ethics
require that catalogers remain impartial.  Catalogers follow and create
standard lists.  We contribute to and use authority files to ensure that
the millions of John Williams in the world who create works have
separate name entries.  This is only scratching the surface of the
enterprise of cataloging.

In a situation where non-expert persons create records, they do many
things unlike library cataloging.  Lack of standard authority files will
mean that common names will remain undifferentiated.  Much as we lament
the difficulty of remaining unbiased, we at least recognize the problem,
and see more ideas and works from more corners of the world than most
people.  Lack of this broad overview of content will mean that basic
assumptions, let alone biases, will be missed.  For instance, a hate
group with a Web site is unlikely to note "hate" or "racism" in its
keywords; they will not see themselves this way.

In a more innocuous example, and one that really exists in paper,
picture a work with the word "Constitution" in large letters at its top.
Which constitution is it?  Is it the US constitution?  The Malawi
constitution?  A translation of a defunct Belorussian constitution?  Or
the Constitution of the Ladies' Auxiliary of the Youngstown, Ohio
Knights of Columbus?  The work itself may not tell one enough
information about it.  Its creators know who they are, and are not aware
that their identity and that of the document in their hands are linked,
but the outside world cannot know this.

This is only scratching the surface of the enterprise of organizing the
web.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
Kate Bowers, Assistant Curator
Bibliographic Control and Special Media
Harvard University Archives
Cambridge, MA 02138
voice: (617) 495-2461
fax: (617) 495-8011
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%

Subject: Re: Cataloging the Web
Date: Mon, 28 Apr 1997 12:18:47 -0400
From: "David P. Miller"
Newsgroups: bit.listserv.autocat

Steve Shadle asks: "One of the specific points I would like feedback on
is whether there are institutions out there that feel the need to assign
classification *solely* for subject access (i.e., for resources that
don't sit on a shelf).  Do catalog users *use* classification as a
subject retrieval mechanism?"

Yes, if we remember that catalog users include librarians.  This is
unfortunately, and too often, a source of inside jokes and rib-poking,
as if there were something inappropriate about librarians making their
own tools easier for themselves to use.  There isn't.  Many's the time
when, using our simple character-based INNOPAC system, I've been able to
dig up a number of additional resources for students by redirecting a
subject or keyword search into the class. number index.

Actually, "indexes".  We use both DDC and LCC for different parts of the
collection, and the respective indexes contain *any* classification
number found in the record for an item, not only those used for
shelving.  In addition, I let significantly different class numbers from
the same system remain in a record, unless (and this is rare) they're
utterly inappropriate.  So, we have really divorced classification from
its confinement to shelving.

Now, the INNOPAC command says something like "See Items Nearby On
Shelf."  This obviously isn't always true, given what I've outlined
above -- but so far the misnomer hasn't upset anyone.

So -- assign classification numbers to intangible resources.  Yes, as
many as you like.  The shelving issue is a red herring.

Yours,
David Miller
Levin Library, Curry College
Milton, MA
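
A minimal sketch of the kind of class number index David describes, in
which every classification number in a record is indexed, whether or not
the item is shelved under it.  The records, field names, and class
numbers below are hypothetical placeholders; nothing here reflects how
INNOPAC actually stores or indexes its data.

```python
from collections import defaultdict

# Hypothetical records. The class numbers are illustrative placeholders,
# not verified DDC or LCC assignments.
RECORDS = [
    {"title": "Comets for Beginners",
     "shelved_at": "523.6", "class_numbers": ["523.6", "520"]},
    {"title": "Observing the Night Sky",
     "shelved_at": "520", "class_numbers": ["520"]},
    {"title": "Hale-Bopp Image Archive (Web site)",
     "shelved_at": None, "class_numbers": ["523.6"]},
]


def build_class_index(records):
    """Index every class number in a record, not only the shelving number."""
    index = defaultdict(list)
    for record in records:
        for number in record["class_numbers"]:
            index[number].append(record["title"])
    return dict(index)


CLASS_INDEX = build_class_index(RECORDS)

# The Web site has no shelf location but is still retrievable by class.
print(CLASS_INDEX["523.6"])
print(CLASS_INDEX["520"])
```

Because the index is built from the full list of class numbers rather
than from the shelving number alone, an item with no shelf location at
all is found alongside the printed books on the same subject.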
