Cataloging the Web

Making the WWW More Accessible
Recent Postings



Subject: Re: Cataloging the Web
Date: Sat, 03 May 1997 07:38:30 -0700
From: "Eyler Coates, Sr." eyler.coates@worldnet.att.net
Organization: http://www.webspawner.com/users/EylerCoates/
To: AUTOCAT@LISTSERV.ACSU.BUFFALO.EDU
CC: Michelle Martin Robertson robertson@aztec.lib.utk.edu
References: 199704291332.JAA03914@aztec.lib.utk.edu
Michelle Martin Robertson wrote:

> What concerns me most about the keyword approach is that, the way the
> web is growing, simply having "subjects" and even "forms" as types of
> keywords to identify web sites will become grossly insufficient.  If
> web sites ever begin a trend towards higher degrees of specificity, I
> think we'll end up with a similar problem to the current one.

In fact, isn't that what we have now?  The tendency seems to be toward
websites with very narrowly defined content.  Those covering very broad
subjects ("Philosophy") seem more often to be collections of URLs for
other sites.

> Take
> this example:  We find a web site that addresses "The Effect of Man on
> the Environment."  The webmaster has assigned the subjects "man" and
> "environment."  But how can we differentiate between this website and
> the ones on "The Effects of the Environment on Man?"

For one thing, there is the title itself.  But yours is an excellent
point: although a perusal of the entries that a search engine pulls up
would allow a person to pick out what was wanted by examining titles and
abstracts, the idea of an improved system is to narrow the search
sufficiently so that there is not so much irrelevant material to sift
through.  Sifting is what we have now.  Indeed, your point goes to the
very heart of the problem of improving access.

> It seems to me
> that it would be best to have a variety of possible types of "subject"
> entries.  An "effect of" subject-tag would be very useful in this
> situation.  Of course, most sites wouldn't use it, but to some it
> would be essential for proper identification.   The result for the
> first site would be something like <subject = environment> and <effect
> of = man>.

As this whole discussion thread develops, it begins to become apparent
that there will need to be two approaches to "cataloging the web," which
are roughly analogous to Books in Print on one hand, and a standard
library catalog on the other.  This means a rather "rough" access to the
entire World Wide Web for persons wanting the broadest possible approach
to *all* the material on the Web, and a more focused and selective
approach to those "materials that have (at some level) been evaluated as
useful for our users or germane to our missions," in the words of Steve
Shadle.  Cataloging for the former will produce descriptions that may be
confusing without closer examination of the Website itself; cataloging
for library purposes will produce descriptions of greater specificity,
with all the elements you identify clearly spelled out.  We might hope
that the two types of cataloging will be complementary.

One important difference not to be overlooked in the analogy between BIP
and libraries on one hand, and the WWW and its Search Engines on the
other, is that far more ordinary citizens will be using and dependent on
Search Engines than on BIP.  Adequate access, therefore, becomes a vital
issue.

In considering cataloging for the entire WWW, I think the important thing
for us to do is to put ourselves in the mind of both the Surfer and the
Webmaster.  It is not likely that they will look up a proper term in a
list, though it is more likely in the case of the Webmaster.  In truth,
the topics you use as an example are broad ones.  But assuming that the
Webmaster has a page that falls within, say,  "The Effect of Man on the
Environment," I believe we will have to depend on however the Webmaster
describes the Website and simply accept, philosophically, that the
descriptive tags will ultimately be inadequate.

> A "form" example (off the top of my head):  You come across a site
> dedicated to Marie Antoinette's fictional appearances in literature.
> How do you differentiate between this site and one that discusses her
> as a historical character?   According to Library of Congress
> practice, her name would be followed by "in literature."  But if you
> simply provide a subject for her name and for "literature" on the web,
> that sounds like it might be a site that presents literature written
> by her, which is very misleading.
>
> With the "form" heading/tag the last example would be solved.  Sites
> that *contain* literature could have the tag <form = literature>.
> Sites that are *about* literature would have tag <subject =
> literature>.   Sites that focus on both would have both.  If there is
> literature by Marie Antoinette at the site, there could be a tag
> <author = Marie Antoinette>.  This would need to be distinguished from
> the tag for the author of the web page, though... <grin>

I believe your examples ably illustrate the need for expert cataloging
for the best, library-standard results.  They also demonstrate that this
level of cataloging will be beyond what we could expect from the vast
majority of Webmasters.  Hence, the necessity of the two complementary
approaches.
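
To make the tag scheme under discussion concrete, here is a minimal
sketch of how a search engine might read such tags from a page.  It
assumes the tags would be written as ordinary HTML META elements named
"subject", "form", and "author"; those names are this thread's proposal,
not an established standard, and the sample page is invented.  Only
Python's standard html.parser module is used.

```python
from html.parser import HTMLParser


class FacetMetaParser(HTMLParser):
    """Collect hypothetical subject/form/author META tags from a page."""

    # Facet names are an assumption taken from the discussion, not a standard.
    FACETS = ("subject", "form", "author")

    def __init__(self):
        super().__init__()
        self.facets = {name: [] for name in self.FACETS}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content") or ""
        if name in self.facets:
            # A single tag may carry several comma-separated terms.
            terms = [part.strip() for part in content.split(",")]
            self.facets[name].extend(term for term in terms if term)


# Invented sample page about Marie Antoinette as a character in fiction.
page = """<html><head>
<title>Marie Antoinette in Fiction</title>
<meta name="subject" content="Marie Antoinette, literature">
<meta name="form" content="bibliography">
</head><body></body></html>"""

parser = FacetMetaParser()
parser.feed(page)
print(parser.facets)
```

Because "literature" arrives under the subject tag and not the form tag,
a search engine could tell that this page is about literature and does
not itself contain it.  Pages without any such tags simply yield empty
lists.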

> I think that keywords are definitely the way to go on the web, for
> simplicity's sake.  But I am sure that users would benefit from
> greater specificity within the keyword framework.  If the subject tags
> are well-documented, webmasters should have plenty of  incentives to
> use them.

Eyler Coates

=============================================================
Please indicate if your response may be included in a redaction,
omitting repetitions and irrelevant matter, at:

                     Cataloging the Web
                Making the WWW More Accessible

   http://www.geocities.com/Athens/Forum/1683/cwindex.htm

==============================================================






Subject: Re: Cataloging the Web
Date: Sat, 3 May 1997 01:29:46 -0400
From: "Eyler Coates, Sr." 
Newsgroups: bit.listserv.autocat

Please note that Mr. Coates, who originated this thread, wishes to
have a comprehensive compilation, with slight editing, of the discussion,
to be available on his web site.  If you participate in the discussion
and have no objection to this you need say nothing.  If you DO object,
please participate if you wish but CLEARLY state in EACH item you send
on this topic that your material is NOT to be included in his compilation.
He may assume tacit consent if you do not explicitly deny it.  Mr. Coates
assures us this is strictly nonprofit.  Your tacit consent for him to
include your material on his website does not constitute consent for
any other use.  If you wish to know what he does in the way of editing,
please contact him directly, or take a look at the website (see the
item at the end of his message, below).

Please keep this in mind for as long as this discussion continues.
If needed, reminders of this will be posted periodically.   Douglas
*********************************************************************
Steve Shadle wrote:
>
> But currently we only provide access (through library and other
> bibliographic information services) to a portion of available materials,
> but these are materials that have (at some level) been evaluated as useful
> for our users or germane to our missions.  (Contrary to popular belief,
> the Library of Congress does *not* contain every book ever published  ;-)
>
> I wholeheartedly agree that user-supplied metadata can bring order to the
> Internet universe.  But if the Web is growing as exponentially as
> presumed, then it's even *more* important that we lay the groundwork for
> the ability to enable a person to:
>
> * find a work by a given author (Where's *my* John Smith?  Who are
> Carr/Holt/Kellow/Plaidy/Tate?  Which one is Bill Clinton: William E.,
> William J. or William R.?)
> * identify the intellectual work (vs. the manifestation) (Hamlet, the
> Apocrypha or Beethoven's Eroica by any other name)
> * provide information about the bibliographic relations between works
> (editions, revisions)
> * identify the genre/format of the work (web site vs. text document;
> review article vs. research report).
>
> Our current catalogs attempt these functions with some degree of success.
> For those items which are significant to our users, shouldn't we provide
> this same level of identification/control?  I'm not saying we need to do
> it in the same way.  I think the Dublin Core comes closer to providing the
> elements necessary for us to provide this level of description than
> Eyler's suggested metadata structure (although his elements can serve as a
> basis for basic description; the Dublin Core is expansible after all).
>
> I applaud Eyler's suggestions for metadata subject authority and think the
> use of user-supplied data for the Web universe is better than we've ever
> been able to do in print.  But I can't help but think that there's more to
> it than just subjects and that we can still provide a selection (not
> everything that's submitted to Yahoo gets in) and identification role as
> we do with print resources. --Steve

Steve's response illustrates an interesting difference of perspective.
Although a former librarian (having served time as a cataloger) and now
retired, I am right now above all a Webmaster and Internet user.  The Web
exists in my view as an entity unto itself, further apart from libraries
than are books and publishing.  In another sense, it is like a separate
genre from books, such as periodicals, music recordings, and film.
Including Web resources in a library main catalog strikes me, offhand, as
strange; something like including journal articles.

Cataloging the Web for itself might be considered similar to the
"cataloging" in Books in Print.  Would libraries include a complete
catalog of every book in print if, through some mechanism, every one of
them were available on a TV screen for library users?  Probably not.  In
fact, maybe that is a good analogy.  Would a library catalog then be a
selected guide?  Would it exist alongside this other, perhaps less
detailed, catalog that provides access to *everything*, just as it now
exists alongside BIP and CBI?

In that case, it is perfectly reasonable to assume that a library might
selectively include certain unique Web resources in its catalog, just as
Steve suggests, since they are so much the equivalent of a book.  Indeed,
I consider my main website, Thomas Jefferson on Politics & Government
(forgive the plug ;-), as something that probably should exist in book
form.  There is no print book available that has assembled as extensive
a collection of Jefferson's ideas on politics and government, in his own
words, as that website.  Isn't it reasonable that libraries might want
to include that type of resource in their catalog?

Therefore, what we end up with is a dual set of needs.  There is a need
for an adequate access tool to the entire Web, just as there is a need
for periodical indexes or BIP.  There is also a need for an amplified
access tool to certain Web resources that would be appropriate for
libraries and their computerized catalogs.  Ideally, the latter should
build upon the former, rather than being an entirely separate approach.
The analogy with BIP breaks down here, however, because the Web catalog
is not simply a list.  The resources are readily at hand.  Researchers
will want to access the whole, big, messy thing, and they will want an
access tool that is the best that is reasonably available, given the
level of technology.

Moreover, the whole computer thing introduces new elements: besides new
ways of storing the data, there are new ways of accessing the data, i.e.,
of forming searches.  If libraries are tied to the "pre-coordinate"
subject heading system, however, this might pose real problems.  If
anything has emerged from the present discussion, it is that LCSH and DDC
are not right for the *entire* Web.  The access tool suitable for the
*select* Web will not be practicable for the *whole* Web, just as it
would be impracticable to include complete cataloging data for every item
in BIP.  The Web promises to be a giant resource in the future, deserving
of a cataloging system that would be adequate for its nature.  That is
one job.  Libraries may need to provide selective, detailed access to Web
resources.  That is another job.  But the two ought to be related in some
rational way.

Eyler Coates







Subject: Re: Cataloging the Web
Date: Thu, 01 May 1997 19:55:31 GMT
From: robertson@aztec.lib.utk.edu (Michelle Martin Robertson)
Organization: University of Tennessee
Newsgroups: bit.listserv.autocat
References: 336372F2.7388@worldnet.att.net 
K4E1cFAs+6YzEwWZ@cunnew.demon.co.uk 3364D3F3.3194@worldnet.att.net O2PUvDAF7kZzEweF@cunnew.demon.co.uk 3368

"Eyler Coates, Sr." eyler.coates@worldnet.att.net wrote:

[snip]
>>
>> Sorry, I was suggesting that we need to categorise Web pages by the form
>> the information is in (eg "Images") as well as the subject of the
>> information (eg "Hale-Bopp").  Systems like LCSH mix the two functions
>> but if you're using postcoordinate indexing you really need to separate
>> them.

>Is it not true, however, that the keywords (which can also be phrases)
>need not be in a separate META category?  As long as the terms
>designating the form, subject, or whatever, are distinctive indicators,
>could they not be intermixed?  Thus, a Webmaster who uses the keywords
>"Marie Antoinette, biography" would, with two keywords, establish his
>work in three subject categories.

I don't see how keyword language can be controlled to that extent
successfully, without asking Webmasters to research their subject
terminology and essentially do the same work catalogers do when
assigning subject headings.  It is very easy to add more terms as
"see" or "see also" references; it will be impossible to divide up
multiple meanings of the same word *after* it has been used by
Webmasters to define their sites.  Since the proposal seems contingent
on their being able to use the vocabulary they choose, I think an
attempt to separate terms after their assignment to a site would be
ill-advised.

The webmaster for a site that consists of an extensive collection of
American literature will want to use the "literature" keyword.  The
webmaster for a site that consists of information about American
literature (historical, critical, whatever) will want to use the
"literature" keyword as well.  People who only want to find one of
these two things will be enormously frustrated to have to wade through
a large selection of both kinds of sites.

If you control the vocabulary to the point that people have to use
unnatural words to get what they want (especially on the web), they
will be discouraged from using that search strategy.  The
abovementioned metaliterature site could include "literary criticism,"
"literary history," "book reviews," "literature bibliography," etc., but
this is putting an undue demand both on the site to provide all these
specific terms that differentiate its content from literature itself,
and on the searcher to come up with all the terms he needs.  The
literature/criticism problem is among a host of similar conflicts.  We
just don't use different words to distinguish between form and content
in English, so any attempt to force the vocabulary to do so will be
stilted and confuse the user.

A tongue-in-cheek proposal: we could just add "meta" to all the terms
to describe "aboutness."  "Metaliterature" for criticism and
"metametaliterature" for a bibliography of literary criticism... and
dare I mention "metametametaliterature" for a catalog record of the
bibliography of critical works... these are just not attractive to
look at, and they don't really mean anything.

Unless there is another proposal that would solve this problem, having
different "META" tags for content and form is essential if any sense
is to be made out of the keywords I've mentioned.

- Michelle
---------------------------------------------------------
Michelle Martin Robertson     robertson@aztec.lib.utk.edu
University of Tennessee, Knoxville Libraries
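
The literature example above can be sketched in a few lines.  This is
only an illustration under stated assumptions: the SiteRecord type, the
two sample sites, and both search functions are hypothetical and do not
represent any existing search engine.  It compares a single intermixed
keyword list with separate "subject" and "form" tags.

```python
from dataclasses import dataclass, field


@dataclass
class SiteRecord:
    """Hypothetical site description with form kept apart from subject."""
    title: str
    subjects: set = field(default_factory=set)
    forms: set = field(default_factory=set)


def flat_keyword_search(records, keyword):
    """Intermixed keywords: a term matches whether it names form or subject."""
    return [r.title for r in records if keyword in (r.subjects | r.forms)]


def faceted_search(records, subject=None, form=None):
    """Separate tags: each facet that is supplied must match."""
    hits = []
    for rec in records:
        if subject is not None and subject not in rec.subjects:
            continue
        if form is not None and form not in rec.forms:
            continue
        hits.append(rec.title)
    return hits


# Two invented sites: one contains literature, one is about literature.
records = [
    SiteRecord("American Literature Texts",
               subjects={"united states"}, forms={"literature"}),
    SiteRecord("Essays on American Literature",
               subjects={"literature", "united states"}, forms={"criticism"}),
]

print(flat_keyword_search(records, "literature"))     # both sites
print(faceted_search(records, form="literature"))     # the collection only
print(faceted_search(records, subject="literature"))  # the criticism only
```

With intermixed keywords both sites answer the query "literature"; with
separate tags the searcher can ask for one kind of site or the other
using the same everyday word.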







Subject: Re: Cataloging the Web
Date: Thu, 01 May 1997 10:31:51 -0700
From: "Eyler Coates, Sr." eyler.coates@worldnet.att.net
Organization: http://www.webspawner.com/users/EylerCoates/
Newsgroups: bit.listserv.autocat,schl.sig.lmnet
References: 336372F2.7388@worldnet.att.net
     K4E1cFAs+6YzEwWZ@cunnew.demon.co.uk
     3364D3F3.3194@worldnet.att.net
     O2PUvDAF7kZzEweF@cunnew.demon.co.uk

Robert Cunnew wrote:
>
> In article 3364D3F3.3194@worldnet.att.net, "Eyler Coates, Sr."
> eyler.coates@worldnet.att.net writes
> >
> >> Given the undesirability of precoordination in subject indexing, I
> >> wonder whether there is a need for (5) Forms, taken from a short list of
> >> appropriate terms, eg information service, promotional material, images,
> >> sounds, software (form not subject, ie downloadable), news, directory,
> >> discussion forum ...
> >
> >I regrettably must confess that I am unfamiliar with your frame of
> >reference here and am uncertain of your meaning.  I had suggested five
> >subject headings as a maximum, though if the Keyword option were
> >selected, it might seem that more than five would be required.  Are we
> >talking about the same thing?
>
> Sorry, I was suggesting that we need to categorise Web pages by the form
> the information is in (eg "Images") as well as the subject of the
> information (eg "Hale-Bopp").  Systems like LCSH mix the two functions
> but if you're using postcoordinate indexing you really need to separate
> them.

Is it not true, however, that the keywords (which can also be phrases)
need not be in a separate META category?  As long as the terms
designating the form, subject, or whatever, are distinctive indicators,
could they not be intermixed?  Thus, a Webmaster who uses the keywords
"Marie Antoinette, biography" would, with two keywords, establish his
work in three subject categories.







Subject: Re: Cataloging the Web
Date: Wed, 30 Apr 1997 21:55:08 -0400
From: Kate Bowers kate_bowers@harvard.edu
Newsgroups: bit.listserv.autocat

Regarding cataloging the web:

> A standard that is implemented by non-experts will have different
> results, perhaps better results than the current chaos that is the Web,
> but very different from library catalogs.

Let us remember that we are experts.  We have become very skilled at
deriving meaningful data from efficient inspection of an item.  The
records we create adhere not only to rigid standards of encoding, but also
to intelligent rules for *content*.  International standards for library
cataloging have been in place since 1908. Professional ethics require that
catalogers remain impartial.  Catalogers follow and create standard lists.
We contribute to and use authority files to ensure that the millions of
John Williams in the world who create works have separate name entries.

This is only scratching the surface of the enterprise of cataloging.

In a situation where non-expert persons create records, they do many
things unlike library cataloging.  Lack of standard authority files will
mean that common names will remain undifferentiated.  Much as we lament
the difficulty of remaining unbiased, we at least recognize the problem,
and see more ideas and works from more corners of the world than most
people.  Lack of this broad overview of content will mean that basic
assumptions, let alone biases, will be missed.  For instance, a hate group
with a Web site is unlikely to note "hate" or "racism" in its keywords;
they will not see themselves this way.  In a more innocuous example, and
one that really exists in paper, picture a work with the word
"Constitution" in large letters at its top.  Which constitution is it?  Is
it the US constitution?  The Malawi constitution?  A translation of a
defunct Belorussian constitution? or the Constitution of the Ladies'
Auxiliary of the Youngstown, Ohio Knights of Columbus? The work itself may
not give one enough information about it.  Its creators know who they are,
and are not aware that their identity and that of the document in their
hands are linked, but the outside world cannot know this.

This is only scratching the surface of the enterprise of organizing the web.

%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%
Kate Bowers, Assistant Curator
Bibliographic Control and Special Media
Harvard University Archives
Cambridge, MA 02138
voice:  (617) 495-2461
fax:    (617) 495-8011
email:  kate_bowers@harvard.edu
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%





Subject: Re: Cataloging the Web
Date: Mon, 28 Apr 1997 12:18:47 -0400
From: "David P. Miller" dmiller2@curry.edu
Newsgroups: bit.listserv.autocat

Steve Shadle asks:

"One of the specific points I would like feedback on is whether there are
institutions out there that feel the need to assign classification
*solely* for subject access (i.e., for resources that don't sit on a
shelf).  Do catalog users *use* classification as a subject retrieval
mechanism?"

Yes, if we remember that catalog users include librarians. This is
unfortunately, and too often, a source of inside jokes and rib-poking,
as if there were something inappropriate about librarians making their
own tools easier for themselves to use. There isn't.

Many's the time when, using our simple character-based INNOPAC system,
I've been able to dig up a number of additional resources for students
by redirecting a subject or keyword search into the class. number index.
Actually, "indexes". We use both DDC and LCC for different parts of
the collection, and the respective indexes contain *any* classification
number found in the record for an item, not only those used for shelving.
In addition, I let significantly different class numbers from the same
system remain in a record, unless (and this is rare) they're utterly
inappropriate. So, we have really divorced classification from its
confinement to shelving.

Now, the INNOPAC command says something like "See Items Nearby On Shelf."
This obviously isn't always true, given what I've outlined above -- but
so far the misnomer hasn't upset anyone.

So -- assign classification numbers to intangible resources. Yes, as many
as you like. The shelving issue is a red herring.

Yours,

David Miller
Levin Library, Curry College
Milton, MA
dmiller2@curry.edu
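
The class number index described above can be sketched as follows.
This is a rough illustration and not INNOPAC's actual behavior: the
record layout, the function names, and the sample records are
hypothetical, and the class numbers are placeholders that have not been
checked against the DDC schedules.  The point shown is that every class
number in a record is indexed, whether or not the item has a shelf
location.

```python
from collections import defaultdict


def build_class_index(records):
    """Index every class number found in a record, not only the shelf number."""
    index = defaultdict(list)
    for rec in records:
        for number in rec["class_numbers"]:
            index[number].append(rec["title"])
    return index


def browse(index, prefix):
    """List titles whose class numbers begin with the prefix, in number order."""
    return [title
            for number in sorted(index)
            if number.startswith(prefix)
            for title in index[number]]


# Invented records; "shelf" is None for a resource that sits on no shelf.
records = [
    {"title": "Printed comet atlas", "shelf": "523.6",
     "class_numbers": ["523.6"]},
    {"title": "Comet image web site", "shelf": None,
     "class_numbers": ["523.6", "778.3"]},
    {"title": "Astrophotography handbook", "shelf": "778.3",
     "class_numbers": ["778.3", "522.63"]},
]

index = build_class_index(records)
print(browse(index, "523"))
print(browse(index, "778"))
```

The web site appears under both numbers although it has no shelf
location, which is the sense in which classification here is divorced
from shelving.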