Cataloging the Web

Making the WWW More Accessible

Recent Postings





Subject: Re: Cataloging the Web
Date: Mon, 28 Apr 1997 19:54:51 -0700
From: "Eyler Coates, Sr." eyler.coates@worldnet.att.net
Organization: http://www.webspawner.com/users/EylerCoates/
Newsgroups: bit.listserv.autocat


Charley Pennell wrote:
>
> Folks-
>
>   While web resource-embedded metadata is fine for fairly unique items
> which people search using known access points, it is not terribly useful
> for subject searches, partly due to the lack of a controlled vocabulary
> in Dublin Core, TEI headers, etc. and partly due to the web search
> engines' lack of any collocating function between documents such as is
> provided by classification or thesauri. Because metadata is not
> uniformly available in all net documents, the search engines still seek
> relevant terms from throughout the document, thus further diluting the
> provided metadata's usefulness.

The system as proposed in this thread is not intended to rely upon
metadata *as it now exists*.  What we are suggesting is a system that
could gradually be implemented, and that would furnish improved access as
its acceptance became more widespread.  Right now, there exists
something approaching chaos.  Indexing every term in a document is not
really a system; or it might be called a default system used to provide
access in the absence of a deliberate and rationally organized system.
Indexing every word on a webpage is a kind of catch-all that will
provide "best we can do" access in the absence of a rational system.
Already, in these exchanges we have stumbled on a systematic way of
employing an UNcontrolled vocabulary while still producing controlled
results.  This is done by having unskilled Webmasters supply an array of
subject headings for the same subject, but with a professional librarian
at the Search Engine, who will take these user-supplied "See" references
(and what better source for See references than the unskilled people who
will be using the search engine?) and make all the various terms that
refer to one subject category in fact refer to that one category.  Thus,
you would have a single subject category, but multiple terms that give
access to that category.  This might sound offhand like another kind of
chaos, but what it does is convert the chaos
that exists in the user's approach to searching into an organized
referral.  If a group of Webmasters each used a different Keyword for the
same subject, the cataloger's job would be to see that the Search Engine
referred a search for any one of those terms to ALL the documents which
contained any one of those keywords in its metadata.
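
The referral described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the page
addresses, the keywords, and the names PAGE_KEYWORDS, SEE_GROUPS,
build_index and search are all hypothetical and come from no existing
search engine.  The cataloger's work is represented by SEE_GROUPS, each
entry of which gathers the terms judged to name a single subject.

```python
from collections import defaultdict

# Hypothetical data: keywords that individual Webmasters placed in
# their own page metadata.
PAGE_KEYWORDS = {
    "http://example.org/a": ["automobiles"],
    "http://example.org/b": ["cars"],
    "http://example.org/c": ["motor vehicles", "repair"],
    "http://example.org/d": ["gardening"],
}

# Maintained by the cataloger: each set groups the terms judged to
# refer to one subject category (the equivalent of "See" references).
SEE_GROUPS = [
    {"automobiles", "cars", "motor vehicles"},
]


def build_index(page_keywords):
    """Map each keyword to the set of pages that list it."""
    index = defaultdict(set)
    for url, keywords in page_keywords.items():
        for keyword in keywords:
            index[keyword.lower()].add(url)
    return index


def search(term, index, see_groups):
    """Return pages for the term and for every term grouped with it."""
    term = term.lower()
    terms = {term}
    for group in see_groups:
        if term in group:
            terms |= group
    results = set()
    for t in terms:
        results |= index.get(t, set())
    return sorted(results)


index = build_index(PAGE_KEYWORDS)
print(search("cars", index, SEE_GROUPS))
```

A search for any one of the grouped terms returns all three automobile
pages, while an ungrouped term such as "gardening" behaves as a plain
keyword lookup.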

>   Letting useful resources languish out on the network is not the
> solution for our patrons in any case. It is the library electronic
> catalogue that combines our expertise in selection, retrieval and
> increasingly, delivery, of information to our clientele.
>
> Getting this
> data into a form which cataloguers could use might best be achieved by
> an electronic CIP arrangement with information providers who would
> contact us (depending on the constituency of the provider) to see about
> getting machine readable copy attached to worthy network resources.  For
> an example of how this might work, see:
>
>         http://www.statcan.ca/english/Dsp/81-003-XPB/81-003-XPB.htm

Unfortunately, the WWW is too big (and growing), too wild, and too
mutable to be limited to the facilities appropriate for a former age.  We
are talking about an explosion, and we are just now at the beginning of
it.  We have materials being produced and made available so easily and so
cheaply that no system that relies on the passage of such an enormous flow of
materials through the hands of professionals working on them one at a
time will meet the challenge of this new age.  At best, such a system
would always mean that only a small portion of the available materials
would be processed, thus failing to provide technological advances in
cataloging to correspond with the technological advances in data
production.

Eyler Coates

--

=============================================================
All of the postings to this thread are available in a redacted
form, without repetitions and irrelevant matter, at:

                     Cataloging the Web
                Making the WWW More Accessible

   http://www.geocities.com/Athens/Forum/1683/cwindex.htm

==============================================================




Subject: Re: Cataloging the Web
Date: Mon, 28 Apr 97 14:20:38 +0000
From: Charley Pennell
Reply-To: cpennell@morgan.ucs.mun.ca
Organization: QEII Library, Memorial University of Newfoundland
To: eyler.coates@worldnet.att.net
References: <199704280425.BAA16478@piva.ucs.mun.ca>


Folks-

  While web resource-embedded metadata is fine for fairly unique items
which people search using known access points, it is not terribly useful
for subject searches, partly due to the lack of a controlled vocabulary
in Dublin Core, TEI headers, etc. and partly due to the web search
engines' lack of any collocating function between documents such as is
provided by classification or thesauri. Because metadata is not
uniformly available in all net documents, the search engines still seek
relevant terms from throughout the document, thus further diluting the
provided metadata's usefulness.

  Letting useful resources languish out on the network is not the
solution for our patrons in any case. It is the library electronic
catalogue that combines our expertise in selection, retrieval and
increasingly, delivery, of information to our clientele.

Getting this
data into a form which cataloguers could use might best be achieved by
an electronic CIP arrangement with information providers who would
contact us (depending on the constituency of the provider) to see about
getting machine readable copy attached to worthy network resources.  For
an example of how this might work, see:

        http://www.statcan.ca/english/Dsp/81-003-XPB/81-003-XPB.htm

_______________________________________________________________________
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Charley Pennell                        cpennell@morgan.ucs.mun.ca
Head, Cataloguing Division             voice: (709)737-7625
Queen Elizabeth II Library             fax:   (709)737-3118
Memorial University of Newfoundland
St. John's, NF   Canada   A1B 3Y1
World Wide Web:  http://sicbuddy.library.mun.ca/~charl8P9/chuckhome.html
Cataloguer's Toolbox:  http://www.mun.ca/library/cat
_______________________________________________________________________
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""




Subject: Re: Cataloging the Web
Date: Mon, 28 Apr 1997 09:44:35 -0700
From: "Eyler Coates, Sr." eyler.coates@worldnet.att.net
Organization: http://www.webspawner.com/users/EylerCoates/
Newsgroups: bit.listserv.autocat,schl.sig.lmnet


Robert Cunnew wrote:
>
> In article <336372F2.7388@worldnet.att.net>, "Eyler Coates, Sr."
> writes
>
> >(4) SUBJECTS. This would require the existence of a standard list of
> >subject headings, probably made available by the Search Engine (see
> >below), from which Webmasters could select (with helps) up to five
> >appropriate headings for their own page and put them in a tag:
> >
> >Rather than use something as complicated as the Library of Congress
> >Subject Headings, something less detailed such as the Sears Subject
> >Headings, would probably be sufficient for the WWW.
>
> I'm not familiar with Sears, but isn't it - like LC - precoordinate?
> Please don't let's suggest that the Web is cluttered up with nineteenth
> century notions of subject access designed for catalogue cards. Simple
> postcoordinate terms are what is required, eg term 1, Libraries, term 2,
> United States, *not* "Libraries - United States" or whatever Sears has
> to offer. Search engines may not be perfect but they *can* do Boolean,
> even if it's often implicit rather than explicit.

These are excellent points.  I received an email response that suggested
the possibility of keywords as an alternative to either Sears or LCSH.
The matter of subject headings or keywords seems to be a crucial part of
the system.

If this general scheme were adopted, it seems that the "keyword" option
might be the most desirable.  It may be that a Search Engine could
provide a standard list of Keywords created from those actually used by
Webmasters, and that this list could serve as a reference list to
maintain a level of uniformity.  Webmasters could then create new
Keywords if there were none on the list adequate for their needs.  Thus,
the Keyword List would be constantly brought up to date.  Such a list
would also be useful for user/researchers while browsing.

In addition, a human being (cataloger) could monitor the list, creating
appropriate "See Also" references as part of the Search Engine's
offerings and perhaps even making redundant Keywords, created by
Webmasters (who are necessarily amateur catalogers), all refer to the
same items.  In this way, the equivalent of "See" references would be
created by the Webmasters, but their combination into ONE real reference
would be supervised and maintained by the professional.  Interestingly
enough, unlike the standard card catalog "See" references, these
"See"-like references would all be considered equally authentic.  This,
then, would be a dynamic system that would be in a constant state of
evolvement.

> Given the undesirability of precoordination in subject indexing, I
> wonder whether there is a need for (5) Forms, taken from a short list of
> appropriate terms, eg information service, promotional material, images,
> sounds, software (form not subject, ie downloadable), news, directory,
> discussion forum ...

I regrettably must confess that I am unfamiliar with your frame of
reference here and am uncertain of your meaning.  I had suggested five
subject headings as a maximum, though if the Keyword option were
selected, it might seem that more than five would be required.  Are we
talking about the same thing?

Eyler Coates




Subject: Re: Cataloging the Web
Date: Mon, 28 Apr 1997 06:13:36 -0700
From: "Eyler Coates, Sr."
Organization: http://www.webspawner.com/users/EylerCoates/
To: mac@slc.bc.ca
References: <199704280427.VAA24101@bmd2.baremetal.com>


J. McRee Elrod wrote:
>
> >"If the producers of every site cataloged it themselves, then Yahoo!
> >wouldn't have a hard time keeping up with them. Of course, everyone would
> >have to agree on standard ways to do this, and if everyone agreed,
> >for-profit search sites like Yahoo! probably wouldn't be necessary."
>
> Mr. Coates' elaboration of this proposal seems excellent to me. I have
> only one addendum, and one disagreement.
>
> >(3) NAMES.
>
> Perhaps this should be qualified as personal and corporate names, to
> clarify that both are included, and to exclude "names" like names of
> chemicals, plants, and such.

This is a very good point and should be included in instructions for
Webmasters.

> >(4) SUBJECTS.
> ...
> >Rather than use something as complicated as the Library of Congress
> >Subject Headings, something less detailed such as the Sears Subject
> >Headings, would probably be sufficient for the WWW.
>
> The subjects of web sites are even more up-to-date and various than
> those of books. My experience of Sears is that it is too limited even
> for books. I would suggest LCSH supplemented by the most recent annual
> cumulation of the appropriate Wilson periodical index.
>
> Another option, of course, would be keywords.

Another posting to the Newsgroup also relates to this problem,
suggesting a rejection of both the Sears and LCSH in favor of search
terms that might permit a Boolean search, which I assume would mean the
use of keywords.  This is a crucial part of the system, and the point
that you make about the subjects of websites being "more up-to-date and
various than those of books" is a salient one.

If this general scheme were adopted, it seems that the "keyword" option
might be the most desirable.  It may be that a Search Engine could
provide a standard list of Keywords created from those actually used by
Webmasters, and that this list could serve as a reference list to
maintain a level of uniformity.  Webmasters could then create new
Keywords if there were none on the list adequate for their needs.  Thus,
the Keyword List would be in a constant state of evolvement.  Such a
list would also be useful for user/researchers while browsing.

In addition, a human being (cataloger) could monitor the list, creating
appropriate "See Also" references as part of the Search Engine offering
and perhaps even making redundant Keywords, created by Webmasters (who
are necessarily amateur catalogers), all refer to the same items.  In
this way, the equivalent of "See" references would be created by the
Webmasters, but their combination into ONE real reference would be
supervised and maintained by the professional.  This, then, would be a
dynamic system that would be in a constant state of evolvement.

Eyler Coates

--

============================================================
Thomas Jefferson on Politics & Government
http://pages.prodigy.com/jefferson_quotes

Eyler Robert Coates, Sr.       eyler.coates@worldnet.att.net
============================================================




Subject: Re: Cataloging the Web
Date: Mon, 28 Apr 97 02:08:32 +0000
From: mac@slc.bc.ca (J. McRee Elrod)
Reply-To: mac@slc.bc.ca
Organization: Special Libraries Cataloguing, Inc.
To: eyler.coates@worldnet.att.net
CC: autocat@ubvm.cc.buffalo.edu
References: <199704280427.VAA24101@bmd2.baremetal.com>


>"If the producers of every site cataloged it themselves, then Yahoo!
>wouldn't have a hard time keeping up with them. Of course, everyone would
>have to agree on standard ways to do this, and if everyone agreed,
>for-profit search sites like Yahoo! probably wouldn't be necessary."

Mr. Coates' elaboration of this proposal seems excellent to me.  I have
only one addendum, and one disagreement.

>(3) NAMES.

Perhaps this should be qualified as personal and corporate names, to
clarify that both are included, and to exclude "names" like names of
chemicals, plants, and such.

>(4) SUBJECTS.
...
>Rather than use something as complicated as the Library of Congress
>Subject Headings, something less detailed such as the Sears Subject
>Headings, would probably be sufficient for the WWW.

The subjects of web sites are even more up-to-date and various than
those of books.  My experience of Sears is that it is too limited even
for books.  I would suggest LCSH supplemented by the most recent annual
cumulation of the appropriate Wilson periodical index.

Another option, of course, would be keywords.

Mac

   __       __   J. McRee (Mac) Elrod (mac@slc.bc.ca)
  {__  |   /     Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
  ___} |__ \__________________________________________________________




Date: Mon, 28 Apr 1997 08:37:16 -0700
Reply-To: Steve Shadle shadle@u.washington.edu
Sender: "AUTOCAT: Library cataloging and authorities discussion group"
        AUTOCAT@LISTSERV.ACSU.BUFFALO.EDU
From: Steve Shadle shadle@u.washington.edu
Subject: Re: Cataloging the Web


I agree with much of what is presented in the article summary, but I do
have a couple of comments that I would like to hear other people's
thoughts on.

> different. For example, classification systems (such as Dewey, LC) are
> irrelevant because they are designed for grouping physical objects on
> shelves for browsing, access and retrieval. Unnecessary elements only

My understanding is that the use of classification *solely* for grouping
physical objects is a North American (or at least non-European) practice,
that European libraries more frequently used classed catalogs, and that
it is not an uncommon practice to assign multiple classifications to a
work.  Finding works on related subjects is important and unless a
subject authority system (whether a simple hierarchy like YAHOO or a more
complex structure like LCSH) is in place, classification can be used to
facilitate this type of access.

One of the specific points I would like feedback on is whether there are
institutions out there that feel the need to assign classification
*solely* for subject access (i.e., for resources that don't sit on a
shelf).  Do catalog users *use* classification as a subject retrieval
mechanism?

> of retrieval. If users want more complete information, they can click on
> the document itself, unlike in a library where they would need to go up
> an elevator to the fourth floor to look at the document. Therefore, Web
> Cataloging need only concern itself with retrieving a good list of mostly
> relevant documents that the user can then examine more closely.

I had this same thought, but I've had students who disagree with this
point.  When servers are down, when the Net is overloaded and one can't
connect to a resource for whatever reason, the catalog can serve as a
much quicker mechanism for *identifying* and citing resources.  It seems
that this generation of workstation users is impatient with even a
15-second wait...getting an instant summary and brief description from a
catalog record may provide a better service to a large group of users.

IMHO, the use of user-supplied data would help immensely in bringing
organization to the vast majority of materials on the net.  However,
there are some basic concepts in bibliographic description and
identification (e.g., name authority) that have the potential to be
useful both in web browsers and online catalogs.  Cutter's principles
don't become irrelevant in a networked world.

And can anyone out there tell me what the current status of the Dublin
Core (and other metadata schemes) is in terms of development,
establishment and actual acceptance in the community?

Thanks to Eyler Coates, Sr. for posting the summary and to you for
reading this rather random set of thoughts.  I look forward to discussion
and especially to specific examples.  --Steve

Steve Shadle             shadle@u.washington.edu        * * * *
Serials Cataloger                                         * * *
University of Washington Libraries, Box 352900              * *
Seattle, WA 98195        (206) 543-4890                       *




Subject: Re: Cataloging the Web
Date: Sun, 27 Apr 1997 20:57:32 +0100
From: Robert Cunnew
Organization: N/A
Newsgroups: bit.listserv.autocat,schl.sig.lmnet
References: <336372F2.7388@worldnet.att.net>


In article <336372F2.7388@worldnet.att.net>, "Eyler Coates, Sr."
writes

>(4) SUBJECTS. This would require the existence of a standard list of
>subject headings, probably made available by the Search Engine (see
>below), from which Webmasters could select (with helps) up to five
>appropriate headings for their own page and put them in a tag:
>
>Rather than use something as complicated as the Library of Congress
>Subject Headings, something less detailed such as the Sears Subject
>Headings, would probably be sufficient for the WWW.

I'm not familiar with Sears, but isn't it - like LC - precoordinate?
Please don't let's suggest that the Web is cluttered up with nineteenth
century notions of subject access designed for catalogue cards.  Simple
postcoordinate terms are what is required, eg term 1, Libraries, term 2,
United States, *not* "Libraries - United States" or whatever Sears has
to offer.  Search engines may not be perfect but they *can* do Boolean,
even if it's often implicit rather than explicit.

Given the undesirability of precoordination in subject indexing, I
wonder whether there is a need for (5) Forms, taken from a short list of
appropriate terms, eg information service, promotional material, images,
sounds, software (form not subject, ie downloadable), news, directory,
discussion forum ...

--
Robert Cunnew
Librarian, Chartered Insurance Institute, London

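
The postcoordinate approach in the message above, where a page carries
simple separate terms and the search engine combines them at query time,
might look something like the following Python sketch.  The page
identifiers, their terms, and the function names boolean_and and
boolean_or are hypothetical and serve only to show the idea; no actual
search engine's behaviour is implied.

```python
# Hypothetical pages, each indexed under simple single-concept terms
# (postcoordinate), rather than a combined heading such as
# "Libraries - United States" (precoordinate).
PAGE_TERMS = {
    "page1": {"libraries", "united states"},
    "page2": {"libraries", "canada"},
    "page3": {"museums", "united states"},
}


def boolean_and(page_terms, *wanted):
    """Return pages indexed under every one of the wanted terms."""
    wanted_set = {term.lower() for term in wanted}
    return sorted(
        page for page, terms in page_terms.items() if wanted_set <= terms
    )


def boolean_or(page_terms, *wanted):
    """Return pages indexed under at least one of the wanted terms."""
    wanted_set = {term.lower() for term in wanted}
    return sorted(
        page for page, terms in page_terms.items() if wanted_set & terms
    )


print(boolean_and(PAGE_TERMS, "Libraries", "United States"))
print(boolean_or(PAGE_TERMS, "Libraries", "United States"))
```

The combination "Libraries AND United States" is made by the searcher at
query time, so the indexer never has to anticipate it in a single
heading.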



Subject: Cataloging the Web
Date: Sun, 27 Apr 1997 08:38:26 -0700
From: "Eyler Coates, Sr."
Organization: http://www.webspawner.com/users/EylerCoates/
Newsgroups: bit.listserv.autocat,schl.sig.lmnet


The current issue of Slate (http://www.slate.com) contains an article by
Bill Barnes in his Webhead column, "Search Me," on the inadequacy of
access to "the vast resources" of the Web as compared to any library.  He
examines the various search engines available, and details how
unsatisfactory they are.  He didn't really propose a solution to the
problem, but his conclusion included the following:

"If the producers of every site cataloged it themselves, then Yahoo!
wouldn't have a hard time keeping up with them. Of course, everyone would
have to agree on standard ways to do this, and if everyone agreed,
for-profit search sites like Yahoo! probably wouldn't be necessary."

A lot of people have discussed this problem, but no one that I have
discovered has proposed a simple, comprehensive, workable solution as
yet.  I posted to Slate's "The Fray" an outline of the following possible
overall structure that might meet the problem.  It is presented in more
detail here in hopes that other people might have some input, and we can
collectively arrive at a mechanism for providing better access to the
WWW.

                        CATALOGING THE WEB

A practical system must take into account the nature of materials on the
Web, the people who create it, the search engines that find it, and the
needs of the people who research it.  A system similar to that provided
by library services will not work, because so many of the elements are
different.  For example, classification systems (such as Dewey, LC) are
irrelevant because they are designed for grouping physical objects on
shelves for browsing, access and retrieval.  Unnecessary elements only
add unnecessary complication.  Web document research doesn't need
complete bibliographic data.  What's really needed is an efficient means
of retrieval.  If users want more complete information, they can click on
the document itself, unlike in a library where they would need to go up
an elevator to the fourth floor to look at the document.  Therefore, Web
Cataloging need only concern itself with retrieving a good list of mostly
relevant documents that the user can then examine more closely.

Another factor is the level of expertise of the people that will
necessarily be doing the cataloging.  Already, Web resources are vast,
constantly changing, and only promise to be more so in the future.  It is
necessary, therefore, that the cataloging be simple enough to be done by
an ordinary Webmaster and not require the services of a professional.

The basic requirements of people who search the Web for materials are
(1) Titles, (2) Names, (3) Subjects, and (4) Brief Abstracts accompanying
the results of searches for the first three elements.  Other elements,
such as dates, publishers (Webmasters?), editions, etc., could be
obtained by clicking on the document itself.

It is interesting that computers give incredible forms of access to data,
but to date, the best that Search Engines can do has been to index every
word that appears on a webpage.  This, however, produces very
unsatisfactory search results most of the time.  If searches could be
conducted via the three elements above, with an abstract included with
each finding, the results would be far more satisfactory for users.

What is needed, therefore, are four page attributes, which could be
included in each document's head.  Two of these already are usually found
on every webpage.

(1) TITLES.  Already included in the header of every Webpage:

        --------------

(2) DESCRIPTION (Abstract).  This is a standard META tag, using the
designation:

(3) NAMES.  This would require a new META tag, which could include up to
five names, selected appropriately by the Webmaster, of authors, editors,
corporations, subjects of biographies, etc., using the designation:

Distinguishing between authors, editors, etc., would be unnecessary,
because the abstract should make clear the relationship of the various
names to the page content.  Also, the document is right there and can be
clicked on if the user wants more precise information.

(4) SUBJECTS.  This would require the existence of a standard list of
subject headings, probably made available by the Search Engine (see
below), from which Webmasters could select (with helps) up to five
appropriate headings for their own page and put them in a tag:

Rather than use something as complicated as the Library of Congress
Subject Headings, something less detailed such as the Sears Subject
Headings, would probably be sufficient for the WWW.  The only problem is,
such a list is not presently available on the Web, although it probably
would not be too difficult to establish a similar list of subject
headings that Webmasters could use.  If some public institution supplied
such a list on the WWW, all search engines could use it, and everybody
would benefit from the uniformity.  Simplicity combined with lots of help
would be necessary, because it would require application by Webmasters
without professional training.

Note that the Search Engine itself would contain the equivalent of "See"
and "See Also" references.  "See" references would automatically bring up
the referenced materials in most cases, whereas "See Also" references
should present options to the user.  In some cases, a search could bring
up a request for more specific attributes from the user.  All of these
would be elements built into the search engine's program.

The first requirement to initiate the system would be for some
enterprising Search Engine to offer searches based on the META tags
described above.  This need not replace the present word searches, and it
would permit the system to be introduced gradually.  Then what would be
required is the cooperation of Webmasters in providing the necessary META
tags.  Would they do it?  Of course they would do it!  Right now, many of
them try all kinds of tricks in order to get their page recognized, such
as filling the page with certain key words.  The one thing that
Webmasters want after going to all the trouble of creating a Webpage is
for everyone to have access to it.  Surely, if they were required to put
in the appropriate META tags to get listed properly, you can bet they
would do it.

Search engines that rely on just the META tags will be easier to set up.
Since a search engine would be concerned only with the data in a page's
heading, this would require less storage and might even enable a search
engine to visit every page on the Web, and do it more frequently.

Eyler Coates

--

============================================================
Thomas Jefferson on Politics & Government
http://pages.prodigy.com/jefferson_quotes

Eyler Robert Coates, Sr.       eyler.coates@worldnet.att.net
============================================================

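
As a rough illustration of the four page attributes proposed in the
posting above, the following Python sketch reads a page's head and builds
a small catalog record from it.  Several things here are assumptions:
the tag designations shown in the original posting did not survive in
this archive, so the META names "names" and "subjects" are placeholders
("description" is a standard META name); the semicolon separator, the
sample page, and the names HeadRecordParser and catalog_record are
likewise hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page.  "names" and "subjects" are placeholder META names
# chosen for this sketch, not the designations from the posting.
SAMPLE_PAGE = """<html><head>
<title>Example Page</title>
<meta name="description" content="A short abstract of the page.">
<meta name="names" content="Doe, Jane; Example Press">
<meta name="subjects" content="Libraries; Cataloging">
</head><body><p>Body text is ignored by this parser.</p></body></html>"""


class HeadRecordParser(HTMLParser):
    """Collect title, description, names and subjects from a page."""

    def __init__(self):
        super().__init__()
        self.record = {"title": "", "description": "",
                       "names": [], "subjects": []}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            content = attr.get("content") or ""
            if name == "description":
                self.record["description"] = content.strip()
            elif name in ("names", "subjects"):
                values = [v.strip() for v in content.split(";") if v.strip()]
                # The posting suggests a maximum of five entries.
                self.record[name] = values[:5]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.record["title"] += data


def catalog_record(html_text):
    """Return the catalog record found in the page's META data."""
    parser = HeadRecordParser()
    parser.feed(html_text)
    parser.close()
    parser.record["title"] = parser.record["title"].strip()
    return parser.record


print(catalog_record(SAMPLE_PAGE))
```

Because only the title and META tags are read, the record stays small no
matter how long the page body is, which is the storage advantage the
posting anticipates.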
 
