Cataloging the Web

Making the WWW More Accessible

Recent Postings





Subject: Re: Cataloging the Web
Date: Thu, 08 May 1997 19:42:07 -0700
From: "Eyler Coates, Sr." eyler.coates@worldnet.att.net
Organization: http://www.webspawner.com/users/EylerCoates/
To: AUTOCAT@LISTSERV.ACSU.BUFFALO.EDU
References: 970508010853.2.14354@mtigwc01

> "Aardy R. DeVarque"  wrote:
>
> Eyler Coates, Sr.  wrote:
> >Then what would be required is the cooperation of Webmasters in providing
> >the necessary META tags.  Would they do it?  Of course they would do it!
> > Right now, many of them try all kinds of tricks in order to get their
> >page recognized, such as filling the page with certain key words.  The
> >one thing that Webmasters want after going to all the trouble of creating
> >a Webpage is for everyone to have access to it.  Surely, if they were
> >required to put in the appropriate META tags to get listed properly, you
> >can bet they would do it.
>
> One problem: As you yourself say, "Right now, many of them try all kinds of
> tricks in order to get their page recognized, such as filling the page with
> certain key words."
>
> What's to stop these same webmasters from dumping certain key words in the
> META tags as well, whether or not the words actually describe the page?
>
> Answer: Nothing.

Not at all.  There are still things like honor and honesty.  A decent
person discovers early on that his interests are not furthered by
trickery and deceit.  Those who use such methods are discredited and
receive the opprobrium of the very people they are trying to attract.
But even more than moral disincentives, it would be a simple matter for a
Search Engine to have a button for reporting such dishonesty by users,
and for the Search Engine, after a simple investigation, to remove the
offending Website from its files.  There would be nothing to be gained by
indulging in such intellectual vandalism.

>
> Result: Just as many worthless hits.
>

Not very likely.  Dishonesty seldom pays in the long run.  It is true
that there would probably be a small number of mischievous kids that
might try such tricks in spite of the consequences.  But since adequate
protections could easily be in place for dealing with such people, why
let immature or lawless elements prevent what is otherwise reasonable
policy?

> Your idea is nice in theory, but it falls apart due to the chaos of the web
> and the dishonesty of many webmasters that will try just about anything in
> order to get "everyone to have access to [their page]."

It should be noted especially, that those webmasters who try to load
their sites with key terms now, do so not to deceive people into going to
a site unrelated to their quest.  They do it to overcome the insane
system we now have, which "catalogs" sites based on the number of times
certain words appear on a page of text.  Moreover, even now, some Search
Engines can detect this tactic and reject such pages.  Webmasters are not
more dishonest than other people; they are just seeking effective means
for promoting their pages.

> Call me a
> pessimist, but self-moderation only works as long as *everybody* wants it to
> work as it should--and it has already been demonstrated that many of the
> "interested" folk are *not* interested in seeing it work as it should.

Nothing works 100% of the time.  Even trained catalogers can have their
biases that pop up occasionally.  A system will work as long as it is in
the vital interests of participants that it does work, and as long as
there are adequate measures for dealing with those rogues who are intent
on disrupting any system, anywhere.  Webmasters who will go to all the
trouble to establish a website and then promote it with false advertising
are fools.  They could only hope to gain the anger and ostracism of those
who might visit their site.  It is also well to remember that the people
who pull pranks on the Internet, almost always do so anonymously.  They
are not, as a rule, interested in advertising to the world that they are
knaves.  You, Sir, are a pessimist. ;-)

>
> Solution: Get a relatively impartial group of people (or a really good fuzzy
> logic program) to create these fields and store them in a central database.
> While we're at it, let's call the group of people "catalogers", the data
> sets "bibliographic records" and the database a "catalog".  Also, either use
> a controlled keyword vocabulary or set up one heck of a SEE ALSO references
> list.
>
> Solution2: Start with a core of interested personnel.  Tell them what to
> look for, and monitor their work until they can be reasonably trusted to be
> impartial/truthful.  Create a pyramid, where each person spends some time
> checking the input of the newest personnel until those people can go on
> their own and check more people, etc., etc.--much in the same way the
> Internet Oracle works--rather than relying only on degreed catalogers, for
> example. Also, either use a controlled keyword vocabulary or set up one heck
> of a SEE ALSO references list, and make sure the personnel.
>
> Joel Hahn

Your solutions are far more complicated and require too many expert
personnel.  If the WWW explosion is only at its beginning, such high
levels of expert control will only increasingly become unable to deal
with the volume.

Eyler Coates

=============================================================
Please indicate if your response may be included in a redaction,
omitting repetitions and irrelevant matter, at:

                     Cataloging the Web
                Making the WWW More Accessible

   http://www.geocities.com/Athens/Forum/1683/cwindex.htm

==============================================================
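
The keyword-counting ranking that the message above objects to, and the
detection of pages padded with repeated terms, can be illustrated with a
minimal sketch. Nothing here is taken from any real search engine: the
function names, the 30% threshold, and the 20-word minimum are all
assumptions chosen only for illustration.

```python
import re
from collections import Counter


def _words(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text: str, top_n: int = 3) -> list[tuple[str, float]]:
    """Return the top_n most frequent words with their share of all words."""
    words = _words(text)
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]


def looks_stuffed(text: str, threshold: float = 0.3, min_words: int = 20) -> bool:
    """Flag text in which one word makes up more than `threshold` of all words.

    The threshold and the minimum length are hypothetical values, not
    figures used by any actual search engine.
    """
    if len(_words(text)) < min_words:
        return False  # too short to judge
    _, share = keyword_density(text, top_n=1)[0]
    return share > threshold
```

A page consisting of "cheap flights" repeated thirty times would be
flagged, while an ordinary descriptive paragraph would not. A cutoff like
this is crude, which is part of the point made in the thread: raw word
counts say little about what a page is actually about.
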




Date: Tue, 6 May 1997 01:50:22 -0400
Sender: "AUTOCAT: Library cataloging and authorities discussion group" AUTOCAT@LISTSERV.ACSU.BUFFALO.EDU
From: "Aardy R. DeVarque" aardy@anet-chi.NOJUNK.com
Subject: Re: Cataloging the Web

Eyler Coates, Sr. eyler.coates@worldnet.att.net wrote:
>Then what would be required is the cooperation of Webmasters in providing
>the necessary META tags.  Would they do it?  Of course they would do it!
> Right now, many of them try all kinds of tricks in order to get their
>page recognized, such as filling the page with certain key words.  The
>one thing that Webmasters want after going to all the trouble of creating
>a Webpage is for everyone to have access to it.  Surely, if they were
>required to put in the appropriate META tags to get listed properly, you
>can bet they would do it.

One problem: As you yourself say, "Right now, many of them try all kinds of
tricks in order to get their page recognized, such as filling the page with
certain key words."

What's to stop these same webmasters from dumping certain key words in the
META tags as well, whether or not the words actually describe the page?

Answer: Nothing.

Result: Just as many worthless hits.

Your idea is nice in theory, but it falls apart due to the chaos of the web
and the dishonesty of many webmasters that will try just about anything in
order to get "everyone to have access to [their page]."  Call me a
pessimist, but self-moderation only works as long as *everybody* wants it to
work as it should--and it has already been demonstrated that many of the
"interested" folk are *not* interested in seeing it work as it should.

Solution: Get a relatively impartial group of people (or a really good fuzzy
logic program) to create these fields and store them in a central database.
While we're at it, let's call the group of people "catalogers", the data
sets "bibliographic records" and the database a "catalog".  Also, either use
a controlled keyword vocabulary or set up one heck of a SEE ALSO references
list.

Solution2: Start with a core of interested personnel.  Tell them what to
look for, and monitor their work until they can be reasonably trusted to be
impartial/truthful.  Create a pyramid, where each person spends some time
checking the input of the newest personnel until those people can go on
their own and check more people, etc., etc.--much in the same way the
Internet Oracle works--rather than relying only on degreed catalogers, for
example. Also, either use a controlled keyword vocabulary or set up one heck
of a SEE ALSO references list, and make sure the personnel.

Joel Hahn
Feudalism: Serf & Turf

Mr. Coates has my permission to include this in his thread compilation.

From: "J. McRee Elrod" mac@SLC.BC.CA
Organization: Special Libraries Cataloguing, Inc.
Subject: Re: Cataloging the Web
Comments: To: PANCHYSH@LIB1.LAN.MCGILL.CA
Comments: cc: autocat@UBVM.cc.buffalo.edu
In-Reply-To: 199705042324.QAA18318@bmd2.baremetal.com

>There are 3 distinct approaches that I have encountered in my approach to
>this question.

Perhaps a fourth could be added to these: bookmarks on a library
homepage.  Most I have seen segregate bookmarks into those intended for
patrons, and those intended for staff (mainly technical services staff).
Why not have library homepages for each subject division, e.g., social
sciences, humanities, and sciences, with relevant bookmarks?

The major difficulty I have with creating MARC records for many of these
sources is that they are in such a state of flux.  The bookmark list(s)
could be seen as analogous to vertical or pamphlet files, where we put
material of current interest which is too ephemeral for full cataloguing.

Mac

   __       __   J. McRee (Mac) Elrod (mac@slc.bc.ca)
  {__  |   /     Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
  ___} |__ \__________________________________________________________

Subject: Re: Cataloging the Web
Date: Sat, 3 May 1997 13:02:44 EST5EDT
From: "Roman S. Panchyshyn, McGill University" PANCHYSH@LIB1.LAN.MCGILL.CA
Reply-To: "AUTOCAT: Library cataloging and authorities discussion group" AUTOCAT@LISTSERV.ACSU.BUFFALO.EDU, "Roman S. Panchyshyn, McGill University" PANCHYSH@LIB1.LAN.MCGILL.CA
To: AUTOCAT@LISTSERV.ACSU.BUFFALO.EDU

There are 3 distinct approaches that I have encountered in my approach to
this question.

First, there is the Intercat project at OCLC and the emergence of the 856
MARC field, which has allowed cataloguers to provide access to electronic
information via MARC records.  While this works fairly well, first
generation automation systems usually do not have web-based OPACs which
allow for "hot links" to the items themselves.

The second and third approaches involve other forms of metadata.  There
is the approach which uses Text Encoding Initiative (TEI) headers for
electronic documents which have been marked up using SGML.  The Library
of Congress has been working on software which would allow for conversion
of TEI headers to MARC and vice-versa.  They are probably better suited
to comment on the current status of this project than I am.

Third, is the Dublin Core approach.  This approach would see creators of
electronic documents embed a core set of elements in the meta tags of
HTML documents.  These elements, as has already been pointed out on this
list, have also been mapped to USMARC in a recent MARBI DP.  For more
information on the DC, I would suggest readers contact Stuart Weibel at
OCLC.  He is one of the major developers of this initiative.

My opinions on this issue are very basic ones.  If no standards for
cataloguing and providing access to these materials are decided upon, we
may be faced with an enormous "virtual backlog" of Internet resources.
If cataloguers fail to provide access to these materials, then we are not
providing the best service possible to our users.

Any comments?

Roman S. Panchyshyn
Central Technical Services
Redpath Library
McGill University
Montreal, QC
e-mail: panchysh@lib1.lan.mcgill.ca

"Si hoc signum legere potes, operis boni in rebus Latinus alacribus et
fructuosis potiri potes!"

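
The Dublin Core approach described above, with elements embedded in the
meta tags of an HTML document, can be sketched as follows. This is only
an illustration: the "DC." prefix on the name attribute is assumed from
later Dublin Core encoding guidance rather than from this thread, and
the sample page, the class name, and the function name are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page; the "DC." prefix on element names is an assumption.
SAMPLE_PAGE = """
<html><head>
<title>Cataloging the Web</title>
<meta name="DC.title" content="Cataloging the Web">
<meta name="DC.creator" content="Example Author">
<meta name="DC.subject" content="cataloging">
<meta name="DC.subject" content="metadata">
<meta name="keywords" content="not, a, dublin, core, element">
</head><body></body></html>
"""


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.elements: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").strip()
        content = attr.get("content")
        if name.lower().startswith("dc.") and content is not None:
            # Elements may repeat, so each key maps to a list of values.
            self.elements.setdefault(name[3:].lower(), []).append(content)


def extract_dublin_core(html: str) -> dict[str, list[str]]:
    """Return the Dublin Core elements found in the given HTML text."""
    parser = DublinCoreParser()
    parser.feed(html)
    parser.close()
    return parser.elements
```

Calling extract_dublin_core(SAMPLE_PAGE) collects the title, creator, and
both subject values while ignoring the plain "keywords" tag. A record
gathered this way is the kind of input that a Dublin Core to USMARC
mapping would start from.
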
Subject: Re; Cataloging the Web -Reply
Date: Fri, 2 May 1997 11:13:04 -0600
From: Vianne Sha Sha@LAW.MISSOURI.EDU
Newsgroups: bit.listserv.autocat

My comments are in between ***** below.

>>> Gordon Pew Gordon_Pew@ccmail.ca1.uscourts.gov 05/02/97 09:09am >>>

Since I am now a law cataloger, I am interested specifically in what
other law libraries are doing on this front (if anything).

First, one thread of the discussion is on metadata.  Does this imply that
the original poster on this issue wanted to embed better internal
indexing in Web documents?  I thought the issue was whether we catalogers
should create bibliographic records, resident in our OPACS, for documents
available on the WWW.

....

Are any law libraries beginning to catalog the Web in any way?  Who in
the library decides which Web documents should be cataloged (with an
example or two)?  How do/would you monitor Web resources to be sure they
haven't been substantially changed or even deleted from the Internet?  I
am not so much interested in the mechanics (classification, description,
subject analysis) of cataloging Internet resources: rather, the rationale
for doing it in the first place.

*****
Our library started cataloging Internet resources since we joined the
OCLC InterCat project in 1995.  We use OCLC to catalog Internet resources
as well as any other resources.  I believe some libraries have a
collection development committee to decide what to catalog.  I have
drafted a selection policy for my library with other librarians'
consensus and decided what to catalog according to my guidelines.

A URL checking program is used to monitor the URL changes of the
resources we cataloged.  Sometimes human review is necessary to verify
content changes.  OCLC's PURL is a good way to avoid changing the URL in
OCLC and local catalog even though you still need to change the URL in
the PURL server.  If you want to know more about what I propose to solve
this problem in the library field, please attend the B-6 program in
American Association of Law Libraries Annual Meeting in Baltimore.

My rationale for cataloging Internet resources is based on:

1) integrating all formats of information resources under one search
   engine (the library OPAC) for our local users;
2) Internet resources are information that is no different from other
   information resources such as books and microfilms;
3) we can never catalog all the books in the world, but we are still
   selecting and cataloging them.  The same theory applies to Internet
   resources.

I don't think any system can organize all Internet resources in the world
right now, but we are still trying to organize them and make some good
uses of them.
*****

=====================================
Vianne Sha
Automation & Bibliographic Management Librarian
sha@law.missouri.edu
University of Missouri-Columbia School of Law Library
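
The message above mentions a URL checking program without describing it.
What follows is not that program, only a minimal sketch of how such a
check might look, assuming a plain HTTP status test. The function names
and the report wording are hypothetical, and a status check of this kind
says nothing about content changes, which is why the message notes that
human review is sometimes still needed.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, malformed URL, etc.


def check_links(urls, fetch=fetch_status):
    """Map each cataloged URL to 'ok', 'broken (<code>)', or 'unreachable'.

    `fetch` is injectable so the reporting logic can be exercised
    without touching the network.
    """
    report = {}
    for url in urls:
        status = fetch(url)
        if status is None:
            report[url] = "unreachable"
        elif 200 <= status < 400:
            report[url] = "ok"
        else:
            report[url] = f"broken ({status})"
    return report
```

Run periodically over the URLs recorded in a catalog, check_links would
produce a list of records needing attention. Entries marked broken or
unreachable are candidates for an updated address or for withdrawal.
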
