Cataloging the Web

Making the WWW More Accessible
Recent Postings



Subject: Re: Cataloging the Web
Date: Thu, 08 May 1997 19:42:07 -0700
From: "Eyler Coates, Sr." eyler.coates@worldnet.att.net
Organization: http://www.webspawner.com/users/EylerCoates/
To: AUTOCAT@LISTSERV.ACSU.BUFFALO.EDU
References: 970508010853.2.14354@mtigwc01

> "Aardy R. DeVarque"  wrote:
>
> Eyler Coates, Sr.  wrote:
> >Then what would be required is the cooperation of Webmasters in providing
> >the necessary META tags.  Would they do it?  Of course they would do it!
> > Right now, many of them try all kinds of tricks in order to get their
> >page recognized, such as filling the page with certain key words.  The
> >one thing that Webmasters want after going to all the trouble of creating
> >a Webpage is for everyone to have access to it.  Surely, if they were
> >required to put in the appropriate META tags to get listed properly, you
> >can bet they would do it.
>
> One problem: As you yourself say, "Right now, many of them try all kinds of
> tricks in order to get their page recognized, such as filling the page with
> certain key words."
>
> What's to stop these same webmasters from dumping certain key words in the
> META tags as well, whether or not the words actually describe the page?
>
> Answer: Nothing.

Not at all.  There are still things like honor and honesty.  A decent
person discovers early on that his interests are not furthered by
trickery and deceit.  Those who use such methods are discredited and
receive the opprobrium of the very people they are trying to attract.
But even more than moral disincentives, it would be a simple matter for a
Search Engine to have a button for reporting such dishonesty by users,
and for the Search Engine, after a simple investigation, to remove the
offending Website from its files.  There would be nothing to be gained by
indulging in such intellectual vandalism.

>
> Result: Just as many worthless hits.
>

Not very likely.  Dishonesty seldom pays in the long run.  It is true
that there would probably be a small number of mischievous kids that
might try such tricks in spite of the consequences.  But since adequate
protections could easily be in place for dealing with such people, why
let immature or lawless elements prevent what is otherwise reasonable
policy?

> Your idea is nice in theory, but it falls apart due to the chaos of the web
> and the dishonesty of many webmasters that will try just about anything in
> order to get "everyone to have access to [their page]."

It should be noted especially that those webmasters who load their sites
with key terms now do so not to deceive people into going to a site
unrelated to their quest.  They do it to overcome the insane
system we now have, which "catalogs" sites based on the number of times
certain words appear on a page of text.  Moreover, even now, some Search
Engines can detect this tactic and reject such pages.  Webmasters are not
more dishonest than other people; they are just seeking effective means
for promoting their pages.
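
As a rough illustration of how an engine might detect the key-word loading tactic described above, the sketch below flags a page when a single word accounts for an unusually large share of its text. This is not any actual Search Engine's method; the function names and both thresholds (`max_share`, `min_words`) are assumptions chosen only for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def looks_stuffed(text, max_share=0.25, min_words=20):
    """Return True if one word makes up more than max_share of the text.

    Both thresholds are illustrative assumptions, not values taken
    from any real search engine.
    """
    words = tokenize(text)
    if len(words) < min_words:
        # Too little text to judge either way.
        return False
    _, count = Counter(words).most_common(1)[0]
    return count / len(words) > max_share


if __name__ == "__main__":
    stuffed_page = "travel " * 30 + "cheap flights and hotels"
    print(looks_stuffed(stuffed_page))
```

A check of this kind only catches crude repetition; it says nothing about whether the words honestly describe the page, which is the point in dispute in this thread.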

> Call me a
> pessimist, but self-moderation only works as long as *everybody* wants it to
> work as it should--and it has already been demonstrated that many of the
> "interested" folk are *not* interested in seeing it work as it should.

Nothing works 100% of the time.  Even trained catalogers can have their
biases that pop up occasionally.  A system will work as long as it is in
the vital interests of participants that it does work, and as long as
there are adequate measures for dealing with those rogues who are intent
on disrupting any system, anywhere.  Webmasters who will go to all the
trouble to establish a website and then promote it with false advertising
are fools.  They could only hope to gain the anger and ostracism of those
who might visit their site.  It is also well to remember that the people
who pull pranks on the Internet almost always do so anonymously.  They
are not, as a rule, interested in advertising to the world that they are
knaves.  You, Sir, are a pessimist. ;-)

>
> Solution: Get a relatively impartial group of people (or a really good fuzzy
> logic program) to create these fields and store them in a central database.
> While we're at it, let's call the group of people "catalogers", the data
> sets "bibliographic records" and the database a "catalog".  Also, either use
> a controlled keyword vocabulary or set up one heck of a SEE ALSO references
> list.
>
> Solution2: Start with a core of interested personnel.  Tell them what to
> look for, and monitor their work until they can be reasonably trusted to be
> impartial/truthful.  Create a pyramid, where each person spends some time
> checking the input of the newest personnel until those people can go on
> their own and check more people, etc., etc.--much in the same way the
> Internet Oracle works--rather than relying only on degreed catalogers, for
> example. Also, either use a controlled keyword vocabulary or set up one heck
> of a SEE ALSO references list, and make sure the personnel.
>
> Joel Hahn

Your solutions are far more complicated and require too many expert
personnel.  If the WWW explosion is only at its beginning, such high
levels of expert control will only increasingly become unable to deal
with the volume.

Eyler Coates

=============================================================
Please indicate if your response may be included in a redaction,
omitting repetitions and irrelevant matter, at:

                     Cataloging the Web
                Making the WWW More Accessible

   http://www.geocities.com/Athens/Forum/1683/cwindex.htm

==============================================================







Date:         Tue, 6 May 1997 01:50:22 -0400
Sender:       "AUTOCAT: Library cataloging and authorities discussion group"
              AUTOCAT@LISTSERV.ACSU.BUFFALO.EDU
From:         "Aardy R. DeVarque" aardy@anet-chi.NOJUNK.com
Subject:      Re: Cataloging the Web

Eyler Coates, Sr. eyler.coates@worldnet.att.net wrote:
>Then what would be required is the cooperation of Webmasters in providing
>the necessary META tags.  Would they do it?  Of course they would do it!
> Right now, many of them try all kinds of tricks in order to get their
>page recognized, such as filling the page with certain key words.  The
>one thing that Webmasters want after going to all the trouble of creating
>a Webpage is for everyone to have access to it.  Surely, if they were
>required to put in the appropriate META tags to get listed properly, you
>can bet they would do it.

One problem: As you yourself say, "Right now, many of them try all kinds of
tricks in order to get their page recognized, such as filling the page with
certain key words."

What's to stop these same webmasters from dumping certain key words in the
META tags as well, whether or not the words actually describe the page?

Answer: Nothing.

Result: Just as many worthless hits.

Your idea is nice in theory, but it falls apart due to the chaos of the web
and the dishonesty of many webmasters that will try just about anything in
order to get "everyone to have access to [their page]."  Call me a
pessimist, but self-moderation only works as long as *everybody* wants it to
work as it should--and it has already been demonstrated that many of the
"interested" folk are *not* interested in seeing it work as it should.

Solution: Get a relatively impartial group of people (or a really good fuzzy
logic program) to create these fields and store them in a central database.
While we're at it, let's call the group of people "catalogers", the data
sets "bibliographic records" and the database a "catalog".  Also, either use
a controlled keyword vocabulary or set up one heck of a SEE ALSO references
list.

Solution2: Start with a core of interested personnel.  Tell them what to
look for, and monitor their work until they can be reasonably trusted to be
impartial/truthful.  Create a pyramid, where each person spends some time
checking the input of the newest personnel until those people can go on
their own and check more people, etc., etc.--much in the same way the
Internet Oracle works--rather than relying only on degreed catalogers, for
example. Also, either use a controlled keyword vocabulary or set up one heck
of a SEE ALSO references list, and make sure the personnel.

Joel Hahn
Feudalism: Serf & Turf
Mr. Coates has my permission to include this in his thread compilation.







From:         "J. McRee Elrod" mac@SLC.BC.CA
Organization: Special Libraries Cataloguing, Inc.
Subject:      Re: Cataloging the Web
Comments: To: PANCHYSH@LIB1.LAN.MCGILL.CA
Comments: cc: autocat@UBVM.cc.buffalo.edu
In-Reply-To:  199705042324.QAA18318@bmd2.baremetal.com

>There are 3 distinct approaches that I have encountered in my approach to
>this question.

Perhaps a fourth could be added to these: bookmarks on a library
homepage.  Most I have seen segregate bookmarks into those intended for
patrons, and those intended for staff (mainly technical services staff).

Why not have library homepages for each subject division, e.g., social
sciences, humanities, and sciences, with relevant bookmarks?

The major difficulty I have with creating MARC records for many of these
sources is that they are in such a state of flux.  The bookmark list(s)
could be seen as analogous to vertical or pamphlet files, where we put
material of current interest which is too ephemeral for full
cataloguing.

Mac

   __       __   J. McRee (Mac) Elrod (mac@slc.bc.ca)
  {__  |   /     Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
  ___} |__ \__________________________________________________________







Subject: Re: Cataloging the Web
Date: Sat, 3 May 1997 13:02:44 EST5EDT
From: "Roman S. Panchyshyn, McGill University" PANCHYSH@LIB1.LAN.MCGILL.CA
Reply-To: "AUTOCAT: Library cataloging and authorities discussion group" AUTOCAT@LISTSERV.ACSU.BUFFALO.EDU,
     "Roman S. Panchyshyn, McGill University" PANCHYSH@LIB1.LAN.MCGILL.CA
To: AUTOCAT@LISTSERV.ACSU.BUFFALO.EDU

There are 3 distinct approaches that I have encountered in my approach to
this question. First, there is the InterCat project at OCLC and the
emergence of the 856 MARC field, which has allowed cataloguers to provide
access to electronic information via MARC records. While this works fairly
well, first generation automation systems usually do not have web-based
OPACs which allow for "hot links" to the items themselves.
The second and third approaches involve other forms of metadata. There is
the approach which uses Text Encoding Initiative (TEI) headers for
electronic documents which have been marked up using SGML. The Library of
Congress has been working on software which would allow for conversion of
TEI headers to MARC and vice-versa. They are probably better suited to
comment on the current status of this project than I am.
Third is the Dublin Core approach. This approach would see creators of
electronic documents embed a core set of elements in the meta tags of HTML
documents. These elements, as has already been pointed out on this list,
have also been mapped to USMARC in a recent MARBI DP. For more
information on the DC, I would suggest readers contact Stuart Weibel at
OCLC. He is one of the major developers of this initiative.
My opinions on this issue are very basic ones. If no standards for
cataloguing and providing access to these materials are decided upon, we may
be faced with an enormous "virtual backlog" of Internet resources.
If cataloguers fail to provide access to these materials, then we are not
providing the best service possible to our users.
Any comments?
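
The Dublin Core approach described above, in which a document's creator embeds a core set of elements in HTML meta tags, might be sketched as follows. This is a minimal illustration only: the `DC.` prefix on element names is one common convention, and the function name and sample values are assumptions made for the example, not part of any specification quoted in this thread.

```python
from html import escape


def dublin_core_meta(elements):
    """Render (element, value) pairs as HTML meta tags.

    The "DC." name prefix is a common convention; the element names
    passed in are whatever the caller chooses.
    """
    lines = []
    for name, value in elements:
        lines.append(
            '<meta name="DC.{}" content="{}">'.format(
                escape(name, quote=True), escape(value, quote=True)
            )
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample values are hypothetical.
    record = [
        ("title", "Cataloging the Web"),
        ("subject", "Cataloging of Internet resources"),
        ("date", "1997-05-03"),
    ]
    print(dublin_core_meta(record))
```

Escaping the values matters because a title containing a quotation mark would otherwise end the `content` attribute early.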

Roman S. Panchyshyn
Central Technical Services
Redpath Library
McGill University
Montreal, QC
e-mail: panchysh@lib1.lan.mcgill.ca

 "Si hoc signum legere potes, operis boni in rebus
 Latinus alacribus et fructuosis potiri potes!"





Subject: Re: Cataloging the Web -Reply
Date: Fri, 2 May 1997 11:13:04 -0600
From: Vianne Sha Sha@LAW.MISSOURI.EDU
Newsgroups: bit.listserv.autocat

My comments are in between ***** below.

>>> Gordon Pew Gordon_Pew@ccmail.ca1.uscourts.gov 05/02/97 09:09am >>>
        Since I am now a law cataloger, I am interested specifically in
     what other law libraries are doing on this front (if anything).
        First, one thread of the discussion is on metadata. Does this imply
     that the original poster on this issue wanted to embed better internal
     indexing in Web documents?  I thought the issue was whether we
     catalogers should create bibliographic records, resident in our OPACS,
     for documents available on the WWW.  ....  Are
     any law libraries beginning to catalog the Web in any way?  Who in the
     library decides which Web documents should be cataloged (with an
     example or two)?  How do/would you monitor Web resources to be sure
     they haven't been substantially changed or even deleted from the
     Internet?  I am not so much interested in the mechanics
     (classification, description, subject analysis) of cataloging Internet
     resources: rather, the rationale for doing it in the first place.

*****
Our library has been cataloging Internet resources since we joined the OCLC
InterCat project in 1995.  We use OCLC to catalog Internet resources as well
as any other resources.  I believe some libraries have a collection
development committee to decide what to catalog.  I have drafted a selection
policy for my library with other librarians' consensus and decided what to
catalog according to my guidelines.  A URL checking program is used to monitor
the URL changes of the resources we cataloged.  Sometimes human review is
necessary to verify content changes.  OCLC's PURL is a good way to avoid
changing the URL in OCLC and local catalog even though you still need to
change the URL in the PURL server.  If you want to know more about what I
propose to solve this problem in the library field, please attend the B-6
program in American Association of Law Libraries Annual Meeting in Baltimore.

My rationale for cataloging Internet resources is based on: 1) integrating all
formats of information resources under one search engine (the library OPAC)
for our local users; 2) Internet resources are information that is no
different from other information resources such as books and microfilms; 3) we
can never catalog all the books in the world, but we are still selecting and
cataloging them. The same theory applies to Internet resources.  I don't think
any system can organize all Internet resources in the world right now, but we
are still trying to organize them and make some good uses of them.
*****

=====================================
Vianne Sha
Automation & Bibliographic Management Librarian
sha@law.missouri.edu
University of Missouri-Columbia
School of Law Library