Cataloging the Web

Making the WWW More Accessible
The Initial Posting



Subject: Cataloging the Web
Date: Sun, 27 Apr 1997 08:38:26 -0700
From: "Eyler Coates, Sr." 
Organization: http://www.webspawner.com/users/EylerCoates/
Newsgroups: bit.listserv.autocat,schl.sig.lmnet

The current issue of Slate (http://www.slate.com) contains an article by
Bill Barnes in his Webhead column, "Search Me," on the inadequacy of
access to "the vast resources" of the Web as compared to any library.  He
examines the various search engines available, and details how
unsatisfactory they are.  He didn't really propose a solution to the
problem, but his conclusion included the following:

"If the producers of every site cataloged it themselves, then Yahoo!
wouldn't have a hard time keeping up with them. Of course, everyone would
have to agree on standard ways to do this, and if everyone agreed,
for-profit search sites like Yahoo! probably wouldn't be necessary."

A lot of people have discussed this problem, but no one that I have
discovered has proposed a simple, comprehensive, workable solution as
yet.  I posted to Slate's "The Fray" an outline of the following
possible overall structure that might meet the problem.  It is presented
in more detail here in hopes that other people might have some input, and
we can collectively arrive at a mechanism for providing better access to
the WWW.

CATALOGING THE WEB

A practical system must take into account the nature of materials on the
Web, the people who create them, the search engines that find them, and
the needs of the people who research them.  A system similar to that provided
by library services will not work, because so many of the elements are
different.  For example, classification systems (such as Dewey, LC) are
irrelevant because they are designed for grouping physical objects on
shelves for browsing, access and retrieval.  Unnecessary elements only
add unnecessary complication.  Web document research doesn't need
complete bibliographic data.  What's really needed is an efficient means
of retrieval.  If users want more complete information, they can click on
the document itself, unlike in a library where they would need to go up
an elevator to the fourth floor to look at the document.  Therefore, Web
Cataloging need only concern itself with retrieving a good list of mostly
relevant documents that the user can then examine more closely.

Another factor is the level of expertise of the people who will
necessarily be doing the cataloging.  Web resources are already vast and
constantly changing, and they promise only to become more so.  It is
necessary, therefore, that the cataloging be simple enough to be done by
an ordinary Webmaster and not require the services of a professional.

The basic requirements of people who search the Web for materials are (1)
Titles, (2) Names, (3) Subjects, and (4) Brief Abstracts accompanying the
results of searches for the first three elements.  Other elements, such
as dates, publishers (Webmasters?), editions, etc., could be obtained by
clicking on the document itself.  It is interesting that computers give
incredible forms of access to data, but to date, the best that Search
Engines can do has been to index every word that appears on a webpage.
This, however, produces very unsatisfactory search results most of the
time.  If searches could be conducted via the first three elements above, with
an abstract included with each finding, the results would be far more
satisfactory for users.

What is needed, therefore, are four page attributes, which could be
included in each document's head.  Two of these are already found on most
webpages.

(1) TITLES.  Already included in the header of every Webpage:
        <TITLE>--------------</TITLE>

(2) DESCRIPTION (Abstract).  This is a standard META tag, using the
designation:
        <META NAME="description" CONTENT="--------------">.

(3) NAMES.  This would require a new META tag, which could include up to
five names, selected appropriately by the Webmaster, of authors, editors,
corporations, subjects of biographies, etc., using the designation:
        
Distinguishing between authors, editors, etc., would be unnecessary,
because the abstract should make clear the relationship of the various
names to the page content.  Also, the document is right there and can be
clicked on if the user wants more precise information.

(4) SUBJECTS.  This would require the existence of a standard list of
subject headings, probably made available by the Search Engine (see
below), from which Webmasters could select (with help) up to five
appropriate headings for their own page and put them in a tag:
        
Rather than use something as complicated as the Library of Congress
Subject Headings, something less detailed, such as the Sears List of
Subject Headings, would probably be sufficient for the WWW.  The only problem is that
such a list is not presently available on the Web, although it probably
would not be too difficult to establish a similar list of subject
headings that Webmasters could use.  If some public institution supplied
such a list on the WWW, all search engines could use it, and everybody
would benefit from the uniformity.  Simplicity combined with lots of help
would be necessary, because it would require application by Webmasters
without professional training.
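
The four attributes above can be sketched as a small extractor.  This is
only an illustration, not a specification: the META names "names" and
"subjects", the semicolon used to separate entries, and the sample page
are all assumptions made for the example.  Only TITLE and the
"description" META tag are existing conventions.

```python
from html.parser import HTMLParser

# Hypothetical META names: the proposal does not fix their spelling,
# so "names" and "subjects" are assumptions made for this sketch.
LIST_FIELDS = ("names", "subjects")
MAX_ENTRIES = 5  # the proposal allows up to five names and five subjects


class HeadCatalogParser(HTMLParser):
    """Collect the title and the catalog META tags from a page."""

    def __init__(self):
        super().__init__()
        self.record = {"title": "", "description": "",
                       "names": [], "subjects": []}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            content = attr.get("content") or ""
            if name == "description":
                self.record["description"] = content.strip()
            elif name in LIST_FIELDS:
                # Assumed separator: ";" so that "Surname, Forename" survives.
                entries = [e.strip() for e in content.split(";") if e.strip()]
                self.record[name] = entries[:MAX_ENTRIES]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Body text is ignored; only text inside <title> is kept.
        if self._in_title:
            self.record["title"] += data


def catalog_record(html):
    """Return a dict with title, description, names and subjects."""
    parser = HeadCatalogParser()
    parser.feed(html)
    parser.close()
    parser.record["title"] = parser.record["title"].strip()
    return parser.record


SAMPLE_PAGE = """<html><head>
<title>Cataloging the Web</title>
<meta name="description" content="A proposal for four head attributes.">
<meta name="names" content="Coates, Eyler; Barnes, Bill">
<meta name="subjects" content="Cataloging; Search engines; World Wide Web">
</head><body>Page text is ignored.</body></html>"""

record = catalog_record(SAMPLE_PAGE)
```

A Webmaster's tool could run the same extractor over a page before
publishing it, to confirm that all four attributes are present and that
no more than five names or subjects were entered.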

Note that the Search Engine itself would contain the equivalent of "See"
and "See Also" references.  "See" references would automatically bring up
the referenced materials in most cases, whereas "See Also" references
should present options to the user.  In some cases, a search could bring
up a request for more specific attributes from the user.  All of these
would be elements built into the search engine's program.
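
As a rough sketch of how "See" and "See Also" references might behave
inside a search engine: a "See" entry silently redirects the query to
the preferred heading, while "See Also" entries are only offered to the
user.  The two tables and the function name are hypothetical; real
entries would come from whatever shared subject list is adopted.

```python
# Hypothetical reference tables, invented for illustration only.
SEE = {"movies": "motion pictures", "cars": "automobiles"}
SEE_ALSO = {"automobiles": ["trucks", "transportation"]}


def resolve_subject(query):
    """Return (heading_to_search, related_headings_to_offer)."""
    term = query.strip().lower()
    heading = SEE.get(term, term)        # "See": follow automatically
    related = SEE_ALSO.get(heading, [])  # "See Also": present as options
    return heading, list(related)


heading, options = resolve_subject("Cars")
```

Here a search for "Cars" would be run against "automobiles", and the
user would be shown "trucks" and "transportation" as further choices.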

The first requirement to initiate the system would be for some
enterprising Search Engine to offer searches based on the META tags
described above.  This need not replace the present word searches, and it
would permit the system to be introduced gradually.
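
A search of this kind might look something like the sketch below, which
indexes only the title, names, and subjects of each page and returns the
abstract with every hit.  The record layout, the example.org addresses,
and the function names are assumptions made for the example.

```python
from collections import defaultdict

SEARCH_FIELDS = ("title", "names", "subjects")

# Hypothetical catalog records, one per page.
RECORDS = {
    "http://example.org/a": {
        "title": "Cataloging the Web",
        "names": ["Coates, Eyler"],
        "subjects": ["Cataloging"],
        "description": "A proposal for META-based cataloging.",
    },
    "http://example.org/b": {
        "title": "Search Me",
        "names": ["Barnes, Bill"],
        "subjects": ["Search engines"],
        "description": "A column on the limits of search engines.",
    },
}


def build_index(records):
    """Map (field, lowercase word) to the set of URLs containing it."""
    index = defaultdict(set)
    for url, rec in records.items():
        for field in SEARCH_FIELDS:
            value = rec.get(field, "")
            texts = value if isinstance(value, list) else [value]
            for text in texts:
                for word in text.lower().replace(",", " ").split():
                    index[(field, word)].add(url)
    return index


def search(index, records, field, query):
    """Return (url, abstract) pairs whose field holds every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get((field, w), set()) for w in words))
    return [(url, records[url].get("description", "")) for url in sorted(hits)]


INDEX = build_index(RECORDS)
results = search(INDEX, RECORDS, "names", "coates")
```

Because each field is indexed separately, a search for "web" as a name
finds nothing, while the same word searched as a title finds the first
page.  Ordinary full-text word search could continue to run alongside it.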

Then what would be required is the cooperation of Webmasters in providing
the necessary META tags.  Would they do it?  Of course they would do it!
Right now, many of them try all kinds of tricks in order to get their
page recognized, such as filling the page with certain key words.  The
one thing that Webmasters want after going to all the trouble of creating
a Webpage is for everyone to have access to it.  If they were required
to put in the appropriate META tags to get listed properly, you can bet
they would do it.

Search engines that rely on just the META tags will be easier to set up.
Since a search engine would be concerned only with the data in a page's
heading, this would require less storage and might even enable a search
engine to visit every page on the Web, and do it more frequently.
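
To illustrate the storage point, here is a minimal sketch of a helper
that keeps only the head of a page and discards the rest.  The function
name is hypothetical, and the sketch assumes the page marks the end of
its head with a closing HEAD tag; a page without one is kept whole.

```python
import re

_HEAD_END = re.compile(r"</head\s*>", re.IGNORECASE)


def head_only(html):
    """Return the page up to and including </head>, or all of it if absent."""
    match = _HEAD_END.search(html)
    return html[: match.end()] if match else html


page = ("<html><head><title>T</title></head><body>"
        + "x" * 1000 + "</body></html>")
kept = head_only(page)
```

How much is actually saved would depend on the pages themselves; the
sketch only shows that the body need not be stored at all.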

Eyler Coates

--
============================================================
        Thomas Jefferson on Politics & Government
        http://pages.prodigy.com/jefferson_quotes

  Eyler Robert Coates, Sr.  eyler.coates@worldnet.att.net
============================================================