International Journal of Scientometrics, Informetrics and Bibliometrics
ISSN 1137-5019

NEW APPROACHES: VISUALIZATION

HYPERTEXTUAL RECOVERING

An emerging discipline within information retrieval from large hypertext corpora proposes using the links of the World Wide Web to improve the performance (precision) of search engines. Sullivan (1998) provides a brief introduction in "Counting Clicks and Looking at Links".

Several promising projects are under way, and all of them could have a deep impact on scientometric research on the Internet:

• BIRD is a bibliometric query-by-example search engine. Given a set of pages of interest to the user, it retrieves a set of similar documents by following citation paths that pass through those given documents.

• The CLEVER project builds on Jon Kleinberg's HITS (Hypertext-Induced Topic Search) algorithm, which seeks to find authoritative sources (Authorities) of information on the Web, together with sites (Hubs) featuring good compilations of such authoritative sources. The original HITS algorithm, devised while Kleinberg was a visiting scientist at IBM Almaden, first uses a standard text search engine to gather a "root set" of pages matching the query subject. Next, it adds to the pool all pages pointing to or pointed to by the root set. Thereafter, it uses only the links between these pages to distill the best authorities and hubs. The key insight is that these links capture the annotative power (and effort) of millions of individuals independently building web pages.

° J. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Also appears as IBM Research Report RJ 10076, May 1997.

° J. Kleinberg and Steve Lawrence. The Structure of the Web. Science, Vol 294, Issue 5548, 1849-1850, 30 November 2001.

° D. Gibson, J. Kleinberg, P. Raghavan. Inferring Web communities from link topology. Proc. 9th ACM Conference on Hypertext and Hypermedia, 1998.

° S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, S. Rajagopalan, Automatic resource list compilation by analyzing hyperlink structure and associated text. Proc. 7th International World Wide Web Conference, 1998.

° S. Chakrabarti, et al. (1999) Hypersearching the Web.

° S. Chakrabarti, et al. (1999) Mining the link structure of the World Wide Web.

° Kleinberg J et al (1999) The Web as a graph: Measurements, models and methods.

° D. Gibson, J. Kleinberg, P. Raghavan. Structural Analysis of the World Wide Web. Invited position paper at the WWW Consortium Web Characterization Workshop, November 1998.
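
The link-analysis step of HITS described above (using only the links within the expanded pool of pages to distill authorities and hubs) can be sketched roughly as follows. This is a minimal illustration, not the CLEVER implementation: the function names, the fixed iteration count, and the toy link graph are assumptions made for the example, and the text-search step that gathers the root set is left out.

```python
from math import sqrt


def normalize(scores):
    """Scale scores so that their squares sum to 1 (or leave zeros alone)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Hypothetical helper: links maps each page to the pages it points to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority score sums the hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # A page's hub score sums the authority scores of pages it links to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, ())) for p in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Toy pool of pages (assumed data): three compilation pages and two sources.
links = {
    "hub1": {"a", "b"},
    "hub2": {"a", "b"},
    "hub3": {"a"},
    "a": set(),
    "b": set(),
}
hub, auth = hits(links)
best_authority = max(auth, key=auth.get)
```

In this toy graph the page with the most incoming links from good compilations ends up as the top authority, and pages that point to more authorities score higher as hubs.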

• CiteSeer. Autonomous Citation Indexing (ACI) automates the construction of citation indexes.

° Bollacker, K.; Lawrence, Steve; Giles, C. Lee (1998). "CiteSeer: An Autonomous Web Agent for Automatic Retrieval and Identification of Interesting Publications". Proceedings of the 2nd International ACM Conference on Autonomous Agents, pp. 116.

° Giles, C. Lee; Lawrence, Steve & Krovetz, Bob (1998). "Access to Information on the Web". Letter to Science, 280, (5371):1815.

° Giles, C. Lee; Bollacker, K. & Lawrence, Steve (1998). "CiteSeer: An Automatic Citation Indexing System". Proceedings of the 3rd ACM Conference on Digital Libraries, pp. 89-98.

° Lawrence, Steve & Giles, C. Lee (1998). "Context and Page Analysis for Improved Web Search". IEEE Internet Computing, 2(4), July/August 1998:38-46.

° Lawrence, Steve & Giles, C. Lee. (1998). "Searching the World Wide Web". Science, 280(5360):98.

° Lawrence, Steve & Giles, C. Lee (1998). "Inquirus, The NECI Meta Search Engine". Proceedings of the Seventh International World Wide Web Conference, Brisbane, Australia, 1998.

° Lawrence, Steve & Giles, C. Lee (1998). "Searching the Web". Letter to Science, 281(5374):175.

• Web Archeology. A project by Digital Research:

° Bharat, Krishna & Henzinger, Monika R. (1998). "Improved Algorithms for Topic Distillation in Hyperlinked Environments". Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998.

° Bharat, Krishna; Broder, Andrei; Henzinger, Monika; Kumar, Puneet & Venkatasubramanian, Suresh (1998). "The connectivity server: Fast access to linkage information on the Web". Proceedings of the 7th International World Wide Web Conference, 469-477, April 1998.

• Google is becoming one of the main search engines. It also takes a very clever approach to the problem of searching large volumes of information, making use of hypertext links.

° Brin, Sergey & Page, Lawrence (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the 7th International World Wide Web Conference, April 1998.

° Cho, Junghoo; Garcia-Molina, Hector & Page, Lawrence (1998). Efficient Crawling Through URL Ordering. Proceedings of the 7th International World Wide Web Conference, April 1998.

• The Open GRiD Project is a proposal to add peer review and peer recognition to the Google model in order to improve the ranking of web pages. Its author, Maxim Lifantsev, calls it the voting model.

° Lifantsev, Maxim (2000). Open Peer-Review as Web's Self-Organization Force. Submitted to the 26th International Conference on Very Large Databases, Cairo, Egypt, September 2000.

° Lifantsev, Maxim (2000). Voting Model for Ranking Web Pages. Accepted to the 1st International Conference on Internet Computing, Las Vegas, U.S.A., June 2000.

° Lifantsev, Maxim (1999). Rank Computation Methods for Web Documents. Technical Report TR-76, ECSL, Department of Computer Science, SUNY at Stony Brook, Stony Brook, NY, November 1999.

NEURAL NETWORKS AND SELF-ORGANISED MAPS

There is an extensive directory of bibliographies about neural networks at the University of Waterloo.

The Internet offers a new setting for representing scientometric relationships as maps, as the projects developed at the CWTS have successfully shown. There is extensive information about neural networks.

In recent months, however, several proposals have widened the potential uses of data mapping. These techniques are now also applied to information on the Web, with many options for improving the knowledge organisation of Web content by mapping Web sites.

• Planning Diagrams to Site Maps. Seminar by Paul Kahn
http://www.dynamicdiagrams.com/seminars/mapping/maptoc.htm (link no longer available)

This could increase the capabilities of search engines, providing them with a new and friendly graphical interface.

WEBSOM: Self-Organizing Maps for Internet Exploration. This is the latest result of the work of Teuvo Kohonen (http://www.cis.hut.fi/nnrc/teuvo.html). The algorithm is described in the following articles:

° T. Kohonen. Self-organizing formation of topologically correct feature maps. Biological Cybernetics, 43(1):59--69, 1982.

° T. Kohonen. The self-organizing map. Proceedings of IEEE, 78:1464--1480, 1990.
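
As a rough illustration of the self-organizing map idea behind the articles above, the sketch below trains a small one-dimensional map: for each input it finds the best-matching unit and pulls that unit and its neighbours on the map towards the input. It is a simplified sketch under stated assumptions, not WEBSOM itself: the function names, the linear decay schedules, the Gaussian neighbourhood, and the toy two-cluster data are all choices made for the example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def train_som(data, n_units=6, epochs=100, lr0=0.5, radius0=None, seed=0):
    """Hypothetical helper: train a 1-D map of n_units on a list of vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Start each unit at a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = radius0 if radius0 is not None else n_units / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            progress = step / total_steps
            lr = lr0 * (1.0 - progress)  # learning rate shrinks over time
            radius = max(radius0 * (1.0 - progress), 0.5)  # so does the radius
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Units near the winner on the map move more than distant ones.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Toy data (assumed): two small clusters of 2-D points.
data = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (1.0, 1.0), (0.9, 1.0), (1.0, 0.9),
]
weights = train_som(data)
unit_for_first_cluster = best_matching_unit(weights, (0.0, 0.0))
unit_for_second_cluster = best_matching_unit(weights, (1.0, 1.0))
```

After training, inputs from the two clusters are represented by different units of the map, which is the property that document maps of this kind rely on when they place similar items near each other.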

SEMIO. Semio's text-mining software provides the ability to discover and leverage new value in the glut of textual information on corporate Intranets.

° SemioMap® has an on-line demo; this tool displays how phrases interact across documents, uncovering meaning that is not readily apparent.

Cyberspace geography visualization. Mapping the World-Wide Web to help people find their way in cyberspace.

TouchGraph. This tool offers a solution by presenting a graph where similar items are clustered next to each other.

Opte Project. This project was created to make a visual representation of a space that is very much one-dimensional, a metaphysical universe.

NeuralWare. NeuralWare offers proven technology tools for developing and deploying neural networks and advanced empirical models.

Topic Maps. It is an independent consortium of parties interested in developing the applicability of the Topic Maps Paradigm to the World Wide Web, by leveraging the XML family of specifications as required.

Newsmap. Newsmap is an application that visually reflects the constantly changing landscape of the 'Google News' news aggregator. A treemap visualization algorithm helps display the enormous amount of information gathered by the aggregator.

Mapping Cyberspace.

• Tamara Munzner and Paul Burchard. Visualizing the Structure of the World Wide Web in 3D Hyperbolic Space. Proceedings of VRML '95, (San Diego, California, December 14-15, 1995), special issue of Computer Graphics, ACM SIGGRAPH, New York, 1995, pp. 33-38.

• Kevin Gurney. Neural Nets.