
The Structure of the Web


A. Broder et al., "Graph structure in the Web," Computer Networks (2000)

The web remains the largest text corpus to date, although in recent years corporate archives have begun to catch up. But the web is more than just a massive text database. The picture above is one I enjoy geeking out over. It shows the structure of the web. Here’s a description of its parts:

SCC: This is the Strongly Connected Component, a group of pages connected by links where, from any page, you can get to any other page via a series of links and then back to the original page via another series of links. This is believed to be the largest component, though this cannot be proven at present.

In Component: Those pages from which you can get to the SCC, but which you cannot get back to from the SCC. Think of someone who links their personal page to all their friends' pages, but gets no links in return.

Out Component: Those pages that you can get to from the SCC, but once there, cannot get back to the SCC.

Tube: A path of links that leads from a page in the In Component to a page in the Out Component without going through the SCC.

Tendril: This is a difficult thing to explain. My best attempt is as a set of pages that can neither reach the SCC nor be reached from it. A tendril either hangs off the In Component (you can get there from In, but once you are there, there is no route on to the SCC, like a company internal page or set of pages) or feeds into the Out Component (it links into Out, but nothing in the SCC links to it).
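For anyone who wants to play with these definitions, here is a minimal Python sketch that labels the pages of a toy link graph. It is my own illustration, not the method from the Broder paper. It assumes the graph fits in memory as a dict mapping each page to the pages it links to, and that you already know one `seed` page inside the SCC. The names `reachable`, `reverse`, `bow_tie`, and the toy `graph` are all made up for this example.

```python
from collections import deque


def reachable(graph, starts):
    """Return every node reachable from `starts` by following links (BFS)."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Return the same graph with every link flipped; every node gets a key."""
    rev = {}
    for src, dsts in graph.items():
        rev.setdefault(src, [])
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def bow_tie(graph, seed):
    """Label each node, assuming (hypothetically) that `seed` lies in the SCC."""
    rev = reverse(graph)
    nodes = set(rev)
    forward = reachable(graph, [seed])   # pages the seed can reach
    backward = reachable(rev, [seed])    # pages that can reach the seed
    scc = forward & backward
    in_comp = backward - scc
    out_comp = forward - scc
    rest = nodes - forward - backward    # no path to or from the SCC
    from_in = reachable(graph, in_comp) & rest
    to_out = reachable(rev, out_comp) & rest

    labels = {}
    for node in nodes:
        if node in scc:
            labels[node] = "SCC"
        elif node in in_comp:
            labels[node] = "IN"
        elif node in out_comp:
            labels[node] = "OUT"
        elif node in from_in and node in to_out:
            labels[node] = "TUBE"
        elif node in from_in or node in to_out:
            labels[node] = "TENDRIL"
        else:
            labels[node] = "DISCONNECTED"
    return labels


# Toy graph: a, b, c form a cycle; i links in; o is linked out to.
graph = {
    "a": ["b"], "b": ["c"], "c": ["a", "o"],
    "i": ["a", "t", "d"],  # i reaches the SCC, plus a tube (t) and a tendril (d)
    "t": ["o"],            # t bridges In and Out without touching the SCC
    "e": ["o"],            # e feeds into Out but is unreachable from the SCC
    "x": ["y"],            # x and y touch nothing else
}
labels = bow_tie(graph, "a")
```

The SCC is just the overlap of "everything the seed can reach" and "everything that can reach the seed", so two breadth-first searches (one on the flipped graph) are enough to split off In and Out. If the seed is not actually in the big SCC, the labels describe the bow tie around whatever component the seed belongs to instead.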
