ablate.org
v. tr. To remove by erosion, melting, evaporation, or vaporization
The Internet Archive
07.08.2002 01.07.15

The Web is ephemeral: pages blink in and out of existence
and change on a daily, sometimes hourly, basis. Researchers hesitate to cite
Web sites as sources because of the transient nature of data. The Web is not
really built to accommodate permanent data, and because of that, knowledge is
being lost.

How do we keep a record of this insubstantial place? The
Internet Archive has an answer. By using a browser plug-in from a company
called Alexa to acquire a snapshot of pages as users visit, the Internet
Archive has built a collection of Web sites that stretches back almost 6 years.

At last count the Archive was composed of over 100 terabytes
of information. To put this into perspective, an encyclopedia uses about a
gigabyte (1/1000 terabyte), and a book uses about a megabyte (1/1000 gigabyte),
roughly the same as 1 floppy disk. That makes the Internet Archive the equivalent
of a hundred million (100,000,000) floppy disks, the largest database ever
constructed.
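As a rough check of that comparison, here is a minimal sketch of the arithmetic in Python. It assumes the post's own decimal ratios (1,000 megabytes to a gigabyte, 1,000 gigabytes to a terabyte) and treats one floppy disk as one megabyte; the constant and function names are ours, chosen only for illustration.

```python
# Decimal storage units, matching the 1/1000 ratios used in the post.
MEGABYTE = 1
GIGABYTE = 1000 * MEGABYTE
TERABYTE = 1000 * GIGABYTE

ARCHIVE_SIZE_MB = 100 * TERABYTE  # "over 100 terabytes"
FLOPPY_MB = MEGABYTE              # assumption: one floppy is about one megabyte
ENCYCLOPEDIA_MB = GIGABYTE        # "an encyclopedia uses about a gigabyte"


def equivalents(total_mb, unit_mb):
    """Return how many whole units of size unit_mb fit in total_mb."""
    return total_mb // unit_mb


floppies = equivalents(ARCHIVE_SIZE_MB, FLOPPY_MB)
encyclopedias = equivalents(ARCHIVE_SIZE_MB, ENCYCLOPEDIA_MB)

print(f"{floppies:,} floppy disks")    # 100,000,000 floppy disks
print(f"{encyclopedias:,} encyclopedias")  # 100,000 encyclopedias
```

Under these assumptions, 100 terabytes works out to one hundred million floppy disks, or one hundred thousand encyclopedias.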

This is all running on several hundred networked machines
that resemble your typical desktop PC. They are all running either Linux or
FreeBSD (a variant of Unix), and act as one large supercomputer. The advantage
of this model is that the hardware architecture is hugely scalable and very
reliable. If more computing power is needed, then new computers can be added at
will, and if a machine has a hardware failure, it can simply be repaired or replaced
using inexpensive, off-the-shelf components.

Low cost is a very important consideration at the Archive, as it
is a non-profit organization with very little funding, and the database is
growing at the astounding rate of 12 terabytes a month (which is equivalent to
the entire contents of the Library of Congress).

One of the first applications built using the Archive is The
Wayback Machine, jointly constructed by Alexa and the Internet Archive. This
application allows users to see and use Web sites as they existed at various
points in the past. This is of tremendous use both to the casual surfer (who
might enjoy the Wayback Machine for its nostalgic value) and to the
professional researcher (who would like to see a “snapshot” of the Web at
various points in time).
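For the curious, here is a small sketch of how one might build a link to a page as it looked at a given moment. It assumes the Wayback Machine's public address scheme of a prefix, a 14-digit timestamp, and the original URL; the prefix, the helper name wayback_url, and the sample date are our own illustration and are not taken from the Archive's documentation.

```python
from datetime import datetime

# Assumption: the Wayback Machine serves captures under this prefix.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url, when):
    """Build a link to page_url as captured at (or near) the moment `when`."""
    # Timestamps in the path are written as YYYYMMDDhhmmss.
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{page_url}"


# Illustrative only: this site, at the time this entry was posted.
url = wayback_url("http://ablate.org/", datetime(2002, 7, 8, 1, 7, 15))
print(url)  # https://web.archive.org/web/20020708010715/http://ablate.org/
```

The sketch only builds the address; it does not check whether a capture exists for that date.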

Archiving the Internet is a mind-numbingly large task, but
one that has great value. Our kudos and admiration go to those who, over 6
years ago (that’s 42 in dog and Internet years), had the foresight to begin
capturing sites when the Web as we know it was relatively small.

Posted by Brian at 01:07 AM