As described in the clip by Lars Wieneke, the David Boder websites from 2000 and 2009 changed over time as new technology became available. The older version from 2000 no longer exists on the “live” web, but a comparison between the version published in 2000 and the new 2009 version is possible thanks to the archived web. In this assignment, you will learn about the basics of the archived web and become familiar with one of the most popular and useful resources to access archived web pages – the Internet Archive’s Wayback Machine. At the same time you will learn about the challenges and limits of web archiving from a historian’s point of view.
See an archived version of the very first website ever created in 1991 by Tim Berners-Lee and archived by CERN in Switzerland:
The live web consists of any web page or website that you can access at this very moment. If you type the URL https://www.google.com, the home page of what is currently the world’s most popular search engine, into your web browser, the page will appear and you can immediately access the search engine. This is different from the archived web, which consists of web pages or websites that you can no longer access but whose information has been saved in the form of past versions of that web page or website. Without the archived web we would run the risk of web content disappearing from the accessible historical record entirely. This is why knowing both how to deal with web archives and how to ensure that web content is preserved are key for the historian’s profession.
For a basic understanding of what web archives are, start by reading three sections from the publication Using Web Archives in Research: An Introduction by Janne Nielsen (Aarhus: NetLab, 2016)
Answer the questions below:
Questions:
Use the answer form to write down the answers.
Now that you have some knowledge about what web archives are, read the hypothetical example below and answer the questions that follow.
Hypothetical example: You visit a small website that has information about the recent matches of a local football team on Tuesday 11 February. You look for the score of the most recent match and are able to find the information you are searching for. On Wednesday 12 February, you try to access the website again to show the score to your friend, but the website looks completely different. You eventually find the information but it’s on a different page on the website and it takes you more time to find it than before. When you visit the website a third time on Friday 14 February to check the results of the latest match, you discover that you can’t access the website at all, no matter how many times you refresh the page or which browser you use.
Use the answer form to write down the answers.
In this sub-assignment, you will explore the history of the Internet Archive, a non-profit organisation that was established by computer engineer Brewster Kahle in 1996 with the objective of creating a complete digital record of the past. It is mentioned in the clip for this lesson, at minute 3:05, by engineer Lars Wieneke, when he explains how it is possible to retrieve “snapshots” of the David Boder website from 2000.
Taking into consideration that approximately 360-380 new websites were created every minute in 2020, Kahle has set himself a nearly impossible task. The way in which the Internet Archive’s Wayback Machine collects material from the web is by programming robots called web crawlers in such a way that they take “snapshots” of certain web pages at specific points in time. These snapshots are not screenshots; they allow users to navigate the web of the past in an interactive manner, although they only yield a selection of web pages rather than the entire history of a website. The scale of what is nonetheless archived remains quite impressive, especially bearing in mind that the number of pages on the live web is around 6 billion, while the Internet Archive Wayback Machine provides access to more than 480 billion archived web pages (as of November 2020). Additionally, the Internet Archive is no longer the only organisation that archives the web. Institutions at the national level (mostly national libraries) in Ireland, Denmark, Luxembourg, Switzerland, Portugal, France and the United Kingdom, to name but a few, curate web content that is relevant for their country. The latter two countries have even included web archives as a category in their legal deposits, meaning that web pages related to their citizens or state are considered as publications that are vital for records of the countries’ official history.
To familiarise yourself with the Internet Archive Wayback Machine, read this short web page and answer the following questions:
Next, use this PDF “cheat sheet”, which explains the Wayback Machine’s interface using Google’s home page as an example, to answer the following questions:
As a historian it is vital to place a resource such as the Wayback Machine in a historical context. Idealism and belief in the democratising power of technology lie at the heart of the creation of this resource. To understand this more clearly, watch these three video segments from a 2011 presentation by Brewster Kahle to the Long Now Foundation.
Now answer the following questions (with additional web research if needed):
Use the answer form to write down the answers.
Now compare the narrative in the passages you watched from 2011 with the discourse adopted in the opening keynote talk by Brewster Kahle in 2019 at the Charleston Conference (from 3:29 to 12:22), and answer the following questions:
Use the answer form to write down your answers.
Finally, it is time for you to explore the Internet Archive’s web archiving tool, the Wayback Machine, yourself. While doing this assignment, keep in mind that owing to the sheer amount of data saved in its archives (over 25 petabytes, or 25 million gigabytes, as of 2018), the Wayback Machine can sometimes take a little longer to load than the web pages you might be used to. Sometimes you need to refresh a page once or twice to get it to load properly.
First, go to the Wayback Machine and search for Google’s home page by entering the URL https://www.google.com/ in the search box. The query should look like this: https://web.archive.org/web/*/https://www.google.com/
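The query shown above follows a simple pattern: the Wayback Machine address, an asterisk in place of a date, and then the address of the page you are looking for. As a minimal sketch, the same pattern can be built in Python; the function name `calendar_url` is our own choice for illustration and is not part of any Internet Archive tool.

```python
# Prefix shared by Wayback Machine addresses, as seen in the query above.
WAYBACK_PREFIX = "https://web.archive.org/web"


def calendar_url(target_url: str) -> str:
    """Build the address of the overview (calendar) page for target_url.

    The asterisk stands in for "any date", so the result lists all captures.
    """
    return f"{WAYBACK_PREFIX}/*/{target_url}"


# Reproduces the query given in the assignment text.
print(calendar_url("https://www.google.com/"))
```

You can paste the printed address straight into your browser instead of using the search box.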
Have a look at the note under the red button marked “Calendar”. How many captures of the web page at this URL have been made and over what period?
Use the answer form to write down the answers.
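If you are curious how a capture count and period could be worked out programmatically, the sketch below uses the Internet Archive’s CDX server, which lists captures as rows of data. Several points are assumptions: the helper names are ours, the sample rows are invented for illustration and do not describe any real page, and the figure returned by the CDX server may not match the number shown in the calendar view exactly. For a very frequently captured page such as Google’s home page the full list is extremely large, so treat `fetch_rows` as something to try on a small website only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target_url: str) -> str:
    """Build a CDX query that asks only for the capture timestamps, as JSON."""
    params = {"url": target_url, "output": "json", "fl": "timestamp"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def fetch_rows(target_url: str) -> list:
    """Download the capture list (needs an internet connection; may be slow)."""
    with urlopen(cdx_query_url(target_url), timeout=60) as response:
        return json.load(response)


def summarise(rows: list) -> tuple:
    """Return (number of captures, first year, last year).

    In JSON output the first row is a header with the field names,
    so it is skipped. Timestamps look like YYYYMMDDhhmmss.
    """
    timestamps = sorted(row[0] for row in rows[1:])
    if not timestamps:
        return (0, None, None)
    return (len(timestamps), timestamps[0][:4], timestamps[-1][:4])


# Invented sample rows, for illustration only (not real capture data).
sample_rows = [
    ["timestamp"],
    ["20010203040506"],
    ["20050607080910"],
    ["20111213141516"],
]
print(summarise(sample_rows))
```

The summary is kept separate from the download step so that you can check the counting logic on a handful of rows before sending a real request.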
Now, below the information that gives you the number of captures of the URL that have been made but above the calendar view of the past 12 months, scroll through the chronological bar graph showing all the years the URL has been saved. Find and click on the very first year that this web page was saved and then search for the month in which the very first snapshot was made. Finally, click on that first snapshot of Google saved by the Wayback Machine.
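Once you have opened a snapshot, look at the address bar: the address contains a 14-digit number that records when the capture was made (year, month, day, hour, minute, second). The sketch below shows one way to read that date out of a snapshot address. The function name and the example address (for the placeholder site example.org) are our own illustrations, and we assume the timestamp is expressed in UTC, which is how Wayback Machine timestamps are generally understood.

```python
import re
from datetime import datetime, timezone

# A snapshot address looks like:
#   https://web.archive.org/web/<14 digits>/<original address>
# Some addresses carry a short suffix after the digits (for example "id_").
SNAPSHOT_PATTERN = re.compile(r"/web/(\d{14})[a-z_]*/(.+)$")


def parse_snapshot_url(snapshot_url: str) -> tuple:
    """Return (capture date and time, original address) for a snapshot address."""
    match = SNAPSHOT_PATTERN.search(snapshot_url)
    if match is None:
        raise ValueError(f"Not a Wayback Machine snapshot address: {snapshot_url}")
    captured_at = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    # Assumption: the timestamp is in UTC.
    return captured_at.replace(tzinfo=timezone.utc), match.group(2)


# Illustrative address only; it is not the answer to the exercise.
captured_at, original = parse_snapshot_url(
    "https://web.archive.org/web/20010203040506/http://example.org/"
)
print(captured_at.isoformat(), original)
```

Knowing the exact capture date helps you answer the “When?” question below with more precision than the calendar view alone.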
Think about some of the classic questions that you would ask of any historical source: Who created this? When did they create it? Why did they create it? How did they create it? What was their purpose in creating it? How was it used at the time? How did it look at the time compared to other similar sources?
Who created this? |
When did they create it? |
Why did they create it? |
How did they create it? |
What was their purpose in creating it? |
How was it used at the time? |
How did it look at the time compared to other similar sources? |
Use the answer form to write down the answers.
Next, go to the next saved date of the Google home page (it should be in the following month) and click on the snapshot. Explore the content of the page and its hyperlinks to find out the purpose of the company and the reason they chose the name “Google”.
Purpose |
Why the name “Google”? |
Now go back to the current home page of the Wayback Machine (https://web.archive.org/) and search for YouTube’s home page using the URL https://www.youtube.com. Find the first saved snapshot available on the Wayback Machine, and click on the link.
Use the answer form to write down the answers.
Take a screenshot of this first snapshot. Then try to click on the tabs in this snapshot (“favorites”, “messages”, “videos”).
Use the answer form to write down the answers.
Now, go back to your calendar view of all the times that YouTube has been saved by the Wayback Machine. Click on a snapshot from 10 years after the first snapshot was made and take a screenshot of the snapshot. Complete the table below indicating differences between the features of YouTube in 2005 and 2015. Note: if you don’t find the information below in the snapshots, conduct additional research and note which other source(s) you consulted to find the information.
| | 2005 | 2015 |
| Your screenshot | | |
| Design | | |
| Navigation links at the top | | |
| Information about the company | | |
| What can be accessed | | |
| Notes about privacy | | |
| Notes about copyright | | |
Would the Wayback Machine be useful to you if you wanted to know whether the character of the uploaded material on YouTube has changed between 2005 and 2010?
Use the answer form to complete the table and write down your answers.
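To locate a snapshot made 10 years after the first one without clicking through the calendar, you can also edit the 14-digit timestamp in a snapshot address. The sketch below is a minimal illustration under two assumptions: the function name is ours, and we rely on the Wayback Machine’s usual behaviour of redirecting to the closest available capture when none exists for the exact moment requested. The example uses the placeholder site example.org, not YouTube.

```python
from datetime import datetime


def snapshot_url_years_later(first_timestamp: str, target_url: str, years: int = 10) -> str:
    """Build a snapshot address dated `years` after a YYYYMMDDhhmmss timestamp."""
    first = datetime.strptime(first_timestamp, "%Y%m%d%H%M%S")
    try:
        later = first.replace(year=first.year + years)
    except ValueError:
        # 29 February does not exist in a non-leap year; fall back to 28 February.
        later = first.replace(year=first.year + years, day=28)
    return f"https://web.archive.org/web/{later.strftime('%Y%m%d%H%M%S')}/{target_url}"


# Illustrative timestamp and site only.
print(snapshot_url_years_later("20010203040506", "http://example.org/"))
```

After opening the resulting address, check the timestamp in the address bar again: if the Wayback Machine redirected you, the date of the capture you are actually looking at may differ from the one you asked for, and it is that date you should note in your table.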