The Web as a historical source; what historians need to know


Assignments (5)

1 out of 5 — The web and its technologies

In the clip above, engineer Lars Wieneke explains how over time web technologies increasingly broadened the range and scale of data that could be shared and shown through the web. To illustrate these changes he elaborates on the two websites about the interview collection of the psychologist David Boder, the topic of another lesson on Ranke2, that were developed in 2000 and 2009. Understanding the changes brought about by software and languages such as XML (Extensible Markup Language) and PHP (Hypertext Preprocessor) is crucial in being able to apply source criticism to a website. However, as historians we should first place the topic into its historical context: how did websites evolve in the first place and what technologies were needed to make them work? These assignments will briefly explore the history of the web and the technological developments that make it work. They will then dive into the differences between the web and the internet, before discussing the physical infrastructure that allows the world to be globally connected.


Watch this 35-minute documentary created by the Web Foundation about how Tim Berners-Lee created the World Wide Web.

1.a The history of the web and the technology behind it

1.b The difference between the web and the internet

1.c The materiality of the internet

1.d Information systems and power

Reading/viewing suggestions

learning outcomes

Understand the basics of web technology
Understand how to deal with web archives

2 out of 5 — The archived web for historians

As described in the clip by Lars Wieneke, the David Boder websites from 2000 and 2009 changed over time as new technology became available. The older version from 2000 no longer exists on the “live” web, but a comparison between the version published in 2000 and the new 2009 version is possible thanks to the archived web. In this assignment, you will learn about the basics of the archived web and become familiar with one of the most popular and useful resources to access archived web pages – the Internet Archive’s Wayback Machine. At the same time you will learn about the challenges and limits of web archiving from a historian’s point of view.

See an archived version of the very first website, created in 1991 by Tim Berners-Lee and archived by CERN in Switzerland.

2.a The difference between the live web and the archived web

The live web consists of any web page or website that you can access at this very moment. If you type the URL https://www.google.com, the home page of what is currently the world’s most popular search engine, into your web browser, the page will appear and you can immediately access the search engine. This is different from the archived web, which consists of web pages or websites that you can no longer access but whose information has been saved in the form of past versions of that web page or website. Without the archived web we would run the risk of web content disappearing from the accessible historical record entirely. This is why knowing both how to deal with web archives and how to ensure that web content is preserved are key for the historian’s profession.

For a basic understanding of what web archives are, start by reading three sections from the publication Using Web Archives in Research: An Introduction by Janne Nielsen (Aarhus: NetLab, 2016):

Section 1 “Introduction” (pp. 6-10)
Section 2.1 “Main Types of Web Archiving” (pp. 11-12)
Section 2.5 “Characteristics of the Archived Material” (p. 23)

Then answer the questions below:

Why were web archives created and how can they be used?
What is the definition of web archiving as given by the International Internet Preservation Consortium (IIPC)?
How long did a typical web page last before it changed or disappeared between 2000 and 2010? How does this differ from the pace of change at present?
In what ways is web archiving different from traditional archiving?
What are some challenges associated with web archiving?
Imagine yourself in a future professional role (not necessarily a historian): in what kind of situation would knowledge about web archiving be useful?

use the answer form to write down the answers

Now that you have some knowledge about what web archives are, read the hypothetical example below and answer the questions that follow.

Hypothetical example: On Tuesday 11 February, you visit a small website that has information about the recent matches of a local football team. You look for the score of the most recent match and find it. On Wednesday 12 February, you try to access the website again to show the score to your friend, but the website looks completely different. You eventually find the information, but it is on a different page of the website and takes longer to find than before. When you visit the website a third time on Friday 14 February to check the results of the latest match, you discover that you can’t access the website at all, no matter how many times you refresh the page or which browser you use.

What are some potential reasons why the website changed and then disappeared between the different dates that you accessed it?
How might you go about trying to recover the information you remember seeing but can no longer access?

use the answer form to write down the answers

2.b Familiarising yourself with the Wayback Machine

In this sub-assignment, you will explore the history of the Internet Archive, a non-profit organisation that was established by computer engineer Brewster Kahle in 1996 with the objective of creating a complete digital record of the past. It is mentioned in the clip for this lesson, at minute 3:05, by engineer Lars Wieneke, when he explains how it is possible to retrieve “snapshots” of the David Boder website from 2000.

Taking into consideration that approximately 360-380 new websites were created every minute in 2020, Kahle has set himself a nearly impossible task. The way in which the Internet Archive’s Wayback Machine collects material from the web is by programming robots called web crawlers in such a way that they take “snapshots” of certain web pages at specific points in time. These snapshots are not screenshots; they allow users to navigate the web of the past in an interactive manner, although they only yield a selection of web pages rather than the entire history of a website. The scale of what is nonetheless archived remains quite impressive, especially bearing in mind that the number of pages on the live web is around 6 billion, while the Internet Archive Wayback Machine provides access to more than 480 billion archived web pages (as of November 2020). Additionally, the Internet Archive is no longer the only organisation that archives the web. Institutions at the national level (mostly national libraries) in Ireland, Denmark, Luxembourg, Switzerland, Portugal, France and the United Kingdom, to name but a few, curate web content that is relevant for their country. The latter two countries have even included web archives as a category in their legal deposits, meaning that web pages related to their citizens or state are considered as publications that are vital for records of the countries’ official history.

Image credit: J. Blyberg

To familiarise yourself with the Internet Archive Wayback Machine, read this short web page and answer the following questions:

What is the difference between the Internet Archive and the Wayback Machine?
What is the criterion for web pages to be collected by the Wayback Machine?
Who can access the information available on the Wayback Machine?
What organisations does the IA work with and what is the rationale behind the collaboration?

Next, use this PDF “cheat sheet”, which explains the Wayback Machine’s interface using Google’s home page as an example, to answer the following questions:

Do you search for snapshots (saved versions) of a web page in the Wayback Machine with key terms or with a specific URL?
What are the benefits and disadvantages of the Wayback Machine’s system of finding past saved versions of a web page?
Where can you find the number of times a particular URL has been saved in the Wayback Machine?
What information can you glean about a snapshot just by looking at its URL?
How can you find out which organisation or robot performed the crawl for the snapshot you are looking at?
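To make the structure of a snapshot URL more concrete, the sketch below splits one into its two main parts: the 14-digit capture timestamp (year, month, day, hour, minute, second) and the address of the original page. It is a minimal illustration, not an official tool. The names `Snapshot` and `parse_snapshot_url` are hypothetical, and the pattern assumes the plain `https://web.archive.org/web/<timestamp>/<original URL>` form, so snapshot URLs with extra flags after the timestamp would need a broader pattern.

```python
import re
from datetime import datetime
from typing import NamedTuple


class Snapshot(NamedTuple):
    """Hypothetical container for the two parts of a snapshot URL."""
    captured_at: datetime
    original_url: str


# Assumed URL shape: https://web.archive.org/web/<14 digits>/<original URL>
_SNAPSHOT_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(snapshot_url: str) -> Snapshot:
    """Split a Wayback Machine snapshot URL into capture time and original URL."""
    match = _SNAPSHOT_PATTERN.match(snapshot_url)
    if match is None:
        raise ValueError(f"Not a recognised snapshot URL: {snapshot_url}")
    timestamp, original_url = match.groups()
    # The timestamp reads YYYYMMDDhhmmss, e.g. 19961109200738.
    captured_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return Snapshot(captured_at, original_url)


if __name__ == "__main__":
    # A snapshot URL that is used later in this lesson (assignment 3.b).
    example = "https://web.archive.org/web/19961109200738/http://www.luxembourg.lu/"
    snapshot = parse_snapshot_url(example)
    print(snapshot.captured_at.isoformat(), snapshot.original_url)
```

Reading the timestamp this way is a quick first check when the date given in a text and the date encoded in a linked snapshot need to be compared.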

2.c The original ambitions of the Wayback Machine compared to later thoughts

2.d Explore how the web pages of Google and YouTube have been preserved

Finally, it is time for you to explore the Internet Archive’s web archiving tool, the Wayback Machine, yourself. While doing this assignment, keep in mind that owing to the sheer amount of data saved in its archives (over 25 petabytes or 25 million gigabytes as of 2018), the Wayback Machine can sometimes take a little longer to load than the web pages you might be used to. Sometimes you need to refresh a page once or twice to get it to load properly.

First, go to the Wayback Machine and search for Google’s home page by entering the URL https://www.google.com/ in the search box. The query should look like this: https://web.archive.org/web/*/https://www.google.com/
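The query address above follows a simple pattern: the Wayback Machine prefix, an asterisk in place of a date, and then the address of the page you are looking for. As a small sketch of that pattern (the function name `calendar_url` is a hypothetical helper, not part of any Internet Archive software), it could be written as:

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def calendar_url(page_url: str) -> str:
    """Build the address of the calendar overview listing all captures of page_url."""
    # The asterisk stands for "any capture date".
    return f"{WAYBACK_PREFIX}/*/{page_url}"


if __name__ == "__main__":
    print(calendar_url("https://www.google.com/"))
```

Knowing the pattern means you can go straight to the overview of any page by typing the address into your browser, without using the search box.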


Have a look at the note under the red button marked “Calendar”. How many captures of the web page at this URL have been made and over what period?

use the answer form to write down the answers

Now, below the information that gives you the number of captures of the URL that have been made but above the calendar view of the past 12 months, scroll through the chronological bar graph showing all the years the URL has been saved. Find and click on the very first year that this web page was saved and then search for the month in which the very first snapshot was made. Finally, click on that first snapshot of Google saved by the Wayback Machine.

Think about some of the classic questions that you would ask of any historical source: Who created this? When did they create it? Why did they create it? How did they create it? What was their purpose in creating it? How was it used at the time? How did it look at the time compared to other similar sources?

Using the PDF “cheat sheet” as a guide, answer the classic historical source criticism questions above and feel free to add any others you can think of.

Who created this?

When did they create it?

Why did they create it?

How did they create it?

What was their purpose in creating it?

How was it used at the time?

How did it look at the time compared to other similar sources?

use the answer form to write down the answers

Next, go to the next saved date of the Google home page (it should be in the following month) and click on the snapshot. Explore the content of the page and its hyperlinks to find out the purpose of the company and the reason they chose the name “Google”.

Purpose

Why the name “Google”?

Now go back to the current home page of the Wayback Machine https://web.archive.org/ and search for YouTube’s home page using the URL https://www.youtube.com. Find the first saved snapshot available on the Wayback Machine, and click on the link.

What are your first impressions of YouTube’s then home page?

use the answer form to write down the answers

Take a screenshot of this first snapshot. Then try to click on the tabs in this snapshot (“favorites”, “messages”, “videos”).

What happens? What does this mean for using the Wayback Machine as a resource for archiving web pages?

use the answer form to write down the answers

Now, go back to your calendar view of all the times that YouTube has been saved by the Wayback Machine. Click on a snapshot from 10 years after the first snapshot was made and take a screenshot of the snapshot. Complete the table below indicating differences between the features of YouTube in 2005 and 2015. Note: if you don’t find the information below in the snapshots, conduct additional research and note which other source(s) you consulted to find the information.

Feature	2005	2015
Your screenshot
Design
Navigation links at the top
Information about the company
What can be accessed
Notes about privacy
Notes about copyright

Would the Wayback Machine be useful to you if you wanted to know whether the character of the uploaded material on YouTube has changed between 2005 and 2015?

use the answer form to complete the table and write down your answers

Reading/viewing suggestions

learning outcomes

Understand the basics of web archiving
Understand the policies and complexities of web archiving

3 out of 5 — Web archives and the national dimension: The example of the Luxembourg Web Archive and luxembourg.lu

Fortunately, the Internet Archive is not the only institution that is trying to archive the web. Several other institutions are active on a smaller scale, usually for websites considered important or relevant to specific countries. Several European countries, such as Finland, France, Ireland, Spain and Sweden, have even included web archives in the legal deposit of their country, which means that they have attributed a similar status to websites as that given to archive material such as books and newspapers for centuries. See the examples of the UK, Denmark and France.

Yet the task of archiving the entire web is not an easy one. The explosive growth of online content, especially since the 2000s, has made it impossible for archives and organisations to archive every single website and its various versions over time. Even as the technological expertise at these institutions has increased, a decrease in activities at the national level can be observed, which leads to a stronger dependency on the IA. This is luckily not the case in Luxembourg, where the National Library (Bibliothèque nationale du Luxembourg or BnL) has been tasked with archiving the Luxembourgish web since 2016.


3.a Challenges and limits of preservation: The example of the Luxembourg Web Archive

3.b Exploring Luxembourg web content

The example of the Luxembourg Web Archive (LWA) in assignment 3.a shows the contradiction between the principle of openness and the limits of institutions in sustaining an open system when they also have to take into account the rights of copyright holders. Because of the latter, we do not have online access to the information within the LWA, as it can only be used within certain library premises. However, we are lucky that some of the content in the LWA is also available online through the Internet Archive’s Wayback Machine (WM). As such, this lesson makes use of the Wayback Machine, rather than the LWA, for the next sub-assignment on archived Luxembourgish web content. The BnL has a contract with the IA to crawl the Luxembourgish web, meaning that both offer the same content, although the BnL offers some complementary tools to explore the collections.

Download the teaching aid for working with the interface of the Wayback Machine.

Go to the Internet Archive’s Wayback Machine, paste the URL www.luxembourg.lu into the search bar and press the “Enter” key. You should arrive on the page https://web.archive.org/web/*/www.luxembourg.lu. Referring to the WM interface user guide, answer the following questions:

Select two elements on this page that provide information about how the Internet Archive has archived the web page that corresponds to this specific URL.
Select one element that provides information about the archived web page itself.
What does the note under the calendar tell us? How many captures of the site www.luxembourg.lu have been made and over what period? Is the distribution of captures evenly spread over time?

use the answer form to write down the answers

Under the URL search bar, click on the “Summary” button for statistics and details about the archived snapshots of the www.luxembourg.lu web page.

How many captures of the site were made in 1996? How many in 2005?
What can be inferred from this? Why do you think the number of captures might be uneven over time?
What does this mean for using the Wayback Machine as a resource for archiving web pages?

use the answer form to write down the answers
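The per-year figures shown in the “Summary” view can also be approximated by counting capture timestamps yourself. The sketch below is one possible way to do this, under stated assumptions: it uses the Wayback Machine's CDX server endpoint with the `url`, `output` and `fl` parameters, and it assumes that the JSON result begins with a header row. The function names are hypothetical, and the timestamps in the usage example are for illustration only (the third one is invented), so the counts it prints say nothing about the real archive.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url: str) -> str:
    """Build a CDX query that asks only for the capture timestamps of page_url."""
    params = {"url": page_url, "output": "json", "fl": "timestamp"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def fetch_timestamps(page_url: str) -> list[str]:
    """Download all capture timestamps (needs network access; may be slow)."""
    with urlopen(cdx_query_url(page_url), timeout=30) as response:
        rows = json.load(response)
    # Assumption: the first row is a header such as ["timestamp"].
    return [row[0] for row in rows[1:]]


def captures_per_year(timestamps: list[str]) -> dict[int, int]:
    """Count captures per year from 14-digit timestamps (YYYYMMDDhhmmss)."""
    counts = Counter(int(timestamp[:4]) for timestamp in timestamps)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Illustrative timestamps only; the last one is invented for this example.
    sample = ["19961109200738", "20030322184925", "20030401000000"]
    print(captures_per_year(sample))
```

An uneven count per year is itself a finding about the archive: it reflects when and how often crawlers visited the page, not how often the page changed.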

Now return to the “Calendar” view. Between the line indicating the number of times that the web page has been saved and the calendar below, hover over the timeline and click on the year 1996 (you might need to scroll to the left). Next, select the snapshot taken on 9 November 1996 at 20:07:38. This should bring you to the archived page: https://web.archive.org/web/19961109200738/http://www.luxembourg.lu/

What does this page show?
Is there any information on the date of creation or the author? What about any temporal information?

use the answer form to write down the answers

Following the same steps above, now go to the archived snapshot from 22 March 2003 taken at 18:49:25: https://web.archive.org/web/20030322184925/http://www.luxembourg.lu/

What does this page show?
Is there any more information about the source (date, author, etc.)? What elements seem to be relevant or missing?
What does the “About this capture” button in the upper right-hand corner tell us?
In your opinion, what is the most noticeable change between the archive of 9 November 1996 and that of 22 March 2003?

use the answer form to write down the answers
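The two snapshot addresses used in this sub-assignment are built from the date and time of the capture followed by the original address. As a minimal sketch of that construction (the function name `snapshot_url` is a hypothetical helper), the addresses can be generated from a date and time like this:

```python
from datetime import datetime


def snapshot_url(page_url: str, captured_at: datetime) -> str:
    """Build a Wayback Machine snapshot address from a page URL and a capture time."""
    # Capture times are written as 14 digits: YYYYMMDDhhmmss.
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


if __name__ == "__main__":
    first = snapshot_url("http://www.luxembourg.lu/", datetime(1996, 11, 9, 20, 7, 38))
    later = snapshot_url("http://www.luxembourg.lu/", datetime(2003, 3, 22, 18, 49, 25))
    print(first)
    print(later)
```

Citing the full snapshot address, rather than only the original URL, lets a reader return to exactly the version of the page you consulted.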

Lastly, go back to the “Calendar” view and choose a snapshot taken within the last five years. What are some of the first things you notice about it?

use the answer form to write down the answers

Based on these three archived snapshots from 1996, 2003 and the last five years, write a paragraph summarising how the luxembourg.lu website has changed over a 20-year period.

First start with some basic questions: Who created this website and why? What kind of functions did a website offer at the time? How were these functions catered for before the evolution of the web? Did the function of the website change over time?
Then identify changes in the appearance and content of the website: consider the URLs, colours/fonts, structure, tabs, etc. on the basis of the information that is available on the Wayback Machine. Make sure you also discuss the possible uses of this archived site for historians. (Hint: think about this from different perspectives, such as historians of Luxembourg or historians of the digital world.)

enter your text in the answer form

Reading/viewing suggestions

learning outcomes

Understand preservation policies with regard to websites of various institutions

4 out of 5 — The policy of an international institution: How the European Union preserves its web archives

The act of archiving is not just driven by neutral concerns for preservation. It is very much embedded in ways of prolonging and solidifying one’s identity, status and position. According to Janne Nielsen, who proposes a clear distinction between “macro” and “micro” archiving, it is important to differentiate, for example, between a powerful institution that designs a preservation strategy for posterity with a broad imaginary future audience in mind (“macro”) and a scholar at the end of a funded project who manages to conserve her data for future use within her academic career (“micro”). In the case of the EU, as the examples below show, preservation is also relevant for reasons of transparency about how decisions are taken or how legal frameworks intended to protect citizens and their cultural heritage evolve over time. The case study presented here – how the European Union deals with the preservation of its web archives – is an example of macro-archiving. The “level” of archiving in this context should be kept in mind throughout the example.

4.a The European Union Web Archive

The European Union was created within a specific geopolitical context in the aftermath of the Second World War to stimulate trade and collaboration between European countries and diminish the risk of future violent conflict. As this context developed and new countries joined the EU, the EU changed accordingly, and one of these changes was the increase in the number of EU agencies. With the advent of the web in the 1990s, the way in which these agencies could present themselves to their audiences was also affected. Thus, when consulting a web archive for an EU agency, it is important to understand what the function of the website was for that agency at that point in time. Ask yourself: Was creating a website and publishing texts about the mission of the agency something completely new at the time, or was it a continuation of an existing practice – such as informing the outside world about what an organisation does – through the new medium of the website? Reflecting on this question will help you understand the value of a website as a historical source.

There are two ways of consulting the web archives of the EU. You can use the Internet Archive’s Wayback Machine, which has taken random snapshots on a non-profit basis since 1996, or you can consult the European Union Web Archive, which consists of web pages that are covered more systematically by the Internet Archive’s paid service, Archive-It, which started to offer these services in 2006. The institution responsible for preservation policy is the Publications Office of the European Union. Based in Luxembourg, this inter-institutional organisation has the task of producing and disseminating the EU’s publications, providing free access to official information and data from the EU, and ensuring the long-term preservation of content produced by EU institutions and bodies. For this assignment we will use the Wayback Machine and our corresponding “cheat sheet” to compare the archived content of two EU agencies over time: the European Court of Justice and the European Institute for Gender Equality.

Tip: The best way to compare the two is to take a screenshot of each home page, decrease the size, and then place them next to each other on your screen.

Open the Internet Archive’s Wayback Machine in two different tabs. In the first tab, search for the snapshots of the European Court of Justice (ECJ) using the URL of its home page. In the second tab, search for the snapshots of the European Institute for Gender Equality (EIGE) using the URL of its home page. Then, using the web archives and any additional web research needed, complete the table below:

	European Court of Justice	European Institute for Gender Equality
Overview
In what year was it founded? (if necessary make use of Wikipedia)
What is the mission of the agency?
Older archived web pages
What are the first archived web pages that you can find with the Wayback Machine? What year and what date? In what language(s)?
What are the titles of the different pages of the website? Does the structure make sense to you?
With the oldest archived web page available, try and find the page that gives the best overview of what the website contains and share the heading and the link of that page (often this is the home page, but not always)
Is there a search function that gives access to one or more databases? What do these databases contain?
Can you upload or download content?
The most recent archived versions of the websites (this year)
Reflect on the changes that you can identify between the first and the last archived website of the respective EU institutions
Do the changes reflect a different course that the agency took during this period? This could be: expanding its mission and/or services; merging with other agencies or moving its premises to cut costs; adjusting its policies to adapt to new countries joining the EU.
Do the changes consist of a more modern design and/or an increase in functionalities? This could be: from static to interactive; providing access to databases with search functionalities; moving from face-to-face to online services; offering streaming services; providing a web design that is responsive to tablets and smartphones.

use the answer form to write down the answers

4.b The websites of the two EU agencies within the broader picture of web history

Reading/viewing suggestions

Amaro, Silvia. “How does the EU work.” CNBC International. March 28, 2019. Video, 5:29. YouTube video
European Commission. “Archiving.” Accessed December 7, 2022. Website

learning outcomes

Understand preservation policies with regard to websites of various institutions

5 out of 5 — Micro-archiving: Family and personal archives in the digital age

As creating websites has become increasingly easy for people without a technological background, there has been a steady increase in historical websites created by individuals that make use of free or open source tools. WordPress, Google and Weebly are examples of such resources. The people who set up websites like this are often volunteers who want to share their passion for a subject or relatives of an older generation who want to preserve a record of a certain historical practice that they regard as important. The focus is often on local topics: the landscape, the village, the city or the neighbourhood. Yet the most documented experiences are of migration and war. The practice of passing family photo albums from one generation to the next does not exist when it comes to digital family heritage. How should websites or pages (with potential historical relevance) that are not embedded in an institutional setting or are posted on social media platforms be preserved for the future? On the basis of the case study below, you are going to explore the best strategies for micro-archiving.

5.a A website about a grandfather’s life

One of the southern regions in the Netherlands, Limburg, used to be known for its coal mines. These were all closed down at the end of the 1960s, when much cheaper gas resources were found in the north of the Netherlands. The whole culture around coal miners was an important marker for the regional history of Limburg. This is the context in which Roy Simons – aged just 10 – started to document all the objects and stories that his grandfather had preserved about his life working in the mines. By 2012, at the age of 16, Roy had put together the website Mijn Museum – De Beukel (Mine Museum – The Beukel), which looked like this (this is a link to Roy’s website, which was created using a template provided by Webklik).

Roy's website

Roy ended up winning the history online award for historical websites, initiated by the online history magazine Historiek.

online history magazine Historiek

Excellent though it was, Roy’s website is an example of how these types of sites run the risk of disappearing over time. He created his website using one of the first free website services in the Netherlands, Webklik.


But this service did not last and all the content was moved to a service called Weebly.

The last snapshot made by the Wayback Machine at the Webklik URL dates from 10 September 2014.

The first mention with the Weebly URL dates from 6 December 2015: https://web.archive.org/web/20151206030556/http://mijnbouw.weebly.com/

Roy's website

This is still the URL where the website can be found at present, but the appearance of the site has completely changed. The home page does not have a personal welcome message and the multiple photos of a coal miner with a helmet and a drill are omitted. All references to Roy Simons and his grandfather are taken out. The “De Beukel collection” is now the entity that is central to the presentation, and no mention is made of a person or relative.

We know about these changes because of the previous snapshots taken by the Wayback Machine at the old URL.

While this is a very specific example (not all grandparents are honoured with a website about their lives!), the web is full of family histories about grandfathers and grandmothers. Trace some of these stories online in one of the languages you speak. You will find material by googling this combination of keywords:

Stories from my grandparents / grandfather / grandmother
Geschichte meine Grosseltern / Grossmutter / Grossvater
Histoire de mes grands-parents / ma grand-mère / mon grand-père

Now note the URL(s) of at least one of the websites that you have found and write a short sketch about the kinds of online environments in which these stories can be found and what this means for their long-term preservation.

Are these websites “safe” when published online, in the sense that their preservation is being taken care of? Check whether the Wayback Machine has covered the content of the website sufficiently.
Is the information about the grandparent created on the spot, like a blog post, or is it something that already existed in digital form and has been uploaded to the web environment?
If there are two files, one on your device or computer – e.g. the interview you conducted with your grandfather – and one online that you uploaded to a website where memories of a region are collected, do you need to preserve both? Think about the interactive nature of a website and the need to convert big files to be able to publish them on the web. How would you go about doing so?
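For the first question above, one way to check whether the Wayback Machine holds any copy of a page is the Internet Archive's availability service, which returns a short JSON description of the closest snapshot. The sketch below only builds the query address and reads a response; it does not contact the service. The function names are hypothetical, and the sample response in the usage example is an illustration of the assumed response shape, assembled from the Weebly snapshot address quoted earlier in this assignment, not a recorded reply.

```python
import json
from typing import Optional
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build a query asking for the snapshot closest to timestamp (YYYYMMDD...)."""
    params = {"url": page_url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text: str) -> Optional[dict]:
    """Return the closest available snapshot described in a response, if any."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


if __name__ == "__main__":
    print(availability_query_url("mijnbouw.weebly.com", "2015"))
    # Illustrative response only, following the assumed shape of the service's JSON.
    sample_response = json.dumps({
        "archived_snapshots": {
            "closest": {
                "available": True,
                "url": "http://web.archive.org/web/20151206030556/http://mijnbouw.weebly.com/",
                "timestamp": "20151206030556",
                "status": "200",
            }
        }
    })
    print(closest_snapshot(sample_response))
```

A single available snapshot does not mean a site is well preserved: images, videos and sub-pages may be missing, so it remains important to browse the archived version itself.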

5.b Archiving your own future historical sources

learning outcomes

Understand how to trace the history of a website
Understand the historical value of personal digital data

	Common things done on a computer/smartphone	Uses the internet but not the web	Uses both the internet and the web
Searching for a picture on Google
Using Skype for a video call
Checking email with an application
Looking at your Facebook or Instagram feed on a web browser
Sharing a file with someone via a peer-to-peer file network
Downloading a file from a website
Making a call via FaceTime
