Web archives and the national dimension: The example of the Luxembourg Web Archive and luxembourg.lu

Fortunately, the Internet Archive is not the only institution trying to archive the web. Several other institutions are active on a smaller scale, usually focusing on websites considered important or relevant to a specific country. Several European countries, such as Finland, France, Ireland, Spain and Sweden, have even included web archiving in their national legal deposit, which means that websites are given a status similar to the one that books and newspapers have held for centuries. See the examples of the UK, Denmark and France.

Yet archiving the entire web is no easy task. The explosive growth of online content, especially since the 2000s, has made it impossible for archives and organisations to preserve every single website and all its versions over time. Although the technological expertise of these institutions has grown, a decline in activity at the national level can be observed, leading to a stronger dependence on the Internet Archive (IA). Fortunately, this is not the case in Luxembourg, where the National Library (Bibliothèque nationale du Luxembourg or BnL) has been tasked with archiving the Luxembourgish web since 2016.


3.a Challenges and limits of preservation: The example of the Luxembourg Web Archive

In this sub-assignment, you will explore the Luxembourg Web Archive (LWA), managed by the BnL, to identify the benefits, limitations and challenges of web archiving. Web archivists have to define the parameters and criteria for their digital collection of web pages. They must decide which types of sites to archive (e.g. government sites, news agency sites, museum sites, personal blogs), how many times a year snapshots should be taken, and how this content can be made accessible to their audience.


Imagine that you work at the National Library of Luxembourg and that your job is to help select and collect all the relevant information on the web that pertains to the Grand Duchy of Luxembourg. Now imagine that a secondary school student wants to understand how the LWA works and has asked you to explain it to her. Using the information on the main page of the Luxembourg Web Archive, its FAQ section and the “How it Works” page, answer the following questions posed by this imaginary student:


  • Regarding the collection policy of the LWA, who or what are Luxembourgish publishers?
  • How many times a year do you (the BnL) archive the Luxembourgish web?
  • Which types of websites do you archive (e.g. government sites, news agency sites, personal blogs, etc.)?
  • How can someone access the information in the LWA?
  • What is your process for collecting information for the LWA? (Hint: it’s a three-step process)
  • What are the three different types of crawls used by the LWA and what are the differences between them?
  • What are some of the reasons that you might decide not to collect a website?

3.b Exploring Luxembourg web content

The example of the LWA in assignment 3.a shows the tension between the principle of openness and the limited ability of institutions to sustain an open system when they must also respect the rights of copyright holders. Because of these rights, the information in the LWA is not accessible online; it can only be consulted on certain library premises. Fortunately, some of the content in the LWA is also available online through the Internet Archive’s Wayback Machine (WM). This lesson therefore uses the Wayback Machine, rather than the LWA, for the next sub-assignment on archived Luxembourgish web content. The BnL has a contract with the IA to crawl the Luxembourgish web, so the two archives hold overlapping content, although the BnL offers some complementary tools for exploring its collections.

Download the teaching aid for working with the interface of the Wayback Machine.

Go to the Internet Archive’s Wayback Machine, copy and paste the URL www.luxembourg.lu and press the “Enter” key. You should arrive on the page https://web.archive.org/web/*/www.luxembourg.lu. Referring to the WM interface user guide, answer the following questions:

  • Select two elements on this page that provide information about how the Internet Archive has archived the web page that corresponds to this specific URL.
  • Select one element that provides information about the archived web page itself.
  • What does the note under the calendar tell us? How many captures of the site www.luxembourg.lu have been made and over what period? Are the captures evenly distributed over time?

Under the URL search bar, click on the “Summary” button for statistics and details about the archived snapshots of the www.luxembourg.lu web page.

  • How many captures of the site were made in 1996? How many in 2005?
  • What can be inferred from this? Why do you think the number of captures might be uneven over time?
  • What does this mean for using the Wayback Machine as a source of archived web pages?
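The questions above about uneven capture counts come down to a simple tally: how many capture timestamps fall in each year. The Python sketch below illustrates that tally under stated assumptions. It takes a list of 14-digit timestamps in the YYYYMMDDhhmmss form that appears in the snapshot URLs quoted later in this lesson, and it does not contact the Wayback Machine. The helper name `captures_per_year` and its `include_empty_years` parameter are hypothetical, chosen only for this example.

```python
from collections import Counter


def captures_per_year(timestamps: list[str], include_empty_years: bool = True) -> dict[int, int]:
    """Count Wayback-style capture timestamps (YYYYMMDDhhmmss) per year.

    Hypothetical helper for illustration: it only works on timestamps you have
    already collected and does not query the Wayback Machine.
    """
    counts = Counter()
    for ts in timestamps:
        if len(ts) != 14 or not ts.isdigit():
            raise ValueError(f"expected a 14-digit timestamp, got {ts!r}")
        counts[int(ts[:4])] += 1  # the first four digits are the year
    if include_empty_years and counts:
        # Fill in years with no captures so that gaps in coverage stay visible.
        return {year: counts.get(year, 0) for year in range(min(counts), max(counts) + 1)}
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Only the two capture times cited in this lesson; NOT a full capture list.
    print(captures_per_year(["19961109200738", "20030322184925"]))
```

With only these two timestamps, the output shows zero captures for 1997 to 2002. That reflects the tiny input, not the real archive. The actual per-year figures are the ones shown in the “Summary” view.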

Now return to the “Calendar” view. Between the line indicating the number of times that the web page has been saved and the calendar below, hover over the timeline and click on the year 1996 (you might need to scroll to the left). Next, select the snapshot taken on 9 November 1996 at 20:07:38. This should bring you to the archived page: https://web.archive.org/web/19961109200738/http://www.luxembourg.lu/

  • What does this page show?
  • Is there any information on the date of creation or the author? What about any temporal information?
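The snapshot address for 9 November 1996 follows a regular pattern. It consists of the prefix https://web.archive.org/web/, then a 14-digit capture timestamp (year, month, day, hour, minute, second), then the original URL. For example, 19961109200738 breaks down as 1996 11 09 20 07 38, which gives the capture time of 9 November 1996 at 20:07:38. The Python sketch below builds and splits such addresses. It assumes every snapshot URL has exactly this shape. The helper names `snapshot_url` and `parse_snapshot_url` are hypothetical and are not part of any Internet Archive tool.

```python
from datetime import datetime

# Prefix shared by the archived-page URLs quoted in this lesson.
WAYBACK_PREFIX = "https://web.archive.org/web/"
# 14-digit capture timestamp: year, month, day, hour, minute, second.
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def snapshot_url(captured_at: datetime, original_url: str) -> str:
    """Build a snapshot URL from a capture time and an original URL (hypothetical helper)."""
    return f"{WAYBACK_PREFIX}{captured_at.strftime(TIMESTAMP_FORMAT)}/{original_url}"


def parse_snapshot_url(url: str) -> tuple[datetime, str]:
    """Split a snapshot URL back into (capture time, original URL) (hypothetical helper)."""
    if not url.startswith(WAYBACK_PREFIX):
        raise ValueError(f"not a Wayback Machine snapshot URL: {url}")
    # Everything before the first "/" after the prefix is the timestamp.
    timestamp, _, original_url = url[len(WAYBACK_PREFIX):].partition("/")
    return datetime.strptime(timestamp, TIMESTAMP_FORMAT), original_url


if __name__ == "__main__":
    # Prints https://web.archive.org/web/19961109200738/http://www.luxembourg.lu/
    print(snapshot_url(datetime(1996, 11, 9, 20, 7, 38), "http://www.luxembourg.lu/"))
    print(parse_snapshot_url("https://web.archive.org/web/20030322184925/http://www.luxembourg.lu/"))
```

Reading the timestamp directly from the address bar is a quick way to confirm which capture you are looking at. This is useful when answering the questions about dates and temporal information.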

Following the same steps as above, now go to the archived snapshot taken on 22 March 2003 at 18:49:25: https://web.archive.org/web/20030322184925/http://www.luxembourg.lu/

  • What does this page show?
  • Is there any more information about the source (date, author, etc.)? What elements seem to be relevant or missing?
  • What does the “About this capture” button in the upper right-hand corner tell us?
  • In your opinion, what is the most noticeable change between the archive of 9 November 1996 and that of 22 March 2003?

Lastly, go back to the “Calendar” view and choose a snapshot taken within the last five years. What are some of the first things you notice about it?


Based on these three archived snapshots from 1996, 2003 and the last five years, write a paragraph summarising how the luxembourg.lu website has changed over a 20-year period.

  • First, start with some basic questions: Who created this website and why? What functions did a website offer at the time? How were these functions fulfilled before the advent of the web? Did the function of the website change over time?
  • Then identify changes in the appearance and content of the website: consider the URLs, colours/fonts, structure, tabs, etc., on the basis of the information available on the Wayback Machine. Make sure you also discuss how historians might use this archived site. (Hint: think about this from different perspectives, such as those of historians of Luxembourg or historians of the digital world.)

Reading/viewing suggestions

  • Nielsen, Janne. Using Web Archives in Research: An Introduction. Aarhus: NetLab, 2016. E-book.
  • Brügger, Niels, and Ralph Schroeder, eds. The Web as History: Using Web Archives to Understand the Past and the Present. London: UCL Press, 2017. E-book.
  • Huss, Nick. “How Many Websites Are There Around The World?” Siteefy. Last modified November 27, 2022. Website.
  • Potter, Abby. “Why Archive The Web.” IIPC. October 18, 2012. Video, 2:53. YouTube.
  • Library of Congress. “Web Archiving.” November 30, 2009. Video, 3:10. YouTube.
  • Tilburg University. “World Wide Web Size.” Research project measuring the size of the indexed web, updated daily. https://www.worldwidewebsize.com/