Plan on the scope of the collecting of online materials and on the related deposit practices, pursuant to the Act on Collecting and Preserving Cultural Materials (1433/2007, 9 §).

Development targets and focus areas

  • constantly developing the National Library's technical competence and storage media as online publishing becomes more diverse
  • redesigning the deposit infrastructure to include all materials covered by legal deposit
  • increasing cooperation with researchers and experts to survey materials to be stored in the web archive
  • promoting research use of archived materials.

1 Introduction

The duties of the National Library of Finland include archiving Finnish publications according to the Act on Collecting and Preserving Cultural Materials (1433/2007). According to the Act, the National Library must present a plan on the scope of the collecting of online materials and on the related deposit practices, to be approved by the Ministry of Education and Culture (9 §).

This collection plan for online materials covers the period 2017–2020. The plan takes into account the needs of pertinent research and cultural-historical archiving as well as the equal treatment of online publishers as specified in the Act (9 §). The plan may be reviewed during its period of validity if the Finnish publishing industry or the technical or financial resources available to the National Library change significantly.

In addition to the Act on Collecting and Preserving Cultural Materials, archiving of cultural heritage is regulated by UNESCO. In selecting materials to be preserved as well as in developing and implementing collecting and preservation technologies, international cooperation with partners such as the IIPC and other organisations archiving online materials will continue to gain importance. This is particularly true as online publishing develops further and becomes increasingly international.

One of the key issues of the strategic period 2017–2020 is the development of legislation guiding this work, both in terms of archiving material and of the research use of the archived materials. As the amount of digital material increases exponentially, the National Library must focus more on which material to collect and how to develop the related technology. In addition, development of wider research use of the archived materials will become more central.

2 Materials to be collected and collection methods

In addition to websites openly available to the public, the Finnish web archive also collects material protected by a paywall, news articles in particular. Social media content from platforms such as Twitter, Facebook and YouTube will also be collected, especially in themed web harvests.

Other online publications, such as digital books, music, and periodicals, are collected through deposit requests.

3 Collecting websites

Materials openly available to the public through information networks are collected

  1. through the yearly Finnish domain web harvests,
  2. through themed web harvests focusing on a specific topic or event, and
  3. through web harvests of the continuously updated content of newspapers, magazines, and news sites.

The yearly Finnish domain harvests archive an overview of Finnish online publishing. In this harvest, conducted at least once a year, the National Library collects websites on the .fi or .ax domains, as well as Finnish webpages from other domains, which are identified using language- and country-identification tools. These harvests are supplemented by harvesting social media content.
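
The domain part of this scope rule can be illustrated with a minimal sketch. It assumes that scope is decided from the host name of a URL; the function name in_domain_scope and the constant IN_SCOPE_TLDS are hypothetical and do not describe the National Library's actual tooling. Finnish pages on other domains, which the plan identifies with language and country tools, are not covered by this sketch.

```python
from urllib.parse import urlsplit

# Top-level domains named in the plan as belonging to the yearly domain harvest.
# The constant name is hypothetical.
IN_SCOPE_TLDS = ("fi", "ax")


def in_domain_scope(url: str) -> bool:
    """Return True if the host of the URL ends in an in-scope top-level domain."""
    # hostname is None for strings without a network location.
    host = urlsplit(url).hostname or ""
    # Take the last label of the host name, ignoring a trailing root dot.
    tld = host.rstrip(".").rsplit(".", 1)[-1]
    return tld in IN_SCOPE_TLDS
```

Under these assumptions, a URL such as https://www.kansalliskirjasto.fi/en is in scope, whereas https://example.com/fi is not, because only the host name is examined and not the path.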

Themed web harvests cover a broader and more comprehensive sample of online material focusing on a specific topic or material type, also from social media. Themed web harvests are made on the following topic areas:

  1. national and international events, e.g. elections
  2. other events and phenomena, e.g. festivals, sports events, natural disasters
  3. political and societal situations
  4. other web harvests focusing on a specific topic or material type.

The National Library also surveys the online materials to be included in themed web harvests in cooperation with researchers, other experts, and the general public. Themed web harvests on international topics and phenomena may be conducted in cooperation with, e.g., the IIPC or organisations responsible for online archiving in different countries.

Collecting of continuously updated content focuses on articles published online on a daily, weekly or monthly basis, depending on the publishing/updating frequency of the website.
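
How a harvest schedule could follow the updating frequency of a site is sketched below. This is an illustration only: the names HARVEST_INTERVALS and next_harvest are hypothetical, and treating a month as 30 days is an assumption made for simplicity, not something stated in the plan.

```python
from datetime import date, timedelta

# Hypothetical mapping from a site's updating frequency to a harvest interval.
# "monthly" is approximated as 30 days (an assumption).
HARVEST_INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}


def next_harvest(last_harvest: date, frequency: str) -> date:
    """Return the date of the next harvest for a site with the given frequency."""
    if frequency not in HARVEST_INTERVALS:
        raise ValueError(f"unknown updating frequency: {frequency!r}")
    return last_harvest + HARVEST_INTERVALS[frequency]
```

For example, a weekly magazine last harvested on 1 January 2017 would be due again on 8 January 2017.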

Web harvesting and collecting methods

The technology for automatic collecting of online material has changed, due to the increased diversity of material to be collected, from a single established method (the Heritrix web crawler) to a selection of different methods which require constant effort to develop and to make compatible with each other. For example, the Heritrix web crawler may not be able to harvest material from behind paywalls or from social media platforms. To collect such material, the National Library has already adopted and continues to develop open-source applications which are used to log on to the websites and to collect material either directly or through the interfaces provided by the platforms. New types of material require new processing solutions in order to ensure their long-term preservation. The National Library is involved in international cooperation aiming to improve collecting tools.

4 Collecting other digital publications (e.g. e-books, e-magazines, online music)

Online materials which cannot be collected automatically are requested as deposits to the National Library (Government proposal 68/2007). Such publications typically include:

  1. e-books and e-magazines, as well as newspaper archives
  2. online music and games
  3. online publications issued by universities, government, NGOs, and other organisations.

New forms of publications emerging with the continuing development of online publishing are also collected through deposit requests if they cannot be collected automatically.

For publications covered by the legal deposit requirement, the National Library also requests that the related metadata be deposited, primarily in the metadata formats commonly used in the publishing industry (e.g., Dublin Core, ONIX, NewsML, MARC 21). If the publication itself contains high-quality metadata, this may also be used when the material is made available to the public. Deposit copies of publications included in the institutional repositories maintained by the National Library will be collected directly from the repositories. The institutional repositories include publications from government, universities, and universities of applied sciences. Other institutional repositories as well as other materials available through open interfaces, such as academic online journals, are collected in the same manner. These publications will then be incorporated into the legal deposit archive, and their metadata can be converted for use in describing the materials in the national bibliography and in other library databases. Thus, in addition to supporting the archiving and long-term preservation of online publications, this practice promotes access to open science.
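
As an illustration of deposited metadata in one of the formats mentioned above, the following minimal sketch serialises a few Dublin Core elements as XML. The namespace URI is the standard one for the Dublin Core element set; the wrapper element "metadata", the function name dublin_core_record and the sample values are hypothetical and do not represent the deposit format required by the National Library.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialise element names and values (e.g. title, creator) as Dublin Core XML."""
    # "metadata" is a hypothetical wrapper element used only for this sketch.
    root = ET.Element("metadata")
    for name, value in fields.items():
        # Each field becomes a dc:<name> element in the Dublin Core namespace.
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for illustration.
record = dublin_core_record({
    "title": "Example e-book",
    "creator": "Example Author",
    "language": "fi",
})
```

A record of this kind could accompany a deposited publication and later be converted for use in other library databases.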

Deposit requests are sent for material covered by the legal deposit requirement when it cannot be harvested with the available technology and requests to make harvesting possible have not been successful. Deposit processes are designed in cooperation between the National Library and online publishers, and they are carried out so that the effort required of both parties is as reasonable as possible. Deposit practices and infrastructures are being harmonised and expanded to better cover all materials under the legal deposit requirement.

Large amounts of material are deposited either on a portable storage device, through an SFTP data transfer, or with an online deposit form. All of these methods also enable depositing metadata relating to the material. Individual publications are always deposited through the online deposit form.

Legal deposits of online material and its metadata are primarily made in cooperation with online bookstores, online music stores, or aggregators – they shall regularly deposit the online publications of publishers available through their services. The publisher will, however, deposit the material if it is not available through other channels. Government bodies not using the institutional repository services will deposit their material upon request by the National Library.

All material to be deposited must primarily be delivered in the material-type-specific file formats that are either recommended or acceptable for transfer, as specified by the Digital preservation service of the National Digital Library [ - URL corrected September 3, 2019]. Material type is determined by the primary content of the material.

In terms of collecting material that is particularly problematic for archiving, such as databases, online teaching materials, and online games, the National Library keeps abreast of international developments and participates in national and international cooperation projects in the field whenever possible.

5 Use and long-term preservation of online materials

The National Library is responsible for collecting and submitting the materials for long-term preservation. Collected materials are available at legal deposit workstations maintained by the National Library, pursuant to the Copyright Act (404/1961, 16 b §).

Submission of the Finnish Web Archive to the National Digital Library's Digital preservation service began in November 2015. Deposited online materials will be next in line to be submitted to the service. Long-term preservation will not release the National Library from its duty to keep and provide user copies of all materials collected.

6 Summary

Collecting Finnish publications as specified in the Act on Collecting and Preserving Cultural Materials is becoming increasingly difficult as new forms of online publishing emerge. New publishing practices often mean that several parallel technologies are in use before procedures become standard. At the same time, the available technologies continue to develop. Cooperation with the publishing industry and with web archiving organisations is an absolute necessity for archiving, along with constant development of staff competencies.

Cooperation with researchers and other experts is becoming increasingly important in collecting online materials, both in terms of surveying the material to be collected and in piloting research use of online materials. Demands for amending the legislation governing the use of archived online materials are becoming more pressing as research methods from the digital humanities gain more ground.