Tietolinja
News 1/1999



Electronic publications as legal deposit copies

Juha Hakala


Substantial amounts of materials of national cultural value are already being published on-line. The amount of valuable on-line materials on the network is continuously increasing despite the fact that the lifespan of these publications is often very short. For the present, commercial publishers claim responsibility for the long-term storage of their own electronic materials, but in case a publication loses its commercial value or a publishing house closes down, the survival of such publications can no longer be guaranteed.

Nobody seems to claim responsibility for the storage of freely accessible on-line publications. That is why it is important that the long-term preservation of electronic publications be organised on a solid basis as soon as possible.

One of the central aims of the Working Group nominated by the Ministry of Education was to extend the existing Legal Deposit Act to cover all types of electronic publications. The Working Group began its work by familiarising itself with the legal deposit acts applied in other Scandinavian countries and the practices followed by the national libraries in these countries. However, the proposal for a future legal deposit act in Finland was not drafted in line with the other Scandinavian countries due to the fact that the rapid development of technology offers us a possibility to organise things differently.

The scope of the Act

The Working Group's proposal for a future act covers both materials published on physical carriers (off-line resources) and on-line publications, i.e. materials published or made accessible on the network. Considering the scope of the act it is of crucial importance to use the term publication both in the rather limited sense in which it is used in the Copyright Act and, in a more general sense, in reference to materials made accessible to the public via the computer networks.

The extension of the existing Legal Deposit Act to cover on-line publications demands careful consideration. Should the compulsory delivery be extended to apply to every Finnish document on the network, the number of parties with an obligation of delivery would increase from the present three thousand to tens of thousands, perhaps even hundreds of thousands. In practice, it would be difficult to inform all the relevant parties of the new act, even if the Internet could be used to facilitate the task, as has been done in Denmark, where the national library maintains a well-designed legal deposit server at http://www.pligtaflering.dk/. And even if all the parties with an obligation of delivery were reached and the majority of them delivered their materials, the library would need to increase its staff considerably in order to organise the publications into a database or to guarantee their availability in some other way.

Freely accessible on-line publications

In order to restrict the number of parties with an obligation of delivery and to reduce the need for additional staff in the National Library, it has been decided that freely accessible on-line publications will not be made subject to compulsory delivery in the proposal for a future legal deposit act. However, the National Library should be entitled to collect freely accessible materials from the network with the help of a harvesting robot. Such publications can be made accessible to the public in the same way as other electronic legal deposit copies. In practice, the publications will be indexed into a full text database that can be made available to the public, although access to the actual documents will be restricted. The task of collecting and archiving the publications and maintaining the database will be carried out either by the National Library (the Helsinki University Library) or its authorized representative.

In the legal deposit legislation of other countries, the right to collect freely accessible on-line materials has not been recognized. One reason for this is that the technology needed to harvest and index the publications is very new and for the most part available only to commercial companies. Secondly, the significance of on-line materials has only been realised during the past few years. In Finland the situation regarding appropriate programs proved to be ideal, as the Nordic Web Index and Nordic Web Archive Projects have developed applications that can shortly be put into practice. Finnish on-line materials in the public domain have already been collected once by the Center for Scientific Computing (CSC) for the EVA Project. They total 1.5 million documents (25 gigabits), which are now stored in the tape archives of the CSC.

The CSC is currently designing a database of these publications together with other Nordic Web Archive participants. Preliminary tests have shown that it takes a user approximately 30 seconds to retrieve a document from a tape archive, which is perfectly adequate given that the tape archive constitutes a kind of "last resort" service. The collections will be updated a couple of times per year. However, if needed (for example, when the information content of a document changes daily), the servers can be visited more often. The future act will therefore not specify how frequently publications are to be collected, as the intervals may change.
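
The revisit policy described above can be sketched in a few lines of Python. This is only an illustration, not the harvesting robot used in the EVA Project: the 180-day default is an assumed reading of "a couple of times per year", and the host names and the per-host override table are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "a couple of times per year" is modelled as one visit per 180 days.
DEFAULT_INTERVAL = timedelta(days=180)


def servers_due(last_visits, today, overrides=None):
    """Return the hosts whose last harvest is at least one revisit interval old.

    last_visits maps a host name to the date of its last harvest.
    overrides maps a host name to a shorter interval, for servers whose
    content changes often (for example daily).
    """
    overrides = overrides or {}
    due = []
    for host, last in sorted(last_visits.items()):
        interval = overrides.get(host, DEFAULT_INTERVAL)
        if today - last >= interval:
            due.append(host)
    return due


# Hypothetical hosts: one ordinary server and one that changes daily.
visits = {"a.example.fi": date(1999, 1, 1), "news.example.fi": date(1999, 3, 1)}
daily = {"news.example.fi": timedelta(days=1)}
```

Because the interval is plain data rather than something fixed in the program, it can be changed without touching the logic, which mirrors the decision to keep collection frequency out of the act itself.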

In the light of the EVA Project we can be assured of the practical functioning of the principles stated in the proposal for a future legal deposit act concerning the processing of freely accessible materials on the network. The preparatory projects have, in fact, played a crucial role in the formulation of the act. Without the help they provided, we might have formulated the proposal for a future act following Denmark's example and made all on-line materials subject to compulsory delivery. During the first ten months after the revised Act in Denmark had become effective, the Royal Library of Denmark received 700 documents (400 books and 300 journals), which is only a fraction of the total number of electronic materials published in on-line form in Denmark during 1998.

Off-line resources and on-line materials with restricted access

Under the proposed act, off-line resources are given a status similar to that of printed publications in that the producer is made responsible for the delivery, and the frequency of delivery is the same as that of printed publications. In Sweden the publisher has been made responsible for the delivery, as practical experience with CD-ROM disks had shown that receiving legal deposit copies from producers was problematic. In Finland receiving CD-ROM disks from producers has not been problematic, although deposit is for the time being voluntary, and thus we did not think it necessary to follow Sweden's example.

Another similarity between off-line resources and printed publications (books) can be seen from the libraries' perspective. As in the case of printed publications, libraries are responsible for cataloguing electronic resources into the National Bibliography. The National Library of Finland is expected to receive a few hundred off-line resources per year, first in the form of CD-ROM disks and later also in the form of DVD disks and other media. One may expect the number of materials subject to compulsory delivery to increase even though computer programs with little information content, such as games and word processing tools, will not be included in the scope of the new legal deposit act. In this the proposed Finnish legislation follows the EU recommendations made in the ELDEP study and the practices adopted in many other countries.

On-line materials with restricted access, on the other hand, form a very heterogeneous group, including, among other things, publications in book form, articles, audio-visual materials and databases. For all of these the publisher is responsible for the delivery, because in many cases there is no separate producer.

The method of delivery and subsequent processing of materials in the library depend on the form of the publication. Materials in book form have to be processed according to the same principles as electronic resources, whereas the processing of electronic articles is based exclusively on the use of automated methods (see the following chapter).

One of the main principles concerning electronic publications in the proposal for a future legal deposit act is that the material must be delivered in a format and saved on a storage device that will allow the National Library to install the material. In principle the format of the electronic resource or the storage device on which it has been saved should not be used as a criterion for restricting the selection of the deposit publications. However, in practice, receiving material that cannot be installed and used by the library is useless. Concerning document formats and storage devices the proposal for a future act includes the following principles:

  • The National Library maintains a list of document formats (e.g. SGML, XML, JPEG) and storage devices (DAT tape, CD-ROM disk, DVD disk) in which documents can be delivered or saved.
  • The party with an obligation of delivery is expected to convert the material into an acceptable format or to copy it to an appropriate storage device, provided that this can be carried out within reason. If the conversion cannot be done or it involves too much trouble, the material need not be delivered. In cases where the material is of particular value and/or extensive, public funding can be available for the development of a suitable conversion program.
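
The two principles above amount to a small decision rule, sketched here in Python. The sets below only repeat the examples given in the list (the actual lists would be maintained by the National Library), and the function name, the enum and the `conversion_reasonable` flag are hypothetical. The sketch does not model the public funding option for valuable material.

```python
from enum import Enum

# Illustrative values only, taken from the examples in the text.
ACCEPTED_FORMATS = {"SGML", "XML", "JPEG"}
ACCEPTED_MEDIA = {"DAT", "CD-ROM", "DVD"}


class Decision(Enum):
    DELIVER_AS_IS = "deliver as is"
    CONVERT_THEN_DELIVER = "convert, then deliver"
    EXEMPT = "no delivery required"


def delivery_decision(doc_format, medium, conversion_reasonable):
    """Decide what a party with an obligation of delivery has to do.

    Listed formats on listed media are delivered directly. Anything else
    is converted if that can be done within reason; otherwise the
    material need not be delivered.
    """
    listed = (doc_format.upper() in ACCEPTED_FORMATS
              and medium.upper() in ACCEPTED_MEDIA)
    if listed:
        return Decision.DELIVER_AS_IS
    if conversion_reasonable:
        return Decision.CONVERT_THEN_DELIVER
    return Decision.EXEMPT
```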

As far as the National Library is concerned, it is of crucial importance that the original layout and characteristics of the document be preserved. That is why the document format should not be used as a criterion for restricting the selection of materials. In practice such restriction could be avoided by extending the list of accepted formats and storage devices. However, a longer list would make the long-term storage of publications problematic, as later conversion of the publications may result in losses in information content and/or noticeable changes in the physical appearance of the document. The National Library should find out to what extent it is possible to emulate hardware and software systems so that access to old documents can be guaranteed in future usage environments. Research in this area has been carried out in the NEDLIB Project (http://www.kb.nl/nedlib/), which is co-ordinated by the EU and in which the University of Helsinki is also a participant, and in the eLib project CEDARS (http://www.curl.ac.uk/cedarsinfo.shtml), which is co-ordinated by the University of Leeds and aims at developing emulation programs.

In order to be able to deal with the numerous formats in which the legal deposit copies are delivered, the legal deposit libraries must have access to a large selection of programs to process the documents. Access to these programs must be available both to the staff indexing the materials and to the patrons searching the database. Library staff who index the materials and serve the patrons need training to be able to carry out their new assignments successfully.

Delivery methods

The delivery of electronic documents can be organised so that the party with an obligation of delivery sends the library either the actual material or an announcement of the material. In the latter case the library can then retrieve the material from the publisher's server. In Denmark, the revised legal deposit act, which took effect at the beginning of 1998, is based on the announcement practice. Denmark's decision was motivated by the experience of Norway, where the first-mentioned method of delivery had resulted in the National Library of Norway receiving a great deal of irrelevant material published on the network by individual users.

In Finland, the proposal for a future legal deposit act does not specify the method or methods of delivery. The delivery methods are to be defined on the decree level. The reason for this is that the appropriate method of delivery varies according to the type of publication and also, most likely, over time. In practice different delivery methods may be used side by side. If voluntary delivery of freely accessible on-line materials is recognized in the future act so that the most relevant of these materials can be included in the National Bibliography (this is part of our plans), the delivery of such publications may be based on the announcement practice. Appropriate tools such as metadata templates are being developed in the EVA Project to ensure the practical functioning of the announcement practice. Basic work for the development of appropriate tools was already done in the Nordic metadata project.
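
As a rough illustration of what a metadata template might produce, the Python sketch below renders a few descriptive fields as HTML meta tags. The article does not say which element set the EVA templates use; the Dublin Core style "DC." prefix, the field names and the function name are assumptions made for this example only.

```python
from html import escape


def announcement_meta(fields):
    """Render descriptive fields as HTML <meta> tags, one per line.

    Assumption: a Dublin Core style naming scheme ("DC.<element>").
    Values are escaped so that they are safe inside the attribute.
    """
    lines = []
    for name, value in fields.items():
        lines.append(
            f'<meta name="DC.{escape(name)}" content="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)


# Hypothetical announcement for one freely accessible document.
print(announcement_meta({
    "title": "Electronic publications as legal deposit copies",
    "creator": "Hakala, Juha",
    "date": "1999",
}))
```

A publisher would embed such tags in the document or send them as the announcement, and the library could then retrieve the document itself from the publisher's server.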

Delivery of on-line materials with restricted access should not be based on the announcement practice, as the announcements delivered by commercial publishers will not be used as criteria for restricting the selection. Moreover, the announcement practice would cause additional work for both the National Library (retrieval of the material) and the publisher (description of the materials subject to delivery, installation of a network connection between the publisher and the library, updating the material on the server). Descriptive information about the material can be attached to the document, but the publisher need not send a separate announcement of the document.

Articles

Electronic articles illustrate the fact that the method of delivery and subsequent processing in the library vary according to the type of publication. There is no need to legislate on these matters in the future act; however, the Working Group's memo included in their report shows that they have been given consideration. Electronic articles published in newspapers and periodicals may turn out to be the most important group of publications covered by the future act, and that is why a considerable amount of time has been devoted to estimating costs and other effects related to them.

Important newspaper and periodicals publishers in Finland have created extensive full text databases. Due to mergers between publishing houses 80% of the articles published in newspapers are covered by three full text databases in SGML format. Images are stored in separate databases with textual descriptions.

The use of article databases is as of this writing restricted to internal use and as such these databases are not deposited. The introduction of the new legal deposit act will not result in changes in this situation unless the publisher decides to make the articles accessible to the public (as has been planned in many publishing houses) by publishing the article database on the network and thus making it subject to compulsory delivery.

Hundreds of thousands of newspaper articles are published in Finland every year. The amount of text in gigabits exceeds the scope of the WWW archive maintained by CSC. From this it can be concluded that the majority of articles are still published in printed form.

BTJ - Kirjastopalvelu Oy maintains a database for newspaper articles; however, only the most relevant articles are included in it. Descriptions of articles submitted by dozens of libraries are deposited in the ARTO database, which covers approximately 1,100 journals (60,000-70,000 articles per year). As in the case of newspaper articles, this is only a small fraction of the total number of articles published in periodicals.

As it is not possible to index every single article in MARC format, the only solution is to index the articles into databases in their entirety. The high technical quality of the systems used by publishers fortunately makes the creation of such databases fairly easy. New publications can be spotted in the publisher's database, transferred to the library's FTP server and loaded from there, with the help of a so-called Autoloader program, into the library's database. The database maintained by the library can either be a traditional full text database, which requires the use of an SGML parser, or an SGML database. In practice the former is less expensive but equally effective.
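
The loading step can be sketched roughly as follows. This is not the Autoloader program itself: the Python standard library has no SGML parser, so the sketch assumes that an article arrives as well-formed XML, and the element names (article, title, body) and the in-memory index are hypothetical. The FTP transfer is left out.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical structured article as it might arrive from a publisher.
SAMPLE = """<article id="a1">
  <title>Legal deposit of electronic publications</title>
  <body>The library archives electronic articles as full text.</body>
</article>"""


def parse_article(markup):
    """Extract the identifier, title and body from one structured article."""
    root = ET.fromstring(markup)
    return {
        "id": root.get("id"),
        "title": root.findtext("title", default=""),
        "body": root.findtext("body", default=""),
    }


def load(index, article):
    """Add every distinct word of the article to a simple inverted index."""
    text = f"{article['title']} {article['body']}".lower()
    for word in set(re.findall(r"\w+", text)):
        index[word].add(article["id"])


index = defaultdict(set)
load(index, parse_article(SAMPLE))
```

Because the source data is structured, the parser can keep title and body apart, so a real system could offer field-specific searching instead of the single word index used here.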

All in all, the creation of a database in the National Library, in which most articles published in Finland could be archived as full text, could thus be carried out fairly easily. Since the source data itself is structured, it is possible to create efficient tools to search the database. However, creating a system that functions well requires efficient co-operation between the library and the publishers. Restricting the use of the material to authorised users and enforcing regulations on security issues can both be regarded as prerequisites for the development of such a system.

Use of electronic legal deposit copies

Regulations concerning use of printed legal deposit copies in the first paragraph of the proposal for a future legal deposit act have been formulated in line with the prevailing regulations. Accordingly, legal deposit libraries are required to make the legal deposit copies available to researchers and other people who need them. In practice, as the libraries have not followed a uniform set of principles, in some legal deposit libraries the deposit materials have been more easily available than in others.

Printed copies of particular types of publications - books and periodicals, for instance - are available in all deposit libraries. If there are several deposit libraries, equally many copies are necessary due to the fact that the materials are not accessible via the network.

As far as on-line publications are concerned, there is no need to deliver several copies of them. On-line publications will be stored on archive servers which are accessible from all deposit libraries in the country in question with the help of appropriate tools. In Finland it is foreseen that three archive servers will be needed, i.e.:

1. Server for freely accessible on-line publications

2. Server for newspaper and periodical articles

3. Server for off-line resources such as CD-ROMs and access-controlled on-line publications
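
The division of labour between the three servers can be expressed as a simple routing rule. The Python sketch below is only an illustration: the category names and the function are hypothetical, and it assumes that articles always go to the article server regardless of access restrictions.

```python
# Hypothetical publication categories used for routing.
KINDS = {"online", "article", "offline"}


def archive_server(kind, access_controlled=False):
    """Return the number (1-3) of the archive server for a publication."""
    if kind not in KINDS:
        raise ValueError(f"unknown publication kind: {kind!r}")
    if kind == "article":
        return 2  # newspaper and periodical articles
    if kind == "offline" or access_controlled:
        return 3  # off-line resources and access-controlled on-line publications
    return 1      # freely accessible on-line publications
```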

In order to secure the long-term storage of the materials, it is necessary to copy off-line resources to archive servers, as storage media - e.g. diskettes and magnetic tapes - often have a very short lifespan. The number of copies to be delivered has in the proposal been restricted to two, of which one is to be delivered to the National Library (Helsinki University Library) and the other - a backup copy - to the Jyväskylä University Library.

As far as the National Bibliography is concerned, it should be decided to what extent relevant on-line publications, such as homepages of organisations, should be catalogued in the National Bibliography. In Denmark, cataloguing of important homepages has an important role in the InDoReg Project directed by the Dansk BiblioteksCenter. The report on the results of this project (at http://purl.dk/rapport/html.uk/) contains interesting information about the criteria used for indexing electronic materials.

Creating and maintaining an archive server is a costly project. That is why there is no need to install servers in every legal deposit library, especially as, in most cases, access to the materials can be easily organised through the network. The Deutsche Bibliothek has determined that 70% of the most troublesome materials, i.e. off-line resources such as CD-ROMs, can be installed on the network without difficulties. All in all, the proportion of troublesome materials is decreasing due to the modernisation of the software components of CD-ROM disks.

The commercial value of the materials to be stored in the archive servers will be significant. Under the new act, the National Library will be responsible for organising the use of electronic legal deposit copies in a way which prohibits their illegal use or manipulation. A basic question in this area is whether this will be carried out by archiving the publications as confidential and signed or whether it is enough to make the materials practically inaccessible for unauthorised users. However, inaccessibility cannot be guaranteed, as in order to make authorised use of the publications possible it is necessary to attach the server to the network.

Conclusion

The aim is that the new legal deposit act will take effect on 1 January 2000. Plans for implementing the new regulations have already been made and the development or acquisition of necessary tools is in progress: the VTLS software allows the indexing of electronic resources and the Helsinki University Library has an archive server for the electronic publications. Many other related tasks cannot be undertaken until resources have become available and the new act has gone into effect. Its enactment will most likely be followed by a period of a few years during which new systems will be developed. Should everything go according to plan, by the year 2002 the processing of electronic materials will have become part of the daily routine for the legal deposit libraries.

Juha Hakala, Development Director
Helsinki University Library
e-mail: Juha.Hakala@helsinki.fi

Translation by Laura Rontti
