Juha Hakala, Laila Heinemann, Nina Hyvönen, Osma Suominen

Metadata services for the Finnish public sector

The public sector produces a lot of information. Some of it, such as public records, has been freely available for a long time. But since the introduction of the Web about 20 years ago the amount of accessible information has increased very rapidly. For instance, universities are publishing their dissertations and reports in open repositories, and the intention is to make research data available as well.

From the users' point of view, the problem is no longer the availability of information. It is out there on the Internet, whether the user wants cartographic data or a university dissertation. The problem is finding the needle in the haystack. Organizations publishing information in digital form, for their part, may need to preserve at least some of it in the long term, meaning decades or even centuries. Metadata is a prerequisite for solving both of these problems, access and preservation, so relevant information must be described properly. There are many reasons why this task cannot be left to Google. Much of the relevant information is in Deep Web databases and other information silos which cannot be harvested. More importantly, automatic indexing is not capable of authority control, and it cannot reliably determine even the subject of textual documents, not to mention images or sound.

Moreover, neither Google nor the Internet Archive should be made responsible for preserving digital resources in the long term. While the Internet Archive may preserve at the bit level those documents it can harvest, it will not have the human resources to migrate these documents into more modern formats and to record the differences between the two versions of a resource.

What is the most efficient way to foster access to public sector information once it has been made available on the Web? The Finnish answer is to establish centralized metadata services for the entire public sector.

The report outlining a centralized metadata service and its components was published in June 2013 by the Ministry of Finance. The service will foster efficient access, use, re-use and preservation of information. This is done primarily by improving semantic interoperability between applications. Technical interoperability will be catered for by connecting the applications into the X-Road-based data exchange channel to be established. But this technical connection is not sufficient if the linked applications do not understand each other's (meta)data.

Components of the metadata service

How can semantic interoperability between public sector applications be improved? No two countries will give exactly the same answer to this question, but it is likely that at least some common denominators will be found. In Finland, the following main components have been identified:

1. Authority database
2. Code list services (including e.g. country codes, language codes, etc.)
3. Ontology services
4. Metadata registry
5. Schema library
6. URN service

Some of these services do not exist yet, and in some cases no host organization has been agreed upon. But some services, including the ones the National Library is responsible for (numbers 1, 3 and 6), are already in place. The current status and future plans of these three services are described below.

The metadata services listed above are supported by a standards portfolio and common principles for resource description (or cataloguing rules, as they are called). The standards portfolio has already been established; a working group maintains it and extends it to new areas such as cartographic data and accessibility. Use of a standard specified in the portfolio can be made mandatory, but so far this rather drastic step has not been taken.

For librarians, it is all too easy to take the cataloguing rules for granted. But their role is essential; relevant information can be preserved and found on the Web only if it is described properly. A good description takes into account all the relevant aspects of the resource, and it should follow common principles so that the metadata can be easily shared with other users.

Co-operative cataloguing in the National Metadata Repository and the Asteri Authority Database brings together several hundred cataloguers from Finnish libraries, for example those of higher education institutions (universities and universities of applied sciences), and from other organizations. They work together using mutually accepted, standardized cataloguing rules and provide timely and cost-effective metadata. This co-operation model will also benefit future developments.

Authority database

Authority files contain controlled forms of subject headings and names, which help harmonize descriptive metadata and thus improve search results. The Asteri Authority Database, which was established in 2013, is based on the authority files of the National Bibliography. These include the general thesauri maintained by the National Library and the names of the public identities (persons and organizations) who have published works in Finland. The production platform is the authority module of the integrated library system used by the National Metadata Repository. All the libraries cataloguing into the shared repository can access the files with their cataloguing client software.

However, there is growing interest in authority files outside the library community, in other memory institutions and the government sector. To enable the use of the files in other systems as well, thesauri and name authorities (currently only corporate names; personal names will be included later) have been made openly available via the Finto ontology service as Linked Data.

The approximately 40,000 MARC authority records for corporate names have been transformed into RDF using the RDA vocabulary, in particular the RDA elements for describing authorities. The resulting dataset, Finnish Corporate Names, has been published via the Finto service using a CC0 licence that allows reuse of the data. The authority records are now browsable online, and also accessible via the Finto REST API, described below.

Authority data migration is challenging because the MARC records include name variants in many languages, but the language is not indicated in the original records. We used automatic language detection tools to identify the most likely languages for alternate name forms, and based on this analysis added language tags to the RDF data. Unfortunately, not all languages could be reliably detected, and we are currently in the process of adding ISO 639-2 language codes to the MARC authority records.
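The tagging step described above can be sketched as follows. This is a minimal illustration and not the code used in the migration: the detector interface, the confidence threshold of 0.9 and the toy detector rules are all assumptions made for the example. The point it shows is that a language tag is attached to a name variant only when detection is confident enough, and the variant is otherwise left as an untagged literal.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical detector interface: returns (language code, confidence in [0, 1]).
Detector = Callable[[str], Tuple[str, float]]


def tag_name_variants(
    variants: Iterable[str],
    detect: Detector,
    threshold: float = 0.9,  # assumed cut-off, not taken from the article
) -> List[Tuple[str, Optional[str]]]:
    """Pair each name variant with a language tag, or None if detection is unreliable."""
    tagged = []
    for name in variants:
        lang, confidence = detect(name)
        tagged.append((name, lang if confidence >= threshold else None))
    return tagged


def to_turtle_literal(name: str, lang: Optional[str]) -> str:
    """Render a variant as a Turtle literal, language-tagged only when a tag is known."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def toy_detector(name: str) -> Tuple[str, float]:
    """Stand-in for a real detection tool; the rules below are purely illustrative."""
    lowered = name.lower()
    if "kirjasto" in lowered:
        return ("fi", 0.97)
    if "library" in lowered:
        return ("en", 0.95)
    return ("sv", 0.40)  # low confidence, as for short or ambiguous names


variants = ["Kansalliskirjasto", "National Library of Finland", "Nokia"]
literals = [
    to_turtle_literal(name, lang)
    for name, lang in tag_name_variants(variants, toy_detector)
]
print(literals)
```

Variants that fall below the threshold stay untagged, which corresponds to the cases where the language has to be recorded in the source authority records instead.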

Ontology services

The Finto service is the national thesaurus and ontology publishing platform that gives access to controlled vocabularies, including thesauri, lightweight ontologies, classifications and authority files. The service currently hosts around 30 controlled vocabularies that can be used in bibliographic as well as other databases in the public sector and beyond.

Finto is implemented as a web application and an underlying RDF database. It provides a user interface, open REST-style API and Linked Data access. The current software implementation, called Skosmos, has been published as open source software.
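As a rough illustration of how a client application might use the REST-style API, the sketch below builds a search request and extracts concept URIs and labels from a JSON response. The base URL, the `search` path, the parameter names (`query`, `vocab`, `lang`) and the response fields (`results`, `uri`, `prefLabel`) are assumptions that should be verified against the service's own API documentation; the sample response and its URI are invented for the example. No network request is made.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode

# Assumed base URL of the REST API; check the Finto documentation before use.
API_BASE = "https://api.finto.fi/rest/v1"


def build_search_url(query: str, vocab: str, lang: str = "fi") -> str:
    """Build a concept search URL (path and parameter names are assumptions)."""
    params = {"query": query, "vocab": vocab, "lang": lang}
    return f"{API_BASE}/search?{urlencode(params)}"


def extract_concepts(response_text: str) -> List[Tuple[str, str]]:
    """Return (URI, preferred label) pairs from an assumed JSON response shape."""
    payload = json.loads(response_text)
    return [(hit["uri"], hit["prefLabel"]) for hit in payload.get("results", [])]


# Hypothetical response body, invented for illustration only.
sample_response = json.dumps(
    {"results": [{"uri": "http://example.org/concept/123", "prefLabel": "kirjastot"}]}
)

url = build_search_url("kirjastot", vocab="yso")
concepts = extract_concepts(sample_response)
print(url)
print(concepts)
```

A cataloguing tool could store the returned URI in its own records and display the label, so that descriptions made in different systems point to the same concept.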

URN service

Every resource intended for long-term preservation should have a persistent and actionable identifier. Identifiers are not only unique access keys; they can, and should, also be used as links to resources. It is impossible to keep the URLs of Internet resources up to date in every bibliographic database, whereas storing URNs in bibliographic records and maintaining the URN-to-URL mappings centrally in URN resolvers is more efficient. In the future, URN resolution may also provide additional services, such as finding all manifestations of a resource with a work identifier.
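The idea of central resolution can be illustrated with a small in-memory sketch. It is not the National Library's resolver: the class, the identifier `URN:NBN:fi-example-0001` and the URLs are hypothetical. It shows that a bibliographic record stores only the URN, so that when a resource moves, only the central mapping has to change.

```python
from typing import Dict, Optional


def normalize_urn(urn: str) -> str:
    """Lower-case the 'urn' scheme and namespace identifier; leave the rest untouched."""
    scheme, nid, nss = urn.split(":", 2)
    if scheme.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    return f"urn:{nid.lower()}:{nss}"


class UrnResolver:
    """Hypothetical central registry mapping persistent URNs to current URLs."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, urn: str, url: str) -> None:
        """Add a mapping, or replace the URL when the resource has moved."""
        self._locations[normalize_urn(urn)] = url

    def resolve(self, urn: str) -> Optional[str]:
        """Return the current URL for a URN, or None if it is unknown."""
        return self._locations.get(normalize_urn(urn))


resolver = UrnResolver()

# The bibliographic record carries the URN only, never the URL.
record = {"title": "Example dissertation", "identifier": "URN:NBN:fi-example-0001"}

resolver.register(record["identifier"], "https://old.example.org/thesis.pdf")
# The resource moves: the mapping is updated once, the record stays unchanged.
resolver.register(record["identifier"], "https://repository.example.org/items/1")

current_url = resolver.resolve("urn:nbn:fi-example-0001")
print(current_url)
```

Every database holding the same URN benefits from the single update, which is the efficiency gain over maintaining URLs in each bibliographic database separately.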

The National Library has maintained a URN resolution service for several years. The number of organizations using it has grown steadily. Originally, the users were libraries, but of late most new users, such as Statistics Finland, the National Land Survey of Finland and the Finnish IT Centre for Science, have come from outside the library domain.

The way forward

The National Library has provided centralized services for the library sector for 20 years. How to transform these services so that they meet the requirements of the whole public sector is a very good question. Of course, some additional funding is needed to turn a service aimed at libraries into a national service. But money is hardly the most serious challenge we are facing.

The needs of libraries and other users may be contradictory. For instance, libraries have used a few thousand Finnish place names for subject description, but the National Land Survey needs more than 800,000 such names. Services must also cater for a much larger group of users, and technical interfaces for non-library applications must be built in order to guarantee smooth information exchange. It is also necessary to change the administrative framework so that all user organizations can participate in the consortia which manage the services.

This may sound challenging, but what is the alternative? A country like Finland cannot afford to maintain multiple overlapping metadata services, especially when doing so would not only be costly but would also have a negative impact on the quality of the services. Maintaining, for instance, multiple centralized authority databases for names of legal and natural persons would only create confusion.

The National Library has already received additional funding for transforming Finto and its ontologies into a truly national service, both from a technical and from a content point of view. It remains to be seen when the authority database and URN service will be supported in the same way. All these services are essential parts of the national information infrastructure, not only in Finland but in every information society.


Juha Hakala, Laila Heinemann, Nina Hyvönen and Osma Suominen are IT specialists at the National Library of Finland.


