To NORDINFO

The Nordic Metadata II project:

Cataloguing, Indexing and Retrieval of Network Documents

Introduction

The 1st Nordic metadata project, which was finished on schedule in May 1998, has been a very successful library IT initiative. The project set out to create a Nordic metadata production, indexing and retrieval environment. The system was not intended to be experimental, but primarily for production purposes. This aim has been fulfilled; in fact, some of the tools built in the project and taken into everyday use were not included in the initial project plan. The Nordic metadata I delivered more than it promised, but there is still work to be done, and we therefore propose the Nordic metadata II.

The Nordic metadata I has become well known not only in Scandinavia, but also internationally. The project homepage (http://www.lib.helsinki.fi/meta) is very popular: it is visited on average more than 1000 times per month, and there have been visitors from at least 60 countries. Users from at least 28 countries have created Dublin Core records with our metadata template.

Even more important is the high esteem the project has among the international Dublin Core community. The most obvious proof of the project's international status is the fact that the project had the honor to arrange the 5th Dublin Core metadata workshop. In practice the Helsinki University Library hosted the conference.

Since 1996, when the 1st Nordic metadata project was launched with the help of 50 % funding from NORDINFO, metadata issues have become even more relevant to libraries. As a result of the phenomenal growth of the Internet it is now clear that effective information retrieval from the net cannot rely on full-text searching alone. Instead, high-quality metadata is needed. In order to provide useful metadata we need a format, tools and user guides which make metadata provision as simple as possible, but not simpler.

In 1996 it was not entirely clear which format would be the best choice for Internet resource description. As of this writing this problem has disappeared: there is a general consensus that the Dublin Core metadata element set (http://purl.oclc.org/metadata/dublin_core/) is the best choice. The Nordic metadata project was the 1st international Dublin Core project, but by now there are a number of national and international projects developing Dublin Core-based tools (see the project list at http://purl.oclc.org/metadata/dublin_core/projects.html). Many of these initiatives, especially in the Nordic countries, rely on software built in the 1st Nordic metadata project (see chapter 2 of the project's final report for more information on this).

The main task of the proposed Nordic metadata II is to develop further the tools built in the Nordic metadata I. Although all these tools are fully functional, we can make them better. Lack of time and funding, and the unclear status of some non-core features of Dublin Core, prevented us from providing all the planned services in the 1st project. Another important task of the Nordic metadata II is improvement of the Dublin Core itself. Several project team members hold, as a result of the Nordic metadata I, important positions in the global Dublin Core community and are therefore able to get features that are valuable from the Nordic point of view into the format.


Participants of the Nordic Metadata II Project

The following organizations and people will be involved with the project:

Table 1. Project participants

Organization                                Person
Bibsys                                      Ole Husby
Danish Library Center                       Susanne Thorborg
Lund University Library, NetLab             Traugott Koch
National and University Library, Iceland    Andrea Johannsdottir
Swedish Institute of Computer Science       Preben Hansen
Helsinki University Library                 Juha Hakala

 

Most of the organisations and people involved with the Nordic metadata I are willing to continue the work in Nordic metadata II. This is very important, because of the cumulative experience this group has in metadata issues. There are two changes to the team. Andrea Johannsdottir is the new project partner from the national library of Iceland. She will cooperate closely with the original team member and her colleague, Sigbergur Fridriksson.

Munksgaard will not participate in Nordic metadata II. We are trying to bring another publisher into the project team, and are as of this writing negotiating with the Scandinavian University Press. They would not be a full partner in the project, but an associated partner.

The project manager will be Juha Hakala from the Helsinki University Library, who was also the manager of the Nordic metadata I.

Generic information on participating organizations:

Bibsys (http://www.bibsys.no) is a Norwegian library system and the union catalogue of a large number of Norwegian research libraries. Bibsys personnel have been involved in a number of European development projects, including the classic Z39.50 projects Nordic SR Net and ONE (OPAC Network in Europe). Bibsys maintains the Norwegian NWI database.

The Danish Library Center (http://www.dbc.dk/english/default.html) is the organisation which maintains the Danish national union catalogue, a database shared by research and public libraries. In addition to the national union catalogue, DBC produces and maintains the Danish national bibliography database in cooperation with the Danish national library. DBC has been involved in many domestic and international projects in the area of library automation, including the OPAC Network in Europe project (http://www.dbc.dk/ONE/oneweb/index.html).

Helsinki University Library (http://www.lib.helsinki.fi/hyk/hul/) is the national library of Finland. Its library network services department, which will participate in the NM II, is responsible for planning and co-ordination of automated library services for Finnish academic and research libraries.

Lund University Library NetLab (http://lub.lu.se/netlab/) is well known in the field of library automation not only in Scandinavia, but also elsewhere in Europe for its pioneering work in e.g. Web harvesting and indexing. NetLab has participated in numerous projects such as DESIRE. In the Nordic metadata I NetLab had a key role, as it developed both the metadata template and the new Nordic Web Index functionality needed for metadata harvesting and indexing.

The National and University Library of Iceland (http://bok.hi.is) holds a key position in Icelandic library automation. The library has been involved in several national, Nordic and international library IT initiatives. It maintains the Icelandic NWI database.

Swedish Institute of Computer Science (http://www.sics.se) is a non-profit research foundation funded by the Swedish National Board for Technical and Industrial Development (NUTEK) and by a group of companies.

The project team as a whole is an ideal combination of skills and knowledge. For instance, NetLab has a lot of experience in the area of Internet information retrieval and metadata harvesting & indexing. Bibsys is very familiar with format conversion issues.

 

Tasks of the Nordic Metadata II project

Nordic metadata I has implemented metadata tools that are already used by a large number of users in Scandinavia and elsewhere. However, the Nordic metadata I has not organised long-term maintenance of the tools it has developed. This task was not even included in the project plan. On the other hand, we believe that because Dublin Core development is still ongoing, a project is an ideal way of developing the tools further. Once the Dublin Core is stable, we can nominate a maintenance organisation for the tools developed on a project basis.

 

  1. Enhancement of the existing Dublin Core specification

    As of this writing the nucleus of the Dublin Core - the 15 elements - has been defined. There is also a consensus on how to add private elements to the basic set of 15. But there is still work to be done to define a core set of Dublin Core qualifiers, with which one can specify the semantic content of a given element further.

    Another area where the Nordic metadata II will be active is the specification of Dublin Core syntaxes for new document formats such as Extensible Markup Language, XML (see http://www.w3.org/XML/). Syntax has a big influence on how the information embedded in the document can be harvested and indexed.

    All project partners will participate in this task.

    For more information about the status of the Dublin Core see chapter 1.3 in the final report of the Nordic metadata I.


  2. Improvement of the Dublin Core to MARC converter

    Dublin Core to MARC converters will enable libraries to utilise the cataloguing work done by authors and publishers of web documents. Dublin Core and MARC are quite similar; this means that it is possible to build a converter, although it is not easy - see chapter 2.3 of the NM I final report for more details.

    The Nordic metadata I built the first publicly available Dublin Core to MARC converter in the world. Although this application (see http://www.bibsys.no/meta/d2m/) is fully functional, it should be rebuilt in order to make it table-driven. This will make the maintenance of the converter software a lot easier. Format changes can then be defined in an Excel table; there is no need to modify the program code.

    As a part of this task we intend to define a standard conversion table which can be utilised in all projects preparing Dublin Core to MARC conversions. Our plans have already been presented to the global DC community in the 5th Dublin Core metadata workshop in Helsinki.

    As a separate task we want to build a MARC to Dublin Core converter. An experimental NORMARC to DC converter has proved that it is feasible to build an application like this.

    The partner responsible for this task is Bibsys, which developed the existing conversion software.
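
The table-driven approach described above can be illustrated with a minimal Python sketch. It is not the Bibsys converter: the table layout, the column names and the MARC tags, indicators and subfields below are assumptions chosen only for illustration, and the table is held as CSV text standing in for a spreadsheet export. The point is that the conversion rules live in data, so a format change means editing a row rather than the program.

```python
import csv
import io

# Hypothetical conversion table (e.g. exported from a spreadsheet).
# The tags, indicators and subfields are illustrative assumptions,
# not the project's actual crosswalk.
TABLE_CSV = """dc_element,marc_tag,indicators,subfield
DC.Title,245,00,a
DC.Creator,720,##,a
DC.Subject,653,##,a
DC.Description,520,##,a
DC.Identifier,856,4#,u
"""


def load_table(csv_text):
    """Parse the conversion table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def dc_to_marc(record, table):
    """Convert a DC record (element -> list of values) to MARC-like lines.

    Fields are emitted in table order; elements missing from the table
    are silently skipped.
    """
    fields = []
    for row in table:
        for value in record.get(row["dc_element"], []):
            fields.append(
                f'{row["marc_tag"]} {row["indicators"]} ${row["subfield"]}{value}'
            )
    return fields


if __name__ == "__main__":
    table = load_table(TABLE_CSV)
    example = {
        "DC.Title": ["Nordic Metadata Project"],
        "DC.Creator": ["Hakala, Juha"],
    }
    for line in dc_to_marc(example, table):
        print(line)
```

Supporting a new element in this sketch only requires a new row in the table; `dc_to_marc` itself stays unchanged.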


  3. Dublin Core User support and tools evaluation

    Creating a Dublin Core record may be easier than traditional MARC based cataloguing, but it is still complicated for a user with no experience of bibliographic description of documents. We cannot expect users to provide high quality metadata without appropriate user guides. On the other hand, if we do not give the users a possibility to provide feedback, we cannot evaluate the quality of our tools, and it will be difficult to improve them.

    DC User guide

    The Nordic metadata I produced a full and a short version of the Dublin Core user guide (see chapter 2.5 of the final report). These documents are very actively used since they are linked to the metadata templates, and thus form an integral part of the service. There is an urgent need to keep these documents up to date when the template is modified.

    In addition to the template guide, there are also user guides linked to the URN generator and DC to MARC converter.

    There are at least three reasons for continuing work in this area:

    1. Dublin Core is still evolving. Once the list of core qualifiers has been approved, our metadata template and converter will change to some extent. The changes done to the application will cause some changes to the user guides as well. Inclusion of new DC syntaxes into the template for e.g. XML and Alta Vista must also be reflected in the user guides.
    2. The 1st general Dublin Core user guide will be published in summer '98. We need to evaluate this document thoroughly and modify our own metadata template user guides if necessary.
    3. A Nordic user guide does not give full support to users in e.g. Finland. The user guide, once stable, needs to be translated to all Nordic languages and modified according to the local needs. For instance, since SAB classification is not used in Finland we can remove the SAB section from the local user guide (and template).

    Evaluation of tools

    In Nordic metadata I the users were asked to evaluate the metadata template. Although the number of respondents was small when compared with the number of people who provided metadata with the tool, the feedback received gave us valuable insight on how to improve the service.

    The evaluation must be continued in the Nordic metadata II, but in this project we should evaluate not only the template, but also other services like the national metadata databases, DC to MARC converter and URN generator, in this priority order. Since one of the main motivations of the project is to make network documents more easily accessible, it is important for us to know if the users are satisfied with the metadata databases.

    The partner responsible for this task is SICS, which was responsible for a similar task in Nordic metadata I.


  4. Maintenance and development of metadata tools

Metadata template

In the project plan for the Nordic metadata I we did not anticipate the need for a metadata template. However, it became obvious very early in the project that such a template must be built, in order to enable users to provide rich and syntactically correct metadata.

As of this writing the software written for the template in NetLab is being used in many projects in the Nordic countries and elsewhere. For this reason alone it is important to continue maintenance and support of the application.

But there are other, more challenging reasons for improving this tool (see chapter 2.1 of the final report for more details). The functionality that must be added to the template includes:

Metadata harvesting and indexing tools

As a part of the Nordic metadata project the Nordic Web Index was enhanced so that it is capable of harvesting meaningful metadata from Web documents. As of this writing there are two national metadata databases available, Swemeta and Danmeta. These databases contain about 250,000 records, which is about 6-7 % of the total web space in Sweden and Denmark.

The service provided by the metadata databases can be improved in several ways.

1) Improve NWI's harvesting and indexing software

The Nordic Web Index harvesting robot and the indexing software were modified in the Nordic metadata I so that they recognize metadata in the META tag of the HTML document. Dublin Core metadata can be handled both in HTML 2.0 and HTML 4.0 versions.

An important future extension to the project is to adapt NWI to understand other Dublin Core syntaxes; priority should be given to those that will be added to the project’s metadata template. Therefore, support for XML and RDF is of vital importance. Another important syntax is the simple one supported by Alta Vista.

If Dublin Core syntax for a well-known image format such as TIFF is developed during the Nordic metadata II, this syntax should be supported both by the template and by the harvesting and indexing software.
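
As an illustration of the META tag harvesting described above, the following Python sketch extracts Dublin Core elements from an HTML page using only the standard library. It is not the NWI robot: the class and function names are hypothetical, and it assumes that DC elements are marked by a NAME attribute beginning with "DC.". The optional SCHEME and LANG attributes, which HTML 4.0 allows on META, are kept when present; HTML 2.0 style tags carry only NAME and CONTENT.

```python
from html.parser import HTMLParser


class DublinCoreHarvester(HTMLParser):
    """Collect Dublin Core META elements from an HTML document."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if not name.lower().startswith("dc."):
            return  # not a Dublin Core element (assumed "DC." prefix)
        self.records.append({
            "element": name,
            "content": attributes.get("content") or "",
            "scheme": attributes.get("scheme"),  # HTML 4.0 only
            "lang": attributes.get("lang"),      # HTML 4.0 only
        })


def harvest(html_text):
    """Return the list of DC elements found in html_text."""
    parser = DublinCoreHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.records


SAMPLE = """<html><head>
<title>Example</title>
<META NAME="DC.Title" CONTENT="Nordic Metadata Project">
<meta name="DC.Date" scheme="ISO8601" content="1998-05-31">
<meta name="keywords" content="ignored">
</head><body></body></html>"""

if __name__ == "__main__":
    for record in harvest(SAMPLE):
        print(record)
```

Support for further syntaxes such as XML or RDF would need separate parsers; this sketch covers only the META tag case.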

2) Adaptation and improvements of the retrieval system

NWI's retrieval system has been adapted to the mix of full text and metadata in a rough way – the full text database and the metadata database are currently separate. This basic policy will not be changed. During the Nordic metadata II we will improve the functionality of the metadata database. We will continue to study different indexing, retrieval and display solutions.

One relevant display option is an exchange file, which represents the data in a format that can be fed into the DC to MARC converter. This display option can be fully utilised only if there are means of controlling the extraction of records from the database with for instance the date when the metadata was last modified (which incidentally is added into the metadata by our metadata template).
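
The date-controlled extraction mentioned above could look roughly like the following Python sketch. The record structure and the `last_modified` key are hypothetical stand-ins for whatever the metadata database actually stores, and dates are assumed to be ISO 8601 strings (YYYY-MM-DD).

```python
from datetime import date


def modified_since(records, cutoff):
    """Select records whose metadata was last modified on or after cutoff.

    Records without a modification date are skipped, since they cannot
    be compared against the cutoff.
    """
    selected = []
    for record in records:
        stamp = record.get("last_modified")  # hypothetical field name
        if stamp is None:
            continue
        if date.fromisoformat(stamp) >= cutoff:
            selected.append(record)
    return selected


if __name__ == "__main__":
    database = [
        {"identifier": "http://example.org/a", "last_modified": "1998-05-01"},
        {"identifier": "http://example.org/b", "last_modified": "1998-03-15"},
        {"identifier": "http://example.org/c"},
    ]
    for record in modified_since(database, date(1998, 4, 1)):
        print(record["identifier"])
```

The selected records would then be written to the exchange file and passed on to the DC to MARC converter.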

3) Adaptation of the user interface and search support

The user interface of the basic NWI was expanded and adapted, as a part of the Nordic metadata I, to the new possibilities of searching metadata alongside the traditional full text retrieval. However, we did not properly evaluate the services provided by Swemeta and Danmeta. This, and the evaluation of the other Nordic national metadata databases, will be done in the Nordic metadata II, and the search interface will be modified according to the feedback provided by the users.

4) Establishment and maintenance of national metadata databases

Our aim is that metadata databases for Finland, Iceland and Norway can be opened during 1998. When all databases are available, we can easily provide a virtual union catalogue of Nordic metadata, since the NWI is based on Z39.50 protocol. The quantities and types of metadata created at Nordic sites will be surveyed and statistics published continuously.

The project will also maintain the metadata databases throughout its lifetime. We will try to find a maintenance organisation for the databases during the project; one possible solution is to couple the metadata databases with the normal NWI databases (which will probably be taken care of by the national libraries).
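
The idea of a virtual union catalogue over the national databases can be sketched in Python as follows. This sketch does not speak Z39.50: each national database is represented by a plain search function over in-memory records, and all names, records and the de-duplication key (`identifier`) are assumptions made for illustration only.

```python
def make_database(records):
    """Return a search function over in-memory records (stand-in for a
    real Z39.50 target)."""
    def search(query):
        needle = query.lower()
        return [r for r in records if needle in r["title"].lower()]
    return search


def union_search(databases, query):
    """Send one query to every database and merge the results.

    Records with the same identifier are merged into one entry that
    lists every database in which the record was found.
    """
    merged = {}
    for name, search in databases.items():
        for record in search(query):
            entry = merged.setdefault(
                record["identifier"], {"record": record, "sources": []}
            )
            entry["sources"].append(name)
    return list(merged.values())


if __name__ == "__main__":
    databases = {
        "Swemeta": make_database([
            {"identifier": "urn:x-example:1", "title": "Metadata in libraries"},
        ]),
        "Danmeta": make_database([
            {"identifier": "urn:x-example:1", "title": "Metadata in libraries"},
            {"identifier": "urn:x-example:2", "title": "Danish metadata guide"},
        ]),
    }
    for hit in union_search(databases, "metadata"):
        print(hit["record"]["title"], hit["sources"])
```

In a real deployment the search functions would be replaced by Z39.50 client calls against each national database.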


  5. Documentation and project management

Documentation of results was an important part of the Nordic Metadata I. The project spent far more resources on information dissemination than was originally planned. To the surprise of the project team, there was a lot of interest in the project from abroad (for instance, a visitor from Tasmania will travel to Helsinki University Library at the end of May 1998 in order to discuss the project).

The tradition of annual Nordic metadata workshops established by the Nordic metadata I should be continued, since these workshops are an effective means of disseminating information about metadata in general and the Nordic metadata project in particular.

Internal communication in the project will be based on bi-annual meetings and email discussions; the email alias established for the Nordic metadata I (nordic-metadata@helsinki.fi) can serve the same purpose in the Nordic metadata II.

The project management routines will be kept as light as possible, so that resources are not wasted on non-productive work.

The partner responsible for project management and documentation activities, including production of the final report, is the Helsinki University Library.


Timeplan

The project starts on 1 September 1998 and ends two years later, on 31 August 2000. As the project has a lot of links to activities in other international projects like NEDLIB and BIBLINK and to several domestic projects, we will be able to save a lot of time and effort by using others' results as a basis for our work.


Costs per task

 

Enhancement of the existing Dublin Core specification (task 1)

  1. Participation in the development of the Dublin Core
    Volume: 1 man month
    Date: September 1998 - August 2000.

Cumulative time for this task: 1 man month


Improvement of the Dublin Core to MARC converter (task 2)

  1. Specification of the standard conversion table
    Volume: 1 man month
    Date: Winter 1998/1999
  2. Update and transfer of existing crosswalks to the new structure
    Volume: 0.5 man months
    Date: Spring 1999
  3. Development of the 2nd generation converter software
    Volume: 1 man month
    Date: Spring/Summer 1999

Cumulative time for this task: 2.5 man months


Dublin Core User support and tools evaluation (task 3)

  1. Maintenance of the user guides
    Volume: 1 man month
    Date: Autumn 1998 – Spring 2000
  2. Evaluation of tools
    Volume: 1.5 man months
    Date: Winter 1998 – Winter 1999

Cumulative time for this task: 2.5 man months


Maintenance and development of metadata tools (task 4)

  1. Modification of the metadata template
    Volume: 2 man months
    Date: Autumn 1998 – Spring 2000
  2. Improve NWI's harvesting and indexing software
    Volume: 1.5 man months
    Date: Winter 1998 – Winter 1999
  3. Adaptation of the user interface, search support
    Volume: 0.5 man months
    Date: Spring 1999
  4. Establishment and maintenance of national metadata databases
    Volume: 1 man month
    Date: Autumn 1998 – Summer 2000

Cumulative time for this task: 5 man months


Documentation and project management (task 5)

  1. Information dissemination
    Volume: 0.5 man months
    Date: Autumn 1998 – Summer 2000
  2. Project management & final report
    Volume: 1 man month
    Date: Autumn 1998 – Summer 2000

Cumulative time for this task: 2 man months


Cumulative time for all tasks: 13 man months



On behalf of the project group,

Juha Hakala
Library Network Specialist
Helsinki University Library