Dublin Core Metadata Element Set and its applications

Juha Hakala
Helsinki University Library
juha.hakala@helsinki.fi

Version 2.5 25.6.1997

Contents

  1. Metadata in general
  2. Metadata formats
  3. Dublin Core
  4. Nordic Metadata Project


Metadata in general

Metadata is data which describes attributes of a resource. Typically, it supports a number of functions: location, discovery, documentation, evaluation, selection and others. These activities may be carried out by human end-users or their (human or automated) agents.


Metadata formats


Dublin Core Metadata Element Set

The Dublin Core homepage, maintained by OCLC, is available at http://purl.oclc.org/metadata/dublin_core.

DC was developed by an informal group of computer scientists, network specialists, librarians and others. The work is loosely coordinated by OCLC and NCSA. More or less formal decisions are made at the Dublin Core Workshops, four of which have been held so far:

  1. Metadata workshop I. http://purl.oclc.org/oclc/rsch/metadataI

  2. Metadata workshop II. http://purl.oclc.org/oclc/rsch/metadataII

  3. Metadata workshop III (Workshop on Metadata for Networked Images). http://purl.oclc.org/metadata/image

  4. Metadata workshop IV (3.-5.3.1997). http://www.dstc.edu.au/DC4/

    The fifth Workshop will be held in Helsinki, Finland, 6.-8.10.1997. It will concentrate on implementation issues.

    There are already several active projects using DC, many of which are linked from the DC homepage. The Nordic Metadata project (see below) was probably the first of these to cover all (or most) aspects of metadata usage.

The main idea behind the Dublin Core development was to build a metadata element set so versatile that any Internet document can be described with it, but on the other hand so simple that the authors can provide metadata by themselves.


DC Elements (version 1.0, December 1996):

Dublin Core consists of 15 elements (see http://purl.org/metadata/dublin_core_elements):

Table 1. The Dublin Core Elements

Element            Definition
Subject            The topic addressed by the work.
Description        A textual description of the content of the resource.
Title              The name of the object.
Author             The person(s) primarily responsible for the intellectual content of the object.
Publisher          The agent or agency responsible for making the object available in its current form.
Other Agent        The person(s), such as editors, transcribers, and illustrators, who have made other significant intellectual contributions to the work.
Date               The date of publication.
Object type        The genre of the object, such as novel, poem or dictionary.
Form               The physical manifestation of the object, such as PostScript file or Windows executable file.
Identifier         String or number used to uniquely identify the object.
Relation           Relationship to other objects.
Source             Objects, either print or electronic, from which this object is derived, if applicable.
Language           Language of the intellectual content.
Coverage           The spatial location and/or temporal duration characteristics of the object.
Rights Management  The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way.

Usage of most elements is reasonably clear, but some elements need more detailed usage specifications. See for instance the spec for Coverage at http://alexandria.sdc.ucsb.edu/public-documents/metadata/dc_coverage.html

Use of the Identifier element is difficult without URNs!


DC Qualifiers

The basic Dublin Core is very simple, but more sophisticated Dublin Core applications require means for specifying the semantic content of existing elements further. Element qualifiers (Schemes and Types) provide a way of doing this.

With the Scheme qualifier it is possible to specify, for instance, the subject heading list from which your subject terms have been picked, or the identification system (such as ISBN or ISSN) used to identify your document.

In the Author tag, the Type qualifier lets you distinguish the author's name, e-mail address, telephone number etc. from one another.

Qualifier specification is still not quite complete. A draft proposal written by Jon Knight is available at http://www.roads.lut.ac.uk/Metadata/DC-Qualifiers.html

Extensibility of the Dublin Core has also been enhanced by defining a method for using private elements. The names of these must start with "X-". Web indexing robots can then easily recognise and ignore local extensions (if they have not been taught how to use them).
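
The "X-" rule makes local extensions easy to filter mechanically. The following Python fragment is a minimal sketch of how an indexing robot might separate standard elements from private ones; the function names and the sample element "X-Shelfmark" are made up for illustration and are not part of any Dublin Core specification.

```python
# Sketch: partition element names into standard and private ("X-") ones.
# The function names and the sample data are hypothetical.

def is_private_element(element_name):
    """Return True if the element name marks a private extension."""
    return element_name.startswith("X-")


def split_standard_and_private(element_names):
    """Return (standard, private) lists, preserving the input order."""
    standard, private = [], []
    for name in element_names:
        if is_private_element(name):
            private.append(name)
        else:
            standard.append(name)
    return standard, private


standard, private = split_standard_and_private(["Title", "X-Shelfmark", "Subject"])
print(standard)  # ['Title', 'Subject']
print(private)   # ['X-Shelfmark']
```

A robot that has not been taught how to use the local extensions would simply index the first list and drop the second.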


DC Syntax

In order to actually embed Dublin Core metadata into an electronic document, a concrete syntax is required for every relevant text and image format.

The WWW is the strategic application of the Internet at the moment. The Dublin Core community therefore started by specifying a generic way of embedding metadata in HTML (see http://www.oclc.org:5046/~weibel/html-meta.html). Dublin Core is just one metadata element set that can utilise this specification. The convention agreed upon is as follows:

<META NAME = "schema_identifier.element_name.qualifier" CONTENT = "string data">

Example of the Date element:

Dublin Core has the schema identifier DC, so an indexing robot will know that DC.date.current is the Dublin Core Date element, with the qualifier "current". A scheme is needed for correct interpretation of the date information.
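
The naming convention above is simple enough to generate mechanically. The Python fragment below is a sketch of one way to do so; the function name dc_meta_tag and the sample date value are assumptions made for illustration, and no scheme prefix is added to the content.

```python
from html import escape


def dc_meta_tag(element, content, qualifier=None, schema="DC"):
    """Build a META tag named schema_identifier.element_name[.qualifier].

    Hypothetical helper: it only illustrates the naming convention.
    """
    # Join the non-empty parts with dots, e.g. "DC.date.current".
    name = ".".join(part for part in (schema, element, qualifier) if part)
    # Escape quotes and angle brackets so the attribute values stay valid HTML.
    return '<META NAME = "{}" CONTENT = "{}">'.format(
        escape(name, quote=True), escape(content, quote=True)
    )


# The date value is invented for this example.
print(dc_meta_tag("date", "1997-06-25", qualifier="current"))
# <META NAME = "DC.date.current" CONTENT = "1997-06-25">
```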

The next step will be specification of DC syntax for TIFF or some other tagged image file format.


The Nordic Metadata project

The project comprises 12 man-months of work and has a budget of 50.000 USD. It runs from 1.11.1996 to 31.5.1998.

The project is funded by NORDINFO (http://www.hut.fi/NORDINFO/).

The software and documentation created in the project are, or will be, available in the public domain.

Organizations and people directly involved with the project are:

Table 2. The project participants

Organization                                     Contact person
Bibsys, Norway                                   Ole Husby
Helsinki University Library, Finland             Juha Hakala
Lund University Library, NetLab, Sweden          Traugott Koch
Munksgaard, Denmark                              Anders Geertsen
The National and University Library of Iceland   Sigbergur Fridriksson
Swedish Institute of Computer Science            Preben Hansen

The Danish Library Center (http://www.dbc.dk/) is an associated partner in the project.


Tasks

Generally, the Nordic Metadata project will create basic elements of a metadata production and utilization system, which can be used in subject-specific projects.

The project's aims are:

  1. Evaluation of existing metadata formats

  2. Enhancement of the current Dublin Core specification

  3. Creation of DC -> MARC conversions (and possibly vice versa)

  4. Development of a Dublin Core metadata production system

  5. Development of a metadata-aware search service


Evaluation of the existing metadata formats

We have utilised the evaluation done in the EU Telematics for Research project DESIRE (see http://www.ukoln.ac.uk/metadata/DESIRE/overview/). This document, together with our own experience, led to the decision to use the Dublin Core metadata element set.


Enhancement of the existing Dublin Core specification

It is necessary to specify Nordic classification systems and subject heading lists as schemes for the SUBJECT tag. This has been done: see http://www.roads.lut.ac.uk/Metadata/DC-Qualifiers.html.

An example of usage of the Finnish Public Library Classification:

<META NAME = "DC.subject" CONTENT = "(SCHEME=YKL) 75">
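
A program reading such a tag has to separate the scheme from the value. The Python sketch below shows one way of doing this, assuming that the scheme is always written as a "(SCHEME=...)" prefix at the start of the CONTENT string, as in the example above. The function name split_scheme is hypothetical, and the qualifier syntax may change while the specification is still a draft.

```python
import re

# Matches an optional leading "(SCHEME=xxx)" prefix; whitespace is tolerated.
_SCHEME_PREFIX = re.compile(
    r"^\s*\(\s*SCHEME\s*=\s*([^)\s]+)\s*\)\s*(.*)$",
    re.IGNORECASE | re.DOTALL,
)


def split_scheme(content):
    """Return (scheme, value); scheme is None when no prefix is present."""
    match = _SCHEME_PREFIX.match(content)
    if match is None:
        return None, content.strip()
    return match.group(1), match.group(2).strip()


print(split_scheme("(SCHEME=YKL) 75"))  # ('YKL', '75')
print(split_scheme("Metadata"))         # (None, 'Metadata')
```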

Thanks to the DC extensibility mechanism it will be easy to incorporate Nordic specialities (if any) into our DC records should the need for this arise.


Creation of DC -> MARC conversions (and vice versa)

A DC -> NORMARC converter (alpha version) is available at http://www.bibsys.no/meta/d2m/.

The converter will be finalised, and other Nordic MARC formats added, during summer 1997. A specification of the DC -> FINMARC conversion is available at http://www.lib.helsinki.fi/meta/dcficross.html.

The converter will be built in such a way that adding new MARC formats and modifying existing conversions will be easy.

The converter will be available in two guises: as a stand-alone application and as a plug-in that can be utilized as a module in other applications.
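
One common way to keep such a converter easy to extend is to drive it from mapping tables, one per MARC format. The Python fragment below is only a sketch of that idea, not the project's converter: the format name "EXAMPLE-MARC", the function dc_to_marc and the tag and subfield choices are all illustrative assumptions and do not reproduce the actual NORMARC or FINMARC conversion rules.

```python
# Sketch of a table-driven DC -> MARC conversion.
# Every tag/subfield below is an illustrative assumption, not a real crosswalk.
CROSSWALKS = {
    "EXAMPLE-MARC": {
        "title": ("245", "a"),
        "author": ("100", "a"),
        "subject": ("653", "a"),
        "identifier": ("856", "u"),
    },
}


def dc_to_marc(record, marc_format):
    """Convert (element, value) pairs into (tag, subfield, value) triples."""
    table = CROSSWALKS[marc_format]
    fields = []
    for element, value in record:
        target = table.get(element.lower())
        if target is None:
            continue  # elements without a mapping are skipped in this sketch
        tag, subfield = target
        fields.append((tag, subfield, value))
    return fields


record = [
    ("Title", "Dublin Core Metadata Element Set"),
    ("Author", "Hakala, Juha"),
    ("Coverage", "Nordic countries"),  # no mapping in the example table
]
for tag, subfield, value in dc_to_marc(record, "EXAMPLE-MARC"):
    print(tag, "$" + subfield, value)
```

With this layout, supporting another MARC format means adding one more table to CROSSWALKS, and modifying a conversion means editing a table entry; the conversion function itself stays untouched.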

It is more difficult to convert from MARC to DC than vice versa: due to the complexity of MARC, it is hard to convert from it to anything else, including DC. We may nevertheless try this later in the project.


Development of a Dublin Core production system

In order to simplify DC metadata creation, the Nordic Metadata project supplies:

  1. A Dublin Core template, available at: http://www.ub.lu.se/metadata/DC_creator.html.

  2. A DC user guide, available at: http://www.sics.se/~preben/DC/DC_guide.html. Our main interest is in supporting subject description, since valid subject headings in the DC record will make documents much easier to find.

  3. A template user's guide, available at: http://www.sics.se/~preben/DC/DC_temp_help.html

  4. It may also be necessary to specify national guidelines on how to use, for instance, national classification systems and subject heading lists. For good results, these systems should be available to everyone. Therefore, for example, Helsinki University Library plans to put the Finnish General Subject Headings List on the Web (which will also help users to search library OPACs more effectively).


Development of a metadata-aware search service

  1. Modification of NWI's harvesting and indexing software. This part is almost ready. The Nordic Web Index will be enhanced so that it can recognize, extract and index the most widely used metadata formats. As of this writing (June 1997) we provide two experimental databases, SWEMETA and DANMETA, which contain 30.000 and 50.000 records respectively (very few of them are in Dublin Core). These databases are available at http://nwi.ub2.lu.se/?lang=en.

    NetLab also uses the NWI to collect statistics on metadata usage in Scandinavia. The results can be seen at http://www.ub2.lu.se/metadata/Nordic-MDusage.html

  2. Adaptation of the user interface and search support. The NWI has a simple, Alta Vista-like user interface and relatively little user guidance to offer. Significant improvements are necessary in order to make the NWI a true rival of the global Web indexes.

  3. Evaluation of user feedback.
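
The recognition and extraction mentioned in step 1 can be illustrated with a few lines of Python. This is a minimal sketch built on the standard html.parser module, not the NWI software; the class and function names are hypothetical, and it only recognises META tags whose NAME starts with the schema identifier "DC.".

```python
from html.parser import HTMLParser


class DCMetaExtractor(HTMLParser):
    """Collect (name, content) pairs from META tags named "DC.*"."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content") or ""
        if name.upper().startswith("DC."):
            self.records.append((name, content.strip()))


def extract_dc(html_text):
    """Return all Dublin Core META records found in an HTML document."""
    parser = DCMetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.records


page = """<HTML><HEAD>
<META NAME = "DC.subject" CONTENT = "(SCHEME=YKL) 75">
<META NAME = "keywords" CONTENT = "metadata">
</HEAD><BODY></BODY></HTML>"""
print(extract_dc(page))  # [('DC.subject', '(SCHEME=YKL) 75')]
```

An indexer built along these lines would store each extracted pair in a field of its own, so that a search can be limited to, say, subject terms only.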


Documentation

All project documentation will be written in English and in HTML format (like this presentation). All official documents will be made available via the project's homepage at http://www.lib.helsinki.fi/meta/.