The 5th Dublin Core Metadata Workshop

Helsinki, Finland, October 6-8, 1997

PROJECT PRESENTATIONS

Version 1.6
24 October 1997 Juha Hakala


Contents

The projects are listed in First-In-First-Out order. The list will be updated whenever necessary.

1. SSG-Fachinformation (SSG-FI) Geowissenschaften
2. SSG-Fachinformation (SSG-FI) Mathematik
3. The National Library of the Netherlands
4. DSTC
5. ADAM & VADS
6. Swedish Environet
7. The Nordic Metadata Project
8. AHDS Arts & Humanities Data Service
9. German Educational Resources Server / Deutscher Bildungs-Server
10. Florida International University Digital Library
11. SCRAN
12. BIBLINK
13. NewsAgent for Libraries
14. Medical Metadata Project
15. University of Washington Digital Library
16. Math-Net
17. The Victoria and Albert Museum
18. Activities in the Field of Electronic Information Management and Meta-Data in Physics
19. Australian Geodynamics Cooperative Research Centre
20. Interconnect Technologies Corp. projects
21. University of Michigan Digital Library Registry Database
22. INDOREG: INternet DOcument REGistration
23. Metadata Project of State Library of Queensland
24. MedExplore & Dublin Core generation
25. UC Berkeley Digital Library Catalog
26. EdNA
27. Netpublikationer
28. Pandora
29. Scout Report Signpost
30. GEM
31. WWW.NIC.FUNET.FI Metadata interface
32. Dublin Core Metadata Use in Environment Australia


1. SSG-Fachinformation (SSG-FI) Geowissenschaften

B. Information System Summary:

C. Metadata:

D. Tools (provide URLs if freely available):

E. Experiences from the planning and production phases:


2. SSG-Fachinformation (SSG-FI) Mathematik

B. Information System Summary:

C. Metadata:

D. Tools (provide URLs if freely available):

E. Experiences from the planning and production phases:


3. The National Library of the Netherlands

Reported by Titia van der Werf.

The National Library of the Netherlands (Koninklijke Bibliotheek) is in the process of developing a new version of its Web information service, with a new layout, new functionality and DC metadata elements incorporated in the HTML pages. The HTML pages in the test version carry standard DC metadata tags like DC.publisher, DC.rights and DC.language. Take a look at http://www.konbib.nl:8000/ and view the frame source of http://www.konbib.nl:8000/bex-fe.html for example (these are our guidelines for ILL).

When the final version is up and running this fall we will have trained our information providers (at the different library departments) to supply the metadata elements Title, Author, Date, etc. for all new documents they submit to our Editorial Board.

We will only need to add those elements for the documents that are already in the service (a retrofit :-)).

Finally we plan to install an indexer which will be able to recognise these metadata elements.
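
As an illustration of what such an indexer has to do, the following is a minimal sketch in Python, not the indexer the library plans to install. It assumes the metadata is embedded as HTML meta elements whose name attribute starts with "DC."; the sample page and its content values are hypothetical.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = {}  # element name -> list of content values

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        # Assumption: DC elements are recognised by a "DC." name prefix.
        if name.upper().startswith("DC.") and content is not None:
            self.elements.setdefault(name, []).append(content)


def extract_dc(html_text):
    """Return a dict of the DC elements found in html_text."""
    parser = DCMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.elements


# Hypothetical page; the content values are made up for illustration.
SAMPLE_PAGE = """<html><head>
<title>Guidelines for ILL</title>
<meta name="DC.publisher" content="Koninklijke Bibliotheek">
<meta name="DC.language" content="en">
<meta name="DC.rights" content="(hypothetical rights statement)">
<meta name="keywords" content="not a DC element">
</head><body></body></html>"""

if __name__ == "__main__":
    for name, values in extract_dc(SAMPLE_PAGE).items():
        print(name, "=", "; ".join(values))
```

Elements are kept as lists because a page may repeat an element, for example several DC.subject tags.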

Our library is also involved in several projects which are seeking to deploy DC metadata:

Note that BIBLINK metadata consists of formal bibliographic descriptions, whereas DESIRE SBIG metadata covers subject indexing and classification. These are basically two different processes; in other words, different Internet technologies could be better suited to each process. Formal description elements can be re-used and remain valid for all user groups; subject information, on the contrary, is particular to specific target groups, and the same resource may receive several different subject codes. Mixing these two types of metadata in one format may prove tricky.

In each case special attention is needed to record WHO (responsible person or institution ...) has assigned a subject code to a particular resource - in order to enhance the trustworthiness of the subject code assigned.


4. DSTC

Report by Renato Iannella

The DSTC is participating in the W3C Resource Description Framework (RDF) Working Group. RDF is the result of a number of metadata communities (including Dublin Core, PICS, Digital Signatures) bringing together their needs to provide a robust and flexible architecture for supporting metadata on the Internet and WWW. RDF will use the new XML as its main carrier syntax.

The DSTC plans to develop the following:

Current Documents

Older Documents:

Resources of interest


5. ADAM & VADS

The Art, Design, Architecture & Media Information Gateway (ADAM) and the Visual Arts Data Service (VADS) are two services that aim to provide the UK Higher Education community with fast, reliable access to high-quality networked resources in the visual arts, and to promote the use of standards of best practice through example and outreach.

A coherent service is being developed through shared objectives, a shared commitment to standards, and a common information system that is being designed from the outset to interoperate with a broad range of information protocols and resource description schemes.

The Dublin Core Metadata Element Set is seen as a strategically significant tool for enhancing the discovery and retrieval of networked resources, and will play a crucial role in the development of the two services; in addition to providing descriptive information within static web pages, DC will form the basis for the common cross-domain resource discovery element set for the AHDS' distributed catalogue.

In common with the other four AHDS Service Providers, the Visual Arts Data Service recently held a domain-specific resource discovery workshop in order to identify the suitability of Dublin Core metadata for the visual arts, museums and cultural heritage community. This process resulted in a survey of domain-specific information standards, a detailed workshop report and a contribution to the synthesized summary of the series as a whole, in addition to raising the awareness of Dublin Core.

ADAM is part of the Electronic Libraries Programme, whereas VADS is an agency of the Arts & Humanities Data Service; both are funded by the Joint Information Systems Committee.


6. Swedish Environet

Reported by Stig Hammarsten

Background

The decision to build the Swedish EnviroNet was taken by the Swedish government in December 1996. The building of the EnviroNet was one of several proposals in an Official Report on "Information Technology in Environmental Management" from the Environmental Council. The constitution of the Project Board and Project Organisation was decided by the Director General of the Environmental Protection Agency in February 1997. At present, this short summary of the project is the only documentation available in English. During the development phase most information will only be published in Swedish. However, from December 1997 the EnviroNet will also provide services in English.

Mission

The Swedish EnviroNet shall become the main www-gateway to electronic data and information on the Swedish environment. The EnviroNet shall provide easy access to information with high quality content.

Members

The EnviroNet will organise all major public agencies, NGOs and private companies in the environmental field. The information is published on the web sites of the members, and the EnviroNet server will provide links, metadata and other general services.

Only member web sites will be included in the catalogues and be reachable through the search engine at the EnviroNet.

Users

The most important user group for the EnviroNet is professionals working with environmental issues in Sweden. But the EnviroNet will also fulfil the needs of wider user groups, such as environmental activists, students and teachers, politicians and the press.

Services

Catalogues and metadata

The EnviroNet will provide links to electronic documents (in HTML format). A document can contain information or point to other sources that contain information or data. A document can also contain addresses of institutions. The contents of the documents will be described using a subset of the Dublin Core (DC) metadata standard. The type element will be used to describe the resources referred to, e.g. thesis, review, database or advertisement.

The EnviroNet will provide a classification scheme for the subject and type elements in DC. The classification scheme for subject must have terms that correspond to all the branches in the catalogues.

Users will be able to search information using several different catalogues organised in the following way:

Technical platform

Time table


7. The Nordic Metadata Project

B. Information System Summary:

C. Metadata:

D. Tools (provide URLs if freely available):

E. Experiences from the planning and production phases:


8. AHDS Arts & Humanities Data Service

Reported by Paul Miller, 12 September 1997

Introduction

The Arts & Humanities Data Service (AHDS) is a UK–based initiative funded by the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils which aims to facilitate access to — and reuse of — digital collections of relevance to the arts and humanities.

The AHDS is a federal organisation, consisting of a central Executive and five service providers encompassing archaeology, history, textual studies and the performing and visual arts. Each service provider also holds service–wide responsibility for a type of digital resource, including electronic corpora, geospatial data and time–based media (video, etc.).

Implementation

Far from simply constructing five web–based catalogues of useful resources, AHDS service providers are working to build an integrated solution capable both of knitting the offerings of the service providers together into a seamless whole for the user, and suitable for constructing effective links outside the AHDS to other resources. The Archaeology Data Service, for example, is piloting its collections system by means of a SCRAN–funded collaboration with Scotland's Royal Commission on Ancient & Historical Monuments (RCAHMS). This will provide access to 19,000 records from the National Monuments Record of Scotland through the ADS catalogue, and may well be extended both to other areas of the National Monuments Record and down to the next level of detail, as held by the West of Scotland Archaeology Service and Shetland Amenity Trust. Other pilots are also underway, both within ADS and the other four service providers.

The Arts & Humanities Data Service's vision is built firmly upon two main foundations, both of which are of potential interest and relevance to the wider community:

Common Core Description

With a need to describe diverse resource types to varying levels of detail, it is unlikely that the AHDS could have found a single recording system capable of handling everything from the Mona Lisa to a complex digital archive from an archaeological excavation. It also appears unlikely that those depositing data with AHDS would have taken kindly to requests that they translate all of their data into a single system/syntax/format prior to deposition.

Instead, AHDS has adopted a pragmatic approach, and is advocating use of the Dublin Core as a means by which such diverse resources may be described in general (and comparable) terms, with greater detail provided where necessary by linked metadata records in more locally appropriate formats. It is to be hoped that metadata for deposited resources will be created by the depositor in line with AHDS recommendations and utilising AHDS–provided tools. Documentation will also be provided to users through the AHDS Guides to Good Practice, which will explain the provision of metadata in a context, and using language, relevant to the depositor.

The proposed implementation of Dublin Core has been arrived at following a comprehensive series of discipline–specific workshops undertaken in conjunction with the UK Office for Library & Information Networking (UKOLN). The reports from these workshops are available online from the AHDS, and a single paper publication capturing the essence of these workshops — and related initiatives — is currently being printed for distribution in Helsinki, and wider distribution following the workshop.

Technical Interoperability

In order to tie the catalogues of service providers and collaborating organisations together, a Statement of Requirements has been issued for the provision of a Z39.50–based catalogue system. This will incorporate at least a central gateway and SQL– and SGML–based 'targets', and suppliers will be interviewed during September.

A brief report on this, too, will be available for distribution in Helsinki.


9. German Educational Resources Server / Deutscher Bildungs-Server

Reported by: Diann Rusch-Feja, Christian Richter, Peter Diepold

The German Educational Resources Server (http://dbs.schule.de/indexe.html) has been in existence since June 1996 and has indexed a total of 2000 Web documents, including HTML documents about physical teaching and learning materials from pupils, teachers, publishers and state educational authorities. It is also the hub of a larger national network of state and regional educational servers (from 14 state educational servers), the "Schools on the Net" Initiative (including approximately 3300 school websites) and the Open School Network (ODS), as well as the Central Institute for Images and Film Materials, which supports approximately 30 audiovisual media centers in the Federal German States. The German Educational Resources Server also includes directories of the educational researchers and institutions of higher education with teacher training programs supported by the German Society of Educational Scientists (DGfE). Further institutions involved in various aspects of education at all levels contribute to the German Educational Resources Server (e.g., BAK, BioNet eV., DFN-Verein, GIB, GMD and the German universities, teachers, etc.). In addition, there are links to the websites and to certain products of publishers of educational materials.

The German Educational Resources Server (GER / DBS) has implemented metatags using the Dublin Core with five additional subfields pertaining to the field of education. These are:

DBS.SUBJECT.Requirements describes the technical requirements for obtaining and using certain teaching or learning materials (for example, WINDOWS 3.0 or above, SVGA Graphics Adapter, Sound Card, etc.). This is because some schools are not equipped with adequate hardware installations, and having this important information prior to ordering or downloading these materials is imperative. We are currently considering this field as a suggested sub-element for the DC, namely DC.PUBLISHER.TechnicalRequirements. Alternatively, this could be found in the non-DC terms and conditions metadata.

DBS.SUBJECT.Conditions describes the user restrictions and cost or licensing aspects. These aspects are particularly important for the metadata concerning cost- or fee-based services and products on both a commercial and non-commercial level. We are currently considering this field as a suggested sub-element for the DC, namely DC.PUBLISHER.UseRestrictionsAndCost, until otherwise included in terms and conditions metadata.

DBS.SUBJECT.SubjectAreaOfInstruction describes the subject area of instruction, to distinguish this field from the simpler DC.SUBJECT, which may not refer to teaching and learning materials. Even when DC.SUBJECT is given the two values "teaching material" and "biology", it is not clear whether the resource is an article on teaching biology or part of a teaching unit on some aspect of biology. We suggest this as a sub-element of the DC, DC.SUBJECT.SubjectAreaOfInstruction, specifically for educational purposes and not solely for educational servers (e.g., for a mathematics server with academic contents and also didactics of mathematics, as is the case in the German MathNet).

DBS.SUBJECT.GradeLevel describes the level of intended instruction or use in schools and all levels of educational institutions (including home schooling). This corresponds to EDNA.USERLEVEL of the Australian Educational Network Server (http://www.edna.edu.au). We suggest this as a DC sub-element, namely DC.SUBJECT.UserLevel.

DBS.MEDIUM describes the physical medium of an object which is described electronically using an HTML file, but may not be in an electronic form or may have alternative formats. Examples are: audiocassettes, CD-ROMs, videocassettes, print medium, data disc. We suggest this be included under DC.FORMAT as the sub-element DC.FORMAT.PhysicalMedium.

The organisation of the German Educational Resources Server includes self-registry of teaching, learning or other informational documents by means of a form developed for the GER / DBS (http://dbs.schule.de/db/inconeue.html). This places the metadata into a mySQL database which will be indexed using a dedicated gatherer / broker (currently Harvest; other robots are under consideration). Submitting the registry form automatically returns a file containing the structural information according to the Dublin Core format. This file can then be included by the author as part of his HTML, edited later as may become necessary, and harvested by any search machine. Using these metatags enables a structured retrieval process in which a combination of various content terms from one to three fields (DC.TYPE, DBS.SUBJECT.USERLEVEL and DC.SUBJECT or DBS.SUBJECT.SubjectAreaOfInstruction) can be defined to give targeted results.
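
The two steps described above, returning metatags for a registered document and retrieving by a combination of fields, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the DBS implementation: the function names, the in-memory record layout and all sample values are hypothetical, and the real service stores its records in a database.

```python
from html import escape


def to_meta_tags(record):
    """Render a record (field name -> list of values) as HTML <meta> lines."""
    lines = []
    for name, values in record.items():
        for value in values:
            lines.append(
                '<meta name="%s" content="%s">' % (escape(name), escape(value))
            )
    return "\n".join(lines)


def matches(record, criteria):
    """True if the record carries the wanted value in every listed field."""
    return all(
        wanted in record.get(name, []) for name, wanted in criteria.items()
    )


# Hypothetical registrations; titles and values are made up for illustration.
RECORDS = [
    {
        "DC.TITLE": ["Photosynthesis worksheet"],
        "DC.TYPE": ["teaching material"],
        "DBS.SUBJECT.SubjectAreaOfInstruction": ["biology"],
        "DBS.SUBJECT.GradeLevel": ["secondary"],
        "DBS.MEDIUM": ["CD-ROM"],
        "DBS.SUBJECT.Requirements": ["WINDOWS 3.0 or above", "Sound Card"],
    },
    {
        "DC.TITLE": ["Essay on teaching biology"],
        "DC.TYPE": ["article"],
        "DC.SUBJECT": ["teaching material", "biology"],
    },
]

if __name__ == "__main__":
    # Step 1: the block an author would paste into the HTML head.
    print(to_meta_tags(RECORDS[0]))
    # Step 2: structured retrieval combining two fields.
    query = {
        "DC.TYPE": "teaching material",
        "DBS.SUBJECT.SubjectAreaOfInstruction": "biology",
    }
    print([r["DC.TITLE"][0] for r in RECORDS if matches(r, query)])
```

The second sample record shows the ambiguity discussed for DC.SUBJECT: it mentions both "teaching material" and "biology" but is not selected by the structured query, because it is an article rather than a teaching unit.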

The state servers have responsibility for those items stored via their servers; the DBS has a general editorial center to approve entry of all items and maintain integrity. This is especially important because of the legal implications of having school pupil-authored items indexed in these servers.


10. Florida International University Digital Library

Overview

The Florida International University (Miami, Florida, USA) Digital Library (DL) is a university-wide digital library project which will be serving all schools/departments on campus including the libraries. The project is currently in its planning phase but the software installation will begin in October 1997 and the prototype DL should be up in late December. The DL is being implemented under the direction of Jackie Zelman, Director of University Computer Services, and represents a coordinated effort of staff from various parts of the university.

Collections

The focus of the FIU DL will be on images, sound, and video, including multimedia presentations and curriculum modules, and it will support every subject covered by university teaching and research. The prototype DL will be comprised of the James Nelson Goodsell Collection, which represents 40 years of Latin American history and politics originally captured on photographs, audiotape, and video (sample images). The second phase will see the addition of library course reserves, the School of Architecture Slide Collection, and images of art objects in the Mitchell J. Wolfson Decorative and Propaganda Arts Collection housed at the Wolfsonian Institute on Miami Beach. The collection size of the prototype DL will be close to 1,000 objects. The rate of growth has not yet been determined but is anticipated to be at least 5,000-10,000 objects a year.

Software

The FIU Digital Library is a beta site for version 2 of the IBM Digital Library software which combines an object server (for the digitized resources) with an administrative DB2 database on a separate server for storing the metadata. The DL will be Z39.50 compliant, have a Web interface and will create HTML pages on the fly using a combination of the IBM DL software and CGI scripts which will be written locally. FIU will also be bringing up the IBM Query by Image Content (QBIC) system which will enable the user to search images by color, texture, and shape.

Metadata

The basis for each record is the Dublin Core Metadata Element Set and all 15 DC elements are being used plus the qualifiers Language, Type, and Scheme approved at DC4 and the proposed Creator/Contributor qualifier Role. Metadata are being collected in a single database record for each object and include administrative data, digitization details, linkage information, usage statistics, and description of the content. This can be seen in an !early draft! of the Object Descriptive Record Model. The creation of individual metadata records will be done initially under the supervision of Kass Evans, the DL's Coordinator for Data Modeling, who has experience as a librarian and a cataloger.

Many of the initial concerns about whether the DC 15 elements would be complete enough to be useful have been resolved by the addition of the DC qualifiers. Of note is the added ability to use the Language qualifier under Subjects and Description to add parallel English and Spanish fields and the addition of Role under Creator to enable users to search all artists, etc. Limitations being found with the DC elements stem from the fact that deployment is through an integrated searchable/browsable system that creates HTML pages on the fly rather than as elements in the Header of an HTML document. Some problems/questions:

1. Each object has multiple Resource Identifiers (e.g. an image's thumbnail, access, and reference versions; a video's segments) and this is not exactly covered with qualifier Type.
2. There are 3 types of Relation in the FIU DL that do not seem to correspond with the Relation Types parent, child, member. They are: a) a separate related title, b) a multimedia link to embed an audio narrative on the same HTML page as an image, c) a contents link such as when an art object is photographed from 2-4 different angles and so 2-4 clickable thumbnail images need to appear on the same HTML page.
3. FIU is using 3 types of Coverage: a) temporal, b) geospatial, c) subject, which allows for tiers of subject browsing, which is different from the Subject which is assigned from a thesaurus (ex. Fine Arts - Architecture - Commercial Architecture).
4. Resource Type: when the object is an image of an art object such as a sculpture, shouldn't the Resource Type be Art object or Sculpture rather than the medium, e.g. Image?
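
To make the qualifier usage concrete, here is a minimal Python sketch of a record in which every element value can carry qualifiers. It is an illustration only, not the FIU Object Descriptive Record Model or the IBM Digital Library schema; the class names, the qualifier keys and all sample values (title, creator, file names) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One value of a DC element together with its qualifiers."""
    value: str
    qualifiers: dict = field(default_factory=dict)


@dataclass
class ObjectRecord:
    """All metadata statements for one digital object, keyed by element."""
    elements: dict = field(default_factory=dict)

    def add(self, element, value, **qualifiers):
        self.elements.setdefault(element, []).append(
            Statement(value, qualifiers)
        )

    def values(self, element, **qualifiers):
        """Values of `element` whose qualifiers include all those given."""
        return [
            s.value
            for s in self.elements.get(element, [])
            if all(s.qualifiers.get(k) == v for k, v in qualifiers.items())
        ]


def build_sample():
    """Build a made-up record showing Language, Role and Type qualifiers."""
    record = ObjectRecord()
    record.add("Title", "Protestors Marching")
    # Parallel English and Spanish subject fields via the Language qualifier.
    record.add("Subject", "Protest marches", language="en")
    record.add("Subject", "Marchas de protesta", language="es")
    # Role under Creator lets users search, say, all photographers.
    record.add("Creator", "Example, Pat", role="photographer")
    # Several Resource Identifiers for one object, told apart by a label.
    record.add("Identifier", "obj0001_thumb.jpg", type="thumbnail")
    record.add("Identifier", "obj0001_access.jpg", type="access")
    record.add("Identifier", "obj0001_ref.tif", type="reference")
    return record


if __name__ == "__main__":
    sample = build_sample()
    print(sample.values("Subject", language="es"))
    print(sample.values("Identifier"))
```

Labelling identifiers with a type qualifier is one possible answer to the first question in the list; whether the Type qualifier is meant to be used that way is exactly what remains open.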

Searchable/Browsable User Interface

Descriptive metadata is being used to develop a user interface designed to access the universe of information through both search templates *AND* through multiple levels of browsability. The search templates will have keyword searching across all fields, boolean operators, and have both search input boxes and drop boxes allowing the user to click to limit a search by time, place, format, and resource type. The user interface will also provide the ability to browse by format, resource type, subject, place and time, or by collection.

See a simulation of the user interface. Follow these links: 1) Basic Search Template 2) Browse by Format... Select Image, then People, then Protestors Marching 3) Browse by Subject... Select Fine Arts, then Architecture, then Commercial Architecture, then Stock Exchange 4) Browse by Place/Time... Select Latin America. Please note that this is an early draft of the design concept and the final product will be more complete and the visual product of a team of artists.

Prepared by Katherine "Kass" Evans
Coordinator for Data Modeling
Florida International University Digital Library Project
Sept. 18, 1997


11. SCRAN

Reported by Ian O. Morrison, SCRAN, 17/09/1997

The Scottish Cultural Resources Access Network is a project to build a networked multimedia resource base for the study, teaching and appreciation of history and material culture in Scotland. The founding partners are the National Museums of Scotland, the Royal Commission on the Ancient and Historical Monuments of Scotland and the Scottish Museums Council. The National Lottery, by way of the Millennium Commission, is providing 49% of the funding, to the tune of 7.4 million UKP. The remainder of the cost is being met largely by in-kind contributions, principally the assignation of non-exclusive rights for educational usage of the resources.

By the year 2001 we will be providing easy access to 1.5 million text records of artefacts and historic monuments, and 100,000 related multimedia resources. SCRAN is also commissioning 100 multimedia essays, based on these resources and others, for use in schools and by a wider audience.

The SCRAN approach has been to combine the metadata necessary for resource discovery with aspects of the resources themselves. The intention is that most enquirers will get what they need directly from the SCRAN resource base, in the form of limited information, for example, about a museum object or an archaeological site, sometimes accompanied by a small illustration. In a minority of cases, where more detailed information is required, users will be directed to the original information providers, either by means of a hyperlink, where this is feasible, or by a unique reference number.

The Royal Commission on the Ancient and Historical Monuments of Scotland is also involved in a project, Accessing Scotland's Past, which involves mapping elements of RCAHMS records to both the SCRAN standard and that of the Archaeology Data Service. The Dublin Core has proved useful in providing the common ground on which this mapping can take place.

The SCRAN resource base comprises a variety of different kinds of records:

The provisional data standard has been drawn up to reflect the varied nature of the records. This can be obtained from the SCRAN Web site. It represents a hybrid between a museum collections management approach, drawing on the experience represented in the UK Museum Documentation Standard, SPECTRUM, and the emerging cross-domain standards for metadata, represented by the Dublin Core. The core information, necessary to represent and discover all kinds of resources within SCRAN, is held in a form compatible with the Dublin Core. It is proving more problematic, however, to map to the Dublin Core the standards required for delivery of all the other information that we hold. However, attempting this may be like trying to fit a quart into a pint pot.


12. BIBLINK

A. Project Description:

Project Name:
BIBLINK
Institutional Affiliation:
BIBLINK is funded by the European Commission under the Telematics for Libraries programme. It involves a number of European national libraries (France, UK, Norway, Spain, Catalonia, Netherlands), UKOLN, as well as publishers who will take part in the demonstrator phase.
Contact Person (persons) and email:
Rachel Heery, UKOLN
r.heery@ukoln.ac.uk
URL(s) if publicly available:
<URL:http://www.ukoln.ac.uk/metadata/BIBLINK/>

B. Information System Summary:

1. Domain of Content
It is intended to deliver a prototype demonstration system which will enable publishers of electronic resources to input and transmit an agreed minimum level of data describing the resources to national bibliographic services, allowing those services to enrich the data (for example, by the addition of controlled subject terms or identifiers) and retransmit it to the publishers. We have now completed the background study and recommendations for the demonstration service, and are finalising the plans for Phase 2 (October 1997-March 1999) which will involve building and testing the demonstrator.
2. Projected Collection Size (number of records)
Many of the contributing publishers will be inputting several hundreds of records, but we hope to include details of web resources from smaller organisations where there will be a smaller input. Some agencies may be involved in submitting collections of records from third parties.
First year:
-
Third year:
-
3. Type of data, formats
Publishers will submit data in Dublin Core format or in specific SGML DTDs from which DC data will be extracted. The data will be converted to UNIMARC which will then be converted by participating libraries into their own MARC formats (perhaps using the USEMARCON software or other locally available software.)
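
The conversion step can be pictured with a small Python sketch. It is an assumption-laden illustration, not the BIBLINK mapping and not USEMARCON: the field and subfield tags reflect a general reading of the UNIMARC bibliographic format (200 $a title, 700 $a personal author, 210 $c publisher, 101 $a language, 856 $u electronic location) and should be checked against the UNIMARC manual, and the sample record is hypothetical.

```python
# Assumed mapping from DC elements to UNIMARC (tag, subfield) pairs.
DC_TO_UNIMARC = {
    "DC.title": ("200", "a"),
    "DC.creator": ("700", "a"),
    "DC.publisher": ("210", "c"),
    "DC.language": ("101", "a"),
    "DC.identifier": ("856", "u"),
}


def convert(dc_record):
    """Split (element, value) pairs into mapped fields and unmapped leftovers.

    Returns (fields, unmapped), where fields is a list of
    (tag, subfield, value) tuples sorted by tag.
    """
    fields, unmapped = [], []
    for name, value in dc_record:
        target = DC_TO_UNIMARC.get(name)
        if target is None:
            unmapped.append((name, value))
        else:
            fields.append((target[0], target[1], value))
    fields.sort(key=lambda f: f[0])  # stable: repeated tags keep their order
    return fields, unmapped


def render(fields):
    """Human-readable listing, one field per line (not an exchange format)."""
    return "\n".join("%s $%s %s" % f for f in fields)


# Hypothetical publisher submission.
SAMPLE = [
    ("DC.title", "An Example Electronic Journal"),
    ("DC.creator", "Example, Alex"),
    ("DC.publisher", "Example Press"),
    ("DC.language", "eng"),
    ("DC.identifier", "http://www.example.org/journal/"),
    ("DC.rights", "(hypothetical rights statement)"),
]

if __name__ == "__main__":
    mapped, leftover = convert(SAMPLE)
    print(render(mapped))
    print("unmapped:", leftover)
```

Keeping the unmapped elements visible matters here, because anything the target format cannot hold is otherwise lost silently on the way to the national bibliography.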

C. Metadata:

1. What are the intended functions of the metadata used?
__ Management of the collection
__ Enhanced searchability
__ Interoperability
__ Value-added services
X_ Other:
Enhancing the national bibliographies
2. Who is creating the metadata? What training and support are provided?
The publishers will submit the data, collection of data may be through tools such as DC-dot.
3. Which metadata scheme is used?
Dublin Core
4. Encoding strategy:
__ external database
__ auxiliary HTML files (not embedded in resource)
__ metadata embedded in resource
To a large extent this will depend on the publisher, so we may well get a variety of services, documents, databases etc. And we will be including 'offline' electronic publications such as CDs as well.
5. Granularity of the description:
__ individual documents
__ groups of documents
__ collections
Currently unsure. Probably journal level descriptions. Possibly article (document) level as well/instead in some cases.
6. Dublin Core strengths and weaknesses encountered in deployment
Strengths: A ready made minimal set, a de facto standard with international support. Flexible, extensible.
Weaknesses: Needs some tweaking for electronic journal article level description, no provision for version control, confusion as to status of 'enumerated lists' of qualifiers
7. Which DC fields are used/not used?
Details of fields and extensions are outlined at
<URL:http://www.ukoln.ac.uk/metadata/registries/dc/biblink.html>
8. Have additional local fields been added?
Yes
9. Which qualifiers are used/needed?, at what level of detail?
10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)?
I expect we will!

D. Tools (provide URLs if freely available):

1. Conversion tools (eg., MARC -> DC)
USEMARCON, possibly.
2. Metadata creator tools
DC-dot, possibly
<URL:http://www.ukoln.ac.uk/metadata/dcdot/>
3. Automatic metadata extraction tools
-
4. Discovery tools (directory/browsing type)
-
5. Indexing and search tools
-
6. Other tools
-
7. Help system, tutorials etc.
-
8. Is the metadata indexed, browsable, searchable, through a local or external service?
-

E. Experiences from the planning and production phases:

1. Describe problems and accomplishments from both phases
-
2. Do you have estimates of the costs and benefits for the creation or use of metadata?
-
3. What measures (or expectations) of service improvement through metadata usage do you have?
-


13. NewsAgent for Libraries

A. Project Description:

Project Name:
NewsAgent for Libraries
Institutional Affiliation:
NewsAgent for Libraries is a project within the UK Higher Education Electronic Libraries Programme (eLib). The project partners are LITC (South Bank University), CERLIM (University of Central Lancashire), Department of Information and Library Studies (UWA), Fretwell-Downing Informatics Ltd and the UK Office for Library and Information Networking (University of Bath). It also involves several publishers who will provide information sources for the project.
Contact Person (persons) and email:
Andy Powell, UKOLN
a.powell@ukoln.ac.uk
URL(s) if publicly available:
<URL:http://www.sbu.ac.uk/litc/newsagent/>

B. Information System Summary:

1. Domain of Content
The aim of the NewsAgent project is to create an electronic news and current awareness service for library and information staff with a mixture of content streams, providing up-to-date descriptions of documents to end users based on user-configurable preferences. Content will include refereed and other papers, reviews and editorial matter from the most highly respected UK journals in the field, including Program, VINE, Library Technology, Ariadne and the Journal of Librarianship and Information Science. News and briefing materials will be provided by The Library Association, the Institute of Information Scientists, UKOLN, the British Library, and LITC, which already produce printed publications in this field and which are all developing Web sites in addition.
2. Projected Collection Size (number of records)
Unknown at the present time.
First year:
-
Third year:
-
3. Type of data, formats
Publishers will embed Dublin Core into the HTML resources they make available on the Web for collection by the NewsAgent robot. The harvested data will be stored in the Oracle-based NewsAgent database. In addition, an email parser is being developed to generate metadata from the headers of messages sent to various email lists and USENET newsgroups.
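
As a rough illustration of the two input paths described above, the following Python sketch pulls embedded Dublin Core from an HTML page and derives a few elements from mail headers. It is not the NewsAgent robot or parser: the function names, the sample page and message, and the header-to-element mapping (Subject to DC.title, From to DC.creator, Date to DC.date) are all assumptions made for the example.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.records = []  # (element name, content) tuples in document order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Keep only Dublin Core elements; the prefix test ignores case.
        if name.lower().startswith("dc.") and content is not None:
            self.records.append((name, content))


def extract_dc_from_html(html_text):
    """Return the embedded Dublin Core pairs found in html_text."""
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.records


def dc_from_mail_headers(raw_message):
    """Derive a few DC elements from message headers (assumed mapping)."""
    message = message_from_string(raw_message)
    record = {}
    if message["Subject"]:
        record["DC.title"] = message["Subject"]
    if message["From"]:
        record["DC.creator"] = message["From"]
    if message["Date"]:
        sent = parsedate_to_datetime(message["Date"])
        record["DC.date"] = sent.date().isoformat()
    return record


# Hypothetical inputs, invented for this sketch.
SAMPLE_PAGE = (
    "<html><head><title>Example article</title>\n"
    '<meta name="DC.title" content="Example article">\n'
    '<meta name="DC.creator" content="A. Author">\n'
    '<meta name="keywords" content="not Dublin Core">\n'
    "</head><body><p>Text</p></body></html>"
)

SAMPLE_MESSAGE = (
    "From: A. Author <author@example.org>\n"
    "Subject: New issue announced\n"
    "Date: Mon, 06 Oct 1997 09:30:00 +0000\n"
    "\n"
    "Body text.\n"
)

print(extract_dc_from_html(SAMPLE_PAGE))
print(dc_from_mail_headers(SAMPLE_MESSAGE))
```

Keeping the element names exactly as the publisher wrote them leaves room for local extensions; a production harvester would also need to handle qualifiers and character encodings, which this sketch ignores.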

C. Metadata:

1. What are the intended functions of the metadata used?
__ Management of the collection
__ Enhanced searchability
__ Interoperability
X_ Value-added services
__ Other:
Current awareness service.
2. Who is creating the metadata? What training and support are provided?
The publishers and other information providers will create the metadata. Guidelines are being produced to detail the format of this metadata. A variety of methods of creating embedded Dublin Core are expected to be used including DC-dot.
3. Which metadata scheme is used?
Dublin Core plus a few extensions
4. Encoding strategy:
X_ external database
__ auxiliary HTML files (not embedded in resource)
X_ metadata embedded in resource
To a large extent this will depend on the publisher, so we may well get a variety of methods of maintaining Dublin Core records used.
5. Granularity of the description:
X_ individual documents
__ groups of documents
__ collections
6. Dublin Core strengths and weaknesses encountered in deployment
Strengths: A ready made minimal set, a de facto standard with international support. Flexible, extensible.
Weaknesses: Confusion as to status of 'enumerated lists' of qualifiers. Limitations of the Date element.
7. Which DC fields are used/not used?
Details of fields and extensions are outlined at
<URL:http://www.ukoln.ac.uk/metadata/NewsAgent/dcusage.html>
8. Have additional local fields been added?
Yes, see above URL.
NewsAgent.topic, NewsAgent.date.validto, NewsAgent.contact, NewsAgent.coverage
9. Which qualifiers are used/needed?, at what level of detail?
-
10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)?
NewsAgent classification scheme as detailed at
<URL:http://www.ukoln.ac.uk/metadata/NewsAgent/classification/>

D. Tools (provide URLs if freely available):

1. Conversion tools (eg., MARC -> DC)
-
2. Metadata creator tools
NewsAgent version of DC-dot,
<URL:http://www.ukoln.ac.uk/metadata/NewsAgent/dc/>
3. Automatic metadata extraction tools
NewsAgent robot
4. Discovery tools (directory/browsing type)
-
5. Indexing and search tools
DALI (Fretwell-Downing)
6. Other tools
-
7. Help system, tutorials etc.
Not yet available
8. Is the metadata indexed, browsable, searchable, through a local or external service?
Local - NewsAgent system (based on Fretwell-Downing's DALI system).

E. Experiences from the planning and production phases:

1. Describe problems and accomplishments from both phases
-
2. Do you have estimates of the costs and benefits for the creation or use of metadata?
-
3. What measures (or expectations) of service improvement through metadata usage do you have?
-


14. Medical Metadata Project

Institutional Affiliation: Oregon Health Sciences University, American Medical Informatics Association Internet Working Group, National Cancer Institute, Polytechnic University

Contact Person (persons) and email: mailto:gmalet@worldnet.att.net

URL(s) if publicly available: http://medir.ohsu.edu/bicc-informatics/ebm/latest.htm

B. Information System Summary:

1. Domain of Content
Biomedical medicine

Test set is a National Cancer Institute Cancer Genetics Database

2. Projected Collection Size (number of records)
First year: 5000
Third year: 2 million

3. Type of data, formats
Diverse

C. Metadata:

1. What are the intended functions of the metadata used?

2. Who is creating the metadata? What training and support are provided?
Automated tool. Medical Librarian. Template entry.

3. Which metadata scheme is used?
Enhanced Dublin Core. HTML 4.0 compatible.

4. Encoding strategy:

5. Granularity of the description:

6. Dublin Core strengths and weaknesses encountered in deployment
Resource types and other DC standards are not fully specified, which leads to possible instability and unfavorable economics.

7. Which DC fields are used/not used?
Not used:

8. Have additional local fields been added?
Yes: Medical Subject Headings.

9. Which qualifiers are used/needed?, at what level of detail?
SCHEME of course
Keywords:

10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)?
National Library of Medicine

D. Tools (provide URLs if freely available):

In development ( http://medir.ohsu.edu/bicc-informatics/ebm/latest.htm)

1. Conversion tools (eg., MARC -> DC)
In development....parse Medline records

2. Metadata creator tools
In development: Java extraction -> editor -> template entry.
See ( http://medir.ohsu.edu/bicc-informatics/ebm/latest.htm)

3. Automatic metadata extraction tools
Medical World Search, http://www.mwsearch.com/

4. Discovery tools (directory/browsing type)
Medical Matrix www.medmatrix.org

5. Indexing and search tools
Medical World Search, Harvest, http://www.mwsearch.com/

6. Other tools

7. Help system, tutorials etc.

8. Is the metadata indexed, browsable, searchable, through a local or external service?
Yes.

E. Experiences from the planning and production phases:

A precise Dublin Core specification needs to be set out. For example, the resource type syntax and elements are very difficult to figure out. DC documentation should have an official URL with all up-to-date standards and implementation recommendations available.

1. Describe problems and accomplishments from both phases

2. Do you have estimates of the costs and benefits for the creation or use of metadata?
$100,000 phase one.

3. What measures (or expectations) of service improvement through metadata usage do you have?
Field selection. Medline-like access to distributed multimedia documents.


15. University of Washington Digital Library

A. Project Description:

The University of Washington Digital Library is contributing to the development and adoption of standard resource descriptors for networked information; our initial efforts in this arena focus on image collections. In collaboration with the Department of Electrical Engineering's Center for Information System Optimization (CISO), the University Libraries are using extended Dublin Core descriptors for image collections. "Content" is CISO's Web-based multimedia database management and archival system *; it is in production on the campus of the University of Washington managing digital image collections today. The software also stores, searches and displays audio and video, and is in test with these formats in various venues.

Technical Services staff from the Libraries lend expertise in information organization to implement the DC templates provided by default in Content's "acquisition" workstation. Using Libraries staff to train authoring paraprofessionals and subject experts offers an approach which leverages scarce cataloging staff while fostering distributed, accurate description of images by academic departmental staff doing standards-based tagging. This is made possible by the open architecture adopted by Content developers. Item and collection-level records are built using DC fields, optionally in conjunction with field-level thesauri and validity-checking routines. Since the database can be maintained remotely from multiple acquisition sites, scholars and experts can control and enrich metadata from any networked site running Windows 95. A recent grant from Intel Corporation provides the hardware base for the use of this high-performance tool across disciplines. The Content server runs on Windows NT and Unix; the client and acquisition station are written in Visual Basic. The databases can be accessed with any Web browser.

Content provides the ability to do Boolean, full-text searching across databases; the use of the Dublin Core facilitates recall while assuring precision to the extent the database creators demand. Using Content as a common, extensible tool allows us to focus on testing the appropriateness of DC for image searching. Rigorous usability testing of the DC data model for images is now in the design stage.

Project Name:

The University of Washington Digital Library Initiative

Institutional Affiliation:

University of Washington

Contact Person (persons) and email:

Geri R. Bunker, Digital Library Coordinator/Interim Associate Director for Technical Services, University Libraries, bunker@u.washington.edu
Gregory L. Zick, Professor and Chairman, Electrical Engineering, zick@maxwell.washington.edu

URL: Content: high-performance software for multimedia archiving

B. Information System Summary:

1. Domain of Content

Diverse interdisciplinary collections, beginning with images for usability testing of the DC model. As the model is proven and extended, standard metadata will be developed for the entire array of Web-based resources.

2. Projected Collection Size (number of records)
First year: 35,000 records
Third year: 100,000

3. Type of data, formats
Image (jpeg, tiff); video (mpeg1), audio (any format playable with Web plug-ins, e.g., .wav)

C. Metadata:

1. What are the intended functions of the metadata used?

X Management of the collection
X Enhanced searchability
X Interoperability
X Value-added services
X Other: migration, provenance study, enrichment from distributed, remote collaborators.

2. Who is creating the metadata? What training and support are provided?
Database administrators-faculty, student researchers, library specialists, catalogers, lab techs.

3. Which metadata scheme is used?
DC stored in external files, using local html syntax; possible extension to SGML/XML using DTD currently being discussed for DC.

4. Encoding strategy:
__ external database
X auxiliary HTML files (not embedded in resource)
__ metadata embedded in resource

5. Granularity of the description:
X individual documents
X groups of documents
X collections

6. Dublin Core strengths and weaknesses encountered in deployment
Strengths:
Formal usability testing of DC for searching images has not yet begun, but initial configuration of local resource database has shown that scholars and researchers creating image databases are keenly interested in methodologies to describe their objects accurately and in standard fashion. Elements critical to discovery which contain the scholar's unique intellectual contribution (description, subject, relation, coverage) are of interest for their ability to be structured and extended. In addition, the Rights Management element is expected to prove critical to locally produced digital image creators. Experiments with watermarking, fingerprinting, etc. as links embedded through this element are expected.
Weaknesses:
The use of the Date element to describe the creation date of the surrogate seems to be less than useful. More important seems to be the date of the creation, production, or publication of the original work depicted in the digital reproduction of the image. Also, further work is needed on the hierarchical logic employed with fields such as Coverage and Relation.

7. Which DC fields are used/not used?
For homogeneous collections, Resource type may be needed at the collection-level, rather than (or in addition to) the item level. We find it sometimes omitted from the database creators' thinking altogether when first we are approached with a collection.

8. Have additional local fields been added?
Administrative metadata is particularly important to stewards of digital collections who will need to be concerned with migration to new technologies, derivative uses of the "original"; therefore fields including transmission and capture data are often added. In addition, the provenance of the original piece is of critical value and is often required to be kept confidential. (Our implementation allows for hidden fields to be viewable only in "staff mode", allowing us to standardize within the template on such added fields.)

9. Which qualifiers are used/needed?, at what level of detail?
We will support extensive use of qualifiers as needed on a case-by-case basis. Content's implementation allows for the customization of each database by the database administrator (through a Web form) while allowing cross-database searching through mapping of DC fields.

10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)?
Each DC field is allowed its own thesaurus which can be imported from standard sources, or built locally from preferred terms specifically tailored to the database. We will provide a synonym-matching function for both creation and retrieving. We will provide support for classification schema through the Subject element.
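
The answers to questions 9 and 10 above describe two mechanisms: local fields mapped to DC elements so that one query can span several databases, and a per-field thesaurus with synonym matching. The Python sketch below illustrates both ideas in miniature. It does not reflect Content's actual API or data model; the database names, local field names, records and thesaurus entries are all hypothetical.

```python
# Hypothetical databases: each maps its local field names to DC elements.
DATABASES = {
    "slides": {
        "field_to_dc": {"Caption": "title", "Photographer": "creator"},
        "records": [{"Caption": "Harbour at dusk", "Photographer": "J. Smith"}],
    },
    "maps": {
        "field_to_dc": {"Sheet name": "title", "Surveyor": "creator"},
        "records": [{"Sheet name": "Harbour chart", "Surveyor": "R. Jones"}],
    },
}

# Hypothetical per-element thesaurus: preferred term -> accepted synonyms.
THESAURUS = {
    "title": {"harbour": ["harbor", "port"]},
}


def preferred_term(dc_element, term):
    """Replace a synonym with its preferred term; leave unknown terms alone."""
    wanted = term.strip().lower()
    for preferred, synonyms in THESAURUS.get(dc_element, {}).items():
        if wanted == preferred or wanted in synonyms:
            return preferred
    return wanted


def search(dc_element, term):
    """Search every database through the local fields mapped to dc_element."""
    needle = preferred_term(dc_element, term)
    hits = []
    for db_name, database in DATABASES.items():
        # Find which local fields stand for the requested DC element here.
        local_fields = [
            field
            for field, element in database["field_to_dc"].items()
            if element == dc_element
        ]
        for record in database["records"]:
            if any(needle in record.get(field, "").lower() for field in local_fields):
                hits.append((db_name, record))
    return hits


print(search("title", "Harbor"))
print(search("creator", "smith"))
```

In this sketch the synonym lookup is applied only at query time; applying the same lookup when records are created would cover the "creation" side mentioned in the answer to question 10.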

D. Tools (URL is freely available):

Demonstration and download of Java version of the search client in Content

Is the metadata indexed, browsable, searchable, through a local or external service?
Our images and metadata are managed through the local implementation of the Content system. A fully documented API is available which would provide the ability to search Content databases externally.

E. Experiences from the planning and production phases:

What measures (or expectations) of service improvement through metadata usage do you have?
The University of Washington understands that standard metadata sets and interoperable acquisition, storage, discovery and display functionality are critical to the future of networked information. We hope that by using and improving a common set of tools, platforms and metadata methodologies, our users will enjoy significant increases in productivity and enhanced facility with network navigation.

* Zick, Greg, Lawrence Yapp and Craig Yamashita, "Content; a practical, scalable, high-performance multimedia database", ACM International Conference paper, July 1997.


16. Math-Net

A. Project Description:

Internet Services for Mathematics Germany http://elib.zib.de/math-net/

Project Name: Math-Net

Institutional Affiliation:
Konrad-Zuse Zentrum fuer Informationstechnik -Berlin-
Technische Universitaet Chemnitz-Zwickau - Fb. Mathematik -
Martin-Luther Universitaet Halle-Wittenberg - Fb. Mathematik -
Universitaet Kaiserslautern - Fb. Mathematik -
Universitaet zu Koeln - Fb. Mathematik -
Technische Universitaet Muenchen - Fb. Mathematik -
Universitaet Osnabrueck - Fb. Mathematik/Informatik -
Universitaet - GHS - Paderborn - Fb. Mathematik -
Universitaet Rostock - Fb. Mathematik -

Contact Person (persons) and email: http://elib.zib.de/math-net/wegweiser.html

URL(s) if publicly available: see above

B. Information System Summary:

1. Domain of Content: Mathematics

2. Projected Collection Size (number of records)
First year:
Third year:(2 years project)

3. Type of data, formats: PostScript, HTML, (PDF)

C. Metadata:

1. What are the intended functions of the metadata used?
Management of the collection: SIDE EFFECT
Enhanced searchability: YES
Interoperability: INTENDED
Value-added services: YES
Other: ____________________________________________

2. Who is creating the metadata? AUTHORS
What training and support are provided? METAMAKERS, NO SPECIFIC TRAINING NECESSARY-
Example: http://www.mathematik.uni-osnabrueck.de/projects/META/MetaMake2.2.html

3. Which metadata scheme is used? DC

4. Encoding strategy:
__ external database
__ auxiliary HTML files (not embedded in resource) FOR PostScript (PDF)
__ metadata embedded in resource: FOR HTML

5. Granularity of the description:
individual documents: YES
groups of documents: YES
collections:

6. Dublin Core strengths and weaknesses encountered in deployment
FLEXIBLE, DOES NOT INCLUDE RATING

7. Which DC fields are used/not used?
http://www.dstc.edu.au/DC4/roland/

8. Have additional local fields been added?
Some did for internal use.

9. Which qualifiers are used/needed?, at what level of detail?
see 7.

10. Do you provide any subject description support: (uncontrolled term lists, thesauri, classification systems)?
YES

D. Tools (provide URLs if freely available):

http://elib.zib.de/math-net/werkzeuge.html
-- MORE UNDER DEVELOPMENT

1. Conversion tools (eg., MARC -> DC)

2. Metadata creator tools:
SEE ABOVE

3. Automatic metadata extraction tools:
HARVEST -- To what extent this statement makes sense would require a lengthy explanation.

4. Discovery tools (directory/browsing type) ??

5. Indexing and search tools: HARVEST based

6. Other tools

7. Help system, tutorials etc.

8. Is the metadata indexed, browsable, searchable, through a local or external service?
SURE -- THAT'S THE USE OF IT -- The project is an open + distributed one

E. Experiences from the planning and production phases:

1. Describe problems and accomplishments from both phases:
MetaData for preprints (papers in general) planning: http://www.mathematik.uni-osnabrueck.de/ak-technik/mail.html
MetaData sources for other material relevant for the subject will become available soon.

http://www.mathematik.uni-osnabrueck.de/projects/workshop97/abstracts/dalitz.ps

2. Do you have estimates of the costs and benefits for the creation or use of metadata?
COSTS PER AUTHOR 0, QUALITY IMPROVEMENT FOR RETRIEVAL REMARKABLE.

3. What measures (or expectations) of service improvement through metadata usage do you have?
See for yourself: http://www.mathematik.uni-osnabrueck.de/harvest/brokers/MathN/

as opposed to http://www.mathematik.uni-osnabrueck.de/harvest/brokers/niedersachsen/


17. The Victoria and Albert Museum

Reported by Douglas Dodds, Head of Collection Management, National Art Library, Victoria and Albert Museum

The Victoria and Albert Museum http://www.vam.ac.uk (V&A) is one of the partners in the EC-funded Electronic Library Image Service for Europe http://severn.dmu.ac.uk/elise/ (ELISE II) project, which started in October 1996 and runs for three years. The ELISE partners are:

The ELISE service will operate on a client / server model, making use of Z39.50 and Dublin Core. In the ELISE II prototype, the catalogue data supplied by participating institutions is mapped to DC and displayed alongside thumbnail images. The relationship between analogue original objects and digital surrogate objects is one of a number of areas still to be explored in relation to the use of DC metadata. ELISE also expects to make use of DC at a collection level, in order to assist users in identifying relevant collections to search. In a related area, the V&A's National Art Library is leading the research investigating the use of appropriate thesauri by the participants in ELISE. It is anticipated that this research will result in proposals which link to the use of DC at collection and/or object level.

The National Art Library also contributed draft DC metadata examples for the report of the Visual Arts Data Service workshop held in Edinburgh in March 1997. The NAL's records describe the Library's HTML pages relating to Charles Dickens.


18. Activities in the Field of Electronic Information Management and Meta-Data in Physics

Reported by:
Olaf Bieker
Thomas Severiens

Activities in this field:
The goal is to build up a distributed electronic information system in physics. The first version of this system is Harvest-based. A link-list of (hopefully) all European Physics Departments, EuroPhysDep, underlies a Harvest-Search-Machine. From our point of view, the only long-term way to produce complete, correct, and readable search results is a common basis of Meta-Tags on the Web pages of the participating departments. To make it as simple as possible for authors to produce a complete and correct set of Meta-Tags, the MMM authoring-tool was developed.

DC-Meta-Data is also used in the EPRINT-project to combine the global preprint-database xxx and the distributed preprints on the local webservers (EuroPhysDoc). A common search-interface for both types of preprints is planned. One of the big questions is how to force authors to publish their articles with Meta-Tags on the local webserver. One solution under development is a Web-Upload-Form-Interface (coming soon).

Implementations in DC:
Implementing DC in EuroPhysNet, xxx, and EPRINT indicates that some changes and additions to DC are still needed:


19. Australian Geodynamics Cooperative Research Centre

Reported by Simon Cox,
AGCRC-CSIRO Exploration & Mining
Perth, Western Australia

The Australian Geodynamics Cooperative Research Centre (AGCRC) - a collaboration between two public research organisations and two universities - is using the WWW as a primary delivery system for the results of its research. The results are composed of a variety of material presented as a number of different resource types. In particular

  1. a substantial document archive has been established comprising at least the abstracts of all reports and publications - currently stored as complete HTML documents
  2. a number of numeric datasets are available for visualisation and modelling using interactive web-based tools - currently these are mainly stored in a GIS (GRASS) database.

The primary discovery mechanisms for these resources consist of

  1. an index and search engine, based on AltaVista software customised for (optionally) searching on particular elements within documents and also to use geographic information included in documents
  2. the menus comprising the entry point into the visualisation and modelling application

These use two different metadata systems:

  1. the publications include embedded metadata, using the <META ...> element in the HTML header, - with semantics following Dublin Core extended with the ANZLIC standard for geospatial information, and local (AGCRC) terms for more specialised information
  2. numeric data are "published" to the system through small description files placed in a special directory, including information about the format and location of the dataset, some keywords, and links to associated text-based descriptions, thumbnail and preview images, etc. - these currently use a locally developed extensible attribute=value syntax which we call "GIX".
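
The "GIX" syntax itself is not specified in this report, so the following Python sketch only illustrates the general idea of an extensible attribute=value description file. The attribute names, the sample content, the "#" comment convention and the rule that repeated attributes accumulate are all assumptions made for the example, not a description of the AGCRC format.

```python
def parse_description(text):
    """Parse attribute=value lines into a dict of lists.

    Unknown attributes are accepted as-is, which is what keeps the
    format extensible; repeated attributes accumulate in order.
    """
    record = {}
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        if "=" not in line:
            raise ValueError(f"line {line_number}: expected attribute=value")
        # Split on the first "=" only, so values such as URLs may contain "=".
        key, value = line.split("=", 1)
        record.setdefault(key.strip().lower(), []).append(value.strip())
    return record


# Hypothetical description file for one numeric dataset.
SAMPLE_DESCRIPTION = """\
# dataset description
format=GRASS raster
location=/data/example/gravity
keywords=gravity
keywords=Western Australia
description=http://example.org/gravity.html?view=full
thumbnail=http://example.org/gravity-thumb.gif
"""

dataset = parse_description(SAMPLE_DESCRIPTION)
print(dataset["format"])
print(dataset["keywords"])
```

A record parsed this way could be mapped onto DC elements on request, for example by treating the keyword values as Subject entries; that mapping is likewise only a suggestion.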

A crude link has been made between these two systems by allowing maps produced by the visualisation tool to act as an interface onto the document search system, with keywords chosen from the names of the map layers selected. However, we would like to unify the system properly, in order to deliver targeted information regarding Australian Geodynamics to the client regardless of what type of resource it is.

Constraints on implementation include:

The intention is to move to a system based on

Note that DC is not the main metadata system in this framework, but resources will be provided in DC when requested. The minimalist/structuralist issue in DC is thus tangential, though it is likely that we will only support the minimalist version. For more detailed uses we have a completely extensible system under our control, which will be specified through an XML DTD, and mapping tables or crosswalks to refer to other systems.

This approach introduces a few challenges.


20. Interconnect Technologies Corp. projects

Reported by Mike Raugh

A. Project Description:

Several projects involving development of digital libraries for Federal and Corporate clients

Project Name:

(Example) Aviation Safety Digital Library

Institutional Affiliation:

Interconnect Technologies Corp.

Contact Person (persons) and email:

Mike Raugh, raugh@interconnect.com;
Diane Hillmann, dih1@cornell.edu

URL(s) if publicly available:

Not available yet

B. Information System Summary:

1. Domain of Content:

(Example) R&D website containing selected information primarily for general aviation

2. Projected Collection Size (number of records)

First year: 1,000 (subset subject to DC cataloging)
Third year: 10,000 (subset subject to DC cataloging)

3. Type of data, formats:

Primarily text documents, in PDF, HTML or ASCII.

C. Metadata:

1. What are the intended functions of the metadata used?

2. Who is creating the metadata? What training and support are provided?
Most created by Diane Hillmann, under contract. Guidelines being written so some can be provided by others.

3. Which metadata scheme is used?

Minimal Dublin Core + Additional Module (e.g., specialized metadata for aviation safety information)

4. Encoding strategy:

We would like to embed in resource at some point, if possible. We do create sample datasets for clients demonstrating the benefits of embedding.

5. Granularity of the description:

We are debating whether to catalog individual documents of large collections (order of 500,000 structured records) or to merely catalog the collections

6. Dublin Core strengths and weaknesses encountered in deployment

Simplicity both strength and weakness

Some elements still a bit ambiguous, need to be better defined both in general and within this project

7. Which DC fields are used/not used?

Elements used:

  1. title
  2. creator
  3. subject
  4. description
  5. publisher
  6. contributors
  7. date
  8. type
  9. format
  10. identifier (both qualified and unqualified)
  11. source

Elements not used

  1. language
  2. relation
  3. coverage
  4. rights

8. Have additional local fields been added?

No.

9. Which qualifiers are used/needed?, at what level of detail?

Have qualified "identifier" to use with URL and also used unqualified for other identifying numbers on documents.

10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)?

Haven't yet but are considering several options, including term lists.

D. Tools (provide URLs if freely available):

1. Conversion tools (eg., MARC -> DC)

2. Metadata creator tools:

Template for DC

3. Automatic metadata extraction tools

4. Discovery tools (directory/browsing type):

Browsing of catalog records via index hierarchies, with final links to resource

5. Indexing and search tools:

Searching, with optional filtering on metadata

6. Other tools:

Display of metadata

7. Help system, tutorials etc.

8. Is the metadata indexed, browsable, searchable, through a local or external service?

We provide both local services and client-based services.

E. Experiences from the planning and production phases:

1. Describe problems and accomplishments from both phases

Many of our problems have been related to the DATE element, primarily because of its limitations in its present form. Not only did we need more than one kind of date, we were concerned about being too restrictive about format.

The other major problems had little to do with the Dublin Core, but with the dysfunctional practices of the web sites we were working with. They tended to change URLs, and as one moved to frames, we lost the ability to relate our metadata to a specific item.

2. Do you have estimates of the costs and benefits for the creation or use of metadata?

3. What measures (or expectations) of service improvement through metadata usage do you have?


21. University of Michigan Digital Library Registry Database

Last modified: 9/22/97

David L. Richtmyer
Tech. Services Electronic Resources Librarian (Monographs)
Harlan Hatcher Graduate Library
The University of Michigan
Ann Arbor, MI 48109-1205
dlrichtm@umich.edu


A. Project Description:

Create a searchable, browsable database of World Wide Web resources that have been chosen for their institutional or academic value. Provide metadata elements facile enough for non-specialist users ("registrars") to supply most of the element content, yet robust enough to support sophisticated browse, search and retrieval functionality.

B. Information System Summary:

E. Experiences from the planning and production phases:


22. INDOREG : INternet DOcument REGistration

Danish Library Centre

Reported by Randi Diget Hansen

In recognition of the fact that national information and cultural heritage are not only conveyed in the form of printed matter, the Danish Library Centre (DBC) and the Royal Library decided in 1995 to propose to the National Bibliographic Council and the Danish Ministry of Culture that the national bibliography should also include electronic documents, although only those found in a physical state in the form of diskettes and CD-ROMs. Such documents have been included in the Danish book list and Danish periodical list since 1996.

At the same time, developments in the form of Web publication on the Internet were moving so fast that DBC decided to launch a project to find out whether net publications could and should be subject to bibliographic control in the same way as printed and electronic publications in fixed physical form. The reason for this was that we felt the existing search engines on the net suffered from the general problem of searching in unqualified data and generally replying with excessive amounts of data. We also felt that the information contained in net publications did not basically differ from the information in publications in fixed physical form. If net-borne publications are excluded from bibliographic control, there is a risk that many people will find it difficult to gain access to an increasing amount of the information citizens need, as opposed to the information stored in products in fixed physical form. The ultimate target was that registration should be in DanBib (the joint superstructure system for the complete Danish library system) alongside the national bibliography and the total list of publications at Danish libraries. DanBib should also be used to make direct linking to documents possible.

In addition, a new Legal Deposit Act was pending (and was passed during the project period). It covers all electronic documents on the net which are deemed likely to be available as independent units in a final form. The Act requires that a general standpoint should be taken in terms of national bibliographic registration.

In the autumn of 1996 the Danish National Library Authority (Statens Bibliotekstjeneste, or SBT) decided to provide joint funding of the project. At the same time the principle of national bibliographic registration of electronic documents on the Internet was presented to the National Bibliographic Council, who expressed their agreement. In this connection the project's ultimate target was altered to include not only evaluation of bibliographic control in general, but also proposals as to how the work involved in national bibliography should be performed.

In order to obtain a model for a national bibliographic registration system that includes net publications, the project has focused on the following areas in particular.

Inclusion criteria

Proposed principles for national bibliographic inclusion criteria have been drawn up. They operate with the concepts of static and dynamic publications - with homepages as an independent category under dynamic publications. These principles reflect the criteria that exist for publications in fixed physical form, since there are formal requirements with regard to both size and (to a certain extent) content. For instance, it is proposed that publications of a commercial, internal, highly local or private nature should not be included.

Registration method

The proposed registration method seeks to cover the special needs of net publications in terms of description and format. The problems of describing static and dynamic publications vary. The level of registration has not been finalised yet; for one thing, the new cataloguing rules drawn up in parallel to the project by one of the project participants will have to be tested in practice before the levels in the national bibliography generally are recommended to SBT in the autumn of 1997.

Self-registration by authors/publishers using metadata is regarded as a necessary supplement if very large amounts of information are to be registered.

For this purpose, a modified version of the Dublin Core Template used in the Nordic Metadata Project has been produced to allow for Danish conditions.

Tracing and maintenance

To ensure the constant validity of addresses (URLs), a PURL server has been established which functions as a central exchange. If an international numbering system like the ISBN system is adopted, it must of course be used, but a solution is required immediately. The further development of automatic tools to check publications is needed. In addition, we feel that the current collaboration between national bibliography and legal deposits should continue. Finally, concentrated efforts are needed to persuade authors/publishers of net publications to regard such publications in the same way as printed publications.
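
The role of a PURL server as a central exchange can be illustrated with a minimal Python sketch. The table contents, the path /net/example-report and the target URLs are invented for illustration; a real PURL server answers with an HTTP redirect rather than a function return value.

```python
# Hypothetical PURL table: persistent path -> current location of the publication.
purl_table = {
    "/net/example-report": "http://www.example.org/reports/1997/report.html",
}

def resolve(purl_path):
    """Return the current URL registered for a persistent path, or None if unknown."""
    return purl_table.get(purl_path)

def move(purl_path, new_url):
    """Record that a publication has moved; the persistent path stays the same."""
    if purl_path not in purl_table:
        raise KeyError(purl_path)
    purl_table[purl_path] = new_url

# The bibliographic record cites only the persistent path, so it stays valid
# after the publisher reorganises its server.
move("/net/example-report", "http://www.example.org/archive/report.html")
print(resolve("/net/example-report"))
```

Only the table entry changes when a document moves; every catalogue record that cites the persistent address is left untouched.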

Storage in DanBib

The DanBib base will be given a Web interface in the autumn of 1997, making it possible to link from registration to net publications.

A project report is available at http://www.purl.dk/rapport/html.uk/


23. Metadata Project of State Library of Queensland

Institutional Affiliation:
State Library of Queensland

Contact Person (persons) and email:
Jennie Thornely; J.Thornely@slq.qld.gov.au

B. Information System Summary:

1. Domain of Content :
Gov

2. Projected Collection Size (number of records)
First year: 450
Third year: 650

3. Type of data, formats
html; text; jpeg; gif

C. Metadata:

1. What are the intended functions of the metadata used?
_y_ Management of the collection
_y_ Enhanced searchability
_y_ Interoperability
_y_ Value-added services
__ Other:
provides linkage for opac in the near future

2. Who is creating the metadata? What training and support are provided?
The Project is managed by the Manager, Technical Services, and the Senior Librarian, Internet Services Unit.
The project is intended to be finished by Technical Services.

3. Which metadata scheme is used?
Dublin Core

4. Encoding strategy:
_N_ external database
_N_ auxiliary HTML files (not embedded in resource)
_Y_ metadata embedded in resource

5. Granularity of the description:
_Y_ individual documents
_N_ groups of documents
_N_ collections

6. Dublin Core strengths and weaknesses encountered in deployment
Flexibility, easy to use and understand;
Still evolving.

7. Which DC fields are used/not used?
We decided to use two levels of complexity, i.e., for simple (contents) files we use title/creator/date/description/rights; for the more complicated (contents) files, we apply all the appropriate elements.

8. Have additional local fields been added?
No.

9. Which qualifiers are used/needed?, at what level of detail?
We use type and scheme, eg. LC Name authority; LCSH.

10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)?
No.

D. Tools (provide URLs if freely available):

1. Conversion tools (eg., MARC -> DC)
No

2. Metadata creator tools
No

3. Automatic metadata extraction tools
No

4. Discovery tools (directory/browsing type)
No

5. Indexing and search tools
will develop

6. Other tools
No

7. Help system, tutorials etc.
Working on a local procedure/manual

8. Is the metadata indexed, browsable, searchable, through a local or external service?
Yes

E. Experiences from the planning and production phases:

1. Describe problems and accomplishments from both phases
a. It took a while for different Divisions to agree that their staff would deploy metadata when they create the Divisional Web pages.
b. Difficult to discuss problems relating to Dublin Core (locally), as there are not many 'experienced metadata deployers' around.
c. Had to change the syntax a few times when new information was received via meta2.

2. Do you have estimates of the costs and benefits for the creation or use of metadata?
cost in $$$ - No
benefits - Yes, as discussed by many people.

3. What measures (or expectations) of service improvement through metadata usage do you have?
We are still in the process of deploying metadata on Web pages.
We would like to see increased usage of the State Library's Website in the near future.


24. MedExplore & Dublin Core generation

Reported by Jacques DUCLOY

CRIN - CNRS & INRIA Lorraine

Preliminary Version

1 - The MedExplore Project

The aim of MedExplore is to give a group of experts confronted with an unforeseen situation mastery of its terminological resources and a synthetic, deeper knowledge of the state of the art. This will be achieved by creating a system of investigation.

Such a system allows a user to navigate through concept graphs and to manipulate jointly various pieces of information (large international databases, local source documents, raw information from the INTERNET such as news-groups) written in different languages. We have chosen to begin with biomedical fields because of the large number of what we call "structuring" collections (such as MEDLINE, EMBASE or PASCAL, which possess homogeneous indexing and, with some associated projects like UMLS, a quasi knowledge-based representation).

Our project deals with Metadata in two complementary ways. The first is how to improve our retrieval on the INTERNET.

The second is an indirect but very interesting effect. Since we can identify the main contents that allow a set of relevant documents to be retrieved, we can use them to produce good metadata that allow a piece of information to be retrieved well by search engines. We are beginning to experiment with the generation of Dublin Core metadata from an existing set of local information.

2 - The DILIB Workbench

The engineering techniques used for MedExplore rest upon the generalisation of SGML codification, which allows the use of SGML toolboxes and libraries of linguistic modules. We have developed DILIB, an SGML workbench which contains a set of basic components for building Information Retrieval Systems.

2.1 - SGML, homogenisation of information

We convert all information into SGML markup whose structure stays very close to the original one. For instance, a downloaded record from MEDLINE such as:

 AN : 96081277
 TI : Orthotopic pulmonary valve replacement with a homograft.
 AU : Saha K,Iyer KS, Sharma R, Bhan A, Airan B, Venugopal P
 CS : Department of Cardiothoracic and Vascular Surgery, All India Institute of Medical Science
 JN : J Heart Valve Dis
 CP : (ENGLAND)
 PY : Mar 1995
 VO : 4 (2) p187-91
 SN : 0966-8519
 LA : ENGLISH ...

becomes:

<medline>
   <AN>96081277</AN>
   <TI>Orthotopic pulmonary valve replacement with a homograft.</TI>
   <AU><e>Saha K</e><e>Iyer KS</e><e>Sharma R</e><e>Bhan A</e><e>Airan B</e><e>Venugopal P</e></AU>
   <CS>Department of Cardiothoracic and Vascular Surgery, All India Institute of Medical Sciences, New Delhi, India.</CS>
   <JN>J Heart Valve Dis</JN>
   <CP>(ENGLAND)</CP>
   <PY>Mar 1995</PY>
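
A conversion of this kind can be sketched in a few lines of Python. This is only an illustration, not the DILIB filter itself: the function name medline_to_sgml is hypothetical, and the sketch assumes one "TAG : value" field per line and comma-separated authors.

```python
from xml.sax.saxutils import escape

def medline_to_sgml(record_text):
    """Convert 'TAG : value' lines into <TAG>value</TAG> elements inside <medline>."""
    lines = ["<medline>"]
    for raw in record_text.splitlines():
        if " : " not in raw:
            continue  # skip blank or malformed lines
        tag, value = raw.strip().split(" : ", 1)
        if tag == "AU":
            # Assumption: authors are comma-separated, one <e> element per author.
            authors = [a.strip() for a in value.split(",") if a.strip()]
            body = "".join("<e>%s</e>" % escape(a) for a in authors)
        else:
            body = escape(value.strip())
        lines.append("   <%s>%s</%s>" % (tag, body, tag))
    lines.append("</medline>")
    return "\n".join(lines)

sample = """ AN : 96081277
 TI : Orthotopic pulmonary valve replacement with a homograft.
 AU : Saha K,Iyer KS, Sharma R
 PY : Mar 1995"""
print(medline_to_sgml(sample))
```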


We can do the same for traditional MARC formats (ISO 2709); for instance, a CCF (UNESCO) record like:

-- header position 7 = s
001 157028
020 00@BISDS 
022 11@A19880120 
101 00@A0253-021X 
201 00@ALegislative study - Food and ... Agriculture
210 00@AEtudes législatives - ... Agriculture @Lfre
400 00@ARome@BFood and Agriculture Organization of the United Nations

can be marked as:

<record h7="s">
<f001>157028</f001>
<f020 ind="00"><sB>ISDS</sB></f020>
<f022 ind="11"><sA>19880120</sA></f022>
<f101 ind="00"><sA>0253-021X</sA></f101>
<f201 ind="00"><sA>Legislative study - Food and ... Agriculture</sA></f201>
<f210 ind="00"><sA>Etudes législatives - ... Agriculture</sA>
               <sL>fre</sL></f210>...
</record>

Note that only two filters are required to convert all kinds of MARC formats in this way.
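
The field-level part of such a filter can be sketched in Python as follows. The function name field_to_sgml is hypothetical, and the sketch assumes the line layout shown in the sample above: a three-digit tag, then either a plain value (control field) or indicators followed by "@"-introduced subfields.

```python
from xml.sax.saxutils import escape

def field_to_sgml(line):
    """Convert one 'TAG IND@Xvalue@Yvalue' line into an <fTAG> element."""
    tag, _, rest = line.strip().partition(" ")
    if "@" not in rest:
        # Control field such as '001 157028': no indicators, no subfields.
        return "<f%s>%s</f%s>" % (tag, escape(rest.strip()), tag)
    indicators, *subfields = rest.split("@")
    parts = []
    for sub in subfields:
        if not sub:
            continue  # ignore an empty piece produced by a doubled '@'
        code, value = sub[0], sub[1:].strip()
        parts.append("<s%s>%s</s%s>" % (code, escape(value), code))
    return '<f%s ind="%s">%s</f%s>' % (tag, indicators.strip(), "".join(parts), tag)

print(field_to_sgml("400 00@ARome@BFood and Agriculture Organization of the United Nations"))
```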

In some cases, it may be useful to apply a small transformation to a set of data. For instance, we need to handle multilingual information coming from UMLS, whose records look like:

   C0017379|ENG|P|L0017379|PF|S0022690|Carriers, Genetic|
   C0017379|ENG|P|L0017379|VW|S0044411|Genetic Carriers|
   C0017379|ENG|P|L0017379|VWS|S0022684|Carrier, Genetic|
   C0017379|ENG|P|L0017379|VWS|S0044407|Genetic Carrier|
   C0017379|POR|P|L0436728|PF|S0561010|TRANSPORTADORES GENETICOS|
   C0017379|SPA|P|L0447330|PF|S0571612|PORTADORES GENETICOS|

C0017379 identifies a "unique concept" which has an English preferred form (Carriers, Genetic), various usual forms and some translations.

This information is available in a table format. In an SGML context, it is more convenient to group all information dealing with a particular concept into one record like:

<mrcon>
   <CUI>C0017379</CUI>
   <TP><PF>Carriers, Genetic</PF>
      <VW>Genetic Carriers</VW>
      <VWS>Carrier, Genetic</VWS>
      <VWS>Genetic Carrier</VWS></TP>
   <VL l="POR">
      <TP><PF>TRANSPORTADORES GENETICOS</PF></TP>
   </VL>
   <VL l="SPA"><TP><PF>PORTADORES GENETICOS</PF>
      </TP></VL>
</mrcon>

Once all our data are coded with SGML mark-up, we can apply the associated engineering facilities.
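
The regrouping of table rows into one record per concept can be sketched in Python with the standard xml.etree.ElementTree module. The function name group_concept is hypothetical, and the sketch assumes that all rows passed in share the same concept identifier and that the fifth column carries the form type (PF, VW, VWS).

```python
import xml.etree.ElementTree as ET

def group_concept(rows):
    """Group pipe-delimited rows that share a concept identifier into one <mrcon> element."""
    mrcon = ET.Element("mrcon")
    containers = {}  # language code -> <TP> element that holds its term forms
    for row in rows:
        cui, lang, _ts, _lui, form_type, _sui, term = row.strip().split("|")[:7]
        if mrcon.find("CUI") is None:
            ET.SubElement(mrcon, "CUI").text = cui
        if lang not in containers:
            # English forms sit directly under the record; other languages get a <VL> wrapper.
            parent = mrcon if lang == "ENG" else ET.SubElement(mrcon, "VL", l=lang)
            containers[lang] = ET.SubElement(parent, "TP")
        ET.SubElement(containers[lang], form_type).text = term
    return mrcon

rows = [
    "C0017379|ENG|P|L0017379|PF|S0022690|Carriers, Genetic|",
    "C0017379|ENG|P|L0017379|VW|S0044411|Genetic Carriers|",
    "C0017379|POR|P|L0436728|PF|S0561010|TRANSPORTADORES GENETICOS|",
]
print(ET.tostring(group_concept(rows), encoding="unicode"))
```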

2.2 - Handling SGML information with DILIB

DILIB provides a set of tools for handling SGML or XML elements. They are available at different programming levels.

For instance, if we want to add the key-word "AIDS" as an element tagged with <e> to an SGML element which is pointed by "kw" variable in a C program, we have to write:

   SgmlAddChild (kw, SgmlCreateLeaf("e", "AIDS"));

In the same way, we can use shell commands to handle sets of records. We have introduced a "path pattern mechanism" to specify a set of elements within a given document. For instance, if we want to select the records which contain "AIDS" among their key-words and print the corresponding title, we just have to write:

SgmlSelect -g medline/KW/e#AIDS -g medline/TI -p @g2

(where "-g" is used by analogy with grep and @g2 identifies the 2nd "g" sub-command)
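
For readers without DILIB at hand, the same selection can be approximated in Python with the standard xml.etree.ElementTree module. This is a sketch of the idea and not a reimplementation of SgmlSelect: the sample records and the function name are invented, the records are assumed to be well-formed XML, and the "#AIDS" test is interpreted here as an exact match on an <e> element.

```python
import xml.etree.ElementTree as ET

# Invented sample data with the medline/KW/e and medline/TI structure.
records_xml = """<base>
<medline><TI>First title</TI><KW><e>AIDS</e><e>Rats</e></KW></medline>
<medline><TI>Second title</TI><KW><e>Mice</e></KW></medline>
</base>"""

def titles_with_keyword(xml_text, keyword):
    """Return the TI text of every medline record that has keyword as a KW/e element."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iter("medline"):
        keywords = [e.text for e in record.findall("KW/e")]
        if keyword in keywords:
            titles.append(record.findtext("TI"))
    return titles

print(titles_with_keyword(records_xml, "AIDS"))  # ['First title']
```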

As HTML and Dublin Core are closely tied to SGML, it becomes very easy to generate metadata in a programming environment. For instance, the following program:

   SgmlNode *meta;
   meta= SgmlCreateEmptyMark("META");
   SgmlSetAtt(meta, "name", "DC.subject");
   SgmlSetAtt(meta, "content", "AIDS");

will produce:

  <META name="DC.subject" content="AIDS">

2.3 - Information analysis with MedExplore/DILIB

Now, if we want to know which vocabulary to use in order to retrieve "good" documents or to produce relevant metadata contents, we have to analyse a collection of information.

To that end, DILIB also contains a set of basic components for building customised information retrieval systems (in which internal data such as inverted files are also coded in SGML). These tools allow a global analysis of large sets of information. At present, our tools mainly deal with combinations of associations of terms.

For people who are not familiar with these techniques, an elementary way of analysing a set of records consists in printing a sorted list of terms with their number of occurrences in a given corpus. For instance, suppose I know nothing about "Wallerian Degeneration" and read the following result:

     [393] Rats
     [268] *Nerve Degeneration
     [247] Wallerian Degeneration
     [241] Microscopy, Electron
     [215] *Wallerian Degeneration
     [184] Time Factors
     [128] Middle Age
     [121] Mice
     [120] Rats, Inbred Strains
     [115] Adult
     [76] Nerve Degeneration 

From this sample, even knowing nothing about Wallerian degeneration, we can deduce that it concerns the nervous system.
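
Such a sorted list of terms can be produced with a few lines of Python. The sketch below is not the DILIB tool: the small corpus is invented, each record is assumed to be a plain list of indexing terms, and a leading asterisk is simply treated as part of the term.

```python
from collections import Counter

def term_histogram(records):
    """Count how many records mention each indexing term, most frequent first."""
    counts = Counter()
    for terms in records:
        counts.update(set(terms))  # count a term once per record
    # Sort by decreasing count, then alphabetically so that ties are stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

corpus = [
    ["Rats", "*Nerve Degeneration", "Microscopy, Electron"],
    ["Rats", "*Wallerian Degeneration"],
    ["Mice", "*Nerve Degeneration"],
]
for term, n in term_histogram(corpus):
    print("[%d] %s" % (n, term))
```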

Another, complementary way consists in using associations, for instance:

     [166] *Wallerian Degeneration - *Nerve Degeneration
     [119] Rats, Inbred Strains - Rats
     [109] Rats - *Nerve Degeneration
     [101] Rats - Microscopy, Electron
     [88] Rats - *Wallerian Degeneration
     [86] Wallerian Degeneration - Rats
     [80] Time Factors - Rats
     [75] Microscopy, Electron - *Nerve Degeneration 
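
Counting associations amounts to counting the pairs of terms that occur together in the same record. A minimal Python sketch follows; the corpus is invented and pairs are treated as unordered, which is an assumption about how the associations above were computed.

```python
from collections import Counter
from itertools import combinations

def association_counts(records):
    """Count how many records contain each unordered pair of indexing terms."""
    pairs = Counter()
    for terms in records:
        # Sorting gives every pair a single canonical order.
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs

corpus = [
    ["Rats", "*Nerve Degeneration", "Microscopy, Electron"],
    ["Rats", "*Nerve Degeneration"],
    ["Mice", "Rats"],
]
for (a, b), n in association_counts(corpus).most_common(3):
    print("[%d] %s - %s" % (n, a, b))
```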

Better results are obtained, however, by clustering the associations that share common terms. Below is a sample cluster in which an expert could see a real convergence of relevant topics.

    List of Key-Words
     [34] Rats, Sprague-Dawley
     [62] *Nerve Regeneration
     [51] Nerve Crush
     [27] Sciatic Nerve_Injuries_IN
     [34] Sciatic Nerve_Physiology--PH
     [17] *Peripheral Nerves_Injuries_IN
     [38] *Sciatic Nerve_Physiology--PH
   Internal Relationships
     [7] Rats, Sprague-Dawley - Nerve Crush
     [11] Sciatic Nerve_Injuries_IN - Nerve Crush
     [10] Sciatic Nerve_Physiology--PH - Nerve Crush
     [10] Nerve Crush - *Nerve Regeneration
     [8] Sciatic Nerve_Injuries_IN - *Peripheral Nerves_Injuries_IN
     [7] Sciatic Nerve_Physiology--PH - *Nerve Regeneration
     [7] Nerve Crush - *Sciatic Nerve_Physiology--PH
     [6] Sciatic Nerve_Injuries_IN - Rats, Sprague-Dawley 

For a given field, the list of clusters gives the main thematic lines, for instance:

    *Wallerian Degeneration - *Nerve Degeneration
    Middle Age - Adult
    Mice, Inbred C57BL - Mice
    Rats, Sprague-Dawley - *Nerve Regeneration
    Immunohistochemistry - Wallerian Degeneration_Physiology--PH
    Nerve Regeneration - Nerve Degeneration
    Sciatic Nerve_Metabolism--ME - *Peripheral Nerves_Metabolism--ME
    Neural Conduction - Action Potentials 

(each cluster is identified by its most heavily weighted association)
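
The text does not specify the clustering method, so the following Python sketch should be read as one simple possibility and not as the MedExplore algorithm. It assumes a greedy single pass: associations are taken in decreasing order of weight, each one joins the first cluster with which it shares a term, and the first association of a cluster serves as its label. The parameter max_terms is a hypothetical size limit.

```python
def cluster_associations(weighted_pairs, max_terms=8):
    """Greedy grouping: each association joins the first cluster that shares a term with it."""
    clusters = []  # each cluster: {"label": pair, "terms": set, "pairs": list}
    ordered = sorted(weighted_pairs.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), weight in ordered:
        for cluster in clusters:
            shares_term = a in cluster["terms"] or b in cluster["terms"]
            fits = len(cluster["terms"] | {a, b}) <= max_terms
            if shares_term and fits:
                cluster["terms"].update((a, b))
                cluster["pairs"].append(((a, b), weight))
                break
        else:
            # No existing cluster can take this association: it seeds a new one.
            clusters.append({"label": (a, b), "terms": {a, b}, "pairs": [((a, b), weight)]})
    return clusters

pairs = {
    ("Nerve Crush", "Sciatic Nerve"): 11,
    ("Adult", "Middle Age"): 9,
    ("Nerve Crush", "Rats"): 7,
}
for c in cluster_associations(pairs):
    print(" - ".join(c["label"]), sorted(c["terms"]))
```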

Applying these techniques to the authors allows the identification of the main teams (i.e. groups of persons who usually publish together).

  Levine RA - Weyman AE
  List of Authors
     [5] Levine RA
     [5] Weyman AE
     [9] Lethor JP
     [4] Siu SC
     [4] Rivera JM
     [4] Handschumacher MD
     [3] Picard MH
     [2] Juilliere Y
  Internal Relationships
     [5] Levine RA - Weyman AE
     [5] Lethor JP - Weyman AE
     [5] Lethor JP - Levine RA
     [4] Siu SC - Weyman AE
   ...
     [3] Lethor JP - Picard MH
     [3] Handschumacher MD - Picard MH
     [2] Juilliere Y - Lethor JP

These techniques are traditionally applied to technical or scientific surveys. In the framework of the MedExplore project we use them to search the INTERNET. Some search engines, like AltaVista with its "live topics" facility, now offer the same kind of facilities.

3 - Using MedExplore to search information from the INTERNET

Thus we aim to provide different types of user (from the traditional "end-user" to the "expert in analysing data") with a server through which they can access different sources of data in a coherent and homogeneous way.

In other words, we have to solve the different levels of interoperability between heterogeneous data. SGML/XML gives us a good answer for the "codification - structuring level". Now we have to deal with more semantic levels. As we work in a specialised area, we have chosen to simplify this problem by defining a kernel vocabulary which contains a limited number of terms (between 100 and 300). Such a lexicon can be generated by automatic tools (for instance clustering) and, in a second step, improved by a specialist.

For the extraction of documents from the INTERNET, we are testing an approach which has given us some first, "not so bad" results. The principle consists in associating with each term of the kernel vocabulary a histogram of the most frequent words found in the abstracts of the corresponding MEDLINE records.

For instance, for the key-word "Newborn, Infant", in a local base dealing with "cardiology", the associated histogram looks like:

   [59] patient
   [42] pulmonary
   [37] tetralogy
   [33] fallot
   [29] infant
   [27] heart
   [25] artery
   [24] defect
   [19] month
   [19] outflow
   [19] ventricular 

Now, we just have to send a query to AltaVista using this set of words.
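
The construction of such a word profile and of the resulting query can be sketched in Python. The stop-word list, the sample abstracts and the function names are illustrative assumptions; no stemming is applied, and the query is shown as a plain list of words with no claim about the syntax of any particular search engine.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "of", "and", "in", "a", "with", "to", "was", "were", "for"}

def word_profile(abstracts, size=10):
    """Return the most frequent non-stop words found in a list of abstracts."""
    counts = Counter()
    for text in abstracts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:size]

def build_query(profile):
    """Join the profile words into a plain space-separated query string."""
    return " ".join(word for word, _ in profile)

abstracts = [
    "Tetralogy of Fallot in the infant heart.",
    "Pulmonary artery defect in infant patients with tetralogy.",
]
print(build_query(word_profile(abstracts, size=2)))  # infant tetralogy
```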

If we use the same kind of technique for an author, we obtain interesting results too. Here is the vocabulary associated with Pr. JP Lethor, derived from his bibliography on MEDLINE:

   [35] ventricular
   [31] volume
   [27] left
   [26] dimensional
   [20] coronary
   [19] three
   [16] method
   [15] patient
   [13] defect
   [13] image
   [12] excised
   [11] doppler
   [10] artery
   [10] echocardiography
   [10] tau
   [9]  pressure

We are also implementing multilingual queries by using multilingual bibliographic data bases such as PASCAL.

4 - Generating Dublin Core Metadata

From the computing point of view, the result of MedExplore is a set of resources, such as tables, which can be used in a generation process.

For instance, we are working on the generation of a server that allows browsing through a collection of medical images. Each image is described in French with a small set of information such as:

<doc>
  <id>lethor/007_001
  <auteur><e>Lethor JP
  <specialite>Cardiologie infantile
  <tech><e>Radiographie
  <acquis><e>
  <organe><e>Coeur
     Poumon
  <patho><e>Tétralogie de Fallot
  <motif>rupture de patch infundibulaire
  <age>12 ans
  <cr><e>rupture de patch infundibulaire après correction chirurgicale de tétralogie de Fallot
  <resultat>tétralogie de Fallot

We are using MedExplore's techniques and DILIB's tools to improve the indexing and to generate a server.

For instance, for the previous image we can generate an HTML page whose Metadata are the following:

<meta name="DC.title" lang="fr"
      content="Image : tétralogie de Fallot">
<meta name="DC.creator" content="Lethor JP">
<meta name="DC.subject" lang="fr"
      content="Coeur, COEUR (RADIOGRAPHIE ), Poumon, POUMON (RADIOGRAPHIE ), Radiographie, Tétralogie de Fallot">
<meta name="DC.subject"
      content=" Heart, Heart_Radiography, Lung, Lung_Radiography, Radiography, Tetralogy of Fallot">
<meta name="DC.subject" scheme="MESH"
      content=" Heart, Lung, Radiography, Tetralogy of Fallot">
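
The generation of several DC.subject sets from one image description can be sketched in Python. The translation table and the function names are hypothetical stand-ins for the lexical resources produced by MedExplore, and the record is assumed to have already been parsed into lists of French terms.

```python
from html import escape

# Hypothetical French-to-English lookup; a real system would draw on a thesaurus.
TRANSLATIONS = {
    "Coeur": "Heart",
    "Poumon": "Lung",
    "Radiographie": "Radiography",
    "Tétralogie de Fallot": "Tetralogy of Fallot",
}

def meta(name, content, **attrs):
    """Build one <meta> element; keyword arguments become extra attributes."""
    extra = "".join(' %s="%s"' % (k, escape(v)) for k, v in attrs.items())
    return '<meta name="%s"%s content="%s">' % (name, extra, escape(content))

def subject_tags(record):
    """Emit a French DC.subject and an English one tagged with scheme MESH."""
    terms_fr = sorted(record["organe"] + record["tech"] + record["patho"])
    terms_en = sorted(TRANSLATIONS[t] for t in terms_fr)
    return [
        meta("DC.subject", ", ".join(terms_fr), lang="fr"),
        meta("DC.subject", ", ".join(terms_en), scheme="MESH"),
    ]

record = {"organe": ["Coeur", "Poumon"], "tech": ["Radiographie"],
          "patho": ["Tétralogie de Fallot"]}
print("\n".join(subject_tags(record)))
```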

5 - Conclusion: improving MetaData extraction and generation

At present, we have not yet put Metadata to real use. The previous sample was produced only to test feasibility. For the near future, we are investigating two directions:

First, we want to improve our retrieval performance. In other words, we have to define a more relevant profile for each element of our kernel vocabulary. We plan to:

Second, once these Metadata are identified, we have shown that we can use them to produce a set of relevant Metadata with which an organisation working on a particular medical topic can ensure that its documents are retrieved well on the INTERNET.

An interesting feature of the Dublin Core is that several sets of content values can be generated, each with its own scheme. We will also work on the use and generation of other specific elements; for instance, in a set of medical images, the coverage element could be implemented in relation to the ages of the patients.


25. UC Berkeley Digital Library Catalog

A. Project Description:

Project Name:
Digital Library Catalog
Institutional Affiliation:
The Library, University of California, Berkeley
Contact Person (persons) and email:
Roy Tennant
rtennant@library.berkeley.edu
URL(s) if publicly available:
<URL:http://sunsite.berkeley.edu/Catalog/>

B. Information System Summary:

1. Domain of Content
Presently, all digital objects of intellectual significance on the Berkeley Digital Library SunSITE. The site has a more complete collection description.
2. Projected Collection Size (number of records)
First year:
Presently around 20,000 records, could grow beyond 30,000 by year's end.
Third year:
No idea.
3. Type of data, formats
Items being described include books, essays, speeches, and other textual material in HTML, technical reports (in various formats), photographs, engravings and other visual materials, and video and sound clips.

C. Metadata:

1. What are the intended functions of the metadata used?
__ Management of the collection
X_ Enhanced searchability
X_ Interoperability
__ Value-added services
__ Other:
2. Who is creating the metadata? What training and support are provided?
Project is being prototyped by a library assistant and a professional librarian, with no special training or support. This will change when/if the project goes beyond a prototype.
3. Which metadata scheme is used?
Dublin Core and two additional elements for local use.
4. Encoding strategy:
__ external database
X_ auxiliary HTML files (not embedded in resource)
__ metadata embedded in resource
Presently working with external HTML files, but will likely migrate to an external database after the prototype phase.
5. Granularity of the description:
X_ individual documents
__ groups of documents
__ collections
Where "documents" includes photographs, movies, sound clips, etc.
6. Dublin Core strengths and weaknesses encountered in deployment
Strengths: A ready made minimal set, a de facto standard with international support. Flexible, extensible.
Weaknesses: Lack of definition regarding what should be in each element and how it should be encoded.
7. Which DC fields are used/not used?
We provide for the use of all Dublin Core elements, and most records use most of them. The fields least used are probably COVERAGE and CONTRIBUTOR.
8. Have additional local fields been added?
Yes, "UCB.Notes" and "UCB.Version".
9. Which qualifiers are used/needed?, at what level of detail?
In DC.subject, we use "(SCHEME=LCSH)" for Library of Congress subject headings.
10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)?
Yes, we have provisions for using Library of Congress Subject Headings (LCSH) as well as uncontrolled subject keywords.
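
As an illustration of the "(SCHEME=LCSH)" convention, the following Python sketch separates the scheme from the subject value. It assumes that the qualifier appears as a parenthesised prefix of the content value; the function name and the sample heading are invented.

```python
import re

# Matches an optional '(SCHEME=...)' prefix followed by the subject value.
QUALIFIER = re.compile(r"^\s*\(SCHEME=([^)]+)\)\s*(.*)$", re.IGNORECASE | re.DOTALL)

def split_scheme(content):
    """Split '(SCHEME=X) value' into (scheme, value); scheme is None for free keywords."""
    match = QUALIFIER.match(content)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None, content.strip()

print(split_scheme("(SCHEME=LCSH) Digital libraries"))
print(split_scheme("metadata"))
```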

D. Tools (provide URLs if freely available):

1. Conversion tools (eg., MARC -> DC)
2. Metadata creator tools
Locally-produced Web form and Perl scripts, see
<URL:http://sunsite.berkeley.edu/Catalog/behind.html/> for screen-shots; the system itself is not publicly available.
3. Automatic metadata extraction tools
Locally produced Perl programs that are tailored to specific input streams.
4. Discovery tools (directory/browsing type)
-
5. Indexing and search tools
We are using SWISH-Enhanced, which can limit searches to META tags (thus searches can be limited to specific DC fields)
<URL:http://sunsite.berkeley.edu/SWISH-E/>
6. Other tools
-
7. Help system, tutorials etc.
None yet.
8. Is the metadata indexed, browsable, searchable, through a local or external service?
Yes, at
<URL:http://sunsite.berkeley.edu/Catalog/>
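
The idea of limiting a search to a specific META field can be illustrated in Python with the standard html.parser module. This sketch is not SWISH-E and says nothing about how that tool is implemented; the class and function names and the sample page are invented.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from META elements."""
    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "meta":
            attributes = dict(attrs)
            name, content = attributes.get("name"), attributes.get("content")
            if name and content:
                self.fields.append((name.lower(), content))

def matches(html_text, field, word):
    """True if word occurs in the content of the named META field."""
    collector = MetaCollector()
    collector.feed(html_text)
    return any(name == field.lower() and word.lower() in content.lower()
               for name, content in collector.fields)

page = ('<html><head><META NAME="DC.creator" CONTENT="Muir, John">'
        '<META NAME="DC.title" CONTENT="Letters"></head><body>Muir</body></html>')
print(matches(page, "DC.creator", "muir"), matches(page, "DC.title", "muir"))
```

The word "Muir" in the page body does not count as a hit for DC.title, which is the point of a field-limited search.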

E. Experiences from the planning and production phases:

1. Describe problems and accomplishments from both phases
-
2. Do you have estimates of the costs and benefits for the creation or use of metadata?
We estimate the cost of manual record creation at $2-$2.40 US per record. We do not have an estimate for the cost of creating records by software translation, but it is clearly pennies apiece.
3. What measures (or expectations) of service improvement through metadata usage do you have?
We do not yet have any measures of service improvement, but we expect our users to have a much easier time of locating digital objects on our server, as well as serendipitous discovery of related objects from separate collections. We also anticipate service improvements from our ability to share our object records with others through cooperative projects.


26. EdNA

Description for Dublin Core 5 Workshop


This document contains a description of the EdNA Directory Service and its use of metadata prepared for participants at the DC5 workshop in Helsinki, October 1997. It is structured based on the Dublin Core Project Summary Questionnaire v 1.0 970911 prepared for that workshop.


Contents

A. Project Description

  1. Project Name
  2. Institutional Affiliation
  3. Contacts and email
  4. URL(s)

B. Information System Summary

  1. Domain of Content
  2. Projected Collection Size
  3. Type of data, formats

C. Metadata

  1. What are the intended functions of the metadata used?
  2. Who is creating the metadata? What training and support are provided?
  3. Which metadata scheme is used?
  4. Encoding strategy
  5. Granularity of the description
  6. Dublin Core strengths and weaknesses encountered in deployment
  7. Which DC fields are used/not used?
  8. Have additional local fields been added?
  9. Which qualifiers are used/needed?, at what level of detail?
  10. Do you provide any subject description support?

D. Tools

  1. Conversion tools
  2. Metadata creator tools
  3. Automatic metadata extraction tools
  4. Discovery tools (directory/browsing type)
  5. Indexing and search tools
  6. Other tools
  7. Help system, tutorials etc.
  8. Is the metadata indexed, browsable, searchable, through a local or external service?

E. Experiences from the planning and production phases

  1. Problems and accomplishments from both phases
  2. Estimates of the costs and benefits for the creation or use of metadata?
  3. What measures (or expectations) of service improvement through metadata usage do you have?

A. Project Description

1. Project Name

EdNA (Education Network Australia)

2. Institutional Affiliation

EdNA is a collaborative project between all Australian States and Territories and all sectors of education and training: schools, vocational education and training (which includes TAFE - technical and further education), adult community education, and higher education (universities). More details are available by selecting the "About EdNA" option from the home page of the EdNA web site. The "EdNA Directory Service" refers to the services provided at the web site http://www.edna.edu.au. "EdNA" is a broader process of collaboration of which the EdNA Directory Service is one product.

The EdNA Directory Service is managed by the Open Learning Technology Corporation (OLTC: http://www.oltc.edu.au).

3. Contacts and email

Email addressed to webdesk@edna.edu.au will be distributed to a range of people associated with the management of EdNA.

4. URL(s)

The EdNA Directory Service is at: http://www.edna.edu.au

The EdNA Metadata Standard is at:
http://www.edna.edu.au/edna/owa/info.getpage?sp=auto&pagecode=5210

B. Information System Summary

1. Domain of Content

Web based resources relevant to any sector of education and training in Australia, subject to various quality assurance procedures. Listed resources can be divided into two main types (although these are not treated differently in any fundamental way by the EdNA Directory Service).

A) Internal resources. Internet resources created by organisations involved in education and training in Australia. For example, a school home page, course material available online, research reports, course directories.

B) External resources. Internet resources created outside the education system but identified as being a relevant resource by some organisation within the EdNA structure. For example the " Dictionary of Gamilaraay/Kamilaroi" (an aboriginal language) or "The NASA Homepage".

In the initial identification of resources, the schools sector has concentrated on external resources, while the vocational education and training and higher education sectors have concentrated on internal resources.

2. Projected Collection Size

There are currently over 3,700 individually identified URLs in the EdNA Database. However the EdNA search facility provides access to a much wider collection of resources because as well as indexing the actual pages in the Database, some URLs are tagged as sites or link pages and links on these pages are followed and indexed. As the EdNA Directory Service has not been formally launched yet, it is virtually impossible to predict the range of resources which might be identified over the next few years once EdNA is widely used throughout the Australian education and training systems.

3. Type of data, formats

The EdNA software allows for any URL type, but the overwhelming majority of resources identified are web pages, with a sprinkling of ftp: and gopher: resources.

C. Metadata

1. What are the intended functions of the metadata used?

The intended functions of metadata in EdNA are to:

2. Who is creating the metadata? What training and support are provided?

While a first draft of the EdNA Metadata Standard has been agreed, there has not been any substantial implementation of metadata to date as the tools to assist with its creation and entry are still being developed.

3. Which metadata scheme is used?

The EdNA Metadata Standard is based on Dublin Core with the addition of:

4. Encoding strategy

The ultimate location of metadata about items in the EdNA Directory will be in the EdNA database. There are five major ways in which item information gets into the EdNA database:

For 'internal' resources (ie those created by education organisations in Australia), the intention is that item owners will use the EdNA metadata wizard currently under development to embed metadata in their HTML.

Where items are not owned within the education system, EdNA will still recognise basic Dublin Core metadata. The entry of items directly into the database via the bulk uploads or the administration system creates metadata which is not stored in the original documents.

5. Granularity of the description:

Resources in EdNA are indexed at the individual URL level.

6. Dublin Core strengths and weaknesses encountered in deployment

The use of Dublin Core has saved EdNA from having to create a metadata system from scratch and provides a range of additional benefits:

7. Which DC fields are used/not used?

The following DC fields are not actively supported in the EdNA Metadata Standard, however if users create them they will be stored in the EdNA Database and will be searchable:

DC.relation

DC.contributor

DC.source

8. Have additional local fields been added?

The following additional fields are defined:

EDNA.entered: Date item was entered (used for management purposes)

EDNA.approver: Email of person or organisation approving the item for inclusion in EdNA.

EDNA.reassessment: Number of months until resource should be reassessed.

EDNA.userlevel: Typical level of user for which the content would be most appropriate.

EDNA.categories: Numbers representing categories in the EdNA Directory in which the resource is either suggested for inclusion, or in the case of approved organisations, is automatically included.

EDNA.indexlevel: How many levels should the EdNA search engine follow links from this page?

EDNA.indexsites: When the EdNA search engine follows links from this page (as controlled by the INDEXLEVEL field), how many web servers are to be accessed?

EDNA.review: A third party review of the resource.
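
How EDNA.indexlevel and EDNA.indexsites might bound link-following can be sketched in Python. This is one reading of the two field definitions and not the EdNA search engine: indexlevel is taken as the maximum link depth, indexsites as the maximum number of distinct web servers including the starting one, and the link graph is an invented in-memory dictionary instead of live pages.

```python
from collections import deque
from urllib.parse import urlsplit

def pages_to_index(start_url, links, index_level, index_sites):
    """Breadth-first walk limited by link depth and by number of distinct web servers."""
    servers = {urlsplit(start_url).netloc}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == index_level:
            continue  # do not follow links any further from this page
        for target in links.get(url, []):
            host = urlsplit(target).netloc
            if target in seen:
                continue
            if host not in servers and len(servers) >= index_sites:
                continue  # server limit reached; skip pages on additional servers
            servers.add(host)
            seen.add(target)
            queue.append((target, depth + 1))
    return seen

links = {
    "http://a.example/": ["http://a.example/p1", "http://b.example/x", "http://c.example/y"],
    "http://a.example/p1": ["http://a.example/p2"],
}
print(sorted(pages_to_index("http://a.example/", links, index_level=1, index_sites=2)))
```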

NB: All these fields are metadata in the sense that they are information about resources. However some of these fields are generated by the management of resources within EdNA. Depending on the nature of the field and future policy decisions:

9. Which qualifiers are used/needed? At what level of detail?

It is intended to support the language qualifier but no work has been done on this to date. See other sections and the EdNA Metadata Standard for details of schemes supported.

10. Do you provide any subject description support?

Not at the moment, but negotiations are underway with the owners of thesauri used in education and training in Australia to make documentation of these available online to support the cataloguing of resources in EdNA.

D. Tools

1. Conversion tools

None are currently implemented but details of web resources are currently being imported into EdNA from a range of other databases ("bulk upload"). At the moment this includes just title and description plus EdNA categories. Future versions may include other metadata.

2. Metadata creator tools

A "metadata Wizard" is currently under development to assist users to generate EdNA metadata for inclusion in web pages they own. It will be announced on the meta2 list when it is available.

3. Automatic metadata extraction tools

Details not available.

4. Discovery tools (directory/browsing type)

None.

5. Indexing and search tools

Not developed yet.

6. Other tools

None.

7. Help system, tutorials etc.

None yet except for instructions in the EdNA Metadata Standard.

8. Is the metadata indexed, browsable, searchable, through a local or external service?

The ways in which Dublin Core and EdNA metadata will be used in searching are still being defined. The field EDNA.categories will be used to allocate items to the EdNA Directory, allowing browsing of items.

E. Experiences from the planning and production phases

1. Problems and accomplishments from both phases

2. Estimates of the costs and benefits for the creation or use of metadata

No cost benefit analysis has been conducted.

3. Expectation of service improvement through metadata usage

As EdNA is a decentralised system where resources are identified and managed by a wide range of organisations, the use of metadata created and maintained in a decentralised way is seen as the only practical way of cataloguing resources.


Disclaimer

This brief description of EdNA and its use of metadata was developed by Jack Gilding specifically for the benefit of attendees at the DC5 workshop in Helsinki. While other stakeholders in the EdNA project have been requested to comment, due to time constraints in this process it should not be regarded as an 'official' document.

URL: http://www.otfe.vic.gov.au/edna/dc5edna.htm

Last updated: 2 October 1997.
Contact: Jack Gilding j.gilding@c031.aone.net.au


27. Netpublikationer

Danish Government Publications on WWW

Reported by: Ulrik Andersen, ua@si.dk, Danish State Information Service
Date: October 3, 1997


In Denmark we have implemented most of the Dublin Core metadata elements in the Government WWW-publications published from 1997.

Background

In order to make public information material accessible on WWW and to make the distribution of public information material more efficient, all new publications issued by Danish ministries, government offices and agencies shall be published on WWW, in parallel with the printed editions, from 1997.

A common format including metadata

To ensure a common format for the WWW-publications, The Danish Ministry of Research and Information Technology and The Danish State Information Service have worked out a standard describing how the publications shall be encoded. Technically, the standard is based on the official HTML 3.2 specifications.

The standard, called Netpublikationer, contains a series of requirements and recommendations. The standard is available at http://www.fsk.dk/fsk/publ/online-pub/.

One of the requirements is that the publications shall contain metadata of a common scheme. This scheme is based on the Dublin Core elements, but is extended in order to ensure the additional metainformation required in the database in The Danish State Information Service. The database contains information about all Danish government publications - in print and on WWW.

Dublin Core elements

The DC elements we have implemented are:

1. Title
2. Creator
3. Subject
4. Description
5. Publisher
6. Contributors
7. Date
9. Format
10. Identifier
11. Source
12. Language
13. Relation

The metadata scheme in the current version (vers. 1.0) of Netpublikationer does not fully follow the latest DC-recommendations. We will in future versions of Netpublikationer change the recommendations and requirements regarding metadata in accordance with the development of Dublin Core.

Netpublikationer is not available in English, but chapter 4.2 gives an impression of the metadata scheme and syntax used.

Tools

In order to make it as easy - and cheap - as possible for the civil servants and the web-companies to produce WWW-publications, we are developing an application (freeware) which can convert "raw" and incorrect HTML-documents into documents that are correctly formatted. This conversion includes the creation of a correct set of metadata. The software is expected to be released in November 1997.
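As a rough illustration of what "creating a correct set of metadata" for an HTML document could involve, the following Python sketch renders a record as HTML META elements. It is not the Danish conversion tool. The exact syntax required by Netpublikationer is defined in chapter 4.2 of the standard and is not reproduced here, so the sketch assumes the common convention of META elements with DC.-prefixed names. The function name `dc_meta_tags` and the example values are hypothetical.

```python
from html import escape


def dc_meta_tags(record):
    """Render a dict of Dublin Core values as HTML META elements.

    Assumption: metadata is embedded as <META NAME="DC.element" CONTENT="...">.
    """
    lines = []
    for element, values in record.items():
        # Accept a single string or a list, since DC elements are repeatable.
        if isinstance(values, str):
            values = [values]
        for value in values:
            # Escape &, <, > and quotes so the attribute value stays valid HTML.
            content = escape(value, quote=True)
            lines.append('<META NAME="DC.%s" CONTENT="%s">' % (element, content))
    return "\n".join(lines)


# Hypothetical example record, not taken from a real publication.
example = {
    "title": "Example publication",
    "creator": ["Example Ministry"],
    "language": "da",
}
print(dc_meta_tags(example))
```

Escaping the values before writing them into the CONTENT attribute matters in practice, because titles containing quotes or ampersands would otherwise produce exactly the kind of incorrect HTML the converter is meant to repair.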

Additional information

Netpublikationer is a part of the overall government policy on the use of information technology in modern society. These matters can be studied in a number of publications (in English) at The Danish Ministry of Research and Information Technology.


28. The PANDORA Project

Reported by Bemal Rajapatirana

For more information, see: http://www.nla.gov.au/policy/pandje97.html.

The National Library of Australia (NLA) is developing an electronic archive called PANDORA to provide long term access to significant Australian online publications. A dedicated online archive will be established in the first instance. Publications contained in it will be updated on an ongoing basis and converted to new formats as technology demands.

The PANDORA project has, as its main objective, to:

capture, archive and provide long term access to significant Australian online publications selected for national preservation.

As part of this the PANDORA project will implement a system of describing archived documents based on the Dublin Core attributes, to make online searching for information more efficient.

There are four prongs in the PANDORA project's approach to the diverse subject of metadata that will offer long-term, integrated searching across all of the Library's search systems.

  1. Tools for encouraging publishers to generate Dublin Core metadata have been tested, and it has been found that metadata forms (similar to the Nordic template) are preferable to the automatic generation of Dublin Core metadata by tools such as 'DC.dot'.

    The PANDORA project plans to encourage publishers to provide Dublin Core metadata, either with or within their publications.

  2. The PANDORA project is currently in the process of considering screen designs and workflows for the archive management facility. As part of this process, online help text is being drafted, which will have two flow-on effects:
    • each data element will be defined according to ISO standard 11179
    • clear definitions of the metadata will be created to expedite its use in a national archive model context

  3. The ISO 11179 standard covers "Information technology - Specification and standardization of data elements" and contains guidelines for the description of data elements in a Logical Data Model. In Part 6, the standard provides a framework for the registration and management of data elements when they are used by a broad group of managers. For example, in a national archive model context, if all State libraries were to use the same metadata as the PANDORA archive, then changes to any one data element could be managed through the infrastructure provided by Part 6.

    For more information on this approach see http://www.ariadne.ac.uk/issue11/metadata/, which gives examples of metadata registries that might be affected by the use of ISO 11179.

  4. PANDORA-Dublin Core crosswalk. In the Logical Data Model for the PANDORA archive database, a cross-walk between the archive and the Dublin Core elements has been provided. When provision is made to utilise pure Dublin Core elements for searching, PANDORA will be poised to take advantage of this development, either by extracting the Elements and placing them in a metadata repository, or by utilising the Dublin Core elements embedded in the publications by publishers.
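A crosswalk of the kind described in point 4 can be pictured as a lookup table from archive fields to Dublin Core elements. The Python sketch below is only an illustration: the PANDORA Logical Data Model is not shown in this report, so every archive field name in `CROSSWALK` is hypothetical, as is the function `to_dublin_core`.

```python
# Hypothetical archive field names; the real PANDORA data model is not
# reproduced in this report.
CROSSWALK = {
    "publication_title": "Title",
    "author_name": "Creator",
    "publisher_name": "Publisher",
    "archive_url": "Identifier",
    "date_captured": "Date",
}


def to_dublin_core(archive_record):
    """Return (dc_record, unmapped_fields) for one archive record."""
    dc_record = {}
    unmapped = []
    for field, value in archive_record.items():
        element = CROSSWALK.get(field)
        if element is None:
            # Keep track of fields with no Dublin Core equivalent.
            unmapped.append(field)
        else:
            # DC elements are repeatable, so collect values in a list.
            dc_record.setdefault(element, []).append(value)
    return dc_record, unmapped


dc, leftover = to_dublin_core(
    {"publication_title": "Example journal", "internal_status": "selected"}
)
print(dc, leftover)
```

Reporting the unmapped fields separately makes it visible which parts of the archive description would be lost if only pure Dublin Core elements were used for searching.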

For further information contact:

Debbie Campbell
PANDORA project officer
National Library of Australia
PARKES ACT 2600

phone: 06 262 1622
fax: 06 257 1703
dcampbel@nla.gov.au


29. Scout Report Signpost

A. Project Description:

Project Name: Scout Report Signpost

Institutional Affiliation: Internet Scout Project/UW-Madison Computer Sciences Department

Contact Person (persons) and email:
Amy Tracy Wells (awells@cs.wisc.edu)
Aimee Glassel (aglassel@cs.wisc.edu)

URL(s) if publicly available: http://www.signpost.org/signpost/index.html

B. Information System Summary:

  1. Domain of Content:

  2. Projected Collection Size (number of records):
    • First year: 2700 records
    • Third year: (still in first year)

  3. Type of data, formats:
    • Animation/Video
    • Audio
    • Bibliography
    • Chart/Table
    • Conference/Solicitation
    • Database
    • Dictionary/Encyclopedia
    • Directory
    • Document
    • Educational Materials
    • FAQ
    • Graphics
    • Journal/Newspaper
    • Library Catalog
    • Mailing List/Newsgroup
    • Meta-site
    • Software

C. Metadata:

  1. What are the intended functions of the metadata used?

      x__ Management of the collection
      x__ Enhanced searchability
      x__ Interoperability
      x__ Value-added services
      x__ Other:____________________________________________

  2. Who is creating the metadata? What training and support are provided?
    Professional catalogers who hold Master's degrees in Library and Information Science. The Library of Congress' Subject Cataloging Manual and Classification Plus are the primary tools along with AACR2.

  3. Which metadata scheme is used?
    A modified version of the Dublin Core.

  4. Encoding strategy:

      x__ external database
      __ auxiliary HTML files (not embedded in resource)
      __ metadata embedded in resource

  5. Granularity of the description:

      x__ individual documents
      x__ groups of documents
      x__ collections

  6. Dublin Core strengths and weaknesses encountered in deployment.
    The literature surrounding the DC has appeared to indicate different uses for the "language" field, and there also appears to have been a shift from "Object Type" to "Resource Type" which was not well documented. In general, while much has been written on the DC, changes to the Core could perhaps be explicitly documented. The use of "Creator" and "Contributor" seemed less concrete (although conceptually they are well-founded) than the average end-user might intuitively grasp. "Resource Type", of course, requires development and consensus. "Date" is perhaps the most difficult field, not on a syntactic level, but rather because it is dynamic and also difficult to define.

  7. Which DC fields are used/not used?
    Used: Title (Site Title and Alternate Title), Creator (Author), Contributor, Publisher, Resource Type, Subject (Library of Congress Subject Headings and Library of Congress Classifications), Description (Summary), Language, Date (not viewable to users), and Identifier.
    Not used: Format, Source, Relation, Coverage and Rights Management.

  8. Have additional local fields been added?
    Yes. Additional local fields include Alternate Title, Date URL Last Verified, LC Classification, and Resource Location (which we use to identify a source's domain). We repeat certain fields for a given record: Subject (Library of Congress Subject Headings) as many as five times and LC Classification as many as two times.

  9. Which qualifiers are used/needed?, at what level of detail?

  10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)?
    Up to five Library of Congress Subject Headings are applied and up to two Library of Congress Classes are assigned.

D. Tools (provide URLs if freely available):

  1. Conversion tools (eg., MARC -> DC)
    N/A.

  2. Metadata creator tools
    No.

  3. Automatic metadata extraction tools
    No.

  4. Discovery tools (directory/browsing type)
    Scout Reports (http://scout.cs.wisc.edu/scout/report/index.html)

  5. Indexing and search tools

  6. Other tools
    See C2 above.

  7. Help system, tutorials etc.
    Signpost Help!

  8. Is the metadata indexed, browsable, searchable, through a local or external service?
    Indexed, browseable and searchable at Signpost (http://www.signpost.org/signpost/)

E. Experiences from the planning and production phases:
  1. Describe problems and accomplishments from both phases
    For problems see C6 above.
    Accomplishments include the creation of a non-MARC system to effectively catalog the contents of the Scout Report and subsequent subject-specific Scout Reports all of which are Internet-based resources.

  2. Do you have estimates of the costs and benefits for the creation or use of metadata?

  3. What measures (or expectations) of service improvement through metadata usage do you have?
    We expect (and hope!) that we can aid the U.S. higher education community in locating effective Internet-based publications, tools, etc. to enhance their research.

    For more information, please contact us:
    Amy Tracy Wells (awells@cs.wisc.edu)
    Aimee Glassel (aglassel@cs.wisc.edu)


30. GEM

Reported by: Nancy A. Morgan, GEM Coordinator
September 18, 1997

A. Project Description: The Gateway to Educational Materials (GEM) is an initiative of the U.S. Department of Education and National Library of Education (NLE). NLE's goal is to improve the organization and accessibility of the substantial, but uncataloged, collections of educational materials which are already available on various federal, state, university, non-profit, and commercial Internet sites. In general, these valuable resources are difficult for most teachers to find in an efficient, effective manner. The goal of the Gateway to Educational Materials (GEM) project is to solve this resource discovery problem. For more information, see Proposal: DC-5 GEM Project Presentation

Project Name: Gateway to Educational Materials (GEM)

Institutional Affiliation: U.S. Department of Education, ERIC Clearinghouse on Information & Technology

Contact Person (persons) and email: Nancy A. Morgan, GEM Coordinator
nmorgan@ericir.syr.edu

URL(s) if publicly available: http://geminfo.org/

B. Information System Summary:

1. Domain of Content: educational materials

2. Projected Collection Size (number of records)
First year: 10,000
Third year: 50,000

3. Type of data, formats: HTML files
Resource Types

C. Metadata:

1. What are the intended functions of the metadata used?

_X_ Management of the collection
_X_ Enhanced searchability
_X_ Interoperability
_X_ Value-added services
__ Other: ____________________________________________

2. Who is creating the metadata? A consortium of collection holders of educational materials, located on federal, state, university, non-profit, and commercial Internet sites.

What training and support are provided? Cataloging instructions, online help, listservs and technical support. The first group training session was held on September 18, 1997 in Washington, D.C.

3. Which metadata scheme is used?

4. Encoding strategy:

__ external database
_X_ auxiliary HTML files (not embedded in resource)
_X_ metadata embedded in resource

5. Granularity of the description:

_X_ individual documents
__ groups of documents
__ collections

6. Dublin Core strengths and weaknesses encountered in deployment. One "weakness" was the specificity of domain. GEM added 8 elements to the Dublin Core set and developed controlled vocabulary to improve description of education resources. One strength was that we were able to map to a large database of MARC records at Eisenhower National Clearinghouse.

7. Which DC fields are used/not used? All the DC fields were used.

8. Have additional local fields been added? Yes, 8 fields were added to better describe educational resources.

9. Which qualifiers are used/needed?, at what level of detail?

10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)? Subject Controlled Vocabulary
See also Cataloging Instructions: Subject for use of the NICEM and ERIC Thesauri.

D. Tools (provide URLs if freely available):

1. Conversion tools (eg., MARC -> DC)

2. Metadata creator tools GEMCat: a user-friendly, stand-alone program for cataloging Internet resources using the GEM element set and controlled vocabularies. The current version is for Windows 95 and Windows NT platforms. A platform-independent, web-based version of GEMCat is under development.

3. Automatic metadata extraction tools

4. Discovery tools (directory/browsing type) Prototype search engines: PLWeb (Full-Text/Fielded Search Engine) and a Relational Database Management System Interface

5. Indexing and search tools Prototype search engines: PLWeb (Full-Text/Fielded Search Engine) and a Relational Database Management System Interface

6. Other tools: Metadata Harvesting Script: extracts GEM metadata from HTML-tagged documents located on UNIX servers. The harvest module is implemented in Perl. A cross-platform version is currently under development.
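The GEM harvest module itself is written in Perl and is not reproduced in this report. As a hedged illustration of the general technique (pulling embedded metadata out of HTML documents), here is a minimal Python sketch using the standard `html.parser` module. The name prefixes `DC.` and `GEM.`, the class `MetaHarvester`, the function `harvest` and the sample element `GEM.grade` are assumptions made for the example, not GEM's documented syntax.

```python
from html.parser import HTMLParser


class MetaHarvester(HTMLParser):
    """Collect (name, content) pairs from META elements with given prefixes."""

    def __init__(self, prefixes=("DC.", "GEM.")):
        super().__init__()
        # Prefixes are an assumption about how the elements are named.
        self.prefixes = tuple(p.lower() for p in prefixes)
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if content is not None and name.lower().startswith(self.prefixes):
            self.records.append((name, content))


def harvest(html_text):
    """Return the embedded metadata of one HTML document, in document order."""
    parser = MetaHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.records


sample = (
    '<html><head><META NAME="DC.title" CONTENT="Lesson plan">'
    '<meta name="keywords" content="ignored">'
    '<meta name="GEM.grade" content="5"></head></html>'
)
print(harvest(sample))
```

Because the sketch relies only on the Python standard library, it runs unchanged on any platform, which is one way of approaching the cross-platform goal mentioned above.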

7. Help system, tutorials etc. Cataloging instructions, online help, listservs and technical support. The first group training session was held on September 18, 1997 in Washington, D.C.

8. Is the metadata indexed, browsable, searchable, through a local or external service? Yes. See prototypes at: Prototype search engines: PLWeb (Full-Text/Fielded Search Engine) and a Relational Database Management System Interface. We envision competing interfaces to local and union catalogs as consortium members implement GEM.

E. Experiences from the planning and production phases:

1. Describe problems and accomplishments from both phases

2. Do you have estimates of the costs and benefits for the creation or use of metadata?

3. What measures (or expectations) of service improvement through metadata usage do you have?


31. WWW.NIC.FUNET.FI Metadata interface

A. Project Description:

Project Name: WWW.NIC.FUNET.FI Metadata interface

Institutional Affiliation: Center for Scientific Computing/Finnish University and Research Network

Contact Person (persons) and email: Harri K. Salminen
hks@nic.funet.fi

URL(s) if publicly available: http://www.nic.funet.fi/


B. Information System Summary:

1. Domain of Content

All kinds of freely distributable objects in the NIC.FUNET.FI archive.

2. Projected Collection Size (number of records)

First year:
50 000

Third year:
Over a million

3. Type of data, formats

The major part of the collection currently consists of archived computer programs and text documents, but the amount of all kinds of multimedia objects is growing. Formats include .tar.gz, .zip, .exe, .sea, .hqx, .ps, .txt, .html, .gif, .jpeg, .mpeg, rtp etc. Most of them are listed in ftp://ftp.funet.fi/README.FILETYPES


C. Metadata:

1. What are the intended functions of the metadata used?

2. Who is creating the metadata? What training and support are provided?

Mostly volunteer archive admins and software authors in various fields as well as the CSC staff. Tools for semiautomatic creation and conversion of metadata are being developed so that for example author provided Linux Software Maps can be utilized.

3. Which metadata scheme is used?

Dublin Core with possible local extensions

4. Encoding strategy:

The latter method will be used mostly for HTML pages maintained outside the metadata directory database.

5. Granularity of the description:

The goal is to describe individual objects, but since we have over a million files, having metadata for groups and collections (directories and directory trees) is important as well.

6. Dublin Core strengths and weaknesses encountered in deployment

DC is flexible and allows for multiple instances and many qualifiers, but it doesn't easily support hierarchical definitions and inheritance, so information needs to be duplicated or described outside the DC format. For example, we have currently implemented directory listings in another local format.

The format currently lacks support for special fields needed for file and multimedia objects, such as the contents of a container (.zip, .tar, .rtp stream etc.), file size, requirements such as the OS or program needed to use the file (MIME type isn't enough), and the amount of memory and disk space used after unpacking. For real-time files there is a need for the length in time, exact format descriptions (e.g. SDP) etc. Possibly a URL to an external, format-dependent description might be enough.

These belong to the currently unresolved issues.
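To make the inheritance problem concrete, the following Python sketch shows one possible way of resolving it outside the DC format: metadata attached to a directory applies to everything beneath it unless a more specific entry overrides it. This is not the NIC.FUNET.FI implementation; the function `effective_metadata`, the dictionary layout and the sample paths and values are all hypothetical.

```python
import posixpath


def effective_metadata(path, metadata_by_path):
    """Merge metadata from the root directory down to `path`.

    Values set closer to the file override values inherited from above.
    """
    # Build the chain path, parent, grandparent, ... up to the root.
    chain = []
    current = posixpath.normpath(path)
    while True:
        chain.append(current)
        parent = posixpath.dirname(current)
        if parent == current:
            break  # reached the root (or the empty relative root)
        current = parent

    merged = {}
    # Apply the most general level first, the most specific last.
    for node in reversed(chain):
        merged.update(metadata_by_path.get(node, {}))
    return merged


# Hypothetical sample data for an archive tree.
sample = {
    "/": {"publisher": "Example archive"},
    "/pub/linux": {"subject": "Linux", "rights": "freely distributable"},
    "/pub/linux/foo.tar.gz": {"title": "foo", "subject": "editor"},
}
print(effective_metadata("/pub/linux/foo.tar.gz", sample))
```

With such a scheme only the differences need to be stored for each file, at the price that a single file's record is no longer self-contained and has to be resolved before it can be exported as plain DC.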

7. Which DC fields are used/not used?

We allow the use of all predefined DC fields

8. Have additional local fields been added?

Not yet in DC. Our internal metadata format additionally supports an HTML version of the qualifiers that allows for images etc., and also the date and source of the last update for each field, since there can be multiple sources for the same field.

9. Which qualifiers are used/needed?, at what level of detail?

In DC.subject, we plan to use scheme=foldoc (or x-foldoc) and we see the need for most of the other qualifiers as well as some new ones (see question 6). Without qualifiers the DC format would be nearly useless to us.

10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)?

We have just started to provide the Free On-line Dictionary of Computing (FOLDOC) as a free controlled vocabulary suited to describing computer programs. We also have a taxonomical tree, currently used to describe butterflies, plants etc., that is extended as needed. It has its own WWW interface, which is not yet DC compatible, however.

If we can find free classification systems that suit our needs, we would like to offer them in our user interface. Most seem to be copyrighted and lack a classification that is up to date and granular enough for our needs.

Currently the only feasible way to classify computer files seems to be to invent your own classification scheme or just to use such ad hoc hierarchies as are used in the directory trees.


D. Tools (provide URLs if freely available):

1. Conversion tools (eg., MARC -> DC)

An LSM -> DC converter is being developed.
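The converter mentioned above is not published in this report. As a hedged sketch of what such a conversion might look like, the following Python code parses a Linux Software Map entry (a block of "Field: value" lines with indented continuation lines) and maps some fields to Dublin Core element names. The mapping table `LSM_TO_DC` and the function `lsm_to_dc` are assumptions made for illustration, not the mapping chosen by the project.

```python
# Assumed mapping from LSM field names to DC element names (illustrative only).
LSM_TO_DC = {
    "title": "title",
    "author": "creator",
    "description": "description",
    "keywords": "subject",
    "entered-date": "date",
    "copying-policy": "rights",
}


def lsm_to_dc(lsm_text):
    """Convert one LSM entry to a dict of Dublin Core elements."""
    fields = {}
    current = None
    for line in lsm_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Indented lines continue the value of the previous field.
        if line[0].isspace() and current is not None:
            fields[current] += " " + stripped
            continue
        key, sep, value = line.partition(":")
        if not sep:
            # Marker lines such as "Begin3" and "End" carry no field.
            continue
        current = key.strip().lower()
        fields[current] = value.strip()
    # Keep only mapped, non-empty fields.
    return {LSM_TO_DC[k]: v for k, v in fields.items() if k in LSM_TO_DC and v}


# Hypothetical LSM entry used only to exercise the sketch.
sample = """Begin3
Title:          foo
Version:        1.0
Description:    A small editor
                for terminals
Keywords:       editor, text
Author:         jane@example.org (Jane Doe)
End
"""
print(lsm_to_dc(sample))
```

Fields without an obvious DC counterpart, such as Version or Platforms, are simply dropped here, which illustrates the need for the additional file-oriented fields raised under question C6.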

2. Metadata creator tools

A web form is still to be done. Some experiments with the DC creator have been done. Possibly a simple line-mode tool is needed as well.

3. Automatic metadata extraction tools

Local Perl programs

4. Discovery tools (directory/browsing type)

nicdir module for the Apache server that produces directory listings and also metadata descriptions for the local files. It uses an internal metadb shadow tree with internal metadata formats that contain the DC information as well as data needed for efficiently creating the HTML output with metadata on the fly in various formats. It requires the use of our local ls command and ftp server as well. The old non meta ftpd and ls are available at ftp://ftp.funet.fi/pub/unix/local/ and we foresee publishing the metadata versions in the future as well. It has been implemented for Digital Unix 4.0.

5. Indexing and search tools

We currently use local locate and archie databases but for indexing the metadata we plan to use the Nordic Web Index server available in house (http://nwi.funet.fi).

6. Other tools

A tool to create various listings and a public locate database from the output of our special ls command. A need for producing timesorted lists of URLs recommended for indexing by robots is foreseen. Current robots will do lots of unnecessary work while trying to browse the multiple views to the same information or they will just give up...

7. Help system, tutorials etc.

Some documentation on the design, formats and files used.

The rest is mostly documented as C and Perl source code. It is not yet published, but will be with the source code.

8. Is the metadata indexed, browsable, searchable, through a local or external service?

Soon, at www.nic.funet.fi (current test version in port 8888).


E. Experiences from the planning and production phases:

1. Describe problems and accomplishments from both phases

There were multiple versions of the DC format online, and it was sometimes difficult to find out whether documents referred to the current or some other proposed format. Also, the current DC seems to concentrate on describing traditional literary resources and HTML but not other kinds of media like computer programs and multimedia objects. Even the BibTeX-derived type list didn't have relevant types for many objects, but I hope it's just temporary. Most of the DC fields seem to be usable in our context, although it takes some guessing to decide which one to use. For example, it is unclear how best to describe the mirrors and mirroring frequency of files, or multiple versions of the same object. We also haven't yet seen any freely available tools like DC parsers, checkers and converters, which might speed the implementation.

2. Do you have estimates of the costs and benefits for the creation or use of metadata?

We don't have cost estimates since most of it will be created part time by volunteers, authors and programs.

The main benefit should be easier searching and browsing of the more than a million files stored in the archive. Older methods allowed searches only on filenames, which in many cases aren't descriptive enough to find the relevant files.

3. What measures (or expectations) of service improvement through metadata usage do you have?

We expect it to make finding the relevant files in our archive easier for the inexperienced WWW user who doesn't know the right directory or file to look at, making it less necessary to go overseas looking for the same files. The system should also help in looking at different locations of the same files: we have a large number of mirrors and, in addition to our local directory hierarchy, multiple possible directory trees for the same category of files, a bit like having multiple sublibraries inside a library with a different classification system in each.

A common metadata format should also make it easier to offer tools and motivation for maintaining the metadata. Also, people might not need to download files just to figure out whether they really want them, making usage more efficient and users happier.


32. Dublin Core Metadata Use in Environment Australia

A. Project Description:

Institutional Affiliation: Environment Australia

Contact Person (persons) and email: Arthur D. Chapman
arthur@erin.gov.au

The project description is available at: http://www.environment.gov.au/www-standards/EA_DC.html