1. SSG-Fachinformation (SSG-FI) Geowissenschaften
2. SSG-Fachinformation (SSG-FI) Mathematik
3. The National Library of the Netherlands
4. DSTC
5. ADAM & VADS
6. Swedish Environet
7. The Nordic Metadata Project
8. AHDS Arts & Humanities Data Service
9. German Educational Resources Server / Deutscher Bildungs-Server
10. Florida International University Digital Library
11. SCRAN
12. BIBLINK
13. NewsAgent for Libraries
14. Medical Metadata Project
15. University of Washington Digital Library
16. Math-Net
17. The Victoria and Albert Museum
18. Activities in the Field of Electronic Information Management and Meta-Data in Physics
19. Australian Geodynamics Cooperative Research Centre
20. Interconnect Technologies Corp. projects
21. University of Michigan Digital Library Registry Database
22. INDOREG: INternet DOcument REGistration
23. Metadata Project of State Library of Queensland
24. MedExplore & Dublin Core generation
25. UC Berkeley Digital Library Catalog
26. EdNA
27. Netpublikationer
28. Pandora
29. Scout Report Signpost
30. GEM
31. WWW.NIC.FUNET.FI Metadata interface
32. Dublin Core Metadata Use in Environment Australia
1. SSG-Fachinformation (SSG-FI) Geowissenschaften
Project Name:
SSG-Fachinformation (SSG-FI) Geowissenschaften = Special Subject Collection-Subject
Area Information for Earth Sciences (Geology, Mineralogy, Petrology, Soil
Science, Geography, Geophysics, Thematic Maps)
Institutional Affiliation:
Niedersächsische Staats- und Universitätsbibliothek (SUB) Göttingen
= Lower Saxony State and University Library Goettingen
Contact Person (persons) and email:
Dr. Heike Neuroth, neuroth@mail.sub.uni-goettingen.de
URL(s) if publicly available:
http://www.sub.uni-goettingen.de/ssgfi/ (scheduled to be available in October;
an English version of the information texts will follow)
1. Domain of Content
Listing and evaluation of information related to Earth Sciences:
1.1 Internet servers, e.g. servers of Earth Sciences societies, electronic
journals, department and preprint servers
1.2 CD-ROMs related to Earth Sciences
1.3 Reference books
2. Projected Collection Size (number of records)
First year: 600
Third year: 5000
3. Type of data, formats
HTML documents with meta tags
1. What are the intended functions of the metadata used?
__ Management of the collection
X Enhanced searchability
X Interoperability
X Value-added services
2. Who is creating the metadata? What training and support are provided?
Metadata are collected by graduate student assistants in geology under the
supervision of an earth scientist; the students are given a thorough introduction
into the subject matter.
3. Which metadata scheme is used?
Metadata consist of elements of Dublin Core Data using type and scheme
extensions, plus a set of metadata specifically defined for this project
to describe characteristics of servers as opposed to electronic
documents, related e.g. to the size and statistics of the server.
4. Encoding strategy:
__ external database
__ auxiliary HTML files (not embedded in resource)
X metadata embedded in resource
5. Granularity of the description:
__ individual documents
X groups of documents
X collections
6. Dublin Core strengths and weaknesses encountered in deployment
7. Which DC fields are used/not used?
All fields are used. dc.Rights is used to describe the availability of
documents.
8. Have additional local fields been added?
Yes: "ssg.Timestamp, ssg.Formal key, ssg.Evaluation;type=relevance
of contents, ssg.Evaluation;type=clarity, ssg.Evaluation;type=indexing,
ssg.Evaluation;type=relevant links, ssg.Evaluation;type=level, ssg.Size,
ssg.Stat_size, ssg.Backlinks, ssg.Country_code, ssg.Notes, ssg.Linklisten,
9. Which qualifiers are used/needed?, at what level of detail?
As seen from the above, we use qualifiers as a second level of categorization
in our internal definition, similarly with type and scheme qualifiers for
the Dublin Core set. E.g. we use the standard Goettingen Online Classification
(GOK), Dewey Decimal Classification (DDC) and Basis Classification des
GBV Germany (BK) as a scheme under dc.Subject.
10. Do you provide any subject description support
(uncontrolled term lists, thesauri, classification systems)?
We try to use standard classification systems (e.g. GOK, DDC, BK, ISO 639
for languages and a coding of server types based upon PICA+).
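As an illustration of the answers above, the following minimal sketch renders one record as embedded meta tags. It is not the project's actual output: the `render_meta_tags` helper, the record values and the choice to write qualifiers as `element;key=value` (modelled on the local field names such as ssg.Evaluation;type=clarity) are assumptions made for this example.

```python
from html import escape


def render_meta_tags(fields):
    """Render (name, qualifiers, content) triples as HTML meta tags."""
    lines = []
    for name, qualifiers, content in fields:
        # Assumption: qualifiers are appended in the "element;key=value"
        # style seen in the local field names listed above.
        full_name = name + "".join(f";{k}={v}" for k, v in qualifiers.items())
        lines.append(
            f'<meta name="{escape(full_name, quote=True)}" '
            f'content="{escape(content, quote=True)}">'
        )
    return "\n".join(lines)


# Hypothetical record; all values are invented for illustration only.
record = [
    ("dc.Title", {}, "Example Earth Sciences Server"),
    ("dc.Subject", {"scheme": "DDC"}, "550"),
    ("dc.Language", {"scheme": "ISO639"}, "en"),
    ("ssg.Evaluation", {"type": "clarity"}, "good"),
]

print(render_meta_tags(record))
```

Keeping the qualifier in the tag name means that a harvester which ignores qualifiers can still read the content value unchanged.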
1. Conversion tools (e.g., MARC -> DC)
2. Metadata creator tools
3. Automatic metadata extraction tools
4. Discovery tools (directory/browsing type)
5. Indexing and search tools
6. Other tools
7. Help system, tutorials etc.
8. Is the metadata indexed, browsable, searchable, through a local or external
service?
None of this is really applicable, since we use a database system (Allegro 15)
to collect and organize data and an Allegro-based server (Avanti) to create
and deliver HTML documents (in the final stage).
For now, static HTML documents with meta tags are created from the database
and indexed and searched locally using WAIS.
1. Describe problems and accomplishments from both phases
The basic problem in the planning stage was the steady evolution of the
metadata discussion, providing a somewhat shaky ground for fixing the details
of the definition of the database - again, we are not really up to date
with everything now, but think we can produce any syntax for the metatags
using the database. Having fixed the metadata, collecting data showed some
shortcomings of the first definition, so minor modifications and slight
shifts in the meaning of (essentially self defined) tags occurred.
2. Do you have estimates of the costs and benefits for the creation or use of
metadata?
Since indexing is the goal of our project, standardisation of metadata
is crucial for the usefulness of the project. Since the data are newly collected,
using any particular categorizing scheme incurs no extra cost for us,
provided that the scheme is workable.
3. What measures (or expectations) of service improvement through metadata
usage do you have?
Basically we hope for the general usage of metadata by internet search
engines, so the data we produce become more easily available for people
outside the project (and our web site). Internally we could provide essentially
identical services without metadata tags (the information in the tags is
mirrored in the document itself), but we found the restricted set of Dublin
Core categories useful for our purposes.
2. SSG-Fachinformation (SSG-FI) Mathematik
Project Name:
SSG-Fachinformation (SSG-FI) Mathematik = Special Subject Collection-Subject
Area Information for Mathematics
Institutional Affiliation:
Niedersächsische Staats- und Universitätsbibliothek (SUB) Göttingen
= Lower Saxony State and University Library Goettingen
Contact Person (persons) and email:
Dr. Thomas Fischer, fischer@mail.sub.uni-goettingen.de
URL(s) if publicly available:
http://www.sub.uni-goettingen.de/ssgfi/ (scheduled to be available in October;
an English version of the information texts will follow)
1. Domain of Content
Listing and evaluation of information related to Mathematics:
1.1 Internet servers, e.g. servers of Mathematics societies, electronic
journals, department and preprint servers
1.2 CD-ROMs related to Mathematics
1.3 Reference books
2. Projected Collection Size (number of records)
First year: 800
Third year: 1500
3. Type of data, formats
HTML documents with meta tags
1. What are the intended functions of the metadata used?
__ Management of the collection
X Enhanced searchability
X Interoperability
X Value-added services
2. Who is creating the metadata? What training and support are provided?
Metadata are collected by assistant graduate mathematics students under
supervision of a mathematician; the students are given a thorough introduction
into the subject matter.
3. Which metadata scheme is used?
Metadata consist of elements of Dublin Core Data using type and scheme
extensions, plus a set of metadata specifically defined for this project
to describe characteristics of servers as opposed to electronic
documents, related e.g. to the size and statistics of the server.
4. Encoding strategy:
__ external database
__ auxiliary HTML files (not embedded in resource)
X metadata embedded in resource
5. Granularity of the description:
__ individual documents
X groups of documents
X collections
6. Dublin Core strengths and weaknesses encountered in deployment
7. Which DC fields are used/not used?
All fields are used. dc.Rights is used to describe the availability of
documents.
8. Have additional local fields been added?
Yes: "ssg.Timestamp, ssg.Formal key, ssg.Evaluation;type=relevance
of contents, ssg.Evaluation;type=clarity, ssg.Evaluation;type=indexing,
ssg.Evaluation;type=relevant links, ssg.Evaluation;type=level, ssg.Size,
ssg.Stat_size, ssg.Backlinks, ssg.Country_code, ssg.Notes, ssg.Linklisten,
9. Which qualifiers are used/needed?, at what level of detail?
As seen from the above, we use qualifiers as a second level of categorization
in our internal definition, similarly with type and scheme qualifiers for
the Dublin Core set. E.g. we use the standard Mathematics Subject Classification
1991 as a scheme under dc.Subject.
10. Do you provide any subject description support
(uncontrolled term lists, thesauri, classification systems)?
We try to use standard classification systems (e.g. MSC 91 for Mathematics,
ISO 639 for languages and a coding of server types based upon PICA+).
1. Conversion tools (e.g., MARC -> DC)
2. Metadata creator tools
3. Automatic metadata extraction tools
4. Discovery tools (directory/browsing type)
5. Indexing and search tools
6. Other tools
7. Help system, tutorials etc.
8. Is the metadata indexed, browsable, searchable, through a local or external
service?
None of this is really applicable, since we use a database system (Allegro 15)
to collect and organize data and an Allegro-based server (Avanti) to create
and deliver HTML documents (in the final stage).
For now, static HTML documents with meta tags are created from the database
and indexed and searched locally using WAIS.
1. Describe problems and accomplishments from both phases
The basic problem in the planning stage was the steady evolution of the
metadata discussion, providing a somewhat shaky ground for fixing the details
of the definition of the database - again, we are not really up to date
with everything now, but think we can produce any syntax for the metatags
using the database. Having fixed the metadata, collecting data showed some
shortcomings of the first definition, so minor modifications and slight
shifts in the meaning of (essentially self defined) tags occurred.
2. Do you have estimates of the costs and benefits for the creation or use of
metadata?
Since indexing is the goal of our project, standardisation of metadata
is crucial for the usefulness of the project. Since the data are newly collected,
using any particular categorizing scheme incurs no extra cost for us,
provided that the scheme is workable.
3. What measures (or expectations) of service improvement through metadata
usage do you have?
Basically we hope for the general usage of metadata by internet search
engines, so the data we produce become more easily available for people
outside the project (and our web site). Internally we could provide essentially
identical services without metadata tags (the information in the tags is
mirrored in the document itself), but we found the restricted set of Dublin
Core categories useful for our purposes.
3. The National Library of the Netherlands
Reported by Titia van der Werf.
The National Library of the Netherlands (Koninklijke Bibliotheek) is in the process of developing a new version of its Web information service, with a new layout, new functionality and DC metadata elements incorporated in the HTML pages. The HTML pages in the test version carry standard DC metadata tags like DC.publisher, DC.rights and DC.language. Take a look at: http://www.konbib.nl:8000/ and view the frame source of http://www.konbib.nl:8000/bex-fe.html for example (these are our guidelines for ILL).
When the final version is up and running this fall, we will have trained our information providers (at the different library departments) to supply the metadata elements Title, Author, Date, etc. for all new documents they submit to our Editorial Board.
We will only need to add those elements for the documents that are already in the service (a retrofit :-)).
Finally we plan to install an indexer which will be able to recognise these metadata elements.
Our library is also involved in several projects which are seeking to deploy DC metadata:
Publishers (in particular grey Internet publishers are interested) will be involved during the demonstration stage of the project (November 1997-March 1999).
The BIBLINK project in a way tries to initiate the process of educating new electronic publishers (on the Internet) to generate trustworthy metadata and incorporate it in their publications by means of a standard "electronic title page". A DC-generator could be a handy tool to promote this.
It is not certain yet in what way the DC metadata set could be most effectively deployed in this context. We are planning to look into PICS as a mechanism to search for subject indexed material.
URL: http://www.nic.surfnet.nl/surfnet/projects/desire/desire.html
Note that BIBLINK metadata are formal bibliographic descriptions, whereas DESIRE SBIG metadata are subject indexing and classification. These are basically two different processes; in other words, different Internet technologies could be better suited to each process. Formal description elements can be re-used and remain valid for all user groups; subject information, by contrast, is particular to specific target groups, and the same resource may receive several different subject codes. Mixing these two types of metadata in one format may prove tricky.
In each case special attention is needed to record WHO (responsible person or institution ...) has assigned a subject code to a particular resource - in order to enhance the trustworthiness of the subject code assigned.
4. DSTC
Report by Renato Iannella
The DSTC is participating in the W3C Resource Description Framework (RDF) Working Group. RDF is the result of a number of metadata communities (including Dublin Core, PICS, Digital Signatures) bringing together their needs to provide a robust and flexible architecture for supporting metadata on the Internet and WWW. RDF will use the new XML as its main carrier syntax.
The DSTC plans to develop the following:
5. ADAM & VADS
The Art, Design, Architecture & Media
Information Gateway (ADAM) and the Visual
Arts Data Service (VADS) are two services that aim to provide the
UK Higher Education community with fast, reliable access to high-quality
networked resources in the visual arts, and to promote the use of standards
of best practice through example and outreach.
A coherent service is being developed through shared objectives, a shared commitment to standards, and a common information system that is being designed from the outset to interoperate with a broad range of information protocols and resource description schemes.
The Dublin Core Metadata Element Set is seen as a strategically significant tool for enhancing the discovery and retrieval of networked resources, and will play a crucial role in the development of the two services; in addition to providing descriptive information within static web pages, DC will form the basis for the common cross-domain resource discovery element set for the AHDS' distributed catalogue.
In common with the other four AHDS Service Providers, the Visual Arts Data Service recently held a domain-specific resource discovery workshop in order to identify the suitability of Dublin Core metadata for the visual arts, museums and cultural heritage community. This process resulted in a survey of domain-specific information standards, a detailed workshop report and a contribution to the synthesized summary of the series as a whole, in addition to raising the awareness of Dublin Core.
ADAM is part of the Electronic Libraries Programme, whereas VADS is an agency of the Arts & Humanities Data Service; both are funded by the Joint Information Systems Committee.
6. Swedish Environet
Reported by Stig Hammarsten
Background
The decision to build the Swedish EnviroNet was taken by the Swedish government in December 1996. The building of the EnviroNet was one of several proposals in an Official Report on "Information Technology in Environmental Management" from the Environmental Council. The constitution of the Project Board and Project Organisation was decided by the Director General of the Environmental Protection Agency in February 1997. At present, this short summary of the project is the only documentation available in English. During the development phase most information will only be published in Swedish. However, from December 1997 the EnviroNet will also provide services in English.
Mission
The Swedish EnviroNet shall become the main www-gateway to electronic data and information on the Swedish environment. The EnviroNet shall provide easy access to information with high quality content.
Members
The EnviroNet will organise all major public agencies, NGOs and private companies in the environmental field. The information is published at the web sites of the members, and the EnviroNet server will provide links, metadata and other general services.
Only member web sites will be included in the catalogues and through the search engine accessible at the EnviroNet.
Users
The most important user group for the EnviroNet is professionals working with environmental issues in Sweden. But the EnviroNet will also fulfil the needs of wider user groups, such as environmental activists, students and teachers, politicians and the press.
Services
Catalogues and metadata
The EnviroNet will provide links to electronic documents (in HTML format). A document can contain information or point to other sources that contain information or data. A document can also contain addresses of institutions. The contents of the documents will be described using a subset of the Dublin Core (DC) metadata standard. The type element will be used to describe the resources referred to, e.g. thesis, review, database, advertisement, etc.
The EnviroNet will provide a classification scheme for the subject and type elements in DC. The classification scheme for subject must have terms that correspond to all the branches in the catalogues.
Users will be able to search information using several different catalogues organised in the following way:
Technical platform
Time table
7. The Nordic Metadata Project
Contact Person (persons) and email:
Juha Hakala, juha.hakala@helsinki.fi
URL(s) if publicly available:
http://www.lib.helsinki.fi/meta
1. Domain of Content
The aim of the project is creation of general purpose tools that support creation, harvesting, indexing and searching
of Dublin Core -based metadata. These tools can be used in any subject area.
2. Projected Collection Size (number of records)
Current status: 80.000 records in Danish and Swedish metadata databases. Of these records,
only a handful are Dublin Core.
Status in Spring 1998: more than 100.000 records, after creation of Finnish and Norwegian
Metadata bases
3. Type of data, formats
HTML documents with meta tags
1. What are the intended functions of the metadata used?
__ Management of the collection
X Enhanced searchability
X Interoperability
__ Value-added services
2. Who is creating the metadata? What training and support are provided?
Within Nordic Metadata, Dublin Core metadata is provided mainly by individuals (librarians, university
researchers, network specialists) who want to try metadata creation. But our tools are already
used by other projects, both subject-specific ones like Swedish Environet and general ones like INDOREG.
A set of user guides has been written in order to make metadata creation easier. These guides have been written for a Scandinavian audience; early experiences of metadata providers have made it clear that professionals (librarians) need more detailed guidelines, which take into account national differences between the Nordic countries.
3. Which metadata scheme is used?
Metadata consist of all elements of Dublin Core Data using type and scheme
qualifiers. Some of them were added by us in order to make Dublin Core
fit better Nordic needs (for instance, a few popular Nordic classification systems
were added to the qualifier list).
4. Encoding strategy:
__ external database
__ auxiliary HTML files (not embedded in resource)
X metadata embedded in resource
5. Granularity of the description:
X individual documents
__ groups of documents
__ collections
6. Dublin Core strengths and weaknesses encountered in deployment
Qualifiers need to be agreed on
Type list is not satisfactory
The current practice of using URLs as document identifiers is not satisfactory
More guidance is required on the national or a subject community level
7. Which DC fields are used/not used?
All elements are used.
8. Have additional local fields been added?
No.
9. Which qualifiers are used/needed?, at what level of detail?
We support extensive use of qualifiers in our full metadata template.
On the other hand, a short template alternative offered cuts qualifier
usage to the minimum.
10. Do you provide any subject description support
(uncontrolled term lists, thesauri, classification systems)?
The help text for Subject contains links to all subject description systems
available on the Web that we are aware of. JavaScript is used to automatically offer
the appropriate remote system in a separate window on the user's screen,
from which suitable subject vocabulary and classification notations can be copied and pasted.
1. Conversion tools (e.g., MARC -> DC)
Dublin Core -> MARC converter, available at:
http://www.bibsys.no/meta/d2m/
2. Metadata creator tools
Metadata template, available at:
http://www.ub.lu.se/metadata/DC_creator.html
3. Automatic metadata extraction tools
The harvester module of the Nordic Web Index (see
http://nwi.ub2.lu.se/?lang=en) is used to extract Dublin Core and several other kinds of
metadata from documents.
4. Discovery tools (directory/browsing type)
5. Indexing and search tools
The index and search modules of the Nordic Web Index (see
http://nwi.ub2.lu.se/?lang=en) are used to make searchable Dublin Core and several other kinds of
metadata.
6. Other tools
7. Help system, tutorials etc.
The project has produced short (see
http://www.sics.se/~preben/DC/DC_guide_short.html) and advanced (see
http://www.sics.se/~preben/DC/DC_guide.html)
Dublin Core guides, and a guide on how to use the Metadata template (see
http://www.sics.se/~preben/DC/DC_temp_help.html).
All these have been linked to the metadata template.
8. Is the metadata indexed, browsable, searchable, through a local or external
service?
Nordic Web Index is currently an external service to most organisations, but it can
also be installed locally to serve as an internal database. As of this writing, experimental
Swedish and Danish metadata databases are available for searching at:
http://nwi.ub2.lu.se/?lang=en
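The extraction step listed among the tools above can be sketched as follows. This is a minimal illustration using Python's standard `html.parser` module, not the Nordic Web Index harvester itself; the class name `DublinCoreExtractor`, the helper `extract_dc` and the sample page are assumptions made for this example.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collect <meta> tags whose name starts with "DC." (case-insensitive)."""

    def __init__(self):
        super().__init__()
        self.records = []  # (name, content) pairs in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        if name.lower().startswith("dc.") and content is not None:
            self.records.append((name, content))


def extract_dc(html_text):
    """Return all Dublin Core (name, content) pairs found in html_text."""
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.records


# Hypothetical sample page, invented for illustration only.
sample = """<html><head>
<title>Sample page</title>
<META NAME="DC.Title" CONTENT="Sample page">
<META NAME="DC.Creator" CONTENT="A. Author">
<meta name="keywords" content="not Dublin Core">
</head><body>Text</body></html>"""

print(extract_dc(sample))
```

A real harvester would additionally have to fetch the pages, handle character encodings and discard garbled values before indexing.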
1. Describe problems and accomplishments from both phases
Rapid development of Dublin Core has caused some problems for early
implementors, in both the planning and the production phase. Our project organisation is
very small and flexible, so it has not been too hard for us to adapt.
We now have available a full set of tools needed to create and utilise
Dublin Core -based metadata. These have already been adopted by other,
content-oriented projects, and more such projects will follow later. A critical
issue is to make Dublin Core data production not a "hobby" of the initiated ones,
but a part of everyday routine. This step has already been taken in Denmark, thanks to
the INDOREG project and the Netpublikationer standard.
2. Do you have estimates of the costs and benefits for the creation or use of
metadata?
We expect to get this kind of information via user feedback (see
http://www.sics.se/~preben/DC/usaq.html), which is being
collected. Usefulness of metadata database is largely dependent on the quality of
harvested data, and it is essential that garbled or useless metadata is eliminated
during indexing (and authors are properly told how to do resource description correctly).
3. What measures (or expectations) of service improvement through metadata
usage do you have?
Utilization of high-quality metadata by Nordic Web Index and other WWW indexes
will enhance quality of Internet searching via making it more precise. We also hope that
metadata created by authors and publishers can be utilized in MARC cataloguing.
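The reuse of author-supplied Dublin Core in MARC cataloguing, as hoped for above, amounts to a field-by-field mapping. The sketch below shows the idea for four elements only. The target fields loosely follow the commonly cited Library of Congress Dublin Core/MARC crosswalk and are an assumption here; the project's own Dublin Core -> MARC converter may map differently, and the textual output format is invented for this example.

```python
# Partial, illustrative mapping from Dublin Core elements to MARC
# (tag, subfield) pairs. Assumption: not taken from the project's converter.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
}


def dc_to_marc_lines(dc_pairs):
    """Convert (element, value) pairs into simple 'TAG $x value' lines."""
    lines = []
    for element, value in dc_pairs:
        key = element.lower()
        if key.startswith("dc."):
            key = key[3:]  # strip the "DC." prefix used in meta tag names
        target = DC_TO_MARC.get(key)
        if target is None:
            continue  # elements without a mapping are skipped in this sketch
        tag, subfield = target
        lines.append(f"{tag} ${subfield} {value}")
    return lines


# Hypothetical input, invented for illustration only.
pairs = [("DC.Title", "Sample page"), ("DC.Creator", "A. Author"), ("DC.Rights", "free")]
for line in dc_to_marc_lines(pairs):
    print(line)
```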
8. AHDS Arts & Humanities Data Service
Reported by Paul Miller, 12 September 1997
The AHDS is a federal organisation, consisting of a central Executive and five service providers encompassing archaeology, history, textual studies and the performing and visual arts. Each service provider also holds service-wide responsibility for a type of digital resource, including electronic corpora, geospatial data and time-based media (video, etc.).
The Arts & Humanities Data Service's vision is built firmly upon two main foundations, both of which are of potential interest and relevance to the wider community:
Instead, AHDS has adopted a pragmatic approach, and is advocating use of the Dublin Core as a means by which such diverse resources may be described in general and comparable terms, with greater detail provided where necessary by linked metadata records in more locally appropriate formats. It is to be hoped that metadata for deposited resources will be created by the depositor in line with AHDS recommendations and utilising AHDS-provided tools. Documentation will also be provided to users through the AHDS Guides to Good Practice, which will explain the provision of metadata in a context and using language relevant to the depositor.
The proposed implementation of Dublin Core has been arrived at following a comprehensive series of discipline-specific workshops undertaken in conjunction with the UK Office for Library & Information Networking (UKOLN). The reports from these workshops are available online from the AHDS, and a single paper publication capturing the essence of these workshops and related initiatives is currently being printed for distribution in Helsinki, and wider distribution following the workshop.
A brief report on this, too, will be available for distribution in Helsinki.
9. German Educational Resources Server / Deutscher Bildungs-Server
Reported by: Diann Rusch-Feja, Christian Richter, Peter Diepold
The German Educational Resources Server (http://dbs.schule.de/indexe.html) has been in existence since June 1996 and has indexed a total of 2000 Web documents, including HTML documents about physical teaching and learning materials from pupils, teachers, publishers and state educational authorities. It is also the hub of a larger national network of state and regional educational servers (from 14 state educational servers), the "Schools on the Net" Initiative (including approximately 3300 school websites) and the Open School Network (ODS), as well as the Central Institute for Images and Film Materials, which supports approximately 30 audiovisual media centers in the Federal German States. The German Educational Resources Server also includes directories of the educational researchers and institutions of higher education with teacher training programs supported by the German Society of Educational Scientists (DGfE). Further institutions involved in various aspects of education at all levels contribute to the German Educational Resources Server (i.e., BAK, BioNet eV., DFN-Verein, GIB, GMD and the German universities, teachers, etc.). In addition, there are links to the websites and to certain products of publishers of educational materials.
The German Educational Resources Server (GER / DBS) has implemented meta tags using the Dublin Core with five additional subfields pertaining to the field of education. These are:
DBS.SUBJECT.Requirements describes the technical requirements for obtaining and using certain teaching or learning materials (for example, WINDOWS 3.0 or above, SVGA Graphics Adapter, Sound Card, etc.). This is because some schools are not equipped with adequate hardware installations, and having this important information prior to ordering or downloading these materials is imperative. We are currently considering this field as a suggested sub-element for the DC, namely DC.PUBLISHER.TechnicalRequirements. Alternatively, this could be found in the non-DC terms and conditions metadata.
DBS.SUBJECT.Conditions describes the user restrictions and cost or licensing aspects. These aspects are particularly important for the metadata concerning cost- or fee-based services and products on both a commercial and non-commercial level. We are currently considering this field as a suggested sub-element for the DC, namely DC.PUBLISHER.UseRestrictionsAndCost, until otherwise included in terms and conditions metadata.
DBS.SUBJECT.SubjectAreaOfInstruction describes the subject area of instruction, to distinguish this field from the simpler DC.SUBJECT, which may not refer to teaching and learning materials. Even when DC.SUBJECT carries the two contents "teaching material" and "biology", it is not clear whether the resource is an article on teaching biology or a part of a teaching unit on some aspect of biology. We suggest this as a sub-element of the DC, DC.SUBJECT.SubjectAreaOfInstruction, specifically for educational purposes and not solely for educational servers (i.e., also for a mathematics server with academic contents and didactics of mathematics, as is the case in the German MathNet).
DBS.SUBJECT.GradeLevel describes the level of intended instruction or use in schools and all levels of educational institutions (including home schooling). This corresponds to EDNA.USERLEVEL (Australian Educational Network Server, http://www.edna.edu.au). We suggest this as a DC sub-element, namely DC.SUBJECT.UserLevel.
DBS.MEDIUM describes the physical medium of an object which is electronically described using an HTML file, but may not be in an electronic form or has alternative formats. Examples are: audiocassettes, CD-ROMs, videocassettes, print medium, data disc. We suggest this be included under DC.FORMAT as the sub-element DC.FORMAT.PhysicalMedium.
The organisation of the German Educational Resources Server includes self-registry of teaching, learning or other informational documents by means of a form developed for the GER / DBS (http://dbs.schule.de/db/inconeue.html). This places the metadata into a mySQL database, which will be indexed using a dedicated gatherer / broker (currently Harvest; other robots are under consideration). Submitting the registry form automatically returns a file containing the structural information according to the Dublin Core format. This file can then be included by the author as part of his HTML, edited later as it may become necessary, and harvested by any search engine. Using these meta tags enables a structured retrieval process in which a combination of various content terms from one to three fields (DC.TYPE, DBS.SUBJECT.USERLEVEL and DC.SUBJECT or DBS.SUBJECT.SubjectAreaOfInstruction) can be defined to give targeted results.
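The structured retrieval described above, in which terms from several fields are combined, can be sketched as a simple conjunctive filter. This is a minimal illustration, not the DBS implementation: the `search` function, the in-memory record layout and all record values are assumptions made for this example; only the field names are taken from the text.

```python
def search(records, criteria):
    """Return the records in which every (field, term) criterion matches.

    A criterion matches when the term equals, ignoring case, any value
    stored for that field in the record.
    """
    def has_term(record, field, term):
        return any(term.lower() == value.lower() for value in record.get(field, []))

    return [
        record for record in records
        if all(has_term(record, field, term) for field, term in criteria.items())
    ]


# Hypothetical records, invented for illustration only.
records = [
    {
        "DC.TITLE": ["Cell biology unit"],
        "DC.TYPE": ["teaching material"],
        "DBS.SUBJECT.SubjectAreaOfInstruction": ["biology"],
        "DBS.SUBJECT.GradeLevel": ["secondary"],
    },
    {
        "DC.TITLE": ["Article on teaching biology"],
        "DC.TYPE": ["article"],
        "DC.SUBJECT": ["biology", "teaching"],
    },
]

hits = search(records, {
    "DC.TYPE": "teaching material",
    "DBS.SUBJECT.SubjectAreaOfInstruction": "biology",
})
print([hit["DC.TITLE"][0] for hit in hits])
```

Because the subject area of instruction is a separate field, the query returns the teaching unit but not the article about teaching biology, which is the distinction the additional subfield is meant to make.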
The state servers are responsible for the items stored via their servers; the DBS has a general editorial center to approve the entry of all items and maintain integrity. This is especially important because of the legal implications of having school pupil-authored items indexed in these servers.
10. Florida International University Digital Library
Many of the initial concerns about whether the 15 DC elements would be complete enough to be useful have been resolved by the addition of the DC qualifiers. Of note is the added ability to use the Language qualifier under Subjects and Description to add parallel English and Spanish fields, and the addition of Role under Creator to enable users to search all artists, etc. The limitations being found with the DC elements stem from the fact that deployment is through an integrated searchable/browsable system that creates HTML pages on the fly, rather than as elements in the header of an HTML document.
Some problems/questions:
1. Each object has multiple Resource Identifiers (e.g. an image's thumbnail, access and reference versions; a video's segments), and this is not exactly covered by the qualifier Type.
2. There are 3 types of Relation in the FIU DL that do not seem to correspond to the Relation Types parent, child, member. They are: a) a separate related title; b) a multimedia link to embed an audio narrative on the same HTML page as an image; c) a contents link, such as when an art object is photographed from 2-4 different angles and so 2-4 clickable thumbnail images need to appear on the same HTML page.
3. FIU is using 3 types of Coverage: a) temporal; b) geospatial; c) subject, which allows for tiers of subject browsing and is different from the Subject assigned from a thesaurus (e.g. Fine Arts - Architecture - Commercial Architecture).
4. Resource Type: when the object is an image of an art object such as a sculpture, shouldn't the Resource Type be Art object or Sculpture rather than the medium, e.g. Image?
See a simulation of the user interface. Follow these links: 1) Basic Search Template 2) Browse by Format... Select Image, then People, then Protestors Marching 3) Browse by Subject... Select Fine Arts, then Architecture, then Commercial Architecture, then Stock Exchange 4) Browse by Place/Time... Select Latin America. Please note that this is an early draft of the design concept; the final product will be more complete and will be the visual product of a team of artists.
Prepared by Katherine "Kass" Evans
Coordinator for Data Modeling
Florida International University Digital Library Project
Sept. 18, 1997
11. SCRAN
Reported by Ian O. Morrison, SCRAN, 17/09/1997
The Scottish Cultural Resources Access Network is a project to build a networked multimedia resource base for the study, teaching and appreciation of history and material culture in Scotland. The founding partners are the National Museums of Scotland, the Royal Commission on the Ancient and Historical Monuments of Scotland and the Scottish Museums Council. The National Lottery, by way of the Millennium Commission, is providing 49% of the funding, to the tune of 7.4 million UKP. The remainder of the cost is being met largely by in-kind contributions, principally the assignation of non-exclusive rights for educational usage of the resources.
By the year 2001 we will be providing easy access to 1.5 million text records of artefacts and historic monuments, and 100,000 related multimedia resources. SCRAN is also commissioning 100 multimedia essays, based on these resources and others, for use in schools and by a wider audience.
The SCRAN approach has been to combine the metadata necessary for resource discovery with aspects of the resources themselves. The intention is that most enquirers will get what they need directly from the SCRAN resource base, in the form of limited information, for example, about a museum object or an archaeological site, sometimes accompanied by a small illustration. In a minority of cases, where more detailed information is required, users will be directed to the original information providers, either by means of a hyperlink, where this is feasible, or by a unique reference number.
The Royal Commission on the Ancient and Historical Monuments of Scotland is also involved in a project, Accessing Scotland's Past, which involves mapping elements of RCAHMS records to both the SCRAN standard and that of the Archaeology Data Service. The Dublin Core has proved useful in providing the common ground on which this mapping can take place.
The SCRAN resource base comprises a variety of different kinds of records:
The provisional data standard has been drawn up to reflect the varied nature of the records. This can be obtained from the SCRAN Web site. It represents a hybrid between a museum collections management approach, drawing on the experience represented in the UK Museum Documentation Standard, SPECTRUM, and the emerging cross-domain standards for metadata, represented by the Dublin Core. The core information, necessary to represent and discover all kinds of resources within SCRAN, is held in a form compatible with the Dublin Core. It is proving more problematic, however, to map to the Dublin Core the standards required for delivery of all the other information that we hold. However, attempting this may be like trying to fit a quart into a pint pot.
14. Medical Metadata Project
Institutional Affiliation: Oregon Health Sciences University,
American Medical Informatics Association Internet Working Group,
National Cancer Institute, Polytechnic University
Contact Person (persons) and email: mailto:gmalet@worldnet.att.net
URL(s) if publicly available: http://medir.ohsu.edu/bicc-informatics/ebm/latest.htm
Test set is a National Cancer Institute Cancer Genetics Database
2. Projected Collection Size (number of records)
First year: 5000
Third year: 2 million
3. Type of data, formats
Diverse
2. Who is creating the metadata? What training and support
are provided?
Automated tool. Medical Librarian. Template entry.
3. Which metadata scheme is used?
Enhanced Dublin Core. HTML 4.0 compatible.
4. Encoding strategy:
5. Granularity of the description:
6. Dublin Core strengths and weaknesses encountered in deployment
Resource types and other DC standards are not fully specified, which leads to
possible instability and unfavorable economics.
7. Which DC fields are used/not used?
Not used:
8. Have additional local fields been added?
Yes: Medical Subject Headings.
9. Which qualifiers are used/needed?, at what level of detail?
SCHEME of course
Keywords:
10. Do you provide any subject description support
(uncontrolled term lists, thesauri, classification systems)?
National Library of Medicine
1. Conversion tools (eg., MARC -> DC)
In development....parse Medline records
2. Metadata creator tools
In development....JAVA extraction->editor> template entry.
See (http://medir.ohsu.edu/bicc-informatics/ebm/latest.htm)
3. Automatic metadata extraction tools
Medical World Search, http://www.mwsearch.com/
4. Discovery tools (directory/browsing type)
Medical Matrix www.medmatrix.org
5. Indexing and search tools
Medical World Search, Harvest, http://www.mwsearch.com/
6. Other tools
7. Help system, tutorials etc.
8. Is the metadata indexed, browsable, searchable, through a local
or external service?
Yes.
1. Describe problems and accomplishments from both phases
2. Do you have estimates of the costs and benefits for the creation or
use of metadata?
$100,000 phase one.
3. What measures (or expectations) of service improvement through
metadata usage do you have?
Field selection. Medline-like access to distributed
multi-media documents.
15. University of Washington Digital Library
The University of Washington Digital Library is contributing to the development and adoption of standard resource descriptors for networked information; our initial efforts in this arena focus on image collections. In collaboration with the Department of Electrical Engineering's Center for Information System Optimization (CISO), the University Libraries are using extended Dublin Core descriptors for image collections. "Content" is CISO's Web-based multimedia database management and archival system *; it is in production on the campus of the University of Washington managing digital image collections today. The software also stores, searches and displays audio and video, and is in test with these formats in various venues.
Technical Services staff from the Libraries lend expertise in information organization to implement the DC templates provided by default in Content's "acquisition" workstation. Using Libraries staff to train authoring paraprofessionals and subject experts offers an approach which leverages scarce cataloging staff while fostering distributed, accurate description of images by academic departmental staff doing standards-based tagging. This is made possible by the open architecture adopted by Content developers. Item and collection-level records are built using DC fields optionally in conjunction with field-level thesauri and validity-checking routines. Since the database can be maintained remotely from multiple acquisition sites, scholars and experts can control and enrich metadata from any networked site running Windows95. A recent grant from Intel Corporation provides the hardware base for the use of this high-performance tool across disciplines. The Content server runs on Windows NT and unix; client and acquisition station are in Visual Basic. The databases can be accessed with any Web browser.
Content provides the ability to do Boolean, full-text searching across databases; the use of the Dublin Core facilitates recall while assuring precision to the extent the database creators demand. Using Content as a common, extensible tool allows us to focus on testing the appropriateness of DC for image searching. Rigorous usability testing of the DC data model for images is now in the design stage.
Diverse interdisciplinary collections, beginning with images for usability testing of the DC model. As the model is proven and extended, development of standard metadata for the entire array of Web-based resources.
X Management of the collection
X Enhanced searchability
X Interoperability
X Value-added services
X Other: migration, provenance study, enrichment from distributed,
remote collaborators.
Demonstration and download of Java version of the search client in Content
Project Name: Math-Net
Institutional Affiliation:
Konrad-Zuse Zentrum fuer Informationstechnik -Berlin-
Technische Universitaet Chemnitz-Zwickau - Fb. Mathematik -
Martin-Luther Universitaet Halle-Wittenberg - Fb. Mathematik -
Universitaet Kaiserslautern - Fb. Mathematik -
Universitaet zu Koeln - Fb. Mathematik -
Technische Universitaet Muenchen - Fb. Mathematik -
Universitaet Osnabrueck - Fb. Mathematik/Informatik -
Universitaet - GHS - Paderborn -Fb. Mathematik -
Universitaet Rostock - Fb. Mathematik -
Contact Person (persons) and email: http://elib.zib.de/math-net/wegweiser.html
URL(s) if publically available: see above
2. Projected Collection Size (number of records)
First year:
Third year:(2 years project)
3. Type of data, formats: PostScript, HTML, (PDF)
2. Who is creating the metadata? AUTHORS
What training and support are provided? METAMAKERS, NO SPECIFIC TRAINING NECESSARY-
Example:
http://www.mathematik.uni-osnabrueck.de/projects/META/MetaMake2.2.html
3. Which metadata scheme is used? DC
4. Encoding strategy:
__ external database
__ auxiliary HTML files (not embedded in resource) FOR PostScript (PDF)
__ metadata embedded in resource: FOR HTML
5. Granularity of the description:
individual documents: YES
groups of documents: YES
collections:
6. Dublin Core strengths and weaknesses encountered in deployment
FLEXIBLE, DOES NOT INCLUDE RATING
7. Which DC fields are used/not used?
http://www.dstc.edu.au/DC4/roland/
8. Have additional local fields been added?
Some did for internal use.
9. Which qualifiers are used/needed?, at what level of detail?
see 7.
10. Do you provide any subject description support:
(uncontrolled term lists, thesauri, classification systems)?
YES
1. Conversion tools (eg., MARC -> DC)
2. Metadata creator tools:
SEE ABOVE
3. Automatic metadata extraction tools:
HARVEST -- To what extent this
statement makes sense would require a lengthy explanation.
4. Discovery tools (directory/browsing type) ??
5. Indexing and search tools: HARVEST based
6. Other tools
7. Help system, tutorials etc.
8. Is the metadata indexed, browsable, searchable, through a local or external service?
SURE -- THAT'S THE USE OF IT -- The project is an open + distributed one
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/abstracts/dalitz.ps
2. Do you have estimates of the costs and benefits for the creation or use of metadata?
COSTS PER AUTHOR 0, QUALITY IMPROVEMENT FOR RETRIEVAL REMARKABLE.
3. What measures (or expectations) of service improvement through
metadata usage do you have?
See for yourself:
http://www.mathematik.uni-osnabrueck.de/harvest/brokers/MathN/
as opposed to http://www.mathematik.uni-osnabrueck.de/harvest/brokers/niedersachsen/
17. The Victoria and Albert Museum
Reported by Douglas Dodds, Head of Collection Management, National Art Library,
Victoria and Albert Museum
The Victoria and Albert Museum http://www.vam.ac.uk (V&A) is one of the partners in the EC-funded Electronic Library Image Service for Europe http://severn.dmu.ac.uk/elise/ (ELISE II) project, which started in October 1996 and runs for three years. The ELISE partners are:
The National Art Library also contributed draft DC metadata examples for
the report of the Visual Arts Data Service
Activities on this field:
DC-Meta-Data is also used in the EPRINT-project to combine the global
preprint-database xxx and the distributed
preprints on the local webservers (EuroPhysDoc). A common
search-interface for both types of preprints is planned. One of
the big questions is how to force the authors to publish their articles with
Meta-Tags on the local webserver. One solution under development is a Web-Upload-Form-Interface (coming soon).
Implementations in DC:
The Australian Geodynamics Cooperative Research Centre (AGCRC)
- a collaboration between two public research organisations and two universities -
is using the WWW as a primary delivery system for the results of its research.
The results are composed of a variety of material presented as a number of different resource types.
In particular
Activities in the Field of Electronic Information Management and Meta-Data in Physics
Reported by:
The goal is to build up a distributed electronic information system in physics.
The first version of this system is Harvest-based. A link-list of (hopefully)
all European Physics Departments EuroPhysDep underlies a
Harvest-Search-Machine. The only longterm possibility (from our point of view)
to produce complete, correct, and readable search-results is a common basis of
Meta-Tags on the Web-Pages of the participant departments. To make it as simple
as possible for the authors to produce a complete and correct set of Meta-Tags,
the MMM authoring-tool was developed.
Implementing DC in EuroPhysNet, xxx, and EPRINT indicates that some changes and additions to DC are still needed:
for dissertations it is essential to know who produced the
Meta-Tags. To allow this, Meta-Information about the Meta-Information has to be
archived. One idea to implement this:
<!-- MetaCreatorDate="Mrs. Author, 03061997"-->
<META NAME="DC.creator" CONTENT="(TYPE=name)Mr. Author">
<META NAME="DC.subject" CONTENT="Keyword.Author1, Keyword.Author2, ...">
...
<!-- /MetaCreatorDate="Mrs. Author, 03061997"-->
<!-- MetaCreatorDate="library, 04091997"-->
<META NAME="DC.subject" CONTENT="Keyword.Library1, Keyword.Library2, ...">
<!-- /MetaCreatorDate="library, 04091997"-->
the Type Unpublished is too general. Most authors of (still)
unpublished documents want to stress the state of the publication process.
There should be additional types such as "tobepublished" and "accepted", with an
additional required field, e.g. journal.
in this field a mixture of the document-format and the archiving-format is very
common. How should a z-compressed tar-archive that contains a
mixture of document-formats be described? One solution could be to
separate the archiving-format (tar, gz-compressed, ... uncompressed) and the
data-format (text/html, application/pdf, ... multi).
how should DC be used in files other than HTML? There should be a safe binding
between the Meta-Information and the described file. To optimize this, DC-Tags
should be implemented directly into the file. But, how to do this in
GIF-Images, PNG-Images, PDF-Texts, ...?
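The suggestion above to record the archiving format separately from the data format could be sketched as follows in Python. This is only an illustration: the function name, the dictionary keys and the "+" separator are hypothetical, and the values "uncompressed" and "multi" are taken from the examples given in the text.

```python
def describe_format(archive_layers, member_types):
    """Describe a resource by separate archiving-format and data-format values.

    archive_layers: packaging steps applied to the resource, e.g. ["tar", "gz-compressed"].
    member_types:   media types of the documents it contains, e.g. ["text/html"].
    """
    if not member_types:
        raise ValueError("at least one data format is required")
    archive = "+".join(archive_layers) if archive_layers else "uncompressed"
    distinct = sorted(set(member_types))
    # A single shared type is reported as such; a mixture collapses to "multi".
    data = distinct[0] if len(distinct) == 1 else "multi"
    return {"archive-format": archive, "data-format": data}


print(describe_format(["tar", "gz-compressed"], ["text/html", "application/pdf"]))
print(describe_format([], ["application/pdf"]))
```

Keeping the two values apart lets a search service filter on the document type without having to know how the file was packaged.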
19. Australian Geodynamics Cooperative Research Centre
Reported by Simon Cox,
Perth, Western Australia
The primary discovery mechanisms for these resources consist of
These use two different metadata systems:
A crude link has been made between these two systems by allowing maps produced by the visualisation tool to act as an interface onto the document search system, with key-words chosen from names of the map-layers selected. However, we would like to unify the system properly, in order to deliver targeted information regarding Australian Geodynamics to the client regardless of what type of resource it is.
Constraints on implementation include:
The intention is to move to a system based on
This approach introduces a few challenges.
20. Interconnect Technologies Corp. projects
Reported by Mike Raugh
Project Name:
(Example) Aviation Safety Digital Library
Institutional Affiliation:
Interconnect Technologies Corp.
Contact Person (persons) and email:
Mike Raugh, raugh@interconnect.com;
Diane Hillmann, dih1@cornell.edu
URL(s) if publicly available:
Not available yet
(Example) R&D website containing selected information primarily for general aviation
2. Projected Collection Size (number of records)
First year: 1,000 (subset subject to DC cataloging)
Third year: 10,000 (subset subject to DC cataloging)
3. Type of data, formats:
Primarily text documents, in PDF, HTML or ASCII.
2. Who is creating the metadata? What training and support are provided?
Most created by Diane Hillmann, under contract. Guidelines being written so some can
be provided by others.
3. Which metadata scheme is used?
Minimal Dublin Core + Additional Module (e.g., specialized metadata for aviation safety information)
4. Encoding strategy:
We would like to embed in resource at some point, if possible. We do create sample datasets for clients demonstrating the benefits of embedding.
5. Granularity of the description:
We are debating whether to catalog individual documents of large collections (order of 500,000 structured records) or to merely catalog the collections
6. Dublin Core strengths and weaknesses encountered in deployment
Simplicity both strength and weakness
Some elements still a bit ambiguous, need to be better defined both in general and within this project
7. Which DC fields are used/not used?
Elements used:
Elements not used
8. Have additional local fields been added?
No.
9. Which qualifiers are used/needed?, at what level of detail?
Have qualified "identifier" to use with URL and also used unqualified for other identifying numbers on documents.
10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)?
Haven't yet but are considering several options, including term lists.
2. Metadata creator tools:
Template for DC
3. Automatic metadata extraction tools
4. Discovery tools (directory/browsing type):
Browsing of catalog records via index hierarchies, with final links to resource
5. Indexing and search tools:
Searching, with optional filtering on metadata
6. Other tools:
Display of metadata
7. Help system, tutorials etc.
8. Is the metadata indexed, browsable, searchable, through a local or external service?
We provide both local services and client-based services.
1. Describe problems and accomplishments from both phases
Many of our problems have been related to the DATE element, primarily because of its limitations in its present form. Not only did we need more than one kind of date, we were concerned about being too restrictive about format.
The other major problems had little to do with the Dublin Core, but with the dysfunctional practices of the web sites we were working with. They tended to change URLs, and as one moved to frames, we lost the ability to relate our metadata to a specific item.
2. Do you have estimates of the costs and benefits for the creation or use of metadata?
3. What measures (or expectations) of service improvement through metadata usage do you have?
21. University of Michigan Digital Library Registry Database
Last modified: 9/22/97
David L. Richtmyer
Another problem encountered was the large number (roughly 20%) of links that become broken, either by the time that the registrar's record reaches our catalogers, or after an item has been cataloged and placed in the Registry Database. This problem was solved partially by a script that tests all links when the database is indexed (roughly once a week), and then reports back to our cataloging staff whether the link is dead or has moved. If a link is found broken, the script automatically removes the record from the active database and places it in a repository file for manual resolution by the cataloging staff. Still, managing this problem is largely manual, and therefore costly.
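A link-testing pass of the kind described above might look roughly like the following Python sketch. It is not the Registry Database script itself: the record layout (a dictionary with a "url" key) and both function names are assumptions, and the sketch only separates live from dead links, without detecting links that have moved.

```python
import urllib.error
import urllib.request


def link_is_alive(url, timeout=10):
    """Return True if a HEAD request to url succeeds with a non-error status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, HTTP error status, or malformed URL.
        return False


def split_by_link_status(records, is_alive=link_is_alive):
    """Split records into (active, repository) according to their link status."""
    active, repository = [], []
    for record in records:
        target = active if is_alive(record["url"]) else repository
        target.append(record)
    return active, repository
```

Passing the checker in as a parameter makes it possible to run the split against a stub during testing, and to swap in a more careful check (for example one that follows redirects and records the new address) later.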
Because the planning and implementation of this project included people from diverse backgrounds in the information field (i.e., catalogers, public services librarians, programmers), pitfalls that would normally ensue from a narrowly directed project were avoided. For example, when 'View Details' is chosen for a record, the order and placement of the metadata fields was determined not by a preordained set of rules but by what the typical user was found to look for most often in a record. Thus, the Providers field was placed not at the top of the record, as is found in regular AACR2/MARC records, but rather further down in the body of information. Why? Because it was found that, unlike with printed books, most users did not remember a site by its creator, but rather by its title.
The advent of the Registry Database solved all of these retrieval problems.
22. INDOREG : INternet DOcument REGistration
Danish Library Centre
Reported by Randi Diget Hansen
In recognition of the fact that national information and cultural heritage are not only conveyed in the form of printed matter, the Danish Library Centre (DBC) and the Royal Library decided in 1995 to propose to the National Bibliographic Council and the Danish Ministry of Culture that the national bibliography should also include electronic documents, although only those found in a physical state in the form of diskettes and CD-ROMs. Such documents have been included in the Danish book list and Danish periodical list since 1996.
At the same time, developments in the form of Web publication on the Internet were moving so fast that DBC decided to launch a project to find out whether net publications could and should be subject to bibliographic control in the same way as printed and electronic publications in fixed physical form. The reason for this was that we felt the existing search engines on the net suffered from the general problem of searching in unqualified data and generally replying with excessive amounts of data. We also felt that the information contained in net publications did not basically differ from the information in publications in fixed physical form. If net-borne publications are excluded from bibliographic control, there is a risk that many people will find it difficult to gain access to an increasing amount of the information citizens need, as opposed to the information stored in products in fixed physical form. The ultimate target was that registration should be in DanBib (the joint superstructure system for the complete Danish library system) alongside the national bibliography and the total list of publications at Danish libraries. DanBib should also be used to make direct linking to documents possible.
In addition, a new Legal Deposit Act was pending (and was passed
during the project period). It covers all electronic documents
on the net which are deemed likely to be available as independent
units in a final form. The Act requires that a general standpoint
should be taken in terms of national bibliographic registration.
In the autumn of 1996 the Danish National Library Authority (Statens
Bibliotekstjeneste, or SBT) decided to provide joint funding of
the project. At the same time the principle of national bibliographic
registration of electronic documents on the Internet was presented
to the National Bibliographic Council, who expressed their agreement.
In this connection the project's ultimate target was altered to
include not only evaluation of bibliographic control in general,
but also proposals as to how the work involved in national
bibliography should be performed.
In order to obtain a model for a national bibliographic registration
system that includes net publications, the project has focused
on the following areas in particular.
Inclusion criteria
Proposed principles for national bibliographic inclusion
criteria have been drawn up. They operate with the concepts of
static and dynamic publications - with homepages as an independent
category under dynamic publications. These principles reflect
the criteria that exist for publications in fixed physical form,
since there are formal requirements with regard to both size and
(to a certain extent) content. For instance, it is proposed that
publications of a commercial, internal, highly local or private
nature should not be included.
Registration method
The proposed registration method seeks to cover the special needs of net publications in terms of description and format. The problems of describing static and dynamic publications vary. The level of registration has not been finalised yet; for one thing, the new cataloguing rules drawn up in parallel to the project by one of the project participants will have to be tested in practice before the levels in the national bibliography generally are recommended to SBT in the autumn of 1997.
Self-registration by authors/publishers using metadata is regarded as a necessary supplement if very large amounts of information are to be registered.
For this purpose, a modified version of the Dublin Core Template used in the Nordic Metadata Project has been produced to allow for Danish conditions.
Tracing and maintenance
To ensure the constant validity of addresses (URLs), a PURL server
has been established which functions as a central exchange. If
an international number system is adopted like the ISBN system,
it must of course be used. But a solution is required immediately.
The further development of automatic tools to check publications
is needed. In addition, we feel that the current collaboration
between national bibliography and legal deposits should continue.
Finally, concentrated efforts are needed to persuade authors/publishers
of net publications to regard such publications in the same way
as printed publications.
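The role of the PURL server as a central exchange can be illustrated with a small Python sketch: catalogue records cite a persistent name, and only the exchange knows the current address. The class below is a hypothetical in-memory stand-in, not the software used by the project, and the names and URLs in the usage lines are invented.

```python
class PurlRegistry:
    """In-memory stand-in for a PURL server: persistent name -> current URL."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, target_url):
        """Assign a persistent name to a publication's current address."""
        self._targets[purl] = target_url

    def move(self, purl, new_target_url):
        """Record that a publication has moved; the persistent name is unchanged."""
        if purl not in self._targets:
            raise KeyError(purl)
        self._targets[purl] = new_target_url

    def resolve(self, purl):
        """Return the current address for a persistent name, or None if unknown."""
        return self._targets.get(purl)


registry = PurlRegistry()
registry.register("/net/report-1", "http://old.example/report.html")
registry.move("/net/report-1", "http://new.example/reports/1.html")
print(registry.resolve("/net/report-1"))
```

Because bibliographic records hold only the persistent name, an address change requires one update in the registry rather than a correction in every record.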
Storage in DanBib
The DanBib base will be given a Web interface in the autumn of 1997, making it possible to link from registration to net publications.
A project report is available at http://www.purl.dk/rapport/html.uk/
23. Metadata Project of State Library of Queensland
Institutional Affiliation:
State Library of Queensland
Contact Person (persons) and email:
Jennie Thornely; J.Thornely@slq.qld.gov.au
1. Domain of Content :
Gov
2. Projected Collection Size (number of records)
First year: 450
Third year: 650
3. Type of data, formats
html; text; jpeg; gif
1. What are the intended functions of the metadata used?
_y_ Management of the collection
_y_ Enhanced searchability
_y_ Interoperability
_y_ Value-added services
__ Other:
provides linkage for opac in the near future
2. Who is creating the metadata? What training and support
are provided?
The Project is managed by Manager, Technical Services and Senior Librarian, Internet Services Unit.
The project is intended to be finished by Technical Services.
3. Which metadata scheme is used?
Dublin Core
4. Encoding strategy:
_N_ external database
_N_ auxiliary HTML files (not embedded in resource)
_Y_ metadata embedded in resource
5. Granularity of the description:
_Y_ individual documents
_N_ groups of documents
_N_ collections
6. Dublin Core strengths and weaknesses encountered in deployment
Flexibility, easy to use and understand;
Still evolving.
7. Which DC fields are used/not used?
We decided to use two levels of complexity, i.e., for simple (contents) files we use title/creator/date/description/rights;
for the more complicated (contents) files, we apply all the appropriate elements.
8. Have additional local fields been added?
No.
9. Which qualifiers are used/needed?, at what level of detail?
We use type and scheme, eg.
LC Name authority;
LCSH.
10. Do you provide any subject description support
(uncontrolled term lists, thesauri, classification systems)?
No.
1. Conversion tools (eg., MARC -> DC)
No
2. Metadata creator tools
No
3. Automatic metadata extraction tools
No
4. Discovery tools (directory/browsing type)
No
5. Indexing and search tools
will develop
6. Other tools
No
7. Help system, tutorials etc.
Working on a local procedure/manual
8. Is the metadata indexed, browsable, searchable, through a local
or external service?
Yes
1. Describe problems and accomplishments from both phases
a. It took a while for different Divisions to agree for their staff to deploy metadata when they create the Divisional Web pages.
b. Difficult to discuss problems relating to Dublin Core (locally), as there are not many 'experienced metadata deployers' around.
c. Had to change the syntax a few times when new information was received via meta2.
2. Do you have estimates of the costs and benefits for the creation or use
of metadata?
cost in $$$ - No
benefits - Yes, as discussed by many people.
3. What measures (or expectations) of service improvement through
metadata usage do you have?
We are still in the process of deploying metadata on Web pages.
We would like to see increased usage of the State Library's Website in the near future.
MedExplore & Dublin Core generation
Preliminary Version
The aim of MedExplore is to give a group of experts confronted with an unforeseen situation mastery of its terminological resources and a synthetic, deeper knowledge of the state of the art. This will be achieved by the creation of a system of investigation.
Such a system allows a user to navigate through concept graphs and to manipulate jointly various pieces of information (large international databases, local source documents, raw information from the INTERNET such as news-groups), written in different languages. We have chosen to begin with biomedical fields because of the large amount of what we call "structuring" funds (such as MEDLINE, EMBASE or PASCAL, which possess a homogeneous indexing and, with some associated projects like UMLS, a quasi knowledge-based representation).

Our project deals with metadata in two complementary ways. The first is how to improve our retrieval on the INTERNET.
The second is an indirect but very interesting effect. Since we can identify the main content terms that retrieve a set of relevant documents, we can use them to produce good metadata, so that a piece of information is well retrieved by search engines. We are beginning to experiment with the generation of Dublin Core metadata from an existing set of local information.
The engineering techniques used for MedExplore rest upon the generalised use of SGML encoding, which allows the use of SGML toolboxes and libraries of linguistic modules. We have developed DILIB, an SGML workbench which contains a set of basic components for building Information Retrieval Systems.
We convert all information into SGML markup whose structure is very close to the original. For instance, a downloaded MEDLINE record such as:
AN : 96081277
TI : Orthotopic pulmonary valve replacement with a homograft.
AU : Saha K,Iyer KS, Sharma R, Bhan A, Airan B, Venugopal P
CS : Department of Cardiothoracic and Vascular Surgery, All India Institute of Medical Science
JN : J Heart Valve Dis
CP : (ENGLAND)
PY : Mar 1995
VO : 4 (2) p187-91
SN : 0966-8519
LA : ENGLISH
...
becomes:
<medline>
<AN>96081277</AN>
<TI>Orthotopic pulmonary valve replacement with a homograft.</TI>
<AU><e>Saha K</e><e>Iyer KS</e><e>Sharma R</e><e>Bhan A</e><e>Airan B</e><e>Venugopal P</e></AU>
<CS>Department of Cardiothoracic and Vascular Surgery, All India Institute of Medical Sciences, New Delhi, India.</CS>
<JN>J Heart Valve Dis</JN>
<CP>(ENGLAND)</CP>
<PY>Mar 1995</PY>
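The conversion shown above can be approximated in a few lines of Python. This is a sketch of the idea only, not the DILIB filter: the function name is hypothetical, and it assumes that each field sits on one line in the form "TAG : value" and that AU is the only field whose comma-separated values are split into <e> elements.

```python
from html import escape

# Assumption: only the author field holds a comma-separated list.
MULTI_VALUED = {"AU"}


def medline_to_sgml(lines):
    """Convert 'TAG : value' lines of one downloaded record into SGML-style markup."""
    parts = ["<medline>"]
    for line in lines:
        tag, separator, value = line.partition(" : ")
        if not separator:
            continue  # skip lines that are not 'TAG : value', e.g. a trailing '...'
        tag, value = tag.strip(), value.strip()
        if tag in MULTI_VALUED:
            items = "".join("<e>%s</e>" % escape(item.strip())
                            for item in value.split(",") if item.strip())
            parts.append("<%s>%s</%s>" % (tag, items, tag))
        else:
            parts.append("<%s>%s</%s>" % (tag, escape(value), tag))
    parts.append("</medline>")
    return "\n".join(parts)


record = [
    "AN : 96081277",
    "TI : Orthotopic pulmonary valve replacement with a homograft.",
    "AU : Saha K,Iyer KS, Sharma R",
    "...",
]
print(medline_to_sgml(record))
```

The element names are taken directly from the field tags, which is what keeps the markup structure close to the original record.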
We can do the same for traditional MARC formats (ISO 2709); a CCF (UNESCO)
record like:
-- header position 7 = s
001 157028
020 00@BISDS
022 11@A19880120
101 00@A0253-021X
201 00@ALegisLative study - Food and ... Agriculture
210 00@AEtudes législatives - ... Agriculture @Lfre
400 00@ARome@BFood and Agriculture Organization of the United Nations
can be marked as:
<record h7="s">
<f001>157028</f001>
<f020 ind="00"><sB>ISDS</sB></f020>
<f022 ind="11"><sA>19880120</sA></f022>
<f201 ind="00"><sA>Legislative study - Food and ... Agriculture</sA></f201>
<f210 ind="00"><sA>Etudes Legislatives ... Agriculture</sA>
<sL>fre</sL>... </record>
Note that only two filters are required to convert all kinds of MARC formats in this way.

In some cases, it may be useful to apply a small transformation to a set of data. For instance, we need to handle multilingual information coming from UMLS, whose records look like:
C0017379|ENG|P|L0017379|PF|S0022690|Carriers, Genetic|
C0017379|ENG|P|L0017379|VW|S0044411|Genetic Carriers|
C0017379|ENG|P|L0017379|VWS|S0022684|Carrier, Genetic|
C0017379|ENG|P|L0017379|VWS|S0044407|Genetic Carrier|
C0017379|POR|P|L0436728|PF|S0561010|TRANSPORTADORES GENETICOS|
C0017379|SPA|P|L0447330|PF|S0571612|PORTADORES GENETICOS|
C0017379 identifies a "unique concept" which gets an English prefered form (Carriers, Genetic), various usual forms and some translations.
This information is available in a table format. In an SGML context, it is more convenient to group all information dealing with a particular concept into one record like:
<mrcon>
<CUI>C0017379</CUI>
<TP><PF>Carriers, Genetic</PF> <VW>Genetic Carriers</VW> <VWS>Carrier, Genetic</VWS> <VWS>Genetic Carrier</VWS></TP>
<VL l="POR"><TP><PF>TRANSPORTADORES GENETICOS</PF></TP></VL>
<VL l="SPA"><TP><PF>PORTADORES GENETICOS</PF></TP></VL>
</mrcon>
Once all our data are coded with SGML markup, we can apply the associated engineering facilities.
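The regrouping of UMLS table rows into one record per concept might look as follows. This is a minimal sketch, not the DILIB filter itself: the function name group_umls is hypothetical, and it assumes that every row has exactly seven pipe-separated fields and that English forms sit directly under the record while other languages are wrapped in a VL element, as in the sample.

```python
from xml.sax.saxutils import escape


def group_umls(rows):
    """Group rows 'CUI|LAT|TS|LUI|STT|SUI|STR|' into one <mrcon> string per concept."""
    concepts = {}  # CUI -> {language -> [(form_tag, string)]}, in input order
    for row in rows:
        cui, lang, _ts, _lui, form, _sui, text = row.rstrip("|").split("|")
        concepts.setdefault(cui, {}).setdefault(lang, []).append((form, text))

    records = []
    for cui, by_lang in concepts.items():
        parts = [f"<mrcon><CUI>{cui}</CUI>"]
        for lang, forms in by_lang.items():
            tp = "<TP>" + "".join(f"<{t}>{escape(s)}</{t}>" for t, s in forms) + "</TP>"
            # Assumption: English forms go directly under <mrcon>,
            # every other language is wrapped in <VL l="...">.
            parts.append(tp if lang == "ENG" else f'<VL l="{lang}">{tp}</VL>')
        parts.append("</mrcon>")
        records.append("".join(parts))
    return records


rows = [
    "C0017379|ENG|P|L0017379|PF|S0022690|Carriers, Genetic|",
    "C0017379|ENG|P|L0017379|VW|S0044411|Genetic Carriers|",
    "C0017379|POR|P|L0436728|PF|S0561010|TRANSPORTADORES GENETICOS|",
]
for record in group_umls(rows):
    print(record)
```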
DILIB provides a set of tools for handling SGML or XML elements. They are available at different programming levels.
For instance, if we want to add the key-word "AIDS", as an element tagged with <e>, to an SGML element pointed to by the variable "kw" in a C program, we have to write:
SgmlAddChild (kw, SgmlCreateLeaf("e", "AIDS"));
In the same way, we can use shell commands to handle a set of records. We have introduced a "path pattern mechanism" to specify a set of elements within a given document. For instance, if we want to select the records which contain "AIDS" among their key-words and print the corresponding titles, we just have to write:
SgmlSelect -g medline/KW/e#AIDS -g medline/TI -p @g2
(where "-g" is used by analogy with grep and @g2 identifies the second "g" sub-command)
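For readers without DILIB at hand, the effect of such a selection can be approximated with the Python standard library. The sketch below is an illustration only: it assumes well-formed XML records, it treats the pattern as an exact match on the key-word (the precise semantics of "#" in SgmlSelect are not described here), and select_titles is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def select_titles(records_xml, keyword):
    """Return the TI text of every <medline> record that has <KW><e>keyword</e></KW>."""
    titles = []
    for xml_text in records_xml:
        record = ET.fromstring(xml_text)
        keywords = [e.text for e in record.findall("KW/e")]
        if keyword in keywords:  # assumption: exact match on the key-word
            titles.append(record.findtext("TI"))
    return titles


records = [
    "<medline><TI>First title</TI><KW><e>AIDS</e><e>HIV</e></KW></medline>",
    "<medline><TI>Second title</TI><KW><e>Rats</e></KW></medline>",
]
print(select_titles(records, "AIDS"))
```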
As HTML and Dublin Core are both closely tied to SGML, it becomes very easy to generate metadata in a programming environment. For instance, the following program:
SgmlNode *meta;
meta= SgmlCreateEmptyMark("META");
SgmlSetAtt(meta, "name", "DC.subject");
SgmlSetAtt(meta, "content", "AIDS");
will produce:
<META name="DC.subject" content="AIDS">
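The same generation step can be sketched outside DILIB. The Python fragment below is a hypothetical equivalent, not the DILIB API: the function dc_meta and its optional lang and scheme parameters are assumptions chosen for illustration, and attribute values are escaped so that quotes in the content cannot break the tag.

```python
from html import escape


def dc_meta(element, content, lang=None, scheme=None):
    """Build one <META> tag for a Dublin Core element (hypothetical helper)."""
    attrs = [("name", f"DC.{element}")]
    if lang:
        attrs.append(("lang", lang))
    if scheme:
        attrs.append(("scheme", scheme))
    attrs.append(("content", content))
    rendered = " ".join(f'{key}="{escape(value, quote=True)}"' for key, value in attrs)
    return f"<META {rendered}>"


print(dc_meta("subject", "AIDS"))
print(dc_meta("subject", "Heart, Lung", scheme="MESH"))
```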
In the same way, DILIB also contains a set of basic components for building customised information retrieval systems (in which internal data such as inverted files are also coded in SGML). These tools allow a global analysis of large sets of information. At present, our tools mainly deal with combinations of associations of terms.
For people who are not familiar with these techniques, an elementary way of analysing a set of records consists in printing a sorted list of terms with their number of occurrences in a given corpus. For instance, suppose I know nothing about "Wallerian Degeneration" and read the following result:
[393] Rats
[268] *Nerve Degeneration
[247] Wallerian Degeneration
[241] Microscopy, Electron
[215] *Wallerian Degeneration
[184] Time Factors
[128] Middle Age
[121] Mice
[120] Rats, Inbred Strains
[115] Adult
[76] Nerve Degeneration
From this sample, even knowing nothing about Wallerian degeneration, we can deduce that it concerns the nervous system.
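A sorted frequency list of this kind takes only a few lines to produce. The following Python sketch assumes that each record has already been reduced to its list of key-words, and that a term is counted once per record; term_frequencies is a hypothetical name.

```python
from collections import Counter


def term_frequencies(records):
    """Count in how many records each term occurs, most frequent first."""
    counts = Counter()
    for terms in records:
        counts.update(set(terms))  # count each term once per record
    return counts.most_common()


records = [
    ["Rats", "Nerve Degeneration"],
    ["Rats", "Time Factors", "Nerve Degeneration"],
    ["Rats"],
]
for term, count in term_frequencies(records):
    print(f"[{count}] {term}")
```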
Another, complementary way consists in using associations, for instance:
[166] *Wallerian Degeneration - *Nerve Degeneration
[119] Rats, Inbred Strains - Rats
[109] Rats - *Nerve Degeneration
[101] Rats - Microscopy, Electron
[88] Rats - *Wallerian Degeneration
[86] Wallerian Degeneration - Rats
[80] Time Factors - Rats
[75] Microscopy, Electron - *Nerve Degeneration
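Associations can be counted in much the same way as single terms. The sketch below assumes that an association is simply an unordered pair of terms occurring in the same record, which may differ from the weighting actually used by our tools; association_counts is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def association_counts(records):
    """Count how many records contain each unordered pair of terms."""
    pairs = Counter()
    for terms in records:
        # sorted() gives every pair a canonical order, so (a, b) == (b, a)
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


records = [
    ["Rats", "Nerve Degeneration"],
    ["Rats", "Time Factors", "Nerve Degeneration"],
    ["Rats"],
]
for (a, b), count in association_counts(records).most_common():
    print(f"[{count}] {a} - {b}")
```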
The best results, however, are obtained by clustering the associations that share common terms. Below is a sample cluster in which an expert could see a real convergence of relevant topics.
List of Key-Words
[34] Rats, Sprague-Dawley
[62] *Nerve Regeneration
[51] Nerve Crush
[27] Sciatic Nerve_Injuries_IN
[34] Sciatic Nerve_Physiology--PH
[17] *Peripheral Nerves_Injuries_IN
[38] *Sciatic Nerve_Physiology--PH
Internal Relationships
[7] Rats, Sprague-Dawley - Nerve Crush
[11] Sciatic Nerve_Injuries_IN - Nerve Crush
[10] Sciatic Nerve_Physiology--PH - Nerve Crush
[10] Nerve Crush - *Nerve Regeneration
[8] Sciatic Nerve_Injuries_IN - *Peripheral Nerves_Injuries_IN
[7] Sciatic Nerve_Physiology--PH - *Nerve Regeneration
[7] Nerve Crush - *Sciatic Nerve_Physiology--PH
[6] Sciatic Nerve_Injuries_IN - Rats, Sprague-Dawley
For a given field, the list of clusters gives the main thematic lines, for instance:
*Wallerian Degeneration - *Nerve Degeneration
Middle Age - Adult
Mice, Inbred C57BL - Mice
Rats, Sprague-Dawley - *Nerve Regeneration
Immunohistochemistry - Wallerian Degeneration_Physiology--PH
Nerve Regeneration - Nerve Degeneration
Sciatic Nerve_Metabolism--ME - *Peripheral Nerves_Metabolism--ME
Neural Conduction - Action Potentials
(each cluster is identified by its most heavily weighted association)
Applying these techniques to the authors allows the identification of the main teams (i.e. groups of persons who usually publish together).
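The clustering method itself is not detailed here. As a rough illustration only, the Python sketch below groups associations by a simple single-link rule: associations above a weight threshold that share a term end up in the same cluster, and each cluster is labelled by its heaviest association. The threshold, the single-link rule and the name cluster_associations are all assumptions, not the algorithm used in our tools.

```python
def cluster_associations(pairs, min_weight):
    """pairs: {(term_a, term_b): weight}. Returns a list of (label, sorted terms)."""
    parent = {}

    def find(term):
        # Union-find lookup with path halving.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    # Keep only the associations that are heavy enough (assumed threshold).
    kept = {pair: w for pair, w in pairs.items() if w >= min_weight}
    for a, b in kept:
        parent[find(a)] = find(b)  # merge the two terms' groups

    clusters = {}
    for (a, b), weight in kept.items():
        cluster = clusters.setdefault(find(a), {"terms": set(), "best": None})
        cluster["terms"].update((a, b))
        if cluster["best"] is None or weight > cluster["best"][1]:
            cluster["best"] = ((a, b), weight)

    return [
        (f"{c['best'][0][0]} - {c['best'][0][1]}", sorted(c["terms"]))
        for c in clusters.values()
    ]


sample = {("A", "B"): 5, ("B", "C"): 3, ("D", "E"): 4, ("A", "E"): 1}
for label, terms in cluster_associations(sample, min_weight=2):
    print(label, terms)
```

A single-link rule tends to produce long chains on large corpora, so a real system would probably bound the cluster size or use a stricter criterion.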
Levine RA - Weyman AE
List of Authors
[5] Levine RA
[5] Weyman AE
[9] Lethor JP
[4] Siu SC
[4] Rivera JM
[4] Handschumacher MD
[3] Picard MH
[2] Juilliere Y
Internal Relationships
[5] Levine RA - Weyman AE
[5] Lethor JP - Weyman AE
[5] Lethor JP - Levine RA
[4] Siu SC - Weyman AE
...
[3] Lethor JP - Picard MH
[3] Handschumacher MD - Picard MH
[2] Juilliere Y - Lethor JP
These techniques are traditionally applied to technical or scientific surveys. In the framework of the MedExplore project we use them to search through the Internet. Some search engines, such as AltaVista with its "live topics" feature, now offer the same kind of facilities.

In other words, we have to solve the different levels of interoperability between heterogeneous data. SGML/XML gives us a good answer at the "codification - structuring" level. Now we have to deal with more semantic levels. As we work in specialised areas, we have chosen to simplify this problem by defining a kernel vocabulary which contains a limited number of terms (between 100 and 300). Such a lexicon can be generated by automatic tools (for instance clustering) and, in a second step, improved by a specialist.

For the extraction of documents from the Internet, we are testing an approach which gives us some first, "not so bad" results. The principle consists in associating with each term of the kernel vocabulary a histogram of the most frequent words found in the abstracts of the corresponding MEDLINE records.
For instance, for the key-word "Newborn, Infant", in a local base dealing with "cardiology", the associated histogram looks like:
[59] patient [42] pulmonary [37] tetralogy [33] fallot [29] infant [27] heart [25] artery [24] defect [19] month [19] outflow [19] ventricular
Now, we just have to send a query to AltaVista, using this set of words.
If we use the same kind of technique for an author, we obtain interesting results too. Here is the vocabulary associated with Pr. JP Lethor, derived from his bibliography on MEDLINE:
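The construction of such a histogram and of the resulting query can be sketched as follows. This is an illustration under stated assumptions: the stop list is a tiny placeholder, no stemming is applied (the histogram shown above appears to use normalised word forms), and term_profile and build_query are hypothetical names.

```python
import re
from collections import Counter

# A tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "with", "to", "is", "was", "for"}


def term_profile(abstracts, size=10):
    """Return the `size` most frequent non-stop words of the given abstracts."""
    counts = Counter()
    for text in abstracts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(size)


def build_query(profile):
    """Join the words of a profile into a plain search-engine query string."""
    return " ".join(word for word, _count in profile)


abstracts = [
    "Tetralogy of Fallot in the infant heart",
    "The infant heart and pulmonary artery",
    "Infant with tetralogy",
]
profile = term_profile(abstracts, size=3)
print(profile)
print(build_query(profile))
```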
[35] ventricular [31] volume [27] left [26] dimensional [20] coronary [19] three [16] method [15] patient [13] defect [13] image [12] excised [11] doppler [10] artery [10] echocardiography [10] tau [9] pressure
We are also implementing multilingual queries by using multilingual bibliographic data bases such as PASCAL.
For instance, we are working on the generation of a server that allows browsing through a collection of medical images. Each image is described in French with a small set of information such as:
<doc>
<id>lethor/007_001
<auteur><e>Lethor JP
<specialite>Cardiologie infantile
<tech><e>Radiographie
<acquis><e>
<organe><e>Coeur
<e>Poumon
<patho><e>Tétralogie de Fallot
<motif>rupture de patch infundibulaire
<age>12 ans
<cr><e>rupture de patch infundibulaire après correction chirurgicale de tétralogie de Fallot
<resultat>tétralogie de Fallot
We are using the MedExplore techniques and the DILIB tools to improve the indexing and to generate a server.

<meta name="DC.title" lang="fr"
content="Image : tétralogie de Fallot">
<meta name="DC.creator" content="Lethor JP">
<meta name="DC.subject" lang="fr"
content="Coeur, COEUR (RADIOGRAPHIE ), Poumon, POUMON (RADIOGRAPHIE ), Radiographie, Tétralogie de Fallot">
<meta name="DC.subject"
content=" Heart, Heart_Radiography, Lung, Lung_Radiography, Radiography, Tetralogy of Fallot">
<meta name="DC.subject" scheme="MESH"
content=" Heart, Lung, Radiography, Tetralogy of Fallot">
At present, we have not yet implemented any real use of metadata. The previous sample was produced only as a feasibility test. For the near future we are investigating two directions:
First, we want to improve our retrieval performance. In other words, we have to define a more relevant profile for each element of our kernel vocabulary. We plan to:
Once these metadata are identified, we can use them, as we have shown, to produce a set of relevant metadata that an organisation working on a particular medical topic can use to have its documents retrieved well on the Internet.
An interesting feature of the Dublin Core is the possibility of generating several sets of content values, each with its own scheme. We will also work on the use and generation of some other specific elements, for instance, in a set of medical images, the implementation of the coverage element related to the ages of the patients.
25. UC Berkeley Digital Library Catalog
26. EdNA
This document contains a description of the EdNA Directory Service and its use of metadata prepared for participants at the DC5 workshop in Helsinki, October 1997. It is structured based on the Dublin Core Project Summary Questionnaire v 1.0 970911 prepared for that workshop.
Contents
A. Project Description
B. Information System Summary
C. Metadata
D. Tools
E. Experiences from the planning and production phases
A. Project Description
1. Project Name
EdNA (Education Network Australia)
2. Institutional Affiliation
EdNA is a collaborative project between all Australian States and Territories and all sectors of education and training: schools, vocational education and training (which includes TAFE - technical and further education), adult community education, and higher education (universities). More details are available by selecting the "About EdNA" option from the home page of the EdNA web site. The "EdNA Directory Service" refers to the services provided at the web site http://www.edna.edu.au. "EdNA" is a broader process of collaboration of which the EdNA Directory Service is one product.
The EdNA Directory Service is managed by the Open Learning Technology Corporation (OLTC: http://www.oltc.edu.au).
3. Contacts and email
Email addressed to webdesk@edna.edu.au will be distributed to a range of people associated with the management of EdNA.
The EdNA Directory Service is at: http://www.edna.edu.au
The EdNA Metadata Standard is at: http://www.edna.edu.au/edna/owa/info.getpage?sp=auto&pagecode=5210
B. Information System Summary
1. Domain of Content
Web based resources relevant to any sector of education and training in Australia, subject to various quality assurance procedures. Listed resources can be divided into two main types (although these are not treated differently in any fundamental way by the EdNA Directory Service).
A) Internal resources. Internet resources created by organisations involved in education and training in Australia. For example, a school home page, course material available online, research reports, course directories.
B) External resources. Internet resources created outside the education system but identified as being a relevant resource by some organisation within the EdNA structure. For example the "Dictionary of Gamilaraay/Kamilaroi" (an Aboriginal language) or "The NASA Homepage".
In the initial identification of resources, the schools sector has concentrated on external resources and the vocational education and training and higher education sectors have concentrated on internal resources.
2. Projected Collection Size
There are currently over 3,700 individually identified URLs in the EdNA Database. However the EdNA search facility provides access to a much wider collection of resources because as well as indexing the actual pages in the Database, some URLs are tagged as sites or link pages and links on these pages are followed and indexed. As the EdNA Directory Service has not been formally launched yet, it is virtually impossible to predict the range of resources which might be identified over the next few years once EdNA is widely used throughout the Australian education and training systems.
3. Type of data, formats
The EdNA software allows for any URL type, but the overwhelming majority of resources identified are web pages, with a sprinkling of ftp: and gopher: resources.
C. Metadata
1. What are the intended functions of the metadata used?
The intended functions of metadata in EdNA are to:
Assist with user searches by allowing a more detailed specification of the type of material being sought (rather than the current search which is basically a free text search).
Weight search results so that words in metadata fields are given priority over words in the text of a document. (This should increase the chance of finding relevant resources even if the user does not explicitly search on a metadata field.)
Automatically allocate items to categories in the EdNA directory structure.
Assist with management of EdNA Directory content.
Provide general cataloguing information which can enable documents in the EdNA database to be searched by other catalogues (for example to allow education documents entered in EdNA to be searched by libraries).
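The weighting of search results mentioned above can be illustrated with a small sketch. It is not the EdNA search engine: the weight of 3.0, the whitespace tokenisation and the names score and rank are assumptions chosen only to show how a hit in a metadata field can outrank hits in the body text.

```python
def tokenize(text):
    """Very simple tokeniser: lower-case and split on whitespace."""
    return text.lower().split()


def score(document, query, metadata_weight=3.0):
    """document: {'metadata': {field: text}, 'text': body}. Higher is better."""
    words = set(tokenize(query))
    meta_tokens = [t for value in document["metadata"].values() for t in tokenize(value)]
    body_tokens = tokenize(document["text"])
    meta_hits = sum(1 for t in meta_tokens if t in words)
    body_hits = sum(1 for t in body_tokens if t in words)
    # Assumption: one metadata hit is worth `metadata_weight` body hits.
    return metadata_weight * meta_hits + body_hits


def rank(documents, query):
    """Return the documents sorted by decreasing score for the query."""
    return sorted(documents, key=lambda d: score(d, query), reverse=True)


docs = [
    {"id": "b", "metadata": {"DC.subject": "history"}, "text": "astronomy astronomy"},
    {"id": "a", "metadata": {"DC.subject": "astronomy"}, "text": "planets"},
]
print([d["id"] for d in rank(docs, "astronomy")])
```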
2. Who is creating the metadata? What training and support are provided?
While a first draft of the EdNA Metadata Standard has been agreed, there has not been any substantial implementation of metadata to date as the tools to assist with its creation and entry are still being developed.
3. Which metadata scheme is used?
The EdNA Metadata Standard is based on Dublin Core with the addition of:
Some EdNA specific fields.
EdNA specific schemes for some DC and EdNA specific fields.
Documentation of existing cataloguing schemes and thesauruses used in Australian education and training for use as schemes within the DC.subject field.
4. Encoding strategy
The ultimate location of metadata about items in the EdNA Directory will be in the EdNA database. There are five major ways in which item information gets into the EdNA database:
Items are suggested by users and reviewed and approved by an organisation which is part of the EdNA administration system.
Some organisations are approved to directly create items in the EdNA database using secure web based forms.
Items are identified for searching by following links from selected sites.
Existing listings of Internet resources identified by other education organisations are transferred to the EdNA database by 'bulk upload' using a defined format for delimited text files.
Once agreed quality assurance procedures are in place, approved sites will be visited by the EdNA robot and items with EdNA metadata will be automatically indexed and added to (or updated in) the EdNA database.
For 'internal' resources (i.e. those created by education organisations in Australia), the intention is that item owners will use the EdNA metadata wizard currently under development to embed metadata in their HTML.
Where items are not owned within the education system, EdNA will still recognise basic Dublin Core metadata. The entry of items directly into the database via the bulk uploads or the administration system creates metadata which is not stored in the original documents.
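How a robot might read embedded metadata from a page can be sketched with the Python standard library. This is an illustration only and not the EdNA robot: the class MetaExtractor, the function extract_metadata and the sample field values are hypothetical, and only META names starting with "DC." or "EDNA." are kept.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <meta name="DC.*|EDNA.*" content="..."> pairs from an HTML page."""

    PREFIXES = ("dc.", "edna.")

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None and name.lower().startswith(self.PREFIXES):
            # A field may be repeated, so values are kept in a list.
            self.fields.setdefault(name, []).append(content)


def extract_metadata(html_text):
    parser = MetaExtractor()
    parser.feed(html_text)
    return parser.fields


page = (
    '<html><head><META name="DC.title" content="School home page">'
    '<meta name="EDNA.userlevel" content="secondary">'
    '<meta name="generator" content="some editor"></head><body></body></html>'
)
print(extract_metadata(page))
```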
5. Granularity of the description:
Resources in EdNA are indexed at the individual URL level.
6. Dublin Core strengths and weaknesses encountered in deployment
The use of Dublin Core has saved EdNA from having to create a metadata system from scratch and provides a range of additional benefits:
Resources created in the Australian education system with embedded metadata will be recognised and more easily found by other search engines which recognise Dublin Core fields.
It is anticipated that software tools developed to support Dublin Core will assist with the creation and management of metadata records.
7. Which DC fields are used/not used?
The following DC fields are not actively supported in the EdNA Metadata Standard; however, if users create them they will be stored in the EdNA Database and will be searchable:
DC.relation
DC.contributor
DC.source
8. Have additional local fields been added?
The following additional fields are defined:
EDNA.entered: Date item was entered (used for management purposes)
EDNA.approver: Email of person or organisation approving the item for inclusion in EdNA.
EDNA.reassessment: Number of months until resource should be reassessed.
EDNA.userlevel: Typical level of user for which the content would be most appropriate.
EDNA.categories: Numbers representing categories in the EdNA Directory in which the resource is either suggested for inclusion, or in the case of approved organisations, is automatically included.
EDNA.indexlevel: How many levels should the EdNA search engine follow links from this page?
EDNA.indexsites: When the EdNA search engine follows links from this page (as controlled by the INDEXLEVEL field), how many web servers are to be accessed?
EDNA.review: A third party review of the resource.
NB: All these fields are metadata in the sense that they are information about resources. However some of these fields are generated by the management of resources within EdNA. Depending on the nature of the field and future policy decisions:
the contents of the fields may or may not be visible to users of EdNA.
the content of the field may sometimes be stored in Web documents or may only be stored in the EdNA server.
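The effect of EDNA.indexlevel and EDNA.indexsites on link following can be illustrated with a small sketch. It reflects one possible reading of the two fields (a maximum link depth and a maximum number of distinct web servers) and is not the EdNA search engine; the function pages_to_index and the in-memory link table that stands in for fetching pages are assumptions.

```python
from collections import deque
from urllib.parse import urlparse


def pages_to_index(start_url, links, index_level, index_sites):
    """Breadth-first walk limited by link depth and by number of distinct hosts.

    links: {url: [linked urls]} stands in for fetching and parsing pages.
    """
    hosts = {urlparse(start_url).netloc}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == index_level:
            continue  # do not follow links beyond the allowed depth
        for target in links.get(url, []):
            if target in seen:
                continue
            host = urlparse(target).netloc
            if host not in hosts:
                if len(hosts) >= index_sites:
                    continue  # server limit reached, skip this link
                hosts.add(host)
            seen.add(target)
            queue.append((target, depth + 1))
    return order


links = {
    "http://a.example/": ["http://a.example/x", "http://b.example/", "http://c.example/"],
    "http://a.example/x": ["http://a.example/y"],
}
print(pages_to_index("http://a.example/", links, index_level=1, index_sites=2))
```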
9. Which qualifiers are used/needed? At what level of detail?
It is intended to support the language qualifier but no work has been done on this to date. See other sections and the EdNA Metadata Standard for details of schemes supported.
10. Do you provide any subject description support?
Not at the moment, but negotiations are underway with the owners of thesauri used in education and training in Australia to make documentation of these available online to support the cataloguing of resources in EdNA.
D. Tools
1. Conversion tools
None are currently implemented but details of web resources are currently being imported into EdNA from a range of other databases ("bulk upload"). At the moment this includes just title and description plus EdNA categories. Future versions may include other metadata.
2. Metadata creator tools
A "metadata Wizard" is currently under development to assist users to generate EdNA metadata for inclusion in web pages they own. It will be announced on the meta2 list when it is available.
3. Automatic metadata extraction tools
Details not available.
4. Discovery tools (directory/browsing type)
None.
5. Indexing and search tools
Not developed yet.
6. Other tools
None.
7. Help system, tutorials etc.
None yet except for instructions in the EdNA Metadata Standard.
8. Is the metadata indexed, browsable, searchable, through a local or external service?
The ways in which Dublin Core and EdNA metadata will be used in searching is still being defined. The field EDNA.categories will be used to allocate items to the EdNA Directory allowing browsing of items.
E. Experiences from the planning and production phases
1. Problems and accomplishments from both phases
2. Estimates of the costs and benefits for the creation or use of metadata
No cost benefit analysis has been conducted.
3. Expectation of service improvement through metadata usage
As EdNA is a decentralised system where resources are identified and managed by a wide range of organisations, the use of metadata created and maintained in a decentralised way is seen as the only practical way of cataloguing resources.
Disclaimer
This brief description of EdNA and its use of metadata was developed by Jack Gilding specifically for the benefit of attendees at the DC5 workshop in Helsinki. While other stakeholders in the EdNA project have been requested to comment, due to time constraints in this process it should not be regarded as an 'official' document.
URL: http://www.otfe.vic.gov.au/edna/dc5edna.htm
Last updated: 2 October 1997.
Contact: Jack Gilding
j.gilding@c031.aone.net.au
27. Netpublikationer
Reported by: Ulrik Andersen, ua@si.dk, Danish State Information Service
In Denmark we have implemented most of the Dublin Core metadata elements in the Government WWW-publications published from 1997.
In order to make public information material accessible on WWW and to make the distribution of public information material more efficient, all new publications issued by Danish ministries, government offices and agencies shall be published on WWW, in parallel with the printed editions, from 1997.
To ensure a common format for the WWW-publications, The Danish Ministry of Research and Information Technology and The Danish State Information Service have worked out a standard describing how the publications shall be encoded. Technically, the standard is based on the official HTML 3.2 specifications.
The standard, called Netpublikationer, contains a series of requirements and recommendations. The standard is available at http://www.fsk.dk/fsk/publ/online-pub/.
One of the requirements is that the publications shall contain metadata following a common scheme. This scheme is based on the Dublin Core elements, but is extended in order to provide the additional metainformation required in the database of The Danish State Information Service. The database contains information about all Danish government publications - in print and on WWW.
The DC elements we have implemented are:
1. Title
2. Creator
3. Subject
4. Description
5. Publisher
6. Contributors
7. Date
9. Format
10. Identifier
11. Source
12. Language
13. Relation
Netpublikationer is not available in English. But chapter 4.2 gives an impression of the metadata scheme and syntax used.
In order to make it as easy - and cheap - as possible for civil servants and web companies to produce WWW-publications, we are developing an application (freeware) which can convert "raw" and incorrect HTML documents into correctly formatted documents. This conversion includes creating a correct set of metadata. The software is expected to be released in November 1997.
Netpublikationer is a part of the overall government policy on the use of information technology in modern society. These matters can be studied in a number of publications (in English) at The Danish Ministry of Research and Information Technology.
28. The PANDORA Project
Reported by Bemal Rajapatirana
For more information, see: http://www.nla.gov.au/policy/pandje97.html.
The National Library of Australia (NLA) is developing an electronic archive called PANDORA to provide long term access to significant Australian online publications. A dedicated online archive will be established in the first instance. Publications contained in it will be updated on an ongoing basis and converted to new formats as technology demands.
The PANDORA project has, as its main objective, to:
capture, archive and provide long term access to significant Australian online publications selected for national preservation.
As part of this the PANDORA project will implement a system of describing archived documents based on the Dublin Core attributes, to make online searching for information more efficient.
There are four prongs in the PANDORA project's approach to the diverse subject of metadata that will offer long-term, integrated searching across all of the Library's search systems.
The PANDORA project plans to encourage publishers to provide Dublin Core metadata, either with or within their publications.
For more information on this approach see http://www.ariadne.ac.uk/issue11/metadata/, which gives examples of metadata registries that might be affected by the use of ISO11179.
For further information contact:
Debbie Campbell
PANDORA project officer
National Library of Australia
PARKES ACT 2600
phone: 06 262 1622
fax: 06 257 1703
dcampbel@nla.gov.au
29. Scout Report Signpost
A. Project Description:
Project Name: Scout Report Signpost
Institutional Affiliation: Internet Scout Project/UW-Madison Computer Sciences Department
Contact Person (persons) and email:
Amy Tracy Wells (awells@cs.wisc.edu)
Aimee Glassel (aglassel@cs.wisc.edu)
URL(s) if publicly available: http://www.signpost.org/signpost/index.html
B. Information System Summary:
For more information, please contact us:
Amy Tracy Wells (awells@cs.wisc.edu)
Aimee Glassel (aglassel@cs.wisc.edu)
30. GEM
Reported by: Nancy A. Morgan, GEM Coordinator
September 18, 1997
A. Project Description: The Gateway to Educational Materials (GEM) is an initiative of the U.S. Department of Education and National Library of Education (NLE). NLE's goal is to improve the organization and accessibility of the substantial, but uncataloged, collections of educational materials which are already available on various federal, state, university, non-profit, and commercial Internet sites. In general, these valuable resources are difficult for most teachers to find in an efficient, effective manner. The goal of the Gateway to Educational Materials (GEM) project is to solve this resource discovery problem. For more information, see Proposal: DC-5 GEM Project Presentation
Project Name: Gateway to Educational Materials (GEM)
Institutional Affiliation: U.S. Department of Education, ERIC Clearinghouse on Information & Technology
Contact Person (persons) and email: Nancy A. Morgan, GEM Coordinator
nmorgan@ericir.syr.edu
URL(s) if publically available: http://geminfo.org/
B. Information System Summary:
1. Domain of Content: educational materials
2. Projected Collection Size (number of records)
First year: 10,000
Third year: 50,000
3. Type of data, formats: HTML files
Resource Types
C. Metadata:
1. What are the intended functions of the metadata used?
_X_ Management of the collection
_X_ Enhanced searchability
_X_ Interoperability
_X_ Value-added services
__ Other: ____________________________________________
2. Who is creating the metadata? A consortium of collection holders of educational materials, located on federal, state, university, non-profit, and commercial Internet sites.
What training and support are provided? Cataloging instructions, online help, listservs and technical support. The first group training session was held on September 18, 1997 in Washington, D.C.
3. Which metadata scheme is used?
4. Encoding strategy:
__ external database
_X_ auxiliary HTML files (not embedded in resource)
_X_ metadata embedded in resource
5. Granularity of the description:
_X_ individual documents
__ groups of documents
__ collections
6. Dublin Core strengths and weaknesses encountered in deployment. One "weakness" was the specificity of domain. GEM added 8 elements to the Dublin Core set and developed controlled vocabulary to improve description of education resources. One strength was that we were able to map to a large database of MARC records at Eisenhower National Clearinghouse.
7. Which DC fields are used/not used? All the DC fields were used.
8. Have additional local fields been added? Yes, 8 fields were added to better describe educational resources.
9. Which qualifiers are used/needed?, at what level of detail?
10. Do you provide any subject description support
(uncontrolled term lists, thesauri, classification systems)?
Subject Controlled Vocabulary
See also Cataloging Instructions: Subject
for use of the NICEM and ERIC Thesauri.
D. Tools (provide URLs if freely available):
1. Conversion tools (eg., MARC -> DC)
2. Metadata creator tools
GEMCat: a user-friendly, stand-alone program for cataloging Internet resources using the GEM element set and controlled vocabularies. Current version is for Windows 95 and Windows NT platforms. A platform-independent, web-based version of GEMCat is under development.
3. Automatic metadata extraction tools
4. Discovery tools (directory/browsing type) Prototype search engines: PLWeb (Full-Text/Fielded Search Engine) and a Relational Database Management System Interface
5. Indexing and search tools Prototype search engines: PLWeb (Full-Text/Fielded Search Engine) and a Relational Database Management System Interface
6. Other tools: Metadata Harvesting Script: extracts GEM metadata from html-tagged documents located on UNIX servers. The harvest module is implemented in Perl. Currently under development is a cross-platform version.
7. Help system, tutorials etc. Cataloging instructions, online help, listservs and technical support. The first group training session was held on September 18, 1997 in Washington, D.C.
8. Is the metadata indexed, browsable, searchable, through a local or external service? Yes. See prototypes at: Prototype search engines: PLWeb (Full-Text/Fielded Search Engine) and a Relational Database Management System Interface. We envision competing interfaces to local and union catalogs as consortium members implement GEM.
E. Experiences from the planning and production phases:
1. Describe problems and accomplishments from both phases
2. Do you have estimates of the costs and benefits for the creation or use of metadata?
3. What measures (or expectations) of service improvement through metadata usage do you have?
31. WWW.NIC.FUNET.FI Metadata interface
Institutional Affiliation: Center for Scientific Computing/Finnish University and Research Network
Contact Person (persons) and email: Harri K. Salminen
hks@nic.funet.fi
URL(s) if publically available: http://www.nic.funet.fi/
All kinds of freely distributable objects in the NIC.FUNET.FI archive.
2. Projected Collection Size (number of records)
First year:
50 000
Third year:
Over a million
3. Type of data, formats
The major part of the collection currently consists of archived computer programs and text documents, but the number of multimedia objects of all kinds is growing. Formats include .tar.gz, .zip, .exe, .sea, .hqx, .ps, .txt, .html, .gif, .jpeg, .mpeg, rtp etc. Most of them are listed in ftp://ftp.funet.fi/README.FILETYPES
2. Who is creating the metadata? What training and support are provided?
Mostly volunteer archive admins and software authors in various fields, as well as the CSC staff. Tools for semiautomatic creation and conversion of metadata are being developed so that, for example, author-provided Linux Software Maps can be utilized.
3. Which metadata scheme is used?
Dublin Core with possible local extensions
4. Encoding strategy:
The latter method will be used mostly for HTML pages maintained outside the metadata directory database.
5. Granularity of the description:
The goal is to describe individual objects, but since we have over a million files, having metadata for groups and collections (directories and directory trees) is important as well.
6. Dublin Core strengths and weaknesses encountered in deployment
DC is flexible and allows for multiple instances and many qualifiers, but it doesn't easily support hierarchical definitions and inheritance, so information needs to be duplicated or described outside the DC format. For example, we have currently implemented directory listings in another local format.
The format currently has no support for the special fields needed for file and multimedia objects, such as the contents of a container (.zip, .tar, .rtp stream etc.), file size, requirements for using the file such as the OS or program needed (MIME type isn't enough), and the amount of memory and disk space used after unpacking. For real-time files there is a need for the length in time, exact format descriptions (e.g. SDP) etc. Possibly a URL to an external format-dependent description might be enough.
These belong to the currently unresolved issues.
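The kind of inheritance that DC itself lacks can be handled outside the format. The sketch below is only an illustration of the idea, not our metadb implementation: it assumes that metadata is stored per path, that a file inherits every field from its ancestor directories, and that the nearest definition wins; effective_metadata is a hypothetical name.

```python
import posixpath


def effective_metadata(path, metadata):
    """Merge metadata from the root directory down to `path`.

    metadata: {path: {field: value}}; entries nearer to `path` override ancestors.
    """
    chain = []
    current = path
    while True:
        chain.append(current)
        parent = posixpath.dirname(current)
        if parent == current:  # reached the root (or an empty relative path)
            break
        current = parent

    result = {}
    for ancestor in reversed(chain):  # root first, the object itself last
        result.update(metadata.get(ancestor, {}))
    return result


metadata = {
    "/pub/linux": {"DC.subject": "Linux", "DC.language": "en"},
    "/pub/linux/tools/foo.tar.gz": {"DC.title": "foo", "DC.language": "fi"},
}
print(effective_metadata("/pub/linux/tools/foo.tar.gz", metadata))
```

With this approach the directory-level record is written once, and only the fields that differ need to be stored for each file.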
7. Which DC fields are used/not used?
We allow the use of all predefined DC fields
8. Have additional local fields been added?
Not yet in DC. Our internal metadata format additionally supports an HTML version of the qualifiers that allows for images etc., and also the date and source of the last update for each field, since there can be multiple sources for the same field.
9. Which qualifiers are used/needed?, at what level of detail?
In DC.subject, we plan to use scheme=foldoc (or x-foldoc), and we see the need for most of the other qualifiers as well as some new ones (see question 6). Without qualifiers the DC format would be nearly useless to us.
10. Do you provide any subject description support (uncontrolled term lists, thesauri, classification systems)?
We have just started to provide the Free On-line Dictionary of Computing (FOLDOC) as a free controlled vocabulary suited to describing computer programs. We also have a taxonomic tree, currently used to describe butterflies, plants etc., that is extended as needed. It has its own WWW interface, which is not yet DC compatible.
If we can find suitable free classification systems that meet our needs, we would like to offer them in our user interface. Most seem to be copyrighted and lack a classification that is up to date and granular enough for us.
Currently the only feasible way to classify computer files seems to be inventing your own classification scheme or just using ad hoc hierarchies such as those in the directory trees.
An LSM -> DC converter is being developed.
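A conversion of this kind might look roughly like the Python sketch below. The field mapping in LSM_TO_DC is an assumption chosen for illustration, not the mapping used by the project's converter, and the sample entry is invented. The parser only handles the simple "Key: value" layout with indented continuation lines.

```python
# Assumed mapping from LSM field names to DC element names.
LSM_TO_DC = {
    "Title": "DC.title",
    "Description": "DC.description",
    "Keywords": "DC.subject",
    "Author": "DC.creator",
    "Entered-date": "DC.date",
    "Copying-policy": "DC.rights",
}


def lsm_to_dc(text):
    """Parse one LSM entry and return a dict of DC element -> value."""
    fields = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the Begin/End markers around the entry.
        if not stripped or stripped.startswith("Begin") or stripped == "End":
            continue
        # An indented line continues the previous field's value.
        if line[0].isspace() and current:
            fields[current] += " " + stripped
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        current = key.strip()
        fields[current] = value.strip()
    # Keep only the fields that have a mapping and a non-empty value.
    return {dc: fields[lsm] for lsm, dc in LSM_TO_DC.items() if fields.get(lsm)}


# Invented sample entry, for illustration only.
sample = """Begin3
Title:          exampletool
Version:        1.0
Description:    A hypothetical tool used only
                to illustrate the mapping.
Keywords:       example, demo
Author:         someone@example.org (Some One)
Copying-policy: GPL
End
"""

for element, value in lsm_to_dc(sample).items():
    print(element, "=", value)
```

Fields without an obvious DC counterpart, such as Version or Platforms, are dropped here; they are the kind of information the answer to question 6 says DC does not yet cover.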
2. Metadata creator tools
A web form is still to be done. Some experiments with the DC creator have been done. A simple line-mode tool may be needed as well.
3. Automatic metadata extraction tools
Local Perl programs
4. Discovery tools (directory/browsing type)
The nicdir module for the Apache server produces directory listings and metadata descriptions for the local files. It uses an internal metadb shadow tree with internal metadata formats that contain the DC information as well as the data needed to create the HTML output with metadata efficiently on the fly in various formats. It also requires our local ls command and ftp server. The old non-metadata ftpd and ls are available at ftp://ftp.funet.fi/pub/unix/local/ and we foresee publishing the metadata versions in the future as well. It has been implemented for Digital Unix 4.0.
5. Indexing and search tools
We currently use local locate and archie databases, but for indexing the metadata we plan to use the Nordic Web Index server available in-house (http://nwi.funet.fi).
6. Other tools
A tool that creates various listings and a public locate database from the output of our special ls command. We foresee a need to produce time-sorted lists of URLs recommended for indexing by robots. Current robots do a lot of unnecessary work trying to browse the multiple views of the same information, or they just give up.
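The time-sorted URL list mentioned above could be produced along the lines of the following Python sketch. It walks a directory tree directly instead of reading the output of the special ls command, and the function name, base URL and limit are hypothetical.

```python
import os
from urllib.parse import quote


def recent_urls(root, base_url, limit=100):
    """Return URLs for files under `root`, most recently modified first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Build a URL path relative to the archive root.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            url = base_url.rstrip("/") + "/" + quote(rel)
            entries.append((os.path.getmtime(full), url))
    # Newest modification time first.
    entries.sort(reverse=True)
    return [url for _mtime, url in entries[:limit]]


if __name__ == "__main__":
    # Hypothetical base URL; prints the ten most recently changed files.
    for url in recent_urls(".", "http://example.org/pub", limit=10):
        print(url)
```

Offering robots one canonical URL per file in such a list would avoid the repeated crawling of multiple views that the text describes.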
7. Help system, tutorials etc.
Some documentation on the design, formats and files used.
The rest is mostly documented in the C and Perl source code. It is not yet published, but will be released with the source code.
8. Is the metadata indexed, browsable, searchable, through a local or external service?
Soon, at www.nic.funet.fi (current test version on port 8888).
There were multiple versions of the DC format online, and it was sometimes difficult to find out whether documents referred to the current format or to some other proposed one. Also, the current DC seems to concentrate on describing traditional literary resources and HTML, but not other kinds of media such as computer programs and multimedia objects. Even the BibTeX-derived type list didn't have relevant types for many objects, but I hope that is just temporary. Most of the DC fields seem to be usable in our context, although it takes some guessing to decide which one to use, e.g. how best to describe the mirrors and mirroring frequency of files, or multiple versions of the same object. We also haven't yet seen any freely available tools such as DC parsers, checkers and converters, which might speed up the implementation.
2. Do you have estimates of the costs and benefits for the creation or use of metadata?
We don't have cost estimates, since most of the metadata will be created part time by volunteers, authors and programs.
The main benefit should be easier searching and browsing of the more than a million files stored in the archive. Older methods allowed searches only on filenames, which in many cases aren't descriptive enough to find the relevant files.
3. What measures (or expectations) of service improvement through metadata usage do you have?
We expect it to make finding the relevant files in our archive easier for the inexperienced WWW user who doesn't know the right directory or file to look at, making it less necessary to go overseas looking for the same files. The system should also help in finding the different locations of the same files: we have a large number of mirrors, and in addition to our local directory hierarchy we have multiple possible directory trees for the same category of files, a bit like having multiple sublibraries inside a library, each with a different classification system.
A common metadata format should also make it easier to offer tools and motivation for maintaining the metadata. Also, people might not need to download files just to figure out whether they really want them, making usage more efficient and users happier.
32. Dublin Core Metadata Use in Environment Australia
Institutional Affiliation: Environment Australia
Contact Person (persons) and email: Arthur D. Chapman, arthur@erin.gov.au
The project description is available at:
http://www.environment.gov.au/www-standards/EA_DC.html