
Juha Hakala, Katri Seppälä, Eero Hyvönen
Finnish thesauri and ontologies on the web – results from the national FinnONTO initiative

Introduction
Research libraries spend a considerable amount of professional staff resources on content analysis. The preferred tools for this work are classifications and controlled vocabularies.
Like most of their peers, Finnish research libraries initially preferred to use classifications. In some countries, libraries opted to develop and use national classification systems; in Sweden, for instance, the SAB classification was used in most libraries (see http://sv.wikipedia.org/wiki/SAB:s_klassifikationssystem). In Finland, research libraries relied on an abridged Finnish version of the UDC. Since the UDC was seen as the system best suited to research libraries, public libraries developed a Dewey-based classification system, which is still in use in every public library except the Helsinki city public library, which has its own system.
Although professionals were able to use classifications both for the description and retrieval of information resources, librarians involved with customer services realized early on that ordinary library users were not able to utilize classifications. Therefore, even though classifications do have benefits, such as language independence in a multilingual country, in the 1980s the decision was taken to develop the Finnish General Thesaurus (Yleinen suomalainen asiasanasto, YSA; see http://onki.fi/en/browser/overview/ysa). The National Library was responsible for this work, which was completed in 1987. Since then the Library has maintained the system in close co-operation with the libraries that use the system.
Maintenance of the Finnish thesauri
In 2010 the Finnish General Thesaurus contained approximately 30,000 terms, of which 5,500 were geographical names. The thesaurus has grown by an average of 1,000 terms a year (including 200 geographical names), and there have also been many changes: notes and term specifications have been added, and occasionally the terms themselves have been changed. Since Finland is a bilingual country, there is also a Swedish translation of the thesaurus. Allärs (Allmän thesaurus) is maintained by the Åbo Akademi library; the work is funded by the National Library and is based on a contract between the two organizations.
The YSA is a general thesaurus, so it is not ideally suited to detailed analysis of scientific content. Research libraries have therefore developed topical (specialized) thesauri for this purpose. These are based on the YSA, and a large proportion of their terms are derived from the general thesaurus; the terms specific to a specialized thesaurus are those needed in a particular field of study (say, forestry or agriculture). Some specialized thesauri, including those for the social sciences and information science, are no longer maintained. Their terms have been integrated into the YSA, and the intention is to maintain the same level of detail in future.
The National Library also coordinates the development of the specialized thesauri. Such cooperation is important, since the people maintaining specialized thesauri often propose new terms to the YSA as well. The development of controlled vocabularies requires familiarity with the scientific literature, and no single person can cover all areas of knowledge. Thus the National Library sees broad co-operation as a vital component of vocabulary development and maintenance.
The expertise of subject specialists in Finland's libraries has enabled the National Library to build a general thesaurus which defines a large number of terms and the relationships between them (such as narrower, broader and related). This helps people, library staff and patrons alike, to describe and search for information. However, as a thesaurus, the YSA does not specify terms and their relationships in sufficient detail to make the vocabulary computer-understandable as well as computer-readable. For this reason, the YSA was not suitable as the basis for the Finnish Semantic Web and had to be restructured into the Finnish General Ontology, YSO.
Building and updating of the Finnish General Upper Ontology
The principles and general framework of the Finnish General Upper Ontology were created by the Semantic Computing Research Group in the FinnONTO project. When the work began in 2004, the whole research group participated in modeling the ontology. Models such as Nicola Guarino's DOLCE and WordNet served as sources of inspiration. Since the research group chose to build the ontology on the terms already in use in the Finnish General Thesaurus, the content of the thesaurus also greatly influenced the work. In order to maintain interoperability with the YSA (and the materials indexed using it), every term in the thesaurus had to be placed somewhere in the hierarchy of the ontology, or otherwise mapped to it.
The practical goal of this work was to construct a simple, light-weight ontology based on a subclass hierarchy. A typical YSO hierarchy branch looks like this:
Each term/concept is identified by a URI; for instance, the URI for the concept "cars" is http://www.yso.fi/onto/yso/p1223. Indexing with the YSO is based on URIs rather than on human-readable labels, which makes the ontology language-neutral, i.e. multilingual. Based on the subclass hierarchy and other semantic relations, ontologies enable reasoning and the automatic semantic enrichment of data. Knowledge structures of this kind have been found useful in various Semantic Web applications.
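To make the idea of URI-based indexing and reasoning over a subclass hierarchy more concrete, here is a minimal Python sketch. Only the "cars" URI comes from the text above; the other URIs (under a hypothetical example.org namespace), the branch itself and the label strings are illustrative assumptions, not the real YSO content. A record is indexed with URIs only, labels are looked up per language when displayed, and "semantic enrichment" is modeled as adding every superclass of the indexed concepts.

```python
from collections import deque

# The "cars" URI is quoted in the article; all other URIs are hypothetical.
CARS = "http://www.yso.fi/onto/yso/p1223"
ELECTRIC_CARS = "http://example.org/hypothetical/electric-cars"
VEHICLES = "http://example.org/hypothetical/vehicles"
MEANS_OF_TRANSPORT = "http://example.org/hypothetical/means-of-transport"

# Direct superclasses of each concept (the subclass-of hierarchy).
SUBCLASS_OF: dict[str, list[str]] = {
    ELECTRIC_CARS: [CARS],
    CARS: [VEHICLES],
    VEHICLES: [MEANS_OF_TRANSPORT],
    MEANS_OF_TRANSPORT: [],
}

# Labels hang off the URIs per language; indexing stores only the URIs.
# The label strings are illustrative, not copied from the published YSO.
LABELS: dict[str, dict[str, str]] = {
    CARS: {"fi": "autot", "sv": "bilar", "en": "cars"},
    VEHICLES: {"fi": "kulkuneuvot", "sv": "fordon"},
}


def superclasses(concept: str) -> set[str]:
    """Return every direct and indirect superclass (transitive closure)."""
    seen: set[str] = set()
    queue = deque(SUBCLASS_OF.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(SUBCLASS_OF.get(parent, []))
    return seen


def enrich(index_terms: set[str]) -> set[str]:
    """Semantic enrichment: add all superclasses of the indexed concepts."""
    enriched = set(index_terms)
    for concept in index_terms:
        enriched |= superclasses(concept)
    return enriched


def label(concept: str, lang: str) -> str:
    """Show a concept in the requested language, falling back to its URI."""
    return LABELS.get(concept, {}).get(lang, concept)


if __name__ == "__main__":
    record = {ELECTRIC_CARS}  # a document indexed with "electric cars"
    print(VEHICLES in enrich(record))  # True: also found by a search for "vehicles"
    print(label(CARS, "sv"))  # bilar
```

Because the record stores URIs rather than Finnish or Swedish strings, the same index serves users in any language for which labels exist, which is the sense in which the ontology is language-neutral.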
Transforming a thesaurus into an ontology involves many tasks. First, new intermediate concepts and semantic relations between concepts were introduced in order to turn the fragmented thesaurus into a systematic, fully connected hierarchy. Second, the broader/narrower term relations used in thesauri had to be refined in order to distinguish between subclass-of and part-of relationships. Third, the conceptual ambiguities of the terms had to be clarified. Many terms in the YSA have a broad meaning, making it impossible to place the concept in a single branch of the ontology. In such cases, the term was typically split into several concepts that were placed in different parts of the ontology. For example, the YSA concept "child" can refer either to a family relationship or to an age group; these meanings can now be found in different parts of the YSO, with a mapping to the original ambiguous YSA term. After any change, the relationships between the split or modified concept and other concepts had to be re-checked.
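The three transformation steps can be sketched with a small data model. This is a hypothetical illustration, not the YSO schema or the tooling FinnONTO used: identifiers with an "ex:" prefix are invented, the `Concept` class and helper functions are assumptions, and the editorial decisions (which BT relation is partitive, how "child" is split) are encoded by hand. The final check mirrors the re-checking step: after edits, every relation must still point at a defined concept.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A light-weight ontology concept (hypothetical model, not the YSO schema)."""
    uri: str
    label: str
    subclass_of: list[str] = field(default_factory=list)  # "is a" relations
    part_of: list[str] = field(default_factory=list)      # "is part of" relations


def refine_broader(narrower: Concept, broader_uri: str, kind: str) -> None:
    """Record a thesaurus BT relation as either subclass-of or part-of."""
    if kind == "subclass_of":
        narrower.subclass_of.append(broader_uri)
    elif kind == "part_of":
        narrower.part_of.append(broader_uri)
    else:
        raise ValueError(f"unknown relation kind: {kind}")


def dangling_references(ontology: dict[str, Concept]) -> list[tuple[str, str]]:
    """Re-check after edits: (concept, target) pairs whose target is undefined."""
    return [
        (c.uri, target)
        for c in ontology.values()
        for target in c.subclass_of + c.part_of
        if target not in ontology
    ]


# Hypothetical identifiers ("ex:" prefix) for illustration only.
ontology = {
    uri: Concept(uri, name)
    for uri, name in [
        ("ex:persons", "persons"),
        ("ex:family-members", "family members"),
        ("ex:age-groups", "age groups"),
        ("ex:hands", "hands"),
        ("ex:fingers", "fingers"),
        ("ex:child-family", "child (family relationship)"),
        ("ex:child-age", "child (age group)"),
    ]
}

# A thesaurus pair "fingers BT hands" becomes part-of, not subclass-of.
refine_broader(ontology["ex:fingers"], "ex:hands", "part_of")

# The ambiguous YSA term "child" is split into concepts in different branches.
refine_broader(ontology["ex:family-members"], "ex:persons", "subclass_of")
refine_broader(ontology["ex:child-family"], "ex:family-members", "subclass_of")
refine_broader(ontology["ex:child-age"], "ex:age-groups", "subclass_of")

# A mapping keeps the link back to the original, ambiguous YSA term.
YSA_TO_ONTOLOGY = {"child": ["ex:child-family", "ex:child-age"]}

if __name__ == "__main__":
    print(dangling_references(ontology))   # []
    print(ontology["ex:fingers"].part_of)  # ['ex:hands']
```

Keeping subclass-of and part-of in separate fields matters for reasoning: a search for "hands" should not return every document about fingers as if fingers were a kind of hand.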
A major method used in the ontology work is concept analysis, which is described, for example, in the standard ISO 704 Terminology work – Principles and methods. In concept analysis, the characteristics of a concept are analyzed in order to identify its essential and delimiting characteristics. These in turn help to identify the nearest generic superordinate concept and the concepts with which it has a relevant associative relation. In terminology work, for which the method was created, the end result of concept analysis is usually a written definition complemented by notes, whereas in light-weight ontologies only the most important concept relationships are documented.
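ISO 704 describes concept analysis in prose, not as an algorithm, so the following Python sketch is only a simplified, set-based model of its core idea, under the assumption that a concept's intension can be written as a set of characteristics and that a subordinate concept inherits all characteristics of its generic superordinate. The concepts and characteristics are hypothetical. Candidate superordinates are those whose characteristics form a proper subset; the nearest ones are the most specific of those; the delimiting characteristics are what the concept adds.

```python
# Intension (set of characteristics) of each concept; hypothetical example data.
CONCEPTS: dict[str, frozenset[str]] = {
    "vehicle": frozenset({"artefact", "used for transport"}),
    "motor vehicle": frozenset({"artefact", "used for transport", "self-propelled"}),
    "road vehicle": frozenset({"artefact", "used for transport", "travels on roads"}),
    "car": frozenset({"artefact", "used for transport", "self-propelled",
                      "travels on roads", "seats a few passengers"}),
}


def nearest_superordinates(name: str) -> list[str]:
    """Generic superordinates whose intension is a proper subset of this
    concept's intension, keeping only the most specific (nearest) ones."""
    target = CONCEPTS[name]
    candidates = [other for other, chars in CONCEPTS.items()
                  if other != name and chars < target]
    return sorted(c for c in candidates
                  if not any(CONCEPTS[c] < CONCEPTS[o] for o in candidates))


def delimiting_characteristics(name: str, superordinate: str) -> frozenset[str]:
    """Characteristics that distinguish the concept from its superordinate."""
    return CONCEPTS[name] - CONCEPTS[superordinate]


if __name__ == "__main__":
    print(nearest_superordinates("car"))  # ['motor vehicle', 'road vehicle']
    print(sorted(delimiting_characteristics("car", "motor vehicle")))
    # ['seats a few passengers', 'travels on roads']
```

In this toy data "car" ends up with two nearest superordinates, one per perspective (how it moves, where it moves); deciding whether to keep both or choose one is exactly the kind of editorial judgment a general ontology requires.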
It is, of course, possible to study a concept from several perspectives. This makes ontology work challenging, especially when building a general ontology (covering several subject fields) that can be used for several different purposes. In such cases, the person responsible for building the ontology has to determine the universal, widely accepted meaning of the concept and be ready for compromises which are often imperfect but can still yield a solution acceptable to several parties.
When creating the principles of the Finnish General Upper Ontology, ontology expertise was crucial. In the long run, when the ontology is updated, it is important to ensure that expertise in concept analysis and ontology is available and that the work of the experts is co-ordinated. This can be done by using a network or working group consisting of experts from relevant subject fields. When experts are consulted and decisions are made together, major errors can be prevented and credibility gained among the users. This working model also reduces the cost of ontology work, as all parties can concentrate on the part of the work they know best.
From the point of view of the expertise required, there is not much difference between thesaurus and ontology maintenance: both succeed or fail depending on the skills of the experts who develop them. However, unlike thesauri, ontologies require a solid IT foundation on which they can be embedded in the Semantic Web. This requires skills that few libraries currently have. In Finland, FinnONTO has removed this bottleneck.
The aims of FinnONTO
The first phase of the FinnONTO initiative began in 2003. The current phase, phase 3, will be completed in early 2012. All the phases of the initiative have been financed by TEKES, the Finnish Funding Agency for Technology and Innovation (http://www.tekes.fi/en/), and by dozens of Finnish public-sector organizations and companies, while the Semantic Computing Research Group (http://www.seco.tkk.fi/), a joint venture of Aalto University and the University of Helsinki, has been mainly responsible for the research and development work. The National Library has been one of the key partners, due to its YSA responsibility, but many other organizations have also been involved in the initiative, which is well known not only in Finland but also internationally. The project has published a large number of research papers, ranging from short articles to academic dissertations (see http://www.seco.tkk.fi/publications/).
The vision of the FinnONTO project is to create a national semantic web infrastructure in Finland based on ontologies. This vision includes the following goals regarding the publishing of ontologies:
1. To enable the development of intelligent web applications. The shift from thesauri to ontologies is a key enabler for this. The main benefit of ontologies is that they specify concepts more precisely for computers, which is crucial, for instance, in semantic search and recommender applications. Thesauri do specify the relationships between terms, too, but these require more human interpretation. Thus, ontologies are needed to establish the machine-processable Semantic Web, also known as the Web of Data.
2. To make data on the web interoperable. Using shared reference ontologies for content description enables interoperability between the contents of different organizations and the public.
3. To make the national ontologies freely available on the Web. A free, centralized Web service, maintained by public funding, will foster interoperability through the usage of the ontologies and save work and money at a national level.
4. Open data, standards, and licensing. Ontologies (and the thesauri they are based on) should be published as open data, based on well-known standards and with business-friendly licensing (such as the MIT License). Moreover, applications with which controlled vocabularies can be maintained and accessed should be made available as open source, since this will foster the use of these systems.
5. To foster and enrich both public and private sector services via the utilization of common infrastructure.
FinnONTO achievements
No project is important just because it has ambitious aims. FinnONTO, however, has managed to achieve its goals, and it therefore deserves its status as the flagship Semantic Web initiative in Finland.
The project has developed the Finnish General Upper Ontology, YSO, its Swedish translation, ALLSO, and a draft English version of the YSO. Moreover, this core set (ca. 25,000 concepts) has been extended, in close co-operation with partner organizations, with 15 other specialized thesaurus-based ontologies. These domain-specific ontologies are now aligned with the YSO and, through it, linked to each other. The result is known as the Finnish Holistic Collaborative Ontology, KOKO, depicted in Figure 1. KOKO currently contains ca. 90,000 concepts, about three times as many as the YSO.
Figure 1. The KOKO ontology consists of the upper ontology, YSO, and 15 special ontologies, such as AFO (forestry and agriculture), MAO (museums), TAO (applied art), and VALO (photography).
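The way KOKO links domain ontologies "through" the YSO can be sketched as a hub index. In this hypothetical Python example each domain concept is aligned to one YSO concept; prefixed names such as "afo:wood" stand in for full URIs and are invented, not real KOKO identifiers, and real alignments may be richer than a single one-to-one mapping. Two domain concepts are treated as linked when they share a YSO alignment.

```python
from collections import defaultdict

# Hypothetical alignments from domain-ontology concepts to YSO concepts.
# Prefixed names stand in for full URIs; none of them are real KOKO identifiers.
ALIGNED_TO_YSO: dict[str, str] = {
    "afo:wood": "yso:wood",
    "tao:wood": "yso:wood",
    "mao:chairs": "yso:chairs",
    "tao:chairs": "yso:chairs",
    "valo:cameras": "yso:cameras",
}


def build_hub_index(alignments: dict[str, str]) -> dict[str, set[str]]:
    """Invert the alignments: YSO concept -> domain concepts aligned with it."""
    index: dict[str, set[str]] = defaultdict(set)
    for domain_uri, yso_uri in alignments.items():
        index[yso_uri].add(domain_uri)
    return dict(index)


HUB_INDEX = build_hub_index(ALIGNED_TO_YSO)


def linked_via_yso(domain_uri: str) -> set[str]:
    """Other domain concepts that share this concept's YSO alignment."""
    yso_uri = ALIGNED_TO_YSO.get(domain_uri)
    if yso_uri is None:
        return set()
    return HUB_INDEX[yso_uri] - {domain_uri}


if __name__ == "__main__":
    print(linked_via_yso("afo:wood"))      # {'tao:wood'}
    print(linked_via_yso("valo:cameras"))  # set()
```

As a rough illustration of why the hub design pays off: aligning each of 15 domain ontologies to the YSO takes 15 sets of mappings, whereas linking every pair of them directly would take 15 × 14 / 2 = 105.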
In addition to general concept ontologies based on thesauri, the project has also developed other kinds of ontologies and vocabularies for places (both historical and contemporary), authorities (people and organizations), historical events, and biological names (taxonomies). These vocabularies have been published on the web in the National Ontology Service (ONKI) for use by both people and computer systems. The popularity of the ONKI service has increased steadily: the number of registered domains grew by 50 per cent from January 2010 to January 2011. The users of the service represent both the public and the private sector. The ONKI service is maintained at Aalto University in a living lab environment, and at present it has some 14,000 individual human users every month, in addition to about 400 registered machine users (through web services).
A project such as FinnONTO can do no more than enable and foster the shift towards ontologies, and many organizations still prefer thesauri. There are many possible explanations for this, including tradition, the lack of support for ontologies in current information systems, a lack of understanding of the benefits of ontologies, or simply a belief that, in spite of all the Semantic Web-related hype, thesauri are still superior to ontologies. Since the deep concept hierarchies of ontologies are important and useful for computers but less so for human beings, some critics have failed to see their usefulness.
Thus in Finland, as in other countries, the shift from thesauri to ontologies is still a work in progress. In some areas of knowledge the process has advanced relatively quickly and painlessly; the old specialized thesauri have already been replaced by ontologies which are, without exception, based on the earlier thesauri. However, the National Library will continue developing the YSA for the time being, since the thesaurus is very widely used for content analysis, and since the YSO is dependent on the YSA: new YSO terms are, and will continue to be, derived from the YSA.
Co-maintenance of the YSA and the YSO is not much of a burden at a practical level. Since the two vocabularies are almost identical at a term / concept level, there is not much additional work to be done. Furthermore, the switch to YSO usage should be relatively painless, once organizations are ready for it. The key issue the users need to consider is the support for the Semantic Web that the ontologies will give – if publishing open linked data is a priority for an organization, then ontology usage should be seriously considered.
Availability of national ontologies on the Internet. As of May 2011, the ONKI ontology library service (see http://onki.fi/) makes 62 vocabularies freely available to users in Finland and abroad. In addition, the service includes various knowledge structures that cannot be shown or published openly due to the licensing conditions of their original publishers (e.g. the ULAN, TGN, and AAT vocabularies of the Getty Research Institute). ONKI supports easy publication of light-weight ontologies (in RDFS or OWL format) and of vocabularies in the W3C standard SKOS format. All the ontologies and thesauri can be browsed using software and a graphical user interface built during the project; feedback from users has been a major driving force in their development. There are also APIs through which other programs can use ONKI as a web service.
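For readers unfamiliar with SKOS, the following Python sketch shows roughly what a multilingual concept looks like when serialized as SKOS in the W3C N-Triples syntax. It uses only the standard library and is not the ONKI API or ONKI's export code: the function names are hypothetical, the labels are illustrative, and the broader URI is an invented example.org placeholder. The SKOS and RDF namespace URIs are the standard W3C ones.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def concept_to_ntriples(uri: str, pref_labels: dict[str, str],
                        broader: tuple[str, ...] = ()) -> str:
    """Serialize one skos:Concept with language-tagged prefLabels and
    skos:broader links as N-Triples (one triple per line)."""
    lines = [f"<{uri}> <{RDF_TYPE}> <{SKOS}Concept> ."]
    # SKOS allows at most one prefLabel per language tag.
    for lang, text in sorted(pref_labels.items()):
        lines.append(f'<{uri}> <{SKOS}prefLabel> "{escape_literal(text)}"@{lang} .')
    for parent in broader:
        lines.append(f"<{uri}> <{SKOS}broader> <{parent}> .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(concept_to_ntriples(
        "http://www.yso.fi/onto/yso/p1223",                # "cars", from the article
        {"fi": "autot", "sv": "bilar", "en": "cars"},      # illustrative labels
        broader=("http://example.org/hypothetical/vehicles",),
    ))
```

A SKOS-aware tool can read such triples without knowing anything about the YSO in advance, which is what makes a shared, standard format valuable for publication.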
To foster and enrich services via the utilization of a common infrastructure. FinnONTO has been instrumental in the development of various information services based on ontologies and open linked data. The project has not only developed ontologies and ontology services but has also demonstrated their use in practical applications on the Semantic Web. Examples include the semantic portals MuseumFinland (http://www.seco.tkk.fi/applications/museumfinland/) and HealthFinland (http://www.seco.tkk.fi/applications/tervesuomi/), both of which won the prestigious international Semantic Web Challenge Award from the research community, and the massive cultural heritage portal and content service "CultureSampo – Finnish Culture on the Semantic Web 2.0" (http://www.seco.tkk.fi/applications/kulttuurisampo/), which is based on the whole range of KOKO ontologies interlinked with various international vocabularies. A common denominator of many of these initiatives is the integration of heterogeneous (meta)data from multiple sources. For the time being, we do not know how these services will be maintained after the project has been completed. Nevertheless, they have brought a significant part of Finnish culture onto the Semantic Web, and it would be a mistake to let them wither away.
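The integration of heterogeneous metadata mentioned above can be sketched as follows. This is a hypothetical Python illustration, not the pipeline used by MuseumFinland or CultureSampo: the two record layouts, the label index and the "ex:chairs" identifier are all invented. Each source gets a small adapter that extracts (language, label) pairs; the labels are then resolved to shared concept URIs, so that records from different sources, described in different languages, end up linked through the same concept. Labels that cannot be resolved are reported rather than guessed.

```python
from collections import defaultdict

# Hypothetical label index: (language, label) -> shared concept URI.
LABEL_INDEX: dict[tuple[str, str], str] = {
    ("fi", "tuolit"): "ex:chairs",
    ("sv", "stolar"): "ex:chairs",
    ("en", "chairs"): "ex:chairs",
}

# Two hypothetical sources with different record layouts.
museum_records = [{"id": "museum/1", "lang": "fi", "keywords": ["tuolit"]}]
library_records = [{"identifier": "library/7", "language": "sv", "subject": "stolar"}]


def from_museum(rec: dict) -> tuple[str, list[tuple[str, str]]]:
    """Adapter for the museum layout: item id plus (language, label) pairs."""
    return rec["id"], [(rec["lang"], kw) for kw in rec["keywords"]]


def from_library(rec: dict) -> tuple[str, list[tuple[str, str]]]:
    """Adapter for the library layout: item id plus (language, label) pairs."""
    return rec["identifier"], [(rec["language"], rec["subject"])]


def integrate(normalized: list[tuple[str, list[tuple[str, str]]]]):
    """Group item ids by concept URI; unknown labels are reported, not guessed."""
    by_concept: dict[str, set[str]] = defaultdict(set)
    unmatched: list[tuple[str, str, str]] = []
    for item_id, labels in normalized:
        for lang, text in labels:
            uri = LABEL_INDEX.get((lang, text.lower()))
            if uri is None:
                unmatched.append((item_id, lang, text))
            else:
                by_concept[uri].add(item_id)
    return dict(by_concept), unmatched


if __name__ == "__main__":
    records = ([from_museum(r) for r in museum_records]
               + [from_library(r) for r in library_records])
    linked, unmatched = integrate(records)
    print(sorted(linked["ex:chairs"]))  # ['library/7', 'museum/1']
    print(unmatched)                    # []
```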
Future
FinnONTO 2.0 will continue until Spring 2012. However, the project team, the National Library and the Ministry of Education are already planning how things will continue. The common aim is to guarantee that both the national ontologies and their technical basis (the ONKI service and its server environment) will be accessible in the future. Since the vocabularies themselves are available for free, their costs must be covered centrally.
Traditionally, the National Library of Finland has worked with other libraries to maintain its thesauri. The extension of the service to ontologies, and the resulting Semantic Web connection, mean that the number of potential partners has grown dramatically. These partners include not only the entire public sector (as required by the public-sector information architecture) but also, for instance, Semantic Web researchers and companies utilizing Semantic Web technologies. Co-ordinating the development work and finding all the interested parties has become a challenging task, but most likely one that the library will relish. The Semantic Web has given new life to controlled vocabularies and created new possibilities, the like of which the National Library could not even dream of in the past. However, developing the YSO and creating a solid basis for the Finnish Semantic Web would have been more difficult, or even impossible, without the libraries' decision, more than 25 years ago, to create the Finnish General Thesaurus.
Juha Hakala, The National Library of Finland
Katri Seppälä, The Finnish Terminology Centre TSK
Eero Hyvönen, Aalto University and University of Helsinki
More information:
FinnONTO-project home page: http://www.seco.tkk.fi/projects/finnonto/