The National Library of Finland Bulletin 2014


Päivi Piispa

Making the most of digital materials: Interview with Professor Timo Honkela

A new professorship focuses on developing the methods and applications of machine learning and text mining.

Timo Honkela, who took up office as the University of Helsinki's professor of research into digital information at the beginning of this year, is clearly excited about his new position, which allows him to bring information technology and its users together. Honkela studies how digital materials can better serve as resources for researchers and how they can more widely benefit society as a whole.

The new professorship is unique in Finland. It strengthens the language technology expertise of the University of Helsinki's Department of Modern Languages and supports the National Library of Finland's Centre for Preservation and Digitisation, which has digitised millions of pages of old materials to date.

The topic is relevant because more and more of our materials and conversations have become digital. The digital dimension has become embedded in our society on many levels.

"Digital resources will provide a wide range of opportunities to ease people's daily lives and the operations of organisations," Honkela says.

Professor Honkela also believes that the increasingly comprehensive computer analysis of digital materials will help usher in a new era of excellence in research in the humanities and social sciences, disciplines that can especially benefit from a technology-aided analysis of complex phenomena.

"A large number of socially relevant issues are associated with the humanities and social sciences. Humankind has already learned how to fly to the Moon and make paper, but we still don't know how best to organise our societies," Honkela says.

Language learning is a big challenge for a computer

Honkela speaks of technologies, such as text or data mining and machine learning, that enable computers to analyse digital information more comprehensively and in greater detail than people could. It is quite simply impossible to manually process millions of digital documents in a short period of time. Luckily, we can increasingly delegate this task to computers.

Machine learning will lead to the addition of more smart components to information systems. These components can learn things and organise material based on its features. For example, machine-aided translation is based more and more on machine learning. Google Translate, which studies indicate is the world's best machine translation system for many language pairs, is based on billions of documents, which have contributed to the system's ability to translate by finding patterns in an enormous pool of data.

The difference from what went before is astounding: all commands used to be pre-programmed on a computer, which meant that the analysis was based on prior human interpretation. In statistical machine learning, however, machines can be taught to establish relationships between things and find meaning in the data they analyse.
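How a machine might establish such relationships from text alone can be illustrated with a minimal sketch. It is not Honkela's method or any particular system's implementation. It assumes a tiny invented corpus, a hand-picked stop-word list and a simple technique: count which words occur together in the same sentence, then compare the resulting count vectors with cosine similarity. The names `CORPUS`, `STOP_WORDS`, `cooccurrence_vectors` and `cosine` are all hypothetical and chosen for this illustration only.

```python
import math
from collections import Counter, defaultdict

# Tiny invented corpus (an assumption for illustration, not real data).
CORPUS = [
    "the dog eats food and sleeps at home",
    "the cat eats food and sleeps at home",
    "the dog chases the cat",
    "people visit the library to read books",
    "people build the house with bricks",
    "the library stores books",
    "the house has walls and bricks",
]

# Hand-picked function words to ignore (also an assumption).
STOP_WORDS = {"the", "and", "at", "to", "with", "has"}


def cooccurrence_vectors(sentences):
    """For each word, count the other words found in the same sentence."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = [w for w in sentence.split() if w not in STOP_WORDS]
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    vectors = cooccurrence_vectors(CORPUS)
    for pair in [("dog", "cat"), ("dog", "house"), ("dog", "library")]:
        score = cosine(vectors[pair[0]], vectors[pair[1]])
        print(pair, round(score, 2))
```

No rule in the program says which words belong together. Words used in similar contexts simply end up with similar vectors, so two animals that eat and sleep in the same sentences score closer to each other than to buildings. Real systems apply the same idea to vastly larger collections of text and use more refined models.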

"A machine can determine that a dog and a cat are closer to each other in meaning than to a house or a library. The machine develops a semantic independence and is able to establish relationships between things on its own as long as textual material is available," Honkela explains.

The challenge in text mining and machine learning is the complexity of language as a system.

"Language learning is a big challenge for a computer. The majority of the objects that engineers have traditionally modelled are child's play compared to language and language learning," Honkela laughs.

But we should embrace this challenge, for technologies can provide considerable benefits in our use of materials, for example, by enabling searches across languages. A Finn with limited Swedish skills can search for information even if he or she cannot formulate the keywords in Swedish.

"Old Finnish texts from 200 or so years ago could also be translated into contemporary Finnish. We could also compare the development of people's conceptions over the years by having a machine process millions of newspapers published in different countries across the decades and centuries," Honkela proposes.

An emotional interpreter and other applications

Much of the conversation on data mining focuses on negative issues, such as the stories of surveillance by the US National Security Agency (NSA). But Honkela is an optimist and believes that letting machines interpret our digital conversations would be more useful than harmful.

He envisions a kind of interpreter that could prevent misunderstandings, particularly in expert communication. Experts in all fields often have difficulty writing or talking about their field in a way that is understandable to non-specialists. With the help of text mining, an expert's writings could be analysed and made understandable to different target groups. The machine would then generate an alert if the text contained passages that were difficult to understand for the intended reader.

These methods could also serve to improve the operations of organisations. Honkela offers a typical example: the top level of an organisation draws up a strategy, but the way it is formulated disconnects it from the everyday lives and language of the people at the grassroots level. A computer could draw the management's attention to this issue.

"A computer can interpret not only the content of communication, but also the emotional dimension. Studies suggest that people's decisions are based primarily on emotions and only secondarily on explicit conclusions. That's why companies have increasingly begun to analyse customer feedback from an emotional perspective," Honkela explains.

An interpreter of emotions could also prove useful in everyday communications. Many of us are guilty of sending inadvertently inappropriate emails. A computer could alert us to the content of our message and suggest that we revise the email before sending it.

Analysing emotions from text can also help with social analysis.

"Reader discussions on newspaper websites and discussion forums could be studied to identify the issues of primary concern to the public."

Ambassador of humane information technology

Timo Honkela has clearly found his niche. His entire career has focused on applying information technology to meet people's needs and to analyse language. His enthusiasm for this field was born during his studies. The University of Oulu was ahead of its time in teaching students about humane computing in the 1980s, when it combined computer science with ethics, epistemology, economics and psychology. "The studies were not just about machines, but about trying to understand individuals and communities as a system," Honkela says.

After graduating, Honkela worked in a project on language machines hosted by the Finnish Innovation Fund Sitra and understood as early as the 1980s that an information system that understands language cannot be based solely on programmed commands. Such a system requires machine learning, for the number of rules needed to understand language is vast, and the interpretation of language is a subtle and complex process.

After the Sitra project, Honkela worked at the VTT Technical Research Centre of Finland, where he investigated how neural networks can help process language. Then, after transferring to the Helsinki University of Technology and the research group of Teuvo Kohonen, who at the time was Academy Professor but now holds the honorary title of Academician, Honkela defended his doctoral dissertation on the topic.

Honkela's diverse career also includes a professorship at the Media Laboratory of the University of Art and Design Helsinki, the post of CEO at an IT company based on his dissertation, and a fixed-term professorship in computer science as well as a research director position at Aalto University. A common thread running through his career has been the combination of language, socio-cognitive systems and information technology.

Päivi Piispa is a communications professional.

Who? Timo Honkela, Professor of Research into Digital Information, University of Helsinki
(since 1 January 2014)
- The professorship is based at the Department of Modern Languages, Faculty of Arts, University of Helsinki, and the Mikkeli-based Centre for Preservation and Digitisation of the National Library of Finland.
- The professor works in close cooperation with the University of Helsinki's Department of Computer Science, the National Library of Finland, the Mikkeli University Consortium and the Mikkeli University of Applied Sciences.

Professor Timo Honkela

