Using VR to explore corpora

Virtual Reality lets researchers analyse large corpora in new, smart ways

Researchers in computer science and linguistics presented an interdisciplinary VR project for exploring text and metadata in Nordic tweets at the international linguistics conference ICAME.

At the beginning of June, the 39th ICAME conference took place in Tampere, Finland. At ICAME, Aris Alissandrakis and Nico Reski, researchers in computer science, presented their ongoing research together with linguists Mikko Laitinen, Jukka Tyrkkö and Magnus Levin (from the Department of Languages). Their multidisciplinary research aims to explore multilingual corpora using an interactive Virtual Reality (VR) environment.

In recent years, data visualization has become a major area in Digital Humanities research, and the same holds true in linguistics. The rapidly increasing size of corpora, the emergence of dynamic real-time streams (Twitter, etc.), and the availability of complex and enriched metadata have made it increasingly important to develop new and innovative approaches to presenting and exploring primary data.

At the same time, Virtual Reality (VR), a technology that immerses the user in fully digital environments, is currently one of the hot topics in the IT and technology fields. While best known as an entertainment tool for consumers, it can also be used to investigate scientific scenarios.

At ICAME, the Linnaeus University researchers presented a VR prototype designed to give an easy overview of both textual and metadata features in the Nordic Tweet Stream (NTS) corpus. Aris and Nico conducted two demo sessions in which 26 conference participants wore a VR headset and explored the NTS data using a VR controller.

"The demo generated many positive impressions, and led to interesting conversations throughout the conference about the potential of such tools for research and education purposes", says Aris Alissandrakis, senior lecturer in computer science.

ICAME is short for International Computer Archive of Modern and Medieval English. It is an international group of linguists and data scientists working to digitise existing English texts. This year's theme was "Corpus Linguistics and Changing Society".
