This project was concluded in 2017.
Andreas Kerren (Lnu), Carita Paradis (Lund University) and Magnus Sahlgren (Gavagai AB)
Other researchers/Doctoral students
Maria Skeppstedt, Vasiliki Simaki, Kostiantyn Kucher
Swedish Research Council (Vetenskapsrådet)
Computer Science (Department of Computer Science, Faculty of Technology)
More about the project
The full name of this interdisciplinary project is "Advances in the description and explanation of stance in discourse using visual and computational text analytics", abbreviated StaViCTA.
The project aimed to identify how people express stance and positions (that is, attitudes, feelings, perspectives, certainty, doubt and trust) in digitized web-based media: news and web sites, micro-blogs like Twitter, social media like Facebook, and electronic forums.
Stancetaking is an important factor in communicative dynamics. It not only conveys factual information but also plays a crucial role in social interaction. There is a need both for a deeper theoretical understanding of stance as a phenomenon and for practical methods for analyzing the expression of stance in real language data.
The available analytical methods are simplistic and were essentially developed for small, static data. Our goal was to develop innovative computational analysis and visualization methods for investigating stance in very large and dynamic text data. The methods developed in this project bring together theory, data analysis and information visualization in a unique way, and therefore provide a novel and deeper understanding of the expression of stance in texts.
The project was carried out within the Information and Software Visualization research group.
Popular scientific summary
The StaViCTA project focused on the analysis of stance in online social media written in English, such as blog or Twitter posts. Taking a stance means expressing attitudes, judgments, doubts, trust, or certainty about a specific topic, and analyzing it is crucial for application fields such as crisis management, financial analytics, and business intelligence. Our research aimed to identify the language resources used for such expressions in social media and how they act together over time. In addition, we wanted to investigate the technologies needed to achieve these goals. Consequently, the members of StaViCTA came from different disciplines (linguistics, computational linguistics, and information visualization) in order to exploit synergies, to enable people to make sense of large, dynamic text data, and to allow for exploration, control and final evaluation of the analysis processes and results.
To increase the chances of finding enough stance expressions in social media, we concentrated on political blogs, for example on Brexit. The ten stance categories chosen were AGREEMENT/DISAGREEMENT, CERTAINTY, CONTRARIETY, HYPOTHETICALITY, NECESSITY, PREDICTION, TACT/RUDENESS, SOURCE OF KNOWLEDGE, UNCERTAINTY, and VOLITION. Using these categories, we compiled a gold-standard stance corpus on which we then carried out further analyses, for instance of which stance categories co-occur and which do not. This corpus has been made publicly available. We were then able to confirm that our notional approach was successful in identifying stance-taking in discourse.
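As a rough illustration of a co-occurrence analysis over stance categories, the following minimal Python sketch counts how often pairs of categories are assigned to the same utterance. The annotations and the function name `cooccurrence_counts` are invented for this example; they are not taken from the StaViCTA corpus or tools, and the project's own analysis may have worked differently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotations (not from the actual corpus):
# each utterance carries a set of stance categories.
annotations = [
    {"UNCERTAINTY", "HYPOTHETICALITY"},
    {"CERTAINTY", "PREDICTION"},
    {"UNCERTAINTY", "HYPOTHETICALITY", "PREDICTION"},
    {"NECESSITY"},
]


def cooccurrence_counts(annotated):
    """Count how often each unordered pair of categories shares an utterance."""
    counts = Counter()
    for labels in annotated:
        # Sorting gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence_counts(annotations)

# List the pairs from most to least frequent.
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Pairs that never appear in the same utterance are simply absent from the counter, so looking them up returns zero.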
On the computational side, we developed machine learning classifiers that are specialized for political texts and able to identify the above-mentioned categories in social media texts. We applied a machine learning technique called active learning, which automatically selects useful training samples and then interactively asks a person to provide the correct classification manually. Here, we showed the usefulness of several methods that optimize resource efficiency when collecting training data, implemented them, and made them freely available.
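The selection step of active learning can be sketched in a few lines of Python. The summary does not say which query strategy the project used, so this example assumes one common choice, uncertainty sampling by entropy: the texts whose predicted class distribution is closest to uniform are sent to the human annotator first. The function names and probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_queries(predictions, k=1):
    """Return indices of the k unlabeled texts the classifier is least sure about.

    predictions: one list of class probabilities per unlabeled text.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:k]


# Hypothetical classifier output for four unlabeled sentences:
# probability of [CERTAINTY, UNCERTAINTY] for each one.
pool_predictions = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]

# These indices would be shown to a human annotator for labeling.
to_annotate = select_queries(pool_predictions, k=2)
print(to_annotate)
```

In a full loop, the newly labeled texts would be added to the training set, the classifier retrained, and the selection repeated, so that annotation effort goes to the examples expected to be most informative.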
Interactive visualization helps to bring all these concepts together. It provides users with a tool to access the textual online data effectively, to apply and interpret the classifiers and their results, and to make the process of building training data for the classifiers more efficient and analyzable. Thus, we developed a number of novel, web-based visualization approaches for investigating lexical features of stance phenomena in social media and for supporting text data annotation and classifier training through active learning. Finally, we implemented visualization tools, built on our project results, for specific application areas such as digital humanities.