Towards Multiple Embeddings for Multivariate Network Analysis
Computer and Information Science
Faculty of Technology
Thursday, 19 May 2022, 10:00
Weber, Building K, Växjö.
Daniel Archambault, Swansea University, UK
Associate Professor Arianit Kurti, Linnéuniversitetet
Professor Andreas Kerren, Linnéuniversitetet
Senior Lecturer Ilir Jusufi, Linnéuniversitetet
Professor Welf Löwe, Linnéuniversitetet
The study of multivariate networks (MVNs, i.e., large data sets where data points have relations to other data points, and where both these relations and the points themselves can carry attribute data) is an important task in many different fields, such as social networks for the humanities, citation networks for bibliometrics, and biochemical networks for the life sciences. When dealing with the visualization and analysis of MVNs, many open challenges still exist regarding both computational aspects (i.e., the challenge of computing different metrics of a large-scale MVN) and visual aspects (i.e., the challenge of displaying all the information of a large-scale MVN in a way that is comprehensible to the user). In the search for efficient and scalable visual analytics methods, especially for exploratory data analysis, this thesis explores a novel approach of aspect-driven MVN embedding and the use of ensembles of embeddings for multi-level similarity calculations. Starting from the observation that several different embedding techniques already exist for data types that are common in real-world MVNs, the main question that we try to answer is: “Could the use of multiple embeddings provide new and better solutions for visual analytics on multivariate networks?” This main question then inspires the formulation of four more specific research goals regarding: (1) methods for combining embeddings, (2) the development of a general methodology framework, (3) new visualization methods, and (4) proof-of-concept applications for real-world scenarios.
The focus of our work lies on similarity-based analysis within the domains of bibliometrics and scientometrics, and our first major step is to develop a methodology for combining several different embeddings (for the same underlying data) to improve the quality of similarity calculations. This step includes an adaptation of some of the key ideas from ensemble methods to the field of embeddings, as well as an interactive optimization process for finding the best-performing ensembles. Upon this foundation, we develop an aspect-driven approach that seeks to divide an underlying MVN into separately embeddable aspects, which in turn allows the resulting embedding vectors to be used in flexible analysis scenarios with a high level of interaction. We then proceed to show how the concept of similarity-based analysis can be used to obtain valuable insights into, and a better understanding of, a large set of scientific publications. For this, we introduce the abstract concept of similarity patterns, which we use to express how a specific set of similarity criteria is distributed over a data set. Furthermore, we present proof-of-concept applications that are designed to allow the user to exploit these similarity patterns at different levels of detail. We also show that our proposed methodology is generalizable beyond the scope of MVNs and could therefore be applied to other fields as well.
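To make the idea of combining several embeddings for one similarity calculation more concrete, the following is a minimal illustrative sketch and not the method developed in the thesis. It assumes that each item has one embedding vector per aspect, that similarity within an aspect is measured by cosine similarity, and that the ensemble score is a weighted average of the per-aspect similarities. The function names, the aspect names ("text", "citations"), the vectors, and the weights are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ensemble_similarity(item_a, item_b, weights):
    """Weighted average of per-aspect cosine similarities.

    item_a, item_b: dict mapping aspect name -> embedding vector (assumed layout).
    weights: dict mapping aspect name -> non-negative weight (assumed scheme).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(
        w * cosine_similarity(item_a[aspect], item_b[aspect])
        for aspect, w in weights.items()
    )
    return score / total


# Hypothetical toy data: two publications, each embedded under two aspects.
paper_a = {"text": [1.0, 0.0], "citations": [0.0, 1.0]}
paper_b = {"text": [1.0, 0.0], "citations": [1.0, 0.0]}

# Emphasising the text aspect raises the combined score, because the two
# papers are identical in text but orthogonal in their citation embeddings.
text_heavy = ensemble_similarity(paper_a, paper_b, {"text": 3.0, "citations": 1.0})
balanced = ensemble_similarity(paper_a, paper_b, {"text": 1.0, "citations": 1.0})
```

In an interactive setting, the weights would be the quantity a user adjusts to emphasise one aspect over another. Other combination rules (for example, concatenating vectors before comparison) are equally plausible under the same assumptions.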