1 Dec 2017 9:30 AM – 4:15 PM

Conference

Big Data 2017

Welcome to our 3rd annual Big Data Conference at Linnaeus University.

This year's host for the conference is the Linnaeus University Centre for Data Intensive Sciences and Applications (DISA). We have invited researchers from our own Centre to present some of the work that has taken place within DISA this past year, alongside keynote and invited talks from other universities and partners.

When? Friday December 1st 2017 9.30-16.15

Where? Linnaeus University, Building: K, Room: Wicksell, Växjö

About the conference

On Friday December 1st the conference opens with coffee, registration and a poster mingle in the open area of building K. During the day, longer talks with an academic focus will be mixed with poster mingles during meals. We invite everyone who has an interest in Big Data and data-intensive applications. The presentations during the conference will be held in English.

The conference is free of charge, but if you register and don't show up you will be charged a fee that covers food etc.

Programme

9.30 – 10.00 Coffee and registration, poster mingle in the open area of building K

10.00 – 10.10 Welcome and practical information, Mikko Laitinen, Professor of English Linguistics, Linnaeus University

10.10 – 10.30 Opening speech - Welf Löwe, Professor of Computer Science, Linnaeus University

10.30 – 11.30 "When a Few Data Points are Not Enough: On the Emergence of Analytics in Culture Research" Kristoffer Laigaard Nielbo, University of Southern Denmark, Denmark.

11.30 – 12.30 Showcasing DISA - results, Chair: Andreas Kerren, Professor of Computer Science, Linnaeus University

"Graph Layouts by t-SNE" Johannes F. Kruiger, Paulo E. Rauber, Rafael M. Martins, Andreas Kerren, Stephen Kobourov, and Alexandru C. Telea
"Self-organization in the commons: An empirically-tested model" Amineh Ghorbani, Giangiacomo Bravo, Ulrich Frey, and Insa Theesfeld
"Revisiting weak ties: Using present-day social media data in variationist studies" Mikko Laitinen, Jonas Lundberg, Magnus Levin & Alexander Lakaw (all DISA researchers)
"The State of the Art in Sentiment Visualization" Kostiantyn Kucher, Carita Paradis, and Andreas Kerren

12.30 – 14.00 Lunch and poster mingle.

14.00 – 15.00 "BigData@BTH - Establishing a Data Science Research Environment" Håkan Grahn, Professor of Computer Engineering at Blekinge Institute of Technology

15.00 – 16.00 "Exploring complex data sets with generative models" Fabrice Rossi, Professor of applied mathematics at University Paris 1 Panthéon Sorbonne

16.00 – 16.10 Closing remark – future plans for DISA – Welf Löwe, Professor of Computer Science, Linnaeus University.

16.10 -> Grab and go coffee and chance to mingle

Conference speakers

Kristoffer Laigaard Nielbo, datakuben @ Department of History, University of Southern Denmark, Denmark.

"When a Few Data Points are Not Enough: On the Emergence of Analytics in Culture Research"

Digitization and digital media have generated a rapid proliferation of data that is unprecedented in the history of man. This data deluge is transforming knowledge discovery and understanding in every domain of human inquiry. Large-scale computing and data-intensive methods have therefore gained acceptance in most research domains. With a preference for myopic and qualitative approaches to cultural heritage, culture researchers do in many ways represent the antithesis of this development. We insist on manually scrutinizing small sets of cultural expressions and mentally synthesizing our results. A "Culture Analytics" is however emerging as domain experts in history, language, and literature are starting to utilize computation and digital data to test well-established theories and find new cultural patterns. While the need for high-performance computing is still quite limited, culture analytics is changing the scale and scope of multiple disciplines in the social sciences and humanities. This talk will outline the emerging research field of culture analytics with examples from historical, literary and ethnographic research. For culture analytics to prosper, it is necessary to establish lasting collaborations between culture researchers and the computational sciences grounded in mutual interest and understanding. We argue that culture analytics can contribute valuable domain expertise and a human perspective to data-driven research.

Biography: KLN is an associate professor of humanities computing with specialization in tools for analysis and interpretation of cultural data. He has participated in a range of collaborative and interdisciplinary research projects involving researchers from the humanities, social sciences, health science, and natural sciences. His research covers two broad areas: automated text analysis and modeling of cultural behavior. Both areas explore the cultural information space in new and innovative ways by combining cultural data and humanities theories with statistics, computer algorithms, and visualization.

DISA@LNU-Nielbo.pdf

Andreas Kerren, Professor of Computer science, Department of computer science, Linnaeus University

"Initial research findings of the LNUC DISA"

Description: In this session, academically outstanding, previously published work of the various DISA research areas will be presented. For this, four selected talks will pinpoint recent scientific contributions in the fields of data-intensive sciences and applications.

Håkan Grahn, Professor of Computer engineering at Blekinge Institute of Technology

"Establishing a Data Science Research Environment"

Data will be generated at an ever-increasing rate for the foreseeable future. Added value and cost savings can be obtained by analyzing big data streams. The analysis of large data sets requires scalable and high-performance computer systems. In order to stay competitive and to reduce consumption of energy and other resources, the next generation of systems for scalable big data analytics needs to be more resource-efficient. The research project BigData@BTH - Scalable resource-efficient systems for big data analytics combines expertise in machine learning, data mining, and computer engineering to advance the knowledge in the domain. The goal is to establish a long-term sustainable research environment, and the value of it will be demonstrated and evaluated mainly in two application areas (decision support systems and image processing).

For more information about Big Data.

The research interests of Prof. Grahn lie at the intersection of parallel computing and data science. Traditionally, accuracy has been the main objective in machine learning, but we also address aspects such as execution time and energy consumption. Most of our research is done in close collaboration with industrial partners. Currently, he is heading a 6-year research effort on big data analytics, BigData@BTH.

Fabrice Rossi, Professor of applied mathematics at University Paris 1 Panthéon Sorbonne

"Exploring complex data sets with generative models"

Making sense of medium to large data sets remains a very difficult challenge, especially when both the number of objects and the number of instances are large. The classical way of exploring such data sets remains a combination of clustering methods and low dimensional visual representations. Clustering methods are used to group similar objects while low dimensional visual representations enable the analyst to make sense of the relationships between clusters. However, truly high dimensional data sets cannot be represented faithfully in low dimension, a fact that strongly limits the practical usefulness of this standard analysis methodology on modern data sets.

A potential solution is offered by the co-clustering framework in which both objects and variables are summarized. The main advantage of clustering variables rather than trying to build a low dimensional representation is that the former scales easily to complex data with high intrinsic dimension. However, most co-clustering methods cannot handle large data sets or mixed data sets (with numerical and categorical variables).

I will present in this talk a general principle based on grid modeling which can be used in particular to circumvent the limitations of co-clustering and thus to explore medium to large scale data sets. I will first present the general idea of generative modeling, then introduce our non parametric generative model. I will give examples of the way the general idea can be adapted to different settings. The last part of the talk will be focused on the co-clustering case.

Fabrice Rossi is a member of the SAMM laboratory. He leads a research team on statistical learning, statistics and networks, with nine permanent researchers and seven PhD students. He specializes in exploratory data analysis with a special interest in graph data, change detection and visual data exploration. More generally, his research covers numerous important themes of machine learning including large-scale data processing, feature selection, learning theory and clustering.

Fabrice Rossi works frequently with researchers from other fields, especially from the humanities, including archaeology, history and sociology. In 2017, he was guest editor of a special issue on humanities and statistics of the main French statistics journal. He has (co)-authored more than 150 articles in journals and conference proceedings.

DISA@LNU-Fabrice Rossi.pdf

Rafael Martins, Post doc of Computer science, Linnaeus University

"Graph Layouts by t-SNE"

We propose a new graph layout method based on a modification of the t-distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction technique. Although t-SNE is one of the best techniques for visualizing high-dimensional data as 2D scatterplots, t-SNE has not been used in the context of classical graph layout. We propose a new graph layout method, tsNET, based on representing a graph with a distance matrix, which together with a modified t-SNE cost function results in desirable layouts. We evaluate our method by a formal comparison with state-of-the-art methods, both visually and via established quality metrics on a comprehensive benchmark, containing real-world and synthetic graphs. As evidenced by the quality metrics and visual inspection, tsNET produces excellent layouts.

Giangiacomo Bravo, Professor of Sociology, Linnaeus University

"Self-organization in the commons: An empirically-tested model"

An appropriate bottom-up rule system can support the sustainability of common-pool resources (CPRs) such as forests and fisheries. The process that leads to the development of such institutional settings requires the consideration of multiple social, physical, and institutional factors over long time horizons. In this paper, we present the SONICOM model as a general exploratory model of CPR systems. The model can be configured to represent different CPR systems in order to explore what kind of institutional settings result in stable systems, i.e. situations where the resource and the appropriators are in a state of well-being. We use a large-N dataset of CPR management institutions to validate the model. The results show numerous correlations between various parameters of the system, such as rule compliance, social influence and resource growth rate, which help explain the process of institutional emergence and unveil the conditions under which systems are stable.

Mikko Laitinen, Professor of English linguistics, Linnaeus University

"Revisiting weak ties: Using present-day social media data in variationist studies"

This article makes use of big and rich present-day data to revisit the social network model in sociolinguistics. This model predicts that mobile individuals with ties outside a home community and subsequent loose-knit networks tend to promote the diffusion of linguistic innovations. The model has been applied to a range of small ethnographic networks. We use a database of nearly 200,000 informants who send micro-blog messages on Twitter. We operationalize networks using two ratio variables; one of them is a truly weak tie and the other one a slightly stronger one. The results show that there is a straightforward increase of innovative behavior in the truly weak tie network, but the data indicate that innovations also spread under conditions of stronger networks, given that the network size is large enough. On the methodological level, our approach opens up new horizons in using big and often freely available data in sociolinguistics, both past and present.

Kostiantyn Kucher, PhD of Computer science, Linnaeus University

"The State of the Art in Sentiment Visualization"

Visualization of sentiments and opinions extracted from or annotated in texts has become a prominent topic of research over the last decade. From basic pie and bar charts used to illustrate customer reviews to extensive visual analytics systems involving novel representations, sentiment visualization techniques have evolved to deal with complex multidimensional data sets, including temporal, relational and geospatial aspects. This contribution presents a survey of sentiment visualization techniques based on a detailed categorization. We describe the background of sentiment analysis, introduce a categorization for sentiment visualization techniques that includes 7 groups with 35 categories in total, and discuss 132 techniques from peer-reviewed publications together with an interactive web-based survey browser. Finally, we discuss insights and opportunities for further research in sentiment visualization. We expect this survey to be useful for visualization researchers whose interests include sentiment or other aspects of text data as well as researchers and practitioners from other disciplines in search of efficient visualization techniques applicable to their tasks and data.