Big Data 2022
Welcome to the 8th annual Big Data Conference at Linnaeus University, in Kalmar, Sweden.
Tora Hammar
On December 1-2, 60 people with an interest in Big Data and data-driven research and development gathered in Kalmar. The conference was organized by the Linnaeus University Centre for Data Intensive Sciences and Applications (DISA).
Participants in the eighth Big Data conference at Linnaeus University learned about different aspects of large data sets and how they can be used to improve decision-making in various fields. The conference included lectures by experts from academia and industry, discussions, poster presentations, and networking opportunities.
"There was a great atmosphere with a high level of chatter, and many new contacts were made that we hope will lead to exciting new interdisciplinary collaborations," says Diana Unander, coordinator for DISA.
Participants learned about research and ideas on how to use big data in areas such as predictive maintenance planning for connected products, forest mapping, social media, humanities and digital art, psycholinguistic research, and financial analysis.
"The great thing about the conference is that researchers from different fields meet and can inspire each other to find new solutions to research problems. It is fascinating how many common denominators there can be between research in, for example, eHealth, forestry and social science," says Tora Hammar, researcher and coordinator of the group within DISA working on eHealth.
The conference days ended with panel discussions on how large data sets and artificial intelligence can improve decision-making, and on the opportunities and challenges of Big Data.
If you missed the conference but want to see the presentations, they will be uploaded to the conference website soon.
To learn more about the research being done within the research centre, visit DISA's website.
------------------------------------------------------------------------------------------
The host for this year's Big Data will of course be Linnaeus University Centre for Data Intensive Sciences and Applications (DISA).
We invite everyone who has an interest in Big Data and data intensive applications to take part in this event!
During the conference you will meet invited speakers from other universities and industry, but of course also learn about results and ongoing research within DISA. Longer talks with an academic or an industry focus will be mixed with poster mingles.
Just as in previous years, all the presentations will be held in English.
Information from previous conferences, videos, etc. can be found here:
https://lnu.se/en/meet-linnaeus-university/conferences/big-data-conference/
Programme
December 1st
9.00–9.30 Registration/coffee
9.30–9.45 Welcome/introduction
9.45–10.30 Predicting optimal maintenance planning using data from connected products - Jonas Tillström, Business Area Manager, Sigma Technology Insight Solutions
10.30–10.50 Coffee break and poster mingle
10.50–11.25 The next generation of forest maps - adapting a Nordic success story across the globe - Anton Holmström, Product Manager at Katam Technologies AB
11.25–12.00 Social-media data for research: opportunities and challenges - Elizaveta Kopacheva, PhD student, Linnaeus University
12.00–13.00 Lunch and poster mingle
13.00–13.30 Digital Methods Platform for Arts and Humanities: Presenting Three Open Educational Resources for the Digital Arts and Humanities - Koraljka Golub, Fredrik Hanell, Jukka Tyrkkö, Daniel Ihrmark, and Ludvig Papmehl-Dufay
13.30–14.00 Big data in psycholinguistic research: A presentation of studies using electrophysiological measures of first and second language processing - Annika Anderson
14.00–14.30 Patterns in the Gibberish. Using Google Speech-to-text in distant reading of Salafi on-line audio sermons in Swedish - Jonas Svensson
14.30–15.00 Coffee break and poster mingle
15.00–15.45 Big data and digital twins: bring a digital copy of yourself with you throughout your health journey - Gunnar Cedersund, docent, Linköping University
15.45–16.45 Panel discussion: How can big data and AI improve decision making in different areas - possibilities and challenges
16.45–17.00 Wrap up, closing remarks
18.00 Guided tour Kalmar castle
19.00 Dinner at Söderport, Kalmar*
Poster session during day 1: Presentation of all DISA groups
For more information on the speakers and their presentations, please look under "Speakers".
December 2nd
8.30–9.00 Registration/coffee
9.00–9.15 Welcome/introduction to the day
9.15–10.05 Model reduction strategies for physics-based Digital Twins and the importance of error control - Christian Engström, Professor in Mathematics, Linnaeus University
10.05–11.00 Data driven orthogonal basis selection for functional data analysis - Krzysztof Podgorski, Professor in Mathematical statistics, Lund University
11.00–11.30 Coffee + sandwich and poster mingle
11.30–12.15 Estimating precision matrices in high-dimensional settings: Principles and applications - Taras Bodnar, Professor in Mathematical statistics, Stockholm University
12.15–13.00 Analysis of High-dimensional data: from nuclear physics to finance - Thomas Holgersson, Professor, Linnaeus University
13.00–14.00 Lunch and poster mingle
14.00–15.00 Panel discussion: Big Data: the Good, the Bad and the Ugly
15.00–15.30 Wrap up and closing remarks
Poster session during day 2: Ongoing or recently finished research projects related to Big Data (information about submission will be announced shortly, PhD-students and junior researchers from DISA will be prioritized)
For more information on the speakers and their presentations, please look under "Speakers".
* For DISA researchers not living in Kalmar it will be possible to get a hotel room in Kalmar between December 1st and 2nd to attend the conference.
Speakers
Jonas Tillström: Predicting optimal maintenance planning using data from connected products
Abstract
Today, the uptime of products such as vehicles or production machines is of high importance for any company that provides transport or manufactures products. The longer a vehicle or machine has to be taken out of ordinary service for maintenance, the higher the cost or loss of income for the company. Optimal planning of maintenance tasks has therefore received considerable attention, in order to reduce both the total time needed for maintenance and the number of times a product has to be taken out of service. Accurate service planning is also of great importance for preventing unexpected breakdowns.
As a result, many manufacturing companies have started to provide different types of maintenance planning based on real usage as a service for their customers. Instead of relying only on traditional factors such as calendar time, mileage, and engine hours, the planning should also include predictions of the health of components in the product, so that they can be repaired or replaced together with regular maintenance tasks, before the product breaks down but not too early.
This presentation will show how we use data collected regularly from a large number of sensors in the products in combination with statistical models and machine learning to predict the optimal points in time for the maintenance sessions.
Bio
Jonas is business area manager for analytics, AI and machine learning at Sigma Technology Insight Solutions in Gothenburg. He has worked in the analytics and business intelligence area since 1994 and has been part of a large number of projects for customers in automotive, manufacturing, retail, healthcare, and many other areas.
Anton Holmström: The next generation of forest maps - adapting a Nordic success story across the globe
Abstract
The three-year research project, a cooperation between universities from Sweden, Finland and Turkey, aims to deliver open forest maps globally. This will be achieved by applying AI algorithms to a combination of data from satellite remote sensing and high-resolution terrestrial data crowdsourced with Katam’s smartphone application.
The background is the successful delivery of open forest data from national lidar scanning of the Nordic countries. The data are now more than ten years old and are being updated with new data collection flights. The open high-resolution data has changed how organisations, not just in forestry, are working. This success, with both economic and societal value, has huge potential if it could be exported to a global scale, but it is held back by high CapEx in hardware and limited access to field plots measured on the ground. This is where Katam comes in, with a solution for efficiently gathering high-quality, high-resolution field data using inexpensive hardware such as smartphones and camera drones.
Bio
Anton Holmström (M.Sc. Forestry) is Product Manager at Katam Technologies AB. Earlier he worked as a business and operations developer at the Swedish Forestry Agency, focusing on implementing the use of drones and AI in the Swedish forestry sector.
Elizaveta Kopacheva: Social-media data for research: opportunities and challenges
Abstract
What types of data can we collect via the APIs of well-known social media platforms (e.g., Twitter, VKontakte)? What research questions can we answer using these data? What limitations will we face? All of these questions are addressed in this talk. I will discuss these issues using three empirical cases of analysing social media data for exploratory, explanatory and predictive purposes. In particular, I will focus on (1) text data, for exploring the dynamics of polarization over the question of migration; (2) popularity metrics (such as the number of likes, views, followers) for explanatory modelling of petition-signing; and (3) social network data for predicting protest participation. In that way, I will present the opportunities for social-science research that have emerged with the growth of social-media popularity. I will also talk about the challenges that a researcher may face when working with social-media data, including those related to data collection, pre-processing, and analysis, as well as ethical considerations. I will finish my talk by emphasising the limitations of both social-media data and the tools for analysing these data. The limitations will be illustrated by the example of ending up with exploratory analytics when aiming for explanatory modelling.
Bio
Elizaveta Kopacheva is a PhD candidate at the Department of Political Science, Linnaeus University. Her research interests lie in the areas of political participation in non-institutionalised activities (such as protesting, petition-signing, political activism), social-media use for political mobilisation, text mining and social network analysis. She participated in several research projects in the fields of protest participation, opinion formation and polarisation, resource dependence and conflict, and electoral behaviour. As a part of the Linnaeus University Centre (Lnuc) for Data Intensive Sciences and Applications (DISA), she is mostly interested in applying techniques of unsupervised and supervised machine learning to social-media data for inference.
Gunnar Cedersund: Big data and digital twins: bring a digital copy of yourself with you throughout your health journey
Abstract
For the last 20 years, Cedersund has developed mechanistic mathematical models for all of the main organs in the human body: heart, liver, fat, brain, etc. Lately, these models have been combined into an interconnected model for the body as a whole. This interconnected model can be made specifically for each individual, and is then called a digital twin. This digital twin technology employs a hybrid approach, which combines the mechanistic simulation models with machine learning and bioinformatics models. This allows a patient, doctor, or other end-user to look inside the body of a patient, as it is now, ranging from the whole-body to the intracellular level. It also allows for simulations of different future scenarios, on timescales ranging from milliseconds to years, and can simulate e.g. the risk of a stroke, depending on the choice of diet, exercise, and certain medications. The models are thus of an M4-nature: multi-level, multi-timescale, mechanistic, and multi-organ. This digital technology is generalizable to future new usages of data and application areas, and thus goes beyond traditional narrow AI, to general, explainable AI. The digital twins are made available for end-users via a backend, which can be connected to a series of different eHealth apps. In this way, the same digital twin will be able to follow a patient across their health journey: from everyday activities like learning, concerts, and exercise, to preventive care, treatment planning, and rehabilitation.
Bio
Gunnar Cedersund (https://liu.se/en/employee/gunce57) heads a cross-disciplinary research group at the Department of Biomedical Engineering (IMT) at Linköping University. The heart of this group (15+ people) does hybrid mathematical modelling, combining machine learning with mechanistic small- and large-scale models. These models are developed using both pre-clinical and experimental data of various types, which are collected both by others within the same group, and by numerous collaborators. These models are made available for preventive and patient-centric care, as well as for drug development and medical pedagogics, using innovative eHealth technologies.
Taras Bodnar: Estimating precision matrices in high-dimensional settings: Principles and applications
Abstract
The estimation of the covariance matrix, as well as its inverse (the precision matrix), plays an important role in many disciplines, from finance and genetics to wireless communications and engineering. In fact, given a suitable estimator for the precision matrix, we are able to construct good estimators for different types of optimal portfolios. Similarly, in array processing, the beamformer, or the so-called minimum variance distortionless response spatial filter, is defined in terms of the precision matrix. In practice, however, the true precision matrix is unknown and a feasible estimator, constructed from data, must be used.
An optimal shrinkage estimator for the precision matrix in high dimensions will be introduced. Recent results from random matrix theory are used to find the asymptotic deterministic equivalents of the optimal shrinkage intensities and to estimate them consistently. The resulting distribution-free estimator has almost surely the minimum Frobenius loss. Additionally, the asymptotic properties of the Frobenius norms of the inverse and of the pseudo-inverse sample covariance matrices are obtained, and they are used to construct a bona fide optimal linear shrinkage estimator for the precision matrix. Finally, an optimal shrinkage estimator for the weights of optimal portfolios will be presented.
Bio
Taras Bodnar received the MS degree in mathematics from the Ivan Franko National University of Lviv, Lviv, Ukraine in 2001, the PhD degree in economics and statistics (summa cum laude) from the European University Viadrina, Frankfurt (Oder), Germany in 2004 and the PhD degree in mathematics and physics from the Taras Shevchenko National University of Kyiv, Kyiv, Ukraine in 2009. He is Professor of mathematical statistics with the Department of Mathematics of Stockholm University, Stockholm, Sweden. He is a Member of the Editorial Boards of Journal of Multivariate Analysis and Theory of Probability and Mathematical Statistics. His research interests include high-dimensional statistics, random matrix theory, Bayesian statistics, and statistical methods in finance, among others. His results have been published in major statistical and financial journals, such as The Annals of Statistics, IEEE Transactions on Signal Processing, Journal of Business and Economic Statistics, Journal of Multivariate Analysis, Scandinavian Journal of Statistics, The European Journal of Operational Research, Journal of Empirical Finance, Computational Statistics and Data Analysis, and Applied Mathematics and Computation.
Krzysztof Podgorski: Data driven orthogonal basis selection for functional data analysis
Abstract
'Big data' are understood as high-dimensional, massive, or both. In fact, one should probably not use this term unless high-dimensionality (complexity) is built into the data. The extreme case of high-dimensionality is infinite-dimensional data, such as infinite sequences and functions. Is there any gain in analyzing high-dimensional data by modeling them as functions?
Functional data analysis is typically performed in two steps: first, functionally representing discrete observations, and then applying functional methods, such as functional principal component analysis, to the so-represented data. While the initial choice of a functional representation may have a significant impact on the second phase of the analysis, this issue has not received much attention in the past. Typically, a rather ad hoc choice of some standard basis, such as Fourier, wavelets, or splines, is used to transform the data. To address this important problem, we present its mathematical formulation, demonstrate its importance, and propose a data-driven method of functionally representing observations. The method chooses an initial functional basis by an efficient placement of the knots. A simple machine learning style algorithm is utilized for the knot selection, and recently introduced orthogonal spline bases (splinets) are then used to represent the data. The benefits are illustrated by examples of analyses of sparse functional data.
Bio
Professor in Statistics, Department of Statistics, Lund University School of Economics and Management. Ph.D. diplomas: Wroclaw University of Science and Technology (Mathematics), Michigan State University (Statistics). Interests: Stochastic Processes, Modern Distribution Theory, Random Matrices, Functional Data Analysis, Gaussian and Non-Gaussian Stochastic Modeling, Extreme Value Analysis, Financial Mathematics.
Christian Engström: Model reduction strategies for physics-based Digital Twins and the importance of error control
Abstract
Partial differential equations are key tools in the modeling of processes in science and engineering. Many models in physics, finance, and biology include fractional derivatives to model long-range structural stochasticity and terms that represent a time delay. Equations that include time delay and fractional derivatives are therefore essential for developing digital replicas (Digital Twins) of our physical world. The development of state-of-the-art digital twins requires numerical simulations of dynamical systems where the input data has uncertainties. It is therefore essential to develop new, very fast methods for parameter-dependent problems that also include error control.
In this talk, we outline the connection between stochastic processes and fractional derivatives. Furthermore, we discuss different model- and data-driven model reduction strategies and the importance of error control.
Bio
Christian Engström is Professor of Mathematics at Linnaeus University. He develops mathematical and numerical analysis that results in robust and very fast numerical methods, which can be used to build digital twins of our world. Fast numerical methods are also a cornerstone in optimization with PDE constraints and uncertainty quantification.
Poster sessions
Two poster sessions will be organized as part of the Big Data Conference 2022, one on each conference day, running during the coffee breaks.
- Poster session during day 1: Presentation of all DISA groups – Each group leader is responsible for submitting a poster for this session to present the group, ongoing research and/or new ideas for research and collaboration.
- Poster session during day 2: Ongoing or recently finished research projects related to Big Data. PhD-students and junior researchers from DISA will be prioritized.
To submit to the second poster session, participants should send a 500-word abstract briefly presenting the research to Diana Unander <diana.unander@lnu.se> by November 20th. Each participant can submit at most one abstract for the session. Since space is limited, you will be notified of acceptance or rejection no later than November 22nd.
For both sessions, the poster itself, presenting the research in more detail, should be sent to the same address by November 25th, 2022 in order to be printed and ready on time.
Program committee
The conference “Big Data 2022” is a sustainability-assured meeting in accordance with Linnaeus University’s guidelines for sustainable events. These guidelines are linked to the 17 global goals in Agenda 2030 and comprise the three dimensions of sustainable development: the economic, the social, and the environmental. Learn more about Linnaeus University’s sustainable events here.