Big Data Conference 2024
Welcome to the 10th annual Big Data Conference at Linnaeus University, in Växjö, Sweden.
The host for Big Data 2024 will be the Linnaeus University Centre for Data Intensive Sciences and Applications (DISA). We invite everyone who has an interest in artificial intelligence, big data, and data intensive applications in the sciences, the humanities, in engineering and computing to take part in this event!
Our aim is that the Big Data Conference will bring you both new inspiration from the speakers and updates on results and ongoing research within DISA, as well as from other universities and industry.
AI Literacy Workshop
We also recommend an event which has close connections to the Big Data Conference: the AI Literacy Workshop on September 25, 1-6 pm.
The workshop is fully booked.
Digital Skog (Digital Forest)
We can also recommend another event which has close connections to the Big Data Conference and our DISA Forestry group: Digital Skog: https://vaxjolinnaeussciencepark.se/event/kommande-evenemang/2024-04-10-digital-skog-2024---for-dig-inom-skogsbranschen at Växjö Linnaeus Science Park, Växjö, on September 25-26. The event is mainly in Swedish.
The event is fully booked.
Tech Heads
We can also recommend another event which has connections to our DISA eHealth group: https://techheads.se/ in Kalmar on September 24th.
From previous Big Data Conferences
- Various information, videos etc can be found via the following link: https://lnu.se/en/meet-linnaeus-university/conferences/big-data-conference/
Programme
September 26th (day 1)
9.00 - 9.30 Coffee and registration
9.30 - 9.40 Welcome and practical information
9.40 - 10.20 Keynote 1: Leveraging Time-Series Data Analysis for Classification, Fault Detection, and Sensor Fusion in Smart Industries – Vinicius Prado da Fonseca
10.20 - 10.40 Coffee break
10.40 - 12.00 Session 1: Smart Industry
- Efficient data collection for sustainable Smart Industry – Hatem Algabroun
- An intelligent diagnostic framework based on digital twins and partial transfer learning: methodology and industrial application – Mehdi Saman Azari
- Normalizing Flow Mechanics for Anomaly Detection in Industrial Parts through Visual Inspection – Sebastian Hönel
- Condition monitoring of driveline components – Joel Cramsky
12.00 - 13.00 Lunch
13.00 - 14.30 Session 2: Forestry
- Supporting agricultural subsidy administration utilizing machine learning – Niels Gundermann
- Unpaired Image-to-Image Translation to Improve Log End Identification – Dag Björnberg
- Early detection of the spruce bark beetle infestations, from theory to practical application – Christo van Zyl
14.30 - 14.50 Coffee break
14.50 - 15.30 Keynote 2: Drone based multi-annual methods for bark beetle induced forest disturbance monitoring – Roope Näsi
15.30 - 16.50 Session 3: AI in software development and maintenance
- Harnessing LLMs for Deciphering Commit Intent From Affected Source Code – Sebastian Hönel
- AI-based critical embedded systems: ensuring quality and safety – Maria Ulan
- Mapping source code to software architecture by leveraging Large Language Models – Nils Johansson
- Code generated by Large Language Models – Welf Löwe
16.50 - 17.00 Closing of day 1
19.00 - Conference dinner at Izakaya Moshi (maximum 120 persons).
September 27th (day 2)
8.30 - 9.00 Coffee and registration
9.00 - 9.10 Welcome and practical information
9.10 - 10.40 Session 4: Visual Analytics
- InfraVis - The National Research Infrastructure for Data Visualization – Andreas Kerren
- Navigating Patient Care with Data Visualization – Claudio D.G. Linhares
- Interactive Visualization for Urban Data Analysis – Nivan Ferreira
- Explainable AI through Visual Analytics – Rafael M. Martins
10.40 - 11.00 Coffee and fruit
11.00 - 12.30 Session 5: Health data and AI
- Benefits and challenges of using AI to support decisions in health care – short overview of research at Lnu – Tora Hammar
- Using SweDeClin-BERT model for identifying adverse drug events in clinical notes – Alisa Lincke
- Real-world healthcare data and artificial intelligence – experiences from a large Norwegian hospital – Arian Ranjbar
12.30 - 13.20 Lunch
13.20 - 14.50 Session 6: Health data, Cybersecurity and Synthetic data
- Introduction to possibilities and challenges with health data in Sweden – Tora Hammar
- Telia crowd insights - how people travel through Telia's cellular networks: How do you balance anonymity, data quality and business value in customer-facing data products – Jonas Ahnstedt
- Privacy-preserving data analytics – Approaches and Challenges for End User – Simone Fischer-Hübner
14.50 - 15.10 Coffee
15.10 - 15.50 Keynote 3: Information-driven healthcare, how Halland took the leap – Farzaneh Etminani
15.50 - 16.00 Closing of day 2
Keynotes/Speakers
Keynotes
Farzaneh Etminani: Information-driven healthcare, how Halland took the leap (Keynote 3)
Abstract: Information-driven care means analyzing and drawing conclusions from collected health data in order to achieve a holistic, fact-based picture of healthcare, from the individual to the system perspective. By harnessing the power of information, healthcare organizations can improve decision making and clinical outcomes, enhance patient experiences, and achieve greater efficiency and effectiveness in care delivery. Region Halland aims to provide the right care to the right person at the right time. Achieving this requires proactive, accurate, and accessible healthcare. Challenges posed by an aging population, increased prevalence of chronic diseases, and a shortage of healthcare personnel necessitate improving healthcare quality without additional resources. Part of the solution involves utilizing the vast amount of health-related data collected daily in new ways, for change and value for the patient. We tackle the development of healthcare in collaboration with the surrounding community: academia and the private and public sectors. This talk will showcase a few examples of how artificial intelligence and machine learning can be utilized within the context of information-driven care to address these challenges effectively.
Bio: Farzaneh Etminani holds two positions: AI Strategist at the R&D department (FoU) of Region Halland, and Associate Professor (Docent) in machine learning at the Center for Applied Intelligent Systems Research in Health (CAISR Health), Halmstad University, Sweden. She is Deputy Profile Manager for CAISR Health, a Swedish-funded research profile, and deputy focus area leader for Health Innovation.
She manages the Real-World Evidence (RWE) research projects together with the Health Data Center (HDC), which include several research projects with Region Halland (a regional Swedish healthcare system), analytics companies, and big pharma.
Over the last decade, she has worked on various topics and application areas within Machine Learning (ML), Artificial Intelligence (AI), and Deep Learning (DL), mostly within the healthcare domain. Her main research interest is solving real-world problems for a healthier society with the help of AI and ML, where possible and applicable.
Roope Näsi: Drone based multi-annual methods for bark beetle induced forest disturbance monitoring (Keynote 2)
Abstract: Boreal forests in central and northern Europe are threatened by biotic and abiotic stresses at an increasing rate as a consequence of climate change. The risk of outbreaks by the European spruce bark beetle (Ips typographus L.) has increased in Norway spruce stands. Drones offer a versatile solution for monitoring forest ecosystems. This study aimed to develop and assess an individual tree-based methodology using multi-temporal drone images to track changes caused by the European spruce bark beetle. The approach encompassed four key steps: 1) individual tree detection using point clouds, 2) tree species classification, 3) health classification of spruce trees as healthy, declined, or dead, and 4) change detection, identifying fallen/removed trees and alterations in tree health status. The results demonstrated successful control of the outbreak in the managed stands, evidenced by moderate tree mortality.
Bio: Roope Näsi is a Senior Researcher at the Finnish Geospatial Research Institute (FGI) in the National Land Survey of Finland. He received a BSc and MSc in Geomatics from Aalto University. In 2021, he received a PhD (title: Drone-based spectral and 3D remote sensing applications for forestry and agriculture) from Aalto University based on research conducted at FGI. At the moment, his research focuses mainly on various applications in agriculture and forestry where remote sensing, especially with drones, can be utilized.
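As an illustration of step 4 in the abstract above (change detection), the following is a minimal sketch and not the method used in the study. It assumes that steps 1 to 3 have already produced a health status per tree for two survey dates; the function name, tree IDs, and data are hypothetical.

```python
def detect_changes(before, after):
    """Compare per-tree health status between two drone surveys.

    `before` and `after` map a tree ID to one of the statuses
    "healthy", "declined", or "dead" (hypothetical data layout).
    Trees missing from `after` are reported as fallen/removed.
    """
    changes = {}
    for tree_id, old_status in before.items():
        new_status = after.get(tree_id)
        if new_status is None:
            changes[tree_id] = "fallen/removed"
        elif new_status != old_status:
            changes[tree_id] = f"{old_status} -> {new_status}"
    return changes


# Made-up statuses for four spruce trees in two consecutive surveys.
survey_1 = {1: "healthy", 2: "healthy", 3: "declined", 4: "dead"}
survey_2 = {1: "healthy", 2: "declined", 3: "dead"}
print(detect_changes(survey_1, survey_2))
```

Trees whose status is unchanged are simply left out of the result, so the output lists only the alterations and removals.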
Vinicius Prado da Fonseca: Leveraging Time-Series Data Analysis for Classification, Fault Detection, and Sensor Fusion in Smart Industries (Keynote 1)
Abstract: Join us as we explore the strategic utilization of time-series data analysis in smart manufacturing. This keynote will delve into the applications of time-series analysis for classification, fault detection, and sensor fusion, showcasing its transformative impact on manufacturing efficiency and reliability. Through real-world examples from tactile sensing and fiber industries, discover how leveraging time-series data analysis is used to enhance classification accuracy, detect events, and fuse sensor data. Gain insights into the innovative approaches driving smart manufacturing forward and learn how to apply these principles in your own industrial processes.
Bio: Vinicius Prado da Fonseca obtained his Ph.D. in Electrical and Computer Engineering from the School of Electrical Engineering and Computer Science, University of Ottawa, Canada, and his M.Sc. degree from the Military Institute of Engineering, Rio de Janeiro, Brazil. He is currently an assistant professor in the Department of Computer Science at the Memorial University of Newfoundland, Canada. His research interests include robotic manipulation, intelligent prostheses, visuotactile robotic feedback, fuzzy-logic controllers, artificial intelligence, and applied machine learning.
Session Speakers
Jonas Ahnstedt
Abstract: Telia Crowd Insights uses anonymized and aggregated mobile network data to provide insights about population movement and behavior. The data is collected from mobile devices connected to Telia's network, and then processed to ensure privacy before being analyzed. In this talk, we delve into the challenges of ensuring end-user anonymity while providing high-quality, useful data when moving from static pre-processed data to APIs that allow customers to query the data at custom time and geospatial granularities. We also discuss how to maintain comparability of data over time when seasonality and a pandemic cause drastic shifts in people's behavior and in the number of data points, leading to data loss due to anonymization. This is one of the few talks where a significant part of the audience, as Telia subscribers, is already part of and contributing to the data.
Bio: Jonas Ahnstedt has bachelor's and master's degrees in electrical engineering from BTH and Jönköping University, and is currently a consultant within Cloud Architecture and Data Engineering, where he establishes entirely new, or secures existing, data platforms and organizations for some of the largest companies in Sweden in the banking, retail, telecommunications and manufacturing sectors.
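The trade-off between anonymity and granularity mentioned in the abstract above can be sketched with a simple small-count suppression rule. This is an illustrative assumption, not a description of how Telia Crowd Insights works: the function name, the minimum group size of 5, and the data are all hypothetical.

```python
from collections import Counter


def aggregate_with_suppression(observations, bucket_hours, min_count=5):
    """Count observations per (area, time bucket) and drop small groups.

    `observations` is a list of (area, hour) pairs. Groups with fewer
    than `min_count` members are suppressed to protect anonymity
    (the threshold of 5 is an assumed value for illustration).
    """
    counts = Counter(
        (area, hour // bucket_hours * bucket_hours)
        for area, hour in observations
    )
    return {key: n for key, n in counts.items() if n >= min_count}


# Made-up data: one (area, hour of day) pair per observed device.
observations = [("A", 8)] * 4 + [("A", 9)] * 3 + [("B", 8)] * 6

hourly = aggregate_with_suppression(observations, bucket_hours=1)
coarse = aggregate_with_suppression(observations, bucket_hours=2)
print(hourly)
print(coarse)
```

With one-hour buckets, both groups in area A fall below the threshold and disappear; with two-hour buckets they merge into one group that is large enough to report. Finer query granularity therefore means more data lost to anonymization.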
Hatem Algabroun
Abstract: With the trend of Smart Industry, wireless sensors are gaining growing interest due to their ability to be installed in locations inaccessible to wired sensors. Although great success has already been achieved, energy limitation remains a major obstacle for further advances. Furthermore, when a massive number of sensors are installed to collect huge amounts of data, dealing with such data could be intensive in terms of processing capabilities, communication, memory size, and power consumption. This would diminish the efficiency and performance of these systems, and eventually, their sustainability. As such, it is important to optimize sampling to a sufficient rate to catch important information without excessive energy consumption. Adaptive sampling techniques offer a solution. This talk will explore this technique on different sensors and demonstrate its application in data-driven maintenance.
Bio: Hatem Algabroun is a Senior Lecturer at Linnaeus University in the Department of Mechanical Engineering, Faculty of Technology, in Växjö, Sweden. He holds a PhD in Terotechnology (Maintenance Engineering) from the Department of Mechanical Engineering at Linnaeus University. His research interests include predictive maintenance, maintenance and industrial digitalization, adaptive sampling, software architecture for maintenance systems, and maintenance investment-benefit analysis.
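The adaptive sampling idea from the abstract above can be sketched as follows. This is a minimal illustration under assumed parameters, not the technique presented in the talk: the function name, the halving/doubling rule, the thresholds, and the signal values are all hypothetical.

```python
def adaptive_intervals(readings, base_interval=1.0, min_interval=0.25,
                       max_interval=8.0, threshold=0.5):
    """Return the sampling interval chosen after each new reading.

    If the signal changed by more than `threshold` since the previous
    reading, the interval is halved (sample faster); otherwise it is
    doubled (sample slower, saving energy). The interval always stays
    within [min_interval, max_interval].
    """
    interval = base_interval
    chosen = []
    previous = readings[0]
    for value in readings[1:]:
        if abs(value - previous) > threshold:
            interval = max(min_interval, interval / 2)
        else:
            interval = min(max_interval, interval * 2)
        chosen.append(interval)
        previous = value
    return chosen


# A flat signal followed by a sudden change (made-up sensor values).
signal = [1.0, 1.0, 1.1, 1.0, 3.0, 5.0, 5.1]
print(adaptive_intervals(signal))
```

While the signal is stable the interval grows to its maximum, and it shrinks again as soon as the readings start to move, so energy is spent mainly where there is information to capture.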
Dag Björnberg
Abstract: Tracking timber throughout the supply chain is an important mechanism to prevent illegal logging. Consequently, visual re-identification systems can be employed for tracing the logs through the production chain. However, domain capabilities of such systems are often limited, making them inaccurate for new types of source data captured under new environmental circumstances. In this talk, we demonstrate how unpaired image-to-image translation can be used to enhance the generalization capacity of a log end identification model in the absence of, or in combination with a small amount of, labeled training data.
Bio: Dag Björnberg holds a master's degree in mathematics from LNU and is currently an industrial PhD student at Softwerk AB. He has conducted research in computer vision applications within forestry such as timber tracing, quality assessments of logs and monitoring of plantations from drone images. Since 2024, he has also been involved in the ForestMap project at LNU, which aims to produce forest maps across the globe.
Joel Cramsky
Abstract: Volvo Construction Equipment presents its ongoing research on condition monitoring of driveline components, specifically focusing on a new gearbox and tires. For both the gearbox and the tires, new sensors are tested to capture vibration spectra in the order domain. For the tires, modeling to predict the temperature distribution within the tire is also important. The primary aim is to implement predictive maintenance strategies to reduce downtime and extend the lifespan of these critical components. By continuously monitoring their condition, early signs of wear and potential failures can be detected.
Bio: PhD candidate at Linnaeus University. Has worked in product development for Swedish companies for 15 years. Holds a master's degree in engineering physics from Lund Institute of Technology.
Nivan Ferreira
Abstract: Cities are complex environments that house the majority of the world's population and the urban population is going to continue growing in the following decades. For this reason, an enormous problem faced by governments and urban planners is how to plan for this new surge of people while solving the already challenging scenarios of the present. The explosion in the volume of data about urban environments has opened up opportunities to better inform both policy and administration, thereby helping governments to overcome constant challenges of improving/increasing the quality of public services and promoting sustainable development. By taking advantage of modern computer graphics, analytical techniques, and the power of the human visual system, interactive visualization systems are powerful tools that help to make sense of large data collections. In this talk, I will describe recent efforts in designing visualization systems and techniques that allow analysts to explore and analyze large collections of urban data interactively. These techniques include visual, algorithmic, and pattern mining aspects and have been applied to help domain experts in the fields of urban planning, transportation engineering, and architecture.
Bio: Nivan Ferreira is an Assistant Professor at Universidade Federal de Pernambuco (UFPE) in Brazil. He received a BSc in Computer Science and MSc in Mathematics from UFPE and PhD in Computer Science from New York University. Nivan was also a Post-Doc at the Department of Computer Science at the University of Arizona and visiting professor at Université Paris-Saclay. Nivan’s research focuses on many aspects of interactive data visualization, in particular systems and techniques for analyzing spatiotemporal datasets. Affiliation: Universidade Federal de Pernambuco (Brazil). Homepage: https://www.cin.ufpe.br/~nivan/
Simone Fischer-Hübner
Abstract: Big Data Analytics offers many opportunities, but may also pose privacy risks if personal data are analysed. For protecting privacy and achieving privacy by design in compliance with the GDPR, different types of privacy-enhancing technologies (PETs) for privacy-preserving data analytics have been developed.
This talk provides an overview of such PETs and discusses their technical opportunities and limitations, as well as the challenges of explaining their privacy functionalities to end users to enhance trust.
Bio: Simone Fischer-Hübner has been a Full Professor at Karlstad University since June 2000, where she is the head of the Privacy & Security (PriSec) research group. Moreover, since April 2022 she has been a part-time Guest Professor at Chalmers University of Technology.
She received PhD (1992) and Habilitation (1999) degrees in Computer Science from Hamburg University, where she was an Assistant Professor from 1992 to 2000. Moreover, she was a Guest Professor at Copenhagen Business School in 1994/1995 and Stockholm University / Royal Institute of Technology (KTH) in 1998/1999.
She has been conducting research in privacy, cyber security and privacy-enhancing technologies for more than 35 years. She is a member of the Cybersecurity Council of the Swedish Civil Contingencies Agency (MSB), a board member of the Swedish Data Protection Forum (Forum för Dataskydd), a member of the management board of Cybercampus Sweden and a member of the board for the Privacy Enhancing Technology Symposia (PETS) and for the NordSec conferences. Moreover, she is the Swedish representative of IFIP (International Federation for Information Processing) TC 11 (Technical Committee on Information Security and Privacy Protection) and is the IFIP TC 11 vice chair.
She has been a partner in several EU Horizon cybersecurity and privacy projects including CyberSec4Europe and the EU Marie Curie project Privacy&Us, for which she was the scientific coordinator. Moreover, she coordinates the Swedish IT Security Network (SWITS) and the newly started Swedish industrial graduate school on cybersecurity (SIGS-CyberSec).
She was awarded an Honorary Doctorate by Chalmers University of Technology in 2021 and received the IFIP WG11.11 William Winsborough Award (2016), two Google Research Awards (2011, 2012) and the IFIP Silver Core Award (2001).
Niels Gundermann
Abstract: The administration of agricultural subsidies becomes more difficult due to new and more complex regulations given by different authority levels, i.e. the European Union or local authorities. To cope with the enormous amount of work related to multiple administration processes, there is a need for computer systems capable of assisting authorities with specific tasks. There are many tasks that are solved utilizing deterministic algorithms. However, there are still a lot of tasks that require a large amount of human labour, e.g. in cadastral work based on aerial photographs or when interpreting various documents. The tasks remaining to the officers are often clearly defined and organized in a way that the officers can focus on just a few, but still complex, things when dealing with these tasks. This is a good foundation for applying and evaluating machine learning approaches specialized for these different tasks. Specifically for cadastral work, we evaluated two approaches that contribute to the overall goal of ensuring the correct delineation of agricultural parcels, which is relevant for the calculation of subsidies in a later administrative step. First, we evaluated an approach to identify human-made objects on a parcel that need to be excluded from the eligible area of the parcel. Second, we evaluated an approach to verify the plausibility of an agricultural parcel's geometry by examining the parcel's area according to homogeneous coverage with a single crop group. According to the authorities' experiences when dealing with these tasks in the past, the approach evaluated for the identification of human-made objects resulted in a workload reduction of 50%, while the approach evaluated for the verification of a parcel's geometry is likely to increase the efficiency from 47% to 100%.
Bio:
2011-2013 B.Sc. in Computer Science and Business Management at Nordakademie, Germany
2013-2023 Software Engineer at data experts GmbH
2014-2019 M.Sc. in Practical Computer Science at Fernuniversität Hagen, Germany
Since 2023 Industry PhD student at Linnaeus University in collaboration with data experts GmbH
Tora Hammar
Abstract: Tora will introduce session 5 by giving an overview of research at Linnaeus University on AI to support decisions in health care. The focus will be on the ongoing research about using AI to better predict side effects from medications (adverse drug events, ADEs). It is common for patients to use many different medications simultaneously, which can lead to an increased risk of ADEs. In these projects, health care data from over a decade is used to see how well the current rule-based decision support systems (so-called “expert systems”) can predict ADEs, and how machine learning can be used to improve those predictions. Tora will also describe a new related research project where large language models are used to identify side effects and ADEs in clinical text.
In session 6, Tora will give a short introduction to the possibilities and challenges with health data in Sweden. During the first half of 2024, Tora has been working on a study of how the regions in Sweden work with their health data, to describe the challenges and needs related to secondary use of health data.
Bio: Tora is an associate professor and senior lecturer in health informatics at the eHealth Institute, Linnaeus University. She is a pharmacist with a PhD in biomedical sciences and has been working for almost 15 years with research about informatics and decision support related to medication management. Tora is the research leader for the eHealth group of the Linnaeus University Centre for Data Intensive Sciences and Applications (LnuC DISA, i.e. the host of the Big Data conference). Among other things, she is also part of the European Digital Innovation Hub HDS (Health Data Sweden).
Sebastian Hönel
Abstract:
Session 1: Normalizing Flows (NFs) are a family of deep density estimation models. As such, they allow for exact approximation of and sampling from high-dimensional joint densities. NFs have previously been used to estimate the joint distribution of non-anomalous industrial parts. In visual inspection, the key idea is to gather a comprehensive understanding exclusively from imagery of what constitutes a “good part”, in order to mark out-of-distribution samples or outliers as anomalies. While this has been shown to work well under certain conditions, it currently still suffers from a limited understanding of when a sample should actually be considered an anomaly. In practice, an irreducible residual estimation error remains and leads to an approximation of the true distribution. Worse, generalization makes low-likelihood samples indistinguishable from true out-of-distribution samples, actual anomalies, and samples accepted as part of the desired generalization itself. Another problem is the natural variance found in image datasets (non-constant feature embeddings) that frequently leads to over-/underestimation of the true likelihood. In this talk, we will demonstrate some of the core mechanics of NFs, such as bijectivity (invertibility) between the data- and chosen base-distribution, along with various choices of base distributions and their impact on the model. Furthermore, we will review dynamic anomaly thresholds, conditional and complementary sampling, as well as some current ideas for reducing natural variance in image datasets.
Session 3: Understanding the purpose behind software commits is crucial for enhancing software quality and the development process. We explore the classification of commits using both commit messages and the committed source code, leveraging large language models (LLMs) in the process. Most current LLMs can not only understand natural language, but also a variety of programming languages. The intent behind a commit can be directly derived from the modifications it makes to an existing code base. LLMs have been previously applied to, e.g., extract the commit purpose from the message or to compose appropriate commit messages from the affected source code. Other approaches extract fine-grained code changes, which are used in pre-trained neural language models. In this paper, however, we explore the aptness of readily available (off-the-shelf) state-of-the-art LLMs, both closed- and open-source. Current LLMs have large input capacities and can hence be leveraged to process large quantities of source code. By engineering prompts that combine the commit message and the affected source code, we evaluate the performance of various models on available and previously labeled datasets. We evaluate the predictive power of such models using common metrics by classifying commits into one of the three most common maintenance activities (adaptive, corrective, perfective). Software maintenance activities play an important role in, e.g., software evolution, software quality analyses, (human) resource allocation, or the detection of managerial anti-patterns. By improving the accuracy of automatic commit classification, we enhance confidence in software quality assessments and process evaluations reliant on classification information. Our study contributes to advancing the understanding and utilization of large language models in commit classification, facilitating more informed decision-making and quality improvement in software development processes.
Bio: Sebastian is a postdoctoral fellow in the Computer Science and Media Technology Department at Linnaeus University. He is a former PhD student and part of the Linnaeus University Centre (LNUC) for Data Intensive Sciences and Applications (DISA). Sebastian has a background in software technology. As such, he is part of the research group for Data Intensive Software Technologies and Applications (DISTA). In this role, Sebastian directs attention toward software- and information analysis, which is concerned with, e.g., the qualitative and especially quantitative assessment of software engineering artifacts and related application lifecycle management data. More recently, Sebastian joined the research project for In-Line Visual Inspection Using Unsupervised Learning. In this project, the concern is the automatic assessment of the quality of mass-produced industrial parts. As part of this project, Sebastian scrutinizes methods for Deep Density Estimation, with a special focus on Normalizing Flows. These are a family of generative models that explicitly model likelihood and, therefore, are a potential candidate for quality assessment and anomaly detection in unsupervised and zero/few-shot settings.
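To make the invertibility and likelihood mechanics mentioned in the Session 1 abstract more concrete, here is a toy sketch. It uses the simplest possible flow, a one-dimensional affine map with a standard normal base distribution, which is far from the deep image models discussed in the talk. The measurements, the function names, and the choice of threshold (the lowest likelihood seen among the good parts) are assumptions for illustration only.

```python
import math
import statistics


def fit_affine_flow(samples):
    """Fit the flow x = mu + sigma * z, with z ~ N(0, 1), to 1-D samples."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def log_likelihood(x, mu, sigma):
    """Log-density of x via the change-of-variables formula."""
    z = (x - mu) / sigma                      # invert the flow: data -> base
    log_base = -0.5 * (z * z + math.log(2 * math.pi))
    return log_base - math.log(sigma)         # subtract log |dx/dz|


# Made-up measurements of "good parts" (e.g. a width in millimetres).
good_parts = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
mu, sigma = fit_affine_flow(good_parts)

# Assumed rule: anything less likely than the least likely good part
# is flagged as an anomaly.
threshold = min(log_likelihood(x, mu, sigma) for x in good_parts)


def is_anomaly(x):
    return log_likelihood(x, mu, sigma) < threshold


print(is_anomaly(10.0), is_anomaly(12.0))
```

A fixed threshold like this one is exactly where the difficulty described in the abstract shows up: a sample just below the threshold may be a real defect or merely an unusual good part.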
Nils Johansson
Abstract: Architecture refactoring is a big challenge and requires thorough analysis and labor-intensive, error-prone activities to restructure functionalities from a legacy architecture to a new intended one. Indeed, source code should be adapted to match the new structure. In this context, automatically mapping source code to the intended architecture would significantly reduce manual work and prevent technical debt. To this end, we aim to map methods to architectural modules solely defined by textual descriptions, i.e., formulated as a machine learning text classification problem. Methods are mapped into modules using different approaches. Early results show that vectorizing text and code using large language models outperforms other techniques. The different machine learning classifiers applied perform comparably well, with the best attaining an accuracy of around 40% and an F1-score of around 30%.
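The mapping task in the abstract above can be illustrated with a deliberately simple baseline. Instead of the LLM-based vectorization and trained classifiers described in the talk, this sketch uses bag-of-words vectors and cosine similarity to pick the module whose description is closest to a method's source text. The module names, descriptions, and the example method are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Lowercase bag-of-words vector; camelCase identifiers are split."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return Counter(re.findall(r"[a-z]+", spaced.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_method(method_source, modules):
    """Return the module whose description is most similar to the method."""
    vec = vectorize(method_source)
    return max(modules, key=lambda name: cosine(vec, vectorize(modules[name])))


# Hypothetical architecture: module name -> textual description.
modules = {
    "billing": "create invoice, compute price and tax for an order",
    "auth": "login user, check password and session token",
}
method = "def computeInvoiceTax(order): return order.price * 0.25"
print(map_method(method, modules))
```

Word overlap is a weak signal when code and descriptions use different vocabulary, which is one reason richer vector representations are worth exploring for this problem.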
Andreas Kerren
Abstract: InfraVis is a national research infrastructure funded by the Swedish Research Council (Vetenskapsrådet) in support of scientific advancement through the application of state-of-the-art data analysis and visualization techniques. Our mission is to provide Swedish researchers with visualization services through a distributed and adaptable team of experts, thus elevating their global scientific impact. This talk will provide an overview of the infrastructure, its current state in terms of supported projects and activities, its importance for researchers at Linnaeus University, and how to get in touch.
Bio: Andreas Kerren is a full professor with the Department of Science and Technology, Linköping University, Norrköping, 60233, Sweden, and the Department of Computer Science and Media Technology, Linnaeus University, Växjö, 35195, Sweden. He holds the Chair of Information Visualization at LiU and is head of the ISOVIS group at LNU. He is also an ELLIIT professor supported by the Excellence Center at Linköping–Lund in Information Technology and key researcher of the Linnaeus University Centre for Data Intensive Sciences and Applications. His research interests include several areas of information visualization and visual analytics, especially visual network analytics, text visualization, and the use of visual analytics for explainable AI. He received his Ph.D. degree in computer science from Saarland University, Saarbrücken, Germany, in 2002. Contact him at andreas.kerren@lnu.se.
Claudio D. G. Linhares
Abstract: Physicians are constantly challenged with time constraints and the need for effective decision-making tools in healthcare settings. Despite the digitization of patient records, the process of analyzing electronic health records (EHRs) remains complicated, particularly in resource-constrained environments. In this presentation, we introduce ClinicalPath, a dynamic visualization tool designed to evaluate EHRs and enhance clinical decision-making. Through collaborative efforts with medical experts, ClinicalPath provides a longitudinal view of a patient's clinical journey, facilitating efficient diagnoses and treatment planning. Our validation studies demonstrate the efficacy of ClinicalPath in improving physicians' decision-making processes, allowing for more confident and time-effective patient care.
Bio: Claudio Linhares is a Senior Lecturer at Linnaeus University, at the Department of Computer Science and Media Technology, Faculty of Technology, in Växjö, Sweden.
His research interests include information visualization, network visualization, visual scalability, data science, and human-computer interaction, with applications especially in healthcare.
Alisa Lincke
Abstract: Over the years, researchers have dedicated extensive efforts to develop effective methods and algorithms to detect and recognise adverse drug events (ADEs) within the information stored in electronic health records (EHRs), particularly unstructured data such as clinical notes. The research in the field evolved alongside advancements in natural language processing (NLP) and machine learning techniques. The integration of transformer-based models and deep learning architectures has demonstrated state-of-the-art performance in ADE detection tasks such as medical named-entity recognition (NER) and relation extraction (RE). This study shows preliminary results of using the SweDeClin-BERT model to identify ADEs in clinical notes compared to a traditional machine learning approach.
Bio: Alisa Lincke is a Senior Lecturer at Linnaeus University, at the Department of Computer Science and Media Technology, Faculty of Technology, in Växjö, Sweden. She works with data-driven applications, decision support systems (DSSs), and machine learning, focused mainly on the healthcare sector.
Welf Löwe
Abstract: Large language models (LLMs), such as ChatGPT and Copilot, are transforming software development by automating code generation and, arguably, enabling rapid prototyping, supporting education, and boosting productivity. Therefore, correctness and quality of the generated code should be on par with manually written code. To assess the current state of LLMs in generating correct code of high quality, we conducted controlled experiments with ChatGPT and Copilot: we let the LLMs generate simple algorithms in Java and Python along with the corresponding unit tests and assessed the correctness and the quality (coverage) of the generated (test) code. We observed significant differences between the LLMs, between the languages, between algorithm and test code, and over time.
Bio: Welf Löwe has been a professor in computer science at Linnaeus University since 2002, after PhD studies in Karlsruhe (Germany) and a PostDoc fellowship in Berkeley (California, USA). With a background in parallel programming, compiler construction, and program analysis, his research centers around AI, machine learning, and computer vision as tools to build software. He is the leader of the university's research excellence center DISA (Data Intensive Sciences and Applications) and the industrial research school DIA (Data Intensive Applications). Welf Löwe runs and participates in many interdisciplinary and industrial collaboration projects and leads several courses for professionals in AI and machine learning. In addition to his role as a researcher and teacher, Welf Löwe is an entrepreneur and has co-founded and contributes to different companies.
Rafael M. Martins
Abstract: Artificial intelligence (AI) has revolutionized decision-making processes and enhanced efficiency in various sectors and areas of knowledge and application. However, the adoption of AI models in critical domains such as healthcare, finance, and criminal justice can be hindered by their inherent opacity. The difficulty of comprehending the rationale behind AI-driven decisions can pose significant challenges, including lack of trust, legal implications, and ethical concerns. In response, the concept of Explainable AI (XAI) has emerged as a pivotal area of research, aiming to elucidate the underlying mechanisms of AI models and enhance their interpretability. In this talk I will discuss the synergy between Explainable AI and Visual Analytics, exploring how visualizations can serve as powerful tools for elucidating AI decision-making processes. I will discuss some of my own work and other interesting state-of-the-art developments in the area, with some examples of how these techniques can be applied in practice to foster trust, transparency, and accountability, hopefully bridging the gap between AI sophistication and human understanding.
Bio: Rafael M. Martins is a Senior Lecturer in the Department of Computer Science and Media Technology at Linnaeus University. His research is mainly focused on the combination of interactive visualization and modern unsupervised learning techniques for the investigation, exploration, and interpretation of patterns in large-scale and high-dimensional datasets that emerge from complex real-life systems.
Arian Ranjbar
Abstract: Scandinavia produces a substantial amount of healthcare data. In Norway, the government has explicitly requested the utilization of this data to improve the healthcare sector, both nationally and internationally by exporting data-driven solutions – a phenomenon often summarized as “healthcare data becoming the new oil”. While the quantity of the data is large, there is less discussion about its quality. Additionally, strict regulations in the healthcare space, despite their good intentions, might also hinder innovation.
This talk will highlight the challenges of working with healthcare data and how to overcome them, including issues related to data accessibility and reliability, regulations, and ethics. In particular, it will present our solution at Akershus University Hospital, where we have developed an infrastructure enabling fast development cycles (including clinical trials of algorithms) on large-scale, real-world, multi-modal healthcare data.
Bio: Arian Ranjbar, PhD, holds the position of Head of Research at Medical Technology and E-Health, Akershus University Hospital, where he also leads the research group “Artificial Intelligence and Medical Informatics (AIM)”. He did his PhD at Chalmers University of Technology and University of California, Berkeley, specializing in machine learning implementations for safety-critical systems. His previous research involves statistical modelling, unsupervised learning, confidence estimation in machine learning models, and AI monitoring systems. Today, he works on data-driven healthcare solutions, from infrastructure and data pipelines tailored for healthcare data, to machine learning development.
Mehdi Saman Azari
Abstract: Within Industry 4.0, efficient fault diagnosis plays a pivotal role in predictive maintenance of industrial machinery. However, the challenge lies in the significant domain shift between the source (training) and target (testing) domains, which hampers the application of machine learning in engineering practice. Several approaches based on transfer learning have been proposed to cope with the lack of training data in the target domain and the related domain adaptation challenges. Those approaches leverage the knowledge from similar source domains, including related real-world applications or lab machines. Unfortunately, access to sufficient faulty data from such source domains is often restricted due to an insufficient history of faults in real machines, as well as the difficulty of obtaining labeled datasets from lab machines, which is time-consuming and sometimes unfeasible. To tackle those issues, this study proposes a novel diagnostic framework integrating digital twins and transfer learning to mitigate the limitations posed by insufficient training datasets and domain discrepancies. By leveraging digital twins, training datasets are generated as the source domain, while a model update strategy based on parameter sensitivity analysis is introduced to enhance adaptability. In addition, the partial transfer diagnostic model, incorporating a double-layer attention mechanism, makes it possible to cope with data distribution discrepancies between digital twins and real machines, as well as inconsistencies in label spaces across domains. The diagnostic framework is validated on an industrial rotating machine case study, where faulty behaviors originating from defects on the inner race, outer race, and ball of the bearing are considered. Real data from two publicly available datasets are leveraged.
The results of the experimental analysis have been compared with state-of-the-art methodologies: the proposed approach is able to improve the diagnostic accuracy by over 11% in the specific case study. Therefore, the approach can effectively increase equipment reliability, optimize maintenance, and enhance operational efficiency.
Bio: Mehdi Saman Azari received his B.Sc. degree in Electrical Engineering from the University of Tabriz, Tabriz, Iran, in 2013, and his M.Sc. degree in Control Engineering and Industrial Automation from Tarbiat Modares University, Tehran, Iran, in 2016. He is currently pursuing a Ph.D. in the Department of Computer Science at Linnaeus University, Sweden. His main research interests include domain adaptation, domain generalization, and digital twins, particularly their applicability to the predictive maintenance of cyber-physical systems (CPS).
Maria Ulan
Abstract: Critical Embedded Systems (CES) are systems in which even minor faults can have major consequences. In recent years, the amount of software integrated into CES has grown significantly. The integration of AI, particularly Deep Learning (DL) techniques, has grown exponentially in CES. However, the data-dependent and stochastic nature of DL algorithms conflicts with the traditional functional safety standards that rely on deterministic and verifiable outcomes. In this talk, we will discuss how to design, implement, qualify, and certify DL-based CES software products, bridging the gap between cutting-edge AI advancements and strict functional safety requirements.
Christo van Zyl
Abstract: Early detection of spruce bark beetle (Ips typographus) infestations is crucial for preventing their spread and protecting forests. Our research has led to the development of five computer vision models that identify specific signs of the beetle's lifecycle, enabling immediate detection upon infestation.
In addition, we have created an electronic nose that detects Biological Volatile Organic Compounds (BVOCs), providing an additional layer of confirmation. Our latest innovation is a biosensor that we are developing, further enhancing detection capabilities.
This self-funded project aims to make a measurable positive impact for the 2024 spring/summer season.
Bio: After high school, Christo van Zyl spent five years attempting to join the South African Air Force as a pilot. During this period, he earned a Diploma in Travel & Tourism and later completed a degree in Commerce with a major in Information Systems from the University of Cape Town. Following an ocean crossing in a racing yacht, he was finally accepted into the Air Force.
Christo quickly rose through the ranks, serving as both a helicopter pilot and a fixed-wing pilot instructor. He commanded 19 Squadron, which trained and operated helicopters in various African conflicts, and ultimately became the second-in-command of an Air Force base. He received four medals (one from the United Nations) and was the pilot for three presidents in South Africa.
In August 2022, Christo moved with his family to Sweden, where he rebranded and entered the field of innovation. Through his involvement with Science Park Värnamo and Jönköping, he now leads 'Chaos Out Of Order,' an initiative dedicated to early detection of the spruce bark beetle to mitigate outbreaks in the Northern Hemisphere.
Registration
Programme Committee
- Johan Fransson, Professor
- Jukka Tyrkkö, Professor
- Tora Hammar, Associate Professor
- Welf Löwe, Professor
A Sustainable Event
Big Data 2024 is - of course! - a sustainability-assured meeting in accordance with Linnaeus University’s guidelines for sustainable events. These guidelines are linked to the 17 global goals in Agenda 2030 and comprise the three dimensions of sustainable development: the economic, the social, and the environmental.
Learn more about Linnaeus University’s sustainable events here.