Big Data 2018
Welcome to our 4th annual Big Data Conference at Linnaeus University.
This year's host for the conference is Linnaeus University Centre for Data Intensive Sciences and Applications (DISA). On day 1, researchers from our own Centre will present some of the work that has taken place within DISA over the past year, alongside keynotes and invited talks from other universities and partners. Day 2 will focus more on workshops and seminars.
When? From Thursday November 29th 2018 at 9.30 to Friday November 30th 2018 at 16.00
Where? Room Myrdal (K Building), Linnaeus University, Växjö
About the conference
On Thursday November 29th the conference opens with coffee, registration and a poster mingle in the open area. During the day, longer talks with an academic focus will be mixed with poster mingles during meals. We invite everyone with an interest in Big Data and data-intensive applications. All presentations during the conference will be held in English. In the evening there will be a social activity.
On Friday November 30th there will be a focus on seminars and workshops.
Thursday November 29th 2018
09.30 – 10.00: Coffee and registration, poster mingle
10.00 – 10.30: Welcome and practical information + Opening speech - Welf Löwe, Professor of Computer Science, Linnaeus University
10.30 – 11.15: ”Swedish National Data Service – possibilities for research and examples of projects” - Max Petzold, Professor and Director of the Swedish National Data Service
11.15 – 12.00: The Z Garbage Collector – Erik Österlund, Oracle
12.00 – 13.30: Lunch and poster mingle
13.30 – 14.15: Presenting DISA results:
- 13.30-13.35: Tora Hammar
- 13.35-13.45: Automatic Classification Using DDC on the Swedish Union Catalogue – Koraljka Golub, Associate Professor, Linnaeus University
- 13.45-13.55: Efficient Dynamic Time Warping for Big Data Streams – Rafael Messias Martins, Assistant Professor, Linnaeus University
- 13.55-14.05: The effect of publishing peer review reports on referee behavior in five scholarly journals – Giangiacomo Bravo, Professor, Linnaeus University
- 14.05-14.15: Applying Self-Adaptation to Automate the Management of Online Documentation of Telecom Systems – Morgan Ericsson, Associate Professor, Linnaeus University
14.15 – 15.00: Digital transformations in the humanities - Mikko Tolonen, Professor, Helsinki University
15.00 – 15.30: Coffee
15.30 – 16.15: Efficient customer support service using AI - Anton Borg, Assistant Professor, Blekinge Institute of Technology
16.15 – 16.30: Closing remarks for the day – Welf Löwe, Professor of Computer Science, Linnaeus University
Practical information - Tora Hammar
18.00: Dinner/social activity
Friday November 30th 2018
09.00 – 12.00: Seminars/workshops in parallel sessions, with coffee at some point between 10.00 and 11.00:
1: In Room K1073: Virtual Reality and Digital Humanities - Mikko Laitinen, Aris Alissandrakis, Nico Reski
2: In Room Myrdal:
09.00-10.00: Workshop/lecture on funding possibilities in EU's Horizon 2020 programme - Annett Wolf and Anthony Scully, EU Advisors from Grants and Innovation Office
10.30-12.00: Workshop: ”eHealth here and there – past, now and future – reflections from 10 years at the eHealth Institute and revisiting Silicon Valley in 1985/2018” - Göran Petersson, Senior Professor Health Informatics, Physician, eHealth Institute, Dept of Medicine and Optometry, Linnaeus University
3: In Room K1074:
10.30-12.00: DISA computing as a service - Morgan Ericsson, Associate Professor Linnaeus University
12.00 – 13.00: Lunch
13.00 – 14.30: Discussion within DISA - what is going on and what do we need during the next year? – Welf Löwe, Professor of Computer Science, Linnaeus University
14.30 – 14.45: Closing remarks – future plans for DISA – Welf Löwe, Professor of Computer Science, Linnaeus University
Practical information - Tora Hammar
14.45 – 15.00: Coffee (available to go)
"Efficient customer support service using AI"
Over the past year we have been working together with an industrial partner to investigate how to aid their customer support division. The customer support division handles e-mail-based support errands for multiple customers across the country, covering around 40 different topics, from invoice-related errands to technical support. Being able to automatically identify the content of each support e-mail allows specialization of the support personnel, further enabling a more efficient customer […] partner are presented, some of which have been implemented.
Anton Borg, Assistant Professor in Computer Science at Blekinge Institute of Technology.
Giangiacomo Bravo, Francisco Grimaldo, Emilia López-Iñesta, Bahar Mehmani and Flaminio Squazzoni:
"The effect of publishing peer review reports on referee behaviour in five scholarly journals"
This paper examines the effect of publishing peer review reports on referee behavior in five scholarly journals involved in a pilot study at Elsevier. Considering 9,220 submissions and 18,525 reviews from 2010 to 2017, we measured changes both before and during the pilot and found that publishing reports did not significantly compromise referees' willingness to review, their recommendations, or their turn-around time. Younger and non-academic scholars were more willing to accept invitations to review and provided more positive and objective recommendations. Male referees tended to write more constructive reports during the pilot. However, only 8.1% of referees agreed to reveal their identity in the published report. This suggests that open peer review does not compromise the process, provided that referees can protect themselves through anonymity.
Applying Self-Adaptation to Automate the Management of Online Documentation of Telecom Systems
Engineering software-intensive systems, such as production systems, is complex because these systems are subject to various types of changes that are often difficult to anticipate before deployment. Tackling this complexity requires joint expertise from different backgrounds. In this paper we focus on the problem of maintaining online technical documentation of telecom systems. In the context of continuous deployment and ever-changing user needs, high-quality documentation of such products is a key concern for users. To tackle this problem, experts from different fields worked together to equip the online documentation system with a feedback loop. This feedback loop tracks changes in the system and its context and automatically adapts the documentation accordingly. The results demonstrate that this self-adaptation approach offers a viable solution to the maintainability problem of online documentation of telecom systems.
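The abstract does not describe how the feedback loop is built. As a purely illustrative sketch, the Python below structures such a loop along the MAPE-K pattern (Monitor, Analyze, Plan, Execute over shared Knowledge), a common reference model for self-adaptive systems; whether the authors used this pattern is an assumption. The `Knowledge` fields, the page-to-component mapping, the `REMOVED` marker and the version strings are all hypothetical, and "regenerating" a page is reduced to recording the version it now describes.

```python
from dataclasses import dataclass, field

REMOVED = "removed"  # hypothetical marker for a component that is no longer deployed


@dataclass
class Knowledge:
    """Shared model of the loop (hypothetical): which page documents which component."""
    page_to_component: dict[str, str]
    documented_version: dict[str, str]  # page -> component version the page was written for
    log: list[str] = field(default_factory=list)


def monitor(deployed: dict[str, str]) -> dict[str, str]:
    """Monitor: take a snapshot of the currently deployed component versions."""
    return dict(deployed)


def analyze(k: Knowledge, observed: dict[str, str]) -> list[str]:
    """Analyze: find pages whose component version differs from the one they describe."""
    return sorted(
        page
        for page, comp in k.page_to_component.items()
        if observed.get(comp, REMOVED) != k.documented_version.get(page)
    )


def plan(k: Knowledge, stale: list[str], observed: dict[str, str]) -> list[tuple[str, str]]:
    """Plan: one 'regenerate page for version v' action per stale page."""
    return [(page, observed.get(k.page_to_component[page], REMOVED)) for page in stale]


def execute(k: Knowledge, actions: list[tuple[str, str]]) -> None:
    """Execute: apply the actions and update the knowledge so the loop converges."""
    for page, version in actions:
        k.documented_version[page] = version  # stand-in for actually regenerating the page
        k.log.append(f"updated {page} -> {version}")


def feedback_loop_iteration(k: Knowledge, deployed: dict[str, str]) -> list[tuple[str, str]]:
    """Run one Monitor-Analyze-Plan-Execute cycle and return the actions taken."""
    observed = monitor(deployed)
    actions = plan(k, analyze(k, observed), observed)
    execute(k, actions)
    return actions


k = Knowledge(
    page_to_component={"install.html": "controller", "api.html": "oam-api"},
    documented_version={"install.html": "1.0", "api.html": "2.3"},
)
print(feedback_loop_iteration(k, {"controller": "1.1", "oam-api": "2.3"}))  # [('install.html', '1.1')]
print(feedback_loop_iteration(k, {"controller": "1.1", "oam-api": "2.3"}))  # []
```

A second iteration with an unchanged deployment takes no action, which is the property one would want from a loop that runs continuously alongside continuous deployment.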
Automatic Classification Using DDC on the Swedish Union Catalogue
As more and more digital collections of various information resources become available, the challenge of assigning subject index terms and classes from quality knowledge organization systems is also increasing. While the ultimate purpose is to understand the value of automatically produced Dewey Decimal Classification (DDC) classes for Swedish digital collections, the paper aims to evaluate the performance of two machine learning algorithms on Swedish catalogue records from the Swedish union catalogue (LIBRIS). The algorithms are tested on the top three hierarchical levels of the DDC. Based on a data set of 143,838 records, the evaluation shows that a Support Vector Machine with a linear kernel outperforms the Multinomial Naïve Bayes algorithm. Using keywords, or combining titles and keywords, also gives better results than using only titles as input. Class imbalance, where many DDC classes have only a few records, greatly affects classification performance: 81.37% accuracy on the training set is achieved when at least 1,000 records per class are available, and 66.13% when only a few training records are available. Proposed future research involves exploring how the intellectual effort put into creating the DDC can be used to further improve algorithm performance, as is commonly done in string matching, and testing the best approach on new digital collections that do not have DDC classes assigned.
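To make the comparison more concrete, here is a minimal from-scratch sketch of multinomial Naïve Bayes, the weaker of the two algorithms in the study, applied to catalogue records whose title and keywords are combined into one bag of words. It is not the authors' pipeline: the record format, the whitespace tokenization and the smoothing parameter `alpha` are assumptions, and the DDC labels and Swedish example records are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(record: dict) -> list[str]:
    """Combine title and keywords into one bag of lowercase tokens (hypothetical record format)."""
    text = record.get("title", "") + " " + " ".join(record.get("keywords", []))
    return text.lower().split()


class MultinomialNB:
    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive (Laplace) smoothing for unseen class/token pairs

    def fit(self, records: list[dict], labels: list[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)            # number of records per class
        self.token_counts = defaultdict(Counter)       # class -> token -> count
        self.vocab = set()
        for rec, label in zip(records, labels):
            tokens = tokenize(rec)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        return self

    def log_posterior(self, record: dict, c: str) -> float:
        """log P(c) + sum of log P(token | c), up to a constant shared by all classes."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)
        denom = self.total_tokens[c] + self.alpha * len(self.vocab)
        for tok in tokenize(record):
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.token_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, record: dict) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(record, c))


train = [
    {"title": "Fåglar i Sverige", "keywords": ["fåglar", "ornitologi"]},
    {"title": "Svenska sjöfåglar", "keywords": ["fåglar", "sjöfåglar"]},
    {"title": "Svensk folkmusik", "keywords": ["musik", "folkmusik"]},
    {"title": "Musik för piano", "keywords": ["musik", "piano"]},
]
labels = ["598", "598", "780", "780"]
clf = MultinomialNB().fit(train, labels)
print(clf.predict({"title": "Ugglor", "keywords": ["fåglar"]}))    # 598
print(clf.predict({"title": "Pianomusik", "keywords": ["musik"]}))  # 780
```

With very few records per class, the smoothed token probabilities are dominated by `alpha`, which gives one intuition for why the abstract reports much lower accuracy for sparsely populated DDC classes.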
Rafael M. Martins and Andreas Kerren:
"Efficient Dynamic Time Warping for Big Data Streams"
Many common data analysis and machine learning algorithms for time series, such as classification, clustering, or dimensionality reduction, require a distance measure between pairs of time series in order to determine their similarity. A variety of measures can be found in the literature, each with its own strengths and weaknesses, but the Dynamic Time Warping (DTW) distance measure has occupied an important place since its early applications to the analysis and recognition of spoken words. The main disadvantage of the DTW algorithm, however, is its quadratic time and space complexity, which limits its practical use to relatively small time series. This issue is even more problematic for streaming time series that are continuously updated, since the analysis must be re-executed regularly and under strict running-time constraints. In this paper, we describe enhancements to the DTW algorithm that allow it to be used efficiently in a streaming scenario by supporting an "append" operation for new time steps with linear complexity when an exact, error-free DTW is needed, and with even better performance when either a Sakoe-Chiba band is used or a "sliding window" is the desired range for the data. Our experiments with one synthetic and four natural data sets show that the approach outperforms other DTW implementations and that its potential errors are, in general, much lower than those of another state-of-the-art approximate DTW technique.
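Classic DTW fills a cost matrix with the recurrence D(i, j) = |x_i - y_j| + min(D(i-1, j), D(i, j-1), D(i-1, j-1)), where D(0, 0) = 0 and the rest of row 0 and column 0 is infinite; filling every cell is where the quadratic cost comes from. The Python below is a minimal sketch of that recurrence with an optional Sakoe-Chiba band (only cells with |i - j| <= `band` are filled), plus a simple incremental variant in which appending one value to either series fills a single new row or column, i.e. work linear in the length of the other series. This is a generic illustration, not the authors' algorithm: it keeps the whole matrix, so memory stays quadratic, it assumes absolute difference as the local cost, and it does not implement the sliding-window mode.

```python
import math

INF = math.inf


def dtw(x, y, band=None):
    """Classic O(len(x) * len(y)) DTW with |a - b| cost and an optional Sakoe-Chiba band."""
    n, m = len(x), len(y)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        lo, hi = 1, m
        if band is not None:  # restrict to cells with |i - j| <= band
            lo, hi = max(1, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]  # stays INF if the band cannot reach the corner cell


class StreamingDTW:
    """Keeps the full DTW matrix so each append costs O(length of the other series)."""

    def __init__(self):
        self.x, self.y = [], []
        self.D = [[0.0]]  # D[0][0] = 0; cells added later in row 0 and column 0 are INF

    def _cell(self, i, j):
        cost = abs(self.x[i - 1] - self.y[j - 1])
        return cost + min(self.D[i - 1][j], self.D[i][j - 1], self.D[i - 1][j - 1])

    def append_x(self, value):
        """New time step for x: compute one new row of the matrix."""
        self.x.append(value)
        i = len(self.x)
        self.D.append([INF] * (len(self.y) + 1))  # D[i][0] = INF
        for j in range(1, len(self.y) + 1):
            self.D[i][j] = self._cell(i, j)

    def append_y(self, value):
        """New time step for y: compute one new column of the matrix, top to bottom."""
        self.y.append(value)
        j = len(self.y)
        self.D[0].append(INF)  # D[0][j] = INF
        for i in range(1, len(self.x) + 1):
            self.D[i].append(INF)  # placeholder, overwritten on the next line
            self.D[i][j] = self._cell(i, j)

    def distance(self):
        return self.D[len(self.x)][len(self.y)]


xs = [1, 3, 4, 9, 8]
ys = [1, 2, 4, 7, 8, 8]
s = StreamingDTW()
for a, b in zip(xs, ys):
    s.append_x(a)
    s.append_y(b)
for b in ys[len(xs):]:  # ys is one step longer than xs
    s.append_y(b)
print(s.distance() == dtw(xs, ys))  # True
```

Feeding the two series step by step into `StreamingDTW` gives the same distance as recomputing `dtw` from scratch, but each append only touches one row or column; a band would additionally cap that work at about 2 * `band` + 1 cells.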
"Swedish National Data Service – possibilities for research and examples of projects"
The purpose of Swedish National Data Service (SND) is to provide a coordinated and secure structure for describing, depositing, sharing, and finding research data. SND is governed by a consortium consisting of seven universities: University of Gothenburg, Karolinska Institutet, Lund University, Stockholm University, Swedish University of Agricultural Sciences, Umeå University, and Uppsala University. From 2018 to 2022 SND will undergo major changes to develop, grow, and expand its activities. Key activities include the development of a secure system for restricted data (sensitive information) and handling big data for other major infrastructures. The organization supports nearly 30 universities in developing and running local data access units (DAUs) that support researchers in preparing their research data and metadata.
Max Petzold is a professor in biostatistics and director of Swedish National Data Service.
"eHealth here, there - past, now and future – reflections from 10 years at the eHealth Institute and revisiting Silicon Valley in 1985/2018"
Göran Petersson is a senior professor in health informatics at the eHealth Institute, Linnaeus University. Göran will share his experiences and reflections from more than 10 years as director of the eHealth Institute at Linnaeus University and from his visits to Silicon Valley between 1985 and 2018. What has happened and what is going on within the field of eHealth and Big data?
"Digital transformations in the humanities"
Big data is impacting all academic fields, including the humanities. However, humanities research culture is quite different from that of STEM. Thus, directly applying methods and tools from other fields of science to the humanities may not be straightforward, for several reasons. This talk will weigh up some central aspects of the ongoing transformation caused by changes in data sources and analysis methods in the humanities. It will focus particularly on the Helsinki Computational History Group (led by Tolonen) and the way it combines big data, computational methods and (traditional) intellectual history. The talk will also discuss research infrastructures and the support needed to ensure that the ongoing transformative period will be a successful one for the humanities.
Mikko Tolonen, Professor, Helsinki University, Finland
"The Z Garbage Collector"
The Z Garbage Collector (ZGC) is a new JDK garbage collector designed for low latency and high scalability. For an industry-standard latency-sensitive benchmark it achieves throughput over 36% higher than the existing JDK collectors while never exceeding a maximum pause time of two milliseconds. In this session we’ll explain how ZGC achieves this performance, and show how to use it.
Annett Wolf and Anthony Scully:
"Workshop/lecture from Grants and Innovation Office regarding EU funding"
The Grants and Innovation Office (GIO) is the university's support office for those of you who need to find funding for your research and to develop your ideas and results. We will help you take the next step, whether towards research or towards the market. More information is available on our homepage. Annett Wolf and Anthony Scully will provide a short overview of the funding possibilities in Horizon 2020, the European Union’s Programme for Research and Innovation, and of the support the GIO can offer.