Full title of the doctoral project
Reuse of health data, combining the best of two worlds - Automatic generation of features with high predictive power based on domain knowledge
Welf Löwe, Tora Hammar, Pär Wanby
Linnéuniversitetet, Region Kalmar län
Region Kalmar län
2022 Oct 1 –
Computer and Information Science (Department of Computer Science and Media Technology, Faculty of Technology)
Research groups/Center for advanced research
E-health – Improved Data to and from Patients
The Linnaeus University Centre (Lnuc) for Data Intensive Sciences and Applications (DISA)
More about the project
In short, the PhD project describes, evaluates, and automates the process of medical registry research, a process that involves close cooperation between domain experts and data analysts, machine-learning-driven feature engineering, and analysis of the generated metadata.
Knowledge discovery is an iterative process. It starts from a pattern or a vague scientific question involving many diverse descriptive factors and relationships, and it can result in unexpected information or new knowledge. The medical researcher is often a domain user or an expert in the overall medical or physiological theory; for instance, the researcher has knowledge about the relations between medical drugs, physiological effects, and medical treatments. The researcher wants to obtain information that can support and prove a medical research hypothesis.
Performing knowledge discovery, also known as data mining, on big health data from electronic health records (EHRs) can be time-consuming and is limited by unstructured data and low registration quality. A key factor for successful knowledge discovery is close cooperation between domain experts and data scientists during data mining.
Feature engineering is the generation of new or modified variables, also called features, with the aim of achieving high predictive power. By studying and evaluating medical registry research projects, the foundation for a theoretical model was designed and named Knowledge Driven Feature Engineering (KDFE). Building on this, an automated KDFE (aKDFE) was developed. With aKDFE, less time and fewer domain-knowledge resources need to be involved, without losing explainability or performance of the developed prediction models.
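To make the idea of feature engineering concrete, below is a minimal, hypothetical sketch in Python. It is a generic illustration of deriving features from raw registry rows with the help of domain knowledge, not the KDFE or aKDFE method itself. The drug groups in KNOWLEDGE_GROUPS, the ATC-code prefixes they use, and the function engineer_features are assumptions standing in for knowledge a domain expert might contribute; they are not clinical guidance.

```python
from collections import defaultdict

# Hypothetical domain-knowledge mapping (illustrative only, not clinical guidance):
# ATC code prefixes that an expert might flag as sharing a physiological effect.
KNOWLEDGE_GROUPS = {
    "psychotropic": ("N05", "N06"),
    "diuretic": ("C03",),
}


def engineer_features(prescriptions):
    """Turn raw (patient_id, atc_code) rows into per-patient features.

    Raw data: one row per dispensed drug.
    Generated features: the number of distinct drugs per patient plus, for
    each knowledge-defined group, how many of those distinct drugs fall
    into that group.
    """
    # Collect the distinct ATC codes seen for each patient.
    drugs_per_patient = defaultdict(set)
    for patient_id, atc_code in prescriptions:
        drugs_per_patient[patient_id].add(atc_code)

    features = {}
    for patient_id, drugs in drugs_per_patient.items():
        row = {"n_drugs": len(drugs)}
        for group, prefixes in KNOWLEDGE_GROUPS.items():
            # str.startswith accepts a tuple of prefixes.
            row[f"n_{group}"] = sum(1 for code in drugs if code.startswith(prefixes))
        features[patient_id] = row
    return features


# Usage with made-up example rows.
prescriptions = [
    ("p1", "N05BA01"),
    ("p1", "N06AB04"),
    ("p1", "C03CA01"),
    ("p2", "A10BA02"),
]
features = engineer_features(prescriptions)
print(features)
```

The raw rows say only which drugs were dispensed; the knowledge-based grouping turns them into counts that a prediction model can use directly and that a domain expert can still interpret, which reflects the explainability concern raised above.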
Other related results are:
- An in-depth description of the medical research process, from the initial research question to the published paper, when performing registry studies on real-world medical data
- Answers to the question: Why does data scientists' involvement lead to better predictions?
- Ethical aspects of hypotheses constructed after the results are known when using machine learning.
The project is part of the research conducted in the research group E-health – Improved Data to and from Patients, in the Linnaeus University Centre for Data Intensive Sciences and Applications (DISA), and in the eHealth Institute.