Seed project: Using Natural Language Models for Extracting Drug-Related Problems (NLMED)

The overall goal of this seed project within the Linnaeus University Centre for Data Intensive Sciences and Applications (DISA) is to explore state-of-the-art natural language models for finding information in clinical text that is relevant to adverse drug effects (ADEs).

Project information

Although suspected or confirmed ADEs can be documented explicitly in the structured data (i.e. certain diagnosis codes) of electronic health records (EHRs), or reported to the Medical Products Agency, this is often not done. Instead, many suspected or confirmed ADEs are described only in the free text of the EHR. It is therefore difficult to monitor ADEs from structured data alone, and there is a clear need to study clinical texts to find and extract indications of possible ADEs.

Using Natural Language Models for Extracting Drug-Related Problems from Clinical Text of Electronical Health Records - NLMED
Alisa Lincke, Elizaveta Kopacheva, Tora Hammar, Olof Björneld, Morgan Eriksson
Project period
July 2023-March 2024
Linnaeus University Centre for Data Intensive Sciences and Applications (DISA)
Core research areas
E-health, health informatics, pharmacovigilance, computer science, natural language processing
Research group
Linnaeus University Centre for Data Intensive Sciences and Applications (DISA)

More about the project

Pharmacovigilance is an essential field that focuses on preventing and detecting adverse drug effects (ADEs) to ensure safe medication for society. ADEs account for a significant number of hospital admissions globally, and in Sweden they are the seventh leading cause of death. Detecting ADEs from clinical texts is challenging because of factors such as patient characteristics, medications, and the properties of drugs. Clinical texts, including electronic health records (EHRs), are vital sources for documenting suspected ADEs. However, clinicians often do not provide explicit diagnosis codes for ADEs, which makes ADEs difficult to monitor from structured data. There is therefore a need to study clinical texts to extract indications of possible ADEs.

Previous research has identified three subtasks in ADE detection: named entity recognition (NER) to extract relevant information, relation extraction (RE) to establish relationships between entities, and integrated NER-RE to determine the presence of ADEs. Early approaches to ADE detection were rule-based, but their performance fell short of desired standards. Recent studies have combined rule-based methods with machine learning techniques, leveraging the power of supervised machine learning for more interpretable results.
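To make the three subtasks concrete, here is a minimal rule-based sketch in Python. It is not the project's method. The lexicons `DRUGS`, `SYMPTOMS`, and `CUES`, and all function names, are hypothetical and chosen only for illustration; a real system would rely on curated vocabularies or trained models.

```python
import re

# Hypothetical toy lexicons (assumptions, not from the project).
DRUGS = {"warfarin", "ibuprofen", "metformin"}
SYMPTOMS = {"bleeding", "nausea", "rash"}
CUES = {"after", "following", "caused", "due"}  # words hinting at a causal link


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-zåäö]+", sentence.lower())


def extract_entities(sentence):
    """NER step: label tokens that appear in the drug or symptom lexicon."""
    entities = []
    for token in tokenize(sentence):
        if token in DRUGS:
            entities.append(("DRUG", token))
        elif token in SYMPTOMS:
            entities.append(("SYMPTOM", token))
    return entities


def extract_relations(sentence):
    """RE step: pair drugs with symptoms if the sentence contains a cue word."""
    if not set(tokenize(sentence)) & CUES:
        return []
    entities = extract_entities(sentence)
    drugs = [text for label, text in entities if label == "DRUG"]
    symptoms = [text for label, text in entities if label == "SYMPTOM"]
    return [(drug, symptom) for drug in drugs for symptom in symptoms]


def detect_ade(text):
    """Integrated NER-RE: collect candidate (drug, symptom) pairs per sentence."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        pairs.extend(extract_relations(sentence))
    return pairs


note = "Patient developed a rash after starting ibuprofen. Metformin continued."
print(detect_ade(note))
```

Rules of this kind are easy to inspect, but they miss negation, misspellings, and relations that span sentences, which is one plausible reason why purely rule-based approaches performed below expectations.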

However, the accuracy of ADE detection models remains unsatisfactory, with imbalances between precision and recall. Previous studies have focused on specific geographical areas and examined the effects of only one type of medication, which limits their generalizability. Recent advances in natural language processing (NLP), such as fine-tuning pretrained models like BERT, have led to notable improvements in ADE detection in clinical texts.
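The imbalance between precision and recall can be illustrated with a short Python sketch. The function name and the example annotations are hypothetical; the formulas are the standard definitions, with predictions and gold-standard annotations compared as sets of (document id, drug, symptom) triples.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted ADE mentions with gold annotations as sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the model finds two of four annotated ADEs
# and makes no false predictions.
gold = {
    (1, "warfarin", "bleeding"),
    (2, "ibuprofen", "rash"),
    (3, "metformin", "nausea"),
    (4, "warfarin", "nausea"),
}
predicted = {(1, "warfarin", "bleeding"), (2, "ibuprofen", "rash")}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this made-up case every prediction is correct (high precision), yet half of the annotated ADEs are missed (low recall), which is the kind of imbalance described above.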

The objectives of this proposal include:

  • conducting a literature review of state-of-the-art NLP models for ADE detection,
  • applying for ethical approval to access clinical texts,
  • creating a labeled dataset of 2000 clinical texts,
  • annotating the texts for training/validation/testing,
  • selecting and applying two NLP models to the data,
  • conducting a pilot study on NLP feature engineering,
  • writing a manuscript for a conference, and
  • writing an application for external funding.
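As an illustration of the training/validation/testing step, the sketch below splits 2000 document identifiers into three disjoint sets. The 70/15/15 ratio, the fixed seed, and the names `split_dataset` and `doc_ids` are assumptions made for this example; the project description does not specify how the split is made.

```python
import random


def split_dataset(items, train_share=0.70, val_share=0.15, seed=42):
    """Shuffle items reproducibly and split them into train/validation/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_share)
    n_val = round(len(items) * val_share)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# Hypothetical identifiers standing in for 2000 annotated clinical texts.
doc_ids = list(range(2000))
train, val, test = split_dataset(doc_ids)
print(len(train), len(val), len(test))
```

Because ADE mentions are likely to be rare, a stratified split that keeps the share of ADE-positive texts similar across the three sets may be preferable in practice.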

What is a seed project?

A seed project is a minor project funded by a knowledge environment or a research group at the university. The aim is to launch and promote excellent research. Depending on the financier, the purpose of a seed project may be to identify new collaborations or deepen existing ones, preferably cross-disciplinary, to explore possible research issues in a feasibility study, to collect empirical material, or to write an application for external funding.

DISA's seed projects

Learn more about the seed project concept and DISA's other seed projects.