Data-driven Software and Information Quality

Within the research area Data-driven Software and Information Quality, the objective of Linnaeus University Centre for Data Intensive Sciences and Applications (DISA) is to investigate approaches to measuring quality in software and information and, where possible, to improve them. If you provide us with interesting data and questions, we will do our best to provide answers.

Quality is a central concept in business, engineering, and manufacturing. Most people want to control, manage, and assure it, but to do so, there must be an understanding of how to measure it. In software and information, this is not always clear. Many approaches have limited empirical evidence to support them. The researchers within Data-driven Software and Information Quality will investigate the approaches, provide evidence, and possibly improve them.

The researchers focus on the following:

  • Improve the understanding of how to express and measure the quality of digital artifacts, e.g., models, source code, and documents.
  • Provide an experimentation platform to test hypotheses.
  • Develop a data-driven method to formulate a quality requirement for a system, to assess its quality over time, and to suggest improvements to both the requirement and the system. The experimentation platform will leverage the state of the art in various fields, e.g., program analysis, language processing, and the security of information processing and storage.
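As a rough illustration of the third point, the sketch below expresses a quality requirement as an upper bound on a metric, checks it against a history of versions, and suggests whether the system or the requirement should change. All names here (`QualityRequirement`, `assess`, `suggest`, the metric `avg_complexity`, and the sample values) are hypothetical and chosen for illustration only; this is not the method developed by the researchers.

```python
from dataclasses import dataclass


@dataclass
class QualityRequirement:
    """Hypothetical requirement: a metric must stay at or below max_value."""
    metric: str
    max_value: float


def assess(requirement, history):
    """Return the versions whose metric value violates the requirement.

    history is a list of (version, {metric_name: value}) pairs, oldest first.
    """
    return [
        version
        for version, metrics in history
        if metrics[requirement.metric] > requirement.max_value
    ]


def suggest(requirement, history):
    """Suggest a change to the system or to the requirement (toy rules, assumed)."""
    violations = assess(requirement, history)
    if not violations:
        return "requirement met in all versions; consider tightening the requirement"
    if len(violations) == len(history):
        return "requirement never met; consider revising the requirement"
    return "improve the system in versions: " + ", ".join(violations)


# Illustrative data, not real measurements.
history = [
    ("1.0", {"avg_complexity": 8.0}),
    ("1.1", {"avg_complexity": 12.5}),
    ("1.2", {"avg_complexity": 9.0}),
]
requirement = QualityRequirement(metric="avg_complexity", max_value=10.0)
print(suggest(requirement, history))
```

A real requirement would likely combine several metrics and be calibrated against historic data rather than set by hand.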

Gather, enrich, classify, store, analyse

The researchers gather data from software repositories, bug databases, web sites, mailing lists, chat logs, etc. The data is enriched, classified, and stored in a database for future analysis. The enrichment process relies on techniques such as program analysis and natural language processing. Queries against this database will be domain specific and test hypotheses regarding quality assessment. Software quality assessment relies on models and measurements that approximate qualities such as maintainability. Historic data from software repositories can be used to formulate and validate such models and measurements. Adding other types of sources, e.g., bug reports and feature requests, will improve our understanding of why some qualities change. We expect to analyse millions of projects, developers, and web pages to capture different domains, languages, and maturity levels.
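The gather, enrich, classify, store, and analyse steps described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the researchers' actual platform: the sample records, the keyword-based classifier, the table layout, and the function names are all hypothetical, and an in-memory SQLite database stands in for the real store.

```python
import sqlite3

# Hypothetical records standing in for data gathered from real sources.
RAW_RECORDS = [
    {"source": "repository", "project": "demo", "text": "Fix null pointer crash in parser"},
    {"source": "bug_database", "project": "demo", "text": "Crash when opening empty file"},
    {"source": "mailing_list", "project": "demo", "text": "Proposal: add dark mode to the editor"},
]


def enrich(record):
    """Add derived fields; a real pipeline would apply program analysis or NLP here."""
    enriched = dict(record)
    enriched["word_count"] = len(record["text"].split())
    return enriched


def classify(record):
    """Label a record with a toy keyword rule (an assumption for illustration)."""
    text = record["text"].lower()
    is_defect = any(word in text for word in ("crash", "fix", "error"))
    return {**record, "label": "defect" if is_defect else "other"}


def store(records, conn):
    """Persist enriched and classified records for later queries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(source TEXT, project TEXT, text TEXT, word_count INTEGER, label TEXT)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (:source, :project, :text, :word_count, :label)",
        records,
    )
    conn.commit()


def defect_share(conn, project):
    """Analyse: fraction of a project's stored records that are labelled as defects."""
    total, defects = conn.execute(
        "SELECT COUNT(*), SUM(label = 'defect') FROM records WHERE project = ?",
        (project,),
    ).fetchone()
    return defects / total if total else 0.0


def run_pipeline(raw_records):
    """Run enrich, classify, and store, and return the database connection."""
    conn = sqlite3.connect(":memory:")
    store([classify(enrich(r)) for r in raw_records], conn)
    return conn


conn = run_pipeline(RAW_RECORDS)
print(f"defect share for 'demo': {defect_share(conn, 'demo'):.2f}")
```

Keeping each stage as a separate function means a hypothesis can be tested by changing only the final query, while the stored, enriched data is reused.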

We want to make software and information quality tangible. We want to make it a tool that can be used to better value a piece of software or documentation and guide future directions. This allows us to answer questions such as: how much future effort will we save if we make this change now? From a societal perspective, this should help increase the quality of both software and technical information, since it becomes something we can better reason about.

To better understand software

An important step towards our goal is to better understand software (and information) and how it is developed. A better understanding could have a large impact, for example through better and more efficient tools and a greater degree of automation in the development process.

We value external collaboration, and some of the most interesting research questions are born from discussions with practitioners who face real-world problems. Our offer is simple: if you provide us with interesting data and questions, we will do our best to provide answers. If you already have answers, we are happy to check if we agree with them.

Data-driven Software and Information Quality is an application area within the Linnaeus University Centre of Excellence (LNUC) for Data Intensive Sciences and Applications.