Title: Foundation of multi-criteria quality scoring
Subject: Computer and information science
Faculty: Faculty of technology
Date: Friday 7 June 2019 at 10.15 am
Place: Room B2018, building B, Växjö
External reviewer: Dr Harald Störrle, principal IT consultant, QAware GmbH, München, Germany
Chairperson: Dr Diego Perez, department of computer science and media technology, Linnaeus University
Supervisor: Dr Anna Wingkvist, department of computer science and media technology, Linnaeus University
Examiner: Professor Danny Weyns, department of computer science and media technology, Linnaeus University
Software quality becomes increasingly critical as our dependence on software grows, and so does the need for sound quality assessment. Important assessment tasks include comparing and ranking software artifacts and detecting particularly good or bad quality.
Software quality models are widely used to support quality assessment. In general, they have a hierarchical structure and define quality in terms of sub-qualities and metrics in a tree-like structure. Different metrics evaluate different quality criteria, and several metrics often need to be assessed and aggregated to obtain a total quality score. Today's quality model standards do not specify how to aggregate numerical metrics; they leave aggregation to decision makers, and different aggregation methods lead to different assessment results and interpretations. Hence, there is a need to define metrics aggregation formally, grounded in well-known theories.
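To make the aggregation problem concrete, the following sketch shows the kind of ad hoc aggregation that current standards leave to decision makers: a small hierarchical quality model whose leaf metrics are combined bottom-up with weighted sums. All metric names, values, and weights here are invented for illustration and are not taken from the dissertation.

```python
# Hypothetical quality model tree aggregated by weighted sums.
# Different weight choices yield different total scores, which is
# exactly the interpretation problem the text describes.

def weighted_sum(scores, weights):
    """Aggregate child scores into a parent score by weighted sum."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(s * w for s, w in zip(scores, weights))

# Leaf metrics, normalized to [0, 1]; values are made up.
metrics = {"comment_ratio": 0.8, "cyclomatic_complexity": 0.6,
           "test_coverage": 0.9, "coupling": 0.5}

# Sub-qualities aggregate metrics; total quality aggregates sub-qualities.
maintainability = weighted_sum(
    [metrics["comment_ratio"], metrics["cyclomatic_complexity"]], [0.5, 0.5])
reliability = weighted_sum(
    [metrics["test_coverage"], metrics["coupling"]], [0.7, 0.3])
quality = weighted_sum([maintainability, reliability], [0.4, 0.6])
print(round(quality, 3))
```

Changing any of the weights changes the total score and thus the ranking of artifacts, which is why the abstract argues for a formally defined alternative.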
We propose to consider the probabilistic nature of quality as a solution. We treat metrics as random variables and define quality scores based on joint probabilities. The aggregation, and by extension the quality model, expresses quality as the probability of observing something of equal or worse quality among all software projects observed; good and bad quality are expressed as lower and higher probabilities, respectively. We analyze metric dependencies using Bayesian networks and define quality models as directed acyclic graphs, where nodes correspond to metrics and edges indicate dependencies. We propose a multi-threaded implementation to improve the efficiency of joint probability computations.
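A minimal sketch of the scoring idea: score an artifact by the empirical joint probability of observing measurements that are equal or worse on every metric simultaneously. This toy version assumes that larger metric values are worse and uses invented data; the dissertation's actual estimation via Bayesian networks is not reproduced here.

```python
# Hedged illustration: metrics as random variables, quality score as an
# empirical joint probability over all observed artifacts.

def joint_score(artifact, observations):
    """Fraction of observed artifacts that are equal or worse than
    `artifact` on all metrics at once (empirical joint probability).
    Assumes larger values mean worse quality."""
    worse_or_equal = [obs for obs in observations
                      if all(o >= a for o, a in zip(obs, artifact))]
    return len(worse_or_equal) / len(observations)

# Each tuple holds two metric measurements for one observed artifact.
observations = [(2, 10), (5, 30), (3, 12), (8, 40), (1, 5), (6, 25)]

# (3, 12) is matched or exceeded by (5, 30), (3, 12), (8, 40), (6, 25).
print(joint_score((3, 12), observations))
```

In this toy setting a score near 0 means few observed artifacts are as bad or worse, while a score near 1 means almost all are; the joint condition is what distinguishes this from scoring each metric independently.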
We validate our approach theoretically and in an empirical study of software quality assessment covering approximately 100,000 real-world software artifacts with approximately 4,000,000 measurements in total. The results show that our approach produces plausible results and that its performance scales to large projects.
We also applied our approach to a multi-criteria decision-making task, proposing a ranking method to aid evaluation processes. We evaluated it on a real-world funding allocation problem for a call that attracted approximately 600 applications. Comparing our approach with the traditional weighted-sum aggregation model, we found that the two methods produce similar ranks, but that our approach provides a sounder basis for a fair assessment.
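One common way to quantify how similar two rankings are, as in the comparison above, is Spearman's rank correlation. The sketch below assumes tie-free rankings and uses invented application identifiers; it is not the dissertation's evaluation procedure, only a standard similarity measure for illustration.

```python
# Hedged sketch: Spearman rank correlation between two rankings of the
# same items (no ties). A value of 1 means identical order, -1 reversed.

def spearman(rank_a, rank_b):
    """Spearman correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    pos_b = {item: i for i, item in enumerate(rank_b)}
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Invented rankings: only the top two applications swap places.
weighted_sum_rank = ["app1", "app2", "app3", "app4", "app5"]
probabilistic_rank = ["app2", "app1", "app3", "app4", "app5"]

print(spearman(weighted_sum_rank, probabilistic_rank))
```

A correlation close to 1, as here, corresponds to the finding that the two aggregation methods yield similar ranks despite their different foundations.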
Further, we implemented an exploratory multivariate data visualization tool that visualizes similarities between software artifacts based on joint distributions. We illustrate the usability of our tool with two real-world case studies: a set of technical documents and an open-source project written in Java.
Overall, our results show that our approach to multi-criteria quality scoring is well defined, has a clear interpretation, is applicable under realistic conditions, and is generalizable and transferable to other domains.