
About the Prediction Modelling Group


Recent years have seen a shift from traditional evidence-based medicine towards precision medicine, an emerging approach that focuses on identifying treatments or approaches that are effective for an individual, based on genetic, environmental, sociodemographic and lifestyle factors. Nowadays, the data used for this purpose include imaging, patient records, omics, wearables and smartphones.

Big Data

As a result, ‘big data’-driven prediction modelling approaches are increasingly needed to improve classification of disease, to predict the likelihood of development of a disease, to make a prognosis about the likely course of a clinical condition, and to predict treatment outcomes or improve treatment selection (personalised medicine). Precision medicine has, therefore, become increasingly synonymous with the analytical approach of prediction modelling.

The use of big data is usually discussed in the context of improving medical care, but its use for preventing disease has become increasingly important in recent years.

Precision medicine

Precision medicine has made progress in many medical areas with well-defined diseases such as cancer. Psychiatry, in contrast, has not yet benefited from new technologies that are now integral to other areas of medicine. Only a few clinical prediction models are available, and even these should be regarded as preliminary and are not usually implemented in national health systems such as the NHS.

Why is this the case? The analysis of such data is often more challenging in psychiatry than in other areas of medical research because of the complexity of the conditions, the lack of candidate genes, measurement error in both potential predictor and outcome variables, and challenges in data collection. An incomplete list of potential problems can be found below.

Prediction models in psychiatry 

The development of prediction models in psychiatry is often more challenging than in other medical research areas because:

i. Psychiatrists study traits that are not easily measurable and are often measured indirectly, for example via questionnaires, which introduces measurement error.

ii. Although mental disorders typically show strong heritability, individual genetic variants for most traits each account for far less than 1% of the phenotypic variability.

iii. Susceptibility genes are often common variants rather than mutations, which means targeting specific genetic changes (as in cancer) is not possible.

iv. The definition of a mental disorder is often very broad and may include distinct but unknown subcategories. Different mental health problems have similar phenotypes, and many mental (and physical) health problems occur together (comorbidity).

v. Patient recruitment is difficult and there is a high drop-out rate in many studies. Patients often do not adhere to treatments.

vi. Restrictive inclusion criteria, self-selection bias or case-healthy control studies can result in samples that are not representative, and such non-random samples are common in psychiatric research. Models developed on them may show good internal validity but typically show poor external validity and do not generalise to the clinical population.

vii. Treatment interventions often have several interacting components (complex interventions) and it is often difficult to measure the ‘active ingredients’ of an intervention.

viii. Key data, such as electronic health records of mental health patients, are typically held as narrative text or other unstructured data types. Taking full advantage of these data requires non-conventional analysis technologies such as natural language processing.

ix. Models are often not properly internally validated and calibrated. Often more emphasis is given to comparing different machine learning algorithms than to developing a clinically useful tool.
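
To make the calibration point in item ix more concrete, the following is a minimal sketch in Python of two common checks: the Brier score and a binned comparison of predicted risk against observed event rates. The function names (`brier_score`, `calibration_table`), the simulated data and the "overconfident" model are all illustrative assumptions and are not taken from any particular study or library.

```python
import random


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_table(y_true, y_prob, n_bins=5):
    """Group predictions into equal-width risk bins and return, per non-empty
    bin, (number of cases, mean predicted risk, observed event rate)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        n = len(members)
        mean_pred = sum(p for _, p in members) / n
        obs_rate = sum(y for y, _ in members) / n
        rows.append((n, mean_pred, obs_rate))
    return rows


# Simulated (not real) data: outcomes are drawn from the true risk, while a
# hypothetical overfitted model pushes its predictions towards 0 and 1.
rng = random.Random(0)
true_risk = [rng.uniform(0.05, 0.95) for _ in range(2000)]
outcome = [1 if rng.random() < r else 0 for r in true_risk]
overconfident = [min(max(0.5 + 1.8 * (r - 0.5), 0.0), 1.0) for r in true_risk]

for name, pred in [("well calibrated", true_risk), ("overconfident", overconfident)]:
    print(name, "Brier score:", round(brier_score(outcome, pred), 3))
    for n, mean_pred, obs_rate in calibration_table(outcome, pred):
        print(f"  n={n:4d}  mean predicted={mean_pred:.2f}  observed={obs_rate:.2f}")
```

In a well-calibrated model the mean predicted risk and the observed event rate should be close in every bin. In the overconfident model the low-risk bins under-predict and the high-risk bins over-predict, even though both models rank patients identically. This is one reason why comparing algorithms on discrimination alone can be misleading.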