To be notified of upcoming presentations and be added to the prediction modelling distribution list, please email Dr Raquel Iniesta at raquel.iniesta@kcl.ac.uk. Note that you do not have to be affiliated with King's College London or the NIHR Maudsley Biomedical Research Centre; our talks are open to everyone with an interest in Prediction Modelling.
Dr Dominic Oliver | Are our predictions fair? Assessing and addressing algorithmic bias in a transdiagnostic risk calculator for psychosis | 28 May 2025
Speaker: Dr Dominic Oliver is a postdoctoral researcher at the Department of Psychiatry, University of Oxford, and a visiting research associate at the Department of Psychosis Studies, King's College London.
Abstract: Precision medicine aims to use data and clinical prediction models to inform patients' care. The data we use have inherent biases, which can therefore be perpetuated in precision medicine approaches. For the ethical provision of precision medicine, we need to better understand the biases in our model outputs and, if possible, address them. We have developed, validated and implemented a clinical prediction model for identifying people at risk for psychosis in secondary health care, but we do not know the biases associated with its predictions. In this talk, I will use this model to outline the limitations of current recommendations for assessing these biases, how to improve them, and how we can address the biases we identify.
Speaker: David Wissel is a fourth-year PhD student shared between ETH Zurich and the University of Zurich.
Abstract: Chromatin interactions provide insights into which DNA regulatory elements connect with specific genes, informing the activation or repression of gene expression. Understanding these interactions is crucial for assessing the role of non-coding mutations or changes in chromatin organization due to cell differentiation or disease. Hi-C and single-cell Hi-C experiments can reveal chromatin interactions, but these methods are costly and labor-intensive. Here, I will introduce our computational approach, UniversalEPI, an attention-based deep ensemble model that predicts regulatory interactions in unseen cell types with a receptive field of 2 million nucleotides, relying solely on DNA sequence data and chromatin accessibility profiles. Demonstrating significantly better performance than state-of-the-art methods, UniversalEPI, with a much lighter architecture, effectively predicts chromatin interactions across malignant and non-malignant cancer cell lines (Spearman's rho > 0.9 on unseen cell types). This model represents a significant advancement in in-silico 3D chromatin modeling, essential for exploring genetic variant impacts on diseases and monitoring chromatin architecture changes in organism development.
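To make the ingredients named above concrete, here is a toy PyTorch sketch of an attention-based deep ensemble that takes one-hot DNA plus an accessibility track as input and outputs an interaction score. The architecture, sizes and names are invented for illustration; this is not UniversalEPI's actual design.

```python
import torch
import torch.nn as nn

class EpiInteractionNet(nn.Module):
    """Toy attention-based regressor: one-hot DNA (4 channels) plus an
    accessibility track (1 channel) in, predicted interaction strength out."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        # Strided convolution to downsample the very long genomic input.
        self.embed = nn.Conv1d(5, d_model, kernel_size=11, stride=10)
        self.attn = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                  # x: (batch, 5, seq_len)
        h = self.embed(x).transpose(1, 2)  # (batch, tokens, d_model)
        h = self.attn(h).mean(dim=1)       # self-attention, then pool positions
        return self.head(h).squeeze(-1)    # predicted interaction score

# Deep ensemble: average the predictions of independently initialised models.
models = [EpiInteractionNet() for _ in range(5)]
x = torch.randn(2, 5, 10_000)              # stand-in for sequence + ATAC input
with torch.no_grad():
    pred = torch.stack([m(x) for m in models]).mean(dim=0)
```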
Speaker: Valentina Boeva is a Tenure Track Assistant Professor at the Department of Computer Science, ETH Zurich
Abstract: Patient and public involvement and engagement (PPIE) is well embedded in applied health care research in the UK, where PPIE is a funding requirement. Unlike applied health research, there is little research on how to conduct meaningful PPIE in statistical methodology research. Statistical methodology research involves the development, evaluation or comparison of statistical methods for the design or analysis of research studies. The technical nature of this research, and its often confusing terminology, can make PPIE challenging. The PPI-SMART group at the University of Leicester has developed a number of resources to aid those undertaking PPIE for statistical methodology research, based on the needs of statisticians identified in a nationwide survey. In this talk, I will introduce our work to date and discuss upcoming projects.
Abstract: I will familiarise participants with the background, methods and approaches to service user involvement within mental health research. We will discuss what is meant by “PPI” (patient and public involvement) and I will provide examples of service user involvement in current research studies. I will discuss traditional machine learning paradigms and how involvement activities with service users/carers can support the modelling pipeline.
Speaker: Dr Sagar Jilka, Assistant Professor at Warwick Medical School, University of Warwick
Imaging is one of the main pillars of clinical protocols for cancer care, providing essential non-invasive biomarkers for detection, diagnosis and response assessment. The development of Artificial Intelligence (AI) tools has proven potential to transform the analysis of radiological images by significantly reducing processing time, increasing the reproducibility of measurements and improving the sensitivity of tumour detection compared with standard visual interpretation, supporting the early detection of cancer. In this talk, I will discuss the studies that we have carried out in our group to develop Deep Learning-based and Machine Learning-based tools, and to incorporate them into the clinical research setting.
Tareen presents her work on clinical trust and on enhancing the reliability of cardiac deep learning (DL) models by quantifying uncertainty, together with results from experiments she has performed that incorporate uncertainty-aware training into cardiac DL models.
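For readers unfamiliar with the area, one common way to surface uncertainty from a deep model is Monte Carlo dropout; the sketch below is a generic illustration of that idea, not Tareen's specific method, and the network and inputs are invented.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                    nn.Dropout(0.5), nn.Linear(64, 1))
net.train()                                   # keep dropout active at test time
x = torch.randn(1, 16)                        # stand-in for extracted features
with torch.no_grad():
    samples = torch.stack([net(x) for _ in range(100)])  # repeated stochastic passes
print(samples.mean().item(), samples.std().item())       # prediction and its spread
```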
Artificial intelligence (AI) has become a research topic of significant importance in recent years, with one of its crucial applications in the clinical realm being the processing of medical images. However, the translation of these methods into clinical environments remains a challenge, as their integration at the point of care is not straightforward. Additional challenges arise in adapting these solutions to the timeframes and infrastructures prevalent in hospitals, which often differ from those in research centres. This talk presents the ongoing projects addressing these challenges within our hospital. Insights into practical implementations and outcomes will be provided, showcasing the integration of AI into clinical workflows. These projects not only serve as examples of successful integration but also highlight the collaborative efforts between engineers and clinical practitioners.
Personalised medicine approaches are eagerly awaited to facilitate the individualisation of medical care for patients with inflammatory bowel disease (IBD). Multiple approaches have already been explored in attempts to stratify patients into different prognostic trajectories. In this study, we aimed to use unsupervised machine learning algorithms to cluster patients based on their routinely collected electronic health records.
As the number of dimensions increases, big datasets from precision medicine research studies can exhibit complex shapes and unexpected behaviours. The statistical analysis of such data necessitates sophisticated analytical methods capable of capitalising on the high dimension of these datasets. This talk will present novel methods of applying topological data analysis (TDA) to devise a unique approach for assessing the quality of data imputation for missing values. The method establishes a pipeline that combines TDA with permutation testing to identify differences in topological data structures among datasets. This provides valuable information for tailoring the selection of missing data imputation strategies.
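As a rough illustration of this kind of pipeline (not the authors' implementation), one could compare persistence diagrams of a complete and an imputed dataset with a permutation test; the sketch below uses the ripser and persim Python packages, with invented helper names and synthetic data standing in for a real imputation.

```python
import numpy as np
from persim import bottleneck
from ripser import ripser

def topo_distance(X, Y):
    """Bottleneck distance between the H1 persistence diagrams of two datasets."""
    return bottleneck(ripser(X)["dgms"][1], ripser(Y)["dgms"][1])

def permutation_test(X_complete, X_imputed, n_perm=50, seed=0):
    """Is the topological distance between complete and imputed data larger
    than expected when the two datasets' rows are randomly relabelled?"""
    rng = np.random.default_rng(seed)
    observed = topo_distance(X_complete, X_imputed)
    pooled = np.vstack([X_complete, X_imputed])
    n = len(X_complete)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        null.append(topo_distance(pooled[idx[:n]], pooled[idx[n:]]))
    return observed, float(np.mean(np.array(null) >= observed))  # distance, p-value

X = np.random.default_rng(1).normal(size=(60, 4))
X_imp = X + np.random.default_rng(2).normal(0, 0.2, size=X.shape)  # fake "imputation"
print(permutation_test(X, X_imp))
```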
In this presentation we delve into the field of algorithmic fairness in Multi-Agent Systems (MAS), focusing on the fairness of agents' decision-making processes. We first provide a definition of fairness and present the reasons why it is relevant for AI-based decisions. Various fairness metrics, e.g., demographic parity, conditional statistical parity and fairness through awareness, are discussed. We show how to apply these metrics in multi-agent systems, explaining the key adaptations. We complete the presentation with an application of the metrics to the Harvest Tree Game, an original configuration of a multi-agent system that is already well known in the literature.
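As a concrete example of the first of these metrics, demographic parity compares positive-decision rates across groups; the small sketch below (invented function and data) shows the computation.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Difference in positive-decision rates between groups.
    0 means the decision rule satisfies demographic parity."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

# Example: decisions made by an agent for two groups of users.
decisions = [1, 0, 1, 1, 0, 0, 1, 0]
groups    = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
print(demographic_parity_difference(decisions, groups))  # 0.75 - 0.25 = 0.5
```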
The integration of AI-based decision tools into routine clinical care is opening the door to a completely new paradigm where doctors and machines can collaborate to decide on the right diagnosis or treatment for a patient, based on the individual patient's biomedical information. A number of important ethical challenges arise, from the development of AI tools through to their implementation. In this talk Raquel will introduce Fair modelling, a qualitative framework that aims to serve as a means of interrogating the ethical integration of AI decision systems in healthcare. During her talk, the roles that clinicians, developers and patients have in ensuring the ethical development and deployment of AI models will be discussed. Several ethical challenges will be identified and connected with the four ethical principles of the medical profession: respect for autonomy, beneficence, non-maleficence and justice.
Individual participant data meta-analyses (IPDMA) have in recent years been applied to a range of mental health conditions to understand individual differences in treatment response and aid the personalisation of interventions. This presentation covers the results of a large-scale effort to collect and synthesise available data from randomised controlled trials studying the efficacy of psychological interventions versus control to prevent depressive relapse for people in remission from depression (see also: itfra.org). It will further describe how individual participant data could be used to potentially improve risk stratification using decision tree analyses. It will reflect on the practical and methodological considerations of using IPDMA to aid the personalisation of interventions to individual participant characteristics. It will also cover plans for conducting IPDMA for preventing the onset and relapse of common mental health conditions.
Whole disease models (WDMs) are large-scale, system-level models which can evaluate multiple decision questions across an entire care pathway. Whilst WDMs can offer several advantages as a platform for undertaking economic analyses, the development of a WDM requires a significant initial investment of time and resources and presents additional challenges for model verification and validation.
During this talk, Lily discusses:
Her motivations for developing a WDM for schizophrenia services in the UK
The methods used to develop the schizophrenia WDM
Reflections on the pros and cons of the whole disease modelling approach
Electronic Health Records hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored within unstructured text. Existing approaches focus mostly on structured data and a subset of single-domain outcomes. We will explore how temporal modelling of patients from free text and structured data, using deep generative transformers, can forecast a wide range of future disorders, substances, procedures and findings.
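As a rough sketch of the idea (not the authors' model), a patient timeline can be tokenised into medical concepts and fed to a causal transformer trained to predict the next concept; all names and sizes below are invented.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64                          # invented sizes
embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
to_logits = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (8, 32))          # batch of patient timelines
mask = torch.triu(torch.full((31, 31), float("-inf")), diagonal=1)  # causal mask
h = encoder(embed(tokens[:, :-1]), mask=mask)           # no peeking at the future
loss = nn.functional.cross_entropy(
    to_logits(h).reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()                                         # one step of training signal
```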
The mechanism of human neural responses to different stimuli has always been of interest to neuroscientists. In clinical settings, tools to distinguish different diseases or states are required. However, classic classification methods have obvious shortcomings: traditional clinical categorical methods may not be competent for behaviour prediction or brain state classification, and traditional machine learning models leave room for improvement in classification accuracy. With the increasing use of convolutional neural networks (CNNs) in neuroimaging computer-assisted classification, an ensemble classifier of CNNs might be able to mine hidden patterns from MEG signals.
Joint models have been proposed to compute individual dynamic predictions from repeated measures of one or two markers. However, they hardly extend to the case where the complete patient history includes many more repeated markers. Our objective was thus to propose a solution for the dynamic prediction of a health event that can exploit repeated measures of a possibly large number of endogenous markers. We extended the random survival forest methodology to incorporate multivariate longitudinal endogenous markers. At each split of the nodes of the random forest trees, mixed models for the longitudinal markers are fitted and the predicted random effects are used, among the other time-fixed predictors, to split the subjects. The individual-specific event prediction is derived as the average over all trees of the leaf-specific cumulative incidence function computed using the Aalen-Johansen estimator. We demonstrate in a simulation study the performance of our methodology, in both small and large dimensional contexts. The method is applied to predict the individual risk of dementia in the elderly (accounting for the competing risk of death) according to the trajectories of cognitive functions, brain imaging markers, and general clinical evaluation. Our method is implemented in the R package DynForest.
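DynForest itself is an R package. Purely to illustrate the two-stage idea (mixed-model summaries of longitudinal markers feeding a survival forest, rather than DynForest's per-node fitting), a Python sketch with synthetic data might look like this; the column names and sizes are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

# Synthetic stand-in data: 40 subjects, 4 visits each, one cognitive marker.
rng = np.random.default_rng(0)
long_df = pd.DataFrame({"id": np.repeat(np.arange(40), 4),
                        "time": np.tile(np.arange(4.0), 40)})
long_df["cognition"] = 30 - 0.5 * long_df["time"] + rng.normal(0, 1, len(long_df))
surv_df = pd.DataFrame({"id": np.arange(40),
                        "years": rng.exponential(5, 40),
                        "event": rng.integers(0, 2, 40).astype(bool)})

# Stage 1: mixed model for the marker; predicted random effects become features.
fit = smf.mixedlm("cognition ~ time", long_df, groups=long_df["id"],
                  re_formula="~time").fit()
feats = pd.DataFrame(fit.random_effects).T.rename(
    columns={"Group": "cog_intercept", "time": "cog_slope"})

# Stage 2: survival forest on the random-effect features.
rsf = RandomSurvivalForest(n_estimators=100, random_state=0)
rsf.fit(feats, Surv.from_arrays(surv_df["event"], surv_df["years"]))
```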
The use of machine learning to improve prognostic and diagnostic accuracy has been increasing at the expense of classic statistical models. In this talk Dr Lauric Ferrat presents results comparing the prediction performance of several well-known machine learning approaches to logistic regression. He then argues that the focus should be not on performance optimisation but on clinical utility and ease of model access.
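The kind of head-to-head comparison discussed can be set up in a few lines; the sketch below (not Dr Ferrat's analysis) compares the cross-validated AUC of logistic regression and a random forest on a public dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
for name, model in [
    ("logistic regression",
     make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("random forest", RandomForestClassifier(random_state=0)),
]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {auc.mean():.3f}")
```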
In conventional prediction models, predictors are typically measured at a single fixed time point, such as at baseline or the most recent follow-up. Dynamic prediction has emerged as a more appealing prediction technique that takes account of the longitudinal history of biomarkers when making predictions. In this talk Dr Mizanur Khondoker presents results from a simulation study comparing the prediction performance of two well-known approaches for dynamic prediction, namely the joint modelling and landmarking approaches.
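As a minimal illustration of the landmarking approach (not the simulation itself): at a landmark time s one keeps the subjects still at risk, carries forward their most recent biomarker value and fits a Cox model for the residual time. The sketch below uses the lifelines package with synthetic data and invented column names.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
df = pd.DataFrame({"time": rng.exponential(5, 300),           # years from baseline
                   "event": rng.integers(0, 2, 300),
                   "biomarker_at_s": rng.normal(0, 1, 300)})  # last value before s

s = 2.0                                    # landmark time
lm = df[df["time"] > s].copy()             # keep subjects still at risk at s
lm["time"] -= s                            # residual time measured from s

cph = CoxPHFitter().fit(lm, duration_col="time", event_col="event")
surv = cph.predict_survival_function(lm[["biomarker_at_s"]], times=[3.0])
risk_by_s_plus_3 = 1 - surv.T              # predicted risk by time s + 3
```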
Unsupervised learning techniques have been applied to psychosis groups in the hope of finding meaningful but undiscovered groupings of patients. A methodological option for unsupervised learning is network-based clustering, which relies on the topology of the data represented as a network. This study used cognitive and symptom data from a cohort of healthy controls and individuals at Clinical High Risk for Psychosis to test the validity of graph clustering and to explore the use of a multilayer clustering method for multimodal unsupervised learning. Graph clustering produced results highly similar to k-means clustering and separated groups into those with significantly different functioning scores. Multilayer clustering was used to tune the similarity of clustering solutions between modalities.
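A generic sketch of network-based clustering (not the study's exact method): build a k-nearest-neighbour graph over subjects, apply Louvain community detection, and compare the result with k-means via the adjusted Rand index.

```python
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.neighbors import kneighbors_graph

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)   # toy "subjects"
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity")  # kNN graph
G = nx.from_scipy_sparse_array(A)
communities = nx.community.louvain_communities(G, seed=0)

node_to_cluster = {n: c for c, nodes in enumerate(communities) for n in nodes}
graph_labels = [node_to_cluster[i] for i in range(len(X))]
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(adjusted_rand_score(graph_labels, kmeans_labels))  # similarity of solutions
```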
Artificial Intelligence (AI) systems and applications are gaining greater prominence in everyday life. With this growth comes the need to discuss and debate the implications of this development. With this in mind, this talk aims to introduce some of the key concepts relating to what AI really means and the different means of achieving it, and to outline key challenges and ethical considerations.
In this talk, Joie discusses some of the considerations when deciding how much data is ‘enough’ when looking to i) develop a new clinical prediction model (CPM) and ii) validate an existing CPM. When designing a study to develop a new CPM, researchers must ensure a large enough sample size to develop a model that predicts as accurately as possible. Conversely, when designing a study to validate an existing CPM, we must ensure a sample size large enough to estimate model performance accurately and precisely in an external sample.
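For model development, one widely used criterion (Riley et al. 2019) picks the smallest n for which the expected global shrinkage factor stays above 0.9, given the number of candidate predictor parameters and an anticipated Cox-Snell R-squared. A small calculator for that criterion, with illustrative inputs, is sketched below.

```python
import math

def min_sample_size(p, r2_cs, shrinkage=0.9):
    """Smallest n keeping expected shrinkage >= target (Riley et al. 2019,
    criterion 1), given p predictor parameters and anticipated Cox-Snell R2."""
    return math.ceil(p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))

print(min_sample_size(p=20, r2_cs=0.2))   # illustrative inputs, not a real study
```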
In this talk, Ewan introduces a pipeline that exploits recent developments in topological data analysis to identify homogeneous clusters in high-dimensional data. The approach is based on Mapper, an algorithm that reduces a point cloud into a one-dimensional graph. Written in Python and freely available online, the pipeline offers several advantages over existing clustering techniques. These include the ability to integrate prior knowledge into the clustering process and selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types.
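The pipeline itself is available online; purely to illustrate the Mapper step it builds on, here is a minimal example using the separate kepler-mapper package on toy data.

```python
import kmapper as km
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=500, noise=0.05, factor=0.4, random_state=0)
mapper = km.KeplerMapper()
lens = mapper.fit_transform(X, projection=[0])         # filter: first coordinate
graph = mapper.map(lens, X,
                   cover=km.Cover(n_cubes=10, perc_overlap=0.3),
                   clusterer=DBSCAN(eps=0.3))          # point cloud -> 1-D graph
mapper.visualize(graph, path_html="mapper_output.html")
```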
In this talk, Diana speaks about survival analysis, which deals with longitudinal data and estimates both the distribution of time-to-event in a population over the observation time and how the time-to-event depends on risk factors.
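A minimal illustration of those two tasks using the lifelines package and its bundled Rossi recidivism dataset: a Kaplan-Meier estimate for the time-to-event distribution and a Cox model for the risk factors.

```python
from lifelines import CoxPHFitter, KaplanMeierFitter
from lifelines.datasets import load_rossi

df = load_rossi()                                         # columns: week, arrest, ...
kmf = KaplanMeierFitter().fit(df["week"], df["arrest"])   # time-to-event distribution
print(kmf.median_survival_time_)
cph = CoxPHFitter().fit(df, duration_col="week", event_col="arrest")
cph.print_summary()                                       # how covariates shift the hazard
```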
In this talk, Andrew speaks about dCVnet, a software tool for prediction modelling. It produces tuned elastic-net regression models with cross-validated prediction performance measures. This approach can be useful in smaller samples or with many predictors. The tool is fast, easy to use and, in contrast to more general prediction modelling software, requires minimal statistical programming experience. dCVnet was developed recently with support from the Biomedical Research Centre for Mental Health at the South London and Maudsley NHS Trust and King's College London and is freely available at https://github.com/AndrewLawrence/dCVnet.
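dCVnet is an R package; as a rough scikit-learn analogue of its core idea, the sketch below nests an inner cross-validation loop that tunes an elastic-net logistic regression inside an outer loop that estimates performance without optimism.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
enet = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000))
grid = GridSearchCV(
    enet,
    {"logisticregression__C": np.logspace(-3, 1, 5),
     "logisticregression__l1_ratio": [0.1, 0.5, 0.9]},
    scoring="roc_auc", cv=5)                                   # inner loop: tuning
outer = cross_val_score(grid, X, y, cv=5, scoring="roc_auc")   # outer loop: assessment
print(f"nested-CV AUC: {outer.mean():.3f}")
```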
In this talk, Mihai introduces a new technique based on Generative Adversarial Networks (GANs) that is able to achieve high performance on the one-class classification problem. He describes an algorithm for one-class classification based on binary classification of the target class against synthetic samples. Mihai's work was recently nominated for the best PhD student paper award of the International Conference on Engineering Applications of Neural Networks, EANN 2021.
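A sketch of the general idea only: score one-class membership by training a binary classifier against synthetic samples. Here a naive uniform sampler stands in for the GAN generator used in Mihai's work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
target = rng.normal(0, 1, size=(500, 5))               # the one known class
lo, hi = target.min(axis=0), target.max(axis=0)
synthetic = rng.uniform(lo, hi, size=(500, 5))         # stand-in negatives

X = np.vstack([target, synthetic])
y = np.r_[np.ones(500), np.zeros(500)]
clf = RandomForestClassifier(random_state=0).fit(X, y)  # target vs synthetic

x_new = rng.normal(0, 1, size=(3, 5))
print(clf.predict_proba(x_new)[:, 1])   # high score = looks like the target class
```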
In this talk, Lucy Bull provides an overview of her methodological work that focuses on how we can make better use of routinely-collected medical data to enhance the reliability and applicability of clinical prediction models (CPMs). More specifically, Lucy highlights the motivations behind incorporating longitudinal data into clinical prediction models, provides a detailed overview of available methodology and discusses the challenges faced when applying such methodology to real-world data, using a case-study in chronic disease.
Isobel discusses structural equation modelling (SEM) and regularised SEM (regSEM), a method incorporating penalised likelihood into the SEM framework. In this seminar, regSEM is applied to a model of outcome prediction including a large psychometric scale, first in a simulation study and then in a real-world longitudinal dataset. This allows a comparison of standard maximum likelihood estimation and regSEM, and demonstrates the ability of regSEM to perform sparse model selection and hence potentially optimise a scale for outcome prediction.
In this talk, Dr Andreas Groll investigates the effect structure in the Cox frailty model, which is the most widely used model that accounts for heterogeneity in survival data.
Since survival models must account for possible variation of effect strength over time, the selection of relevant features has to distinguish between several cases: covariates can have time-varying effects, time-constant effects, or be irrelevant. Regularization approaches are discussed that are able to distinguish between these types of effects to obtain a sparse representation that includes the relevant effects in a proper form. This idea is applied to a real-world dataset, illustrating that the complexity of the influence structure can be strongly reduced by using such a regularization approach.
Topological Data Analysis (TDA) is a recently emerged field offering promising tools to extract descriptors of the shape and structure of complex data.
In this talk, Raquel provides an overview of TDA methods that complement current analytical approaches based on machine learning for precision medicine studies. She also introduces two popular techniques from TDA, the persistence diagram and the Mapper graph, and discusses how effective these techniques are, based on the available literature where TDA has been applied in the context of precision medicine. Lastly, she briefly presents her and her team's ongoing work on integrating TDA with machine learning models to identify homogeneous subgroups of patients and predict clinical outcomes.
Sam covers the development and validation of a risk prediction model of symptom non-remission in first-episode psychosis. His development cohort consisted of 1027 patients with first-episode psychosis recruited between 2005 and 2010 from 14 early intervention services across the National Health Service in England.
The prediction model showed good discrimination, with a C-statistic of 0.72 (0.66, 0.78), and adequate calibration, with an intercept alpha of 0.14 (-0.11, 0.39) and a slope beta of 1.15 (0.76, 1.53). Our model improved the net benefit by 13%, equivalent to 13 more non-remitted first-episode psychosis individuals detected per 100. Hence, using our model would be worthwhile if we accept applying it to roughly eight individuals (100/13 ≈ 7.7) to detect one additional non-remitted individual; equivalently, using our model on eight individuals will avoid unnecessary additional interventions in one individual.
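For readers new to decision curves, net benefit at a threshold probability weighs true against false positives, and the quoted figures follow directly from the 13-per-100 gain; a back-of-the-envelope check (a sketch of the standard quantities, not the study code):

```python
def net_benefit(tp, fp, n, pt):
    """Decision-curve net benefit at threshold probability pt."""
    return tp / n - (fp / n) * pt / (1 - pt)

delta_nb = 0.13        # reported gain in net benefit: 13 per 100 individuals
print(1 / delta_nb)    # about 7.7, i.e. roughly 8 assessed per extra case found
```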
Dr Privé will start by introducing penalized regression models and their pros and cons, particularly in the context of genetic prediction, then explain how these models can still be used even for very large datasets. He will present results from using penalized regression to predict 240 different phenotypes based on 1M genetic variants for each of 500K individuals.
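Dr Privé's own tools (the bigstatsr family of R packages) use memory-mapped matrices; purely as a sketch of one generic way penalized regression can scale, the snippet below streams chunks through an elastic-net SGD classifier, with invented genotype-like data standing in for chunks read from disk.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="log_loss", penalty="elasticnet",
                    alpha=1e-4, l1_ratio=0.5, random_state=0)
rng = np.random.default_rng(0)
for _ in range(100):                           # stand-in for chunks read from disk
    X_chunk = rng.integers(0, 3, size=(1000, 200)).astype(float)  # genotype dosages
    y_chunk = (X_chunk[:, 0] + rng.normal(0, 1, 1000) > 1.5).astype(int)
    clf.partial_fit(X_chunk, y_chunk, classes=[0, 1])  # out-of-core updates
```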
Dr Florian Privé is a postdoc in Aarhus, Denmark. He is interested in using statistical learning to advance precision medicine. He is specifically developing tools to analyse very large datasets and methods to build predictive models based on large genetic data. He is also fond of data science and is an R(cpp) enthusiast.
Schizophrenia is a heterogeneous disease comprising manifold clinical phenotypes that may reflect distinct biological underpinnings. The frontal lobes are a key area of brain dysfunction in schizophrenia. The frontal assessment battery (FAB) is a screening battery for dysexecutive syndrome in neurodegenerative diseases.
Filippo Corponi presents his work investigating the relationship between frontal lobe impairment and symptom profiles defined along the Positive and Negative Syndrome Scale (PANSS) principal components in patients with acute schizophrenia.
Dr Olesya Ajnakina discusses her large population-based cohort study, which addresses the need to develop a robust prediction model for estimating an individual's risk of all-cause mortality. This allows relevant assessments and interventions to be targeted appropriately.
Having employed modern statistical learning algorithms and addressed the weaknesses of previous models, the new mortality model achieved good discrimination and calibration to quantify absolute 10-year risk of all-cause mortality in older adults, as shown by its performance in a separate validation cohort. The model can be useful for clinical, policy, and epidemiological applications.
The aim of this presentation is an introduction to statistical learning and prediction modelling.
Daniel explains the key differences between inferential statistical modelling and prediction modelling and then introduces the concepts of prediction modelling and statistical learning. He assesses the usefulness of statistical learning algorithms for applications in medical research, as an alternative to classical statistical inference methods, by reanalysing an event-related brain potential (ERP) dataset from infants at high or low risk of developing autism. Daniel also explains the concept of cross-validation for model selection and validation and provides a brief introduction to regularized regressions.
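A minimal illustration of those two concepts (not Daniel's ERP analysis): k-fold cross-validation used both to tune a regularized (ridge) regression and to estimate its out-of-sample performance.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0])       # inner CV picks the penalty
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(scores.mean())                                 # out-of-sample performance
```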