Analytical tools for machine learning


Pipelines and tools to analyse big data sets using machine learning methods

Big healthcare datasets have complex structures and distinctive characteristics. We develop open tools and pipelines based on modern machine learning and prediction modelling methods to facilitate their analysis.

A pipeline based on topological machine learning to identify homogeneous patients and relevant features

Dr Raquel Iniesta and Dr Ewan Carr developed a novel pipeline, built on recent advances in topological data analysis (TDA), to identify homogeneous clusters of patients with respect to a characteristic of interest. TDA is a growing field that provides tools to infer, analyse, and exploit the shape of data, and it has seen increasing adoption in recent years. It holds particular promise for precision medicine, where we often want to identify groups of patients with similar treatment or prognostic outcomes. The pipeline focuses on Mapper, a clustering algorithm that identifies topological features in complex data and has shown considerable potential for uncovering homogeneous subgroups that share common characteristics.

The analytical tool combines and extends existing software implementations of the Mapper algorithm to provide several unique strengths: the integration of prior knowledge to inform the clustering process, the restriction of the cluster search to significant topological features, the use of a multivariable machine learning method (XGBoost) to describe cluster composition, and the ability to incorporate mixed data types. Details of the methodology and implementation, together with an application to clustering patients with major depression according to their chances of remission, are published in this paper (2021).
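To give a feel for what Mapper does, here is a minimal sketch of the basic algorithm in plain Python. It is not the code of the pipeline described above and it omits that pipeline's extensions (prior knowledge, significance filtering, XGBoost descriptions, mixed data types). The function names `mapper_graph` and `single_linkage`, the parameters `n_intervals`, `overlap` and `eps`, and the choice of single-linkage clustering are all assumptions made for illustration.

```python
import math
from itertools import combinations


def single_linkage(points, members, eps):
    """Group `members` into connected components, linking points within `eps`."""
    clusters, unseen = [], set(members)
    while unseen:
        stack = [unseen.pop()]
        component = set(stack)
        while stack:
            j = stack.pop()
            near = {k for k in unseen if math.dist(points[j], points[k]) <= eps}
            unseen -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.5, eps=1.0):
    """Toy Mapper: returns (nodes, edges).

    1. Project every point to a number with the `lens` (filter) function.
    2. Cover the range of lens values with overlapping intervals.
    3. Cluster the points that fall in each interval; each cluster is a node.
    4. Join two nodes with an edge when they share at least one point.
    """
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals      # width of each base interval
    pad = length * overlap / 2            # extension added on both sides
    nodes = []
    for i in range(n_intervals):
        a = lo + i * length - pad
        # Pin the last interval to the maximum to avoid rounding gaps.
        b = hi + pad if i == n_intervals - 1 else lo + (i + 1) * length + pad
        members = [j for j, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(points, members, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges
```

Calling `mapper_graph(points, lens=lambda p: p[0])` on a list of coordinate tuples returns the clusters (as sets of point indices) and the links between them. In a patient-clustering setting, the lens could be any patient-level score, and connected regions of the resulting graph suggest candidate subgroups.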

Two videos introducing TDA and explaining the tool are available on our BRC Prediction Modelling Presentation page and on YouTube: Introduction to TDA and Mapper pipeline presentation.


The pipeline can be downloaded at: https://github.com/kcl-bhi/mapper-pipeline

“dCVnet”: a user-friendly tool to develop regularized regression prediction models

Dr Andrew Lawrence developed “dCVnet”, an R wrapper for the glmnet package that implements regularized logistic regression with double (nested) cross-validation for internal validation. This easy-to-use tool is available to the scientific and clinical community as an R package.

In contrast to traditional statistical methods, regularized regression allows the analysis of a large number of predictors relative to sample size. Regularization reduces overfitting by introducing a penalty that constrains the magnitude of the regression coefficients. dCVnet provides a documented and standardized implementation of this particular machine learning pipeline, making it accessible to researchers who lack the programming experience required for more general machine learning software environments. Details of the methodology and an application to predicting recurrence of depression are published in Lawrence, A., Stahl, D. et al. (2022).
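The two ideas above, a penalty that shrinks coefficients and nested cross-validation for internal validation, can be sketched in a few lines of plain Python. This is not dCVnet's code or API (dCVnet is an R package built on glmnet). The sketch assumes a simple L2 (ridge) penalty fitted by gradient descent, whereas glmnet fits elastic-net penalties, and every name here (`fit_logistic`, `nested_cv`, `lam`, and so on) is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lam, steps=200, lr=0.1):
    """Gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The intercept b is left unpenalised; larger lam shrinks w towards zero.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        errs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
                for xi, yi in zip(X, y)]
        grad_w = [sum(e * xi[j] for e, xi in zip(errs, X)) / n for j in range(p)]
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * sum(errs) / n
    return w, b


def accuracy(model, X, y, idx):
    """Proportion of the rows in idx that the model classifies correctly."""
    w, b = model
    hits = 0
    for i in idx:
        predicted_one = b + sum(wj * xj for wj, xj in zip(w, X[i])) > 0
        hits += predicted_one == (y[i] == 1)
    return hits / len(idx)


def split(idx, k):
    """Yield (train, test) index lists for k interleaved folds."""
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        yield [i for i in idx if i not in held_out], test


def fit_on(X, y, idx, lam):
    return fit_logistic([X[i] for i in idx], [y[i] for i in idx], lam)


def nested_cv(X, y, lambdas, k_outer=5, k_inner=3, seed=0):
    """Mean outer-fold accuracy, with the penalty tuned inside each outer fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    outer_scores = []
    for train, test in split(idx, k_outer):
        # Inner loop: choose the penalty using the outer-training data only.
        def inner_score(lam):
            return sum(accuracy(fit_on(X, y, tr, lam), X, y, te)
                       for tr, te in split(train, k_inner)) / k_inner

        best_lam = max(lambdas, key=inner_score)
        # Outer loop: score the tuned model on data it has never seen.
        outer_scores.append(accuracy(fit_on(X, y, train, best_lam), X, y, test))
    return sum(outer_scores) / len(outer_scores)
```

Because the penalty is selected using only the outer-training folds, the outer-fold accuracy is not inflated by the tuning step, which is the purpose of the double cross-validation described above. A call such as `nested_cv(X, y, lambdas=[0.01, 0.1, 1.0])` takes a list of feature rows `X` and a list of 0/1 outcomes `y`.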

A video explaining the tool is on our BRC Prediction Modelling Presentation page and on YouTube.

The toolbox can be downloaded at: github.com/AndrewLawrence/dCVnet

  

About precision medicine and prediction modelling

Precision medicine is an emerging approach that focuses on identifying treatments or approaches.
Initiatives and aims

We are identifying prediction modelling researchers to establish an online database listing members’ areas of interest and expertise.
People

Meet the current members of the group and discover their areas of interest.
Ethics

Predictive models that use data from individuals are an important source of information in medical settings.
Join our prediction modelling group

Joining our group helps create our database of researchers and establish collaborations within our institution and beyond.
Prediction Modelling Workshop

Videos from the inaugural workshop of our Prediction Modelling Group including presentations from world-renowned experts.
Prediction Modelling Presentations

Our Prediction Modelling group will be hosting regular presentations to engage and introduce new members to our community.
Implementation research

Our Prediction Modelling Group provides innovative approaches to tackle the translational gap and advance implementation science.
Training in methodological skills for Prediction Modelling

Find out more about our training in methodological skills for Prediction Modelling.