Published: 22 June 2020

Introducing the CRIS Natural Language Processing (NLP) Service

This blog was written by Dr Anna Kolliakou, Clinical Informatics Interface and Network Lead, and Professor Rob Stewart, Clinical and Population Informatics Lead at the NIHR Maudsley BRC.

Since its launch in 2008, the Clinical Record Interactive Search (CRIS) has enabled researchers and clinicians to study the health records of the South London and Maudsley NHS Trust through a governance framework that ensures patient anonymity and places service users at its core. This has revolutionised mental health research by opening up exceptionally large volumes of records with unprecedented levels of detail.

Solving unstructured data

However, the health record's greatest potential also poses its biggest challenge. When we began CRIS research nearly 12 years ago, it was immediately clear that the information we could use was limited to what was recorded in 'structured' fields, such as dates, numbers and dropdown lists (e.g. age and gender).

Much of the most important information was contained in 'unstructured' free text: for example, routine case notes written by healthcare professionals, or correspondence with other professionals (e.g. GPs) involved in delivering and continuing care. From an early stage we knew that 'unlocking' information from clinical records text could hugely enhance the scope and quality of CRIS research.

Automatically extracting information from health records text is inherently difficult – different expressions are used to describe the same thing, spelling and typing errors are inevitable, abbreviations are very common, and the eccentricities of the English language are very much a feature.

Early on, researchers wishing to use text information had to read and code it manually, a time-consuming process that limited studies to a small percentage of the hundreds of thousands of case notes contained within CRIS. In other words, we had identified and opened up a uniquely rich source of data, but we were still far from an efficient way of using it for better research.

Over the last decade, working particularly closely with our Computer Science colleagues at the University of Sheffield, we have steadily applied natural language processing techniques that allow us to access and use information from health records text.

Natural language processing

Natural language processing is a broad field covering the interaction between computers and human languages. Our particular implementation of it has involved training programs to recognise specific details in the text: for example, a medication being taken or a symptom being reported. This approach began to transform the way we categorised clinical notes for systematic access and research.
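To give a flavour of the general idea, here is a minimal, rule-based sketch in Python of recognising a reported symptom in a note. It is only an illustration and not how the CRIS applications themselves are built: the lexicon, abbreviations, negation cues, window size and the names `extract_mentions` and `is_negated` are all hypothetical. It shows, in miniature, how spelling variants, abbreviations and negated mentions ("denies hearing voices") might be handled.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon for illustration only: each concept lists surface
# forms a clinician might write, including a common misspelling.
LEXICON = {
    "insomnia": ["insomnia", "insomina", "poor sleep", "not sleeping"],
    "hallucinations": ["hallucinations", "hallucination", "hearing voices"],
}
# Hypothetical abbreviations, matched case-sensitively so that "AH" in a
# note is not confused with the word "ah".
ABBREVIATIONS = {"AH": "hallucinations", "VH": "hallucinations"}
# Hypothetical negation cues, looked for shortly before a mention.
NEGATION_CUES = ["no", "denies", "denied", "nil", "not"]
NEG_PATTERN = re.compile(r"\b(?:" + "|".join(NEGATION_CUES) + r")\b", re.IGNORECASE)


@dataclass
class Mention:
    concept: str
    text: str
    negated: bool


def is_negated(note: str, start: int, window: int = 30) -> bool:
    """True if a negation cue appears shortly before `start` in the same sentence."""
    prefix = note[max(0, start - window):start]
    prefix = re.split(r"[.;\n]", prefix)[-1]  # keep only the current sentence
    return NEG_PATTERN.search(prefix) is not None


def extract_mentions(note: str) -> list[Mention]:
    """Find lexicon and abbreviation matches and flag whether each is negated."""
    patterns = []
    for concept, forms in LEXICON.items():
        # Longest forms first so multi-word phrases win over shorter ones.
        alternation = "|".join(re.escape(f) for f in sorted(forms, key=len, reverse=True))
        patterns.append((concept, re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)))
    for abbrev, concept in ABBREVIATIONS.items():
        patterns.append((concept, re.compile(rf"\b{re.escape(abbrev)}\b")))

    mentions = []
    for concept, pattern in patterns:
        for m in pattern.finditer(note):
            mentions.append(Mention(concept, m.group(0), is_negated(note, m.start())))
    return mentions


note = "Reports poor sleep for two weeks. Denies hearing voices. No VH."
for mention in extract_mentions(note):
    print(mention)
```

Running this on the example note should flag "poor sleep" as a positive insomnia mention and both hallucination mentions as negated. Real clinical text is far messier than this, which is one reason why applications of this kind need careful development and evaluation against manually coded notes.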

Driven by emerging research and clinical requirements, and with full access to the 30 million documents available in the CRIS database, we are proud to share our library of over 70 of these natural language processing applications.

These applications automatically process health record text and extract information important for research, including symptoms such as insomnia or hallucinations, contextual factors that may affect illness progression such as illicit drug use, interventions such as medications and several psychotherapies, and physical health outcomes such as blood pressure.

We believe that this is a world-first in terms of the depth of information now available for mental health research.

Working in practice

One example comes from Dr Marcella Fok and colleagues, who wanted to investigate levels of acute general hospital use in people who had received mental healthcare for personality disorders. A barrier was that personality disorders are not often recorded in the diagnostic fields of mental healthcare records, so a supplementary natural language processing application was used to identify cases.

The application identified 47% more cases than the diagnostic field alone. Combined with linkage to hospital admissions data, this enabled the research team to establish that people with personality disorder had a higher risk of hospitalisation for a range of conditions, including circulatory, respiratory, digestive, musculoskeletal and infectious disorders.

Future work

The advanced text-mining capabilities we have established, and will continue to develop, have made possible the leading research undertaken on our CRIS platform. We also hope they will increasingly be implemented by other mental health trusts to extract information from the text of their own clinical records, allowing them to respond better to research and clinical priorities.

We are privileged to be able to offer these resources to our fellow academics, researchers and clinicians and look forward to making individual and collective strides in better understanding the nature of mental health disorders and their successful treatment and care.