Research blog: Using social media to recognise mental health conditions

Dr Rina Dutta is a Clinical Senior Lecturer and Consultant Psychiatrist at the Institute of Psychiatry, Psychology & Neuroscience’s Department of Psychological Medicine at King’s College London and South London and Maudsley NHS Foundation Trust. She is also a Clinician Scientist Fellow of the Health Foundation in partnership with the Academy of Medical Sciences. She leads a team of researchers who are using the latest technology to successfully identify and classify social media posts related to mental health according to different disorders. Here, she tells us about their latest publication in the journal Scientific Reports.

Mental health and substance abuse disorders are the leading cause of ill-health and disability worldwide, affecting 450 million people. In fact, 1 in 4 people will be affected by mental illness at some point in their lives.

Many people regularly write about their experiences on social media platforms like Twitter and Reddit. Unlike medical notes, which record clinicians’ professional impressions, these forums capture people’s personal experiences first-hand. The language people use and the information they post offer valuable insights into mental health, which could ultimately help us to understand how to deliver new types of interventions for mental health using social media.

Our research, published in the journal Scientific Reports, investigates whether recent developments in Natural Language Processing (a technique that extracts meaning from written language) and Deep Learning (algorithms that enable computers to perform complex pattern recognition, after being trained with hundreds of examples) can accurately identify mental health-related content on social media. We also looked at whether this content could be accurately classified according to 11 different disorders including depression, bipolar disorder, schizophrenia, and drug use disorders.

To test this, we sifted through 10 years of content publicly posted on Reddit—one of the world’s largest discussion websites, with more than 500 million monthly visitors. We trained computer algorithms to recognise words and language patterns associated with a disorder: for anxiety disorders, for example, word cues such as ‘panic’, ‘attack’, ‘feel’, or ‘heart’. The model was based on a review of more than 900,000 subreddits (forums dedicated to specific topics) and does not reveal the names or any other identifying information of individuals who had chosen to post publicly about their disorders.
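To give a flavour of the word-cue idea, here is a minimal sketch of a keyword-based classifier. This is not the deep learning model from the paper—the cue lists and example post below are invented purely for illustration:

```python
from collections import Counter
import re

# Hypothetical word cues per theme, invented for illustration only;
# the paper's model learns its own features from training data.
CUES = {
    "anxiety": {"panic", "attack", "feel", "heart"},
    "depression": {"sad", "hopeless", "tired", "alone"},
}

def tokenize(text):
    """Lowercase a post and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def classify(post):
    """Return the theme whose cue words appear most often, or None."""
    counts = Counter(tokenize(post))
    scores = {theme: sum(counts[word] for word in cues)
              for theme, cues in CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(classify("My heart was racing and I had a panic attack"))  # anxiety
```

A real system replaces these hand-picked cue lists with features learned automatically from hundreds of thousands of labelled posts, which is what makes deep learning approaches far more accurate.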

Our approach correctly identified posts about mental health conditions with 91% accuracy and selected the correct disorder theme in 71% of cases. We also discovered that the 11 mental health themes that we studied are not mutually exclusive; instead they inter-relate and demonstrate interesting patterns. For example, depression—the most prominent condition we studied, mentioned in 42% of all mental health-related posts—links to almost all other themes.
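The headline figures are standard classification accuracy: the fraction of test posts for which the predicted label matches the annotated label. A worked example, using made-up toy labels in place of the paper’s annotated test set:

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Invented toy labels standing in for annotated posts.
actual    = ["depression", "anxiety", "depression", "bipolar", "anxiety"]
predicted = ["depression", "anxiety", "depression", "anxiety", "anxiety"]
print(accuracy(predicted, actual))  # 0.8, i.e. 4 of 5 correct
```

The paper reports this metric at two levels: 91% for the binary decision (mental health-related or not) and 71% for picking the specific disorder theme among the 11 studied.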

It’s now possible to efficiently extract the meaning of online content and access the treasure trove of publicly available data contained in narratives. We’re just beginning to scratch the surface of what’s possible. This technology has potential to be a valuable public health tool, which could provide new and previously hidden information about the prevalence of certain mental health conditions.

It also has the potential to detect early signs of mental illness (e.g., changes in word complexity can indicate cognitive decline), which could lead to interventions to delay, mitigate, or even prevent the onset of serious mental illness. Researchers will also benefit from more accurate, anonymous data that can be used to understand how people with mental health issues use social media.

More work needs to be done on how to use this type of data while preserving individuals’ privacy as well as their prerogative to speak openly about their feelings. The safest way to prevent this technology from being misused is to ensure that social media providers are open and clear about how their content is being used, but as we’ve seen from recent news about monitoring extreme political or ideological content on social media, this isn’t always clear-cut.

However, it’s also clear that a frank and open discussion needs to take place about how clinicians and health providers react to and assess information online. Our findings suggest that there is rich potential for online narratives about health to inform research priorities and improve real-world understanding of mental health conditions—but we will maximise the benefits by involving the people who use these forums from the outset.

Paper reference: Gkotsis, G., Oellrich, A., Velupillai, S., Liakata, M., Hubbard, T. J. P., Dobson, R. J. B. & Dutta, R. Characterisation of mental health conditions in social media using Informed Deep Learning. Scientific Reports 7, Article number: 45141 (2017).


Tags: Informatics - Publications - Clinical and population informatics

By Admin at 22 Mar 2017, 11:11 AM
