What makes depression apps safe, effective and trustworthy?

The study provides a clearer path to identifying depression apps that are safe, effective and recommended.

Study: Validation and selection of criteria for evaluating apps for managing depression: a Delphi study. Photo credit: myboys.me/Shutterstock.com

In a recent study published in BMJ Open, researchers presented a novel consensus-based framework that lays the foundation for a future assessment tool to help experts and consumers evaluate the plethora of mobile health (mHealth) apps targeting depression and mental health support.

The framework used a modified Delphi methodology involving healthcare professionals, technology experts, and patients to propose and review 51 potential evaluation metrics, which were then distilled into 28 essential criteria intended for use in a structured app evaluation tool.

Study results showed that participants prioritized privacy and clinical effectiveness over other commonly highlighted app features such as engagement and self-tracking. These findings will form the basis for “EvalDepApps,” a future tool designed to help users and clinicians identify safe, evidence-based digital interventions.

Why choosing a trustworthy depression app remains difficult

Depressive disorder, commonly called depression, is a spectrum of mental illnesses characterized by persistent feelings of sadness that often lead to observable changes in behavior and daily functioning. Its prevalence is rising at an alarming rate worldwide, and it reportedly affects approximately 5% of the adult population.

While the World Health Organization (WHO) emphasizes the potential of emerging digital technologies to bridge gaps in care, the reality of the app store is often unclear. A recent analysis found that of 30 depression apps that were subjected to extensive scrutiny, only 26.7% were supported by scientific evidence.

Additionally, mobile health (mHealth) apps targeting depression and mental health support are rarely subject to standardization or regulatory pressure. Currently, users are forced to rely on subjective or unverifiable reviews that provide little insight into an app's medical validity, its handling of sensitive personal data, or its compliance with clinical best practices.

A depression-focused approach to evaluating mHealth would enable professionals and patients alike to make informed decisions when choosing their next digital anti-depression tool.

Using Delphi methods to define depression-specific app standards

The present study aimed to address this urgent need by proposing a consensus-based set of criteria tailored specifically to depression treatment apps. It used a modified Delphi methodology, a structured communication technique designed to reach consensus among a panel of experts, to consolidate the opinions, perceptions, and priorities of stakeholders across two iterative rounds of evaluation.

Study participants included healthcare professionals (psychiatrists, psychologists, nurses), healthcare technology experts, and individuals diagnosed with depression. These participants (n = 43) were selected to provide a holistic view of the topic from both clinical and lived-experience perspectives.

The study began with a literature review conducted by the authors to identify possible criteria for standardizing mHealth apps and evaluating their performance. Of the 60 potential criteria identified, nine were found to be redundant in an internal review, leaving 51 criteria for participant assessment.

The panel evaluation was conducted in two rounds of voting:

  1. Round 1: Participants rated the relevance of each criterion on a 6-point Likert scale (0 to 6).
  2. Round 2: Criteria that did not reach a decisive consensus in the first round were returned to the panel for re-evaluation, together with the earlier voting results and summarized feedback.

Only criteria that reached a high level of consensus were included in the study recommendations. Specifically, a criterion was considered to have high consensus only if 80% or more of respondents rated it 5 or 6, which corresponds to “very important.”
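
The consensus rule described above can be illustrated with a minimal Python sketch. The function name, its parameters, and the example ratings are hypothetical and are not taken from the study; only the 80% threshold and the "rated 5 or 6" condition come from the text.

```python
def has_high_consensus(ratings, top_scores=(5, 6), threshold=0.80):
    """Return True if at least `threshold` of the ratings fall in `top_scores`."""
    if not ratings:
        return False
    top = sum(1 for r in ratings if r in top_scores)
    return top / len(ratings) >= threshold


# Made-up panel ratings for two made-up criteria (illustrative, not study data).
example_ratings = {
    "criterion_a": [6, 6, 5, 5, 6, 5, 6, 6, 4, 5],  # 9 of 10 rated 5 or 6
    "criterion_b": [6, 5, 4, 3, 5, 6, 4, 5, 2, 6],  # 6 of 10 rated 5 or 6
}

# Keep only the criteria that clear the high-consensus bar.
retained = [name for name, r in example_ratings.items() if has_high_consensus(r)]
print(retained)
```

In this toy example only the first criterion would be retained, since 90% of its ratings are 5 or 6, compared with 60% for the second.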

What stakeholders really want from depression apps

The two rounds of panel review resulted in the identification of 28 criteria (down from 51) that met the preferences and needs of both experts and patients. Panel participation was particularly strong: 59% of invited experts responded in Round 1 and 53.4% in Round 2.

The study results showed participants overwhelmingly prioritized the following areas:

  1. Security and privacy (25% of the included criteria): All proposed security and privacy criteria achieved maximum consensus in the first round. For example, a criterion on sharing data with third parties reached 100% agreement.
  2. Clinical effectiveness (25%): The requirement that an app's recommendations be evidence-based reached 95.7% agreement, reflecting strong interest in demonstrable therapeutic value.

Unexpectedly, and in contrast to findings from broader mHealth evaluation studies, criteria related to health indicators, such as tracking sleep, diet or sedentary habits, were largely considered less important and made up 7.1% of the final list of criteria. The authors note that this deprioritization does not mean that these features lack value, but rather that it reflects limited evidence linking them to improved depression outcomes when used in isolation.

Finally, usability and functionality remained important, accounting for 17.9% of the final list. Participants emphasized that apps need to be interpretable, responsive, and clearly communicate their goals to support sustainable and meaningful use.
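
The reported percentages can be turned back into approximate criterion counts. The short Python sketch below assumes that each share refers to the 28 final criteria and was rounded to one decimal place. The counts it prints are inferred from that assumption and are not stated in the article, and the helper name is hypothetical.

```python
FINAL_CRITERIA = 28  # size of the final list reported in the study


def counts_for_share(percent, total=FINAL_CRITERIA):
    """Return every integer count whose share of `total` rounds to `percent`."""
    return [k for k in range(total + 1) if round(100 * k / total, 1) == percent]


# Shares as reported in the article, in percent of the final list.
reported_shares = {
    "security and privacy": 25.0,
    "clinical effectiveness": 25.0,
    "usability and functionality": 17.9,
    "health indicators": 7.1,
}

for area, pct in reported_shares.items():
    print(f"{area}: {pct}% -> candidate counts {counts_for_share(pct)}")
```

Under these assumptions each share maps to a single whole number of criteria, which is a quick consistency check on the reported figures.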

What these criteria mean for future mental health apps

The present study highlights that mHealth apps, particularly those aimed at alleviating depression, must prioritize safety and scientific validity over less clinically based features such as sleep or diet tracking. It introduces 28 validated criteria for evaluating and reviewing mHealth apps, providing stakeholders with an informed path for future evaluations of digital tools for depression treatment, rather than immediately recommending specific apps.

These criteria will be crucial in optimizing the EvalDepApps assessment tool, which aims to enable healthcare professionals and users to identify high-quality apps, ensuring that digital mental health support is not only accessible, but also safe and scientifically sound. However, the authors note that further validation, contextual adaptation, and real-world testing will be required before widespread implementation, particularly in diverse health systems and cultural settings.


Sources:

Journal reference:
  • Robles, N., et al. (2025). Validation and selection of criteria for evaluating apps for managing depression: a Delphi study. BMJ Open, 15(11), e101302. DOI: 10.1136/bmjopen-2025-101302. https://bmjopen.bmj.com/content/15/11/e101302