Multimodal Approach for Detecting Depression

An innovative approach to detecting depression using physiological and behavioral data from three modalities: Text, Audio, and Video.

About the Project

Multimodal Depression Detection is a deep learning model that mimics clinical assessments by integrating text, audio, and video inputs to evaluate a subject’s mental state. By assigning dynamic weights to each modality, the model delivers a binary classification—indicating the presence or absence of depression symptoms—offering a more holistic and automated approach to mental health analysis.


Project Details

Traditionally, a subject's mental state is assessed through extensive clinical interviews and analysis of their responses. Our deep learning model automates this approach by combining three modalities: Text, Audio, and Video. It assigns appropriate weights to each modality and produces a binary (yes/no) output indicating whether the subject is exhibiting depression symptoms.
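
To make the weighting idea concrete, here is a minimal sketch of gated fusion over pooled per-modality embeddings. It only illustrates the mechanism, not the exact architecture in the notebooks: the GatedFusion name, the feature dimensions, and the layer sizes are assumptions.

```python
# Minimal sketch of gated multimodal fusion (not the exact repo architecture).
# Feature dimensions and layer sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Weigh per-modality embeddings with learned gates, then classify."""
    def __init__(self, text_dim=128, audio_dim=128, video_dim=128, hidden=64):
        super().__init__()
        total = text_dim + audio_dim + video_dim
        # One scalar gate (dynamic weight) per modality, conditioned on all inputs.
        self.gate = nn.Sequential(nn.Linear(total, 3), nn.Softmax(dim=-1))
        self.classifier = nn.Sequential(
            nn.Linear(total, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, text_emb, audio_emb, video_emb):
        concat = torch.cat([text_emb, audio_emb, video_emb], dim=-1)
        w = self.gate(concat)  # (batch, 3) modality weights
        fused = torch.cat(
            [w[:, 0:1] * text_emb, w[:, 1:2] * audio_emb, w[:, 2:3] * video_emb],
            dim=-1,
        )
        return torch.sigmoid(self.classifier(fused))  # P(depression symptoms)

# Example: one batch of 4 subjects with 128-d embeddings per modality.
model = GatedFusion()
probs = model(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128))
print(probs.shape)  # torch.Size([4, 1])
```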


How to Use the Files:

Ready to give our model a try? First, apply for access to the DAIC-WOZ Database (https://dcapswoz.ict.usc.edu/) and download the dataset. Copy the dataset folder to the location the notebooks expect, and you're good to go! For an even easier experience, we recommend running the notebooks on Google Colab.
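
If it helps, the sketch below shows one way to script the download and extraction of session archives once you have credentials. The base URL path, the session-zip naming pattern, and the local data/ layout are assumptions; follow the instructions that come with your DAIC-WOZ approval.

```python
# Rough sketch: download and unpack DAIC-WOZ session archives once you have
# credentials. The URL path, the "<id>_P.zip" naming pattern, and the local
# "data/" layout are assumptions -- adjust to the instructions you receive.
import io
import zipfile
from pathlib import Path

import requests

BASE_URL = "https://dcapswoz.ict.usc.edu/wwwdaicwoz"   # assumed path
AUTH = ("your_username", "your_password")              # credentials from DAIC
OUT_DIR = Path("data")

for session_id in range(300, 303):                     # a few sample sessions
    archive = f"{session_id}_P.zip"
    resp = requests.get(f"{BASE_URL}/{archive}", auth=AUTH, timeout=300)
    resp.raise_for_status()
    # Extract each session into its own folder, e.g. data/300_P/
    dest = OUT_DIR / f"{session_id}_P"
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
        zf.extractall(dest)
    print(f"extracted {archive} -> {dest}")
```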

Files:

Want to know what's included in our repository? Check out the following files:

  • Dataset.ipynb: Code snippets for downloading the data from the DAIC server, unzipping the archives, and arranging them in a user-friendly format.

  • SVM&RF_Text.ipynb: Run the SVM and Random Forest (RF) models on the text modality (a minimal baseline sketch follows this list).

  • SVM&RF_Audio.ipynb: Run the SVM and RF models on the audio modality.

  • Rf_prune.ipynb: Apply pruning to the Random Forest models.

  • CNN_Video.ipynb: Run a CNN on the video features.

  • LSTM_With_Gating_Sentence_Level.ipynb: Implement LSTM on all three modalities combined with gating at the sentence level.
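
As a rough reference for the SVM&RF notebooks, the snippet below shows the general shape of such a baseline in scikit-learn. The random feature matrix, labels, and hyperparameters are placeholders rather than the values used in the notebooks.

```python
# Minimal SVM / Random Forest baseline in the spirit of the SVM&RF notebooks.
# The feature matrix, labels, and hyperparameters here are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for per-subject text or audio features (e.g. averaged embeddings).
X = np.random.rand(200, 50)
y = np.random.randint(0, 2, size=200)   # 1 = depression symptoms, 0 = none

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, clf in [("SVM", SVC(kernel="rbf", C=1.0)),
                  ("RF", RandomForestClassifier(n_estimators=200, random_state=0))]:
    clf.fit(X_tr, y_tr)
    preds = clf.predict(X_te)
    print(f"{name} F1: {f1_score(y_te, preds):.3f}")
```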

Ready to explore multimodal depression detection? Give our approach a try and see what results you can achieve!
