Multimodal Approach for Detecting Depression
About the project
Project Details
Traditionally, clinicians have conducted extensive interviews and analyzed the subject's responses to assess their mental state. We build on this approach with a deep learning model that combines three modalities from such interviews: Text, Audio, and Video. The model learns an appropriate weight for each modality and produces a binary classification output (yes or no) indicating whether the patient is exhibiting depression symptoms.
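To make the idea concrete, here is a minimal sketch of that weighted fusion in PyTorch. The feature dimension (64), module names, and the simple softmax weighting are placeholders chosen for illustration, not the exact architecture in our notebooks:

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Toy fusion: learn one scalar weight per modality, combine, classify yes/no."""

    def __init__(self, dim=64):
        super().__init__()
        # One learnable weight each for text, audio, and video.
        self.modality_weights = nn.Parameter(torch.ones(3))
        self.classifier = nn.Linear(dim, 1)

    def forward(self, text, audio, video):  # each: (batch, dim), already encoded
        w = torch.softmax(self.modality_weights, dim=0)
        fused = w[0] * text + w[1] * audio + w[2] * video
        return torch.sigmoid(self.classifier(fused))  # P(depression symptoms)

model = WeightedFusion()
prob = model(torch.randn(1, 64), torch.randn(1, 64), torch.randn(1, 64))
print("yes" if prob.item() > 0.5 else "no")
```

The softmax keeps the three modality contributions on a comparable scale while letting training shift weight toward the most informative modality.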
How to Use the Files:
Ready to give our model a try? First, apply for access to the DAIC-WOZ Database (https://dcapswoz.ict.usc.edu/) and download the dataset. Copy the dataset folder to the location the notebooks expect, and you're good to go! For the smoothest experience, we recommend running the notebooks on Google Colab.
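If you prefer to script the download-and-extract step, here is a minimal sketch. The base URL, session numbering, and `data/` folder name are assumptions for illustration; use the exact links and layout provided with your approved DAIC-WOZ application:

```python
import os
import urllib.request
import zipfile

# Hypothetical base URL and session range; the real links come with your
# approved DAIC-WOZ application.
BASE_URL = "https://dcapswoz.ict.usc.edu/wwwdaicwoz"
DATA_DIR = "data"

os.makedirs(DATA_DIR, exist_ok=True)
for session_id in range(300, 303):  # e.g., sessions 300-302
    name = f"{session_id}_P.zip"
    archive = os.path.join(DATA_DIR, name)
    urllib.request.urlretrieve(f"{BASE_URL}/{name}", archive)
    # Unpack each session into its own folder.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(os.path.join(DATA_DIR, str(session_id)))
```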
Files:
Want to know what's included in our repository? Check out the following files:
Dataset.ipynb: Code for downloading the data from the DAIC server, unzipping the archives, and arranging the files in a user-friendly layout (a minimal version of this step is sketched above).
SVM&RF_Text.ipynb: Run the SVM and RF (random forest) models on the text modality; a minimal sketch of this pattern appears after this list.
SVM&RF_Audio.ipynb: Run the same SVM and RF models on the audio modality.
Rf_prune.ipynb: Prune the RF models (see the pruning sketch below).
CNN_Video.ipynb: Run a CNN on the video features (see the CNN sketch below).
LSTM_With_Gating_Sentence_Level.ipynb: Run an LSTM over all three modalities combined, with gating applied at the sentence level (see the final sketch below).
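As promised above, here is a minimal scikit-learn sketch of the SVM/RF pattern shared by SVM&RF_Text.ipynb and SVM&RF_Audio.ipynb. The feature matrix here is a random placeholder; in practice you would substitute your extracted text or audio features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in features: replace with the extracted text or audio feature matrix.
X = np.random.rand(200, 50)
y = np.random.randint(0, 2, size=200)  # 1 = depression symptoms, 0 = none
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Same fit/evaluate loop for both classifiers.
for clf in (SVC(kernel="rbf"), RandomForestClassifier(n_estimators=100)):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```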
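For the pruning in Rf_prune.ipynb, one standard option in scikit-learn is cost-complexity pruning via the `ccp_alpha` parameter, sketched below on synthetic data; the notebook's exact pruning strategy may differ:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Larger ccp_alpha prunes more aggressively, shrinking the trees.
for alpha in (0.0, 0.01, 0.05):
    rf = RandomForestClassifier(n_estimators=50, ccp_alpha=alpha, random_state=0)
    rf.fit(X, y)
    avg_nodes = sum(t.tree_.node_count for t in rf.estimators_) / len(rf.estimators_)
    print(f"ccp_alpha={alpha}: avg nodes per tree = {avg_nodes:.1f}")
```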
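For CNN_Video.ipynb, here is a toy 1-D CNN over per-frame facial-landmark features (DAIC-WOZ provides facial landmarks rather than raw video; the 136 channels assume 68 2-D landmarks per frame, and all layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class VideoCNN(nn.Module):
    """Toy 1-D CNN over a sequence of per-frame landmark vectors."""

    def __init__(self, n_features=136):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis
            nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, x):  # x: (batch, n_features, n_frames)
        return torch.sigmoid(self.net(x))

print(VideoCNN()(torch.randn(4, 136, 256)).shape)  # torch.Size([4, 1])
```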
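Finally, a skeletal version of the combined model in LSTM_With_Gating_Sentence_Level.ipynb: each sentence contributes one vector per modality, a learned gate mixes the three, and an LSTM summarizes the sentence sequence. The dimensions and the exact gating form here are our illustrative choices, not the notebook's precise design:

```python
import torch
import torch.nn as nn

class SentenceGatedLSTM(nn.Module):
    """Gate the three modality vectors per sentence, then run an LSTM over sentences."""

    def __init__(self, dim=64, hidden=32):
        super().__init__()
        self.gate = nn.Linear(3 * dim, 3)  # one weight per modality per sentence
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, 1)

    def forward(self, text, audio, video):  # each: (batch, sentences, dim)
        w = torch.softmax(self.gate(torch.cat([text, audio, video], dim=-1)), dim=-1)
        fused = w[..., 0:1] * text + w[..., 1:2] * audio + w[..., 2:3] * video
        _, (h, _) = self.lstm(fused)  # final hidden state summarizes the interview
        return torch.sigmoid(self.classifier(h[-1]))

model = SentenceGatedLSTM()
print(model(torch.randn(2, 10, 64), torch.randn(2, 10, 64), torch.randn(2, 10, 64)))
```

Unlike the single learned weight per modality in the first sketch, the gate here is input-dependent, so the modality mix can change from sentence to sentence.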
Ready to explore multimodal depression detection? Give our approach a try and see what results you can achieve!


