REPORTS

Report on the 4th BAIRAL Research Meeting for 2024
“Advancing Mental Health Monitoring: A Deep Learning Framework for Multimodal Emotion Recognition”

Priya Mu (Research Assistant, B’AI Global Forum)

・Date: Thursday, February 20, 2025, 17:00 to 18:30 (JST)
・Venue: Online
・Language: English
・Guest Speaker:
Meishu Song, Ph.D., Graduate School of Education, University of Tokyo
・Moderator: Priya Mu (Research Assistant, B’AI Global Forum)

The 4th BAIRAL research meeting for 2024 was held on February 20, 2025. This time, we invited Meishu Song to speak on their research, titled “Advancing Mental Health Monitoring: A Deep Learning Framework for Multimodal Emotion Recognition.”

The speaker discussed their research on leveraging emotional AI technology for daily mental health monitoring. They used multimodal analysis to recognize and interpret emotions through speech patterns and physiological signals, capturing emotional dynamics in real time and thereby minimizing recall bias. This was achieved by developing the Health Information Technology (HIT) system, which comprised a smartphone app, wearable devices, and a cloud-based storage platform. The system enabled two weeks of continuous data collection using Ecological Momentary Assessment (EMA). Participants recorded speech samples, tracked physical activity, and completed self-annotated emotional assessments using the Depression and Anxiety Mood Scale (DAMS), covering nine emotional states, including anxiety, gloom, happiness, and worry.
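For illustration, the sketch below shows what a single EMA record collected by a system of this kind might look like. The field names, value types, and the listing of only the four named DAMS states are assumptions made for the example, not the study's actual schema.

```python
# Hypothetical shape of one EMA record from a HIT-style collection pipeline.
# All field names and value ranges are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict

# Only the four DAMS states named in the talk are listed; the remaining five
# of the nine states are omitted here rather than guessed.
EXAMPLE_DAMS_STATES = ["anxiety", "gloom", "happiness", "worry"]

@dataclass
class EMARecord:
    participant_id: str
    timestamp: datetime
    speech_wav_path: str        # prompted speech sample captured by the smartphone app
    step_count: int             # physical activity reading from the wearable device
    heart_rate_bpm: float       # physiological signal from the wearable device
    dams_ratings: Dict[str, int] = field(default_factory=dict)  # self-annotated score per emotional state
```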

The research introduced two datasets: the Japanese Daily Mental Health Speech Dataset (JDSD) and the Japanese Daily Mental Health Multimodal Dataset (JDMD). JDSD contained 20,827 speech samples from 342 participants, while JDMD included 6,200 records from 298 participants, integrating speech and physiological data to analyze emotional patterns. The study explored how speech and physiological data could be combined to provide a more comprehensive understanding of emotions, and it introduced new modeling methods that made emotion prediction more accurate and more personalized than traditional approaches. The research demonstrated advancements in AI-driven emotion recognition while also identifying areas for further exploration, such as expanding demographic diversity, improving real-time data processing, and enhancing model interpretability.
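As a rough illustration of how speech and physiological signals can be combined, the following is a minimal late-fusion sketch in PyTorch. The feature dimensions, layer sizes, and fusion strategy are illustrative assumptions, not the architecture presented in the talk.

```python
# Minimal late-fusion sketch: separate encoders for speech-derived and
# wearable-derived features, concatenated and mapped to nine emotion scores.
import torch
import torch.nn as nn

class MultimodalEmotionModel(nn.Module):
    def __init__(self, speech_dim=128, physio_dim=16, n_emotions=9):
        super().__init__()
        # Encoder for pre-extracted acoustic features from speech recordings
        self.speech_encoder = nn.Sequential(
            nn.Linear(speech_dim, 64), nn.ReLU(), nn.Linear(64, 32)
        )
        # Encoder for physiological / activity features from wearables
        self.physio_encoder = nn.Sequential(
            nn.Linear(physio_dim, 32), nn.ReLU(), nn.Linear(32, 16)
        )
        # Fusion head predicts one score per emotional state (e.g., nine DAMS states)
        self.head = nn.Linear(32 + 16, n_emotions)

    def forward(self, speech_feats, physio_feats):
        fused = torch.cat(
            [self.speech_encoder(speech_feats), self.physio_encoder(physio_feats)],
            dim=-1,
        )
        return self.head(fused)

# Example usage with a dummy batch of 4 records
model = MultimodalEmotionModel()
scores = model(torch.randn(4, 128), torch.randn(4, 16))
print(scores.shape)  # torch.Size([4, 9])
```

Late fusion is used here only because it is a simple way to combine modalities; per-participant calibration or other personalization strategies mentioned in the talk are not reflected in this sketch.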

In conclusion, the research contributed to AI-driven mental health assessment by integrating multimodal data and developing personalized analysis strategies, bridging the gap between traditional assessments and AI-based continuous monitoring for future mental health care applications.