Deep Learning based classification of vocal folds’ vibration dynamics

Keywords

vocal fold vibration
voice disorders
high-speed video
Phonovibrogram
classification
deep neural network

Abstract

Vocal fold (VF) dynamics can be captured in real time using high-speed videolaryngoscopy, laying the basis for quantitative assessment of the VFs' vibration properties. A compact representation of the vibrational behavior captured in these high-speed videos (HSV) is provided by the so-called Phonovibrogram (PVG). The PVG encodes the VFs' vibrational behavior by characteristic spatial and temporal patterns in a three-dimensional representation. Based on these characteristic PVG patterns, this work realizes a fully automatic classification of different voice disorders. For this purpose, a Convolutional Neural Network (CNN) was trained and evaluated using a stratified 10-fold cross-validation strategy on PVGs from 220 subjects to solve two different classification tasks: (a) classification of the vibrational behavior as physiologic or pathologic and (b) classification of the PVGs according to the subjects' actual clinical diagnosis as healthy, muscle tension dysphonia (MTD), paresis, or polyp. The trained CNN distinguished between physiologic and pathologic VF vibration with an average classification accuracy of 0.82 ± 0.07 (sensitivity: 0.81 ± 0.12, specificity: 0.82 ± 0.12) and achieved an average classification accuracy of 0.85 ± 0.07 across all classes (sensitivity: 0.71 ± 0.19, specificity: 0.91 ± 0.07) for classification according to the clinical diagnoses. Based on the PVG representation, the presented approach reliably differentiates between physiologic and pathologic VF vibration and can even distinguish types of voice disorders without user interaction. However, to further increase the method's performance, a larger amount of training data is required.
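
The evaluation scheme named in the abstract (stratified k-fold cross-validation, with sensitivity and specificity reported per task) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy label counts, and the round-robin fold assignment are assumptions made purely for illustration, and no CNN is trained here.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that roughly preserve class proportions.

    Hypothetical helper: shuffles the indices of each class and deals them
    round-robin across the folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels only; the class counts are invented and are NOT the study's data.
labels = ["pathologic"] * 60 + ["physiologic"] * 40
folds = stratified_folds(labels, k=10, seed=0)
```

In a full evaluation, each fold would serve once as the test set while a model is trained on the remaining nine; the per-fold metrics would then be averaged to give mean ± standard deviation figures of the kind quoted in the abstract.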

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2022 Mona Kirstin Fehling, Maximilian Linxweiler, Bernhard Schick, Jörg Lohscheller