Deep Learning Classifies Facial Regions on Ultrasound with 96% Accuracy
Paper: Analysis of Facial Ultrasonography Images Based on Deep Learning
Authors: Kang-Woo Lee, Hyung-Jin Lee, Hyewon Hu, Hee-Jin Kim
Summary
This clinical study from Yonsei University College of Dentistry and The Catholic University of Korea investigated how deep learning models can classify facial regions in ultrasound (US) images.
Facial US imaging is becoming a valuable diagnostic and procedural guide in esthetic, reconstructive, and minimally invasive facial treatments, but its interpretation remains challenging due to low image contrast and operator dependency.
A total of 86 healthy participants were recruited. Ultrasound images were acquired from nine facial regions, generating 1,440 transverse B-mode images.
Fifteen CNN architectures pre-trained on ImageNet were evaluated for classification:
AlexNet, VGG-16, VGG-19, ResNet-18/50/101, GoogLeNet, Inception-V3, Inception-ResNet-V2, DenseNet-201, MobileNet-V2, NasNet-Mobile, SqueezeNet, ShuffleNet, and Xception.
Images were resized to each network's input size (224–299 px), and models were trained with 10-fold cross-validation on both augmented and non-augmented datasets, using a learning rate of 0.0003 and the SGDM optimizer.
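The paper's implementation is not reproduced here; purely as an illustration, a transfer-learning setup of this kind could look like the following PyTorch sketch, where the momentum value, device, and data loading are assumptions rather than reported settings:

```python
# Hypothetical sketch (not the authors' code): fine-tuning an ImageNet-pretrained
# VGG-19 on the nine facial US regions with SGDM at the reported learning rate.
import torch
import torch.nn as nn
from torchvision import models

NUM_REGIONS = 9        # nine labeled facial regions
LEARNING_RATE = 3e-4   # 0.0003, as reported in the study
MOMENTUM = 0.9         # assumed value; the paper only specifies "SGDM"

model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
# Replace the 1000-class ImageNet head with a 9-class classifier.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_REGIONS)

optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE, momentum=MOMENTUM)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(loader, device="cpu"):
    """One pass over a training fold; the 10-fold CV split is handled outside this sketch."""
    model.to(device).train()
    for images, labels in loader:  # images resized to the network input size (224 px for VGG)
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```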
Classification performance was measured with Precision, Recall, and F-measure, while BRISQUE scores were used to assess input image quality.
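For orientation, the classification metrics can be computed with scikit-learn as in the toy sketch below (labels are made up, not study data); BRISQUE is a no-reference image-quality score and needs a separate implementation, so it is only noted in a comment:

```python
# Illustrative metric computation for the 9-region classification task (toy labels, not study data).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 1, 2, 2, 5, 8, 3]   # ground-truth region indices
y_pred = [0, 1, 2, 4, 5, 8, 3]   # model predictions

precision = precision_score(y_true, y_pred, average="macro", zero_division=0)
recall    = recall_score(y_true, y_pred, average="macro", zero_division=0)
f_measure = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(f"Precision {precision:.3f} | Recall {recall:.3f} | F-measure {f_measure:.3f}")

# BRISQUE assesses input image quality rather than classification performance and
# requires a dedicated implementation (e.g. OpenCV's quality module); omitted here.
```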
LIME (Local Interpretable Model-agnostic Explanations) was used for visual explainability, identifying which image features guided the CNNs’ predictions.
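A hedged sketch of how LIME can highlight the image regions behind a single prediction follows; the classifier function and the input image are placeholders, and the exact LIME parameters used in the study are not stated here:

```python
# Illustrative LIME explanation for one US image (predict_fn and us_image are stand-ins).
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

def predict_fn(batch):
    """Placeholder for the trained CNN: returns pseudo-probabilities of shape (N, 9)."""
    scores = np.random.rand(len(batch), 9)
    return scores / scores.sum(axis=1, keepdims=True)

us_image = np.random.rand(224, 224, 3)  # stand-in for a preprocessed ultrasound frame

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(us_image, predict_fn,
                                         top_labels=1, hide_color=0, num_samples=1000)
label = explanation.top_labels[0]
img, mask = explanation.get_image_and_mask(label, positive_only=True,
                                           num_features=5, hide_rest=False)
overlay = mark_boundaries(img, mask)  # outlines the superpixels driving the prediction
```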
Results:
- Best-performing model: VGG-19 (non-augmented: 96.75 ± 1.6 % accuracy, loss 0.13 ± 0.07).
- Augmented dataset accuracy: average 94.25 ± 1.0 %.
- Lowest-performing model: NasNet-Mobile (91.5 ± 3.4 %).
- Highest F-measure by region: Lateral Nose (99.62 %), Nose (99.15 %).
- Lowest F-measure: Anterior Cheek (88.64 %).
- Top 3 architectures: VGG-16, VGG-19, and ResNet-50, offering the best balance between network depth and parameter count.
The models relied primarily on skin and bone contours, rather than soft tissue or muscle, as the most discriminative features.
Performance was only slightly improved by augmentation, suggesting limited benefit for structured, repetitive datasets.
DOI: https://doi.org/10.1038/s41598-022-20969-z
Key Words
Facial Ultrasonography, Deep Learning, VGG-16, VGG-19, ResNet-50, Facial Anatomy, Ultrasound Imaging, Transfer Learning, LIME, BRISQUE, Computer Vision, Dental AI
Extracted Data
- Year: 2022
- Modality: Facial Ultrasonography (2D B-mode)
- Dataset: 1,440 images from 9 facial regions (86 participants)
- Dataset Split: 10-fold cross-validation
- Network Architectures: 15 pre-trained CNNs (ImageNet)
- Best Model: VGG-19 (96.75 % accuracy, loss 0.13 ± 0.07)
- Metrics: Mean Precision ≈ 94 % | Mean Recall ≈ 93.8 % | Mean F1 ≈ 93.7 % | Best accuracy: VGG-19 (96.75 ± 1.6 %)
- AP – Strategy: Manual labeling of 9 facial regions + augmentation (translation ±30 px, zoom ±10 %; a sketch follows this list) + LIME explainability
- AP – Professional Qty: No information
- AP – Supervisor Presence: No information
- AP – Experience Level: No information
- AP – Expertise Area: Medical radiologists
- AP – Tool or System: No information
- Task: Classification of Facial US Regions
- Project Objective: To evaluate the ability of deep learning models to classify facial US regions and identify key features driving model performance.
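A minimal torchvision sketch of the augmentation settings listed under "AP – Strategy" above; note that RandomAffine expresses translation as a fraction of image size, so the ±30 px shift is converted under the assumption of 224 px inputs:

```python
# Illustrative augmentation matching the reported settings (translation ±30 px, zoom ±10 %).
from torchvision import transforms

IMG_SIZE = 224                      # assumed input size (VGG-family networks)
TRANSLATE_FRACTION = 30 / IMG_SIZE  # RandomAffine takes translation as a fraction of image size

augment = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.RandomAffine(degrees=0,
                            translate=(TRANSLATE_FRACTION, TRANSLATE_FRACTION),  # ±30 px shift
                            scale=(0.9, 1.1)),                                   # ±10 % zoom
    transforms.ToTensor(),
])
```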
Clinical Relevance
- Clinical importance: Facial ultrasonography supports non-invasive aesthetic and diagnostic procedures but requires expert interpretation due to low contrast and operator variability.
- Innovation: This is the first study to systematically evaluate 15 CNN architectures for facial region classification in US images using transfer learning and explainability analysis (LIME).
- Practical impact: Establishes baseline reference data for future AI-based ultrasound applications in facial anatomy, enabling AI-guided injection and surgical planning.
YT Video: https://youtu.be/7KBNkgrI4X4