AI-assisted bone age assessment using hand–wrist radiographs
Evaluation of the clinical efficacy of a TW3-based fully automated bone age assessment system using deep neural networks
Authors: Nan-Young Shin, Byoung-Dai Lee, Ju-Hee Kang, Hye-Rin Kim, Dong Hyo Oh, Byung Il Lee, Sung Hyun Kim, Mu Sook Lee, Min-Suk Heo
Summary
This study from Seoul National University (Korea) evaluated the clinical efficacy of a fully automated bone age assessment (BAA) system based on the Tanner–Whitehouse 3 (TW3) method.
The system was developed using deep neural networks to automatically detect and classify the 13 skeletal maturity regions in hand–wrist radiographs.
The system's bone age estimates for 80 radiographs (40 boys and 40 girls, aged 7–15 years) were compared with manual TW3 evaluations by two oral and maxillofacial radiologists.
Statistical analysis revealed no significant differences between AI-based and human assessments (P > 0.05), with coefficients of determination (R²) of 0.945 or higher in every subgroup (0.952 overall).
The results demonstrated high correlation and clinical equivalence, confirming the accuracy and reproducibility of the TW3-based automated system for pediatric skeletal maturity assessment.
Methodology
- Dataset: 80 hand–wrist radiographs (40 boys, 40 girls; ages 7–15 years) collected retrospectively.
- Reference standard: Two oral and maxillofacial radiologists assessed all images using the TW3 method.
- Training dataset: 3,027 labeled images used to develop the TW3-based deep learning system.
- Architecture: Two convolutional neural networks (CNNs) – Faster R-CNN for region detection and VGGNet for maturity classification.
- Evaluation metrics: Paired t-test and regression analysis comparing AI vs. radiologist-determined bone ages.
- Equivalence criterion: the 95% confidence interval of the difference between AI-estimated and radiologist-determined bone ages had to lie within ±0.6 years.
- Software and analysis: IBM SPSS Statistics version 23 (IBM Corp., Armonk, NY, USA).
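The paired comparison and the ±0.6-year equivalence check listed above can be sketched as follows. This is a minimal illustration, not the study's SPSS procedure: the function name `paired_equivalence`, its parameters, and the five toy readings are all hypothetical, and the critical t value is passed in by hand so that the sketch needs only the Python standard library.

```python
import math
import statistics


def paired_equivalence(ai_ages, reader_ages, t_crit, margin=0.6):
    """Paired comparison of two bone age series (in years).

    t_crit is the two-sided 95% critical value of Student's t for
    len(ai_ages) - 1 degrees of freedom (an input, not computed here).
    """
    diffs = [a - r for a, r in zip(ai_ages, reader_ages)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean paired difference.
    se = statistics.stdev(diffs) / math.sqrt(n)
    lower = mean_diff - t_crit * se
    upper = mean_diff + t_crit * se
    return {
        "mean_diff": mean_diff,
        "t_stat": mean_diff / se,  # paired t statistic
        "ci95": (lower, upper),
        # Equivalent if the whole interval sits inside (-margin, +margin).
        "equivalent": -margin < lower and upper < margin,
    }


# Hypothetical toy data (NOT from the study): five paired readings.
ai = [10.2, 11.9, 9.1, 13.4, 8.0]
reader = [10.0, 12.0, 9.0, 13.0, 8.2]
result = paired_equivalence(ai, reader, t_crit=2.776)  # t(0.975, df=4)
print(result)
```

For the study's 80 paired readings the degrees of freedom would be 79, for which the two-sided 95% critical value is approximately 1.990.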
Results
- No significant difference between AI-estimated and radiologist-determined bone ages (P > 0.05).
- Equivalence criterion of ±0.6 years was satisfied (95% CI: −0.07 to 0.22 years overall).
- Determination coefficients (R²): 0.962 (boys), 0.945 (girls), and 0.952 (overall).
- High intra-observer reliability: κ = 0.846 and 0.817; inter-observer reliability κ = 0.750 (substantial agreement).
- Regression equations: y = −0.629 + 1.040x (boys); y = 0.201 + 0.985x (girls); y = −0.253 + 1.016x (overall).
- Demonstrated accuracy comparable to radiologists and higher than previously reported automated TW3 systems (error ≈ 0.2 years).
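One way to read the reported regression lines is to check how far each one departs from the identity line y = x, which would indicate perfect agreement. The sketch below does that arithmetic. It rests on two assumptions that this summary does not confirm: that x and y are the two bone age estimates being compared (which one is x is not stated here), and that the cohort's 7 to 15 year range is a reasonable range for x. The names `REGRESSION`, `fitted`, and `max_gap_from_identity` are illustrative.

```python
# Reported regression coefficients (intercept, slope) from the Results list.
REGRESSION = {
    "boys": (-0.629, 1.040),
    "girls": (0.201, 0.985),
    "overall": (-0.253, 1.016),
}


def fitted(group, x):
    """Evaluate the reported line y = intercept + slope * x."""
    intercept, slope = REGRESSION[group]
    return intercept + slope * x


def max_gap_from_identity(group, low=7.0, high=15.0):
    """Largest |y - x| on [low, high] (assumed range, in years).

    y - x is linear in x, so its absolute value peaks at an endpoint.
    """
    return max(abs(fitted(group, x) - x) for x in (low, high))


gaps = {group: round(max_gap_from_identity(group), 3) for group in REGRESSION}
print(gaps)
```

Under these assumptions the largest gap is about 0.35 years (boys, at the low end of the range), and all three lines come within about 0.03 years of the identity line at x = 15.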
Conclusion
The TW3-based fully automated bone age assessment system achieved accuracy equivalent to experienced radiologists, confirming its clinical efficacy for evaluating skeletal maturity in children and adolescents.
This AI system offers a reliable, objective, and efficient alternative for bone age estimation in orthodontic and pediatric applications.
Keywords
Bone Age Assessment, Tanner–Whitehouse 3, Deep Learning, Hand–Wrist Radiograph, Artificial Intelligence, Faster R-CNN, VGGNet, Skeletal Maturity, Pediatric Radiology, Orthodontic Growth Evaluation.
Extracted Data
| Year | 2020 |
| Modality | Hand–Wrist Radiograph |
| Dataset | 80 clinical images (7–15 years old) |
| Dataset Split | Train: 3,027 images (system development); Test: 80 clinical radiographs |
| Network Architecture | Faster R-CNN + VGGNet (two-stage CNN pipeline) |
| Metrics | Paired t-test; R² = 0.952 (overall); equivalence margin ±0.6 years (95% CI) |
| AP – Strategy | Comparison against reference standard from two radiologists |
| AP – Professional Qty | 2 |
| AP – Supervisor Presence | No Information |
| AP – Experience Level | 4–7 years of experience |
| AP – Expertise Area | Oral and Maxillofacial Radiology |
| AP – Tool or System | TW3-based Automated Bone Age Assessment System |
| Task | Bone Age Estimation (Classification of Skeletal Maturity) |
| Project Objective | To evaluate clinical efficacy and equivalence of an AI-based TW3 system for bone age assessment. |
| Clinical Relevance | Reliable automated system for skeletal maturity estimation, reducing manual workload and improving diagnostic consistency. |
Clinical Relevance
- Clinical importance: Bone age estimation is essential for pediatric growth evaluation, orthodontic planning, and forensic identification.
- Innovation: First TW3-based fully automated system using deep CNNs for ROI detection and skeletal maturity classification.
- Practical impact: Reduces observer variability, accelerates diagnosis, and standardizes skeletal maturity evaluation in children and adolescents.
:::DOI
https://doi.org/10.5624/isd.2020.50.3.237
:::