Real performance numbers
Every screening algorithm is validated on independent datasets across multiple sources, devices, and patient populations. All metrics are reported with 95% confidence intervals.
| Condition | Modality | AUC-ROC | Sensitivity | Specificity | 95% CI (AUC) |
|---|---|---|---|---|---|
| Diabetic Retinopathy | Fundus | 0.949 | 90.1% | 82.8% | 0.938–0.960 |
| Glaucoma | Fundus | 0.979 | 93.0% | 94.0% | 0.972–0.985 |
| Hypertensive Retinopathy | Fundus | 0.968 | 93.7% | 89.1% | 0.956–0.978 |
| AMD & DME | OCT | 0.999 | 99.2% | 99.4% | 0.999–1.000 |
All metrics validated on held-out external datasets not used during model training.
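The document does not state how the 95% confidence intervals are computed; a common choice for AUC on a held-out test set is the percentile bootstrap. A minimal sketch, assuming rank-based AUC and simple resampling with replacement (all function names here are illustrative):

```python
import random

def auc(labels, scores):
    """Rank-based AUC: probability a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample the test set, recompute AUC each time."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        ss = [scores[i] for i in sample]
        if 0 < sum(ys) < n:  # need both classes present in the resample
            stats.append(auc(ys, ss))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```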
Validation Methodology
How we ensure our screening models perform reliably across real-world clinical environments.
External Dataset Validation
All models are evaluated on independent, held-out datasets that were never seen during training. Across all conditions, RetGuard has been validated on 13 external datasets sourced from institutions across multiple countries, device manufacturers, and clinical settings.
Cross-Device Generalization
Screening performance is tested across images captured by different fundus cameras and OCT devices to ensure the models generalize beyond a single hardware platform. This is critical for deployment across diverse clinical environments.
Population Diversity
Validation datasets include patients across a range of ages, ethnicities, disease severities, and comorbidity profiles. This ensures the models perform equitably and do not degrade for underrepresented patient groups.
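Equity checks of this kind typically stratify the same headline metrics by subgroup. A minimal sketch of per-group sensitivity, assuming a hypothetical record schema with `group`, `label`, and `pred` fields (not the product's actual data model):

```python
def sensitivity(labels, preds):
    """True-positive rate: flagged positives over all true positives."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else float("nan")

def subgroup_sensitivity(records):
    """Compute sensitivity separately for each patient subgroup."""
    groups = {}
    for r in records:
        ys, ps = groups.setdefault(r["group"], ([], []))
        ys.append(r["label"])
        ps.append(r["pred"])
    return {g: sensitivity(ys, ps) for g, (ys, ps) in groups.items()}
```

Large gaps between subgroup values would indicate degraded performance for an underrepresented group.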
Clinical Robustness
Designed for real-world screening.
Model Design
Sensitivity-First Thresholds
Operating points are tuned to maximize sensitivity — ensuring at-risk patients are flagged for referral, even at the cost of slightly higher false positive rates.
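One standard way to tune such an operating point is to scan candidate thresholds on a validation set and keep the most specific one that still meets a sensitivity target. A minimal sketch under that assumption (threshold values and the 0.95 target are illustrative):

```python
def pick_operating_point(labels, scores, min_sens=0.95):
    """Scan thresholds from high to low; return the first (most specific)
    threshold whose sensitivity meets the target, with its sens/spec."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for t in sorted(set(scores), reverse=True):
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(p for p, y in zip(preds, labels) if y == 1)
        tn = sum(1 - p for p, y in zip(preds, labels) if y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        if sens >= min_sens:
            return t, sens, spec
    return min(scores), 1.0, 0.0
```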
Calibrated Confidence Scores
Every prediction includes a calibrated probability, not just a binary pass/fail. Clinicians see exactly how confident the model is for each condition.
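The calibration method is not specified here; histogram binning is one standard post-hoc technique, mapping each raw-score bin to the event rate observed in a held-out calibration set. A minimal sketch (function names and bin count are illustrative):

```python
def fit_binned_calibrator(scores, labels, n_bins=10):
    """Histogram binning: learn the observed positive rate per score bin."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        bins[min(int(s * n_bins), n_bins - 1)].append(y)
    # fall back to the bin midpoint when a bin received no calibration data
    return [sum(b) / len(b) if b else (i + 0.5) / n_bins
            for i, b in enumerate(bins)]

def calibrate(score, table):
    """Replace a raw model score with the learned per-bin event rate."""
    n_bins = len(table)
    return table[min(int(score * n_bins), n_bins - 1)]
```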
Interpretability
Grad-CAM Evidence Maps
Each result includes visual heatmaps highlighting image regions clinicians can verify.
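Grad-CAM weights each feature map of the last convolutional layer by the spatial average of its gradient, then keeps only positive evidence. A framework-free toy sketch of that computation on small activation maps (real implementations pull `activations` and `gradients` from a trained network):

```python
def grad_cam(activations, gradients):
    """activations, gradients: lists of HxW maps, one per channel.
    Returns ReLU(sum_k alpha_k * A_k), with alpha_k the pooled gradient."""
    h, w = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * w for _ in range(h)]
    for a_map, g_map in zip(activations, gradients):
        # channel weight = global average pool of the gradient map
        alpha = sum(sum(row) for row in g_map) / (h * w)
        for i in range(h):
            for j in range(w):
                heatmap[i][j] += alpha * a_map[i][j]
    # ReLU keeps only regions with positive influence on the prediction
    return [[max(0.0, v) for v in row] for row in heatmap]
```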
Multi-Condition Correlation
All five conditions are analyzed in a single pass, allowing the system to flag clinically expected co-occurrences, such as DR alongside DME.
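One simple way to surface such co-occurrences is to check pairs of per-condition probabilities against a rule list after the shared forward pass. A minimal sketch; the pair list and threshold are illustrative, not the product's actual logic:

```python
# Illustrative list of clinically expected co-occurrences
EXPECTED_PAIRS = [("DR", "DME")]

def flag_cooccurrences(probs, threshold=0.5, pairs=EXPECTED_PAIRS):
    """Return expected pairs where both conditions score above threshold.
    probs: dict mapping condition name -> probability from one pass."""
    return [(a, b) for a, b in pairs
            if probs.get(a, 0.0) >= threshold and probs.get(b, 0.0) >= threshold]
```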
Safety & Reliability
Built-in safeguards that automatically flag suboptimal images and unusual inputs before they reach the screening algorithms.
Image Quality Assurance
Every image is automatically assessed for clinical gradability before analysis. Images that do not meet quality standards are flagged with clear guidance, prompting the operator to recapture and ensuring only reliable inputs proceed.
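The gradability criteria are not specified; a simple pre-analysis gate might check exposure and contrast and return recapture guidance. A minimal sketch, with purely illustrative thresholds (real quality models are typically learned, not rule-based):

```python
def gradability_check(image, min_mean=30, max_mean=225, min_std=20):
    """image: 2-D list of 0-255 grayscale values. Flags frames that are
    too dark, washed out, or low-contrast, with operator guidance."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    std = (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5
    if not (min_mean <= mean <= max_mean):
        return False, "exposure out of range; recapture with adjusted illumination"
    if std < min_std:
        return False, "low contrast; check focus and media opacity"
    return True, "gradable"
```

Only images that pass the gate would proceed to the screening models.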
Input Validation
The system automatically identifies and flags inputs that fall outside its validated range. This prevents unreliable predictions and ensures clinicians are alerted whenever an image requires manual review.
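The detection method is not described; one common lightweight proxy for out-of-range inputs is a low maximum softmax probability, which routes uncertain cases to manual review. A minimal sketch under that assumption (the 0.7 cutoff is illustrative):

```python
import math

def max_softmax(logits):
    """Highest class probability under a numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)

def needs_manual_review(logits, min_confidence=0.7):
    """Flag inputs whose top-class probability is low, a simple proxy
    for images outside the model's validated range."""
    return max_softmax(logits) < min_confidence
```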