
Real performance numbers

Every screening algorithm is validated on independent datasets across multiple sources, devices, and patient populations. All metrics are reported with 95% confidence intervals.

Screening performance metrics by condition
Condition                 | Modality | AUC-ROC | Sensitivity | Specificity | 95% CI (AUC)
Diabetic Retinopathy      | Fundus   | 0.949   | 90.1%       | 82.8%       | 0.938–0.960
Glaucoma                  | Fundus   | 0.979   | 93.0%       | 94.0%       | 0.972–0.985
Hypertensive Retinopathy  | Fundus   | 0.968   | 93.7%       | 89.1%       | 0.956–0.978
AMD & DME                 | OCT      | 0.999   | 99.2%       | 99.4%       | 0.999–1.000

All metrics validated on held-out external datasets not used during model training.
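To make the confidence intervals above concrete: a standard way to attach a 95% CI to an AUC-ROC estimate is to bootstrap the evaluation set. The sketch below is illustrative only (it is not RetGuard's published methodology); `auc_roc` and `bootstrap_auc_ci` are hypothetical helper names, and the rank comparison is written as an O(n²) pairwise check for clarity.

```python
import numpy as np

def auc_roc(y_true, y_score):
    """AUC-ROC as P(score of a random positive > score of a random negative),
    with ties counted as 0.5 (Mann-Whitney formulation)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]          # all positive-vs-negative pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap 95% CI for AUC-ROC."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)             # resample cases with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue                            # resample drew only one class; skip
        aucs.append(auc_roc(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return auc_roc(y_true, y_score), (float(lo), float(hi))
```

Bootstrapping over patients (rather than images) is the usual choice when one patient contributes multiple images, so that the CI reflects patient-level variability.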

Validation Methodology

How we ensure our screening models perform reliably across real-world clinical environments.


External Dataset Validation

All models are evaluated on independent, held-out datasets that were never seen during training. Across all conditions, RetGuard has been validated on 13 external datasets sourced from institutions across multiple countries, device manufacturers, and clinical settings.


Cross-Device Generalization

Screening performance is tested across images captured by different fundus cameras and OCT devices to ensure the models generalize beyond a single hardware platform. This is critical for deployment across diverse clinical environments.
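A cross-device evaluation of this kind boils down to stratifying the held-out predictions by capture hardware and recomputing each metric per stratum. The sketch below shows per-device sensitivity; the function name, the tuple layout, and the device labels in the usage note are all illustrative assumptions, not RetGuard's actual evaluation code.

```python
from collections import defaultdict

def per_device_sensitivity(records):
    """Sensitivity (recall on disease-positive cases) broken out by device.

    records: iterable of (device, y_true, y_pred) tuples, with y_true/y_pred
    equal to 1 for disease-positive. Only positive cases affect sensitivity.
    """
    tp = defaultdict(int)   # true positives per device
    fn = defaultdict(int)   # missed positives per device
    for device, y_true, y_pred in records:
        if y_true == 1:
            if y_pred == 1:
                tp[device] += 1
            else:
                fn[device] += 1
    return {d: tp[d] / (tp[d] + fn[d]) for d in set(tp) | set(fn)}
```

A large gap between strata (e.g. one camera model at 0.95 and another at 0.70) is the signal that a model has latched onto device-specific image characteristics rather than pathology.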


Population Diversity

Validation datasets include patients across a range of ages, ethnicities, disease severities, and comorbidity profiles. This ensures the models perform equitably and do not degrade for underrepresented patient groups.

Clinical Robustness

Designed for real-world screening.

Model Design

  • Sensitivity-First Thresholds

    Operating points are tuned to maximize sensitivity — ensuring at-risk patients are flagged for referral, even at the cost of slightly higher false positive rates.

  • Calibrated Confidence Scores

    Every prediction includes a calibrated probability, not just a binary pass/fail. Clinicians see exactly how confident the model is for each condition.
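"Sensitivity-first" tuning can be read as a constrained search over the ROC curve: among all thresholds that keep sensitivity at or above a floor, take the one that concedes the fewest false positives. A minimal sketch, assuming validation labels and calibrated probabilities are available (the function name and the 90% floor are illustrative, not RetGuard's actual operating point):

```python
import numpy as np

def sensitivity_first_threshold(y_true, y_prob, target_sensitivity=0.90):
    """Highest decision threshold whose sensitivity still meets the target.

    Raising the threshold trades sensitivity for specificity, so among all
    thresholds meeting the sensitivity floor we keep the largest, which
    yields the fewest false positives subject to the constraint.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    pos_scores = y_prob[y_true == 1]
    best = float(y_prob.min())                    # always satisfies the floor
    for t in np.unique(y_prob):                   # candidate operating points
        sens = (pos_scores >= t).mean()           # recall at threshold t
        if sens >= target_sensitivity:
            best = max(best, float(t))
    return best
```

For a referral screen, fixing sensitivity and letting specificity float is the conservative direction: a false positive costs a specialist visit, a false negative costs a missed disease.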

Interpretability

  • Grad-CAM Evidence Maps

    Each result includes visual heatmaps highlighting image regions clinicians can verify.

  • Multi-Condition Correlation

    All five conditions are analyzed in a single pass, allowing the system to flag clinically correlated co-occurrences, such as DR presenting alongside DME.
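The core of a Grad-CAM evidence map is a short computation once the last convolutional layer's activations and their gradients have been captured with framework hooks. The sketch below shows only that core step in NumPy, assuming those two tensors are already in hand; it is a generic illustration of the published Grad-CAM recipe, not RetGuard's internal implementation.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one conv layer's activations and gradients.

    activations, gradients: (C, H, W) arrays for a single image, assumed
    to have been captured via forward/backward hooks on the target layer.
    """
    weights = gradients.mean(axis=(1, 2))             # GAP over space -> per-channel weight (C,)
    cam = np.tensordot(weights, activations, axes=1)  # weighted channel sum -> (H, W)
    cam = np.maximum(cam, 0)                          # ReLU: keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1] for overlay
    return cam
```

The resulting (H, W) map is upsampled to the input resolution and overlaid on the fundus image, which is what lets a clinician check that the model's evidence sits on, say, microaneurysms rather than on the image border.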

Safety & Reliability

Built-in safeguards that automatically flag suboptimal images and unusual inputs before they reach the screening algorithms.

Image Quality Assurance

Every image is automatically assessed for clinical gradability before analysis. Images that do not meet quality standards are flagged with clear guidance, prompting the operator to recapture and ensuring only reliable inputs proceed.
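A gradability gate of this kind can be pictured as a pre-analysis check that returns both a pass/fail decision and operator-facing guidance. The sketch below uses hand-tuned brightness and contrast heuristics purely for illustration; the function name and thresholds are assumptions, and a production gradability check would typically be a learned model rather than fixed rules.

```python
import numpy as np

def quality_gate(image, min_brightness=0.15, max_brightness=0.85,
                 min_contrast=0.05):
    """Flag images that are too dark, washed out, or low-contrast.

    image: grayscale array scaled to [0, 1]. Returns a decision plus
    recapture guidance for the operator.
    """
    issues = []
    mean = float(image.mean())
    if mean < min_brightness:
        issues.append("underexposed: recapture with more illumination")
    if mean > max_brightness:
        issues.append("overexposed: reduce illumination or flash intensity")
    if float(image.std()) < min_contrast:
        issues.append("low contrast: check focus and media clarity")
    return {"gradable": not issues, "issues": issues}
```

Returning specific guidance strings, rather than a bare reject, is what lets the operator recapture immediately while the patient is still in the chair.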


Input Validation

The system automatically identifies and flags inputs that fall outside its validated range. This prevents unreliable predictions and ensures clinicians are alerted whenever an image requires manual review.
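One simple way to implement such an out-of-range check is to compare an input's feature statistics against those of the validation population. The sketch below uses a per-dimension z-score against stored training-set statistics; the function name and the z-limit are illustrative assumptions, and real deployments often use stronger detectors (e.g. Mahalanobis distance or dedicated out-of-distribution models).

```python
import numpy as np

def out_of_range(features, train_mean, train_std, z_limit=4.0):
    """True if any feature falls outside the validated range.

    features: 1-D array of image- or embedding-level statistics;
    train_mean/train_std: matching statistics from the training population.
    """
    z = np.abs((features - train_mean) / train_std)   # standardized deviation per dimension
    return bool((z > z_limit).any())                  # any extreme dimension triggers review
```

An input flagged this way is routed to manual review instead of producing a screening result, which is the behavior the paragraph above describes.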