VOLGA at interim: the first scorecard call held
VOLGA, AstraZeneca's perioperative IO trial in muscle-invasive bladder cancer, reported interim results on May 14 that confirmed two direction calls the Living Scorecard locked in advance on April 10. The pCR magnitude numbers, and the call on triplet versus duplet, are pending the ASCO presentation.
On April 10, I uploaded a document to Zenodo called the Living Scorecard. It lists thirteen predictions across seven phase 3 immuno-oncology trials that have not yet read out. Each prediction has a direction, a confidence, a predicted magnitude, and an explicit falsifier. The falsifier is the observed result that, if seen at primary readout, would refute the call. The point of the document is to make the framework's predictions falsifiable in public before any of them are testable.
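As a rough illustration of that four-part structure, one scorecard entry could be modelled as a small record. This is a sketch, not the scorecard's actual schema: the class name `Prediction`, its field names, and the idea of storing the falsifier as a callable are all assumptions made for this example. It is filled in with the VOLGA triplet call discussed later in this post, and the confidence is left unset because this post does not restate it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record for one locked call (field names are illustrative)."""
    trial: str
    comparison: str
    direction: str                 # e.g. "WIN" or "EQUIVALENT"
    predicted_pcr: float           # predicted magnitude, in percent
    # Takes (arm_pcr, comparator_pcr) and returns True if the call is refuted.
    falsifier: Callable[[float, float], bool]
    confidence: Optional[float] = None  # not restated in this post, so unset


# The VOLGA triplet-versus-surgery call: refuted if the triplet does worse.
triplet_vs_surgery = Prediction(
    trial="VOLGA",
    comparison="triplet vs surgery",
    direction="WIN",
    predicted_pcr=40.6,
    falsifier=lambda arm, comparator: arm < comparator,
)
```

Keeping the falsifier next to the prediction means the refutation rule is fixed at the same moment as the call itself, which is the property the scorecard is after.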
One of those seven trials is VOLGA, AstraZeneca's trial of perioperative durvalumab and enfortumab vedotin in muscle-invasive bladder cancer, in patients who are cisplatin-ineligible or have declined cisplatin. It has three arms: a triplet (durvalumab plus tremelimumab plus enfortumab vedotin), a duplet (durvalumab plus enfortumab vedotin), and a control arm of surgery alone. The scorecard locked three predictions, generated by a single deterministic function called `formula.predict_orr` applied to the three arm comparisons.
The three calls
Here is what the scorecard said, in plain language.
Prediction one. Triplet versus surgery. The triplet beats surgery alone on pCR. Predicted pCR around 40 percent. The call would be refuted if the triplet did worse than surgery.
Prediction two. Duplet versus surgery. The duplet beats surgery alone on pCR. Same predicted magnitude. Same falsifier.
Prediction three. Triplet versus duplet. Adding tremelimumab on top of the duplet makes no meaningful difference. The two arms come out within five percentage points of each other on pCR. The call would be refuted if the gap turned out to be larger than five points in either direction. The reasoning given in the scorecard was candid about the formula's limits: both arms saturate at the same depth ceiling in the underlying calculation, so the framework has no real opinion on whether adding tremelimumab helps here.
That document went up April 10. The trial had not read out. Anyone could see the call.
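The falsifiers for the three calls above are simple enough to write down as checks. The sketch below is only an illustration of the rules as this post states them; the function names and the `margin` parameter are hypothetical and do not come from the scorecard's code.

```python
def win_refuted(arm_pcr: float, surgery_pcr: float) -> bool:
    """Predictions one and two: refuted if the arm does worse than surgery alone."""
    return arm_pcr < surgery_pcr


def equivalent_refuted(triplet_pcr: float, duplet_pcr: float,
                       margin: float = 5.0) -> bool:
    """Prediction three: refuted if the pCR gap exceeds the margin in either direction."""
    return abs(triplet_pcr - duplet_pcr) > margin
```

Under this reading, a gap of exactly five points does not refute the third call, since the stated falsifier is a gap "larger than five points". Whether the scorecard treats the boundary that way is an assumption.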
The interim, and what scored
On May 14, AstraZeneca announced the planned interim analysis. The duplet beat surgery alone on event-free survival, statistically significant. The duplet also beat surgery alone on overall survival, statistically significant. The triplet beat surgery alone on event-free survival, statistically significant. The triplet showed a favourable trend on overall survival, not yet statistically significant at this interim. pCR is a secondary endpoint and the number was not in the press release. It will be in the conference presentation.
Now match the announcement to the predictions.
Prediction one said the triplet beats surgery on pCR. The interim shows the triplet beating surgery on event-free survival, with a favourable but not yet statistically significant trend on overall survival. The falsifier required the triplet to do worse than surgery, and that is now structurally ruled out. Direction confirmed. Magnitude pending the pCR number.
Prediction two said the duplet beats surgery on pCR. The interim shows the duplet beating surgery on both event-free survival and overall survival, with an overall survival result clearer than the triplet's. The falsifier required the duplet to do worse than surgery. Ruled out. Direction confirmed. Magnitude pending.
Prediction three said the triplet and duplet would come out within five points of each other. The interim shows asymmetry: duplet OS significant, triplet OS only a favourable trend. The duplet looks at least as good as the triplet, possibly better. That is consistent with the framework's "no incremental benefit from tremelimumab at this depth" reading. But the call is on pCR, not OS, and the pCR gap is what determines whether the prediction holds. Still pending.
The magnitude override
There is one more layer the scorecard committed to in advance, which I want to be specific about because it matters more than the direction call. I attached a self-assessment to the magnitude prediction. The raw formula said 40.6 percent pCR. The self-assessment said that number is likely too high, because the formula saturates at the depth ceiling and the publicly reported VOLGA safety run-in at ESMO 2024 came in at 35 percent pCR. So the scorecard carried two competing magnitude predictions for VOLGA: the raw formula at 40.6 percent, and the editorial override saying "probably closer to 35 percent." The override has its own falsifier. If the observed pCR comes in at 38 percent or higher, the override is wrong and the raw formula was closer. If it comes in below 38, the override was the better call.
One clarification on where the raw number came from. The raw 40.6 percent did not come from looking up NIAGARA or the VOLGA safety run-in. The formula generated that number from the framework's general urothelial-domain calibration and a structural depth ceiling. The run-in only appears in the editorial override, which used it to flag the raw formula as probably high. The number being predicted was generated structurally, not pattern-matched from a precedent.
Both numbers were locked in the same April 10 document. When the pCR number is public, both score.
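The override's scoring rule reduces to a single threshold comparison. Here is a minimal sketch of it, using only the three numbers given above; the constant and function names are made up for this example. The scorecard text quoted here does not say how the 38 percent boundary was chosen, so the midpoint line is an observation, not the author's stated method.

```python
RAW_FORMULA_PCR = 40.6  # raw formula output, percent
OVERRIDE_PCR = 35.0     # editorial override, percent
THRESHOLD_PCR = 38.0    # falsifier boundary for the override, percent


def score_magnitude(observed_pcr: float) -> str:
    """Return which locked magnitude call the observed pCR favours."""
    if observed_pcr >= THRESHOLD_PCR:
        return "raw formula closer"
    return "override closer"


# The boundary sits close to the midpoint of the two locked numbers (about 37.8).
midpoint = (RAW_FORMULA_PCR + OVERRIDE_PCR) / 2
```

Once the pCR figure is public, a single call such as `score_magnitude(observed_pcr)` settles which of the two magnitude calls held.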
What's still pending
The framework's value, if it has any, is in being callable in advance. Anyone can explain a trial result after the fact. The scorecard's proposition is that the same single deterministic function was applied to seven trials, returned thirteen predictions, and the predictions were locked with timestamps before any of them could be tested. One of those trials has now read out at the interim. The direction calls on the two WIN predictions are confirmed. The magnitude predictions, and the editorial override on magnitude, are awaiting the pCR number. The EQUIVALENT call on triplet versus duplet is also awaiting the pCR gap.
Ten other predictions are still pending across the other six trials. ENERGIZE, IMbrave251, Harmony Melanoma, KEYLYNK-012 and DeLLphi-312 are next. If the framework generates predictions that hold up under this kind of locked-in-advance discipline, repeatedly, then it has earned a place. If it does not, the scorecard will record that too. Both outcomes are useful. The scorecard is not built to be right. It is built to be testable.
ASCO opens tomorrow in Chicago. The VOLGA pCR data did not appear in the regular abstracts that dropped on May 21, which means it is almost certainly a late-breaker, released at 7 AM CT on the day of presentation, sometime between May 29 and June 2. When the number is public, the full score, including the self-assessment override, will be posted here.
*Living Scorecard, version 1, DOI: 10.5281/zenodo.19502007. Uploaded April 10, 2026. VOLGA interim readout: AstraZeneca press release, May 14, 2026. Full pCR data expected ASCO 2026, May 29 to June 2.*
