2026-05-29

Can structure predict which drug works?

I think the immune response, and much else, runs as a cycle you can read, and that reading it tells you which intervention a patient needs. I would like to know whether that is true. The Living Scorecard is where I try to find out, in public and before the data, so the answer means something when it arrives.

This is an experiment. The honest question is whether anyone can actually read the cycle well enough to choose a patient's intervention from it, and not just claim to. I do not know yet. So I make the calls in advance and in public, where the verdict does not depend on my own account of how it went.

Anyone can explain a trial result the day after it reads out. The explanation will be fluent and mechanistic, and it will be unfalsifiable, because the result is already known and the story is being fitted to it. This is the ordinary condition of the field, and it is why so little of what sounds like understanding survives the next trial. So I commit the prediction first. Each one is timestamped and locked before it can be tested, and each one says plainly what result would tell me the reading was wrong. Without that, I would only ever be explaining, never learning.
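A locked prediction, as described here, carries its claim, the moment it was locked, and the result that would count against it. The following is a minimal Python sketch of such a record. It assumes a falsifier can be written as a predicate over the eventual readout. The class name, field names, readout format, and the hazard-ratio example are hypothetical illustrations, not the scorecard's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Mapping

# Hypothetical readout format: metric name -> observed value.
Readout = Mapping[str, float]


@dataclass(frozen=True)
class Prediction:
    claim: str
    locked_at: datetime
    # Returns True when the readout is the result that would show the call wrong.
    falsifier: Callable[[Readout], bool]

    def resolve(self, readout: Readout, read_at: datetime) -> str:
        """Score a call, refusing any readout that predates the lock."""
        if read_at <= self.locked_at:
            raise ValueError("prediction was not locked before the readout")
        return "missed" if self.falsifier(readout) else "confirmed"


# Illustrative direction call: wrong if the hazard ratio is not below 1.
call = Prediction(
    claim="arm A improves survival over control",
    locked_at=datetime(2026, 4, 10, tzinfo=timezone.utc),
    falsifier=lambda r: r["hazard_ratio"] >= 1.0,
)
readout_day = datetime(2026, 5, 14, tzinfo=timezone.utc)
print(call.resolve({"hazard_ratio": 0.8}, readout_day))  # confirmed
print(call.resolve({"hazard_ratio": 1.1}, readout_day))  # missed
```

The `read_at` check is the point of the design. A call scored against data that already existed when it was written is an explanation, not a prediction.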

On April 10 I uploaded the first version to Zenodo. It carried thirteen predictions across seven phase 3 immuno-oncology trials that had not yet read out. On May 14, VOLGA reported at interim, and the two direction calls it could test came back confirmed. That is the discipline. Make the call in public, before the data, and let the world answer it.

The unit: a drug, a step, a failure

The immune response to a tumour runs as a cycle. The cycle has sixteen positions, grouped into four regimes, and at each position the process can fail in a small number of characteristic ways. A drug acts at a position. So the smallest claim the scorecard makes has three parts: a drug, the exact step it acts on, and the exact failure it fixes there.

Take one example. Ipilimumab acts at Position 4, the T-cell commitment step, against over-correction. At that step a T cell decides whether to commit fully to attacking the tumour. Over-correction is one way the step fails: the brakes are held too hard, with regulatory suppression overshooting whatever the system is correcting for. Ipilimumab's mechanism releases exactly that brake.
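The three-part unit maps naturally onto a small record type. The following is a minimal sketch that assumes positions are numbered 1 to 16. Purely for illustration, it also assumes that each of the four regimes spans four consecutive positions. The text does not say how positions divide among regimes, so `regime_of` is a hypothetical helper. Only the failure mode the text names, over-correction, appears in the enum.

```python
from dataclasses import dataclass
from enum import Enum

N_POSITIONS = 16
N_REGIMES = 4


class Failure(Enum):
    # The text names over-correction; other characteristic failures are omitted here.
    OVER_CORRECTION = "over-correction"


def regime_of(position: int) -> int:
    """Regime (1-4) of a position, ASSUMING four consecutive positions per regime."""
    if not 1 <= position <= N_POSITIONS:
        raise ValueError(f"position must be in 1..{N_POSITIONS}, got {position}")
    return (position - 1) // (N_POSITIONS // N_REGIMES) + 1


@dataclass(frozen=True)
class Hypothesis:
    drug: str
    position: int     # the exact step the drug acts on
    failure: Failure  # the exact failure it fixes there

    def __post_init__(self) -> None:
        regime_of(self.position)  # rejects positions outside the cycle


ipi = Hypothesis("ipilimumab", 4, Failure.OVER_CORRECTION)
print(regime_of(ipi.position))  # 1, under the assumed grouping
```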

The claim is conditional, and the condition carries the weight. Ipilimumab is the right move when the reason a patient is stuck is this failure at this step. Read that way, the drug has a defined population: the patients whose cycle is stalling at that step, for that reason.

Where the predictions come from

I built an engine that walks the model and proposes these claims. It works in three modes, and only the first is lookup.

The first mode places a drug where the established drug matrix already says it acts. Ipilimumab at P4. T-VEC at P1. This is the credibility floor: old knowledge made rigorous and public, so that when the engine gets the known calls right it has earned a hearing on the calls nobody has made yet.
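Mode one amounts to a table lookup, and its credibility floor amounts to an agreement check against that table. The following is a minimal sketch that assumes the established drug matrix can be held as a drug-to-position mapping. The two entries are the placements the text gives. The names `ESTABLISHED_MATRIX`, `place_known`, and `floor_agreement` are hypothetical.

```python
from typing import Mapping, Optional

# Placements the text states; a real matrix would hold many more entries.
ESTABLISHED_MATRIX: dict[str, int] = {
    "ipilimumab": 4,
    "T-VEC": 1,
}


def place_known(drug: str) -> Optional[int]:
    """Mode one: the position the established matrix assigns, or None."""
    return ESTABLISHED_MATRIX.get(drug)


def floor_agreement(engine_calls: Mapping[str, int]) -> float:
    """Share of known drugs the engine places where the matrix already does."""
    known = [d for d in engine_calls if d in ESTABLISHED_MATRIX]
    if not known:
        return 0.0
    hits = sum(engine_calls[d] == ESTABLISHED_MATRIX[d] for d in known)
    return hits / len(known)


print(place_known("ipilimumab"))  # 4
print(floor_agreement({"ipilimumab": 4, "T-VEC": 2}))  # 0.5
```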

The second mode moves a drug to a step it is not used for. A STING agonist is given clinically at the antigen-release step. The engine proposes it three steps downstream, at positions that share the same dynamical character as the one where it already works. The reasoning is structural analogy: this mechanism answers a particular kind of failure, and the cycle fails that way in more than one place. These are bets the literature does not yet contain, and they are the ones I am most curious about.
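Mode two can be read as a search over positions that share a dynamical character with the one where a drug already works. The following is a minimal sketch that assumes each position carries a categorical character label. The text does not publish which positions share a character, so the repeating A to D labels below are invented placeholders. As a result, the output illustrates the mechanics only and does not reproduce the engine's STING proposal.

```python
from typing import Dict, List

# HYPOTHETICAL labels: the text does not publish which positions share a
# dynamical character, so this repeating A-D pattern is a placeholder.
CHARACTER: Dict[int, str] = {p: "ABCD"[(p - 1) % 4] for p in range(1, 17)}


def analogous_positions(known_position: int) -> List[int]:
    """Mode two: other positions whose dynamical character matches the known one."""
    target = CHARACTER[known_position]
    return [p for p in sorted(CHARACTER) if CHARACTER[p] == target and p != known_position]


print(analogous_positions(1))  # [5, 9, 13]
```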

The third mode carries a structural rule across cycles that share only their shape. We hold that the same four-regime structure appears in a T cell, a falling leaf, and a dying star. A rule established on the cancer-immunity cycle can then be predicted to hold on the water cycle, or on stellar evolution, because they are the same process in different material. A prediction like that is only interesting if the rule is specific enough to be wrong on each substrate. A rule too loose to be wrong anywhere teaches nothing. Nothing in this mode reaches the public scorecard until it has earned its place by that test.

How a prediction is scored, and why it is locked

I score each hypothesis on three axes, each between zero and one. Structural reflects how cleanly the hypothesis fits the model, how canonical the placement is, and how mature the drug is in the clinic. Evidence reflects what a live sweep of PubMed, ChEMBL, and the trial registry returns at the moment the prediction is made. Novelty reflects how far the claim sits from anything already in clinical use. From these three come a confidence rating, an impact rating, and a composite score that lets me triage the day's output.
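The text names the three axes and the three derived ratings but does not say how they combine. The following is a minimal sketch built on formulas that are assumptions made purely for illustration, not the scorecard's:

- confidence is the mean of the structural and evidence scores;
- impact is the novelty score;
- the composite is their product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Scores:
    structural: float  # fit to the model, canonical placement, clinical maturity
    evidence: float    # what the live literature and registry sweep returned
    novelty: float     # distance from anything already in clinical use

    def __post_init__(self) -> None:
        for name in ("structural", "evidence", "novelty"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")

    @property
    def confidence(self) -> float:
        # ASSUMED weighting: equal parts structural fit and evidence.
        return (self.structural + self.evidence) / 2

    @property
    def impact(self) -> float:
        # ASSUMED: impact is taken to be novelty alone.
        return self.novelty

    @property
    def composite(self) -> float:
        # ASSUMED: a product, so a call ranks high only if it is both credible and new.
        return self.confidence * self.impact


def triage(batch: Dict[str, Scores]) -> List[str]:
    """Order a day's hypotheses by composite score, highest first."""
    return sorted(batch, key=lambda k: batch[k].composite, reverse=True)


day = {
    "known placement": Scores(structural=0.9, evidence=0.9, novelty=0.1),
    "off-step bet": Scores(structural=0.6, evidence=0.2, novelty=0.9),
}
print(triage(day))  # ['off-step bet', 'known placement']
```

Under a product, a well-supported but familiar call sinks in the triage order. That suits a queue meant to surface new bets, but it would be the wrong choice for ranking credibility-floor calls.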

The evidence is frozen at the moment of the call. The prediction stays readable later against the world as it was when I committed it, before any of it could be edited to fit. Predictions that hold stay on the page. Predictions that miss stay on the page too, and those are the ones I read most closely. A confirmation tells me the reading was good enough this time. A miss tells me something I did not know: that the rule underneath it is wrong somewhere, and roughly where. That is the more interesting result, and the one the whole arrangement exists to surface.
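One way to freeze a call together with its evidence is to hash a canonical serialization of both at commit time. Any later edit then changes the digest. The following is a minimal sketch using Python's standard `hashlib` and `json`. The record fields and evidence keys are hypothetical. This digest illustrates the idea and does not describe the timestamped Zenodo deposit the text mentions. The record is kept whether the call later holds or misses.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(record: dict) -> str:
    """SHA-256 over a canonical JSON form of everything except the digest itself."""
    body = {k: v for k, v in record.items() if k != "digest"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def lock(claim: str, falsifier: str, evidence: dict, when: datetime) -> dict:
    """Freeze a prediction with the evidence as it stood at commit time."""
    record = {
        "claim": claim,
        "falsifier": falsifier,
        "evidence": evidence,  # hypothetical sweep summary; must be JSON-serializable
        "locked_at": when.isoformat(),
    }
    record["digest"] = _digest(record)
    return record


def unaltered(record: dict) -> bool:
    """True if nothing in the record has changed since it was locked."""
    return record.get("digest") == _digest(record)


rec = lock(
    claim="drug X acts at P4 against over-correction",  # hypothetical claim
    falsifier="no benefit in the P4-stalled subgroup",
    evidence={"pubmed_hits": 12, "registered_trials": 3},  # illustrative values
    when=datetime(2026, 4, 10, tzinfo=timezone.utc),
)
print(unaltered(rec))  # True
rec["evidence"]["pubmed_hits"] = 40  # an after-the-fact edit
print(unaltered(rec))  # False
```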

What I want it to be

A published hypothesis is half a tool. It is the drug side of a map: if a tumour stalls here, reach for this. The other half is the reading of where a given patient is actually stalling. Put the two together and the result is decision support: the failure read off the patient, the drug chosen to match it.
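The join can be sketched as a lookup keyed on step and failure together, which is also where the conditional claim from the unit section does its work: a drug fits a patient only if both the position and the failure match. The following is a minimal sketch that assumes a patient reading reduces to one position and one failure label. The `Reading` type, the function names, and the T-VEC failure label "placeholder-failure" are hypothetical. The text does not state which failure T-VEC answers.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Hypothesis(NamedTuple):
    drug: str
    position: int
    failure: str


class Reading(NamedTuple):
    position: int  # where this patient's cycle is stalling
    failure: str   # why it is stalling there


DrugMap = Dict[Tuple[int, str], List[str]]


def build_map(hypotheses: List[Hypothesis]) -> DrugMap:
    """Drug side of the map: (position, failure) -> drugs that answer it."""
    table: DrugMap = defaultdict(list)
    for h in hypotheses:
        table[(h.position, h.failure)].append(h.drug)
    return dict(table)


def recommend(reading: Reading, drug_map: DrugMap) -> List[str]:
    """The conditional claim: a drug fits only if both step and failure match."""
    return drug_map.get((reading.position, reading.failure), [])


drug_map = build_map([
    Hypothesis("ipilimumab", 4, "over-correction"),
    Hypothesis("T-VEC", 1, "placeholder-failure"),  # failure label is hypothetical
])
print(recommend(Reading(4, "over-correction"), drug_map))  # ['ipilimumab']
print(recommend(Reading(4, "placeholder-failure"), drug_map))  # []
```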

Even before that join, the scorecard already does work. It is a public track record accumulating ahead of the data. It is a way to read a trial someone else designed and say in advance which subgroup should respond. It marks the steps in the cycle where no approved drug yet sits, which is the same as marking where new ones are needed.
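Marking the empty steps is a set difference between the sixteen positions and those any approved drug occupies. The following is a minimal sketch that assumes approved placements are available as a drug-to-position mapping. Only the two placements the text gives are filled in, so the result reflects that tiny input, not the real matrix. The function name is hypothetical.

```python
from typing import List, Mapping

POSITIONS = range(1, 17)  # the sixteen positions of the cycle


def uncovered_positions(approved: Mapping[str, int]) -> List[int]:
    """Positions where no approved drug yet sits."""
    covered = set(approved.values())
    return [p for p in POSITIONS if p not in covered]


print(uncovered_positions({"ipilimumab": 4, "T-VEC": 1}))
```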

What is on it now

Most of what is on the scorecard now is the floor: known calls, made rigorous, made public. The harder material is coming: the off-step bets and the claims that cross from one cycle to another. VOLGA was the first call to hold. There will be calls that do not, recorded under the same timestamp and the same falsifier, and those are the ones I am waiting for. A confirmation is reassuring. A miss in a specific, named place is where I would actually learn something about the cycle. Doing this in public is how I keep myself from looking away from them.