The slide came up at week six of the MFoCS.[1] A scatter plot of single cells from a tumour sample, axes labelled CD4 and CD8. The clinician giving the talk circled a region with a laser pointer and called those cells "positive," like that settled the matter. What I wanted to ask (and didn't, because you learn quickly at Oxford that the second-year student is not the one who gets to question the consultant) was: where, exactly, did you draw that line?
1. MFoCS thesis, Oxford · Lady Margaret Hall, 2024: "Analysing and Advancing Automated Immune Biomarker Detection." Multiplex immunofluorescence data on tumour-associated immune populations.
That question became my thesis. Two tumour files from two distinct patients — one around 9,500 cells, the other around 5,500 — six immune markers per cell, and for each marker, a threshold that a pathologist had drawn by eye on a log-transformed intensity histogram. Move that line half a unit and the "positive" population shifts noticeably. Move it the other way and you import background noise. The biology hasn't changed. The category has.
§01 Six markers, one panel
The dataset was a multiplex immunofluorescence panel of six markers (CD66b, CD56, CD4, CTLA-4, CD8, and CD20) measured on tumour-infiltrating immune cells from two distinct patients.[2] Each cell is a 6-vector of fluorescence intensities. The clinical workflow is to drop a threshold per marker and read off proportions in each gate. CD66b marks neutrophils. CD4 and CD8 split helper from cytotoxic T cells. CD20 picks out B cells. CTLA-4 is an immune checkpoint receptor, constitutively expressed on regulatory T cells and upregulated on activated T cells. CD56 marks natural killer cells. Together, the six markers give you a rough picture of how the immune system is engaging the tumour.
2. Two tumour files from distinct patients, selected after discarding files with spatial artefacts from slide scanning. Each is accompanied by a pathologist's ground truth: thresholds set by expert review of histograms.
A cell sitting just inside the CD4+ gate and a cell sitting deep inside it count the same. The gate is binary. The biology is not. That gap is where the interesting problems live.
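To make the binary nature of the gate concrete, here is a minimal sketch of per-marker gating in Python. It is an illustration, not the thesis code: the function name `gate_proportions`, the toy intensities, and the CD8 cutoff of 3.0 are all made up. Only the CD4 cutoff of 4.39 echoes a value mentioned in this essay.

```python
from typing import Dict, List


def gate_proportions(cells: List[Dict[str, float]],
                     thresholds: Dict[str, float]) -> Dict[str, float]:
    """Fraction of cells at or above each marker's threshold."""
    n = len(cells)
    return {
        marker: sum(1 for cell in cells if cell[marker] >= cutoff) / n
        for marker, cutoff in thresholds.items()
    }


# Toy log-intensities for two of the six markers (made-up values).
cells = [
    {"CD4": 4.40, "CD8": 1.0},  # barely inside the CD4+ gate
    {"CD4": 6.20, "CD8": 0.9},  # deep inside the CD4+ gate
    {"CD4": 2.00, "CD8": 5.1},
    {"CD4": 1.50, "CD8": 1.2},
]
thresholds = {"CD4": 4.39, "CD8": 3.0}  # the CD8 cutoff is hypothetical

proportions = gate_proportions(cells, thresholds)
print(proportions)  # {'CD4': 0.5, 'CD8': 0.25}
```

The cell at 4.40 and the cell at 6.20 contribute identically to the CD4 proportion, which is the gap the paragraph above describes.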
§02 Four ways to draw a line
The question I spent most of the summer on: can an algorithm draw the threshold as well as the pathologist? I tried four approaches, each with a different idea about what "the right place" means.
Otsu looks for a valley. It treats the intensity histogram as two classes and finds the threshold that maximises the between-class variance — the point where the two populations are most cleanly separated. It works beautifully on markers with a clear bimodal distribution. On anything else, it guesses.
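The between-class variance criterion can be sketched in a few lines of Python. This is a simplified version that searches over the raw sorted values instead of a binned histogram, which is an assumption of this sketch and not necessarily how the thesis did it. The function name `otsu_threshold` and the toy data are illustrative.

```python
from typing import Sequence


def otsu_threshold(values: Sequence[float]) -> float:
    """Return the cut that maximises between-class variance.

    Candidate cuts are the midpoints between consecutive distinct
    sorted values. Expects at least one value.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_cut, best_score = xs[0], -1.0
    running = 0.0  # sum of the values in the lower class
    for i in range(1, n):
        running += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # no cut can separate equal values
        w_low = i / n
        w_high = 1.0 - w_low
        mean_low = running / i
        mean_high = (total - running) / (n - i)
        score = w_low * w_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score = score
            best_cut = (xs[i - 1] + xs[i]) / 2
    return best_cut


# Cleanly bimodal toy data (made up): the cut lands in the gap.
bimodal = [0.9, 1.0, 1.1, 1.2, 4.9, 5.0, 5.1, 5.2]
otsu_cut = otsu_threshold(bimodal)
print(otsu_cut)
```

On data without a clear gap the function still returns a cut, because some split always has the largest score. That is the "it guesses" behaviour in practice.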
IsoData iterates. Start with the mean intensity, split the data, recalculate each half's mean, set the threshold to the average of the two means, repeat until stable. It converges quickly and handles skewed distributions better than Otsu. It became the default thresholding method for later stages of the thesis — not because it won every marker, but because it lost gracefully on the ones it didn't win.
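The loop described above translates almost directly into code. A minimal sketch, assuming raw intensity values as input; the function name, tolerance, and iteration cap are choices made for this example, not details from the thesis.

```python
from typing import Sequence


def isodata_threshold(values: Sequence[float],
                      tol: float = 1e-6,
                      max_iter: int = 1000) -> float:
    """Iterate the threshold to the midpoint of the two class means."""
    cut = sum(values) / len(values)  # start at the overall mean
    for _ in range(max_iter):
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        if not low or not high:
            break  # everything fell on one side; keep the current cut
        new_cut = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_cut - cut) < tol:
            return new_cut
        cut = new_cut
    return cut


# Skewed toy data (made up): many dim cells, a few bright ones.
skewed = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 6.0]
isodata_cut = isodata_threshold(skewed)
print(isodata_cut)  # 3.25
```

Here the starting mean is 2.125, the first update moves the cut to 3.25 (the midpoint of the class means 1.0 and 5.5), and the second pass leaves it unchanged.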
A modified Gaussian mixture model fits two bell curves to the distribution and sets the threshold where the curves cross.[3] For markers like CD56, where the distribution has an awkward shoulder rather than a clean valley, the GMM outperformed everything else. I ended up writing a custom variant that handled cases where the standard fit returned more than two components, collapsing the extras into the nearest Gaussian before picking the crossing point.
3. Standard GMM fits K Gaussians via expectation-maximisation. The threshold is the point where the posterior probabilities of the two components are equal. The custom modification handles the K > 2 case by merging components before thresholding.
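The standard two-component case can be sketched with a small hand-rolled EM loop. This covers only the plain K = 2 fit and the equal-posterior crossing point; the custom merging step for K > 2 is not reproduced here. Function names, the initialisation, the iteration count, and the synthetic data are all assumptions of this sketch.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_two_gaussians(values: Sequence[float], n_iter: int = 200
                      ) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-component 1-D mixture by expectation-maximisation."""
    xs = sorted(values)
    n, half = len(xs), len(xs) // 2
    # Initialise from the lower and upper halves of the sorted data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    spread = (xs[-1] - xs[0]) / 4 or 1.0
    sigma, weight = [spread, spread], [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each point.
        resp = []
        for x in xs:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            resp.append(p0 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: update weight, mean and spread of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - v for v in resp]
            nk = sum(r)
            if nk < 1e-12:
                continue  # component has collapsed; leave it as is
            weight[k] = nk / n
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)
    return weight, mu, sigma


def gmm_threshold(values: Sequence[float]) -> Optional[float]:
    """Point between the two means where the weighted densities are equal."""
    weight, mu, sigma = fit_two_gaussians(values)
    lo, hi = (0, 1) if mu[0] <= mu[1] else (1, 0)

    def diff(x: float) -> float:
        return (weight[lo] * normal_pdf(x, mu[lo], sigma[lo])
                - weight[hi] * normal_pdf(x, mu[hi], sigma[hi]))

    a, b = mu[lo], mu[hi]
    if diff(a) <= 0 or diff(b) >= 0:
        return None  # no crossing between the means
    for _ in range(100):  # bisection on the sign change
        mid = (a + b) / 2
        if diff(mid) > 0:
            a = mid
        else:
            b = mid
    return (a + b) / 2


# Synthetic two-population sample (made up), seeded for repeatability.
rng = random.Random(0)
sample = ([rng.gauss(1.0, 0.3) for _ in range(300)]
          + [rng.gauss(5.0, 0.3) for _ in range(300)])
gmm_cut = gmm_threshold(sample)
print(gmm_cut)
```

Returning `None` when the weighted curves do not cross between the means is a design choice for this sketch. It makes the "no meaningful crossing" case explicit instead of silently returning a number.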
Minimum cross-entropy takes an information-theoretic view: find the threshold that minimises the cross-entropy between the original histogram and a thresholded version of it. Elegant in theory. In practice, it was the most sensitive to distributional shape and the least stable across markers.
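One common formulation of this idea (Li and Lee's criterion) replaces each class by its mean and picks the cut that minimises the resulting cross-entropy. Whether the thesis used exactly this variant is not stated here, so treat the following as a sketch under that assumption. It searches exhaustively over raw sorted values and requires strictly positive intensities; the function name and toy data are illustrative.

```python
import math
from typing import Sequence


def min_cross_entropy_threshold(values: Sequence[float]) -> float:
    """Cut minimising -(S_low * log(mean_low) + S_high * log(mean_high)).

    S_low and S_high are the intensity sums of the two classes. The term
    that does not depend on the cut is dropped.
    """
    xs = sorted(values)
    if xs[0] <= 0:
        raise ValueError("intensities must be strictly positive")
    n = len(xs)
    total = sum(xs)
    best_cut, best_cost = xs[0], math.inf
    running = 0.0  # intensity sum of the lower class
    for i in range(1, n):
        running += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # no cut can separate equal values
        mean_low = running / i
        mean_high = (total - running) / (n - i)
        cost = -(running * math.log(mean_low)
                 + (total - running) * math.log(mean_high))
        if cost < best_cost:
            best_cost = cost
            best_cut = (xs[i - 1] + xs[i]) / 2
    return best_cut


# Cleanly bimodal toy data (made up).
bimodal_mce = [0.9, 1.0, 1.1, 1.2, 4.9, 5.0, 5.1, 5.2]
mce_cut = min_cross_entropy_threshold(bimodal_mce)
print(mce_cut)
```

Because each class is weighted by its intensity sum, bright cells pull harder on the criterion than dim ones, which is one plausible reason the method reacts strongly to distributional shape.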
§03 The marker that broke everything
CTLA-4 broke every method I tried. Its expression distribution is not bimodal — it's flat, spread, ambiguous. There is no valley for Otsu to find. IsoData converges to a threshold that's stable but not particularly meaningful. The GMM fits two wide, overlapping Gaussians and picks a crossing point that's as much a coin flip as a classification. Minimum cross-entropy was worst of all.
The pathologist's threshold for CTLA-4 is the only stable anchor, and it's arrived at by looking at the histogram and making a judgement call.[4] There's nothing wrong with that; it's how clinical immunology works, and it works. But it does mean that "automated detection" has a hard ceiling on this marker: you can automate the easy calls, and for the hard ones you're still relying on exactly the kind of expert judgement you were trying to replace.
4. CTLA-4 is an immune checkpoint receptor that down-regulates T-cell activation. Its expression is continuous and context-dependent, which is exactly why it resists binary gating. The biology is genuinely ambiguous, and the histogram reflects that.
// pull You can automate the easy calls. For the hard ones, you're still relying on the judgement you were trying to replace.
§04 What 0.11 units hides
The number that stuck with me: move the CD4 gate from 4.39 to 4.50 — less than three percent of the intensity range — and the "positive" population shrinks visibly. The cells that vanish were sitting just inside the boundary. They weren't confidently positive; they were categorically positive, which is a different thing entirely.
This isn't a failure of the methods. It's the nature of the problem. When the underlying distribution has cells clustering around the threshold — which it always does, because biological expression is continuous — any binary gate will be fragile at the boundary. The gate is not a measurement. It's a decision wearing a lab coat.
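The fragility at the boundary is easy to reproduce on toy numbers. The sketch below moves a gate from 4.39 to 4.50 and lists the cells that change category. The intensities are made up for illustration and are not thesis data; `gate_shift` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def gate_shift(values: Sequence[float], old_gate: float, new_gate: float
               ) -> Tuple[float, float, List[float]]:
    """Positive fraction under each gate, plus the cells that flip."""
    n = len(values)
    old_frac = sum(v >= old_gate for v in values) / n
    new_frac = sum(v >= new_gate for v in values) / n
    lo, hi = sorted((old_gate, new_gate))
    flipped = [v for v in values if lo <= v < hi]
    return old_frac, new_frac, flipped


# Made-up CD4 log-intensities with a cluster sitting near the gate.
cd4 = [3.9, 4.2, 4.40, 4.42, 4.45, 4.48, 4.6, 5.1, 5.8, 6.3]
old_frac, new_frac, flipped = gate_shift(cd4, 4.39, 4.50)
print(old_frac, new_frac, flipped)  # 0.8 0.4 [4.4, 4.42, 4.45, 4.48]
```

In this contrived example a 0.11-unit move halves the positive fraction, and every cell that flips sat within 0.1 of the original gate.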
IsoData was the most reliable method overall. The custom GMM earned its complexity for markers with awkward shoulders. Otsu was fine when the valley was clean and useless when it wasn't. Minimum cross-entropy was a good idea that didn't survive contact with real data. And CTLA-4 taught me that some markers simply resist automation — not because the algorithms are bad, but because the biology is actually ambiguous.
// pull The gate is not a measurement. It's a decision wearing a lab coat.
The companion essay — Clustering with a conscience — picks up where thresholding stops: what happens when you try to classify cells by their full six-marker profiles, not one marker at a time.
— written from the King's Cross flat, on a Tuesday that felt like a Thursday.