The Optimizer's Tax
Part 1: The Number Is Not the Thing
It's 6am. Evan is awake before his alarm, which is a small victory he won't get to enjoy, because the first thing he does is reach for his phone and open the app. Recovery: 31 percent. Red. HRV down 18 milliseconds from his baseline. The ring spent the night deciding he is not ready, and now he gets to carry that around.
He slept seven and a half hours. He feels fine. He had two glasses of wine at dinner and a hard upper day yesterday, both of which the number knows about in its own way, though Evan reads the red as a verdict on him rather than a readout of last night's pinot. He texts his training partner that he's going to keep it light today. Then he spends the drive to the office quietly worried about a body that, by every account that matters, is working.
I want to talk about that number. Not whether HRV is real (it is) or whether it tracks anything (it does), but about the gap between the thing on the screen and the thing Evan actually cares about, which is something like "am I healthy, am I recovering, am I building a body that will still work at 60." That gap is the whole subject of this piece. By the end of it you'll be able to look at any metric in your feed, the ones your watch reports and the ones the longevity accounts sell, and know within about thirty seconds whether it's worth chasing or whether chasing it is going to cost you.
Here's the frame. Picture the dashboard of a car.
Every gauge on it is a stand-in. The tachometer isn't the engine, it's a needle that represents how hard the engine is working. The fuel gauge isn't the gas, it's a float in a tank wired to a dial. The speedometer isn't your speed, it's a guess assembled from how fast a wheel is turning. None of these are the thing. They're all representations of the thing, sitting at some distance from it, and that distance is where all the trouble lives.
Call it the Proxy Gap. The Proxy Gap is the space between a number and what you actually want. HRV is not recovery. It's a measure that correlates with recovery, standing some distance away from it. Bodyweight is not health. Your one-rep max is not strength, not exactly. Every metric you've ever tracked is a gauge on a dashboard, and the question that decides whether the gauge helps you or fools you is always the same: how far is the needle from the road, and how easily can the needle move without the car going anywhere.
Some gauges sit close to the thing. The odometer is welded to reality. It only ticks up if the wheels actually turned, which means you cannot fake mileage without literally driving (or committing fraud, which is the whole point of this piece, but we'll get there). Other gauges float far from the thing and twitch at the slightest provocation. Rev your engine in neutral and the tachometer screams toward redline while the car sits motionless in the driveway. Big number. Zero travel. The needle moved and nothing real moved with it.
That's HRV at 6am. [fn-1] When Evan does his slow breathing during the reading, or sleeps on his back instead of his side, or skips the wine, the needle climbs. He feels productive when it does. He has not changed his recovery, his fitness, or his trajectory toward 60. He has revved the engine in neutral and watched the gauge respond. The reading is honest about one thing (the needle moved) and silent about the thing he thinks it's telling him (where the car is going).
I'm not picking on HRV because it's useless. I'm picking on it because it sits at the far, twitchy end of the dashboard, and Evan treats it like the windshield. That's the move I want to take apart. Because there is a windshield. There is a way of knowing whether you're actually getting where you want to go, and it has nothing to do with any gauge. You look out the front of the car at the road and you see, directly, with no needle in between, whether the destination is getting closer. We'll spend the whole piece building toward what the windshield is in training. For now just hold the picture: a dashboard full of gauges sitting at various distances from reality, and a windshield that shows you reality itself.
The reason this matters, the reason it's worth eleven thousand words instead of a tweet, is that the entire wellness economy is built on selling you gauges and calling them windshields. Every week a new account discovers a new number and a protocol to move it. The number is always real. The correlation is always real. What gets quietly skipped is the Proxy Gap: the distance between moving the number and changing the thing, and the fact that the easier a number is to move, the less moving it tends to mean.
HRV, What You're Actually Measuring at 6am
Heart rate variability is the variation in time between heartbeats, and it does track autonomic state. Higher is generally better, lower generally means stress or fatigue or illness. That part is real.
What the app doesn't tell you is how many things move the number that have nothing to do with your fitness. Breathing rate is the big one. Slow your breathing during the measurement and HRV rises, because respiration directly drives the variation the device is reading. Body position changes it. Time of measurement changes it. A couple of drinks the night before reliably tanks it. Beta blockers and several other common medications blunt the autonomic signal the device is reading. And short morning recordings, the kind your ring takes, carry meaningful measurement noise from night to night even when nothing about you has changed.
None of this makes HRV fake. It makes it a gauge that floats far from the thing, with a needle that lots of irrelevant inputs can push. Useful as a long-run trend on yourself, where the noise averages out. Near useless as a daily verdict, which is exactly how it's sold and exactly how Evan uses it.
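If you want to see why the trend is worth more than the daily verdict, here is a minimal sketch in Python. Everything in it is assumed for illustration: the 60 ms baseline, the 10 ms of nightly noise, and the function names are mine, not anything a ring or an app reports. It simulates sixty mornings for someone whose true HRV never changes, then compares how much single readings bounce with how much a seven-day average bounces.

```python
import random
import statistics


def simulate_readings(true_hrv_ms, noise_sd_ms, days, seed=0):
    """Simulate morning readings: a fixed true value plus random nightly noise."""
    rng = random.Random(seed)
    return [rng.gauss(true_hrv_ms, noise_sd_ms) for _ in range(days)]


def rolling_mean(values, window):
    """Mean of each trailing window of the given length."""
    return [
        statistics.fmean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Hypothetical person: nothing about them changes for 60 days.
readings = simulate_readings(true_hrv_ms=60.0, noise_sd_ms=10.0, days=60)
weekly = rolling_mean(readings, window=7)

daily_spread = statistics.pstdev(readings)
weekly_spread = statistics.pstdev(weekly)

print(f"spread of single readings: {daily_spread:.1f} ms")
print(f"spread of 7-day averages:  {weekly_spread:.1f} ms")
```

The person in this simulation is identical every day, so every wiggle in the single readings is noise by construction. The averaged line wiggles less, which is all "useful as a long-run trend" means here.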
So that's the trunk: every metric is a gauge, the gauge sits at some distance from the thing, and that distance is the Proxy Gap. Hold that. The next question is what happens when you stop reading a gauge and start aiming at it. Because the moment a measure becomes a target, something strange and well-documented happens to it, and it has a name.
Part 2: Goodhart's Trap
In 1975 a British economist named Charles Goodhart was writing about monetary policy, of all things, and noticed something that turned out to be much bigger than monetary policy. Central banks would pick some statistic that reliably tracked the economy, declare they were going to manage the economy by controlling that statistic, and watch the statistic promptly stop tracking the economy. The regularity collapsed the instant they leaned on it.
The idea later got compressed into the sentence everyone quotes, usually credited to the anthropologist Marilyn Strathern: when a measure becomes a target, it ceases to be a good measure. That's Goodhart's Law. It's the thing that turns a useful gauge into a liar, and it does it through a specific mechanism: the moment you aim at the number, you start finding ways to move the number that don't move the thing. Not because you're cheating. Because moving the number directly is almost always easier than changing the underlying reality the number was supposed to represent, and effort flows downhill.
There's a tidier word for the failure underneath it: surrogation. You take the thing you actually care about, you find a surrogate that stands in for it, and then at some point the surrogate quietly replaces the goal in your head. You stop training for a durable body and start training for a green recovery score. You stop running the company and start running the dashboard. The map eats the territory. [fn-2]
If you want to see Goodhart's Law run as a clean experiment, with a control group and a body count, you don't look at fitness. You look at medicine, where they ran the experiment for real, on tens of thousands of people, with a drug.
The story is HDL cholesterol. For decades HDL was "the good cholesterol," and the epidemiology was about as strong as epidemiology gets: people with higher HDL had fewer heart attacks, consistently, across enormous populations. The correlation was real and it held across studies and it pointed in one obvious direction. Raise HDL, prevent heart attacks. So the pharmaceutical industry went and built drugs to raise it, a class called CETP inhibitors, and they worked spectacularly at the one job they were given.
Here is where the Proxy Gap turns into a graveyard. A drug called evacetrapib raised HDL by 133 percent. More than doubled it. It also dropped LDL, the "bad" cholesterol, by about 31 percent. By every number on the lipid panel, the dashboard everyone had agreed to trust, this drug was a triumph. The trial enrolled over twelve thousand high-risk patients to confirm the obvious win.
It did nothing. The rate of cardiovascular events in the drug group was 12.9 percent. In the placebo group, 12.8 percent. The trial was stopped early, not for danger, but for futility: the numbers had moved beautifully and the heart attacks hadn't noticed. You could double a person's good cholesterol and their actual risk of the actual thing sat exactly where it started. The gauge swung all the way across the dial. The car didn't move an inch.
HDL was never a lever. It was a gauge. People with healthy hearts tend to have high HDL the way well-maintained cars tend to have full tanks, but pumping the tank full from the outside doesn't maintain the car. The high HDL was a readout of something deeper going right, and when the drug forced the readout up without touching the deeper thing, the readout simply detached from the outcome it used to predict. That's surrogation with an NEJM citation. They aimed at the surrogate, hit it dead center, and missed the target completely.
The CETP Graveyard
Evacetrapib wasn't the only one, and it isn't even the most dramatic. It's the cleanest, which is why I led with it, and the cleanliness is the point.
The first CETP inhibitor, torcetrapib, didn't just fail. It killed people. The ILLUMINATE trial was stopped because the drug raised HDL 72 percent and increased all-cause mortality by 58 percent. But torcetrapib is a muddy example, because it turned out to have off-target effects: it raised blood pressure, bumped aldosterone, threw off electrolytes. A defender of the HDL hypothesis could say, fairly, that the drug was poison for unrelated reasons, not proof the target was wrong.
That defense is exactly why evacetrapib matters. Evacetrapib had none of that off-target baggage. It raised HDL more than torcetrapib did, dropped LDL hard, carried no blood-pressure penalty, and still produced a hazard ratio of 1.01. Clean drug, clean lipid win, zero outcome benefit. When you remove every confound and the surrogate still detaches from the target, you're not looking at a bad drug. You're looking at a bad target. Four drugs into this class, the lesson held: you cannot treat a gauge as a lever just because it's a good gauge.
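The arithmetic behind "zero outcome benefit" is short enough to run yourself. The sketch below uses only the figures quoted above for evacetrapib; the variable names are mine. One caveat: a crude ratio of two event rates is not the same calculation as the hazard ratio a trial reports, which accounts for time to event. It just happens to land in the same place here.

```python
# Figures quoted in the text for the evacetrapib trial.
hdl_change_pct = 133.0       # HDL raised by 133 percent
ldl_change_pct = -31.0       # LDL lowered by about 31 percent
drug_event_rate = 0.129      # cardiovascular events, drug group
placebo_event_rate = 0.128   # cardiovascular events, placebo group

# How far the outcome moved, in percentage points and as a crude ratio.
absolute_difference_points = (drug_event_rate - placebo_event_rate) * 100
crude_rate_ratio = drug_event_rate / placebo_event_rate

print(f"gauge moved:   HDL {hdl_change_pct:+.0f}%, LDL {ldl_change_pct:+.0f}%")
print(f"outcome moved: {absolute_difference_points:+.1f} percentage points")
print(f"crude ratio of event rates: {crude_rate_ratio:.2f}")
```

A three-digit swing on the gauge, a tenth of a point on the outcome, and in the wrong direction at that.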
Now carry that back to the dashboard in your own life, because the structure is identical and the stakes only feel lower because nobody's running a trial on you.
When Evan optimizes his HRV, he is doing, in miniature, exactly what the pharmaceutical industry did with HDL. He has a gauge that genuinely correlates with something he wants. He has decided to aim at the gauge. And the methods available to move the gauge directly (breathe slow during the reading, skip the wine, take the rest day the red score told him to take) are mostly not the same as the methods that would change the underlying thing. He will get a greener app and an unchanged trajectory. The number will rise to meet his effort because numbers are easier to move than bodies, and he will mistake the rising number for progress.
This is Goodhart's Trap, and almost nobody steps in it on purpose. You don't decide to chase the proxy instead of the goal. You decide to chase the goal, you pick a number to track it by, and then the number, being right there on your wrist and so much more responsive than your actual cardiovascular fitness, slowly becomes the thing you're working on. The trap isn't stupidity. It's that the surrogate is always closer to hand than the target, and we optimize what we can touch.
Which raises the obvious question, the one that turns this from a warning into a tool. If aiming at a number tends to corrupt it, why are some numbers so much more corruptible than others? Evacetrapib could move HDL 133 percent without moving hearts. But you cannot move a marathon time two hours without actually being able to run. Some gauges detach from reality the instant you lean on them. Others refuse to budge unless the real thing budges first. The difference between those two kinds of gauges is the most useful thing in this entire piece, and it turns out there are exactly three ways to move a needle without moving the car.
Part 3: Three Ways to Fake a Number
When people say a metric is "gameable" they usually mean it as one accusation. It's actually three different crimes, and telling them apart is what lets you predict, before you've wasted a year, which numbers will lie to you and which won't.
The first crime is moving the number at the moment you measure it, without changing anything that lasts. Call it measurement gaming. This is the tachometer in neutral. Nothing about the car has changed, you've just goosed the reading for as long as your foot is on the pedal, and the instant you stop, the needle falls back. HRV is the cleanest example in fitness. Slow your breathing during the morning reading and the number rises, because the device is partly reading your breathing. You didn't recover faster. You performed recovery for the sensor. Resting heart rate has the same problem from the other direction: measure it after coffee versus before, lying down versus sitting, and you get different "fitness" with the same heart. Anything you capture in a quiet snapshot, at rest, by a device, is exposed to this. The snapshot is a moment, and moments are cheap to stage.
The tell for measurement gaming is that the number improves faster than any real adaptation could. Your cardiovascular system does not meaningfully change in a day. If a number that's supposed to represent it swings 18 percent overnight, you are not watching your fitness change. You're watching the measurement conditions change, and reading them as news about yourself. [fn-3]
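That tell is simple enough to write down as a check. What follows is a rough sketch, not a validated method: the 5 percent threshold is a number I picked for illustration, the readings are made up, and `flag_fast_swings` is a hypothetical helper. It flags any day-over-day change too large to be a real adaptation under that assumed threshold.

```python
def flag_fast_swings(readings, max_plausible_daily_change=0.05):
    """Return (day_index, fractional_change) for each day-over-day change
    larger than the assumed plausible limit for real adaptation."""
    flagged = []
    for day in range(1, len(readings)):
        previous, current = readings[day - 1], readings[day]
        change = (current - previous) / previous
        if abs(change) > max_plausible_daily_change:
            flagged.append((day, change))
    return flagged


# Made-up morning HRV readings in milliseconds, one per day.
morning_hrv_ms = [62, 60, 61, 50, 61, 60]

for day, change in flag_fast_swings(morning_hrv_ms):
    print(f"day {day}: {change:+.0%} overnight, likely measurement conditions")
```

With these invented readings the drop to 50 and the rebound the next morning both get flagged. A slow adaptation can't move that fast, so whatever moved the number, it wasn't the adaptation.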
The second crime is subtler and it traps smarter people. Here the number is real and it's durable. You genuinely moved it, and it'll stay moved. The problem is the number was only ever a stand-in for something else, and you moved the stand-in without touching what it stood for. Call it proxy gaming. This is the gauge that's accurate about itself and silent about the thing you cared about.
Grip strength is the textbook case, and it's a good one because the underlying science is genuinely impressive. In the PURE study, a hundred forty thousand people across seventeen countries, grip strength predicted all-cause mortality better than systolic blood pressure did. Every five-kilogram drop in grip came with a 16 percent bump in death risk. That's a real, large, repeatable finding. So the optimizer reads that and starts training grip. Buys the gripper, does the dead hangs, adds forearm work.
And he will get a stronger grip. The number will move and stay moved. But grip strength predicts mortality because it's a readout of whole-body strength, neurological health, and not being sick, the way a firm handshake is a readout of someone being awake and well rather than the cause of it. [fn-4] Train grip in isolation and you've moved the readout without moving the body it was reading. You optimized the proxy. This is HDL all over again, just with a gripper instead of a drug. The needle was honest. It was pointing at something behind it, and you walked up and moved the needle with your hand.
The third crime sits in the middle and it's the most honest of the three, which is why it earns the middle of the spectrum later. Here the number is real and the thing it measures is real. The construct is legitimate. But technique, leverage, and practice let you inflate the output without much underlying change. Call it skill gaming.
Your one-rep max is the example every lifter knows in their body. A beginner can add fifty pounds to his "max" in a month without getting fifty pounds stronger, because most of what changed was skill: bracing, bar path, finding leverage, learning to actually express force he already had. [fn-5] Powerlifters take this further on purpose, with arched benches that cut the range of motion and gear that stores elastic energy, all of which move the number on the bar without moving the muscle underneath. The lift is real. The strength is mostly real. But there's slack between the number and the thing, and skill eats that slack first.
The reason skill gaming is the gentle crime is that it has a floor. Technique buys you a finite amount and then runs out. Once you've learned to brace and find your groove, the only way left to add weight to the bar is to actually get stronger. The gap between the number and the strength closes on its own as you stop being a novice. Measurement gaming never closes, because there's always a way to stage the snapshot. Proxy gaming never closes, because the proxy is permanently a different thing than the target. But skill gaming self-corrects, which is why a year-over-year strength number on a lift you've done for a decade is one of the more honest gauges on the dashboard.
Founders have all three crimes too, in case the body stuff feels far away. Measurement gaming is the vanity metric you check after a launch tweet, the spike that's gone by Thursday. Proxy gaming is optimizing signups when what you needed was retained users, hitting the number that's easy to move while the business it was supposed to represent sits flat, the startup equivalent of training grip and calling it health. Skill gaming is the deck that's gotten so polished it raises money the product can't yet back up, real traction underneath, but inflated, with the slack closing only when you actually have to ship.
Same three crimes. Different dashboard.
Why Grip Beat Blood Pressure
The PURE finding that grip out-predicted systolic blood pressure for mortality gets passed around as if it means grip is more important than blood pressure. It doesn't, and the gap between those two readings is the whole lesson of this piece.
Blood pressure is closer to a lever. Lower it and you change a real input to cardiovascular risk; the drugs that lower it actually prevent strokes and heart attacks. Grip is further toward pure readout. It out-predicts because it integrates more: it quietly samples your total muscle mass, your nervous system, your frailty, your whether-you're-secretly-ill, all in one cheap squeeze. A great gauge can absolutely out-predict a real lever, precisely because it's summarizing more of the territory at once. But "predicts better" and "worth targeting" are different properties, and Goodhart lives in the gap between them. The thing that makes grip a brilliant gauge (it reflects everything) is the same thing that makes it a useless target (moving it reflects nothing).
So now we have the machinery. Three ways a needle moves without the car: stage the measurement, move a proxy, or inflate with skill. Every "gameable" metric is gameable in one or more of these specific ways, and once you can name which, you can rank them. Because a metric exposed to all three crimes is a near-worthless target, and a metric exposed to none of them is the closest thing training has to a windshield. Lay every fitness metric out along that axis and a genuinely useful tool appears.
Part 4: The Gameability Spectrum
Here's the tool. Take every metric you might train toward and place it on a single axis, ordered by how easily you can move the number without moving the thing. Far end: trivially faked, exposed to all three crimes, moves overnight. Near end: can't be moved at all unless the real thing moves first. Call it the Gameability Spectrum.
The spectrum is just construct validity flipped over. Construct validity asks how well a number represents the thing it claims to. Gameability asks the same question from the cynic's side: how easily can I make the number lie. Same axis, read in opposite directions. I prefer the cynic's reading because it predicts behavior. When you know which crimes a metric is exposed to, you know exactly how it'll betray you the moment you start chasing it.
Run the fitness metrics down the line.
At the far, most gameable end: HRV and resting heart rate. Exposed to measurement gaming at every reading, noisy night to night even when you don't try to game them, and a step removed from anything you care about even when measured perfectly. As a long-run trend on yourself, fine, useful even. As a daily target, the worst gauge on the dashboard. Optimize these and you will optimize the conditions of measurement, because that's the cheapest thing in reach.
A notch in: bodyweight. More honest than HRV, because it doesn't twitch with your breathing. But it's water and gut content and glycogen day to day, and it's a proxy even at its most accurate, since the scale can't tell muscle from fat from the sandwich you haven't digested. Real over long trends. Noise as a daily verdict.
Middle of the line: isolated grip and the one-rep max, sitting together but exposed to different crimes. Grip is proxy-gameable to the floor; you can move it all day and move nothing behind it. The one-rep max is skill-gameable but only down to its floor, after which it turns honest. This is why a max on a lift you've trained for years is trustworthy in a way a grip number never becomes. Time closes the skill gap. It never closes the proxy gap.
Toward the honest end: VO2max. This is the one that surprises people who've decided all metrics are equally suspect. You basically cannot fake it. There's no breathing trick, no measurement posture, no leverage that produces a high VO2max in someone whose heart can't move the blood. To raise the number you have to build the machinery the number is reading: stroke volume, capillary density, the actual cardiac and mitochondrial equipment. The gauge is welded close to the engine. When the speedometer says you're going fast, you are mostly going fast.
And the near, honest end, the windshield: performance itself. Not a number that represents the thing. The thing. Can you do the round you're training for. For the grappler that's a literal test, and it's the least gameable measurement in all of sport: produce your eighth hard scramble at the quality of your first. There is no breathing trick for that. No leverage, no staged snapshot, no proxy you can substitute. You either built the engine that buys back the burst or you didn't, and the test reveals it directly, the way looking out the windshield reveals whether the destination is closer. Call it the Round-Four Test: the metric that is identical to the construct, where the measuring and the performing are the same act. [fn-6]
For Evan it's the same windshield in a different car. Not his HRV, not even his VO2max, but: can he train hard four days running and not get wrecked, can he carry his kid up the stairs at the end of a bad week without his back filing a complaint, can he do the thing his body is actually for. The performance is the test. When the measure and the construct are the same object, Goodhart has nothing to exploit, because there's no gap left to game. The map has become the territory. You're not reading a needle anymore. You're looking at the road.
That's the resolution. The least gameable metric isn't a better gauge. It's the absence of a gauge, replaced by direct contact with the thing. Every step you take from the windshield back toward the dashboard, from performance toward VO2max toward strength toward grip toward HRV, you trade a little reality for a little convenience, and you open a little more gap for Goodhart to climb into.
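The spectrum is small enough to hold as a lookup table. The sketch below is one way to write it down, and it's my own encoding rather than anything standard. The ordering follows the run-through above. The crime labels are my reading of each entry, and tagging bodyweight's day-to-day noise as a measurement problem is an interpretation. The `Metric` class and both helper functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    crimes: frozenset  # subset of {"measurement", "proxy", "skill"}


# Ordered from most gameable to least, following the essay's ordering.
SPECTRUM = [
    Metric("HRV", frozenset({"measurement", "proxy"})),
    Metric("resting heart rate", frozenset({"measurement", "proxy"})),
    Metric("bodyweight", frozenset({"measurement", "proxy"})),
    Metric("isolated grip", frozenset({"proxy"})),
    Metric("one-rep max", frozenset({"skill"})),
    Metric("VO2max", frozenset()),
    Metric("performance", frozenset()),
]


def position(name):
    """Index of a metric on the spectrum; higher means harder to game."""
    for index, metric in enumerate(SPECTRUM):
        if metric.name == name:
            return index
    raise KeyError(name)


def less_gameable(a, b):
    """True if metric a sits closer to the windshield than metric b."""
    return position(a) > position(b)


for metric in SPECTRUM:
    exposed = ", ".join(sorted(metric.crimes)) or "none of the three"
    print(f"{metric.name:<20} exposed to: {exposed}")
```

Asking `less_gameable("VO2max", "HRV")` returns `True`. The table won't think for you. What it does is make you name the crime before you pick the target.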
What HERITAGE Actually Showed
I called VO2max nearly ungameable, and that's true against the three crimes. But it has a different, deeper kind of dishonesty worth knowing, because it keeps even the good gauge humble.
The HERITAGE Family Study put 481 people through the identical 20-week endurance program and measured how much their VO2max improved. The average gain was real and solid. The spread was the story: some people barely moved, some gained more than a liter per minute, on the same program. About 47 percent of the difference in trainability was familial, heritable. So two people can run the same protocol and one transforms while the other crawls, and neither is doing anything wrong.
This is why VO2max is a great gauge of where you are and a treacherous target for how hard you tried. The number partly reflects your training and partly reflects the genetic hand you were dealt for responding to training. Chase a VO2max target and you might be flogging yourself for a gain that was never available to you, or coasting on one that came free. Even the honest gauge isn't honest about the same thing for everyone.
The Study Nobody Will Ever Run
The fitness internet treats VO2max as the longevity metric, and the protocol that raises it fastest, the Norwegian 4x4, as the longevity workout. The Helgerud work is real: four four-minute intervals at near-max heart rate, three times a week, raised VO2max about 7 percent in eight weeks, beating moderate continuous training of the same total work. Fastest needle-mover on the dial.
But notice what that study measured: VO2max, over eight weeks. Not lifespan. Not heart attacks. Not whether the 4x4 group outlived anyone. To actually prove the 4x4 makes you live longer than easy zone 2 work, you'd need to randomize thousands of people to one protocol or the other and follow them for thirty years. Nobody has run that trial. Nobody will. It's too long, too expensive, and people won't hold still for three decades. [fn-7] So the strongest claim anyone can honestly make is that 4x4 moves the gauge fastest, and the gauge correlates with living longer. That the protocol maximizing the eight-week number is also the one maximizing the thirty-year outcome is an assumption wearing the clothes of a finding. It might be true. It has never been shown. This is the exact spot where the slide that started this whole conversation was right: even the honest gauge, chased hard, can walk you off the territory.
Part 5: The Optimizer's Tax
Everything so far has been a tool for telling good metrics from bad ones. Now I want to take the tool away, because the deeper problem isn't that Evan is chasing the wrong number. It's that he's the kind of person who chases numbers at all, and that turns out to be the rarer and more interesting affliction.
Goodhart's Law has a prerequisite almost nobody names: you have to care enough to optimize before a measure can corrupt you. The whole failure mode, the proxy chasing, the staged measurements, the surrogate eating the goal, only exists for people engaged enough to be aiming at something. And that's a tiny slice of people. Walk into any commercial gym at 6pm. The problem in that room, by overwhelming volume, is not that people are over-optimizing their HRV. It's that most of them aren't training with intent at all, have no metric, no plan, and no progressive overload. For them, this entire essay is noise. Their fix is to pick up something heavy twice a week and go to bed earlier, full stop.
So this is the part where I tell you the warning has a target, and the target is small. Goodhart is a disease of the engaged. It afflicts the founder with four wearables and the lifter who's read every Beardsley article and the grappler tracking his heart rate during rounds. It does not afflict the 95 percent whose actual problem is the opposite, who'd be transformed by doing almost anything consistently. If you've read this far, you're probably in the 5 percent, which means the tool in Part 4 is for you, and so is the bill that comes with it.
Because there is a bill. Call it the Optimizer's Tax. It's what you pay for the privilege of being engaged enough to chase metrics: the attention, the worry, the rest days you didn't need, the hard sessions you skipped because a number told you to, the slow erosion of trusting how you actually feel because an app feels more authoritative. [fn-8] Evan pays this tax every morning when a red recovery score talks him out of training he was ready for. He's not unfit. He's taxed. He has so much engagement that it curdles into anxiety, and the dashboard he bought to feel in control is quietly charging him in the one currency that mattered, which was just doing the work and getting on with his life.
Here's the move that cuts the tax, and it's almost rude in its simplicity. Look out the windshield. When you want to know if you're getting where you're going, check the thing, not the gauge. Did the round get easier. Did the weight on the bar that you've pulled for ten years go up this year. Can you do, in the world, the thing you said you were training to do. Those answers live in the territory, and the territory can't be gamed, because there's no needle between you and it.
There's one more piece, and it's the part the wearable companies will never sell you, because it competes with them directly. Who does the measuring changes how gameable the measure is. Your ring can be fooled, because it only knows what its sensors read, and sensors read snapshots, and snapshots can be staged. A coach standing on the mat watching you scramble cannot be fooled the same way. He's not reading a proxy. He's watching the performance with a stake in your outcome, integrating a hundred things no sensor captures, and he'll tell you you're gassing in round three no matter what your HRV said at 6am. The least gameable instrument in training was never a device. It's a good set of eyes that has watched you for years and wants you to actually get better. [fn-9] We replaced that with a ring because the ring scales and the coach doesn't, and we called it progress.
So here's where it lands, Evan. The body you actually want isn't on the dashboard. It's the one that shrugs off a bad week, trains four days running, picks up your kid without negotiating with your spine. You build it the boring way, by training hard and sleeping and letting the work accumulate on the one gauge that's welded to reality, which is what you can do this year that you couldn't do last year. The wearable isn't evil. It's just a dashboard, and you've been driving with your eyes down. Pick your head up. The road's right there, and it has been the whole time, and the morning you train through a red recovery score and have the best session you've had in a month is the morning you stop paying the tax.
This article is educational and not a substitute for medical advice. Consult a qualified professional before making decisions about your health.
Tier 1 notes
fn-1: Yes, your engine technically does some work to rev in neutral. The analogy holds anyway: motion toward the destination is zero, which is the only output that counts.
fn-2: "The map is not the territory" is Korzybski, 1931, and it's the cleanest one-line statement of the whole problem. Surrogation is just the map slowly convincing you it's the territory.
fn-3: This is the fastest single tell for a junk metric. Ask how quickly it can move. If a number that's supposed to represent a slow biological adaptation can swing overnight, most of what you're seeing is measurement noise, not you.
fn-4: Try the handshake version on yourself. A firm grip on meeting someone reads as "this person is well and present." Nobody believes you could become well and present by practicing handshakes. That's the entire grip-training error in one image.
fn-5: This is also why beginner programs look miraculous. The first few months of "strength gains" are mostly the nervous system learning to express force that was already there. Real, useful, and not the same as the hypertrophy people think they're watching.
fn-6: I'm naming it from grappling because it's where the test is most obviously identical to the thing. But every sport has its windshield. The musician has the performance, not the metronome speed. The founder has shipped product in users' hands, not the burn-down chart.
fn-7: Even if someone started it today, you'd get the answer in 2056, by which point the protocols, the population, and probably the definition of VO2max will have moved. Some questions are structurally unanswerable by the gold-standard method, and "what maximizes my lifespan" is one of them.
fn-8: The cruelest version of the tax is the erosion of interoception. Track long enough and you stop being able to tell how you feel without checking, because the app has become more authoritative than your own body. That's not a recovery aid. That's a dependency.
fn-9: This is the honest argument for hiring a coach that has nothing to do with program design. A good coach is an ungameable instrument pointed at you, which is a thing you literally cannot buy in any other form.
Sources
Catai et al. / Frontiers review, 2025, "Heart rate variability: a multidimensional perspective from physiological marker to brain-heart axis disorders prediction," Front Cardiovasc Med. https://www.frontiersin.org/journals/cardiovascular-medicine/articles/10.3389/fcvm.2025.1630668/full (HRV confounders, night-to-night noise, beta blockers / medications overriding the autonomic signal)
Mason et al., 2019, "Effects of slow breathing rate on heart rate variability and arterial baroreflex sensitivity," PMC6392805. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6392805/ (breathing rate directly drives the HRV reading)
Vaschillo et al., 2010, "Heart Rate Variability Response to Alcohol," PMC2964051. https://pmc.ncbi.nlm.nih.gov/articles/PMC2964051/ (alcohol reduces SDNN/pNN50/HF HRV)
Goodhart, C., 1975, monetary policy lectures; Goodhart's Law overview. https://en.wikipedia.org/wiki/Goodhart's_law (origin: statistical regularities collapse when targeted)
Strathern, M., 1997, "'Improving ratings': audit in the British University system," Eur Rev. https://en.wikipedia.org/wiki/Goodhart's_law (popularized the canonical phrasing "when a measure becomes a target, it ceases to be a good measure"; Strathern was passing along Keith Hoskin's formulation, so the phrase is usually credited to her rather than coined by her)
Lincoff AM et al. (ACCELERATE Investigators), 2017, "Evacetrapib and Cardiovascular Outcomes in High-Risk Vascular Disease," NEJM 376:1933-1942. https://www.nejm.org/doi/full/10.1056/NEJMoa1609581 (HDL +133%, LDL -31%, events 12.9% vs 12.8%, HR 1.01, stopped for futility)
Barter PJ et al. (ILLUMINATE Investigators), 2007, "Effects of Torcetrapib in Patients at High Risk for Coronary Events," NEJM 357:2109-2122. https://www.nejm.org/doi/full/10.1056/NEJMoa0706628 (HDL +72%, all-cause mortality HR 1.58; off-target BP/aldosterone effects)
Leong DP et al., 2015, "Prognostic value of grip strength: findings from the Prospective Urban Rural Epidemiology (PURE) study," Lancet 386:266-273. https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(14)62000-6/abstract (grip predicted all-cause/CV mortality, all-cause HR 1.16 per 5 kg reduction in grip strength; out-predicted systolic BP; n≈140k, 17 countries)
Mandsager K et al., 2018, "Association of Cardiorespiratory Fitness With Long-term Mortality Among Adults Undergoing Exercise Treadmill Testing," JAMA Netw Open 1(6):e183605. https://pmc.ncbi.nlm.nih.gov/articles/PMC6324439/ (CRF inversely associated with mortality, no observed upper limit of benefit)
Bouchard C et al., 1999, "Familial aggregation of VO2max response to exercise training: results from the HERITAGE Family Study," J Appl Physiol 87:1003-1008. https://pubmed.ncbi.nlm.nih.gov/10484570/ (response ranged ~0 to >1.0 L/min, ~47% heritability of trainability)
Helgerud J et al., 2007, "Aerobic high-intensity intervals improve VO2max more than moderate training," Med Sci Sports Exerc 39:665-671. (4x4 raised VO2max ~7% in 8 weeks, beating moderate continuous of equal workload)