A tweet came through recently about autonomous vehicles — Tesla versus Waymo — that turned out to be about something much larger. Something that applies to every prediction model ever built. Including mine, V85Modellen.
The argument draws on Rich Sutton's bitter lesson: general methods that scale with compute consistently beat hand-crafted architectures. Tesla dropped radar, then ultrasonics, and went full end-to-end — a single input type, learned directly from raw camera data without hand-crafted intermediate steps. Their performance on edge cases accelerated after the simplification, not before. Waymo went the opposite direction: LiDAR, radar, cameras, and a hand-coded fusion layer to reconcile them. They remain stuck in geofenced operations.
My first reaction was the obvious one: Tesla does not use one feature. That is true. But it misses the point.
Waymo's LiDAR is genuinely strong data: arguably better than cameras in low light and rain, and close to ground truth in spatial accuracy. The problem is not the sensor. The problem is what it costs to keep it alongside everything else.
Every additional modality adds calibration, temporal alignment, conflict resolution, and a fusion layer of hand-written logic for when the sensors disagree. That fusion layer does not scale with compute. It scales with engineering hours. You cap your ceiling at what your team can reason about explicitly.
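That fusion layer can be caricatured in a few lines of code. This is an illustration of the failure mode, not any vendor's actual logic; every threshold and weight here is invented, and each new sensor multiplies the disagreement cases that need a rule:

```python
# A caricature of a hand-written sensor-fusion layer. Each branch is a
# human decision about which sensor to trust when they disagree, and the
# branch count grows with every modality added.

def fuse_distance(lidar_m, radar_m, camera_m, raining):
    """Reconcile three distance estimates (metres) with hand-coded rules."""
    if raining and abs(lidar_m - radar_m) > 2.0:
        # Rule: radar penetrates rain better than LiDAR, so weight it up.
        return 0.7 * radar_m + 0.3 * camera_m
    if abs(camera_m - lidar_m) > 5.0:
        # Rule: a large camera/LiDAR split usually means a camera depth
        # error; fall back to LiDAR.
        return lidar_m
    # Default: simple average when everything roughly agrees.
    return (lidar_m + radar_m + camera_m) / 3.0
```

None of those constants improve with more driving data. They improve only when an engineer revisits them, which is exactly the ceiling the argument is about.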
Dropping radar did not mean losing what radar knew. It meant forcing the vision system to learn it instead. Learning scales. Engineering hours do not.
One move, ten thousand times
Bruce Lee once said he feared more the man who had practised one move ten thousand times than the man who knew ten thousand moves but had practised each only a little.
That is the same insight from a different angle.
Tesla's end-to-end vision is the one move. Not one feature — one input type, trained at a scale nobody else matched. The model discovers hundreds of features inside that input: depth, velocity, occlusion, the way shadows shift before an obstacle appears. It finds them through repetition, not engineering.
Waymo is the man who knows ten thousand moves. LiDAR says this, radar says that, rule 4,847 handles the edge case where they disagree in rain on a left-hand bend. Each move is reasonable. But the repetitions are spread thin, and the ceiling is whatever the engineers decided to encode.
Apply this to trav
Now apply this to trav — Swedish harness racing.
A prediction model has a pipeline. Mine looks roughly like this:
Training → Scoring → Weight adjustment → Coupon optimisation
The last step — coupon building, budget constraints, spike selection — is irreducible. That is a combinatorial optimisation problem. No amount of training data makes it disappear.
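The shape of that problem is easy to show. Below is a toy version, assuming a multi-leg pool bet where coupon cost is the product of horses taken per leg times the row price; the price, probabilities, and brute-force search are invented for the sketch, not the actual optimiser:

```python
from itertools import product

ROW_PRICE = 0.5  # assumed price per row

def coupon_cost(picks_per_leg):
    # Cost multiplies across legs: widening one leg shrinks the room
    # left in every other leg.
    cost = ROW_PRICE
    for n in picks_per_leg:
        cost *= n
    return cost

def best_coupon(leg_probs, budget):
    """Brute-force how many horses to take in each leg.

    leg_probs: per leg, model win probabilities sorted descending.
    Maximises the probability of covering every winner within budget."""
    best_hit, best_combo = 0.0, None
    options = [range(1, len(p) + 1) for p in leg_probs]
    for combo in product(*options):
        if coupon_cost(combo) > budget:
            continue
        hit = 1.0  # P(all legs covered), legs treated as independent
        for n, probs in zip(combo, leg_probs):
            hit *= sum(probs[:n])
        if hit > best_hit:
            best_hit, best_combo = hit, combo
    return best_hit, best_combo
```

Even this toy cost function shows why the step is irreducible: the legs interact through the budget, so no per-race score, however accurate, answers the question on its own.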
The third step is more interesting. After scoring, I blend in signals that arrived after training closed: odds movement, bet distribution, late driver changes. That blending logic is hand-written. It does not improve with data. It improves with my reasoning, which means every edge case I have not thought of is a blind spot. The cleaner architecture is for those signals to enter at scoring time as features, so the model learns their relationship to outcome from the races directly.
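The difference between the two architectures can be sketched in a few lines. The signal names, thresholds, and multipliers below are my own illustrations, not the V85Modellen code:

```python
# 1. Hand-written blending after scoring: improves only with my
#    reasoning, so every edge case I have not considered is a blind spot.
def blended_score(model_score, odds_drop_pct, late_driver_change):
    score = model_score
    if odds_drop_pct > 20:       # heavy late money on the horse
        score *= 1.15            # hand-picked boost
    if late_driver_change:
        score *= 0.90            # hand-picked penalty
    return score

# 2. Scoring-time features: the same signals become columns the model
#    saw in training, so their weights are learned from races, not guessed.
def feature_row(base_features, odds_drop_pct, late_driver_change):
    return base_features + [odds_drop_pct, float(late_driver_change)]
```

In the second form, the model decides from thousands of races whether a 20 per cent odds drop is worth a 1.15x boost, or nothing at all.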
But both of those are downstream problems.
The step that matters most is step one.
The decision before the model starts
Feature selection is the decision made before the model starts.
Everything downstream — scoring accuracy, weight adjustments, coupon quality — is bounded by what you let the model see in training. A weak feature set is a ceiling you build before the first training run. And you will never see what the model could have learned from something you chose to exclude.
You can only include features you can measure. That already biases you toward what has been tracked historically, what is easy to extract, what fits cleanly into a row. The things that actually move races — how a horse travelled through the field, a driver's confidence on a specific track, the way a stable's form cycles — are often invisible. You build around what you have and treat it as signal because it is all you have.
Then features start to calcify. Something correlates in an early version. It gets weight. The model learns to compensate for it elsewhere. Three versions later, you cannot remove it cleanly because everything around it has adapted to its presence. It was not necessarily good signal. But it is structural now — a wart on the roof that nobody planned for and nobody can easily remove.
This is how technical debt accumulates in a model. Nobody puts it there on purpose. It grows.
We are not alone in this
I know this because I have lived it. And it turns out, so has almost everyone else building in this space.
Over the past few months I have had contact with a number of developers working on similar problems in harness racing and horse racing prediction more broadly. Different approaches, similar struggles. Some have access to enormous compute capacity. Others have methodically accumulated over ten years of clean historical data — a compounding advantage that is almost impossible to replicate quickly. Others have dismissed certain features as irrelevant, only to quietly reconsider them a model version or two later. I have done the same.
I want to thank them for their openness and transparency. These conversations do not happen often enough in a space where everyone is ultimately competing for the same pool of money. But in the end, we all need to learn from each other to win. The problems we are each working through are more similar than the different architectures suggest.
Starting from scratch
The question I keep coming back to is simple: what would I include if I were building the feature set from scratch today, with no legacy?
Not what I have. Not what I have always used. Not what has always been available. What would I actually choose, knowing what I know now about which signals have proven meaningful across thousands of races?
The honest answer is that some of what is in the current model probably should not be. Features that made sense in v1, when data was thinner and the model needed all the help it could get, may not have earned their place since.
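One way to make a feature argue for its place is a leave-one-out ablation: retrain without it and compare on held-out races. A minimal sketch, with an invented toy metric standing in for the real train-and-validate loop:

```python
def ablation_report(features, evaluate):
    """Score the model trained without each feature in turn.

    evaluate(feature_subset) returns a held-out metric, higher is
    better. A feature whose removal costs nothing has not earned
    its place."""
    baseline = evaluate(features)
    return {
        f: evaluate([g for g in features if g != f]) - baseline
        for f in features  # negative delta means the feature helps
    }

# Toy metric: only 'form' and 'driver' carry signal in this fake setup.
def toy_eval(subset):
    return 0.1 * ('form' in subset) + 0.05 * ('driver' in subset)

report = ablation_report(['form', 'driver', 'v1_legacy'], toy_eval)
# report['v1_legacy'] is 0.0: removing it costs nothing.
```

The catch, of course, is the calcification problem: once everything else has adapted to a feature's presence, a single retrain without it understates what a clean rebuild would show.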
Tesla asked the same question about radar. The answer cost them a sensor that was genuinely useful. But it bought them a system that could learn everything that sensor knew, and more, without a ceiling on how good it could get.
Bruce Lee's punch still has mechanics. He just practised them until they were invisible.
The goal is not fewer features. It is earning every one of them.