Reversible A/B tests, by design
Every variant we ship is a diff. Every diff is reversible. Here's the model.
The single most important property of CloudRoad isn't that the ranker is accurate. It's that every action it takes is undoable. Here's how we built that into the product.
01Every variant is a diff
When the engine ships a thumbnail, it doesn't replace the previous thumbnail. It pushes a new variant into YouTube's A/B test slot alongside the existing one. The old thumbnail stays. The test runs. At the end of the test, the creator decides which one stays — or, if neither, both are kept and a new test is queued.
This is the substrate. We never replace. We always propose alongside.
02Every diff has an explicit rollback
Each variant carries a rollback record at ship time:
- The exact previous thumbnail (file + metadata).
- The CTR and retention baseline at the moment of ship.
- The kill-switch thresholds (CTR drop, retention drop, by surface).
- A one-click "revert to baseline" action available throughout the test.
If a creator clicks revert mid-test, the variant is pulled out, the old thumbnail returns, and the test is recorded as inconclusive. The ranker logs this as a negative signal. Reverting one variant is data, not failure.
03Auto-rollback is opt-in, not default
A common ask is: "auto-revert anything that's underperforming." We don't ship this as default, on purpose. CTR is noisy at the daily level. An auto-revert on day-2 numbers reverts a lot of variants that would have won on day-7. Creators who want auto-revert can enable it explicitly with their own thresholds. We default to: notify, don't act.
An action that can't be undone is an action we don't recommend. An action that auto-undoes itself is an action we recommend cautiously. An action a human chooses to undo is an action that worked exactly as designed.
04What this enables
Reversibility isn't a safety feature. It's the thing that makes aggression possible. Because every variant is reversible, the ranker can propose changes that are unfamiliar to the channel — a new framing, a new emotional register, a typography change — without the creator risking their CTR floor. The worst case is a one-week test that didn't move the number. The best case is a new pattern that lifts CTR for a year.
You don't tune what you can't undo. You don't experiment with what you can't roll back.
— Sam