After the Re-audit: Why the Tornado-ERA5 Signal Does Not Survive

A V2 re-audit of the original tornado-ERA5 draft found that the headline signal should not stand. Eleven rerun packets plus five verdict-only packets retired the original claims across five mechanisms: post-event leakage, multiplicity, catalog sensitivity, source absence, and confirmed negatives. This rewrite replaces the original narrative with what survived, what was retired, and how future AI-assisted research should be gated at n=0.

Update, May 2026: this page has been rewritten after a full V2 re-audit. The original draft argued for a detectable tornado-outbreak precursor signal in ERA5 reanalysis. The audit changed that conclusion. The headline signal should not be treated as established.

The Short Version

The original research asked whether tornado outbreaks leave a detectable pre-event atmospheric signature in ERA5. The answer from the re-audit is: not from this experimental program as written. The V1 paper's public claims collapse under corrected validation, source-read review, catalog checks, and stricter closeout discipline.

That does not mean every line of work was useless. It means the original post overclaimed from a pipeline that had not been audited hard enough at the first experiment. The useful result is methodological: it exposed the failure modes that need to be blocked before the next AI-assisted research project scales from one experiment to many.

What The Re-audit Found

The final Lane 2 tally closed eleven rerun packets and five earlier verdict-only packets. Across the eleven rerun packets, five mechanism families explain why the original claims should not stand.

Post-event or full-window leakage was the largest axis: V3, V6, V7, and the eigenvector-modes packet all depended on features that were not safely bounded before the forecast anchor. Multiplicity and hyperparameter-selection failures explained V5 and V8. Catalog sensitivity explained V1 and V2, whose hand-curated outbreak population did not survive checks against canonical catalog definitions. TDA introduced a separate source-absence axis: the analyzer source needed to reproduce the result was never committed. AT and V10 were confirmed as genuine negative controls rather than positive signals.
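
For concreteness, here is a minimal sketch of the kind of pre-anchor guard that would have caught the leakage axis. It assumes a pandas-style feature table with a timestamp column; the column name, the anchor value, and the variable names are illustrative, not the project's actual schema.

```python
import pandas as pd

def assert_pre_anchor(features: pd.DataFrame, anchor: pd.Timestamp,
                      time_col: str = "valid_time") -> pd.DataFrame:
    """Fail loudly if any feature row is timestamped at or after the
    forecast anchor; return the table only if every row is pre-anchor."""
    late = features[features[time_col] >= anchor]
    if not late.empty:
        raise ValueError(
            f"{len(late)} feature rows at or after anchor {anchor}: "
            "post-event/full-window leakage"
        )
    return features

# Usage: run the check on every packet's feature table before training.
# anchor = pd.Timestamp("2024-05-06T00:00Z")   # illustrative anchor time
# clean = assert_pre_anchor(feature_table, anchor)
```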

In count form: post-event/full-window leakage 4 of 11; multiplicity correction 2 of 11; catalog sensitivity 2 of 11; confirmed negative controls 2 of 11; source absence 1 of 11. The strongest positive finding is therefore not a tornado precursor. It is the breadth of mechanisms behind the failure of the original claims.

What Changed From The Original Draft

The original post presented the multivariate CAPE, shear, and low-level jet result as the first confirmed signal in the program. That claim does not survive the V2 outcome gate. The V11/V12/V13 family is retired along with the broader multivariate-trajectory claim rather than retained as evidence for a public forecasting result.

The original post also framed a resolution wall with two detectability regimes: synoptic-scale multi-day outbreaks detectable in ERA5, and single-tornado events unresolved at ERA5's grid spacing. That story is no longer a conclusion of this page. A higher-resolution HRRR-window study remains a plausible future question, but it is deferred and is not evidence for this rewrite.

The original topological result is also retired. The issue is not that persistent homology failed under one parameter choice; the issue is deeper. The checked-in record did not include the analyzer source needed to audit the result. A future TDA study would need to be source-first, fold-local, pre-anchor, and reproducible before it could contribute evidence.
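
To make "source-first" concrete: the analyzer could be as small as one committed, deterministic function. The sketch below assumes the open-source ripser package and an illustrative point cloud; neither is from the original project.

```python
import numpy as np
from ripser import ripser  # assumed dependency; pin its version in the repo

def persistence_diagrams(point_cloud: np.ndarray, maxdim: int = 1):
    """Committed, deterministic analyzer: H0/H1 persistence diagrams for a
    pre-anchor point cloud. No uncommitted notebooks, no lost source."""
    return ripser(point_cloud, maxdim=maxdim)["dgms"]

# Usage: the point cloud would be built fold-locally from pre-anchor data.
# dgms = persistence_diagrams(np.random.default_rng(0).normal(size=(100, 3)))
```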

What Still Survives

Some negative results survive. AT and V10 are useful because they show the audit machinery can confirm an intended negative rather than simply invalidate everything. V7 survives only in a reframed role, as a control or refutation, not as a predictive signal.

The broader research direction also survives as a question, but not as a claim. It is still reasonable to ask whether outbreak-favorable environments can be detected weeks ahead of time from reanalysis or analysis products. The lesson from this audit is that the question needs a cleaner first experiment before it gets a sixteen-experiment scaffold.

The Method Lesson

The critical prevention lesson is n=0 validation. Before an AI-assisted research pipeline is copied into a family of experiments, the first experiment needs two gates. First, methodology review: read the code, trace features against anchor times, check catalog definitions, and verify the validation split. Second, tool validation: reproduce a known result on a public dataset or pass validated unit tests with known inputs and outputs.
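
As an illustration of the second gate, here is a unit test whose expected output can be computed by hand. The window_max extractor is a hypothetical stand-in for whatever feature code the pipeline actually uses, not the project's real extractor.

```python
import numpy as np

def window_max(series: np.ndarray, start: int, stop: int) -> float:
    """Toy feature extractor: maximum of series over [start, stop)."""
    return float(np.max(series[start:stop]))

def test_window_max_known_case():
    # Known input with a hand-computed answer: max of [1, 5, 3] is 5.
    series = np.array([0.0, 1.0, 5.0, 3.0, 9.0])
    assert window_max(series, 1, 4) == 5.0
    # The post-window peak (9.0) must not leak into the answer.
    assert window_max(series, 1, 4) != 9.0
```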

Those gates sound basic because they are basic. They are also where the original project failed. Once the first scaffold was trusted, its assumptions propagated. The later audit had to discover post-event leakage, multiplicity gaps, catalog sensitivity, and missing analyzer source after the program already had sixteen experiments. The right blast radius is one experiment, not sixteen.

For future work, the bar is now explicit: canonical catalog definitions before modeling, pre-anchor features only, fold-local feature construction, multiplicity correction before claim selection, all analyzer source committed, and an n=0 sanity check against public data or validated base cases before scaling.
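
Two of those bars, fold-local feature construction and multiplicity correction before claim selection, fit in a few lines. This sketch assumes scikit-learn and scipy (1.11 or later for false_discovery_control); the split count and alpha are illustrative defaults, not the project's settings.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from scipy.stats import false_discovery_control  # requires scipy >= 1.11

def fold_local_features(X: np.ndarray, n_splits: int = 5):
    """Fit the scaler on each training fold only, so no test-fold
    statistics leak into feature construction."""
    for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
        scaler = StandardScaler().fit(X[train_idx])
        yield scaler.transform(X[train_idx]), scaler.transform(X[test_idx])

def surviving_claims(p_values: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Benjamini-Hochberg correction before any claim is selected."""
    return false_discovery_control(p_values, method="bh") <= alpha
```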

Author Note

This was an AI-assisted personal research project. The original draft was not good enough, and the re-audit changed the conclusion. That is the point of leaving this rewrite here: the useful artifact is the correction, not the old headline.

The project is now parked at a tightened public rewrite and internal cleanup. A methods paper is not being pursued in this cycle unless outside domain engagement makes it worth reopening. The durable win is author learning: the next research pipeline starts with n=0 review and a public-data sanity check before any pattern is allowed to propagate.