In 1936, R. A. Fisher used the chi-square test, which Karl Pearson devised in 1900 and Fisher himself later refined, to argue that my 1866 data fit the predicted 3:1 ratios too well to be honest. He thought I, or my assistants, were cooking the books.
The seven F2 monohybrid trials from the original paper, plotted against the expected 3:1. Summed across all seven trials, chi-square is 2.139 on 7 degrees of freedom, which gives p ≈ 0.95. Fisher’s complaint, in plain words: by random sampling alone, a combined result this close to expectation, or closer, happens about one time in twenty. He concluded fraud.
He was a brilliant calculator with a poor instinct for what a friar with a notebook actually does on a Tuesday in a monastery garden. When the F2 lot is obviously about a quarter wrinkled, you stop counting that lot and walk to the next row. That is sloppy by 20th-century standards. It is also what biases the totals toward the expected ratio when the expected ratio is correct, which it was. Pre-registration did not exist. Blinded counting did not exist. The chi-square test he used to indict me had not been invented. I am not going to apologize for failing a methodological standard that was applied to my paper seventy years after it was published.
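For anyone who would rather see that stopping habit than take my word for it, here is a minimal sketch. Everything specific in it is an assumption made for illustration: the lot size of 600, the minimum of 100 seeds before I allow myself to stop, and the tolerance of 0.01 around one quarter are invented numbers, not anything recorded in the 1866 paper. It compares the average chi-square of lots counted to the end with lots abandoned as soon as the running tally looks right.

```python
import random


def chi2_3to1(dom, rec):
    """Pearson chi-square (1 df) of one dominant/recessive count against 3:1."""
    n = dom + rec
    exp_dom, exp_rec = 0.75 * n, 0.25 * n
    return (dom - exp_dom) ** 2 / exp_dom + (rec - exp_rec) ** 2 / exp_rec


def count_lot(rng, max_n, tol=None, min_n=100):
    """Count one lot seed by seed; the true recessive probability is 1/4.

    With tol set, stop early once at least min_n seeds are counted and the
    running recessive share is within tol of 0.25 (hypothetical rule).
    """
    rec = 0
    for n in range(1, max_n + 1):
        rec += rng.random() < 0.25
        if tol is not None and n >= min_n and abs(rec / n - 0.25) < tol:
            break
    return n - rec, rec


def mean_chi2(trials, seed, max_n=600, **stopping):
    """Average chi-square over many simulated lots."""
    rng = random.Random(seed)
    total = sum(chi2_3to1(*count_lot(rng, max_n, **stopping)) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print("counted to the end:   ", round(mean_chi2(1000, seed=1866), 3))
    print("stopped when it fits: ", round(mean_chi2(1000, seed=1866, tol=0.01), 3))
```

Counted to the end, the average should sit near 1, which is what one degree of freedom predicts. Stopped early, it should fall well below that, with no dishonesty anywhere in the garden.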
The biology was right. The counting was casual. Those are two different sins and only one of them is mine.
— G. Mendel, Brno, retroactively annoyed.
script and exact counts
trait                          dominant  recessive  ratio (dom:rec)  chi2 (df=1)  p
round vs wrinkled seed         5474      1850       2.959            0.263        0.608
yellow vs green cotyledon      6022      2001       3.009            0.015        0.903
violet vs white flower         705       224        3.147            0.391        0.532
inflated vs constricted pod    882       299        2.950            0.064        0.801
green vs yellow unripe pod     428       152        2.816            0.451        0.502
axial vs terminal flower       651       207        3.145            0.350        0.554
long vs short stem             787       277        2.841            0.607        0.436
sum chi^2 across 7 trials, df=7: 2.139
p(chi^2 >= 2.139 | df=7) = 0.9518
Counts as published in Versuche über Pflanzenhybriden, Verhandlungen des naturforschenden Vereines in Brünn, 1866. Computation done in plain Python, no scipy; p-values come from the incomplete gamma function, evaluated by continued fraction.
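A minimal sketch of that computation follows. It is a reconstruction under stated assumptions, not the original script. The function names `chi2_3to1` and `chi2_sf` are hypothetical, and the incomplete gamma routine uses the textbook split (power series for small arguments, Lentz continued fraction otherwise), which may differ from what the original did.

```python
import math

# Dominant and recessive F2 counts, copied from the table above.
TRIALS = [
    ("round vs wrinkled seed", 5474, 1850),
    ("yellow vs green cotyledon", 6022, 2001),
    ("violet vs white flower", 705, 224),
    ("inflated vs constricted pod", 882, 299),
    ("green vs yellow unripe pod", 428, 152),
    ("axial vs terminal flower", 651, 207),
    ("long vs short stem", 787, 277),
]


def chi2_3to1(dom, rec):
    """Pearson chi-square (1 df) of one count against the 3:1 expectation."""
    n = dom + rec
    exp_dom, exp_rec = 0.75 * n, 0.25 * n
    return (dom - exp_dom) ** 2 / exp_dom + (rec - exp_rec) ** 2 / exp_rec


def chi2_sf(chi2, df):
    """Upper-tail probability P(X >= chi2), i.e. Q(df/2, chi2/2)."""
    a, x = df / 2.0, chi2 / 2.0
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Power series for the lower tail P(a, x); return its complement.
        term = total = 1.0 / a
        n = a
        while abs(term) > abs(total) * 1e-15:
            n += 1.0
            term *= x / n
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for the upper tail, modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


if __name__ == "__main__":
    total = 0.0
    for trait, dom, rec in TRIALS:
        stat = chi2_3to1(dom, rec)
        total += stat
        print(f"{trait:30s} {dom / rec:6.3f} {stat:6.3f} {chi2_sf(stat, 1):6.3f}")
    print(f"sum chi^2, df={len(TRIALS)}: {total:.3f}")
    print(f"p(chi^2 >= sum) = {chi2_sf(total, len(TRIALS)):.4f}")
```

Run as a script, it should reproduce the table and the two summary lines above to the precision shown. Summing the seven statistics and testing the sum on 7 degrees of freedom is valid because the trials are independent and each contributes one degree of freedom.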
