Lab meetings are very useful things. Yesterday we presented our paper (X, Y, & Z, in preparation) to the lab: my fellow graduate students and an assortment of undergraduate RAs. One of the graduate students asked a very simple question that has snowballed into major implications for the published paper (Y & Z, 2007). In short, the published results may turn out to be wrong. What was reported and published as significant, and thus meaningful, may turn out to fall beyond the accepted cutoff for significance, and thus not be meaningful at all.
What exactly went wrong? It was really quite basic. To answer the key question in both papers, one variable has to be factored out. I got quite sick of writing "even controlling for [average z-score]", but it was a genuinely important point. The grad student simply asked how large the difference between groups was on this z-score. In checking, we discovered that the z-score measure itself was wrong.
For those not in the know, a z-score makes data relative: it tells us how far a given score is from the average, measured in standard deviations. A z-score of -2 is far below average, a z-score of 0 is exactly average, and a z-score of +1.5 is above average. The problem is that I told my statistics package, "give me the z-score for Measure1. give me the z-score for Measure2. give me the average of those z-scores", not realizing that "smaller than average" is good for Measure1 but bad for Measure2. Because the two z-scores pull in opposite directions, their average can come out near 0 both when both scores were good and when both scores were bad. In sum, we weren't "controlling for [average z-score]" at all; we were controlling for something entirely nonsensical.
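To make the mistake concrete, here is a minimal sketch in Python. The four "participants" and their scores are invented purely for illustration (they are not the study's data), and `z_scores` is a hypothetical helper, not the statistics package we actually used. It assumes, as described above, that lower is better on Measure1 and higher is better on Measure2, and it uses the sample standard deviation (n - 1).

```python
from statistics import mean, stdev

def z_scores(values):
    """Hypothetical helper: standardize each value as (x - mean) / sample SD."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - m) / sd for x in values]

# Invented scores for four participants (illustration only, not real data).
# Measure1: LOWER is better. Measure2: HIGHER is better.
# Participant 1 is good on both measures; participant 4 is bad on both.
measure1 = [2.0, 4.0, 6.0, 8.0]
measure2 = [80.0, 60.0, 40.0, 20.0]

z1 = z_scores(measure1)
z2 = z_scores(measure2)

# The mistake: averaging z-scores whose signs mean opposite things.
naive = [(a + b) / 2 for a, b in zip(z1, z2)]

# The fix: flip Measure1 so that a positive z-score always means "better".
corrected = [(-a + b) / 2 for a, b in zip(z1, z2)]

for i, (n, c) in enumerate(zip(naive, corrected), start=1):
    print(f"participant {i}: naive={n:+.3f}  corrected={c:+.3f}")
```

With these made-up numbers, every naive composite is (up to floating-point rounding) zero, so the best and worst participants look identical; after flipping the sign of Measure1, participant 1 scores highest and participant 4 lowest, which is what the composite was supposed to capture.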
The good news is that this doesn't hurt my own results. I fixed the z-scores, and the statistics actually turn out slightly better for all the critical measures, so I just had to update a lot of numbers after the decimal point.
The bad news is that, as far as I can tell, this really does affect the original, published study. The statistics have gone from being "significant" to being what we grad students usually term "marginally significant" when forced to present something, anything, to the rest of the department. It's enough to make us think the effect is real, but nowhere near strong enough for publication.
I could be wrong. I hope I'm wrong. I am waiting for confirmation from Y that I correctly understood the near-incomprehensible column headings in his original data file. With any luck, I pulled the wrong column for the raw data, and the right column will magically leave the important effect significant. Maybe I just have too much ego to doubt my reading of the column names, but I don't think I'm that lucky.
This is too much for a graduate student working on a first publication. There aren't even any poster presentations on my CV, and now I may be about to find out what happens when you discover a major error in the data after a paper has already been published. This strikes me as a rather backward order of training: I'd prefer to have the confidence of an actual publication before discovering that publications can be just plain wrong and perhaps shouldn't be trusted.
At least I can console myself that the "obvious" error in the z-score calculation was also made by Y, who was an experienced post-doc at the time (now an assistant professor). I'm sure this will be great comfort as I spend the next 24 hours stepping through each data calculation and analysis for the nth time just to make sure that all the statistics are as accurate as we can make them.