Sunday, September 11, 2005

A Flood of Statistics and Surveys

In recent years the news has been fond of citing surveys and statistics: so many hundreds of thousands of Hong Kongers suffer from this or that disease, Hong Kong's suicide rate ranks among the world's highest, the public believes it takes tens of millions of dollars to retire comfortably. I have always been skeptical of these so-called surveys. After reading the "numbers guys" column that think3 introduced, I wondered why Hong Kong has long lacked this kind of writing: articles willing to reflect on the flood of statistics and to expose figures driven by commercial or political agendas, publicity stunts, or outright superstition.

In fact, even surveys conducted by reputable institutions or universities can be of dubious accuracy. While the media serve up these findings as conversation fodder, could they also make room for more independent, critical scrutiny?

Below is an article worth reading. For more on the misuse of statistics, see the book list think3 recommended earlier.

Scientific accuracy

...and statistics
Sep 1st 2005
From The Economist print edition

Just how reliable are scientific papers?

THEODORE STURGEON, an American science-fiction writer, once observed that “95% of everything is crap”. John Ioannidis, a Greek epidemiologist, would not go that far. His benchmark is 50%. But that figure, he thinks, is a fair estimate of the proportion of scientific papers that eventually turn out to be wrong.

Dr Ioannidis, who works at the University of Ioannina, in northern Greece, makes his claim in PLoS Medicine, an online journal published by the Public Library of Science. His thesis that many scientific papers come to false conclusions is not new. Science is a Darwinian process that proceeds as much by refutation as by publication. But until recently no one has tried to quantify the matter.

Dr Ioannidis began by looking at specific studies, in a paper published in the Journal of the American Medical Association in July. He examined 49 research articles printed in widely read medical journals between 1990 and 2003. Each of these articles had been cited by other scientists in their own papers 1,000 times or more. However, 14 of them—almost a third—were later refuted by other work.

Some of the refuted studies looked into whether hormone-replacement therapy was safe for women (it was, then it wasn't), whether vitamin E increased coronary health (it did, then it didn't), and whether stents are more effective than balloon angioplasty for coronary-artery disease (they are, but not nearly as much as was thought).

Having established the reality of his point, he then designed a mathematical model that tried to take into account and quantify sources of error. Again, these are well known in the field.

One is an unsophisticated reliance on “statistical significance”. To qualify as statistically significant a result has, by convention, to have odds longer than one in 20 of being the result of chance. But, as Dr Ioannidis points out, adhering to this standard means that simply examining 20 different hypotheses at random is likely to give you one statistically significant result. In fields where thousands of possibilities have to be examined, such as the search for genes that contribute to a particular disease, many seemingly meaningful results are bound to be wrong just by chance.
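
An aside of my own, not part of the article: the one-in-20 point is easy to verify. Testing 20 independent true-null hypotheses at the 5% level produces at least one spurious "significant" hit with probability 1 - 0.95^20, about 64%. A minimal Python sketch:

    import random

    TRIALS = 100_000      # Monte Carlo repetitions
    HYPOTHESES = 20       # independent true-null hypotheses per "study"
    ALPHA = 0.05          # conventional significance threshold

    hits = 0
    for _ in range(TRIALS):
        # under the null, a p-value is uniform on [0, 1], so each test
        # clears the 5% bar purely by chance with probability ALPHA
        if any(random.random() < ALPHA for _ in range(HYPOTHESES)):
            hits += 1

    print(f"analytic : {1 - (1 - ALPHA) ** HYPOTHESES:.3f}")  # ~0.642
    print(f"simulated: {hits / TRIALS:.3f}")                  # ~0.64

Scale HYPOTHESES up into the thousands, as in gene-hunting, and spurious hits become a near certainty.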

Other factors that contribute to false results are small sample sizes, studies that show weak effects (such as a drug which works only on a small number of patients) and poorly designed studies that allow the researchers to fish among their data until they find some kind of effect, regardless of what they started out trying to prove. Researcher bias, due either to clinging tenaciously to a pet theory, or to financial interests, can also skew results.
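
To see why small samples and weak effects matter, a rough power calculation helps. This is again my illustration rather than anything from the article, using a normal approximation: for a weak standardized effect of 0.2, a trial with 50 patients per arm detects it less than a fifth of the time.

    import math

    def norm_cdf(x):
        # standard normal CDF via the error function
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    def power(effect, n_per_group, z_crit=1.96):
        # approximate power of a two-sided two-sample z-test at the 5% level,
        # where `effect` is the standardized mean difference between groups
        shift = effect * math.sqrt(n_per_group / 2)
        return 1 - norm_cdf(z_crit - shift) + norm_cdf(-z_crit - shift)

    for n in (20, 50, 200, 1000):
        print(f"n per group = {n:4d}  power = {power(0.2, n):.2f}")
    # prints roughly 0.09, 0.17, 0.52, 0.99

An underpowered study that does report a positive result is therefore disproportionately likely to be reporting noise.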

When Dr Ioannidis ran the numbers through his model, he concluded that even a large, well-designed study with little researcher bias has only an 85% chance of being right. An underpowered, poorly performed drug trial with researcher bias has but a 17% chance of producing true conclusions. Overall, more than half of all published research is probably wrong.
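
Those two headline figures fall out of a positive-predictive-value calculation. The sketch below follows the structure of the formula in Dr Ioannidis's PLoS Medicine paper; the scenario parameters (pre-study odds R and bias u) are my reading of his examples, so treat the exact inputs as illustrative:

    def ppv(power, R, alpha=0.05, u=0.0):
        # R: pre-study odds that the probed relationship is real
        # u: fraction of would-be null results that bias turns "positive"
        true_pos = (power + u * (1 - power)) * R   # real effects reported positive
        false_pos = alpha + u * (1 - alpha)        # null effects reported positive
        return true_pos / (true_pos + false_pos)

    # large, well-designed study, 1:1 pre-study odds, little bias
    print(f"{ppv(power=0.80, R=1.0, u=0.10):.2f}")   # ~0.85
    # underpowered, poorly performed trial, 1:5 odds, heavy bias
    print(f"{ppv(power=0.20, R=0.2, u=0.80):.2f}")   # ~0.17

The main driver is the prior odds R: in exploratory fields where true relationships are rare, even clean studies produce mostly false positives.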

It should be noted that Dr Ioannidis's study suffers from its own particular bias. Important as medical science is, it is not the be-all and end-all of research. The physical sciences, with more certain theoretical foundations and well-defined methods and endpoints, probably do better than medicine. Still, he makes a good point—and one that lay readers of scientific results, including those reported in this newspaper, would do well to bear in mind. Which leaves just one question: is there a less than even chance that Dr Ioannidis's paper itself is wrong?
