Can Electronic Health Data Solve Medicine’s Reproducibility Problem?

According to headlines in recent years, coffee causes cancer…and it cures cancer; moderate alcohol consumption improves health…and adults should consume no alcohol for optimum health; drugs for osteoporosis cause esophageal cancer…and don’t cause it. And the list of incompatible results goes on.

“The research literature is filled with examples of studies reporting statistically significant results in conflicting directions, sometimes even when using the same data,” says George Hripcsak, MD, chair of biomedical informatics at Columbia University Vagelos College of Physicians and Surgeons.

The divergent findings reported in these headlines usually come from observational studies, a research design known for its poor reproducibility. In contrast to randomized controlled trials, which enroll patients before a study begins, observational studies analyze past events.

And though that approach is rational in theory, in practice it’s fraught with problems.

“If you parse the medical literature you find that it’s basically a data dredging machine,” Hripcsak says. “There’s publication bias, when authors and editors tend to publish things that match the answers they were looking for, or that advance their careers, or that benefit the journal.”

With massive troves of electronic health data now available, Hripcsak and his colleagues think the time is right to change the way observational studies are conducted. Instead of performing one study and publishing (or not) one result at a time, electronic health data now allow researchers to answer thousands of questions at once.

“By disseminating all the findings, we not only provide new results on a large scale but also prevent the publication bias we see in the literature,” says Hripcsak. “The reader then sees all the results, not just the ones favored by the authors or editors, and other researchers can see a whole body of work to better judge if the methods are operating as they should.”

Their new approach is described, and used to examine treatments for depression, in a paper published this month in Philosophical Transactions of the Royal Society A by Hripcsak and a team of coauthors from Columbia, the University of California, Los Angeles, and Janssen Research and Development.

5,984 studies conducted simultaneously with data from electronic health records

In the new paper, the researchers used unbiased algorithms and enormous databases of anonymous electronic patient records to perform, all at once, every reasonable observational study of the side effects of depression treatments. The team defined 17 treatments, 272 pairwise comparisons between treatments, and 22 outcomes, generating almost 6,000 research hypotheses (272 comparisons times 22 outcomes) and more than 55,000 control questions.
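The counts in this paragraph can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' code: it assumes the 272 pairs are ordered pairs of the 17 treatments (17 × 16 = 272), which the article does not state explicitly, and the treatment and outcome names are placeholders.

```python
from itertools import permutations

N_TREATMENTS = 17
N_OUTCOMES = 22

# Placeholder labels; the article does not list the actual treatments or outcomes.
treatments = [f"treatment_{i}" for i in range(1, N_TREATMENTS + 1)]
outcomes = [f"outcome_{j}" for j in range(1, N_OUTCOMES + 1)]

# Assumption: each pair is ordered, so (A vs. B) and (B vs. A) count separately.
pairs = list(permutations(treatments, 2))

# One research hypothesis per (treatment pair, outcome) combination.
hypotheses = [(target, comparator, outcome)
              for (target, comparator) in pairs
              for outcome in outcomes]

print(len(pairs))       # 272
print(len(hypotheses))  # 5984
```

Under that assumption, the product matches the 5,984 estimates the team reports.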

The analysis crunched data from hundreds of millions of patients and ran continuously for about a month on a powerful network of computers. The result was 5,984 calibrated estimates of effects from different treatments.

Reassuringly, the analysis yielded both positive and negative results, in a distribution that appeared consistent with expectations about what fraction of drugs have side effects. In a small sample, most of the computer’s results matched findings known from randomized controlled trials.

Clinicians can use these results in the same way they now use results scattered across observational papers in the literature. “When data from clinical trials are unavailable, clinicians can make well-informed decisions based on evidence gained by our type of high-throughput observational research,” Hripcsak says.

New approach to observational research

Such evidence is more reliable than that contained in current published literature, he adds. In the same paper, the researchers used sophisticated computer algorithms to analyze the results of all published studies on treatments for depression. They found sharp cutoffs in the way effects are reported, pointing to a stark publication bias.

“The absence of negative results makes the published literature highly suspect,” Hripcsak says. “Did you have to do a billion studies to get those positive ones, in which case they’re probably mostly chance, or did you do a small number, in which case the effect is probably real?”
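Hripcsak’s question can be made concrete with a rough calculation. The sketch below is a simplified illustration rather than the method used in the paper: it assumes every study tests a treatment with no true effect, that the studies are independent, and that “statistically significant” means a conventional threshold of 0.05. The function names are hypothetical.

```python
ALPHA = 0.05  # Assumed significance threshold; not taken from the paper.

def expected_false_positives(n_studies: int, alpha: float = ALPHA) -> float:
    """Expected number of 'significant' results when no effect is real."""
    return n_studies * alpha

def prob_at_least_one_false_positive(n_studies: int, alpha: float = ALPHA) -> float:
    """Chance that at least one of n independent null studies looks significant."""
    return 1.0 - (1.0 - alpha) ** n_studies

for n in (1, 20, 5984):
    print(n,
          round(expected_false_positives(n), 1),
          round(prob_at_least_one_false_positive(n), 3))
```

With these assumptions, 20 studies of ineffective treatments would be expected to produce about one significant result by chance alone, which is why knowing how many studies were run matters as much as knowing which ones were published.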

Pointing to a graph of all of the new results, Hripcsak explains that “we’re using state-of-the-art methods for each of these dots, and we throw nothing away, we’re transparent” about how the data were processed. Any one of the paper’s thousands of results, in other words, would meet the current standards for publication in a peer-reviewed journal as an individual paper.

Hripcsak hopes the work will point the way toward a new approach to observational research. But that may take some time. Established researchers and publications have been reluctant to accept the idea of automated studies; the paper sat in review at one major journal for a year before being rejected.

With the publication of this proof-of-concept paper, though, the group now plans to proceed through other areas of biomedical research, eventually yielding a trove of results that any scientist or statistically minded physician can mine for insights.


The paper is titled “A systematic approach to improving the reliability and scale of evidence from health care data.”

Other authors: Martijn J. Schuemie (Janssen Research and Development, Titusville, NJ); Patrick B. Ryan (Janssen Research and Development and Columbia University Vagelos College of Physicians and Surgeons); David Madigan (Columbia University); and Marc A. Suchard (University of California, Los Angeles). All authors are members of the Observational Health Data Sciences and Informatics program, an international network of researchers and observational health databases with a central coordinating center housed at Columbia University.

This work was supported in part through the National Science Foundation (IIS1251151 and DMS1264153) and the National Institutes of Health (R01LM06910 and U01HG008680).

The authors declare no conflicts of interest.