New Technique Searches ‘Dark Genome’ for Disease Mutations

‘Orion’ will help researchers identify disease-causing mutations in patients when current methods come up dry

August 10, 2017

New York, NY (Aug. 10, 2017)—When doctors can’t find a diagnosis for a patient’s disease, they turn to genetic detectives. Equipped with genomic sequencing technologies available for less than 10 years, these sleuths now routinely search through a patient’s DNA looking for mutations responsible for mysterious diseases.

Despite many successes, the search still comes back empty more often than not. In fact, disease-causing mutations are found in only about one-quarter to one-third of patients suspected of having a strongly genetic condition.

A big reason why most investigations turn up empty-handed is the “dark genome.” Only 2 percent of the human genome is well understood by scientists. This small fraction contains the 20,000 genes that encode instructions for making the cell’s proteins. The remaining 98 percent—the “dark genome”—is largely a mystery. Although it’s known that the dark, non-coding genome regulates genes—turning them on and off, for example—the details remain obscure.

As a consequence, sequencing data from the entire genome “is currently considered almost uninterpretable,” says David Goldstein, PhD, the John E. Borne Professor of Medical and Surgical Research and director of the Institute for Genomic Medicine at Columbia University Irving Medical Center. Today’s genetic detectives therefore restrict their search for disease-causing mutations to the sliver of the genome that contains protein-coding genes.

To help locate pathogenic mutations in the vast non-coding genome, Dr. Goldstein and his colleagues Ayal Gussow and Andrew Allen have developed a new technique called Orion. Orion is designed to flag regions of the non-coding genome that are likely to contain disease-causing genetic changes by identifying parts of the genome that are under selection in the human population.

“We anticipate that researchers will immediately start using Orion to help them find pathogenic mutations in patients in whom previous sequencing efforts were negative,” says Dr. Goldstein. Details about the method were published online today in PLoS One.

Orion was developed by comparing the entire genomes of 1,662 people and identifying stretches of DNA that vary little from person to person. Because these regions are “intolerant” to change, they are most likely doing something important, says Dr. Goldstein, senior author of the paper.
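The idea of looking for stretches of DNA that vary little across people can be illustrated with a toy sketch. This is not the published Orion method or its scoring statistic. It only counts variant positions in sliding windows and flags windows with unusually few of them. The function names, the window size, the threshold, and the variant positions are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def variant_density(
    variant_positions: List[int],
    region_length: int,
    window: int,
    step: int,
) -> List[Tuple[int, int, float]]:
    """Return (start, end, variants per base) for each sliding window.

    Windows are half-open intervals [start, end) over a region that
    runs from position 0 to region_length.
    """
    positions = sorted(set(variant_positions))
    results = []
    for start in range(0, region_length - window + 1, step):
        end = start + window
        # Count distinct variant positions that fall inside this window.
        count = sum(1 for p in positions if start <= p < end)
        results.append((start, end, count / window))
    return results


def flag_low_variation(
    densities: List[Tuple[int, int, float]],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return windows whose variant density is at or below the threshold."""
    return [(start, end) for start, end, density in densities if density <= threshold]


# Hypothetical toy data: positions where at least one person carries a variant.
toy_variants = [2, 5, 7, 11, 13, 41, 44, 47, 52, 58]

densities = variant_density(toy_variants, region_length=60, window=20, step=20)
intolerant_like = flag_low_variation(densities, threshold=0.05)
print(intolerant_like)
```

In this toy example the middle window contains no variants, so it is the only one flagged. A real analysis would need to account for factors such as sample size, local mutation rate, and sequencing coverage, none of which this sketch models.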

That means a mutation in an intolerant region is more likely to cause disease than a mutation in a tolerant (read: less important) region. This prediction was confirmed when the researchers mapped the locations of previously identified non-coding mutations: More mutations fell within Orion’s intolerant regions.

Previous methods for exploring the non-coding genome focused on regions that have been retained in multiple species over evolutionary time, suggesting that they, too, have an important function. However, this approach is not able to identify regions of the genome that have taken on important new functions in humans.

Orion isn’t yet a finished product, Dr. Goldstein says. As more genomes are sequenced, the resolution of Orion’s regions will improve dramatically.

“At that point, we are optimistic that Orion will constitute one helpful tool in the effort to identify variants throughout the genome that influence the risk of both rare and common diseases,” says Dr. Goldstein.

The study is titled “Orion: Detecting Regions of the Human Non-Coding Genome that are Intolerant to Variation Using Population Genetics.” Authors are Ayal Gussow (Duke University, Durham, NC, and Columbia University Irving Medical Center, New York, NY), Brett Copeland (CUIMC), Ryan Dhindsa (CUIMC), Quanli Wang (CUIMC), Slave Petrovski (CUIMC and University of Melbourne, Victoria, Australia), William Majoros (Duke), Andrew Allen (Duke), and David Goldstein (CUIMC).

The study was supported by the National Institutes of Health (1U01MH105670, 1UM1HG00901, F31NS092362, RC2NS070344, U01NS077303, U01NS053998, RC2MH089915, K01MH098126, R01MH097971, U01HG007672, and UM1AI100645); Biogen Inc.; SAIC-Frederick Inc.; the Joseph and Kathleen Bryan Alzheimer’s Disease Research Center; the Duke Center for HIV/AIDS Vaccine Immunology and Immunogen Discovery; the Bill and Melinda Gates Foundation; the Ellison Medical Foundation; and the Murdock Study Community Registry and Biorepository.

David Goldstein is a founder of and holds equity in Pairnomix and Praxis and receives support from Janssen, Gilead, Biogen, AstraZeneca, and UCB. The authors declare no other conflicts of interest.


Columbia University Irving Medical Center provides international leadership in basic, preclinical, and clinical research; medical and health sciences education; and patient care. The medical center trains future leaders and includes the dedicated work of many physicians, scientists, public health professionals, dentists, and nurses at the College of Physicians and Surgeons, the Mailman School of Public Health, the College of Dental Medicine, the School of Nursing, the biomedical departments of the Graduate School of Arts and Sciences, and allied research centers and institutions. Columbia University Irving Medical Center is home to the largest medical research enterprise in New York City and State and one of the largest faculty medical practices in the Northeast. Columbia University Irving Medical Center shares a campus with its hospital partner, NewYork-Presbyterian.