In an era of big data, modern genome sequencing techniques allow individual research groups to sequence whole genomes quickly and cost-effectively, creating the possibility for large-scale genome mapping projects.
The Avian Phylogenomics Consortium, led by Erich Jarvis, a Howard Hughes Medical Institute investigator at Duke University, has undertaken just such a project: The international consortium has sequenced the genomes of 48 bird and three crocodile species.
The consortium’s first findings are published today in 28 peer-reviewed papers simultaneously released in scientific journals, including Science, Genome Biology and GigaScience.
The findings illuminate the evolution of the living species that descended from the survivors of the mass extinction that destroyed most of the dinosaurs 66 million years ago, with implications for the conservation of modern species and the understanding of vertebrate evolution.
But before the genomes could be analyzed, the massive datasets generated by the project raised a new question: How could the genomic data be stored securely so that it could be analyzed and shared among the researchers?
A New Day for Data
University of Arizona associate professor Fiona McCarthy, a researcher in the College of Agriculture and Life Sciences, member of the BIO5 Institute and expert on the chicken genome as a model avian species, has worked on the project since its conception.
"It was obvious to me that the only way we could do this project was to put the data on CoGe," McCarthy said. "It’s one thing to get lots and lots of information. It’s another thing to understand what that information means."
McCarthy immediately recommended to her collaborators that they store the genomic data on CoGe, a comparative genomics platform powered by the iPlant Collaborative. Funded by the National Science Foundation since 2008, iPlant provides the computational capacity and software for researchers to securely store, analyze and share massive datasets.
Developed by Eric Lyons, co-principal investigator of the iPlant Collaborative at the UA, and funded by the National Science Foundation, CoGe is a freely available online platform that enables researchers to securely store whole genomes, share the data among selected parties, compare multiple whole genomes at once, and search for specific genetic sequences.
McCarthy likened the platform to a library in which all the books (genomes) are organized on the shelves, and a search catalog lets researchers find specific pages (genetic sequences).
"CoGe can make everything more efficient so that you can find exactly what you’re looking for and compare information between sequences," she said.
"CoGe is the only tool I’ve seen where you can compare so many different species at once. You can force other genome browser tools to compare two or three genomes at a time. Now we’re looking at 40 bird genomes simultaneously, and the number is just going to go up as we get more sequence data."
McCarthy is a coauthor on three of the publications together with Lyons, an assistant professor also in the UA’s College of Agriculture and Life Sciences and member of the BIO5 Institute.
"Our main role was to get the sequence data for all the different genomes organized and all in one place — on CoGe — with the tools so that scientists can ask biological questions about these species," McCarthy said.
With 51 genomes to play with, biological questions abound. The genetic information has important implications for conservation, for understanding behavioral traits in modern birds and for learning more about human health.
"We’re looking to understand a lot about vertebrate evolution and development as a whole," McCarthy said. "By comparing genetic information between species on CoGe, researchers can use birds and other animals to understand more about human health and development."
She and Lyons are coauthors on a paper led by Jaime Gongora of the University of Sydney and published in PLoS One describing an area of the crocodilian genome relating to immune function. The information about the genetics of crocodile immunity is applicable to understanding how vertebrate species — including humans — recognize and fight infection.
Understanding the genetic relatedness of living birds such as the California condor can aid in conservation efforts for the species, she noted, while the genomes of species such as the zebra finch, a songbird that is taught to sing by its parents, can help researchers understand the brain chemistry of learning.
"There is, of course, huge interest in understanding something about dinosaurs," McCarthy said.
'Jurassic Park' Revisited: Deriving Dinosaur Genomes
Birds are the living descendants of the dinosaurs that survived the meteorite-triggered mass extinction that killed off most of the species living 66 million years ago. And of all living reptiles, crocodiles are the most closely related to birds.
"These two sets of genomes have allowed researchers to try and reconstruct the dinosaur genome," McCarthy said.
Don't let the apparent physical differences deceive you: Millions of years ago, birds and crocodiles shared a common ancestor, an early member of the group of reptiles known as archosaurs.
With the genomic data for birds and crocodiles available, researchers are trying to reconstruct the archosaur genome. Generating a genome is a far cry from growing a creature, but the jury is still out on whether having a dinosaur genome could eventually lead to growing a dinosaur as in the film "Jurassic Park."
"A creature’s genome constrains the possibilities for what traits might have been expressed," McCarthy said. "Reconstructing a genome tells us what was possible, but we still won’t know how the genes within that genome were expressed and when, which is what realizes the possibilities."
In other words, researchers can understand what traits archosaurs might have had, but they won’t know which traits were expressed.
Still, McCarthy said, "Birds and reptiles have a common ancestor. And we have the technology." Renowned paleontologist Jack Horner has suggested it might be possible to grow a dinosaur from a chicken alone; the comparative knowledge from 48 avian genomes can only add to that picture.
Dino-deriving possibilities aside, the ability to sequence so many genomes and compare them side by side opens the door to a myriad of future studies.
"Now we have so much data, but we’ve got to actually understand what it all means," McCarthy said. "This wealth of information will not only impact our research but also our teaching. We’ve got students working on this data, and new opportunities to work with high school students and their teachers on something interesting for them."
Delving into the bird and crocodile data, now freely available to researchers and the public through the CoGe and GigaDB websites, may lead to answers to questions genomic biologists have yet to ask.
"That’s what’s really exciting," McCarthy said. "We don’t even know what’s going to come out of it."
Who knows? Maybe someday even an archosaur will be possible.
The bird and crocodile genome sequencing project was funded by the National Science Foundation, the National Institutes of Health and the U.S. Department of Agriculture, in addition to grants to individual research groups.