Cyathea bryophila, a tree fern native to Puerto Rico and a close relative of one of the species analyzed in the study. (Photo: Susan Sprunt)

Research Uses DNA to Study Key Events in Plant Evolution

An international research collaboration involving UA scientists and cyberinfrastructure provided by the iPlant Collaborative, based in part at the UA, has used DNA to look back in time at important turning points in plant evolution.
Oct. 28, 2014
Data analyses for the 1KP project are run on supercomputers such as Stampede at the Texas Advanced Computing Center. The iPlant Collaborative provides computational resources for life sciences research through supercomputers at TACC and the University of Arizona. (Photo courtesy of Texas Advanced Computing Center)
A microscopic image of the simple filamentous alga Spirogyra found to be in the group of algae most closely related to land plants. (Photo: Michael Melkonian/University of Cologne)
Morning glory (Ipomoea purpurea), a beautiful flowering plant and agricultural weed. (Photo: Lindsay Chaney/Brigham Young University)

As part of the One Thousand Plants (1KP) initiative, scientists from North America, Europe and China have published a paper in the Proceedings of the National Academy of Sciences that reveals important details about key transitions in the evolution of plant life on our planet.

"Our study generated DNA sequences from a vast number of distantly related plants, and we developed new analysis tools to understand their relationships and the timing of key innovations in plant evolution," said Jim Leebens-Mack, associate professor of plant biology at the University of Georgia and coordinating author of the paper.

Analyzing the DNA sequences of so many plants was possible only by leveraging the computing capacity provided by the National Science Foundation-funded iPlant Collaborative, based primarily at the University of Arizona's BIO5 Institute.

UA evolutionary biologist Mike Barker, who has been involved with the 1KP initiative since its conception in 2009, contributed bioinformatics pipelines for high-throughput genomic analyses — as well as genetic information of a few fern species — to the paper.

From strange and exotic algae, mosses, ferns, trees and flowers growing deep in steamy rainforests to the grains and vegetables we eat and the ornamental plants adorning our homes, all plant life on Earth shares over a billion years of history.

The international research team is generating millions of gene sequences from plant species sampled from across the green tree of life. By resolving the relationships among these species, the team is illuminating the complex processes that allowed ancient water-faring algae to evolve into land plants with adaptations to competition for light, water and soil nutrients.

Lead author Norm Wickett of the Chicago Botanic Garden described the study as "like taking a time machine back to get a glimpse of how ancient algae transitioned into the diverse array of plants we depend on for our food, building materials and critical ecological services."

"When plants colonized the land 450 million years ago, it changed the world forever," said Simon Malcomber, program director in the National Science Foundation’s Division of Environmental Biology, which funded the research. "The results of this study offer new insights into the relationships among living plants."

As plants grew and thrived across the plains, valleys and mountains of Earth’s landscape, rapid changes in their structures gave rise to myriad new species. The group’s data also help scientists better understand the ancestry of the most common plant lineages, including flowering plants and nonflowering cone-bearing plants such as pine trees.

The investigation also has revealed a number of previously unknown molecular characteristics of some plant species that may have applications in medicine and industry.

"We are using this diverse set of sequences to make many exciting discoveries with implications across the life sciences," said Gane Ka-Shu Wong, principal investigator for 1KP, professor at the University of Alberta and associate director of BGI-Shenzhen. "For example, new algal proteins identified in our sequence data are being used to investigate how the mammalian brain works."

"Seeing the impact that 1KP has had inspired us to launch a series of 1000-species projects for organisms like insects, birds and fish," said Yong Zhang, director of the China National GeneBank, or CNGB.

Taming big data

Storing and analyzing the massive libraries of genetic data required an extraordinary level of computing power, which was provided by the iPlant Collaborative at the UA, the Texas Advanced Computing Center, Compute-Calcul Canada and CNGB.

"This study is very ambitious in the sense that we’re analyzing not just lots of species but many of the genes in these species," noted Barker, an assistant professor in the UA’s Department of Ecology and Evolutionary Biology. "It’s a new landscape of bioinformatics challenges that we are trying to overcome, and this pilot study is really the first attempt to bring everybody together that has a unique toolkit to bear on this problem so we can efficiently analyze the data."

Barker, a doctoral candidate at Indiana University during the conception of the 1KP project in 2009, met with a small group of scientists to compile the initial bioinformatics tools to analyze such a large data set.

"This study demonstrates how life scientists are using high performance computing resources to analyze astronomically large datasets to answer fundamental questions that were previously thought to be intractable," said iPlant’s Naim Matasci, now at the University of Southern California.

Working with Matasci, Barker’s lab contributed high-throughput bioinformatics pipelines developed using computational infrastructure provided by the iPlant Collaborative to enable the analysis of so many genetic sequences.

Computer scientist Tandy Warnow from the University of Illinois Urbana-Champaign and her student Siavash Mirarab developed new methods for analyzing the massive datasets used in the project. "The datasets we were analyzing in this study were too big and too challenging for existing statistical methods to handle, so we developed approaches with better accuracy," Warnow said.

Many organizations, including iPlant, CNGB and the Computational Analysis of Novel Drug Opportunities group at SUNY Buffalo, have joined forces to provide web-based open access to these results. The resources and sequence repositories are described in a companion paper published in the open-access journal GigaScience.

The 1KP project is ongoing, Barker added, with analyses of additional plant genetic sequences continuously running on iPlant supercomputers at the UA and the Texas center.

Ultimately, the researchers hope that their project will not only advance the understanding of the origins and development of plant life, but also provide scientists with a new framework for the study of evolution.

"We hope that this study will help settle some longstanding scientific debates concerning plant relationships, and others will use our data to further elucidate the molecular evolution of plant genes and genomes," Leebens-Mack said.