In the age of the internet, anyone can be a scientist.
Each week, citizen science platforms receive hundreds of thousands of new observations from everyday people who pause to snap pictures and record the plants, animals and insects they see. Citizen science data are big data for scientists, and while many educators engage their students in citizen science by asking them to contribute observations, fewer ask their students to analyze the data.
Wendy Clement, associate professor of biology at The College of New Jersey, called on the University of Arizona’s Katy Prudic and Jeffrey Oliver for help designing and teaching a course that brings the analysis of citizen science data into the classroom. Described in the online journal Teaching Issues and Experiments in Ecology, the cross-country collaborative workshop taught students how to ask and answer scientific questions while providing an entry point into the world of data science and computer programming.
Students were asked to identify a butterfly species and the host plant upon which it lays its eggs. Then, they were asked how the ranges of the butterflies and plants might change over the next 50 years. They downloaded data from a citizen science source and fed those data into statistical models to answer the questions.
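To make that workflow concrete, here is a minimal sketch of a first step in such an analysis: taking observations tagged by species and location, and checking where a butterfly's observed range overlaps with its host plant's. It is not the program used in the course, which was written in R and relied on statistical models; Python is used here purely for illustration. The records are invented placeholders rather than real citizen science data, and the names `Observation`, `bounding_box` and `box_overlap` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One citizen science record: what was seen and where."""
    species: str
    latitude: float
    longitude: float


# Invented placeholder records for illustration only (not real data).
OBSERVATIONS = [
    Observation("butterfly", 32.2, -110.9),
    Observation("butterfly", 40.3, -74.8),
    Observation("butterfly", 35.0, -90.0),
    Observation("host plant", 30.0, -100.0),
    Observation("host plant", 38.0, -80.0),
]


def bounding_box(observations, species):
    """Return (min_lat, max_lat, min_lon, max_lon) for one species."""
    points = [o for o in observations if o.species == species]
    if not points:
        raise ValueError(f"no observations for {species!r}")
    lats = [o.latitude for o in points]
    lons = [o.longitude for o in points]
    return (min(lats), max(lats), min(lons), max(lons))


def box_overlap(a, b):
    """Return the box shared by a and b, or None if they do not overlap."""
    min_lat, max_lat = max(a[0], b[0]), min(a[1], b[1])
    min_lon, max_lon = max(a[2], b[2]), min(a[3], b[3])
    if min_lat > max_lat or min_lon > max_lon:
        return None
    return (min_lat, max_lat, min_lon, max_lon)


if __name__ == "__main__":
    butterfly_box = bounding_box(OBSERVATIONS, "butterfly")
    plant_box = bounding_box(OBSERVATIONS, "host plant")
    print("Shared range:", box_overlap(butterfly_box, plant_box))
```

A rectangle drawn around the observations is a crude stand-in for a species' range. Forecasting how that range might shift over 50 years would call for statistical models of the kind the students used, which this sketch does not attempt.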
"Plant-insect interactions is a fairly common course," Clement said. "But there are fewer classes that mentor students in asking novel questions that can be addressed with data from citizen science resources."
No matter the data source, the scientific method must be followed to answer scientific questions.
"The scientific method has a pipeline where you gather, organize and analyze data," said Katy Prudic, assistant research professor in the College of Agriculture and Life Sciences.
The students used data uploaded to iNaturalist, a social network that citizen scientists use to record observations that are categorized by location and species. By using iNaturalist, the first two steps of the scientific method – gathering and organizing – were outsourced, allowing Clement’s course curriculum to home in on the third step.
Analyzing the massive amount of data supplied by iNaturalist required an application of data science, a hard-to-define branch of science that includes contributions from many fields.
"If you ask three people what data science is, you’ll get three different answers," said Jeffrey Oliver, data science specialist for University Libraries. "It’s taking methods and applications from computer science and statistics and applying them to hard questions within the domain."
In this case, the domain was biology, though it could be any field from physics to art to business.
"I think of data science as using the scientific method to translate data to information. It's an interdisciplinary approach using statistics, data and computers," said Prudic, adding that she was the statistics expert on the team, while Oliver anchored the computer programming efforts and Clement was the domain expert.
Clement, Prudic and Oliver did not design the class to turn students into experts; they wanted to give the students an opportunity to discover and appreciate their own ability to analyze data. With Prudic providing the statistical models, Oliver crafted a program written in the computer language R that Clement's students could adapt to their needs.
Popular in life sciences, R is a programming language that can be used to perform statistical analyses and create visuals from data. Clement, who calls R "the future of biology," said many of her students had very little experience with coding and some were reluctant to try their hand at R.
"Students have to learn all the foundational concepts and minute details. On top of that, they have to learn how to use these tools for research," Clement said.
"Providing an entry ramp into this field is really important," Oliver said. "It helps to demystify computer programming."
Instead of trying to squeeze coding courses into an already-packed undergraduate curriculum, educators can try to adapt their current classes to fit the big data landscape. Integrating computer programming into existing courses gives students a gateway and the motivation to learn how to code.
"That’s what Dr. Clement was able to do in this class," said Oliver.
From 2,000 miles away, Prudic and Oliver were able to engage with Clement’s students in real time using video conferencing technology. If the students needed to know something about statistics, they could ask Prudic and receive an answer instantly. If they discovered a bug in the code, they could notify Oliver, and he could share a fixed version of the program immediately using the website GitHub.
Interdisciplinary collaboration occurs often in the research process, but it is a relatively novel approach in the classroom. Clement's course not only taught the students a valuable lesson about cooperation but also offered them an important piece of wisdom: "You don’t have to be an expert in every little thing," Clement said.
Clement, Prudic and Oliver hope that their idea will help many more educators get students involved in, and excited about, data science.