When Facebook announced in September that it would use all that personal data it collects to roll out a new ad platform to rival Google, privacy advocates groaned and marketers grinned.
But what if all that intelligence could be used to crack open one of today’s most pressing — yet least understood — public health issues?
That’s precisely the vision of the University of Arizona’s Daniel Zeng, MIS professor at the Eller College of Management, and Scott Leischow, adjunct faculty in the UA College of Medicine and professor of health services research at Arizona’s Mayo Clinic.
Fusing cutting-edge informatics and public health, their plan to scrape social media to create the world’s best data on e-cigarette usage and marketing recently won a five-year, $2.7 million grant from the National Institutes of Health.
The project will tackle four distinct goals. It will:
- Create a massive, real-time and continuously growing data set of what consumers and marketers say about e-cigarettes on sites such as Facebook and Twitter, as well as social media forums focused on e-cigarettes and "vaping."
- Mine that content for insights into why people use e-cigarettes, how users believe the products affect their health and whether the products help them quit smoking.
- Document the marketing landscape — all the ways brands and vendors use these channels to promote their products and how consumers respond.
- Integrate all of that information into the world’s first one-stop resource for wide-ranging data on e-cigarettes as revealed through social media, to serve as a tool for other researchers, health care professionals and more.
While e-cigarettes are relatively new in the U.S. — they were introduced in 2007 — sales are doubling annually and were expected to reach $1 billion last year. Even so, any time public dollars fund research, two questions naturally arise: Why study this? And why study it this way?
"There’s so much we don’t know about e-cigarettes," Leischow says. "The scientific community has found mixed data on whether they’re helpful for smoking cessation. We have questions about how different flavorings impact use, particularly among minors. And many health professionals worry that e-cigarettes may ultimately lead to more young people taking up smoking. All of these blind spots around a product that is still totally unregulated make this a top-priority area for the FDA."
As for why it makes sense to study e-cigarettes in this way, Zeng’s MIS expertise holds the key. Mining social media in real time, as Zeng and Leischow have proposed, offers a number of strategic advantages:
- Data comes from people interacting naturally in their day-to-day lives, thus removing “presentation bias” problems intrinsic to surveys.
- The data collection is automated, which means sample size is not constrained by how much money or how many eyeball hours researchers can muster.
- The lack of constraint also makes anecdotal information scientifically relevant: One personal story is just that, but 10,000 or 100,000 personal stories over time equal robust statistical data.
- Because content is processed by algorithms, not people, data is available in near real time, not months or even years after countless hours of labor-intensive review.
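To illustrate the point about scale, here is a minimal sketch of how the uncertainty around an estimated share shrinks as the number of posts grows. It uses the standard normal-approximation margin of error for a proportion, which assumes a random, independent sample; social media posts do not strictly meet that assumption, and nothing here is taken from the project's actual methods. The function name and the 50 percent figure are hypothetical.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for an observed share.

    Assumes a simple random sample (an idealization for scraped posts).
    """
    return z * math.sqrt(share * (1 - share) / sample_size)


# Hypothetical scenario: half of the sampled posts mention quitting smoking.
for n in (100, 10_000, 100_000):
    print(n, round(margin_of_error(0.5, n), 4))
```

Under those assumptions, moving from 100 posts to 10,000 cuts the margin of error by a factor of ten, which is the sense in which a large pile of anecdotes starts to behave like statistical data.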
The world of e-cigarettes, like that of any niche product or interest, has its own specialized vocabulary of acronyms and slang, so the research team will first need to construct a base lexical dataset for “training” the computers that will collect and process content.
It’s also one thing to scrape words but a much more complex challenge to automate the process of extracting meaning. A computer must be able to spot when someone cites a reason for using e-cigarettes or mentions how the products affect his or her health (both of which first require the computer to detect who is or isn’t a user), and to correctly catalog the marketing strategy used in an advertisement.
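As a rough illustration of the gap between scraping words and extracting meaning, the sketch below tags a post using a tiny hand-built lexicon. Everything in it is an assumption made for illustration: the terms, the category names, the `tag_post` function and the rule that a first-person word plus a product term suggests a user. The researchers describe training computers on a lexical dataset, which would be far more sophisticated than this keyword matching.

```python
import re

# Hypothetical seed lexicon; illustrative terms only, not the project's vocabulary.
LEXICON = {
    "product": {"e-cig", "e-cigarette", "ecig", "vape", "vaping", "e-juice"},
    "first_person": {"i", "my", "me", "i'm", "i've"},
    "cessation": {"quit", "quitting", "stopped smoking"},
    "health": {"cough", "breathing", "lungs", "throat"},
}


def find_terms(text, terms):
    """Return the lexicon terms that appear as whole words or phrases in text."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return sorted(
        term for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    )


def tag_post(text):
    """Apply crude keyword rules; a stand-in for a trained classifier."""
    hits = {label: find_terms(text, terms) for label, terms in LEXICON.items()}
    mentions_product = bool(hits["product"])
    # Assumed heuristic: a first-person word plus a product term suggests a user.
    likely_user = mentions_product and bool(hits["first_person"])
    return {
        "mentions_product": mentions_product,
        "likely_user": likely_user,
        "cites_cessation": likely_user and bool(hits["cessation"]),
        "cites_health": likely_user and bool(hits["health"]),
    }


print(tag_post("I've been vaping for a month and finally quit smoking"))
print(tag_post("New e-juice flavors in stock, 20% off today"))
```

The first post is tagged as coming from a likely user who cites quitting, while the second mentions a product without any first-person signal, as a vendor's ad might. Rules this simple break quickly on sarcasm, negation and new slang, which is why extracting meaning is the harder part of the problem.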
"We basically will be creating a suite of novel technologies for this study using both established building blocks of informatics and methods that have yet to be developed," Zeng says, "including analysis and visualization tools that were developed here at the U of A. Beyond that, we’re relying on proven tools for pattern mining, group behavior prediction, social network analysis and a lot more, but in ways that have never been combined for this level of research and in this topic area."
For Leischow, the knowledge those tools will produce is invaluable.
"There are all kinds of messages out there, from how effective e-cigarettes can be to help smokers quit tobacco to how they’re totally harmless or taste like candy," he says. "It may be that e-cigarettes prove beneficial to public health, or they may be shown to do more harm than good. In either case, it often takes many years for experts to fully recognize how products are being used and how they impact well-being, and even longer for regulation to catch up.
"This time, it’s going to be different. This time, we’re getting out ahead."