For Mason senior, big data is a big deal
September 24, 2018 / by Damian Cristodero
Evan Cypher loves big data.
The George Mason University senior loves the idea that statistics, computer science and mathematics are, as he said, “all rolled into one.”
But what’s really special about big data, the economics major and Honors College member said, is that “it ventures into entirely new fields to discover insights and find more efficient ways to solve problems in marketing and business, pharmaceuticals and finance, anything you can imagine.”
By his own admission, Cypher wasn’t a numbers kid growing up. But an Advanced Placement statistics class in high school piqued his interest because, he said, it showed him how to apply knowledge across different areas. His work as a junior with Mason statistics professor David I. Holmes solidified his enthusiasm.
Using stylometry, the statistical analysis of differences in literary style between authors, Holmes and Cypher researched the authorship of a 100-year-old book about how to cheat at cards. The research, funded through OSCAR (Mason’s Office of Student Scholarship, Creative Activities and Research), identified two possible authors, Holmes said.
“Evan was immersed in it,” Holmes said. “He’s a great student, but what impresses me most is his ability to think outside the box, and he’s very good with software.”
A more business-centric application of big data was on display at last spring’s DC DataFest. The event, hosted by Mason’s chapter of the American Statistical Association, of which Cypher is president, included teams from 12 universities.
Cypher’s team won the award for best visualization for its presentation on how the job-search website Indeed can better connect U.S. health care employers with nurses.
The team, Cypher said, analyzed 20 million observations across 30 variables that Indeed set up for the exercise. His teammates were new alumnae Alyssa McDonald, BS Criminology, Law and Society ’18; Brooke Gipson, BS Community Health ’18; and Megan Maloney, BS Economics ’18.
“We started by using different variable thresholds,” Cypher said. “We looked at how many clicks were on a posting, a measure of its popularity. We looked at which jobs were on the site the longest. The vast majority of them were for licensed nurses. Why aren’t they clicking? Why aren’t they sending applications?”
The team used outside data to project which states would face nurse shortages during the next few years. The team concluded that Indeed could work with employers to craft job ads that play up crucial selling points, such as offers to pay for continuing education and travel expenses. Positions in states whose nursing licenses are recognized regionally could also be a selling point.
“What’s really brilliant about big data is it offers individuals, researchers and companies an unprecedented amount of information so they can find increasingly accurate results,” Cypher said. “It offers information and outcomes that—before contemporary machine learning—would have never been possible.”