The history of our climate is written in ice. Reading it is a matter of deciphering the complex signals pulled from tens of thousands of years of accumulated isotopes frozen miles below the surface of Antarctica.

When making sense of the massive amount of information packed into an ice core, scientists face a forensic challenge: how best to separate the useful information from the corrupt.

A new paper published in the journal Entropy shows how tools from information theory, a branch of complexity science, can address this challenge by quickly homing in on portions of the data that require further investigation. 

“With this kind of data, we have limited opportunities to get it right,” says Joshua Garland, a mathematician at the Santa Fe Institute who works with 68,000 years of data from the West Antarctic Ice Sheet Divide ice core. “Extracting the ice and processing the data takes hundreds of people, and tons of processing and analysis. Because of resource constraints, replicate cores are rare.”

By the time Garland and his team got hold of the data, more than 10 years had passed between the initial drilling of the ice core and the publication of the dataset it contained. The two-mile ice core was extracted over five seasons, from 2007 to 2012, by teams from multiple universities funded by the National Science Foundation. From the field camp in West Antarctica, the core was packaged, then shipped to the National Science Foundation Ice Core Facility in Colorado, and finally to the University of Colorado. At the Stable Isotope Lab at the Institute of Arctic and Alpine Research, a state-of-the-art processing facility helped scientists pull water isotope records from the ice.

The result is a highly resolved, complex dataset. Compared to previous ice core data, which allowed for analysis every 5 centimeters, the WAIS Divide core permits analysis at millimeter resolution.

“One of the exciting things about ice core research in the last decade is we’ve developed these lab systems to analyze the ice in high resolution,” says Tyler Jones, a paleoclimatologist at the University of Colorado Boulder. “Quite a while back we were limited in our ability to analyze climate because we couldn’t get enough data points, or if we could it would take too long. These new techniques have given us millions of data points, which is rather difficult to manage and interpret without some new advances in our [data] processing.”

Garland notes that in previous cores, decades, even centuries, were aggregated into a single point. The WAIS data, by contrast, sometimes gives more than forty data points per year. But as scientists move to analyze the data at shorter time scales, even small anomalies can be problematic.

“As fine-grained data becomes available, fine-grained analyses can be performed,” Garland notes. “But it also makes the analysis susceptible to fine-grained anomalies.”

To quickly identify which anomalies require further investigation, the team uses information-theoretic techniques to measure how much complexity appears at each point in the time sequence. A sudden spike in complexity could mean either that there was a major, unexpected climate event, such as a supervolcano eruption, or that there was a problem in the data or the data processing pipeline.
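For readers curious what such a measurement can look like in practice, here is a minimal sketch of permutation entropy, the measure named in the paper's title, computed over a sliding window. It is an illustration only, not the team's actual pipeline: the function names, the window size, the pattern length (`order`), and the flagging threshold are all assumptions chosen for the example.

```python
import math
from collections import Counter


def permutation_entropy(values, order=3):
    """Normalized permutation entropy of a sequence, between 0 and 1.

    Each run of `order` consecutive values is reduced to its ordinal
    pattern (the index order that would sort it). The entropy of the
    pattern frequencies is then divided by log2(order!), its maximum.
    """
    counts = Counter()
    for i in range(len(values) - order + 1):
        window = values[i:i + order]
        # Ties are broken by position because sorted() is stable.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] += 1
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(math.factorial(order))


def sliding_entropy(values, window=50, order=3):
    """Permutation entropy of every length-`window` slice of the series."""
    return [
        permutation_entropy(values[i:i + window], order)
        for i in range(len(values) - window + 1)
    ]


def flag_windows(entropies, threshold):
    """Start indices of windows whose entropy exceeds a chosen threshold.

    The threshold is a hypothetical tuning knob for this sketch.
    """
    return [i for i, h in enumerate(entropies) if h > threshold]
```

A steadily rising series produces a single ordinal pattern and an entropy of zero, while a stretch of erratic values pushes the entropy of every window that overlaps it upward. Windows flagged this way would still need an expert to decide whether the cause is a real climate event or a processing problem.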

“This kind of anomaly would be invisible without a highly detailed, fine-grained, point-by-point analysis of the data, which would take a human expert many months to perform,” says Elizabeth Bradley, a computer scientist at the University of Colorado Boulder and External Professor at the Santa Fe Institute. “Even though information theory can’t tell us the underlying cause of an anomaly, we can use these techniques to quickly flag the segments of the data set that should be investigated by paleoclimate experts.”

She compares the ice core dataset to a Google search that returns a million pages. “It’s not that you couldn’t go through those million pages,” Bradley says. “But imagine if you had a technique that could point you toward the ones that were potentially meaningful?” When analyzing large, real-world datasets, information theory can spot differences in the data that signal either a processing error or a significant climate event.

In their Entropy paper, the scientists detail how they used information theory to identify and repair a problematic stretch of data from the original ice core. Their investigation eventually prompted a resampling of the archival ice core — the longest resampling of a high-resolution ice core to date. When that portion of the ice was resampled and reprocessed, the team was able to resolve an anomalous spike in entropy from roughly 5,000 years ago.

“It’s vitally important to get this area right,” Garland notes, “because it contains climate information from the dawn of human civilization.”

“I think climate change is the most pressing problem ever to face humanity, and ice cores are undoubtedly the best record of Earth’s climate going back hundreds of thousands of years,” says Jones. “Information theory helps us sift through the data to make sure what we’re putting out into the world is the absolute best and most certain product we can.” 

Read the paper, "Anomaly Detection in Paleoclimate Records Using Permutation Entropy," in Entropy (December 5, 2018)