* This article was modified on May 1, 2015, updating footnote 12.
In 2010, two radionuclide stations in Northeast Asia detected radioactive particles that seemed to indicate that a nuclear explosion had taken place. While there are other possible explanations, other evidence seemed to suggest that North Korea had conducted a very small and otherwise undetected nuclear test. In the past few years, there have been a number of studies of radionuclide data, seismic data and now, on 38 North, satellite imagery.
While some of the evidence is intriguing, I don’t buy it. My objections are largely methodological, and methodological objections are important to me. Everyone who does analysis will be wrong from time to time. I try to be methodologically cautious so that, when I inevitably get it wrong, I will still feel like I made the right judgment based on the evidence available to me.
I think the hypothesis that North Korea conducted a nuclear test in May 2010 is a reasonable one worth considering. North Korea has conducted three nuclear weapons tests, presumably reducing the size and mass of the nuclear device, fixing whatever went wrong in 2006 and possibly confirming a design using uranium. It is possible that, along the way, North Korea conducted a low-yield science experiment or simply tested a dud.
Frankly, I’d love to be the person who proves that North Korea conducted a secret nuclear test. But, based on the evidence we have, I just don’t think it is more likely than not.
The 1979 Flash in the South Atlantic
First, a little history. In many ways, the debate over whether North Korea conducted a May 2010 test reminds me of a similarly ambiguous event in 1979.
The 1979 “flash in the South Atlantic” was precisely that—an optical detector called a “bhangmeter” on a US satellite detected a flash of light that looked a bit like a nuclear test somewhere in the South Atlantic. There was a lot of circumstantial evidence that pointed to Israel as the culprit. And I don’t mean ‘circumstantial’ in an insulting way. I mean that the prospect of a covert Israeli test seemed then, and still does today, totally plausible based on all kinds of evidence.
After the flash, the scientific community started scrutinizing every pool of sensor data to find the slightest corroboration. A few interesting things turned up, but nothing conclusive. There was some hydrophone data, but it required a sound wave to bounce off of Antarctica. There were claims of radioactive sheep thyroids that I’ve never been able to confirm. And so on. A full review of the evidence is beyond the scope of this little essay, but the approach raised a methodological concern. Spurious correlations are a statistical fact. A 90 percent confidence level means you’re still getting fooled 10 percent of the time. So, if you look hard enough for corroboration, you will find a few things, even if they are spurious. As a scientific panel charged with reviewing the data concluded:
We surmise that had a search been made for corroborating data relevant to a nonexistent event chosen to occur at a random time, such a search would have provided ‘corroborative data’ of similar quality and quantity to that which has been found during analysis of the September 22 signal.
To put it simply, one must be careful to avoid collecting coincidences that support a hypothesis while ignoring data that undermines a hypothesis.
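The arithmetic behind that warning is easy to sketch. The short Python example below assumes, purely for illustration, that each corroboration check is independent and carries the 10 percent false-positive rate mentioned above. The function name and the numbers of checks are hypothetical and are not taken from the panel's report.

```python
def prob_at_least_one_spurious(n_checks: int, false_positive_rate: float = 0.10) -> float:
    """Chance that at least one of n independent checks yields a false hit.

    Assumes every check is independent and shares the same false-positive
    rate, which is a simplification of how real sensor records behave.
    """
    return 1.0 - (1.0 - false_positive_rate) ** n_checks


if __name__ == "__main__":
    # Hypothetical numbers of data sets searched for corroboration.
    for n in (1, 5, 10, 20):
        chance = prob_at_least_one_spurious(n)
        print(f"{n:2d} checks: {chance:.0%} chance of at least one spurious match")
```

Under these assumptions, searching ten data sets gives roughly a 65 percent chance of turning up at least one spurious "confirmation," and twenty data sets push that to nearly 88 percent. Real sensor records are not independent, so the figures are only a rough guide, but they show why a handful of coincidences is weak evidence on its own.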
Ultimately, the scientific panel decided to reject the hypothesis that the bhangmeter had seen a nuclear test for a simple, elegant reason: the satellite’s bhangmeter, like a pair of eyes, was two sensors, which saw different events. If something is far away, such as on the surface of the earth, the two sensors are close enough together that they should see the same thing. The fact that the two sensors saw something different, the panel reasoned, suggested the flash occurred in space very near to the satellite and not on the ground. This was an elegant answer. It also persuaded no one. People simply accused the scientific panel of covering up for the Carter administration, Israel, etc.
I feel precisely the same way about the alleged May 2010 nuclear tests. As in the case of Israel in 1979, I have no trouble accepting that the DPRK might have conducted a nuclear test in May 2010. But, as in the case of 1979, the assembled evidence seems to be merely a collection of coincidences that we could collect for a nonexistent event on a randomly chosen day.
At the core of this problem is a reversal in how we think about detecting underground nuclear tests. The traditional thinking is that the correct way to “detect” an underground nuclear test is to spot it seismically. If radionuclides later appear, that helps “characterize” the seismic event as a nuclear explosion, rather than a conventional one. Generally, policymakers have been reluctant to rely on radionuclide readings alone to “detect” events for reasons that should become clear. The radionuclide community, however, is very excited about getting the same recognition as seismologists, especially now that computer simulations promise reliable methods to model the transport of radionuclides based on weather data. So, there may be a bit of a disciplinary food fight here.
In May 2010, the DPRK released a series of statements that a “thermonuclear” reaction had occurred in April. In the months following the announcements, a well-respected Swedish radiochemist, Lars-Erik De Geer, correlated these statements with certain radionuclide readings collected by the Comprehensive Test Ban Treaty Organization’s (CTBTO) International Monitoring System (IMS). The data includes xenon isotope ratio measurements at a national radionuclide monitoring site near Geojin (South Korea) and an IMS site near Takasaki (Japan) and Barium/Lanthanum measurements at CTBTO IMS sites near Ussuriysk in Russia and Okinawa in Japan. (Only Lanthanum (La) was detected at Ussuriysk.) All these measurements occurred between May 13 and 18, 2010.
De Geer published his findings in a 2012 article in Science and Global Security. I was skeptical of the original De Geer paper because it posited an extraordinarily artificial scenario to explain the observed radionuclide readings. De Geer proposed two undetected nuclear tests, conducted in the same chamber approximately one month apart.
A number of radiochemists reviewed and agreed with De Geer’s initial paper. One concluded that the evidence suggested a nuclear explosion, although he argued the radionuclide evidence was best explained by a single explosion and dismissed the xenon detections at Takasaki in Japan as coincidental.
De Geer himself concluded that the initial paper was in error, publishing a second paper in the Journal of Radioanalytical and Nuclear Chemistry. While De Geer’s first paper posited two undetected tests, the second paper posits only a single test on May 11.
Now, there are two ways to respond to this revision. I took it as confirmation of my original complaint that the scenario was being fitted to the data, raising serious methodological warnings. My colleagues, quite reasonably, said, “Yeah, but the new scenario is pretty clean. What’s your objection to it now?”
Then along came another group of radiochemists, Ihantola et al, who agreed that a nuclear explosion occurred, but estimated the likely time of the event to be much later than De Geer’s estimate. De Geer and Ihantola et al posit very different explosion times, each outside of the error range posited by the other. Only the confidence intervals overlap, and just for a few hours.
So, here we are. Is De Geer right? Are Ihantola et al right? Or do we just shake our heads, muttering about how the data, like Jay from Serial, seems to always tell us what we want to hear? I don’t blame Wright for concluding that some of the readings might be unrelated to a test, but once we start tossing out awkward data, our thin methodological ice starts to crack.
Moreover, false alarms are possible. Nuclear power stations, reprocessing plants and other human events can result in releases that appear to be nuclear explosions. Early operation of a radionuclide monitoring system in Germany detected xenon spikes from nearby reactors. (The false alarm has led to better methods of characterization that emphasize isotopic ratios, but these methods still struggle to distinguish an explosion from a fresh load of fuel.) In another instance, in 2004, a radionuclide station detected 140La that was later determined to have been from a military decontamination exercise. We are so worried about false negatives (missing a nuclear test), but we seem to never worry about false positives.
I still wonder about other possible explanations. Japan brought its Monju fast breeder reactor online on May 9, 2010. The reactor experienced a number of alarms on May 9 and 10, indicating radiation leaks. Although Japanese authorities later stated that the alarms were false alarms and turned off the alarm system, this possibility should be examined far more thoroughly than it has been to date. Similarly, China brought its first fast reactor online a few weeks later. Maybe the Chinese had a false start? These hypotheses strike me as equally likely as a North Korean test. They deserve the same scrutiny.
Sadly, the radionuclide background in Asia is getting worse, not better, as more reactors come online. In particular, South Korea is planning to build a medical isotope production reactor in Busan that will produce a lot of radionuclide “noise.” North Korea’s 2013 nuclear test would likely have been lost in the background had this facility been operating at the time.
After De Geer’s initial report, seismologists began looking for events that would confirm an explosion.
Schaff et al closely examined seismic data from an IMS station in China on the days hypothesized in the original De Geer paper—April 14-16 and May 10-11. They found no evidence of an explosion in either period. For the crucial period of May 10-11, Schaff et al found no explosion down to a threshold of Mb=1.15. 
Zhang and Wen, working from the second De Geer paper, examined a wider range of dates. They identified a tiny event on May 12 using a method that looks for tiny events by matching (or cross-correlating) very small deviations at multiple regional seismic stations. This method should produce a lot of spurious correlations, and the Zhang and Wen paper is a little vague about how far above the background the event in question stands. But in one study that looks at earthquakes using this method, even a detection threshold of nine times the median absolute deviation resulted in something like one spurious correlation a day. This is some serious data mining.
That said, the Zhang and Wen event is still interesting. It occurred on the morning of May 12. This event is consistent with Ihantola et al’s estimate of a later event around 16:00 UTC on May 12, and is just inside De Geer’s confidence interval, too.
Now, here is a problem. The event identified by Zhang and Wen probably did not occur until several hours after the DPRK released its statement about fusion. Moreover, the original DPRK announcement indicated that the fusion event had occurred on April 15. On balance, if the May 12 event was a DPRK nuclear explosion, it does not appear to be related to the announcement of a successful fusion event earlier in the day and referring to an event in April. In other words, the fusion announcement that De Geer emphasized so heavily in his paper was a coincidence. And, if there is one theme that I keep coming back to over and over again, it’s that we have to be cautious about building a case by collecting coincidences.
There is another problem that is worth pondering. The Zhang and Wen paper posits an Mb of 1.44. It is not straightforward to convert this to a nuclear yield for an explosion, although Zhang and Wen use a formula that yields (sorry about the pun) an estimate of about 3 tons. That’s a very small event. It isn’t clear why North Korea would conduct such a small test.
Although De Geer did not stipulate the size of the event necessary to produce the radionuclide signatures, others suggest the explosion must have been on the order of several tens of tons, if not more. That has led proponents to argue that North Korea might have conducted a May 2010 explosion in a giant cavity that decoupled the seismic signal from the size of the explosion. Decoupling factors in hard rock could be as large as 40, transforming a 3 ton event into a 120 ton event. Constructing a cavity large enough to decouple a 120 ton explosion would be quite an engineering achievement in hard rock. Moreover, construction of such a large cavity would surely have been noticed. This is yet another complication in the story, one that is plausible yet also unlikely.
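The decoupling arithmetic can be written out as a quick check. The sketch below simply multiplies the seismically inferred yield by an assumed decoupling factor. The function name is hypothetical, and the only inputs taken from the text are the roughly 3 ton estimate and the upper-end factor of 40; it says nothing about whether such a cavity could actually be built.

```python
def decoupled_yield_tons(apparent_yield_tons: float, decoupling_factor: float) -> float:
    """Actual yield implied by a seismic estimate if the shot was decoupled.

    apparent_yield_tons: yield inferred from the seismic magnitude, assuming
        a fully coupled explosion.
    decoupling_factor: assumed ratio by which a cavity muffles the signal
        (1 means no decoupling at all).
    """
    if decoupling_factor < 1:
        raise ValueError("decoupling factor must be at least 1")
    return apparent_yield_tons * decoupling_factor


if __name__ == "__main__":
    apparent = 3.0  # tons, the approximate estimate attributed to Zhang and Wen
    for factor in (1, 40):  # no decoupling versus the upper-end hard rock figure
        actual = decoupled_yield_tons(apparent, factor)
        print(f"decoupling factor {factor:>2}: {actual:.0f} tons")
```

Only the top of that range reaches the several tens of tons that others say the radionuclide signatures require, which is why the cavity argument carries so much weight in the scenario.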
There is some circumstantial evidence. To me, though, it just doesn’t hang together. The scenario in the original De Geer paper has been completely abandoned. If there was a test, it was one event, not two. And if there was a test, it occurred much later than De Geer initially thought, making the DPRK announcement a coincidence.
What we are left with are some interesting radionuclide readings, but it is possible to imagine alternative explanations for them. And, to my frustration, we haven’t seen a careful examination of those alternatives. Instead, there has been a tendency to build a case against the DPRK. Collecting coincidences makes me nervous.
I still think it is possible that the DPRK did, in fact, conduct a nuclear test in May 2010. But proving it requires more than just collecting data that corroborates the event, while ignoring alternative hypotheses and data that doesn’t fit.
Writing about the September 22, 1979 event, Pief Panofsky concluded the best description of the evidence was the so-called Scotch Verdict. In Scotland, juries may make one of three findings rather than two—guilty, not guilty and not proven. Panofsky was not quite prepared to acquit Israel or South Africa of having conducted a nuclear test, but nor did the evidence point conclusively to their guilt.
“Not proven” seems to be the right verdict in the case of the May 2010 event as well. It is worth noting, by the way, that there was, or rather could have been, a simple way to determine whether North Korea conducted a low-yield nuclear test in May 2010. If the CTBT had been in force with North Korea as a member, the United States or any other State Party could have requested that the CTBTO conduct an onsite inspection. Radiochemists are undeniably proud that the CTBTO’s network of radionuclide stations detected hints of a possible nuclear explosion. But the system was never intended to function with sensors alone. The ability to conduct onsite inspections is an essential element in the regime envisioned to verify the worldwide ban on nuclear testing. In November and December 2014, the CTBTO conducted Integrated Field Exercise 2014, a simulated onsite inspection in Jordan. Absent such an inspection, or better evidence than has been found to date, the events of May 2010 remain interesting, but ambiguous.
Jeffrey Lewis is Director of the East Asia Nonproliferation Program at the James Martin Center for Nonproliferation Studies (CNS), Monterey Institute of International Studies, and a frequent contributor to 38 North.
 J. P. Ruina et al, “Ad Hoc Panel Report on the September 22 Event,” http://fas.org/rlg/800717-vela.pdf (accessed April 26, 2015).
 “DPRK Succeeds in Nuclear Fusion,” KCNA, May 12, 2010, http://www.kcna.co.jp/item/2010/201005/news12/20100512-05ee.html; “Swiss Committee Hails DPRK’s Nuclear Fusion,” KCNA, May 19, 2010, http://www.kcna.co.jp/item/2010/201005/news19/20100519-01ee.html; “DPRK’s Successful Nuclear Fusion Hailed,” KCNA, May 20, 2010, http://www.kcna.co.jp/item/2010/201005/news20/20100520-01ee.html. The initial announcement on May 12, 2010 was followed by two stories on congratulations sent from DPRK friendship organizations abroad.
 Lars-Erik De Geer, “Radionuclide Evidence for Low-Yield Nuclear Testing in North Korea in April/May 2010,” Science & Global Security 20, no. 1 (2012): 1-29; Lars-Erik De Geer, “Reinforced Evidence of a Low-Yield Nuclear Test in North Korea on 11 May 2010,” Journal of Radioanalytical and Nuclear Chemistry (August 2013).
 Christopher M. Wright, “Low-Yield Nuclear Testing by North Korea in May 2010: Assessing the Evidence with Atmospheric Transport Models and Xenon Activity Calculations,” Science & Global Security 21, no. 1 (2013): 3-52. Another paper, by Wotawa, confirmed De Geer’s original two explosion scenario, but this scenario is no longer seriously considered by anyone, including De Geer himself. See: Gerhard Wotawa, “Meteorological Analysis of the Detection of Xenon and Barium/Lanthanum Isotopes in May 2010 in Eastern Asia,” Journal of Radioanalytical and Nuclear Chemistry 296, no. 1 (2013): 339–347.
 Sakari Ihantola, Harri Toivonen and Mikael Moring, “140La/140Ba Ratio Dating of a Nuclear Release,” Journal of Radioanalytical and Nuclear Chemistry 298 (2013): 1283–1291.
 Matthias Zahringer and Gerald Kirchner, “Nuclide Ratios and Source Identification from High-Resolution Gamma-Ray Spectra with Bayesian Decision Methods,” Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 594 (2008): 400-406.
 “Seoul to Build Research Reactor for Production of Isotopes,” Korea Times, February 14, 2012, http://www.koreatimes.co.kr/www/news/tech/2012/02/129_104833.html.
 T.W. Bowyer et al., “Potential Impact of Releases from a New Molybdenum-99 Production Facility on Regional Measurements of Airborne Xenon Isotopes,” Journal of Environmental Radioactivity 129 (2014): 43-47.
 David P. Schaff, Won-Young Kim, and Paul G. Richards, “Seismological Constraints on Proposed Low-Yield Nuclear Testing in Particular Regions and Time Periods in the Past, with Comments on ‘Radionuclide Evidence for Low-Yield Nuclear Testing in North Korea in April/May 2010’ by Lars Erik De Geer,” Science & Global Security 20, no. 2-3 (2012): 155-171.
 Miao Zhang and Lianxing Wen, “Seismological Evidence for a Low-Yield Nuclear Test on 12 May 2010 in North Korea,” Seismological Research Letters 86, no. 1 (January/February 2015): 1-8.
 Zhigang Peng and Peng Zhao, “Migration of early aftershocks following the 2004 Parkfield earthquake,” Nature Geoscience 2 (2009): 877–881. “We compute the median absolute deviation (MAD) of the mean correlation coefficient trace for each template event and use nine times the MAD as the detection threshold. For a normally distributed random variable, the standard deviation σ is 1.4826×MAD. The corresponding probability of exceedance for 9 times the MAD, or 5.4 times the standard deviation σ, is 6.4×10⁻¹⁰. In a one-day period, we sample 172,800 time steps for each template event. So the chance of random detection using the threshold of 9×MAD is about one event per day, suggesting that most of the detections correspond to real events, instead of false detection by random chance.”
 The Zhang and Wen (2015) event occurred at 00:08 UTC (9:08 Korean Standard Time) on May 12, 2010. The North Korean media’s first account of the alleged April 15, 2010 “nuclear fusion” event appeared in the May 12, 2010 edition of the daily newspaper Rodong Sinmun, which would have gone to print the previous night. After the May 12 Rodong Sinmun appeared online, discussion of the “nuclear fusion” report began appearing in foreign newswires, starting with South Korea’s Yonhap News at 1:04 UTC, or 10:04 Korean Standard Time. Rodong Sinmun is not known to update in response to breaking events. It appears as a single, six-page daily edition, prepared in advance. The timing appears coincidental.