Epic Fails in Engineering – Mars Climate Orbiter
In a previous blog entry, we described a software error that led to the loss of the European Space Agency’s Ariane 5 rocket in June of 1996. In hindsight, the error seems obvious: it basically amounted to a failure to recognize that the new and improved launcher would travel considerably faster than the one it replaced (who would have thought?), and that perhaps the guidance software should be upgraded accordingly. But lest we think our European friends are the only ones capable of making such a mistake, let’s look at another mission failure that started closer to home, at NASA, although in this instance the problem didn’t finally manifest itself until the mission was considerably further away, at Mars.
The mission was the Mars Climate Orbiter, a space probe launched by NASA on December 11, 1998, to study the climate and atmosphere of Mars and to serve as a communications relay for the Mars Polar Lander, which was scheduled to reach Mars in December 1999.
On September 23, 1999, after a 9-month, 416-million-mile journey, the Climate Orbiter reached Mars. As it passed around the far side of the planet, the signal was lost, as expected, because Mars occulted it. But 21 minutes later, when the calculated trajectory should have carried the spacecraft back around the planet to a point where ground controllers on Earth would again receive its signal, there was, instead... silence. Anxious ground controllers searched in vain for its signal for two days before the $125 million mission was declared a loss.
Post-failure analysis quickly identified a problem with the spacecraft’s trajectory. Ongoing in-flight course corrections, along with a final trajectory correction maneuver, had been intended to place the spacecraft into an optimum orbit at an altitude of 140 miles above the planet. However, final calculations the day before orbit insertion indicated that the spacecraft would likely end up at an altitude of around 68 miles, only 18 miles or so above the 50-mile minimum altitude that it was thought to be capable of surviving. A final course correction was considered, but managers rejected it. Post-failure calculations painted a much worse picture, indicating that the spacecraft was on a flight path that would probably have taken it to within 35 miles of the surface. At that altitude it was likely destroyed, either by the stresses of atmospheric drag or when the remaining, highly volatile hydrazine fuel in the propellant tank heated to the point of self-ignition.
So, what had happened? Here’s where the story gets almost hard to believe. Lockheed Martin Astronautics supplied ground software that reported thruster impulse results in United States customary units (pound-force seconds), while NASA’s Jet Propulsion Laboratory, which was responsible for the trajectory calculations, supplied software that expected those results in metric units (newton-seconds). Metric units, by the way, were what the Software Interface Specification (in simple terms, the contract documents) had required.
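To get a feel for the size of the mismatch, here is a minimal Python sketch. It assumes the quantity in question was thruster impulse, written in pound-force seconds by one program and read as newton-seconds by the other, which is how the error is generally described. The function name and the sample value are made up for illustration and are not taken from the mission software.

```python
# Illustrative sketch only: the names and the sample value are hypothetical.

# 1 pound-force is defined as exactly 4.4482216152605 newtons.
LBF_S_TO_N_S = 4.4482216152605


def impulse_to_si(impulse_lbf_s: float) -> float:
    """Convert an impulse from pound-force seconds to newton-seconds."""
    return impulse_lbf_s * LBF_S_TO_N_S


# A made-up impulse value, as written by the producer in lbf*s.
reported = 10.0

# The consumer assumes the number is already in N*s and uses it unchanged.
misread_n_s = reported

# What the number really means once it is converted properly.
actual_n_s = impulse_to_si(reported)

print(f"Assumed by consumer: {misread_n_s:.2f} N*s")
print(f"Actually delivered:  {actual_n_s:.2f} N*s")
print(f"Underestimated by a factor of {actual_n_s / misread_n_s:.2f}")
```

In other words, under this assumption every thruster event would have been modeled as roughly 4.45 times weaker than it really was. Carrying the unit alongside the number, or converting explicitly at the interface, is the kind of check that catches this class of error.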
So when the in-flight course corrections were being made, the actual “corrections” were not of the correct magnitude, and the spacecraft’s actual trajectory drifted further and further away from the required trajectory. The problem was compounded by the asymmetrical configuration of the spacecraft, which, because of forces induced by solar pressure, required minor course corrections 10 to 14 times more often than the navigation team had expected. There were other clues that the trajectory was amiss, but observational uncertainties and a culture of “prove it’s wrong” rather than “prove everything’s OK” led mission managers to dismiss the concerns of the navigation team.
So, while at first glance the unit conversion problem might seem to be the primary culprit in the demise of the spacecraft, things were not that simple. Rather, the process of specifying, designing, and integrating the systems necessary to send the spacecraft to Mars, along with the culture within the organizations responsible for doing so, was flawed, with problems at multiple levels. This is often the case in epic engineering failures: a technical problem may be the direct cause, but human factors are what actually lead to the failure.
Pictures courtesy of Wikimedia Commons.