By now, everyone with access to a newspaper or an Internet connection knows just how badly the first few weeks have gone for the U.S. healthcare insurance portal HealthCare.gov and some of the independent state-run insurance marketplaces. To say we are experiencing technical difficulties is an understatement. Despite testimony last week from the main contractors for the federal website and from Health and Human Services Secretary Kathleen Sebelius, we still have few details about what precipitated such a colossal failure and why no alarms went off before the site's ill-fated 1 October debut. IEEE Spectrum Assistant Editor Willie D. Jones talks with risk management expert Robert Charette about what likely happened and what we can expect going forward.
Jones: You wrote an article for IEEE Spectrum in 2005 that could have been used as the playbook for the HealthCare.gov debacle. I guess you’re somewhat of a prophet.
Charette: You can go back to the first book I wrote on software systems risk management back in 1989. It’s easy to make any project fail. You just don’t give it enough time, enough money, or requirements that are understandable.
Jones: Last week, Health and Human Services Secretary Kathleen Sebelius assured her inquisitors at a congressional hearing that her department has brought in experts who have a handle on the problems the site is facing. How confident should we be in Sebelius’ assurances?
Charette: Not very. They’re talking about dozens and dozens of items on their punch list, both in terms of functionality and performance issues. They’ve got just over 30 days to get through the list. Let’s just say that there are 30 items on it. What do you think is the actual probability of getting through testing them, making sure that the system works end to end and that there are no security holes, all in a single month? How do you expect to get that done, knowing that every time you make a fix, there’s a high probability that you’re going to introduce an error somewhere else?
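To put rough numbers on the compounding risk Charette describes, here is a minimal sketch. It assumes each fix independently has some chance of introducing a new error; that independence and the per-fix probabilities tried below are illustrative assumptions, not figures from the interview. Only the count of 30 items comes from his example.

```python
def prob_all_fixes_clean(n_fixes: int, p_regression: float) -> float:
    """Probability that none of n_fixes fixes introduces a new error.

    Assumes (hypothetically) that fixes are independent and that each one
    has the same chance, p_regression, of breaking something else.
    """
    if n_fixes < 0:
        raise ValueError("n_fixes must be non-negative")
    if not 0.0 <= p_regression <= 1.0:
        raise ValueError("p_regression must be between 0 and 1")
    return (1.0 - p_regression) ** n_fixes


# 30 punch-list items, as in the example above; the per-fix
# regression rates are made up purely for illustration.
for p in (0.02, 0.05, 0.10):
    chance = prob_all_fixes_clean(30, p)
    print(f"per-fix regression rate {p:.0%}: "
          f"chance all 30 fixes are clean = {chance:.1%}")
```

Under these assumptions, even a 2 percent chance of a regression per fix leaves only a little better than even odds that all 30 fixes land cleanly, and at 10 percent the odds fall below 5 percent.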
Jones: Let’s spin this forward a bit. How do you think this next month will actually go?
Charette: They said that they needed five weeks at the minimum to test it, and they’re still making all these changes. Where will that five-week window fit? If they had stopped right then and tested it for five weeks, they wouldn’t have been able to finish on time. And five weeks was probably the absolute minimum they needed, assuming everything worked. They’re patching the system as they go along and, as Sebelius admitted, they’re doing very local unit tests (which, by the way, is what got them into this mess in the first place, with each contractor saying, "Well, my stuff works"). If they discover something major, they may have to run the whole system test again.
Jones: So they’ll most likely gain functionality, but security is not a given.
Charette: Yes, unfortunately. It would be very surprising if there isn’t some type of breach, either at the federal or state level, by this time next year. If you can breach some of these high-security Defense Department or intelligence systems, what’s the probability that the Obamacare website is not going to be breached? That likelihood approaches zero, if it isn’t zero.
Jones: So what does that mean for the average person who doesn’t get employer-sponsored insurance?
Charette: I’m one of those people. And what it means is that I have to do a personal risk assessment. I’m asking myself, What’s the risk of having identity theft and the subsequent issues that come with that versus not using the federal marketplace and paying a higher insurance rate because I can’t afford to have something like that happen?
Jones: Sounds like you're damned if you do and damned if you don't. How did we get to this point? Somebody had to know on 31 August that something was about to hit the fan. Why would they let Sebelius and President Obama wake up on 2 October with such a massive amount of egg on their faces?
Charette: No matter what they say about bad communication and bad coordination, this is all about plausible deniability. Once it was real apparent that this thing was going to be a turkey, nobody wanted to say a thing. In the military, we used to have this game of chicken. If you were building a submarine or an aircraft, there would be one group producing the software and another group building the hardware. The game came down to who would have to admit failure first. In other words, “Can we keep the lid on our problems until they admit that they have a problem?” If the other group caves in, at that point we can say, "Yeah, we have a problem too, but we’ll still be done before they get fixed." If what I’ve been reading over the past week or so is true, that’s what’s been happening here. You’ve got these different groups that were in charge of different portions of the Obamacare website, each hoping that the others were going to admit that there was a problem. And when none of them admitted it and the thing blew up on 1 October and the days following, everybody said, "Well, we didn’t know." I wonder if they really think we’re all that stupid.
Jones: Do you think one of these hearings will get down to the nitty gritty in terms of technical details related to what went wrong?
Charette: Probably not. What’s really going to be interesting is to see who comes out with the first book about this whole thing.
Jones: So, what’s your take on how things went so terribly wrong?
Charette: HealthCare.gov is a huge system of systems and it’s extremely difficult to manage these things even in the best of times. That’s mostly because you have so many different interfaces with so many different assumptions controlling how the individual systems operate. And they’re rarely built with enough flexibility to be used by lots of other systems. If you take a look at the IRS systems, the Department of Homeland Security systems, or any of the other ones we’re talking about, they were never created to be connected to something like HealthCare.gov. So you have massive risk at each interface in terms of just trying to get the assumptions (stuff like how data is formatted, how data should be captured and passed back and forth) to align. Imagine being given Lincoln Logs, an Erector Set, and Legos and saying, "I’m gonna make something where everything fits properly." That’s not likely to happen. It takes a tremendous amount of time just to understand how things operate, so that when you begin to design things, you can actually have information pass through all these interfaces seamlessly. Any problem at any one of these junctures will cause a person’s application to stop.
Jones: Anything else catch your eye?
Charette: I haven’t seen any indication of who’s doing the configuration management. Typically in a large system, you have a master configuration list and a change control board, so people can say, "This is what you’re going to change; this is what the effects are going to be." And it’s a fairly rigorous process, especially if you have a system that’s very complex like you do here. What you don’t want is somebody making a change that nobody else knows about. Just getting hold of who’s doing what to what and understanding what the implications are is itself going to be a huge undertaking. For every bit of software, you want to have some release control. I haven’t seen anything to suggest that type of management. But then again, the management has been extremely opaque.
Jones: And I’ve heard you mention that the Centers for Medicare and Medicaid Services (CMS) deciding to run the show was also a mistake.
Charette: CMS doesn’t even have a track record as a system integrator, and system integration is the hardest job of all. You have to have a full understanding of the system. You have to make these tradeoffs, and you have to be almost dictatorial about making decisions and making them stick. But of course once you do, you also have to be able to reverse a decision very quickly upon figuring out that, oops, that isn’t how things work. And from all the press reports, CMS was the wrong organization, had the wrong expertise, and did not have the management capability to be the entity that has to make those decisions.
Jones: We learned during the hearings with the contractors and with Secretary Sebelius that the group writing the code was told, just weeks before the 1 October rollout date, to turn off the setting that would let people browse without signing up. Was there any technical justification for that?
Charette: That was a political decision that had nothing to do with technology. In fact, that made their technological solution a hell of a lot harder. I would speculate that the reason they did that was so that they could claim that they had all these people signing up. It was against standard accepted practice for online shopping. And nobody’s fessing up to who made that decision or why.
Jones: I’ve heard HealthCare.gov compared unfavorably to the rollout of the Medicare Part D website. Why was that handled so much better?
Charette: That’s a stupid comparison. If you want to have an honest comparison, this is more like what the U.K. has tried to do with Universal Credit, which has been a giant fiasco. You’re talking about something that few programs have had to face. To be honest, I can’t think of one other project you can point to and say it’s on this magnitude in terms of political, technical, and scheduling constraints. There are other ones that have been as complicated, but they started off with much bigger budgets, much longer lead times, and they still couldn’t deliver.
Jones: We can’t end this discussion without talking about money. In a June report, the U.S. Government Accountability Office said that US $394 million had already been spent on the project. How much cash is the government likely to shell out going forward?
Charette: It’s likely to be in the billions by the time everything is said and done. It’s hard to give an exact figure because of the way the contracts are managed. There are a lot of ways to put money toward Obamacare IT but not have it ever show up as such. So it’s going to take a long time for even the government auditors to figure out how much money is being thrown at this thing. The bigger issue to me is not the rework cost, but how much money is going to be spent maintaining this thing. Changes to the law will have ripple effects across every one of the interfaces in this system of systems, which will mean changes, the possible introduction of errors, and end-to-end testing. If we extrapolate, the maintenance cost is likely to be three to five times higher than the development cost over the next 10 or 15 years.
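As a back-of-the-envelope illustration of that extrapolation, the sketch below applies the three-to-five-times multiplier to a development cost. Using the US $394 million GAO figure as the development cost is an assumption made only for illustration; Charette expects the final total to be higher, so the result is best read as a floor. The function name and units are hypothetical.

```python
GAO_SPENT_TO_DATE_MUSD = 394  # US $ millions, the GAO figure cited above


def maintenance_range_musd(development_cost_musd: int,
                           low_multiple: int = 3,
                           high_multiple: int = 5) -> tuple[int, int]:
    """Return (low, high) maintenance cost estimates in US $ millions.

    Applies the three-to-five-times rule of thumb from the interview;
    the multiples are parameters so other assumptions can be tried.
    """
    return (development_cost_musd * low_multiple,
            development_cost_musd * high_multiple)


low, high = maintenance_range_musd(GAO_SPENT_TO_DATE_MUSD)
print(f"Maintenance over 10 to 15 years: US ${low} million "
      f"to US ${high} million")
```

Even with the spending to date taken as the whole development cost, the rule of thumb puts maintenance between roughly US $1.2 billion and US $2.0 billion.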
Robert Charette is President of ITABHI Corporation, a business and technology risk management consultancy. Charette, who has more than 35 years of experience in a wide variety of international technology and management positions, is recognized as an international authority and pioneer regarding IS, IT, and telecommunications risk management. He is also the editor of IEEE Spectrum's Risk Factor blog.
Photo: Mike Segar/Reuters