
Jonathan Kaye PhD

Musings about cloud, serverless, IoT, simulations, and other adventures


‘Why Software Fails’ is a lot like why simulations fail

Posted on March 17, 2010 (updated March 25, 2022) by admin

I came across an article in Dr. Dobb's Journal entitled “Why Software Really Fails And What to Do About It”, by Chuck Connell (March 11, 2010). I think the most succinct answer to the provocative title question appears on page 3:

We fool ourselves about how well we understand the complex new software machines we are trying to build.

Wow. It hit me that this is the same sentiment I run into when I discuss building simulations with a certain type of programmer: those inexperienced with creating simulations, but with egos that tell them they can do anything simply because they have the ability to program. I will call these people ‘naive simulation programmers.’ I don’t mean they are naive or ignorant about programming, or deficient in knowledge of programming languages; rather, they are naive about the true nature of simulation. To be honest, I fit that description myself until I had a revelation during my PhD work, thanks to my advisors.

The bottom line is that simulations express a model of some phenomenon (usually from the real world, like a device or a process), and ALL models intrinsically carry assumptions. A simulation/model could be 100% faithful to its real-world manifestation only if the simulation WAS the manifestation itself; otherwise some difference, however small, would remain. This is not mere pedantic hair-splitting. Once you internalize this fact and step back a bit, you realize that the goal with a model is to decide which details of the real manifestation we need to replicate, which we can leave out, and which we can simplify without losing the validity we expect of the details we deemed important.
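To make that concrete, here is a minimal sketch of my own (not from Connell’s article; the drag coefficient is made up for illustration) comparing two models of a dropped object, one ignoring air resistance and one with linear drag. Neither is the ‘real’ object; the question is whether the simplification matters for the task:

```python
import math

# Hypothetical illustration (not from the article): two models of an
# object dropped from rest. Model A ignores air resistance; Model B
# includes linear drag. Both are "wrong" somewhere; the question is
# whether the simplification matters for the task at hand.

G = 9.81           # gravitational acceleration, m/s^2
DRAG_COEFF = 0.05  # assumed linear drag coefficient, 1/s (invented for illustration)

def fall_time_simple(height_m):
    """Model A: no drag, t = sqrt(2h/g)."""
    return math.sqrt(2 * height_m / G)

def fall_time_with_drag(height_m, k=DRAG_COEFF, dt=1e-4):
    """Model B: linear drag (dv/dt = g - k*v), integrated numerically."""
    t = v = y = 0.0
    while y < height_m:
        v += (G - k * v) * dt
        y += v * dt
        t += dt
    return t

for h in (2.0, 200.0):
    a, b = fall_time_simple(h), fall_time_with_drag(h)
    print(f"h={h:6.1f} m  no-drag={a:.3f} s  drag={b:.3f} s  diff={abs(a - b) / b:.1%}")
```

For a 2 m drop the no-drag model agrees to within about half a percent; for a 200 m drop the neglected drag costs roughly five percent. The physics is beside the point. The discipline is testing whether the detail you are tempted to add actually changes the answer for the tasks you care about.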

Getting back to the connection with the quotation: I believe that naive simulation programmers value fidelity to the real system, almost as a macho benchmark, over reasoning about the objective of the overall project and designing a model with the right level of detail to meet that objective. That machismo gets people in trouble: more time, more resources, and more cost than necessary, and sometimes the extra burden of designing to an arbitrary but unnecessary fidelity can even sink the whole project.

For example, I was brought in to oversee a project to convert a physical trainer computer built around the Intel 8088 (a 16-bit processor with an 8-bit external data bus) into a simulated computer with a simulated processor (should we call this a ‘mocku-processor’? wow, it’s late), to enable students to perform calculations and to troubleshoot the computer. Rather than examine the kinds of tasks the students were asked to perform, the lead naive simulation programmer started off saying “we’re going to need an 8088 emulator for this.” This led to heated discussions about whether such detail was necessary: we really ought to look at what tasks students need to perform, because there was a good chance we did not have to simulate all the functionality (and absorb the cost and integration time) of some fanciful ‘emulator’ that was supposed to exist somewhere; after all, even that emulator carried assumptions that might not hold for our tasks. I certainly understood what he was thinking: solving the problem ‘in general’ would give us coverage for the wide range of tasks needed. However, he made this proclamation before bothering to understand the level of fidelity and modeling the tasks at hand required. In the end, because he was the team leader and had the final decision, he took his team through a completely wasted pursuit of the generality illusion before settling back to a simpler solution, based on the specific tasks, that actually could solve the problem at hand.
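To show what the task-focused alternative might look like, here is a hypothetical sketch; every name in it is my invention for illustration, not the actual project code. Instead of a cycle-accurate 8088 emulator, we model only the operations the student exercises require, plus the fault injection the troubleshooting drills call for:

```python
# Hypothetical sketch of the task-focused alternative: instead of a
# cycle-accurate 8088 emulator, model only the handful of operations the
# student exercises actually require. All names here are invented for
# illustration; this is not the actual project code.

class TrainerSim:
    """Simulates just enough of the trainer computer for the lesson tasks."""

    def __init__(self):
        self.registers = {"AX": 0, "BX": 0}
        self.memory = {}      # sparse memory: address -> byte value
        self.faults = set()   # injected faults for troubleshooting drills

    def load(self, reg, value):
        self.registers[reg] = value & 0xFFFF

    def add(self, dst, src):
        # A troubleshooting drill can inject a fault into the adder; the
        # simulation only needs to be faithful at this level of detail.
        result = self.registers[dst] + self.registers[src]
        if "stuck_carry" in self.faults:
            result &= 0xFF    # simulated failure mode, not real silicon
        self.registers[dst] = result & 0xFFFF

    def store(self, address, reg):
        self.memory[address] = self.registers[reg] & 0xFF

sim = TrainerSim()
sim.load("AX", 0x12)
sim.load("BX", 0x34)
sim.add("AX", "BX")
assert sim.registers["AX"] == 0x46    # enough fidelity for the lesson

sim.faults.add("stuck_carry")         # inject the drill's failure mode
sim.load("AX", 0x00F0)
sim.load("BX", 0x0020)
sim.add("AX", "BX")
assert sim.registers["AX"] == 0x10    # 0x110 truncated by the simulated fault
```

A full emulator would reproduce every opcode and timing quirk. The task-focused model stays small, and its assumptions (which operations, which failure modes) are explicit and easy to check against the lesson plan.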

In brief, naive simulation programmers gravitate toward grandiose models that are supposed to simulate behavior comprehensively, on the pretext that such a model will address the current problem and beyond, without sufficiently understanding which assumptions are perfectly reasonable to make. In a form similar to Chuck’s quote, I would say:

We fool ourselves about how complex we have to make simulations to be effective.

For example, our CommandSim platform for training emergency-responder officers uses photographs, videos, and animations to present realistic incidents to trainees. ‘Serious games’ enthusiasts take a look and argue that a 3D immersive environment would produce better training, by virtue of the fact that some of our competitors produce 3D environments for emergency-responder training. However, we have demonstrated that the training benefit of CommandSim, for its specific purposes (which happen to be the same purposes at which those competitors aim their products), is comparable to live exercises. That raises the question: what exactly is the additional contribution of the 3D immersiveness? Certainly not simpler technology, nor easier distribution. By carefully understanding what problems you are seeking to solve, you can design solutions at the ‘right’ level of detail, rather than going full bore with all the detail possible.

Improving the Success Rate

At the end of the article, Chuck lays out good arguments for ways in which we can improve the success rate of software, which I think parallel the ways we can improve the success rate of simulations (for their specific purposes).  Here are two of them:

  • “Stop fooling ourselves about how much we know and how clever we are.” We must design tightly to the problems and objectives; otherwise we risk the project spinning out of control.
  • “Incremental improvement to existing systems is good.” If we go into a project thinking we’re going to simulate everything, it is very hard to know where to stop and what is overkill. Sometimes adding complexity makes the system harder to use even for basic tasks. Critics may say that things become a hodge-podge if we design specifically to the tasks presented, and that the system will not be flexible enough to handle future extensions. I would argue that a good overall design for process modeling, such as using statecharts (see the sketch after this list), goes a long way toward providing a foundation for extending the model. We might one day find that the model’s assumptions are flawed for new tasks, but a good systematic process for developing simulations helps us redesign at that later stage, when we know much more about the problems we face. Nothing teaches us more than actual experience!
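As a rough sketch of what statechart-style modeling buys you, consider the following toy state machine (my illustration, and a flat simplification; real statecharts add hierarchy, history, and concurrent regions). Because the states and transitions are data, extending the model later means adding entries rather than rewriting control flow:

```python
# Minimal sketch of statechart-style process modeling (my illustration,
# assuming a simple flat state machine; real statecharts add hierarchy
# and history states). States and transitions are data, so extending the
# model means adding entries, not rewriting control flow.

TRANSITIONS = {
    ("off",     "power_on"): "standby",
    ("standby", "start"):    "running",
    ("running", "fault"):    "alarm",
    ("running", "stop"):     "standby",
    ("alarm",   "reset"):    "standby",
}

class DeviceModel:
    def __init__(self, initial="off"):
        self.state = initial

    def send(self, event):
        key = (self.state, event)
        if key in TRANSITIONS:
            self.state = TRANSITIONS[key]
        # Unhandled events are ignored, like an unlisted statechart trigger.
        return self.state

dev = DeviceModel()
for evt in ("power_on", "start", "fault", "reset"):
    print(evt, "->", dev.send(evt))
```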
