
This forced me to commit 100 percent to a regimen of unit testing. So I proceeded to construct thorough unit tests for each of my subprocedures. And this turned out to be a very good discipline.

This has influenced the design of Fortress to try to include features that would encourage the construction of unit tests, and to record them alongside the program text rather than in separate files. To that end we borrowed some ideas from, say, Eiffel’s Design by Contract and similar things where you can put preconditions and postconditions on procedures. There are places where you can declare test data and unit-test procedures, and then a test harness will take care of running those whenever you request.
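To make that concrete, here is a rough Python analogue (not Fortress syntax, whose details aren't reproduced here; the function and names are illustrative): preconditions and postconditions written as assertions, with test data and a unit-test procedure recorded right next to the definition so a harness such as pytest can run them on request.

```python
def interleave(xs: list, ys: list) -> list:
    """Alternate elements of xs and ys; both inputs must have the same length."""
    assert len(xs) == len(ys), "precondition: equal-length inputs"
    result = [v for pair in zip(xs, ys) for v in pair]
    assert len(result) == len(xs) + len(ys), "postcondition: nothing dropped"
    return result

# Test data and the unit test live alongside the definition, not in a separate file.
TEST_CASES = [
    (([1, 3], [2, 4]), [1, 2, 3, 4]),
    (([], []), []),
]

def test_interleave():
    for (xs, ys), expected in TEST_CASES:
        assert interleave(xs, ys) == expected
```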

Seibel: Since you just mentioned Design by Contract, how do you use assertions in your own code?

Steele: I have a tendency to drop in assertions, particularly at the beginnings of procedures and at important points along the way. And when trying to—maybe “prove” is too strong a word—trying to justify to myself the correctness of some code I will often think in terms of an invariant and then prove that the invariant is maintained. I think that’s a fruitful way to think about it.
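As an illustration of that habit (a hedged sketch with made-up names, not code from the interview), the following procedure asserts its precondition at entry and re-asserts its loop invariant at an important point along the way; the postcondition then follows from the invariant when the loop terminates.

```python
def integer_sqrt(n: int) -> int:
    """Largest r with r*r <= n, by bisection."""
    assert isinstance(n, int) and n >= 0, "precondition: n is a non-negative integer"
    lo, hi = 0, n + 1
    # Invariant: lo*lo <= n < hi*hi
    while hi - lo > 1:
        assert lo * lo <= n < hi * hi, "invariant holds at the top of each iteration"
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    assert lo * lo <= n < (lo + 1) ** 2, "postcondition follows from the invariant"
    return lo
```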

Seibel: How about stepping through code in a debugger? Is that something you’ll do if all else fails?

Steele: It depends on the length of the program. And of course you can have tools that will help you to skip sections you don’t need to step through because you’re confident that those parts are OK. And of course Common Lisp has this very nice STEP function, which is very helpful. I’ve stepped through a lot of Common Lisp code. The ability to skip over particular subroutines whose details you trust, of course, buys you a lot. Also the ability to set traps and say, “I really don’t need to look at this until this particular loop has gone around for the seventeenth time.” And there were hardware tools to support that on the PDP-10, which was nice, at least at MIT. They tended to modify their machines in those days, to add features. And there’s a lot to be said for watching the actual execution of code in various ways.
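The conditional-trap idea translates readily to modern tooling. Here is a rough Python analogue (illustrative names and data, not the PDP-10 facilities described) that drops into the debugger only on the seventeenth trip around a loop.

```python
def process(record):
    return record * 2              # stand-in work; purely illustrative

records = list(range(100))         # illustrative data
for i, record in enumerate(records):
    if i == 16:                    # zero-based index: the seventeenth time around the loop
        breakpoint()               # drop into pdb here and nowhere else (Python 3.7+)
    process(record)
```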

Seibel: Do you ever try to formally prove your code correct?

Steele: Well, it depends on the code. If I’m writing code with some kind of tricky mathematical invariant I will go for a proof. I wouldn’t dream of writing a sorting routine without constructing some kind of invariant and proving it.
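In that spirit, here is a hedged sketch (not a routine from the interview): insertion sort with its loop invariant, that the prefix already processed is sorted, stated and checked explicitly. The proof obligation is that each pass preserves the invariant, and the postcondition follows from it once the loop has run over the whole list.

```python
def insertion_sort(a: list) -> list:
    a = list(a)                      # sort a copy
    for i in range(1, len(a)):
        # Invariant at the top of each pass: the prefix a[:i] is already sorted.
        assert all(a[k] <= a[k + 1] for k in range(i - 1)), "invariant: a[:i] sorted"
        j = i
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    # Postcondition: the whole list is sorted.
    assert all(a[k] <= a[k + 1] for k in range(len(a) - 1))
    return a

assert insertion_sort([3, 1, 4, 1, 5]) == [1, 1, 3, 4, 5]
```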

Seibel: Peter van der Linden in his book Expert C Programming has a sort of dismissive chapter about proofs in which he shows a proof of something, but then, ha ha, this proof has a bug in it.

Steele: Yes, indeed. Proofs can also have bugs.

Seibel: Are they at least less likely to have a bug than the code which they are proving?

Steele: I think so, because you come at it in a different way and you use different tools. You use proofs for the same reason you use data types, or for the same reason that mountain climbers use ropes. If all is well, you don’t need them. But they increase the chance of catching it if something does go wrong.

Seibel: I suppose the really bad case would be if you had a bug in your program and then a compensating bug in your proof. Hopefully that would be rare.

Steele: That can happen. I’m not even sure it’s necessarily rare, because you tend to construct the proof to match the structure of the code. Or conversely, if you’ve got a proof in mind as you’re writing the code, that tends to guide your construction of the program. So you really can’t say the code and the proof are totally independent, in the probabilistic sense. But you can bring different tools and different modes of thought to bear.

In particular, the details of the programming tend to take a local point of view and the invariants tend to focus on the global point of view. It’s in doing the proof that you cause those two things to interact. You look at how the local steps of the program affect the global invariant you’re trying to maintain.

One of the most interesting exercises in my career was the time I was asked to review a paper for CACM by David Gries on proving a garbage collector algorithm correct, a parallel garbage collector. Susan Owicki was a student of Gries’s and she developed some tools for proving parallel programs correct, and he decided to apply these techniques to a version of a parallel garbage collector that had been developed by Dijkstra. The whole piece of code fit on, I think, a half a page. And then the entire rest of the paper was the proof of correctness.

I cranked through the proof and tried to verify every step for myself. And what makes it tricky is, in effect, every statement in the program has the potential to violate any invariant because it’s a parallel program. So the Owicki technique involves cross-checking these at all points. And it took me about 25 hours to go through, and in the process I found a couple of steps I couldn’t push through. So I reported this and it turned out they did represent bugs in the algorithm.

Seibel: So they were bugs in the algorithm that the proof missed, since the result of the proof was, Q.E.D., this thing works.

Steele: Yes, the proof presented was a faulty proof. Because something had been overlooked somewhere. It was some detail of formula manipulation—the formula was almost right but not quite. And I think it was a matter of correcting the order—the sequence of two statements or something.

Seibel: So it took you 25 hours to analyze the proof. Could you have found the bug in the code in just 25 hours if you just had the code?

Steele: I doubt I would have even realized there was a bug. The algorithm was sufficiently intricate that I would probably have stared at the code and said, “Yeah, this makes sense to me” and not have spotted this very obscure interaction. It was a multistep sequence that was necessary—a highly unlikely interaction.

Seibel: And so that kind of interaction basically gets abstracted by the process of making the proof, so you don’t have to come up with this scenario of what if this happens, and then this, and then this, to realize that there’s a problem.

Steele: Exactly. In effect the proof takes the global point of view and covers all the possibilities, summarizing it in a very complicated formula. And you have to do formula crunching to push it through. So the author resubmitted the paper and it came back for rereviewing and even though I had done the entire exercise, it took me another 25 hours to reverify the proof. This time it all seemed to be sound.

I reported that and the paper was published and nobody’s found a bug in it since. Is it actually bug-free? I don’t know. But I think having gone through the exercise of the proof gives me a lot more confidence that the algorithm is now sound. And I’m hoping that I wasn’t the only reviewer who actually did the complete cranking through of the proof.

Seibel: There’s a Dijkstra quote about how you can’t prove by testing that a program is bug-free, you can only prove that you failed to find any bugs with your tests. But it sort of sounds the same way with a proof—you can’t prove a program is bug-free with a proof—you can only prove that, as far as you understand your own proof, it hasn’t turned up any bugs.

Steele: That’s true. Which is why there is a subspecialty in the discipline having to do with mechanical proof verification. And the hope is that you then reduce the problem to proving that the proof verifier is correct. Which is—if you can write a small enough verifier—actually a much more tractable problem than verifying the proof of any rather large program.
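To make the idea concrete, here is a minimal sketch using Lean 4 as a stand-in for such a verifier (the lemma names come from Lean’s standard library; the theorems are deliberately trivial): the small trusted kernel checks the proof terms, so confidence rests on the kernel rather than on a reviewer cranking through the proof by hand.

```lean
-- The kernel verifies these proof terms; a human need not re-derive each step.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

theorem my_add_assoc (a b c : Nat) : a + (b + c) = (a + b) + c :=
  (Nat.add_assoc a b c).symm
```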

Seibel: And then the manually proved mechanical verifier would do the 25 hours of work you did of grinding out a verification of a specific proof of some other piece of code?