Изменить стиль страницы

Seibel: And what kinds of things do you print out?

Thompson: Whatever I need; whatever is dragging along. Invariants. But mostly I just print while I’m developing it. That’s how I debug it. I don’t write programs from scratch. I take a program and modify it. Even a big program, I’ll say “main, left, right, print, hello.” And well, “hello” isn’t what I wanted out from this program. What’s the first thing I want out, and I’ll write that and debug that part. I’ll run a program 20 times an hour that I’m developing, building up to it.

Seibel: You print out invariants; do you also use asserts that check invariants?

Thompson: Rarely. I convince myself it’s correct and then either comment out the prints or throw them away.

Seibel: So why is it easier for you to print that an invariant is true rather than just using assert to check it automatically?

Thompson: Because when you print you actually see what it is as opposed to it being a particular value, and you print a bunch of stuff that aren’t invariants. It’s just the way that I do it. I’m not proposing it as a paradigm. It’s just what I’ve always done.

Seibel: When we talked about how you design software, you described a bottom-up process. Do you build those bottom-up pieces in isolation?

Thompson: Sometimes I do.

Seibel: And do you write test scaffolds for testing your low-level functions?

Thompson: Yeah, very often I do that. It really depends on the program I’m working on. If the program is a translator from A to B, I’ll have a whole bunch of As lying around and the corresponding Bs. And I’ll regress it by running in all the As and comparing it to all the Bs. A compiler or a translator or a regular-expression search. Something like that. But there are other kinds of programs that aren’t like that. I’ve never been into testing much, and those kinds of programs I’m kind of at a loss. I’ll throw in some checks, but very often they don’t last in the program or around the program because they’re too hard to maintain with the program. Mostly just regression tests.

Seibel: By things that are harder to test, you mean things like device drivers or networking protocols?

Thompson: Well, they’re run all the time when you’re actually running an operating system.

Seibel: So you figure you’ll shake the bugs out that way?

Thompson: Oh, absolutely. I mean, what’s better as a test of an operating system than people beating on it?

Seibel: Another phase of programming is optimization. Some people optimize things from the very beginning. Others like to write it one way and then worry about optimizing it later. What’s your approach?

Thompson: I’ll do it as simply as possible the first time and very often that suffices for all time. To build a very complex algorithm for something that’s never run is just stupid. It’s just a waste of time. It’s a bug generator. And it makes it impossible to maintain because you’ve got to have 50 pages of math to tell the next guy what you’re actually doing.

Ninety-nine percent of the time something simple and brute-force will work fine. If you really are building a tool that is used a lot and it has some sort of minor quadratic behavior sometimes you have to go in and beat on it. But typically not. The simpler the better.

Seibel: Some people just like bumming code down to a jewel-like perfection, for its own sake.

Thompson: Well, I do too, but part of that is sacrificing the algorithm for the code. I mean, typically a complex algorithm requires complex code. And I’d much rather have a simple algorithm and simple code than some big horror. And if there’s one thing that characterizes my code, it’s that it’s simple, choppy, and little. Nothing fancy. Anybody can read it.

Seibel: Are there still tasks which, for performance reasons, people still have to get down to hand-tuned assembly code?

Thompson: It’s rare. It’s extremely rare unless you can really get an order of magnitude and you can’t. If you can really work hard and get some little piece of a big program to run twice as fast, then you could have gotten the whole program to run twice as fast if you had just waited a year or two. If you’re writing a compiler—certainly 99 percent of the code you produce is going to be run once or twice. But some of it’s going to be in an operating system that’s run 24 hours a day. And some of it’s going to be in the inner, inner loop of that operating system. So maybe 0.1 percent of the optimization you put into a compiler here is going to have any effect on your users. But it can have profound effect, so there maybe you want to do it.

Seibel: But that would be a result of generating better code in the compiler rather than writing the compiler itself in assembly.

Thompson: Oh, yes, yes.

Seibel: And presumably part of the reason writing programs directly in assembly is less important these days is because compilers have gotten better.

Thompson: No. I think it’s mostly because the machines have gotten a lot better. Compilers stink. You look at the code coming out of GCC and it’s awful. It’s really not good. And it’s slow; oh, man. I mean, the compiler itself is over 20 passes. It’s just monstrously slow, but computers have gotten 1,000 times faster since GCC came out. So it may seem like it’s getting faster because it’s not getting slower as much as computers are getting faster underneath it.

Seibel: On a somewhat related note, what about garbage collection? With Java, GC has finally made it into the mainstream. As Dennis Ritchie once said, C is actively hostile to garbage collection. Is it good that folks are moving toward garbage-collected languages—is it a technology that deserves to finally be in mainstream use?

Thompson: I don’t know. I’m schizophrenic on the subject. If you’re writing an operating system or a C compiler or something that’s used by lots and lots of people, I think garbage collection is a mistake, almost. It’s a cheat for you where you can do it by hand and do it better—much better. What you’re doing is your sloughing your task, your job, making it slower for your users. So I think it’s a mistake in an operating system. It almost just doesn’t fit in an operating system. But if you are writing a hack program to do a job, get an answer and then throw the program away, it’s beautiful. It takes a layer of stuff you don’t want to think about, at a cost you can afford, because computers are so fast, and it’s nothing but a win-win-win position. So I’m really schizophrenic on this subject.

Part of the problem is there are different garbage-collection algorithms and they have different properties—massively different properties. So you’re writing some really general-purpose thing like an operating system—if you’re going to write it in a language that garbage-collects underneath, you don’t even have the choice of the algorithm for the operating systems. Suppose that you just can’t stand big real-time gaps and you have a garbage collector that runs up to some threshold and then does mark and sweep. You’re screwed before you start.

So if you’re doing some general-purpose task that you don’t know who your real users are, you just can’t do that. Plus, garbage collection fights cache coherency massively. And there’s no garbage-collection algorithm that is right for all machines. There are machines where you can speed it up by a factor of five or more by messing around with the cache. They should be tied to the machine much more than they are. Usually they treat them as separate algorithms that have nothing to do with machines, but the cache coherency is very important for garbage-collection algorithms.

Seibel: Do you think of yourself as a scientist, an engineer, an artist, a craftsman, or something else?