Изменить стиль страницы

It came back and it was about as close to a design document as we got except that it had homonyms that you wouldn’t believe. So I went off and implemented this file system, strictly on a PDP-7. At some point I decided that I had to test it. So I wrote some load-generating stuff. But I was having trouble writing programs to drive the file system. You want something interactive.

Seibel: And you just wanted to play around with writing a file system? At that point you weren’t planning to write an OS?

Thompson: No, it was just a file system.

Seibel: So you basically wrote an OS so you’d have a better environment to test your file system.

Thompson: Yes. Halfway through there that I realized it was a real timesharing system. I was writing the shell to drive the file system. And then I was writing a couple other programs that drove the file system. And right about there I said, “All I need is an editor and I’ve got an operating system.”

Seibel: What’s the worst bug you’ve ever had to track down?

Thompson: Basically bugs where memory gets corrupted. It never happens anymore. I don’t know why. But in the early days we were always working with experimental hardware of various sorts, and there’d be some hardware bug.

Seibel: So memory would get corrupted by the hardware screwing up, not by a runaway pointer?

Thompson: It could be pointer. It could be hardware. Or a combination. The one I’m actually thinking of, the worst example, was the on PDP-11. It didn’t have multiply but you could buy a multiply unit and plug it in, but it was an I/O peripheral. You would store a numerator and a denominator and say go. You’d busy-loop and then pull out the answer, the quotient and the remainder. And this thing was built for a non-memory-managed PDP-11 and we got the first experimental hardware for a memory-managed PDP-11 and this multiply unit didn’t fit with the memory management well.

So you’d store into this thing and then you’d busy-test. And during certain aspects of the busy test it would send a physical address down instead of a virtual address and some piece of memory would get clobbered with a numerator of what you were trying to divide by. And it’d be ages before you’d find it, and it’d be different places. That’s by far the hardest one I’d ever had to find.

Seibel: How did you track it down?

Thompson: There was a program that I wrote that was going after a world record for the number of digits of e. Previous world records were limited not by computation—by cycles per second—but by I/O. I came up with a new algorithm where it was computation-bound and I/O became trivial. It was monstrously heavy on multiply and divide. And we noticed that the machine just crumbled whenever I put on my program. And therefore we got the connection.

Seibel: So that gave you the clue that there was a problem with the multiplier; did you ultimately track it down to some root cause?

Thompson: At some point we got it to where you store in the multiplier in the multiply unit, and you pull it back and it wasn’t there. We reported that to DEC and DEC couldn’t find it, and they didn’t want to deal with it. Their normal people didn’t want to deal with this hybrid machine. In those days you actually got the circuit diagrams of the machines, and we actually found the bug in the circuit diagrams. Then we just called DEC and said, “Connect that wire and that wire.”

Seibel: So, thankfully, hardware mostly doesn’t flake out on us that way these days.

Thompson: Yeah. That’s why I think they’re rare. Plus things are isolated from each other more—if you go bizarrely wild you’ll get a fault. Also you did it in assembly language—it’s really easy to have the wrong thing in some register through some subroutine call. When you have a high-level language where the arguments all have to match up, these things become more and more rare.

In the early days, in assembly language, you’d find them a lot. If it was software, as opposed to a combination of software/hardware, usually it would happen in one spot—the same spot would be corrupted. There’d be some correlation of the bug with something. And you could sit there and put a monitor in the operating system. And every so often, or very often, you’d check and see if the error occurred, and stop as quick as you can, and see what’s going on elsewhere, and chase them down that way. So you could attack them.

This one you couldn’t attack. It wasn’t until I wrote this intensive multiply/divide program that it saw the frequency of the error went way, way up. Instead of crashing once every couple of days you’d crash once every couple of minutes. And then as soon as you got something that would crash the machine you had a fighting chance to find it.

Seibel: So some folks today would say, “Well, certainly assembly has all these opportunities to really corrupt memory through software bugs, but C is also more prone to that than some other languages.” You can get pointers off into la-la land and you can walk past the ends of arrays. You don’t find that at all problematic?

Thompson: No, you get around that with idioms in the language. Some people write fragile code and some people write very structurally sound code, and this is a condition of people. I think in almost any language you can write fragile code. My definition of fragile code is, suppose you want to add a feature—good code, there’s one place where you add that feature and it fits; fragile code, you’ve got to touch ten places.

Seibel: So when there’s a security breach that turns out to be due to a buffer overflow, what do you say to the criticism that C and C++ are partly responsible—that if people would use a language that checked array bounds or had garbage collection, they’d avoid a lot of these kinds of problems?

Thompson: Bugs are bugs. You write code with bugs because you do. If it’s a safe language in the sense of run-time-safe, the operating system crashes instead of doing a buffer overflow in a way that’s exploitable. The ping of death was the IP stack in the operating system. It seems to me that there’d be more pings of death. There wouldn’t be pings of “take over the machine becoming superuser.” There’d be pings of death.

Seibel: But there is a difference between a denial-of-service attack and an exploit where you get root and can then do whatever you want with the box.

Thompson: But there are two ways to get root—one is to overflow a buffer and the other is to talk the program into doing something it shouldn’t do. And most of them are the latter, not overflowing a buffer. You can become root without overflowing any buffers. So your argument’s just not on. All you’ve got to do is talk su into giving you a shell—the paths are all there without any run-time errors.

Seibel: OK. Leaving aside whether it results in a crash or an exploit or whatever else—there is a class of bugs that happen in C, and C++ for the same reason, that wouldn’t happen in, say, Java. So for certain kinds of applications, is the advantage that you get from allowing that class of bugs really worth the pain that it causes?

Thompson: I think that class is actually a minority of the problems. Certainly every time I’ve written one of these non-compare subroutine calls, strcpy and stuff like that, I know that I’m writing a bug. And I somehow take the economic decision of whether the bug is worth the extra arguments. Usually now I routinely write it out. But there’s a semantic problem that if you truncate a string and you use the truncated string are you getting into another problem. The bug is still there—it just hasn’t overflown the buffer.

Seibel: When you’re debugging, what tools do you use?

Thompson: Mostly I just print values. When I’m developing a program I do a tremendous amount of printing. And by the time I take out, or comment out, the prints it really is pretty solid. I rarely have to go back.