Изменить стиль страницы

I should mention two things. First, I did science fairs when I was at Latin School and I made a point of doing projects about computer science. One of the judges one year said, “Have you considered becoming a student member of ACM?” I don’t know his name. But I have been very thankful ever since. That was a good thing for me.

And when I got to Harvard, if I had a spare hour to kill in the morning I would go over to Lamont Library and I would do one of two things: I would read my way backwards through Scientific American and I would read my way forward, from the beginning, in Communications of the ACM. So I was, in particular, trying to pick up all of Martin Gardner’s columns on mathematical games. And I just read whatever interested me out of CACM. In 1972 there was only about 15 years of that journal, so it didn’t take that long to plow through them all.

Seibel: It also must have been easier then than it would today in the sense that the same way you could understand whole systems, one person could hope to understand the whole field.

Steele: Yeah, you could hope to understand the whole field. There were lots of one-page articles. You know: “Here’s a clever new hashing technique.” I read a lot.

Seibel: I often find older papers are hard to get into since they’re tied to the particulars of old hardware or languages.

Steele: Well, necessity is the mother of invention—an idea arises because it’s needed in a particular context. Then a little later it’s recognized that that idea is the important thing. And then you need to strip away the context so the idea can be seen and that takes some years. “Here’s a clever technique for reversing the bits of a word,” and they give something in 7090 assembly language. And there’s an interesting mathematical idea there but they haven’t quite abstracted yet.

Seibel: I guess that’s Knuth’s job, right?

Steele: Knuth and people like him, absolutely.

Seibel: Presumably people who study computer science in school get guided through all that stuff. But there are also a lot of programmers who came into it without formal training, learning on the job. Do you have any advice for how to tackle that problem? Where do you start and how do you get to the point where you can actually read these technical papers and understand them? Should you start at the beginning of the ACM and try to get up to the present?

Steele: Well, first of all, let me say that that exercise of reading through CACM from early on wasn’t my plan to become a great computer scientist by reading everything there was in the literature. I read it because I was interested in stuff and felt internally motivated to tackle that particular set of material. So I guess there are two things: one is having the internal motivation to want to read this stuff because you’re interested or because you think it will improve your skills.

Then there is the problem of how do you find the good stuff? And of course the view of what is the good stuff changes from decade to decade. Stuff that was considered the really good stuff this year may be kind of dated in ten years. But I guess you go to a mentor who’s been through it and say, what do you think was the good stuff? For me the good stuff was Knuth; Aho, Hopcroft, and Ullman. Gerald Weinberg on The Psychology of Computer Programming, which I think is still very readable today. Fred Brooks’s Mythical Man-Month gave me some insights.

In those days I haunted the computer-science book section of the MIT bookstore and just made a point of going through there once a month and browsing through the bookshelves. Of course now you walk into a bookstore and there’s a computer section that’s ten times as big, but most of it is about how to do C or Java this year. But there will be a smaller section of books about the theoretical background, algorithms, that kind of thing.

Seibel: There’s another kind of reading, which I know you think is important—reading code. How do you find your way into a big pile of code you didn’t write?

Steele: If it’s a piece of software that I know how to use and just don’t know how the insides work, I will often pick a particular command or interaction and trace it through.

Seibel: The execution path?

Steele: Yes. So if I were walking up to Emacs, I’d say, “OK, let’s take a look at the code that does ‘forward a character’.” And I won’t completely understand it but at least it’ll introduce me to some data structures it uses and how the buffer is represented. And if I’m lucky I can find a place where it adds one. And then once I’ve understood that, then I’ll try “backwards a character.” “Kill a line.” And work my way up through more and more complicated uses and interactions until I feel that I’ve traced my way through some of the more important parts of the code.

Seibel: And would “tracing” mean looking at the text of the source code and mentally executing it, or would you fire it up in a debugger and step through it?

Steele: I’ve done it both ways—I’ve done it with a stepping debugger mostly on smaller codes back in the ’70s or ’80s. The problem nowadays is from the time a program first fires up until it begins to do anything interesting can already be a long initialization process. So perhaps one is better off trying to find the main command loop or the central control routine and then tracing from there.

Seibel: And once you find that, would you set a break point and then step from there or just do it by mental execution?

Steele: I’d be inclined to do it by desk-checking—by actually reading the code and thinking about what it does. If I really need to understand the whole code then at some point I might sit down and try to read my way all the way through it. But you can’t do that at first until you’ve got some kind of framework in your head about how the thing is organized. Now, if you’re lucky, the programmer actually left some documentation behind or named things well or left things in the right order in the file so you actually can sort of read it through.

Seibel: So what is the right order in the file?

Steele: That’s a very good question. It strikes me that one of the problems of a programming language like Pascal was that because it was designed for a one-pass compiler, the order of the routines in the file tended to be bottom-up because you had to define routines before you use them. As a result, the best way to read a Pascal program was actually backwards because that would give you the top-down view of the program. Now that things are more free-form, you really can’t count on anything other than the programmer’s good taste in trying to lay things out in a way that might be helpful to you. On the third hand, now that we’ve got good IDEs that can help you with cross-referencing, maybe the linear order of the program doesn’t matter so much.

On the fourth hand, one reason I don’t like IDEs quite so much is that they can make it hard to know when you’ve actually seen everything. Walking around in a graph, it’s hard to know you’ve touched all the parts. Whereas if you’ve got some linear order, it’s guaranteed to take you through everything.

Seibel: So when you write code these days would you present more of a top-down organization with the high-level functions before the lower-level functions on which they depend?

Steele: I’d try to present the high-level ideas. The best way to present that might be to show a central command-and-control routine with the things it dispatches to beneath it. Or, it might be that the important thing is to show the data structures first, or the more important data structures. The point is to present the ideas in an order such that they tell a story rather than just being a pile of code thrown together.