Изменить стиль страницы

You need to know your language well. Read the reference. Read the books that tell you both the mechanics of language and the whole enterprise of debugging and testing: Code Complete or some equivalent of that. But I think there are a lot of different paths. I don’t want to say you have to read one set of books.

Seibel: Though your job now doesn’t entail a lot of programming you still write programs for the essays on your web site. When you’re writing these little programs, how do you approach it?

Norvig: I think one of the most important things is being able to keep everything in your head at once. If you can do that you have a much better chance of being successful. That makes a small program easier. For a bigger program, you need extra tools to be able to handle that.

It’s also important to know what you’re doing. When I wrote my Sudoku solver, some bloggers commented on that. They said, “Look at the contrast—here’s Norvig’s Sudoku thing and then there’s this other guy, whose name I’ve forgotten, one of these test-driven design gurus. He starts off and he says, “Well, I’m going to do Sudoku and I’m going to have this class and first thing I’m going to do is write a bunch of tests.” But then he never got anywhere. He had five different blog posts and in each one he wrote a little bit more and wrote lots of tests but he never got anything working because he didn’t know how to solve the problem.

I actually knew—from AI—that, well, there’s this field of constraint propagation—I know how that works. There’s this field of recursive search—I know how that works. And I could see, right from the start, you put these two together, and you could solve this Sudoku thing. He didn’t know that so he was sort of blundering in the dark even though all his code “worked” because he had all these test cases.

Then bloggers were arguing back and forth about what this means. I don’t think it means much of anything—I think test-driven design is great. I do that a lot more than I used to do. But you can test all you want and if you don’t know how to approach the problem, you’re not going to get a solution.

Seibel: So then the question is, how should he have known that? Should he have gone and gotten a PhD and specialized in artificial intelligence? You can’t know every algorithm. These days you have Google, but finding the right approach to a problem is a little different than finding a web framework.

Norvig: How do you know what you don’t know?

Seibel: Exactly.

Norvig: So I guess it’s two parts. One is to recognize that maybe there is a known solution to this. You could say, “Well, nobody could possibly know how to do this, so just exploring randomly is as good as everything else.” That’s one possibility. The other possibility is, “Well, probably somebody does know how to do this. I just don’t know what the words are for it, so I have to discover those.” I guess that’s partly just intuition and saying, “It seems like the kind of thing that should be in the body of knowledge from AI.” And then you have to figure out, how do I find it? And probably he could’ve done a search on Sudoku and found it that way. Maybe he thought that was cheating. I don’t know.

Seibel: So let’s say that is cheating—say you were the first person ever to try and solve Sudoku. The techniques that you ended up using would still have been out there waiting to be applied.

Norvig: Let’s say I wanted to solve some problem in biology. I wouldn’t know what the best algorithms were for doing gene sequencing or whatever. But I’d have a pretty good idea that there were such algorithms. Then I could start looking around. At another level, some of these things are pretty fundamental—if you don’t know what dynamic programming is, then you’re at a severe disadvantage. It’s going to come up time and time again. If you don’t know this idea of search in general—that you can make a choice and backtrack when you don’t need it. These are all ideas from the ’60s. It was only a few years into programming that people discovered these things. It seems like that’s the type of thing that everyone should know. Some things that were discovered last year, not everybody should know.

Seibel: So should programmers go back and read all the old papers?

Norvig: No, because there are lots of false starts and lots of mergers where two different fields develop completely different technology and terminology, and then they discover they were really doing the same thing. I think you’d rather have a story from the modern point of view rather than have to follow all the steps. But you should have them. I don’t know what the best books are for that since I picked it up the hard way, piecemeal.

Seibel: So back to designing software. What about when you’re working on bigger programs, where you’re not going to be able to just remember how all the code fits together? Then how do you design it?

Norvig: I think you want to have good documentation at the level of overall system design. What’s the thing supposed to do and how’s it going to do it? Documentation for every method is usually more tedious than it needs to be. Most of the time it just duplicates what you could read from the name of the function and the parameters. But the overall design of what’s going to do what, that’s really important to lay out first. It’s got to be something that everybody understands and it’s also got to be the right choice. One of the most important things for having a successful project is having people that have enough experience that they build the right thing. And barring that, if it’s something that you haven’t built before, that you don’t know how to do, then the next best thing you can do is to be flexible enough that if you build the wrong thing you can adjust.

Seibel: How much do you think you can sit down and figure out how something ought to work, assuming it’s not something that you’ve built before? Do you need to start writing code in order to really understand what the problem is?

Norvig: One way to think about it is going backwards. You want to get to an end state where you have something that’s good and for some problems there’s roughly one thing that’s good. For other problems there are roughly millions and you can go in lots of different directions and they’d all be roughly the same. So I think it’s different depending on which type of those types of problems you have.

Then you want to think about what are the difficult choices vs. what are the easy ones. What’s going to come back to really screw you if you make the wrong architectural choice—if you hit built-in limitations or if you’re just building the wrong thing. At Google I think we run up against all these types of problems. There’s constantly a scaling problem. If you look at where we are today and say, we’ll build something that can handle ten times more than that, in a couple years you’ll have exceeded that and you have to throw it out and start all over again. But you want to at least make the right choice for the operating conditions that you’ve chosen—you’ll work for a billion up to ten billion web pages or something. So what does that mean in terms of how you distribute it over multiple machines? What kind of traffic are you going to have going back and forth? You have to have a convincing story at that level. Some of that you can do with calculations on the back of the envelope, some of that you can do with simulations, and some of that you have to predict the future.

Seibel: It seems for that kind of question you’ll be far more likely to answer correctly with either back-of-the-envelope calculations or simulation than writing code.

Norvig: Yeah, I think that’s right. Those are the kind of things where the calculations are probably a better approach. And then there are these issues of some vendor says they’re going to have a switch coming out next year that will handle ten times as much traffic; do you design to that? Do you believe them? Or do you design to what you have today? There are a lot of trade-offs there.