Seibel: So do you ever write a program that you know is correct but somehow falls outside the bounds of the type checker?
Peyton Jones: This comes up when you’re doing generic programming, where you want to write functions that will take data of any type and just walk over it and serialize it, say. So that’s a time when types can be a bit awkward and an untyped language is particularly straightforward. It couldn’t be easier to write a serializer in an untyped language.
Now there’s a small cottage industry of people describing clever typed ways of writing generic programs. I think such things are fascinating. But it’s somehow just isn’t as simple as writing it in a dynamically typed language. I’m trying to persuade John Hughes to write a paper for the Journal of Functional Programming on why static typing is bad. Because I think it would be very interesting to have a paper from John, who’s a mainstream, strongly typed, very sophisticated functional programmer, who is now doing a lot of work in untyped Erlang, saying why static types are bad. I think he would write a very reflective and interesting paper. I don’t know quite where we’ll end up.
I think I would still say, “Where static typing fits, do it every time because it has just fantastic maintenance benefits.” It helps you think about your program; it helps you write it, all that kind of stuff. But the fact that we keep generating more and more sophisticated type systems is an indication that we’re trying to keep pushing the boundary out to say more in the world—to cover more programs. So the story hasn’t finished yet.
The dependently typed programming people would say, “Ultimately the type system should be able to express absolutely anything.” But types are funny things—types are like a very compact specification language. They say something about the function but not so much that you can’t fit it in your head at one time. So an important thing about a type is it’s kind of crisp. If it goes on for two pages, then it stops conveying all the information it should.
I think the direction I’d like to see things go is to have still crisp and compact types which are a little bit weak, precisely so that they can be crisp, along with invariants, perhaps stated in a rather richer language than the inferable type system, but which are still amenable to static checking. Something I’m working on in another project is to try to do static verification for pre- and post-conditions and data-type invariants.
Seibel: Similar to Design by Contract in Eiffel?
Peyton Jones: That’s right. You’d like to be able to write a contract for a function like, “You give me arguments that are bigger than zero and I’ll give you a result that is smaller than zero.”
Seibel: How do you go about designing software?
Peyton Jones: I suppose I would say that usually the dominant problem when I’m thinking about writing a program—thinking about writing some new piece of GHC—is not how to get the idea into code. But it’s rather, what is the idea?
To take an example, at the moment we’re in mid-flight for moving GHC’s back end, the code generation part, to refactor it in a new way. At the moment there’s a step in the compiler that takes essentially a functional language and translates it into C--, which is an imperative language. And that’s a pretty big step. It’s called C-- because it’s like a subset of C. But it’s really meant to be a portable assembly language. And it’s not printed out in ASCII—it’s just an internal data type. So this step in the compiler is a function from a data structure representing a functional program to a data structure representing an imperative program. How do you make that step?
Well, I have a pretty complicated bit of code that does that at the moment. But a couple of days ago I realized that it could be separated into two parts: first transform it into a dialect of C--, which allows procedure calls—inside a procedure, you can call a procedure. Then translate that into a sub-language that has no calls—only has tail calls.
Then the name of the game is figuring out, just what is the data type? This C-- stuff, what is it? It’s a data structure representing an imperative program. And as you make the second step, you walk over the program, looking at each bit, one at a time. So your focus of attention moves down the control flow, or perhaps back up through the control flow. A good data structure for representing that is called a “zipper”—which is a very useful purely functional data structure for moving the focus around a purely functional data structure.
Norman Ramsey at Harvard found a way to use this for walking around data structures that represent imperative control flow graphs. So he and I and John Dias then spent a while reengineering GHC’s back end to adopt essentially this factored technology. And in doing so making it much more general so we can now use this same back end as a back end for other languages.
A lot of our discussion was essentially at the type level. Norman would say, “Here’s the API,”—by giving a type signature—and I would say, “That looks way complicated, why is it like that?” And he’d explain why and I’d say, “Couldn’t it be simpler this way.” So we spent a lot of time to’ing and fro’ing at the level of describing types.
But a lot of the time it wasn’t really about programming as such—it was about, what is the idea? What are we trying to do with this dataflow analysis, anyway? You try to say in a clear way what this step of the program is meant to do. So we spent quite a lot of time just being clear on what the inputs and the outputs are and working on the data types of the inputs and the outputs. Just getting the data types right, already you’ve said quite a lot about what your program does. A surprisingly large amount, in fact.
Seibel: How does thinking about the types relate to actually sitting down and coding? Once you sketch out the types can you sit down and write the code? Or does the act of writing the code feed back into your understanding of the types?
Peyton Jones: Oh, more the latter, yes. I’ll start writing type signatures into a file right away. Actually I’ll probably start writing some code that manipulates values of those types. Then I’ll go back and change the data types. It’s not a two-stage process where we say, “Now I’ve done the types, I can write the code.”
If anything I’m a bit ill-disciplined about this. This comes from not working as part of a large team. You can do things when you’re working on code that one person can still get their head around that you probably couldn’t in a much bigger team.
Seibel: You mentioned that in this latest code upheaval in GHC, things got much more general. GHC is a big program that’s evolved over time so you’ve had the chance to benefit from generality and the chance to pay the cost of over-generality. Have you learned anything about how to strike the balance between over- and under- generalization?
Peyton Jones: I think my default is not to write something very general to begin with. So I try to make my programs as beautiful as I can but not necessarily as general as I can. There’s a difference. I try to write code that will do the task at hand in a way that’s as clear and perspicuous as I can make it. Only when I’ve found myself writing essentially the same code more than once, then I’ll think, “Oh, let’s just do it once, passing some extra arguments to parameterize it over the bits that are different between the two.”
Seibel: What is your actual programming environment? What tools do you use?
Peyton Jones: Oh, terribly primitive. I just sit there with Emacs and compile with GHC. That’s just about it. There are profiling tools that come with our compiler and people often use those for profiling Haskell programs. And we do that for profiling the compiler itself. GHC dumps a lot of intermediate output so I can see what’s going on.