Ingalls: I believe that. There are people who operate completely differently. But for a given person I think that’s just how they have to operate. I know I’ve wasted some time one way or another. But there’s also this side of it, and it’s sort of the archetypal aspect of exploratory programming, which is if it gets you more quickly to an environment that you can learn from, you may find out that some of your original goals don’t even matter. What’s much more important is that other thing over there. And that becomes a whole new focus.
Coming back to this need to reflect and get things right, there have been a couple of times when I’ve done that. The example that comes to my mind is BitBlt. When I decided to do the thing that became BitBlt, it had this challenge to it that I had to sit and noodle for a night or two. Which is, how are you going to efficiently move all these bits on bit boundaries across word boundaries? That was a case where there weren’t any alternatives out in the world for me to work with. And so I thought about that and thought about that and came up with a simple model. It wasn’t somebody else’s spec but I had looked at all the places we were doing line drawing and text display and scrolling so I had a spec in my mind for what it needed to do.
Seibel: Maybe you can explain the basic problem that BitBlt was designed to solve.
Ingalls: The disconnect between wanting to think about the display as just a 1000x1000-pixel screen and the fact that the memory is organized word by word. If you want to pick up these four bits and put them down there, they may be in a different part of the word where you put them down. In fact they might straddle two words. If, on the screen, you’re trying to move this thing to there, it may be that you’re going to have to pick up pieces from two separate words here and lay them down there. And when you lay them down you have to store an entire word. So you’re going to have to insert that into what was there before and mask around it. So it’s a mess.
Then there’s the screen raster—the line-by-line aspect of the screen that gives you two dimensions. BitBlt handles the possibility that the source and destination may have differing numbers of words per scan line.
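For illustration only, here is a minimal C sketch of the naive approach, copying one bit at a time between arbitrary bit offsets; the function name and the 32-bit word size are assumptions, and this is not Ingalls's code. It shows why each pixel has to be shifted into position and then stored through a mask so the neighboring bits in the destination word survive:

    #include <stdint.h>

    /* Naive bit-at-a-time copy, illustrating the alignment problem only
       (not actual BitBlt code).  Bits are numbered from the most
       significant end of each 32-bit word, as on a left-to-right raster. */
    void copy_bits(const uint32_t *src, int srcBit,
                   uint32_t *dst, int dstBit, int count)
    {
        for (int i = 0; i < count; i++) {
            int sb = srcBit + i, db = dstBit + i;
            /* fetch one source bit; successive bits may straddle two words */
            uint32_t bit = (src[sb >> 5] >> (31 - (sb & 31))) & 1;
            /* store through a mask so the rest of the destination word
               is left untouched */
            uint32_t mask = 1u << (31 - (db & 31));
            if (bit) dst[db >> 5] |= mask;
            else     dst[db >> 5] &= ~mask;
        }
    }

Done a bit at a time like this it is correct but hopelessly slow, which is exactly the mess being described.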
That was a challenge where there was a clear spec for what needed to happen, and it was also one of these things where you tried to have a very general kernel, because if you do this right, it will not only let you move things from one part of the screen to another, but it will allow you to do overlapping scrolls. And it will also allow you to blend pixels. There’s all this opportunity for generalization.
I tested it and made sure it ran first in Smalltalk, then in assembly, then put it into microcode for the Alto. When finally done, we could do these operations at essentially the full speed of the memory without any delay due to all the yucky masking and shifting because that could all be hidden under the memory-cycle time.
Having microprogrammed computers around was a wonderful motivation because it was clear that if there was a little kernel that would do what was needed, then that could be put in microcode and the whole thing would run fast. So I was always motivated to do that.
The thing I came up with for all of this, it actually came to me as an image rather than anything else, which is, it’s like a wheel. If you think of the source and the destination and word boundaries, it’s like there’s a wheel picking up whole words here and then dropping them off here, and there would only be one shift required—that was the picture that came to me. Then it was just a matter of putting that into code.
So at the center of the BitBlt operation there’s essentially one long shifter, which is picking up words from the source and dropping them in the destination. That was the thing I had to sit down and think about. But once you had that you could do storing of constants this way, you could do the laying down of text, the extraction of glyphs from a font, and putting them down at any pixel location.
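As a rough sketch of that wheel picture, here is a hypothetical C version of one scan line's inner loop, assuming 32-bit words, big-endian pixel order, and a source row with one extra word readable past the end; edge masking, the overlapping (reverse-direction) case, and the other combination rules are all omitted, and this is of course not the Alto microcode:

    #include <stdint.h>

    /* Copy one scan line of nWords destination words.  The source pixels
       start `skew` bits further into their word than the destination
       pixels do (0 <= skew < 32).  Each step fetches one whole source
       word; a single shift merges it with the word fetched on the
       previous step, so the loop does one fetch, one shift, and one
       whole-word store per destination word. */
    void blt_line(const uint32_t *src, uint32_t *dst, int nWords, int skew)
    {
        uint32_t prev = src[0];                /* prime the "wheel" */
        for (int i = 0; i < nWords; i++) {
            uint32_t cur = src[i + 1];         /* pick up the next whole word */
            uint32_t word = (skew == 0) ? prev
                          : (prev << skew) | (cur >> (32 - skew));
            dst[i] = word;                     /* drop off a whole word */
            prev = cur;                        /* leftover bits ride around */
        }
    }

The real operation also has to mask the first and last partial words of each line and step the source and destination pointers by their own scan-line pitches, but the heart of it is that one long shift per word.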
Seibel: Back to the BASIC implementation of Smalltalk: that was sort of the primordial Smalltalk, before even Smalltalk-72?
Ingalls: Right. The minute that worked I set off and did this whole assembly-language version—because that’s what I had on the Nova—that was fairly complete. So we used that to debug a bunch of stuff and then, in parallel with that, the Alto was being built. As soon as it was available, we moved over and started running on the Alto. That became Smalltalk-72.
Seibel: So Smalltalk-72 was written in assembler—where along the line did it become self-hosting? You often hear that one of the great things about Smalltalk was that so much of it was implemented in itself.
Ingalls: That was a long time later. Smalltalk-72 had a big pile of assembly code with it. And Smalltalk-76 did, too. The big difference from Smalltalk-72 to Smalltalk-76 was that I came up with the byte-code engine for Smalltalk that had the keyword syntax and it was compilable. Also that classes and even stack frames were real objects, to your point about self-description.
Seibel: Where did you get the idea to write a byte-code interpreter?
Ingalls: That was a mechanism. The big thing I was grappling with was that Smalltalk-72 parsed on the fly, and, for at least two reasons, we needed to be able to compile something that had those kinds of semantics but didn’t require parsing everything on the fly.
So I came up with the Smalltalk-76 syntax, which is pretty much the Smalltalk-80 syntax. Then the question is, what do you compile that into that will run effectively this way? The only place where it got complicated was in doing what we called remote evaluation—variables you declare up here but they get evaluated down here. This is what ended up as blocks in Smalltalk, which are like closures in other systems.
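For a sense of what a byte-code engine's core looks like, here is a small hypothetical stack-machine dispatch loop in C. The opcodes, names, and encoding are invented for illustration and are nothing like the actual Smalltalk-76 instruction set, which encoded message sends, literal references, and block contexts far more compactly:

    #include <stdint.h>
    #include <stdio.h>

    /* Invented opcodes, purely for illustration (not the Smalltalk-76 set). */
    enum { PUSH_CONST, ADD, PRINT, HALT };

    /* A tiny stack machine: fetch a byte, dispatch on it, repeat. */
    static void interpret(const uint8_t *code)
    {
        int32_t stack[64];
        int sp = 0;      /* stack pointer */
        int pc = 0;      /* program counter into the byte codes */
        for (;;) {
            switch (code[pc++]) {
            case PUSH_CONST: stack[sp++] = code[pc++];           break;
            case ADD:        sp--; stack[sp - 1] += stack[sp];   break;
            case PRINT:      printf("%d\n", stack[--sp]);        break;
            case HALT:       return;
            }
        }
    }

    int main(void)
    {
        /* "3 + 4, print the result" encoded as seven byte codes */
        const uint8_t code[] = { PUSH_CONST, 3, PUSH_CONST, 4, ADD, PRINT, HALT };
        interpret(code);
        return 0;
    }

A compiler for the keyword syntax emits a stream like this once, and the interpreter (or microcode) just runs down it, rather than re-parsing the source every time it executes.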
Seibel: Why not just compile to machine code?
Ingalls: We were still very space-conscious and this stuff wound up incredibly compact compared to anything else around. And it needed to be that compact because we were still trying to run this on Altos that had 96K. Then they came out with the big one, which was 128K. The compactness was important.
Seibel: Meaning the generated code would be smaller because the byte codes were richer than native machine instructions?
Ingalls: Yeah. I also just plain loved the idea and was inspired by Peter Deutsch’s work on the byte-code engine for Lisp. I was further inspired by this synergy—it’s another one of these kernels that could fit in microcode. From the beginning I envisioned it as going into the microcode of the Alto.
Seibel: And the microcode was RAM so you could put the Smalltalk kernel in there and then switch to Lisp and put the Lisp byte-code interpreter in there.
Ingalls: Yup.
Seibel: Then what was the next evolution?
Ingalls: Smalltalk-76 inherited all the same sort of graphics baggage—a lot of special code for line drawing, text display, and so on. But by that time I had done BitBlt, so I rewrote the kernel so that all the graphics just used BitBlt and Smalltalk, which made the kernel much smaller. That was Smalltalk-78, which was the first one we ran on a microprocessor—on an 8086.
But that still wasn’t Smalltalk in Smalltalk. The Smalltalk in Smalltalk wasn’t until Squeak. Smalltalk-80 had a virtual machine spec that was published in the book, but all the implementations were in C or assembly code.
Seibel: What about the compiler?
Ingalls: The compiler was written in Smalltalk. Actually, when we were doing the Smalltalk-80 books, Dave Robson and I—it was mainly his work—wrote a Smalltalk emulation of the byte-code interpreter. As part of the Smalltalk-80 release we wanted to help people build their own virtual machines. We had discovered that one of the most useful aids was a trace of exactly what byte codes get executed in what order when you first start up the system.