Friday, August 8, 2008

A SpaceX debrief (or, Hubris vs. Conviction)

Leadership is a good thing. And good leadership requires conviction. But conviction must be moderated by humility and thought.

This is all about rockets. Exploding rockets, to be precise. Rockets that collide with themselves and then proceed to fall apart, to be progressively more precise. Rockets that do so because of design errors that people on Internet chat sites are able to spot before the engineers who designed the rocket are. Granted, these are chat sites populated by rocket engineering professionals, but still!

The reactions from SpaceX and charismatic founder Elon Musk were predictably glib. The problem was identified, verified, and easily corrected. If they had a rocket ready to go, they'd shoot it after tweaking a bit of software. Reliability has always been priority one, they have a better way of doing things, and everything will work out just fine. I like the idea of SpaceX. I like the idea of a dot-com punk showing up the established players. I like the idea of cheap, reliable access to space. I like that there have been a few glimmers of admission that this whole rockets-into-space business is harder than they expected.

But I don't like that every time we hear a new release, it is a reset back to the same old eager confidence, the Russell Crowe dialog (apparently South Africans and Australians swear with the same alacrity), the cheerful admission that wow, we learned some lessons, but now it is just an easy fix and everything will go great. It is always an easy fix -- it has been an easy fix for three unsuccessful launches and a host of less dramatic problems and failures.

To assert that a series of avoidable design errors is nothing more than a series of design errors -- bad luck or inexperience in effect -- is to ignore the fact that these errors could (and many would say should) have been caught at the outset. SpaceX is reinventing the wheel. Which means that there are a large number of photographs and public domain documents about wheel invention and construction out there. There are wheel engineers that you can hire away from folks who have been building wheels "the old way" for years.

But Elon chose to go it his own way, convinced that he had a better approach and that the paradigm could be changed. PayPal had, after all, been part of that great paradigm-changing revolution: eBay, Amazon, latecomers like iTunes and Wikipedia. I was there too, and I remember the culture (I'm banking that PayPal wasn't too divergent from Amazon). It took a certain degree of brass-balled self-confidence to do what (1) no one had ever done before (2) many were telling you couldn't be done (3) everyone else was trying to do before you could (4) was going to cost more money than you had (5) had the potential to be second-guessed by everyone. Bezos had the balls and the business plan (and, I might add, the personality to serve as our Charismatic Leader through a lot of tough and doubtful times -- an experience that played into more than a little of my president-as-personality theory as talked about in The Obama Post). Elon did for two successful businesses.

But physics is a harsh mistress. It won't reload a buggy web page, it can't be bought off with a refund and gift certificate, it doesn't stick with you just because you are better than the alternatives. If physics says you are wrong, your couplings corrode, your tanks buckle, your motors burn under thrust, your helium tanks underfill (more about that later), your GN&C algorithms don't converge, your stages collide, and your rockets fail.

Back in the day, when we launched, we pushed a key on a keyboard (footnote and company history moment: often this was actually done by the paw of an adorable pet Corgi) that flipped some symlinks and pow, the new feature/product/store was launched. Later, when it turned out that every single customer was told that John Grisham's The Street Lawyer was the absolutely perfect match for their buying habits, we realized what the testing flaw was, tweaked a few constants, and rolled out an update.

It was a software problem and we fixed it in software. We thought, almost every time, that the software was as perfect as we could make it. Whether we admitted it or not, there were going to be bugs when the launch occurred. There always were. But we knew, at a visceral level, that we could fix them through a quick rollback (more than once), raw human effort (most of the time), or a quick fix (pretty damn often). We always thought that we'd done enough testing. But we knew that it wouldn't be enough.

You can't upload software to a rocket in flight and correct the mixture ratio of your main engine. Not in the 2:55-long first stage burn. You have to get it right, straight away. But the software mindset knows that you can debug. You have to debug. You code and test and code and test and launch and code and test and code and relaunch. The rocket mindset codes and tests and codes and tests and codes and tests...again and again and again and again. True mission-critical software design involves parallel development teams (working in isolation to prevent communication and the formation of similar assumptions). True mission-critical design involves multiple layers of check and recheck. Recheck checkers check the recheckers.

It is against the whole concept of a lot of "new business" models. It smacks far too much of the old Detroit assembly line where mechanic one didn't bother doing his job well because he knew that mechanic two, three, or four would catch and fix his laziness and that if they didn't, checker one, two, or three would. And I do have complete contempt for this process -- when it has led to that kind of institutional diffusion of responsibility that means that neither mechanics 1, 2, 3, and 4 nor checkers 1, 2, and 3 are doing their jobs.

But when mechanics 1 and 2 cross check each other, shoot a digital snap of their work, and then submit it to checker 1 for sign off, that can be a healthy process. The mere presence of process does not mean unhealthy (overly rigorous, tedious, and stifling) process.

SpaceX is stuck with this "software mindset" of both assuming easy fixability and relying on sloppy planning and rushed decision making. Now all you coders out there, don't go thinking that I believe all software is tossed together willy-nilly with poor planning and launched with an expectation of fixing it in a service pack. No, only Microsoft software is. ;)

In reality, software projects can be well planned, well run, and produce excellent results with little trial-and-error debugging and tight schedules. It takes, however, very well thought out and carefully followed methodologies to ensure this. And it takes a team, from the top down, that buys in to the methodology in use.

Most of the aggressive software design methodologies (or system design, if you care to generalize) are designed to actually restrain the pace of development. Take your pick: scrum (my favorite), agile, RAD, TDD. All are essentially about providing a methodology that ensures or promotes a controlled, managed, organized cycle of communication, test, and development. The very idea is to prevent cowboy coding, to ensure that specifications are rigorously followed, communication channels are clear, and checkpoints observed.

I fear that enthusiasm, "there is a better way" self-conviction, and frankly arrogance (at least on Elon's part) got in the way of this process. Elon enjoys insisting that problems have been design problems and not cultural (and therefore systemic) problems. But he forgets (or hopes we won't realize) that designs are not born in isolation. They are born of a collection of people operating within a given culture. Now I don't want to sound all postmodernist here, but it is possible to examine the product of a design effort and gain an understanding of the culture that produced it.

Now I don't want to say that aggressive and independently minded design schools are, by nature, out of the question in the aerospace arena. Quite the opposite. Just look at Lockheed's Skunk Works, easily the most storied aeronautical design organization of all time. They were a small group, working in many cases on cutting-edge processes, utilizing streamlined design, management, and accounting protocols. This does not mean they were sloppy. This does not mean they were careless. This does not mean they did not verify their work. On the contrary, the awareness of their responsibility (and knowledge that a company test pilot would be flying the thing) permeated the mind of every engineer. Speed, security, and economy are not at odds with rigorous procedure and quality design.

Today this attitude seems to be harder to locate. Some work in the happy playpen of a Google environment, toiling at aggressive projects for impossible hours but rewarded with preposterous perks including absurd amounts of personal and professional flexibility and freedom. Others toil in the rigor of Traditional Business, at a Boeing, reporting as scheduled, performing their duties as ordered, creating when called upon, and then freed to commute home with everyone else slagged in at the Boeing Access Road. But the two are entirely compatible. An elite group (like Elon thinks he has) working in isolation could do it but must be willing to:

(1) Embrace total, personal responsibility for individual and group actions
(2) Communicate and document regularly and clearly
(3) Establish minimum standards for documentation and then follow them rigorously
(4) Establish minimum standards for quality assurance and then follow them rigorously
(5) Admit areas of ignorance and weakness, calling for and accepting support when necessary
(6) Develop and adhere to procedures that ensure quality and safety of all mission-critical steps of conceptualization, planning, design, test, and implementation
(7) Adopt in both hearts and minds philosophies and procedures that will ensure all of the above

A very telling moment that I know well is the pre-launch instructions that show up just before the release from final hold at, I believe, T-6:00 during an Atlas V countdown. The Atlas V is one of the most well engineered and well processed of "old school" boosters. Their countdowns are always flawless and their launches nearly always so (they had a one-time hiccup with a bad valve in an engine that resulted in off-nominal orbit injection). Even though everyone has done it probably a hundred times in rehearsal, the launch director runs through a litany of instructions, drills to cover launch, abort, recycle, and communications during the final seconds. They know it, but it is repeated to remind and to ritualize.

I believe that Elon has created a company that fosters communication and innovation. But I believe that communication is likely too informal and the innovation too total. Unwilling to learn from the errors of others, possessed with a "software mindset" that accepts risks and tolerates sloppiness, and altogether too confident of their status as revolutionaries, I fear they will keep making avoidable errors, responding with glib solutions, and charging ahead with shotgun improvements.

This is somewhat tangential, but it is a little story that shows how subtle things can really affect the thinking within an organization. I currently work at a location that my employer dubs the "Field Service Center." Now when I started, I had a hard time understanding this name. "Field Service Center" (or FSC as we always call it) sounds like the sort of place damaged products should be returned for repair, not a corporate headquarters. But this is a five-building office campus. It is the corporate headquarters. The CEO works here -- four floors above me, in fact.

At first I wondered if this was some sort of legacy name. Perhaps these buildings once belonged to a repair station, exactly as the name had originally implied to me. But then I heard the official story. We call ourselves the FSC because it is our job to service (i.e. support) all of the company's field staff: sales, customer service, engineering, etc. It is a small but important issue of attitude. The "front lines" are the folks directly in contact with the customer and are, therefore, the ones directly and immediately affecting the customer experience (and therefore oh-so many of the all-important financials). And so all of us at the FSC (from Mr. 9th Floor CEO down to a humble Analyst Three like myself) are there to back these people up. To supply them with information, policies, products, tools, and services that enable them to create the best possible experience for our customers. We are not, in other words, REMFs (or Fobbits, to use the epithet born of a more current war).

This is the kind of subtle mindset change that could help SpaceX, if Elon were willing to accept the situation and to start to make some changes. I'd love to be a fly on the wall for some of their internal meetings, to listen to the gestalt, the debate, and the planning. But since they are run by a secretive, arrogant, self-aggrandizing CEO, that is unlikely to ever happen. I will say this -- he's building and flying rockets. I wish I was building and flying rockets -- and so do a lot of other people who have never made it a fraction of the way there that Elon has. I'll also say that he's the general -- and some of that brash personality may be a construct, a Patton Speech to the troops to keep them going in the face of a third-strike-we're-out moment. But from what I see (and what I've heard) it is not just a show for the troops. Elon is that kind of personality that truly believes in the rightness of his or her actions. When that kind of personality gets it right, they are a rogue, a maverick, and a genius. But when they get it wrong, as SpaceX appears to have done so far, they look like a fool -- or worse.

In the meantime, despite my criticism, I wish them the best. I wish them the time to grow up and the success to keep flying. I wish them the maturity to admit their failures and the strength to make the necessary changes. I'd like nothing more than for Elon to take the punches, learn the lessons, make the changes, and emerge as a rogue, a maverick, and a genius.
