The Carpenter and the Engineer

This is the script of a lightning talk I developed for the Boston Software Craftsmanship group. It sparked some great discussion around craftsmanship, good programmers vs. bad programmers, and computer science education. There were too many ideas to put to paper, really, but I'll reproduce the original talk here and let you draw your own conclusions.

 

So, I think I just had a revelation. Most software developers, myself included, are basically just labor. Skilled labor, yes, but basically similar to construction workers. We always talk about crafting things. Often those things aren’t very difficult to craft, logically or mathematically. “When someone checks this box, disable this field; when the database is full, show a warning message,” and so on. We often talk about “plumbing” or “wiring things through” as if we’re plumbers and electricians. There are just a lot of parts of programming that aren’t hard; they’re just tedious and they take time.

Now, there are some programmers who are engineers, who do hard math to develop the core algorithms. But the rest of us just polish up those algorithms and build a harness for them, build an interface for people to interact with them. Oftentimes the “houses” these algorithms live in are the builder's own design, so the builders are often as skilled as an architect, but the majority of their time is spent doing things I would more closely associate with carpentry or woodworking (to borrow an analogy from The Pragmatic Programmer). And anything really complex can only be handled by the engineer.

That's how it is at my company; my team builds the interface for the visualizations created by our statisticians (sometimes we create the whole visualization, and they only write the algorithm that spits out the p-values, but the analogy holds). Maybe at your company it works a bit differently. Maybe you work for a startup, where you wear both hats. But I still think that most developers work in a labor capacity. And this is interesting, because we often like to think of ourselves as pretty intelligent people; we went to school alongside all the other engineering students; we see ourselves as problem solvers; we see ourselves as nerds. But as for myself, at my current stage I would rate myself a laborer — maybe an architect (on a good day).

So there seem to be two distinct domains here. Now how might that change the way we think of ourselves?

I’ll give you an example from an algorithms class I recently took. We had to write a program to process a graph with 875,000 nodes: this involved two very different, very distinct subproblems.

One was coming up with the algorithm itself, to efficiently process the graph and solve the problem. A purely mathematical problem, minimizing time complexity while proving correctness. This part was given to us by the professor.

The other part—my part of the problem, since I had chosen to do all my homework in Python—was how to get Python to handle the damn data. As it turned out, this was a harder problem than I had expected, since Python isn’t particularly performant. It took minutes to run, so you had to come up with some really good test data instead. Not only that, but once you ran it on the full dataset, it would overflow the stack immediately, so you had to either learn how to increase Python’s stack size and recursion limit, or you had to turn it into an iterative algorithm (which in essence just means that you maintain the stack yourself instead of hijacking the program stack).
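To make that last option concrete, here is a minimal sketch (not the code from the class; the function names and the chain-shaped test graph are made up for illustration). It shows the same depth-first traversal written recursively and then iteratively, with the stack kept in an ordinary list.

```python
def dfs_recursive(graph, node, visited=None):
    """Depth-first traversal that leans on the interpreter's call stack."""
    if visited is None:
        visited = set()
    visited.add(node)
    for neighbor in graph.get(node, []):
        if neighbor not in visited:
            dfs_recursive(graph, neighbor, visited)
    return visited


def dfs_iterative(graph, start):
    """Same traversal, but the stack is a plain list that we manage ourselves."""
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        # Push unvisited neighbors; they get explored on later iterations.
        stack.extend(n for n in graph.get(node, []) if n not in visited)
    return visited


# Hypothetical worst case for recursion: one long chain 0 -> 1 -> ... -> 100_000.
chain = {i: [i + 1] for i in range(100_000)}

try:
    dfs_recursive(chain, 0)
    recursive_ok = True
except RecursionError:
    # Expected with Python's default recursion limit.
    recursive_ok = False

reached = len(dfs_iterative(chain, 0))  # visits every node in the chain
```

The other route would be to raise the limit with `sys.setrecursionlimit` (and, if needed, the thread stack size with `threading.stack_size`), which can work but only moves the ceiling rather than removing it.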

So this was a completely distinct problem, and is more of a practical problem than a math problem... somehow it’s just a bit “softer.” It doesn’t necessarily need to be solved with equations (maybe we can throw more hardware at it, or switch to C instead; you know, come up with some practical workaround). Whereas the other problem definitely does require equations and proofs and deep analysis.

An aside: I’ve met a lot of programmers who shrug off mathematical problem-solving as being outside their realm, or impractical, or overkill, or premature optimization. They don’t seem to appreciate the fact that the only reason our programs can run within any kind of reasonable time is all the hard work already done for us in the languages that we use, the compilers that build them, the libraries that support them. You can’t just parrot “optimization is evil” and pretend it doesn’t exist! But I digress.

Point is, both these kinds of computer science were necessary to solve the problem of the 875,000-node graph. And both were a big pain in the butt, believe you me. But the algorithm was already done for me. And I bet it took a lot more work than the afternoon I spent on turning it into an iterative implementation. But then again, maybe some of these practical engineering problems really do take just as long to solve well.

Take all the recent obsession over “big data” and scalability for instance. There are some seriously hard problems there. Many of them are operational problems, the realm of sysadmins. Many of Google’s problems involve machine learning, heuristic-based searches, etc. Many of Facebook’s problems involve very efficient graph traversal and storage. But are these problems that are solved by the same programmers who create the user interface for Gmail? I honestly don’t think so. I suspect that these companies have a separation between their scientists on the one hand, and their construction workers on the other.

You might take our meetup for example. What do we discuss here? We discuss best practices. We discuss design, and aesthetics, and agile methodology. We discuss soft things, subjective things. How many programming meetups do you know of that go over novel data structures, or where the attendees solve math problems, or discuss algorithmic complexity? I’ve yet to find a meetup like that. A meetup where I can actually learn hard things.

It raises a lot of questions. Are we really as skilled as we think we are, or is it just that not many other people enjoy doing what we do? Are we in demand because we’re skilled, or because the “construction” business is booming? Should universities be separating these two types of students, and teaching them different things? Should they have bothered to teach me all that linear algebra, if I was just gonna forget it anyway?

I think this also ties into recent efforts to get kids and others interested in programming: the FIRST Lego League, Google App Inventor, Codecademy, and the like. I think I’m not the first person to realize how easy programming really is, and how much demand there is for it, and that maybe if we can just get people to try it they’ll discover how easy it is, too. It might be the first time that a labor job was so plentiful and yet so understaffed, because people associate it with higher-order thought.

And speaking of higher-order thought, I still have more questions. At what point does a problem become complex enough to enter into this “mathematical” category? Are there really two distinct categories of programming problem? Or is it perhaps a linear progression, a gradient running from easy to hard to NP-hard? … I dunno.

What kind of computer scientist are you? Are you the mathematician? Or are you the carpenter? Or perhaps a bit of both? Is one more “elite” than the other? I dunno... but I think at least the mathematician in all of us deserves a little bit more respect.

Fun With Closures: transform your method calls into sentences

You can transform your method calls into sentences by separating parameters into their own functions. Your method calls would be as close as possible to a real sentence if it weren’t for the problem of this darn syntax, where all the inputs need to be listed at the end of the sentence. What if you could mention the inputs anywhere in the sentence? It would be so much like prose.

Take an example from a JavaScript implementation of the Game of Life that I recently wrote. My friend Sam and I discovered that, with a little work, this:

can become this:

The hell?! How does that become this?! Here’s how:

Original:

Improved (maybe):

It’s sort of like a unit test generator since you can simply test all conditions by inputting all possible permutations of your inputs. But then again, so is the original function. The only real difference is the syntactic cheat (and probably computationally expensive cheat) that allows me to morph the appearance, but not the functionality, of the test. This, of course, only works in certain programming languages, and is only beneficial if readability outweighs performance. Still, pretty cool, if I may say so myself.
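For readers who want to play with the idea, here is a hypothetical sketch in Python (not the JavaScript from the Game of Life implementation discussed here; every name in it is invented for illustration). Each closure captures one input and hands back an object whose only attribute is the next "word" of the sentence. `types.SimpleNamespace` stands in for the object literal you would return in JavaScript.

```python
from itertools import product
from types import SimpleNamespace

ALIVE, DEAD = True, False


def next_state(alive, live_neighbors):
    """Conventional form: all the inputs are bunched up at the end of the call."""
    return live_neighbors in (2, 3) if alive else live_neighbors == 3


def a_cell_that_is(alive):
    """First clause of the sentence: captures whether the cell is alive."""
    def with_live_neighbors(count):
        """Second clause: captures the number of live neighbors."""
        def is_alive_next_turn():
            """Final clause: applies Conway's rules to the captured inputs."""
            return count in (2, 3) if alive else count == 3
        return SimpleNamespace(is_alive_next_turn=is_alive_next_turn)
    return SimpleNamespace(with_live_neighbors=with_live_neighbors)


# Reads almost like prose, at the cost of extra function calls and objects.
survives = a_cell_that_is(ALIVE).with_live_neighbors(3).is_alive_next_turn()

# Both forms agree on every permutation of the inputs.
agree = all(
    a_cell_that_is(state).with_live_neighbors(n).is_alive_next_turn()
    == next_state(state, n)
    for state, n in product((ALIVE, DEAD), range(9))
)
```

The sentence form allocates a new closure and namespace at every step, which is the readability-for-performance trade mentioned above.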

On Naming and Abbreviating

Okay, can those of us involved in science/technology/innovation please agree on something from now on? Can we try to cut down on the acronyms? I think few people will disagree when I say it's getting a little out of hand these days. There are far too many disambiguation pages popping up on Wikipedia. And on top of all that, there are now abbreviations in the business and information technology worlds that collide in so many contexts that their meaning actually can become ambiguous.

For example, if I tell you my company is hiring a consultant to help us with “updating our CRM techniques,” I could mean any of a number of things: we might need help with Crew Resource Management, Customer Resource Management, Customer Relationship Management, Content Rights Management (a.k.a. DRM), Conceptual Reference Models, Credit Risk Management ... the list goes on and on. We've got MSRPs and EULAs for all our IP, and the DMCA suggests that we produce NDAs and TMs for anything we submit to the USPTO. This is getting confusing.

We need to return to the fun way of doing it, the good old fashioned activity of augmenting our language to include the things we create. Don't worry, I was an English major; I know what I’m doing. I'll be the first to encourage an ever-changing stream of colloquialisms. This is just how language works — in fact, it's the best thing about language: it's whatever we need it to be!

When naming, one should also keep SEO (Search Engine Optimization) principles in mind. I'm telling you to make up new words, not just to repurpose old ones! How am I supposed to find information about Google Gears? When I search for it all I get is an endless stream of websites desperate to sell me sprockets, gears, and sometimes “Gears of War.”

Have you ever heard of the programming framework called "Processing?" Go ahead, try searching for it — you can't. Every article on Earth that mentions programming also mentions processing. Sometimes users of the framework even type it as 'Proce55ing' to avoid confusion! Recently it’s gotten so bad that people have taken to calling it ‘Processing.org.’ The authors gave it such a stupid name that the users actually augmented it themselves.

So get creative. Don't call it High-Definition Digital Video Disc or HD-DVD, call it Blu-Ray. Don't call it 'IEEE 802.11b Direct Sequence' or 'IEEE 802.15.1 WPAN', call it Wi-Fi or Bluetooth. Get into it, it's fun! Inventing things gives you the right to invent words; what could be better?

P.S. We need to cut it out already with the iThis and eThat and iBlandUnoriginalRetailProduct (iBURP ... lovely). Fucking control yourselves already!

P.P.S. I guess the abbreviation for 'postscript' is acceptable ... I'm not saying acronyms can't help us out sometimes.


Famous Algorithms and their Running Times

I recently did some searching and had a very hard time finding a simple yet comprehensive listing of the running times for various sorting algorithms. This surprised me, so I decided to make the Internet a little better by writing this post.

This is an ongoing post, and will be updated with more algorithms as time goes on. It is not limited to just array sorting; it could be a graph search problem, a regression, or any other algorithm as long as it's pretty well-known. I welcome your additions in the comments section.

Algorithm       Worst-Case   Average-Case  Best-Case    Space Growth
Selection Sort  O(n^2)       O(n^2)        O(n^2)       O(1)
Insertion Sort  O(n^2)       O(n^2)        O(n)         O(1)
Bubble Sort     O(n^2)       O(n^2)        O(n)         O(1)
Merge Sort      O(n log n)   O(n log n)    O(n log n)   O(n)
Heapsort        O(n log n)   O(n log n)    O(n log n)   O(1)
Quicksort       O(n^2)       O(n log n)    O(n log n)   O(n)
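To see where the gap between a best case and a worst case comes from, here is a small sketch that counts comparisons in a textbook insertion sort (the helper name and the input size are my own choices for illustration). Already-sorted input needs one comparison per element, while reverse-sorted input forces every element to be compared against everything before it.

```python
def insertion_sort_comparisons(items):
    """Sort a copy of items; return (sorted_list, number_of_comparisons)."""
    data = list(items)
    comparisons = 0
    for i in range(1, len(data)):
        current = data[i]
        j = i - 1
        # Walk left until we find an element that is not larger than current.
        while j >= 0:
            comparisons += 1
            if data[j] <= current:
                break
            data[j + 1] = data[j]  # shift the larger element one slot right
            j -= 1
        data[j + 1] = current
    return data, comparisons


n = 1000
_, best = insertion_sort_comparisons(range(n))          # already sorted
_, worst = insertion_sort_comparisons(range(n, 0, -1))  # reverse sorted

# best  == n - 1            (linear: 999 comparisons)
# worst == n * (n - 1) // 2 (quadratic: 499,500 comparisons)
```

The same counting trick works for the other rows, as long as you are clear about which operation you are counting (comparisons, swaps, or memory used).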

The Price of Public Transit

The Problem

The Massachusetts Bay Transportation Authority (MBTA) is $5.2 billion in debt. A couple days ago the MBTA unveiled its plan for survival over the next year. It involves a 23% fare increase and some service cuts. But this isn't nearly enough; they stress that this is only a one-year fix, and that they'll be in the same position next year, looking at more cuts and fare hikes.

Many people are outraged at the situation, myself included. This problem didn't exist ten years ago. The MBTA's funding structure was fundamentally changed on June 1st, 2000. Before then, the Commonwealth of Massachusetts paid for any deficits the MBTA might incur. Beginning on that date, the MBTA was instead given a fixed percentage of state revenue, and asked to live within this new "forward funding" budget. In addition, the authority was saddled with around $1.8 billion in debt from some renovations that were conducted as part of the Big Dig. This debt has only increased since then, and the MBTA no longer has any money for expansion.[1]

"So surely the T cuts are unfair!" I thought. Surely the problem can be solved by having the state reassume all debt. It's in the state's interest to do so, or else the lack of good quality transit will surely strangle our city!

I recently made my first trip to Europe, and like most first-time American visitors was thoroughly impressed by, and jealous of, their transit systems. As far as I'm aware, transit there is subsidized by the government. As soon as I saw how well it operated, this became an obvious solution to me. Clearly if the government supports road infrastructure, they should support all transportation infrastructure. Like insurance, or the power grid, public transit is one of those things that only work well when everyone buys in at once. It operates as a network. Facebook would not be successful if it weren't for the fact that all your friends already use it. If only some of your friends used it, you might not have joined in the first place. And Facebook would be a weaker service for it. Something clicked. It all made sense; all we had to do was fund the T with taxes!

But after I returned home, I realized something. I actually paid more for a subway ride in Berlin than I did in Boston. Granted, the trains ran much faster and came much more often and ran later, but I paid a lot more for it (about $4 to our $1.70). The European systems also seemed to cover more ground than the Boston system, but then I learned that the MBTA has in fact been the fastest-expanding transit system in the nation for about 20 years. Maybe the new T fare isn't as unfair as I originally thought. So I decided to do a bit more investigating.

Some Price Comparisons

I chose to look at Munich, as its size probably makes it comparable to Boston. Our prices are comparable to Munich's, except for single rides. A single ride under the new MBTA prices will cost up to $2.00. People in Munich already pay $3.33 for a single ride. However, the weekly and monthly passes in Munich are exactly the same as ours. This makes sense. We should encourage people to buy into the pass. Currently, for the Boston pass to be worth it, a person has to make at least 5 round-trips per week. Basically, you would only buy the pass if you rode the T every day. This doesn't encourage people to buy into the program, and thereby become reliant on it.
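The break-even claim is easy to check. This is a back-of-the-envelope sketch using the current fares quoted elsewhere in this post ($1.70 for a single subway ride, $15 for a 7-day pass). The function name is made up, and it assumes every trip is a plain round trip with no transfers or discounts.

```python
import math


def break_even_round_trips(pass_price, single_fare):
    """Smallest whole number of round trips at which the pass is no more
    expensive than paying the single fare each way."""
    round_trip_cost = 2 * single_fare
    return math.ceil(pass_price / round_trip_cost)


SINGLE_RIDE = 1.70   # current single-ride fare quoted in this post
WEEKLY_PASS = 15.00  # current 7-day pass price quoted in this post

trips = break_even_round_trips(WEEKLY_PASS, SINGLE_RIDE)
# 15.00 / 3.40 is about 4.4, so the pass pays off at 5 round trips per week.
```

Plugging in another city's fares gives a quick way to compare how strongly each system nudges riders toward a pass.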

However, Berlin is much more expensive (which is odd, because you'd think a larger city would be able to take advantage of economies of scale). Our 7-day pass is currently less than half the price of Berlin's 7-day pass, $15 to their $36. (The Germans are smart to hike this price, because this is the de facto tourist ticket. No one who lives there would buy a weekly pass. On the other hand, in Basel, all hotel rooms come with a free tram pass for the duration of your stay, which made for a great tourist experience. But Basel is a lot smaller than Boston.) The Berlin monthly pass is also much pricier, and even their special, year-long pass isn't as cheap as 12 monthly MBTA passes. So it would seem that the Europeans pay as much as or more than we do for transit, although I would say that their systems are a lot more efficient than ours.

Buying Into It

I think another obstacle is that gas here is too cheap. Gasoline (and diesel) in Europe is at least twice what it is here, and up to 4 times as much, or more. In addition, we're individualists, and we want to be able to come and go as we please, and where we please, and not be beholden to someone else's schedule. Instead, we'd rather be beholden to our insurance companies, oil futures, and the interest on our car loans. And so we don't buy into the system the same way the Europeans do. Seems like we have more freedom, but do we really?

So, it would seem (from my cursory, unscientific examination) the Europeans pay as much as or more than we do. But they have a much nicer system that is faster, more convenient, and covers more area. They get a lot more bang for their buck, because they buy into it. Can the government force people to buy in by subsidizing the T with tax money? I don't know. Cultural habits are not normally broken so easily. Bad public transit has become a part of our culture, in many ways. We love to make fun of the bad transit, to scoff at the transit authority for getting everything wrong, to blame it for our daily commute or our stressful job. People will never buy in as long as it's hip to ridicule the transit authority. Our whole attitude to public transit needs to change, or it will never improve. When I think about this, balancing the budget doesn't really scare me anymore. Getting people to respect the T is going to be a lot harder.