This is a reblog of David Green's post from his blog, Actively Lazy. It's an interesting look at how the environments we work in often dictate how we code, or how we learn to code.
There’s a natural instinct to assume that everybody else’s code is an untidy, undisciplined mess. But, if we look objectively, some people genuinely are able to write well crafted code. Recently, I’ve come across a different approach to clean code that is unlike the code I’ve spent most of my career working with (and writing).
There’s a common way of writing code, perhaps particularly common in Java, though it happens in C# too, that encourages the developer to write as many classes as possible. In large enterprises this way of building code is endemic. To paraphrase: every problem in an enterprise code base can be solved by the addition of another class, except too many classes.
Why does this way of writing code happen? Is it a reaction to too much complexity? There is a certain logic in writing a small class that can be easily tested in isolation. But when everyone takes this approach, you end up with millions of classes. Figuring out how to break up complex systems is hard, but if we don’t, and just keep on adding more classes, we’re making the problem worse, not better.
However, it goes deeper than that. Many people (myself included, until recently) think the best way to write well-crafted, maintainable code is to write lots of small classes. After all, a simple class is easy to explain and is more likely to have a single responsibility. But how do you go about explaining a system consisting of a hundred classes to a new developer? I bet you end up scribbling on a whiteboard with boxes and lines everywhere. I’m sure your design is simple and elegant, but when I have to get my head around 100 classes just to know what the landscape looks like – it’s going to take me a little while to understand it. Scale that up to an enterprise with thousands upon thousands of classes and it gets really complex: your average developer may never understand it.
Perhaps an example would help? Imagine I’m working on some trading middleware. We receive messages representing trades that we need to store and pass on to systems further down the line. Right now we receive these trades in a CSV feed. I start by creating a TradeMessage class.
I’m a good little functional developer so this class is immutable. Now I have two choices: i) I write a big constructor that takes a hundred parameters or ii) I create a builder to bring some sanity to the exercise. I go for option ii).
TradeMessageBuilder onDate(Date timestamp)
TradeMessageBuilder forAmount(BigDecimal amount)
TradeMessageBuilder ofType(TradeType type)
TradeMessageBuilder inAsset(Asset asset)
Now I have a builder from which I can create TradeMessage instances. However, the builder requires the strings to have been parsed into dates, decimals etc. I also need to worry about looking up Assets, since the TradeMessage uses the Asset class, but the incoming message only has the name of the asset.
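To make the shape of this concrete, here is a minimal sketch of an immutable TradeMessage and its builder. Only the four builder method names come from the signatures above. The `build()` method, the getters, the `BUY`/`SELL` values and the contents of `Asset` are assumptions made for illustration, and a real message would carry many more fields.

```java
import java.math.BigDecimal;
import java.util.Date;

// Hypothetical stand-ins: the post does not show these types.
enum TradeType { BUY, SELL }

final class Asset {
    private final String name;
    Asset(String name) { this.name = name; }
    String getName() { return name; }
}

// Immutable: all fields are final and there are no setters.
final class TradeMessage {
    private final Date timestamp;
    private final BigDecimal amount;
    private final TradeType type;
    private final Asset asset;

    TradeMessage(Date timestamp, BigDecimal amount, TradeType type, Asset asset) {
        // Date is mutable, so copy it to keep this object immutable.
        this.timestamp = new Date(timestamp.getTime());
        this.amount = amount;
        this.type = type;
        this.asset = asset;
    }

    Date getTimestamp() { return new Date(timestamp.getTime()); }
    BigDecimal getAmount() { return amount; }
    TradeType getType() { return type; }
    Asset getAsset() { return asset; }
}

// Collects values one at a time; build() is an assumed name.
final class TradeMessageBuilder {
    private Date timestamp;
    private BigDecimal amount;
    private TradeType type;
    private Asset asset;

    TradeMessageBuilder onDate(Date timestamp) { this.timestamp = timestamp; return this; }
    TradeMessageBuilder forAmount(BigDecimal amount) { this.amount = amount; return this; }
    TradeMessageBuilder ofType(TradeType type) { this.type = type; return this; }
    TradeMessageBuilder inAsset(Asset asset) { this.asset = asset; return this; }

    TradeMessage build() {
        return new TradeMessage(timestamp, amount, type, asset);
    }
}

class BuilderDemo {
    public static void main(String[] args) {
        TradeMessage message = new TradeMessageBuilder()
                .onDate(new Date())
                .forAmount(new BigDecimal("100.25"))
                .ofType(TradeType.BUY)
                .inAsset(new Asset("ACME"))
                .build();
        System.out.println(message.getAmount() + " " + message.getType());
    }
}
```

Note that this sketch does no validation: calling `build()` before every value has been supplied fails with a NullPointerException, which a production builder would want to report more helpfully.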
We now test-drive outside-in like good little GOOS developers. We start from a CSVTradeMessageParser (I’m ignoring the networking or whatever else feeds our parser).
We need to parse a single line of CSV, split it into its component parts from which we’ll build the TradeMessage. Now we have a few things we need to do first:
- Parse the timestamp
- Parse the amount
- Parse the trade type
- Lookup the asset in the database
Now in the most extreme enterprise-y madness, we could write one class for each of those “responsibilities”. However, that’s plainly ridiculous in this case (although add in error handling or some attempts at code reuse and you’d be amazed how quickly all those extra classes start to look like a good idea).
Instead, the only extra concern we really have here is the asset lookup. The date, amount and type parsing I can add to the parser class itself – it’s all about the single responsibility of parsing the message so it makes sense.
TradeMessage parse(String csvline)
Date parseTimestamp(String timestamp)
BigDecimal parseAmount(String amount)
TradeType parseType(String type)
Now – there’s an issue test driving this class – how do I test all these private methods? I could make them package visible and put my tests in the same package, but that’s nasty. Or I’m forced to test from the public method, mock the builder and verify the correctly parsed values are passed to the builder. This isn’t ideal as I can’t test each parse method in isolation. Suddenly making them separate classes seems like a better idea…
Finally I need to create an AssetRepository:
Asset lookupByName(String name)
The parser uses this and passes the retrieved Asset to the TradeMessageBuilder.
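As a rough sketch of how the parser and repository might fit together: the class and method names follow the signatures above, but the column order, the timestamp format, the `BUY`/`SELL` values and the error handling are all assumptions. For brevity the sketch constructs the TradeMessage directly rather than going through the builder.

```java
import java.math.BigDecimal;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical stand-ins: the post does not show these types.
enum TradeType { BUY, SELL }

final class Asset {
    private final String name;
    Asset(String name) { this.name = name; }
    String getName() { return name; }
}

// An interface, so tests can swap in a stub instead of a database.
interface AssetRepository {
    Asset lookupByName(String name);
}

final class TradeMessage {
    final Date timestamp;
    final BigDecimal amount;
    final TradeType type;
    final Asset asset;

    TradeMessage(Date timestamp, BigDecimal amount, TradeType type, Asset asset) {
        this.timestamp = timestamp;
        this.amount = amount;
        this.type = type;
        this.asset = asset;
    }
}

final class CSVTradeMessageParser {
    private final AssetRepository assets;

    CSVTradeMessageParser(AssetRepository assets) { this.assets = assets; }

    // Assumed column order: timestamp,amount,type,assetName
    TradeMessage parse(String csvline) {
        String[] parts = csvline.split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + csvline);
        }
        return new TradeMessage(
                parseTimestamp(parts[0].trim()),
                parseAmount(parts[1].trim()),
                parseType(parts[2].trim()),
                assets.lookupByName(parts[3].trim()));
    }

    private Date parseTimestamp(String timestamp) {
        // Assumed format and time zone; the real feed may differ.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(timestamp);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad timestamp: " + timestamp, e);
        }
    }

    private BigDecimal parseAmount(String amount) {
        return new BigDecimal(amount);
    }

    private TradeType parseType(String type) {
        return TradeType.valueOf(type.toUpperCase(Locale.ROOT));
    }
}
```

Because AssetRepository has a single method, a test can stub it with a lambda, for example `new CSVTradeMessageParser(name -> new Asset(name))`, and exercise the private parse methods only through the public `parse()`.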
And we’re done! Simple, no? So, if I’ve test driven this with interfaces for my mocked dependencies, how many classes have I had to write?
Oh, and since these are only unit tests, I probably need some end-to-end tests to check the whole shooting match works together:
12 classes! Mmm enterprise-y. This is just a toy example. In the real world, we’d have FactoryFactories and BuilderVisitors to really add to the mess.
Is there another way? Well, let’s consider TradeMessage is an API that I want human beings to use. What are the important things about this API?
That’s really all callers care about – getting values and parsing from CSV. That’s enough for me to use in tests and production code. Here we’ve created a nice, clean, simple API that is dead easy to explain. No need for a whiteboard, or boxes and lines and long protracted explanations.
But what about our parse() method? Hasn’t this become too complex? After all, it has to decompose the string, parse dates, amounts and trade types. That’s a lot of responsibilities for one method. But how bad does it actually look? Here’s mine, in full:
String parts = csvline.split(
Now of course, by the time you’ve added in some real world complexity and better error handling it’s probably going to be more like 20 lines.
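For comparison, here is one way the single-method version could look. This is a sketch under assumptions, not the original listing: the static `parse` signature that takes an AssetRepository, the column order, the timestamp format and the `BUY`/`SELL` values are all guesses made for illustration.

```java
import java.math.BigDecimal;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical stand-ins: the post does not show these types.
enum TradeType { BUY, SELL }

final class Asset {
    private final String name;
    Asset(String name) { this.name = name; }
    String getName() { return name; }
}

interface AssetRepository {
    Asset lookupByName(String name);
}

final class TradeMessage {
    private final Date timestamp;
    private final BigDecimal amount;
    private final TradeType type;
    private final Asset asset;

    private TradeMessage(Date timestamp, BigDecimal amount, TradeType type, Asset asset) {
        this.timestamp = timestamp;
        this.amount = amount;
        this.type = type;
        this.asset = asset;
    }

    // All of the parsing is fenced into this one method.
    // Assumed column order: timestamp,amount,type,assetName
    static TradeMessage parse(String csvline, AssetRepository assets) {
        String[] parts = csvline.split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + csvline);
        }
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date timestamp = format.parse(parts[0].trim());
            BigDecimal amount = new BigDecimal(parts[1].trim());
            TradeType type = TradeType.valueOf(parts[2].trim().toUpperCase(Locale.ROOT));
            Asset asset = assets.lookupByName(parts[3].trim());
            return new TradeMessage(timestamp, amount, type, asset);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad timestamp in: " + csvline, e);
        }
    }

    Date getTimestamp() { return new Date(timestamp.getTime()); }
    BigDecimal getAmount() { return amount; }
    TradeType getType() { return type; }
    Asset getAsset() { return asset; }
}
```

Callers and tests only ever touch `parse` and the getters, so the whole feature can be tested through its public API with a stubbed repository, without a separate builder or parser class.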
But let me ask you: which would you rather have, 12 tiny classes or 4 small classes? Is it better for complex operations to be smeared across dozens of classes, or nicely fenced into a single method?
Though David makes some really good points, I feel the argument is really about nature versus nurture: context dictates much of what you do, especially when you are first learning. If you learn to code in an environment where a ridiculous number of classes is seen as the standard way to build enterprise-level products, you will use the same method elsewhere, even when you don't need to. You will scale it down to size, but without any real and valuable reduction in complexity, and you will simply continue to write inefficient code.
There are many traits that distinguish great developers from lesser ones. Passion is certainly very important, since without it you will not be learning, pushing yourself and trying new things. Knowing the basics is important as well: code reuse, design patterns, fundamental data structures and algorithms are all required if you want to write good code. Agile practices, too, are something every good software engineer should be using in development and testing. Complex software almost necessitates these things, but often they are used poorly or, at worst, not at all. I would say that the very best developers believe in common sense, efficiency and simplicity. Those things help them build the immensely complex software that is needed today.
What do you think? You (our Jelastic users and blog readers) are quite active and we would love to know what you think. Voice your opinion in the comments below.