Thursday 8 November 2012

The Urge to Rewrite Code

I recently read a fantastic article on Ars Technica about the inner workings of the new WinRT technology powering Windows 8. It’s a lengthy article but it explains how we got all the way from Win16 to Win32 to OLE to COM to .NET and eventually to WinRT and is well worth a read if you’re interested in Windows development and its history.

What got me thinking, though, is that it gave me a snapshot of how Microsoft keeps Windows and its technology up to date while still supporting a vast legacy of applications, some of which have been running for the past 20 to 30 years. What you realise is that, to support that legacy and keep pushing forwards, Microsoft basically built each technology on top of the previous one. If you were to start from the latest WinRT APIs and strip away all the layers of strata you would eventually hit the ancient Win32 API and kernel, the very same system that has been powering Windows since Windows NT. The same can be said for .NET and COM; it doesn’t matter how new or fancy the latest technology trend is, in the end each one is a nicer wrapper over what came before.

This is interesting to me because I’ll bet every single developer out there has had one thought cross their mind at some point in their career: “I need to rewrite this”. Maybe you have a codebase written in another era, or you’ve inherited code that looks like a monkey tap-danced on a keyboard. Sometimes we all fall prey to thinking that we need to invest time in rewriting huge chunks of code (or entire systems) because, obviously, we can do a better job and the end result will be “better”.

This is dangerous!

I can speak from experience plus cite any number of references saying that taking on a rewrite of a codebase, especially an enterprise/commercial one, is just a bad idea. What you are effectively suggesting is to take a (mostly) working system, spend 6 – 12 months (maybe more) ditching it and starting all over again, and end up with the exact same system you started with plus countless additional bugs you’ve introduced due to human error and/or lack of domain knowledge. Also, all that time you wasted rewriting code meant that you couldn’t do any current development work, meaning your competitors have now charged ahead of you with brand new features that you will never be able to keep up with.

All of this so that your code looks better, a matter that no-one else but you cares about.

Now I am being a bit extreme here. Of course there are times when you have to rewrite something in order to progress. Maybe your code is so stuck in the dark ages that adding new features becomes increasingly time-consuming or complex, maybe even impossible. Yet I’ve learned over the years that you simply cannot ditch what you already have. Your customers don’t give two hoots how you managed to kludge together their feature; the fact remains that it was done and it works, so breaking it now is not an option.

So what can be done to refactor code effectively? Below are some ideas I have thought of, some of which I try to use myself.

Do Nothing

By far the simplest strategy, as it requires no work at all! Simply learn to live with your codebase, quality be damned. If you can overcome your initial feelings of revulsion at the spaghetti code you see daily and just accept it for what it is, you may find it is less of a hurdle than you first thought.

Of course doing nothing also means making no improvements so it’s quite a trade-off, but as the old saying goes “if it ain’t broke, don’t fix it”.

The Big Bang Strategy

The polar opposite of doing nothing is doing everything in one go. As I’ve already said, this is very extreme and hardly ever needed, because there are better ways of improving your codebase without greatly affecting anything else.

The Microsoft/Onion Strategy

What I’ve seen Microsoft tend to do is build layers upon all their existing technologies so that the next layer up has a better API than the one below and each new layer will handle the fiddly, lower level details so you don’t have to.

For example, consider Windows Forms, which was introduced when .NET first came into existence. It was meant to replicate the drag-and-drop style of development that Visual Basic programmers had long been used to. But do you think that all the framework classes designed to handle windows and controls were written from scratch for a brand new, untested technology? No, Windows Forms was simply an easier-to-use wrapper over Win32 code called through interop, because that code already worked; why reinvent the wheel?

Of course once you’ve introduced these layers and made sure they are working effectively you could start to clean up the lower layers or possibly even remove and replace them so that you don’t need so much API coverage; that is assuming of course you can remove all the dependencies on low level code.
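To make the layering idea concrete, here is a minimal C# sketch. Everything in it is made up for illustration: LegacyWindowApi is a hypothetical stand-in for an older handle-based API (it is not real Win32 or Windows Forms code), and Window is the friendlier layer built on top of it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical low-level layer: handle-based and easy to misuse.
// It stands in for an older API; none of these names are real Win32 calls.
public static class LegacyWindowApi
{
    private static readonly Dictionary<int, string> Titles = new Dictionary<int, string>();
    private static int nextHandle = 1;

    public static int CreateHandle()
    {
        int handle = nextHandle++;
        Titles[handle] = string.Empty;
        return handle;
    }

    public static void SetText(int handle, string text)
    {
        if (!Titles.ContainsKey(handle))
            throw new ArgumentException("Unknown handle", "handle");
        Titles[handle] = text;
    }

    public static string GetText(int handle)
    {
        return Titles[handle];
    }

    public static void DestroyHandle(int handle)
    {
        Titles.Remove(handle);
    }
}

// The new layer: a nicer API that hides handle management
// but delegates all the real work to the layer below.
public sealed class Window : IDisposable
{
    private readonly int handle = LegacyWindowApi.CreateHandle();
    private bool disposed;

    public string Title
    {
        get { return LegacyWindowApi.GetText(handle); }
        set { LegacyWindowApi.SetText(handle, value); }
    }

    public void Dispose()
    {
        if (disposed) return;
        LegacyWindowApi.DestroyHandle(handle);
        disposed = true;
    }
}
```

Callers only ever see Window and its Title property, so the handle bookkeeping underneath could later be tidied up or swapped out without touching them, provided nothing else depends on LegacyWindowApi directly.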

The Side-by-Side Strategy

This is a refactoring strategy I tend to use myself. Let’s say you have a feature that, for whatever reason, you are going to rewrite. What I do is actually never touch the old code and instead create a separate layer of classes alongside the existing code to replicate the same functionality but written differently; usually I separate these classes with appropriate namespaces.

Now I can work on the new code whilst the old code can still be deployed if necessary, and without affecting my other team members’ builds. Eventually I will have fleshed out the rewritten code enough for call sites to start using it, which will then phase out the old code. Once all references to the old code have been removed, you can safely delete it from your codebase. This might take quite a while to achieve fully, but it is certainly a lot safer than starting from a blank canvas with nothing to show for a long time.

I also use this strategy with the ObsoleteAttribute to make it clear that code is old and should no longer be used; it also helps find all the references to the old code by giving me compiler warnings that I can work my way through.
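As a rough illustration of the side-by-side approach, the C# sketch below keeps an old class and its rewrite in separate namespaces and marks the old one with [Obsolete]. The Billing namespaces, the InvoiceCalculator classes and the 20% tax rate are all invented for this example; only ObsoleteAttribute itself comes from .NET.

```csharp
using System;

namespace Billing.Legacy
{
    // The original implementation stays untouched and deployable.
    // [Obsolete] makes the compiler emit a warning at every remaining call site.
    [Obsolete("Use Billing.V2.InvoiceCalculator instead.")]
    public class InvoiceCalculator
    {
        public decimal Total(decimal net)
        {
            // Hard-coded tax rate: the kind of detail the rewrite cleans up.
            return net + net * 0.2m;
        }
    }
}

namespace Billing.V2
{
    // The rewrite lives alongside the old class, in its own namespace,
    // so both can be built and shipped together during the migration.
    public class InvoiceCalculator
    {
        private readonly decimal taxRate;

        public InvoiceCalculator(decimal taxRate)
        {
            this.taxRate = taxRate;
        }

        public decimal Total(decimal net)
        {
            return net * (1 + taxRate);
        }
    }
}
```

Any code that still refers to Billing.Legacy.InvoiceCalculator now shows up as a compiler warning, which gives you a to-do list of call sites to migrate. Passing true as the second argument to [Obsolete] turns that warning into a compile error, which can be handy once you believe the migration is finished.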

The Inheritance Strategy

As an alternative to having old and new code side-by-side, you could also implement it top-to-bottom within an inheritance hierarchy. The new code would be contained in a base class while the old code would derive from the new and still keep its existing API. This means that legacy objects could, in theory, be passed into functions which require the new class and they would still work.
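A small C# sketch of how that hierarchy might look is below. The class names (CustomerRepository, CustomerManager, Reports) and their methods are hypothetical, chosen only to show the shape of the idea.

```csharp
using System;

// New code: the cleaned-up API lives in the base class.
public class CustomerRepository
{
    public virtual string FindName(int customerId)
    {
        return "Customer " + customerId;
    }
}

// Old type: keeps its existing method name so legacy callers still compile,
// but forwards to the new implementation it now inherits.
public class CustomerManager : CustomerRepository
{
    public string GetCustName(string id)
    {
        return FindName(int.Parse(id));
    }
}

public static class Reports
{
    // Written against the new type only.
    public static string Header(CustomerRepository repository, int customerId)
    {
        return "Report for " + repository.FindName(customerId);
    }
}
```

Because CustomerManager derives from CustomerRepository, an existing CustomerManager instance can be handed to Reports.Header unchanged, while older code carries on calling GetCustName.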

Conclusion

There are many alternatives to refactoring code in one large chunk, and I’m sure there are many other strategies thought up by people far cleverer and more experienced than me. Essentially, I have learned over my career that the “big bang” approach never works out well and that a slower, longer-term strategy usually gives the best results.
