Migrations are complex projects, and it is reasonable to at least consider whether one would not be better off rewriting a software system rather than porting/migrating/rehosting it.
It is a tempting proposition: rewriting allows one to use the latest language, database, middleware and methodology, as well as to address whatever functional shortcomings cripple the existing system.
Even gathering the functional specification of the newly written system should be straightforward, as it is more or less what the existing application currently does, modulo the above-mentioned improvements.
So why do these rewriting projects fail so consistently? Depending on whom you ask, the literature gives figures ranging from 15 to 30% for software development projects (not just rewriting projects) that are canceled outright without having delivered anything, and these rates can climb to 70% when one also counts projects that end up significantly late or over budget.
I would love to provide hard figures here but cannot; still, it is my gut feeling that the failure rate of rewriting projects is even worse. It has become a habit: when a customer says that they will rewrite rather than migrate, I make a mental note, since more often than not, the rewriting project will be canceled two to five years down the road, and migration will come back on the agenda as a valid option.
We even encounter sites that have tried to rewrite twice, failed twice, and ended up considering migration only when everything else had failed, more than a decade after the first attempt at rewriting.
Why is that? How can one explain that the apparently reasonable arguments in favour of rewriting do not seem to hold in practice?
I have compiled a few possible reasons for this situation in this post. As usual, reality is probably more nuanced, and the true explanation is likely to be a combination of these.
1 – Statistically independent events
The first factor is the uneven playing field. After all, one compares a project that has already been successful (the existing one, which is used in production today) with a future one that can still fail. And the fact that the original project was successful is not a useful predictor of the rewriting project being equally successful.
The new system will usually be developed using different tools, by a separate team, with different skills and experience. The timing constraints are different too. It is not uncommon to see plans to rewrite a system in 5 years or less, even though it has been developed continuously over more than 20 years. In statistical terms, these differences of context translate into a low correlation between the probabilities of success of the two projects.
Even the specification of the new system is not as simple a matter as one may be tempted to believe. More often than not, the only up-to-date and reliable source of information regarding the current behaviour of the system is its source code. Rewriting projects then start with a long and painful phase of discovery, where the semantics must be extracted from the existing source code and meta-data.
In other words, don’t count on the success of a development project to predict the success of the corresponding rewriting project. It can and will fail more often than not.
2 – Self-inflicted optimistic blindness
Everybody knows the flaws of the existing system. It is not flexible enough, may suffer from an inadequate user interface, may be slow at times, etc. By definition (I am tempted to write “by design”), nobody knows the flaws of the not-yet-existing system, because, well, it does not exist yet, flaws and bugs and all.
And software engineers are optimistic. They don’t contemplate failure as an option when starting a project. In their infancy, projects are perfect, provide the required features with flexibility, efficiency, elegance and more. It is only later that reality creeps in, and that the beloved baby starts showing flaws, up to a critical point where it may be beyond repair.
And then, with the benefit of hindsight, the original system does not look so bad any more. The problem is that such wisdom comes only with hindsight. When starting a project, the resulting system is draped with all possible virtues, which contrast heavily with the flaws of the existing system it is meant to replace.
3 – Over-engineering and excessive ambition
Another reason why rewriting projects fail is the difference in scope and ambition. Older systems were developed with a single ambition, namely to deliver the required functionality within the constraints at hand.
Most non-trivial systems nowadays include some form of genericity, with dynamic behaviours controlled by XML files or other runtime artefacts. They turn into excessively flexible scaffolding, where most of this flexibility is barely used even though it greatly increases complexity and entropy.
To a degree, one can argue that this is just another case of self-inflicted blindness: one designs a framework (and an instance thereof) rather than an application, without appreciating that this additional level of flexibility increases the risk of failure accordingly.
4 – The cost of two ongoing systems
All the issues listed above have to do with the intrinsically human nature of software development. On top of all this, there is a very simple organizational issue that has a dramatic impact on the feasibility of any rewriting project.
While a system is being rewritten, the organization in effect has two systems under maintenance: the old one and the new one. Functional changes dictated by the surrounding environment (legal, financial, mergers, technical) must be made on both the old and the new system, and for obvious reasons, the old system is usually updated first, as it is considered more important (it actually runs in production, after all). In practice, before long, the rewritten system lags behind and is constantly catching up with the evolving functionality it has to implement.
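The catching-up effect can be made concrete with some back-of-the-envelope arithmetic. The sketch below is a toy model, not measured data: the function `months_to_catch_up`, its parameters and the abstract "units of functionality" are all assumptions made for illustration.

```python
def months_to_catch_up(backlog, velocity, change_rate, max_months=600):
    """Return the number of months until the rewrite reaches parity, or None.

    All names and units are hypothetical:
    backlog     -- units of existing functionality still to be reimplemented
    velocity    -- units the rewrite team delivers per month
    change_rate -- units of new or changed functionality the old system
                   absorbs per month, which the rewrite must also implement
    """
    months = 0
    while backlog > 0:
        if months >= max_months:
            return None  # no parity within the horizon: the target keeps moving
        # Each month the rewrite team removes `velocity` units from the
        # backlog while the old system adds `change_rate` units to it.
        backlog += change_rate - velocity
        months += 1
    return months
```

In this model, a backlog of 240 units at a velocity of 10 units per month takes 24 months if the old system is frozen (`change_rate=0`), 40 months if the old system absorbs 4 units of change per month, and is never cleared if the change rate matches the velocity. Only the difference between the two rates counts, which is why a seemingly comfortable schedule can quietly double.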
Then, assuming (and it is a big assumption!) that the rewritten system ever gets up to date and provides adequate functionality, data must be migrated, which is a serious project in its own right; client software must be installed, users must be trained, etc.
Actually switching from the old system to the new one is a serious issue in its own right, and some organizations prefer an incremental migration to a big bang.
But this is easy to say and hard to do. Data must be replicated and synchronized across possibly incompatible data models, race conditions must be handled, etc.
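To give a flavour of what this synchronization involves, here is a minimal sketch of replicating changes from an old data model to a new one while ignoring stale updates. Everything in it is hypothetical: the field names (`id`, `first`, `last`, `version`), the `NewRecord` type and the idea that the old system stamps each row with a version counter are assumptions made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class NewRecord:
    """Record layout of the (hypothetical) new system."""
    customer_id: int
    full_name: str
    version: int


def to_new_model(old_row):
    """Map a row of the hypothetical old model to the new model."""
    return NewRecord(
        customer_id=old_row["id"],
        full_name=f'{old_row["first"]} {old_row["last"]}',
        version=old_row["version"],
    )


def apply_change(new_store, old_row):
    """Replicate one change; return True if applied, False if stale."""
    incoming = to_new_model(old_row)
    current = new_store.get(incoming.customer_id)
    # Changes may arrive out of order; never overwrite newer data
    # with an older version of the same record.
    if current is not None and current.version >= incoming.version:
        return False
    new_store[incoming.customer_id] = incoming
    return True
```

The version check only protects against changes that arrive late or twice. It is not atomic: with several concurrent writers, the read and the write would have to happen inside a transaction or a compare-and-set operation, and changes flowing back from the new system to the old one would need the same care in the other direction.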
At the end of the day, rewriting the application is insanely complicated, but it is just a step. Putting the newly written application in production can be the real challenge here.
So, am I just playing the devil’s advocate, or do you recognize yourself and some of your experiences in this post? As ever, feedback is welcome.