The Migration CodeCraft is our automated migration service. You may be in the situation many companies are in: you have valuable software that you wrote and have grown over the years. That software was written using technologies that are used less and less, so your company finds itself at a competitive disadvantage. It is like watching a crash in slow motion: the present does not feel great, but the future looks bleaker. And if you think about the prospect of rewriting everything from scratch, that seems an even less appealing alternative. Luckily, you could not afford it, anyway.
In this article, we present Migration CodeCraft. It is our automated migration service: we take your code, written in an old programming language, and translate it into idiomatic code in a modern programming language. Let’s see how this can be done.
Why should you migrate to a different programming language?
First of all, why the heck should you even consider this move?
From the conversations we have, these are the main reasons why someone is not thrilled about the technologies they are using for “historical reasons”:
- You cannot find developers who are familiar with the language in which your system is written. Maybe you are using RPG, PACBASE, Visual Basic 6, Clipper, PowerBuilder, GeneXus, Natural, or another language that is past its peak
- The hardware on which the language runs is costly. You are not happy paying IBM and the like those high prices when there are cheaper alternatives
- The software requires some license (did anyone say SAS?), while you believe that money could be better kept in your pocket if you moved to Python
- The platform you are using is unmaintained, or support could be ended from one moment to the next, so you feel as relaxed as a turkey one week before Thanksgiving
So, you would like to wake up to the same software minus the headaches caused by technological choices made 10, 20, or 30 years ago.
Can’t we just rewrite the application from scratch in a modern language?
Well, if you have not done that already, it is because your system is large. Maybe it is large by your standards, counting 100,000 lines of code, or it is large by anybody’s standards, boasting over 30,000,000 lines of code. Either way, rewriting the software seems like a very painful exercise.
Estimates for rewriting software “manually” range from 15 USD to 35 USD per line of code. So, yes, depending on the size of your system, the bill could range from millions of dollars to hundreds of millions of dollars, or even more for the largest codebases. Of course, you could get an estimate for way less than that. But I guess we all know how estimates work in software development.
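To make the order of magnitude concrete, here is a minimal sketch of the arithmetic. It only multiplies the per-line range quoted above by the two codebase sizes mentioned earlier; the function name is ours and purely illustrative.

```python
# Per-line cost range quoted above (USD per line of code).
LOW_RATE_USD = 15
HIGH_RATE_USD = 35


def rewrite_cost_range(lines_of_code: int) -> tuple[int, int]:
    """Return the (low, high) manual-rewrite cost in USD for a codebase."""
    return lines_of_code * LOW_RATE_USD, lines_of_code * HIGH_RATE_USD


# The two codebase sizes used as examples in this article.
for loc in (100_000, 30_000_000):
    low, high = rewrite_cost_range(loc)
    print(f"{loc:>12,} lines: {low:,} to {high:,} USD")
```

For 100,000 lines this gives 1.5 to 3.5 million USD; for 30,000,000 lines it gives 450 million to just over 1 billion USD.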
Let’s suppose you can afford it. Well, it will take time, probably three to five years, and it could be more, depending on the size of your codebase.
So, this is an option for those who have the time and money to invest in it. If you are also considering overhauling the business logic and can take the time to do that, this is the route I would suggest. Otherwise, if resources are limited and you cannot afford this “pause” in the development of your system, I would look into doing an automated migration.
How is the Migration CodeCraft any different from other automated migration services?
We are not the only ones offering migration services. Perhaps you have already looked into a few of them and are not convinced, or maybe you have no idea how to pick one. After all, you are not in the business of providing migration services, so it is not trivial to compare them.
Let’s skip the jargon and be as clear as possible. The differences we designed our service to provide are:
- Idiomatic Pattern Translation: most services perform a very low-level translation, replicating the logic to the smallest detail. We believe this is simpler to implement, but you pay for it in terms of the maintainability of the resulting code, so we went for another approach. Our approach consists of identifying recurring patterns in the original code, associating an intention with each pattern, and translating that intention in a way that is idiomatic in the target language. For example, to clear a file storing data in a language like RPG, you may set the cursor at the start of the file, then iterate over each record in the file and delete it. We would identify that there is such a recurring pattern. We would then look into it, understand the intention behind it (clearing a file), and translate that. For example, the file could become a table, and the “clearing file pattern” could be translated into a single delete instruction on that table, issued through Hibernate.
- Architecture co-design: most of our competitors built a transpiler that, given some code as input, produces some migrated code. The translation is based on the architectural choices made when the transpiler was built. Typically, those choices cannot be changed or adapted. This offers a great advantage for the vendor: they have a product they can reuse with no customization costs. We believe instead that the architectural choices about the new system should be made with the client. This means we will check with you to see if you want to migrate to Java, Python, C#, or another language, and which frameworks you would like your new system to be based on. We go as far as discussing specific patterns and ensuring that the way they are translated is what you would prefer. Our intent is to give you code that you would be happy to maintain, not whatever we picked. In our opinion, this is important to ensure that the new system can be well maintained.
- Double guarantee: we specify the price of the migration upfront. We do not give you an estimate. We give you a final price. Also, when we deliver the code based on the architecture co-design we did together, you look into it and give us feedback. We work on that feedback to ensure you are happy with the code you receive. If, in the end, there are parts of the codebase you do not deem translated well enough, we reimburse you for that portion of the code. So far, I have not found other vendors willing to stand behind their service in the same way.
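The Idiomatic Pattern Translation idea described in the first point can be sketched in a few lines. This is a toy illustration, not our transpiler: the statement classes, the `ClearFile` intention, and the rendered SQL string are all hypothetical names chosen for the example. The sketch recognizes the "position at start, then loop reading and deleting each record" pattern and replaces it with a single intention, which is then rendered as one statement.

```python
from dataclasses import dataclass


# Hypothetical, highly simplified statements of a legacy program.
@dataclass(frozen=True)
class PositionAtStart:
    file: str


@dataclass(frozen=True)
class ReadNext:
    file: str


@dataclass(frozen=True)
class DeleteCurrent:
    file: str


@dataclass(frozen=True)
class Loop:
    body: tuple


# The intention we associate with the recurring pattern.
@dataclass(frozen=True)
class ClearFile:
    file: str


def recognize_clear_file(statements: list) -> list:
    """Replace 'position at start + loop(read, delete)' with ClearFile."""
    result = []
    i = 0
    while i < len(statements):
        current = statements[i]
        following = statements[i + 1] if i + 1 < len(statements) else None
        if (
            isinstance(current, PositionAtStart)
            and isinstance(following, Loop)
            and following.body
            == (ReadNext(current.file), DeleteCurrent(current.file))
        ):
            result.append(ClearFile(current.file))
            i += 2  # the pattern consumed two statements
        else:
            result.append(current)
            i += 1
    return result


def render(statement) -> str:
    """Render an intention as target code; only ClearFile is handled here."""
    if isinstance(statement, ClearFile):
        return f"DELETE FROM {statement.file.lower()}"
    raise NotImplementedError(type(statement).__name__)


legacy = [
    PositionAtStart("CUSTOMER"),
    Loop((ReadNext("CUSTOMER"), DeleteCurrent("CUSTOMER"))),
]
translated = [render(s) for s in recognize_clear_file(legacy)]
print(translated)
```

A low-level translation would instead reproduce the cursor positioning and the loop one statement at a time. Translating the intention is what keeps the output short and recognizable to a developer used to the target platform.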
Can’t we just use AI to migrate our code?
For a few months, we kept hearing this question. I have to say we are starting to hear it less frequently. Perhaps people gave it a try, and now it looks like a less appealing solution.
I get it: ChatGPT and the various LLMs that followed looked like magic at the time. So, I think it was fair to entertain the possibility of using these systems to migrate our codebases. Well, I think that there are a few challenges with that:
- Protecting your Intellectual property: Of course, you may want to use an LLM running on your hardware if you want to be sure that your intellectual property remains, in effect, just yours. This requires setting up a non-trivial system and learning how to configure it properly.
- The span of attention: models like GPT-4 can consider a certain amount of input, which is way, way smaller than the size of an entire codebase. This means they cannot consider the entire codebase in the translation, only snippets. It is as if the AI solution were aware of only a fraction of the codebase at a time, remaining blind to the overall picture. You may want to translate a function differently depending on whether it is invoked in a critical path or not invoked at all, or on whether it is used in a financial calculation (and so precision matters). Context matters in a translation.
- Reliability: even if LLMs can produce code that looks correct, the question is, can you trust that the entire codebase will be translated correctly? Probably not. Among the millions of lines you feed to it, some may not be translated correctly. The problem is that you need to find them. You need to carefully examine the code to ensure it has been translated correctly. Can you afford to do that? Is there a risk that some errors could escape you? If you think the LLM solution will translate your code perfectly, try asking it to translate the same non-trivial piece of code twice: do you get the same answer?
- Configurability: let’s say that you get some code produced by an AI solution, verify it, and it sort of makes sense. Now, let’s say that you just prefer a different framework or a different programming style to be adopted. Getting the system to correct course based on your indications can be non-trivial and prove to be a frustrating exercise.
I will ask ChatGPT to check if you do not trust me!
Our secret sauce
We make pretty bold claims about our ability to translate code idiomatically. We also do not use the trendy solution of just sprinkling some AI magic in our solution and hoping for the best.
So we had better be able to explain how we do what we do.
There are a few ingredients:
- We create refined models of the code. These models are built in steps by processing and refining code. We build a lexical model, a parse tree model, and an Abstract Syntax Tree (AST) model. We then transform it into a graph by performing semantic enrichment. During this latter phase, we recognize the type of each expression, and we connect references to their corresponding declarations
- We use frameworks we developed over the years. All of our models are expressed through frameworks that are part of the StarLasu family. These frameworks are based on the experience accumulated working with EMF, JetBrains MPS, and other frameworks. We then added a few ideas of our own that emerged from all the lessons learned working on dozens of Language Engineering projects
- We developed tools to work with code. From StarLasu IDEA, the IDE plugin we use to develop and test parsers, to the Code Insight Studio, an application we use to explore codebases when planning migrations
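To give a feel for the staged models described in the first ingredient, here is a deliberately tiny sketch. It does not use StarLasu; the toy language (`name : type ;` declares a variable, `name ;` refers to one) and all class and function names are assumptions made for illustration, and the parse tree stage is folded into the parser for brevity. The point is the last step: semantic enrichment turns a tree into a graph by linking each reference to its declaration and recording its type.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class VarDecl:
    name: str
    type_name: str


@dataclass
class VarRef:
    name: str
    declaration: Optional[VarDecl] = None  # filled in by semantic enrichment
    type_name: Optional[str] = None  # filled in by semantic enrichment


def tokenize(source: str) -> list[str]:
    """Lexical model: split the source into identifiers and punctuation."""
    return re.findall(r"[A-Za-z_]\w*|[:;]", source)


def parse(tokens: list[str]) -> list:
    """AST model of the toy language (assumes well-formed input)."""
    nodes, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i + 1] == ":":
            nodes.append(VarDecl(name=tokens[i], type_name=tokens[i + 2]))
            i += 4  # name, ':', type, ';'
        else:
            nodes.append(VarRef(name=tokens[i]))
            i += 2  # name, ';'
    return nodes


def enrich(nodes: list) -> None:
    """Semantic enrichment: link references to declarations and types."""
    declarations = {n.name: n for n in nodes if isinstance(n, VarDecl)}
    for node in nodes:
        if isinstance(node, VarRef):
            node.declaration = declarations.get(node.name)
            if node.declaration is not None:
                node.type_name = node.declaration.type_name


ast = parse(tokenize("total : decimal ; total ; missing ;"))
enrich(ast)
for node in ast:
    print(node)
```

After enrichment, the reference to `total` points at its declaration and carries the type `decimal`, while the reference to `missing` stays unresolved. A real pipeline would report that unresolved reference so that it can be investigated before translation.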
If you want to learn more about how we do things, this blog contains hundreds of articles explaining our work method.
What happens during the Migration CodeCraft
The Migration CodeCraft is always preceded by the Migration Blueprint. This means that we start from a shared understanding of how the migrated system should look. As part of the Migration Blueprint, we also identify a series of modules into which the application can be decomposed and the order in which the modules should be delivered, deployed, and tested.
So what we do is:
- Deliver the translation for a module to the Client
- Deliver the runtime library to the Client, to be used together with the translated module
- The Client deploys and tests the received code
- The Client provides any feedback on the code
- We act on the feedback received by providing an updated version of the code
- If there are portions of the code that the Client is unhappy with, we provide a refund for such lines
A few things to note:
- The Client receives the migrated code, which they own
- The Client receives the runtime library we develop to support the migrated code. While we retain ownership of the runtime library, the Client receives the source code of the runtime library, together with the license to use such code for each and every purpose they want.
What we want is to keep the ability to reuse part of the runtime library in the future. What the Client gets is the ability to go on their own way, evolving every piece of code on their own, without any need to pay for recurring licenses or be forced to pay for support
Which languages does Migration CodeCraft support?
You may be familiar with how most of our competitors work: They have built a solution to migrate from a certain language (let’s say, COBOL) to another language (let’s say, Java). They provide only this specific service and invest their energy in that particular transpiler.
We do things differently. We focus on refining the skills, tools, and methods for building transpilers. Because of this, we are able to tackle new migrations routinely.
This means two things for our Clients:
- We are the only ones that can cover not-so-common migrations. For example, if you need to migrate from a language such as EGL or to a language different from Java and C#, chances are we are the only ones able to help you
- We are used to writing transpilers. This means that even when we already have a transpiler for a certain migration, we can customize it significantly to meet the specific needs of each project.
Here are a few examples of migrations we have worked with:
- RPG to Java
- RPG to Python
- Visual Basic 6 to JavaScript
- VBA to C++
- SAS to Python
- PL/SQL to Java
- Teradata SQL to Redshift SQL
- EGL to Java
What result do you get at the end of the Migration CodeCraft?
At the end of the Migration CodeCraft the Client will get a migrated codebase built according to the specifications we agreed to during the Migration Blueprint. The code will be idiomatic and maintainable. The Client will have ownership of their migrated code, plus a very liberal license for a supporting runtime library. They will get the source code of that runtime library and the possibility to maintain it and evolve it on their own as they see fit.
Typically, after the migration, the Client is better positioned to maintain their codebase because of a few factors. Some are easy to overlook, so I think it is useful to list them:
- A lot of dead code is removed. Over the years, some code stops being used. However, code analysis tools for legacy languages are often lacking, so this unnecessary code is not identified as such, and it keeps being maintained, constituting unnecessary cruft. Dead code can be identified and removed as part of the migration
- A lot of duplicated code is refactored. Most legacy languages have no great mechanisms for building reusable abstractions. For this reason, code is typically copied and slightly modified. This means that over time, the codebase becomes unnecessarily bloated because of the duplications. These can be removed as part of the migration
- Coherent style is enforced across the codebase. Codebases that evolved over the decades and have been based on different evolutions of the same language, written by various contributors or even different teams, can be written with very different styles. The migration will make such a codebase much more homogeneous, adopting the same style for the entire application
- Tests are put in place. Typically, old codebases have no tests in place. Tests are created as part of the migration, protecting against future regressions
This means that the system can be maintained much more efficiently in addition to the benefits of using a more modern programming language.
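As an illustration of the dead code point above, one common way to find unused routines is a reachability analysis over the call graph. The sketch below is an assumption about how such an analysis can look, not a description of our tooling; the routine names and the call graph are made up for the example.

```python
def find_dead_routines(
    call_graph: dict[str, list[str]], entry_points: list[str]
) -> set[str]:
    """Return the routines that cannot be reached from any entry point."""
    reachable: set[str] = set()
    pending = list(entry_points)
    while pending:
        routine = pending.pop()
        if routine in reachable:
            continue
        reachable.add(routine)
        # Follow the calls made by this routine.
        pending.extend(call_graph.get(routine, []))
    return set(call_graph) - reachable


# Hypothetical call graph: each routine maps to the routines it calls.
call_graph = {
    "MAIN": ["INVOICE", "PRINT"],
    "INVOICE": ["PRINT"],
    "PRINT": [],
    "OLD_REPORT": ["PRINT"],
    "OLD_HELPER": [],
}

dead = find_dead_routines(call_graph, entry_points=["MAIN"])
print(sorted(dead))
```

Here `OLD_REPORT` and `OLD_HELPER` are never reached from `MAIN`, so they are candidates for removal. In practice the list of entry points must be complete (batch jobs, screens, externally called programs), otherwise live code would be flagged as dead.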
What about transmitting knowledge?
A factor that should be properly considered is transmitting knowledge, and that goes in both directions:
- Developers who developed using the legacy language will need to familiarize themselves with the new language if they are going to be part of the team maintaining the migrated systems
- Developers familiar with the modern language need to learn about the history of the system and its structures. They need to learn how to navigate such a codebase
So, we need to create a team that possesses a combined knowledge of business logic and new technologies.
A tool that we provide to our Clients is based on what we call Transpilation Traces. You can imagine a tool that lets you navigate the original code and the migrated code side by side, indicating what is translated into what, or, conversely, what originated from what.
By using this tool:
- A developer familiar with the old language can learn how code they are familiar with gets translated into the language they are learning. This helps the training process
- A developer familiar with the new language can see the original code corresponding to some code they are working with. This can shed some light on the original design of the system and help in asking questions to the original developers if they are still around
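Conceptually, the two uses above are lookups in opposite directions over the same set of links between original and migrated code. The following is a minimal sketch of such a bidirectional index, under our own assumptions: the `Span`, `Trace`, and `TraceIndex` names, the file names, and the line-based granularity are hypothetical and do not describe the actual format of Transpilation Traces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A range of lines in a file (inclusive on both ends)."""

    file: str
    first_line: int
    last_line: int

    def contains(self, file: str, line: int) -> bool:
        return self.file == file and self.first_line <= line <= self.last_line


@dataclass(frozen=True)
class Trace:
    """Links a span of original code to the span it was translated into."""

    source: Span
    target: Span


class TraceIndex:
    def __init__(self, traces):
        self._traces = list(traces)

    def targets_for(self, file: str, line: int) -> list[Span]:
        """What was this original line translated into?"""
        return [t.target for t in self._traces if t.source.contains(file, line)]

    def sources_for(self, file: str, line: int) -> list[Span]:
        """Where did this migrated line originate from?"""
        return [t.source for t in self._traces if t.target.contains(file, line)]


index = TraceIndex(
    [
        Trace(Span("ORDERS.rpgle", 10, 14), Span("OrderService.java", 42, 42)),
        Trace(Span("ORDERS.rpgle", 20, 20), Span("OrderService.java", 50, 55)),
    ]
)
print(index.targets_for("ORDERS.rpgle", 12))
print(index.sources_for("OrderService.java", 52))
```

The first lookup serves the developer who knows the old language, and the second serves the developer who knows the new one. Note that a trace can map several original lines to one migrated line, as happens when a recurring pattern is translated into a single idiomatic instruction.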
What happens after the Migration CodeCraft?
The answer is: They lived happily ever after.
This happens because the company is now in a great position to maintain and evolve its system.
This is the case when the code is maintainable, that is, idiomatic, concise, tested, and understood. It also requires the proper transmission of knowledge, supported by the Transpilation Traces mechanism implemented in our tooling.
Summary
We chose to name this service Migration CodeCraft to underscore the importance of software craftsmanship. This service intends to produce code that is well crafted, code that developers would be happy to work with, and that they could be confident to maintain and evolve.
It all starts with laying a solid foundation through the Migration Blueprint service. That service permits building a shared understanding and plan. Then, through the Migration CodeCraft, we turn that plan into reality. Finally, we let the Client continue their journey, with their codebase, knowing that we did the best that could be done to create the conditions for them to do great work. This is, after all, at the core of our motto: Better Tools for Better Work.
Resources
Discover more about the Migration Blueprint.