Create a Transpiler: From VBA to VB.NET

Create a Transpiler: From VBA to VB.NET

The companion repository of this tutorial unosviluppatore/transpiling-languages

In this tutorial we show how to create a transpiler, a software that can be useful in a few scenarios: if you have a large codebase in a language that does not fit your needs anymore, you still want to keep developing in one language, but you need better performance or need to run in a specific environment. In short, whenever there are two incompatible constraints: users need to write code in one language, but the execution requires them to use another language. In such cases, you need a transpiler.

You might need a transpiler because the original language might be obsolete, so a language that is not developed anymore. Or it might just be a language that does not fit your needs anymore. For instance, you might want to move from Pascal to C#. Or maybe you need a more powerful environment, so you want to convert from PL/SQL to Java.

However this is not the only reason: sometimes you still want to keep developing in one language, but for the execution you have requirements that the original language cannot meet. For example, many languages transpile to JavaScript because it is run in a browser. If you have a large codebase and expert developers in one language, but you need better performance, you might try transpiling to C++. This since this language offers very optimized compilers.

The example transpiler that we will work on for this article can be needed both reasons. There are companies that need to keep using VBA, because it is a simple language used by non-developers and inside Excel. However, they also need better performance or they need to mix calculations made by analysts with parts developed by professional developers. In such cases, they can transpile from VBA to VB.NET and let everything run on a single environment.

For instance, you might need to keep using the legacy codebase in VBA, but want to switch to use F# for new code, because it has better support for advanced mathematical computations. So, you transpile legacy code in VB.NET and you can mix it with F#, since they are both .NET languages.

What Is a Transpiler

A transpiler is a software that translate a language in another one. It allows to preserve the investment made to create a code base and use in another context, like another platform.

Let’s understand transpilers better, starting with a definition.

A transpiler, also known as a source-to-source compiler, is a software that translate code written in one language into another language.

In short, it is a special kind of compiler. A compiler usually transform source code into assembler or bytecode (to be run on the JVM or CLR, for example). So, you might say that a compiler transforms source code into any language, even one that is not human readable. While a transpiler transforms the source code in another language that is still human-readable. Then you still need a compiler to transform the new source code in an executable format.

Creating a transpiler seems a daunting exercise, but in this article we are going to see that it can be easily done once you have a good strategy. We know it, because we have written more than a few transpilers for our clients.

We are going to create a transpiler from Visual Basic for Applications (VBA) to Visual Basic.NET (VB.NET), but the principles apply to any transpiler.

How to Write a Transpiler

To build a transpiler there are usually 2-3 steps that need to be performed:

  1. Parsing the original source code
  2. Performing partial Symbol Resolution
  3. Converting the AST in the destination language

You need step 1 to transform the original source in an abstract syntax tree (AST), a logical representation of the source code that can be managed programmatically. And you need step 3 to transform the AST in the destination language. You need to perform step 2 only in some cases, if you need to understand more than simple parsing can tell you, in order to transpile an instruction.

You can read A Guide to Parsing: Algorithms and Terminology if you need an explanation of what is an AST.

Symbol Resolution in a Transpiler

Symbol resolution consists basically in linking a piece of text to the corresponding entity. For example, you might need to link all the times a variable name/identifier is used to that specific variable-entity.

In this code, the identifier/name text refers both times to the same local variable.

Symbol resolution is absolutely necessary for a compiler, it is also usually necessary to perform it partially when you are building a transpiler. You do not need to do any symbol resolution only when two languages are very similar. When you can translate each individual statement without looking up another statement. In this cases, we can simply trust the final compiler, that will check the correctness of the code we produce.

Why you might need to do that for a transpiler? Imagine the following piece of VBA code:

Are you trying to access the element at index 0 of the array data or you are trying to call the function data with argument 0? That is not immediately obvious from this instruction because in VBA you use the same syntax for both operations. So, you have to keep track of the declaration of data, whether is a variable or a function.

This might also be needed if you are transpiling from a language that allows first-level functions (e.g., C++) to one that does not allow them (e.g., Java). That is because you need to put all first-level functions in a custom class and then change all calls to first-level function in the original C++ code to functions of a custom Java class.

So, the kind and extent of symbol resolution that you have to do, depends on both the source language and the destination language and will vary from project to project.

Our Transpiler

In this article we are going to build only a partial transpiler from VBA to VB.NET, but we are going to show the whole process that could be used for a complete transpiler.

Since the two languages are quite similar, we can also avoid doing a complete conversion of the original AST from VBA to VB.NET. Instead we are going just to rewrite the parts of the input that must be adapted for VB.NET. This approach requires less work, but each change will require to find the exact place where to make the change. In contrast rewriting every instruction would be longer, but each instruction will be easier to write.

In both cases, the process is done the same way: we traverse the AST and perform the operations we need.

Since we are transpiling VBA in VB.NET, we do not really care about making a distinction between array indexing and function calls, because VB.NET also behaves in the same way. So, we will not need to change or keep track of instructions like the previous example data(0). However, our transpiler will need to perform an operation that needs the same approach: 2-passes. This simply means that we have to traverse the AST two times to accomplish a transformation. Usually the first pass is used to collect some information from multiple places, while the second pass uses that information to complete the transpilation.

We are going to see what that is later.

The Structure of the Transpiler

The workflow of our transpiler is:

  1. Parsing the original source with an ANTLR-based parser
  2. Making a first-pass of the code on the AST produced by ANTLR
  3. Taking the temporary and partially-transpiled code and parse it with Roslyn
  4. Performing a second-pass to complete the transpilation with the Roslyn visitor
  5. Compiling the transpiled code with Roslyn

Technically the step 5 is not part of the transpiler, but you probably want always want to compile the transpiled code, so we also perform this step. You generally want to do that to discover and report any error to the user. Hopefully these errors are in the original source code and not due to your transpiler. You want to do that because you are transpiling, not compiling the original source code, so you cannot know if the original source code is actually correct.

Usually you cannot easily programmatically compile the transpiled code and report any error, because the compiler is a completely separate program. However, thanks to Roslyn, the compiler-as-a-service, we can do that quite easily and report the results to the user. This could also offer the chance to perform automatically fixes to errors in the source code, in case you wanted to do that.

We do something unusual with our transpiler: we make the second-pass on the AST produced by Roslyn instead of doing that on the AST produced by the first-pass (i.e., ANTLR). This of course requires to parse the text twice, so it not the most efficient process. However, we choose to do that because Roslyn has a very advanced parser for VB.NET, this makes some changes easier to do on the AST produced by Roslyn than the one we have from the start.

The Main Program

The general structure of the transpiler is reflected by the main function of our program.

After initialization and cleanup (lines 3-13), we parse all input files in parallel (lines 16-20) and then list all errors we find (lines 23-30). Notice that we pass by reference the list of errors (line 19) so that we can modify it inside the function and get one list of errors.

We directly show the errors reported by the parser. We show parsing errors because we assume they are due to errors made by the user (i.e., our parser is not faulty). In other words, if we cannot parse the input this means that the original source code is syntactically incorrect.

Inside the ParseFile function we parse and also make the first-pass of transpilation. Usually this means that we gather information necessary to transpile some instructions. However, in this case we do that and also do a partial transpilation. We can do that here because, as we said, VBA and VB.NET are similar enough that we can change a VBA source code file piece by piece, until it becomes a VB.NET source code file.

After we have finished that, we complete the transpilation (line 32) and compile the resulting transpiled files (line 34).

Error!

Inside the function CompileCode we also communicate to the user any compilation error. This is a bit tricky because the errors might due to semantic errors in the original source code or mistakes/limitations of our transpiler. So, during development we have to find a way to discriminate between them and hopefully reduce the second type of errors to zero.

Furthermore, we have to make sure to communicate to the user all errors in relation to the original source code. For instance, imagine that  the user tried to add a string and an int, this is an user error and the compilation will fail. So, we have to communicate it to the user. However, the transpiled file seen by the compiler might have more lines than the original source file, so we have to keep track of which line in the original file caused the error for any affected instruction in the transpiled file.

Parsing VBA Source Files

There is not much to say about parsing, we simply use our trusted ANTLR and a ready-to-use grammar to get a parser with just a few lines. We load the input from the file, pass it to the lexer and then pass it to parser (lines 5-8).

Readers of our website should be familiar with ANTLR, in case you need more information you can read our tutorial. In the companion repository for this article, the parser is already generated, so you do not need to do anything special to take advantage of ANTLR. You might use it as any other library.

In the following lines (11-16) we replace the default error listener (which would output to the console) with our custom error listener. This listener will record all errors, which we will show later to the user.

Finally (lines 19) we take from the parser the root of the AST, so that we can start visiting the tree.

First-pass of Transpilation

We use a listener to perform the first-pass of our transpilation. We pass to the listener the stream of tokens (i.e., the output of the lexer) so that we can alter the input. The listener will also saves information needed for the second-pass.

On lines 4-5 we check whether the current file is the one that contains the main function, the one that is called at the beginning of the program. We do that to use the name of the file as the name of the program (i.e., the executable produced by the compiler).

We add the code modified by the first-pass/listener to a list of source code representing our program. These source code files will be given to Roslyn for the second-pass and compilation. We add the code to the list inside a lock-protected section to avoid any issue, because the parsing of the input file happens in parallel. Alternatively, we could have saved the output of all partially transpiled files and then added them together in the list after the parallel parsing. Given that we are parsing few files and there is a small chance of conflict, this is the most efficient way.

The FixerListener Class

Now that the whole organization is clear, let’s see how the first-pass looks like. In other words, let’s see the the FixerListener class that gather information for the second-pass and performs part of the transpilation.

It is a listener, so remember that the walker will automatically call the proper Enter and Exit methods whenever it finds the corresponding node.

Our listener inherits from the base listener that is generated by ANTLR. Inheriting from this listener allows us to implement only the methods we need. The rest remain the default empty one, like the following example.

The first two lines of the class (lines 3-4) define a CommonTokenStream and a TokenStreamRewriter, they are both needed to do the rewriting of parts of the input. The variable TokenImport (line 5) holds the position of an Option, that, if it is present, must be put before any other instruction in the transpiled file.

The lines 6-8 contain definitions for the transpilation of VBA Type elements. These are the elements that are transpiled to a VB.NET Structure. We need a two-pass solution to transpile them.

The rest is self-explanatory. Inside the constructor, on line 17, we check whether the filename given to us ends with an extension. If so, we eliminate the extension and save the result as the FileName that will have the transpiled file.

The First Steps

Now we can start seeing the first operations of a transpilation. For reference, this is a simple file created by exporting some VBA code from Excel.

To transpile this code to VB.NET we want to eliminate the header (VERSION etc.) and the config parts (Attribute etc.), but we need to keep the Option statement.

In the function EnterAttributeStmt, we first check whether the attribute VB_Name is present. If so, we use it to get the definitive name of the transpiled file. We then delete all attributes, including VB_Name on line 10. This is the same instruction that we use also to delete all header and config parts in the proper functions. These functions are not shown here, but you can see them in the companion repository.

In EnterModuleDeclarationsElement instead we check whether any Option is present. If so, we save the position where the last Option ends. We do this, because we have to add some Imports to the VB.NET files. Usually the Imports are at the beginning of the file, but an Option statement must be put before anything else. So, if an Option is present it must be put first.

Wrapping Everything with a Module

Before looking at some examples of how to transpile specific statements, let’s conclude the overall view. Most of the VBA code is in a bunch of sparse functions and variables. This is not permissible in VB.NET, so we have to put everything inside a module. We do this at the end of our traversal of the tree. By doing this way, we are sure that the elements we add here are exactly where they should be. So, they are not accidentally moved by something else.

On line 7 we gather all the imports that we need for our specific project. In this simple case we just store them in a string. We also add the start of the module declaration.

Then we do what we anticipated in the previous section: if there is an option we put the imports after that, otherwise we put the imports at the beginning of the file.

Finally, we put the End Module statement right at the end of the file.

Mind the Distance

As we said, VBA and VB.NET are quite similar, but there are some differences that we have to consider. In this section we see one of the common ones, we will see another one in the second-pass.

One difference pertains the Erase statement. In VBA this statement deletes and re-initializes the array. However, in VB.NET it basically assign the array to Nothing. Therefore after an Erase statement we would now have a variable that is invalid.

The solution that we use, on line 17, is to call the Array.Clear procedure to re-initialize every element of the array. However, there is a problem: this works for fixed-size arrays, but it does not work for dynamic size arrays. For these arrays, we could leave the Erase statement as it is to obtain the desired behavior.

The problem is that we have to understand whether we are dealing with fixed-size arrays or dynamic arrays. How can we do that? That would be job for a two-pass solution: in the first-pass we would record the kind of array declared in any declaration and in the second-pass we would transpile the Erase statements. We would need two passes because there are global declaration that can be in one file and used in other files. However we do not do any of this here to keep the article of a manageable length. Instead we see a more complex example of a problem that requires a two-pass solution.

Transpiling Types into Structures

A Type in VBA is a custom data type that can be created by the user. Looking at the VB.NET documentation we can see that a Structure is the corresponding element in VB.NET:

A structure is a generalization of the user-defined type (UDT) supported by previous versions of Visual Basic. In addition to fields, structures can expose properties, methods, and events. A structure can implement one or more interfaces, and you can declare individual access levels for each field.

Sounds goods, right? There is a problem, though. Try to transpile the following Type.

Into a corresponding Structure.

You will get an error from the compiler:

Arrays declared as structure members cannot be declared with an initial size

So, we can only transpile the array field Text in this way.

The problem, of course, is that the two pieces of code are obviously quite different. So, how do we solve this issue? We have to transpile the array fields in the only way we can, than we have to initialize them in some way.

Changing a Type into a Structure

We begin with changing a Type into a Structure.

Inside the function EnterTypeStmt we have to do a few things. First, we need to set the current type name and setup the element that will contain the initializer for the Structure (lines 4-6). The current type name will be used when visiting the declarations of the individual type fields (line 13).

Then we change the keyword Type into the keyword Structure (lines 8-9).

Finally, in the lines (11-16) we take the global visibility of the original Type and we apply it to every element of the Structure.

While we are checking every element of the type, we search for arrays declarations. If we find one, we copy the original array declaration from the VBA file (line 11) and save it for the initializer. Then we replace the original declaration with one for a dynamic array with the same number of dimensions.

What Do We Do with the Initializer?

Now we still have to understand where and how to use the initializer.

There first step is to write the code for the initializer at then of the Structure. Basically, we now have added a Sub called Init{name_of_Structure} with all the initialization code for the array. We have not shown it here, but the InitStructure class takes care of adding the beginning and end of the aforementioned Sub.

However, we still have to call the initializer whenever necessary. This will be done during the second-pass.

We Need an Entry Point

Now we have to take care of the main Sub of our program. The problem is that a C++ program requires a main function, so we must ensure that there is one.

The implementation of this function will varies depending on your needs. Here we just shows an example of what you can do. The main issue is that your initial VBA code might not have a proper Main procedure, but VB.NET needs one. In our example, we search for any Sub called Main_Run, Main_Sub or Main, then we ensure that the name of the Sub becomes Main. We also set the MainFile to true, so that we can use the name of this file as the name of the program.

The rest of the function is a bit arbitrary, we just do a bunch of things that we found useful. For instance, since VB.NET is culture-aware, it can change how it parses and output decimal number. A number like 5.6 might be outputted as 5,6 depending on who executes the program. On line 15, we set the current culture as an InvariantCulture, so that we always see decimal numbers with the dot (.) as decimal separator. Whether you want to do this obviously depends on your needs.

We also set two calls to measure how much time it takes to complete the program (lines 15,19). Finally, we force the console to not close until we press a key (line 18).

Notice that since we use the method InsertBefore, the actual position in the transpiled files will be reversed. So, for instance, what we add on line 15 will appear after what we add on line 16.

As we said, these are mostly arbitrary additions, but they are a simple demonstration that you can add whatever you want while you are transpiling. In a real scenario you might want to add memoization or perform some other optimizations to your code.

How We Can Get The Transpiled Code

The last method to see in the FixerListener class is the method that returns the modified text, i.e., the transpiled text.

This is very simple, thanks to the method GetText of the TokenStreamRewriter. Note that the rewriter does not actually change the input: it just records all changes that you add to the rewriter and then it plays them out when you ask for its text. This is very handy if you need to transform the input.

The TranspileCode Method

Since we have already seen the general structure of the program, we can directly go to the TranspileCode method. This is where we finish the transpilation. As you will remember, the method is called with argument the list of structures that have initializer. This data will be used in the second-pass.

What happens in the method is quite simple. We parse the source code as changed by the FixerListener, add the parse tree created by the Roslyn parser, after it has been changed again by the VBARewriter (lines 6-10). The VBARewriter is what performs the second-pass of our transpilation process.

After that we add the Runtime (line 12), which contains any additional code that we need. For instance, it may contain Excel methods or any other library function that you use in your code. If you can run the transpiled program on the .NET Framework, this will save your some time. That is because the .NET Framework already contains some utility functions, like Strings.Right, that are not part of .NET Core.

Finally we write the final transpiled files onto the disk with the correct extension. We will compile them later.

How to Initialize a Variable

All of the code in our VBARewriter is needed for the proper initialization. Despite the difference in terminology between ANTLR and Roslyn this is like a visitor. Actually there is a class in Roslyn called VisualBasicSyntaxVisitor, the difference is that that class cannot alter the visiting nodes, while this one can. Since this is a visitor, you can govern the path of the visit, so for instance you can stop the visit while visiting any node.

The overall strategy is quite simple:

  1. we have to find all declarations of variables with a type one of the Structure that necessitate an initializer
  2. we have to add a call to the initializer after every declaration

The execution is a bit complicated because there are two cases: the variable can be at the module level or inside a method. If the variable is declared inside a method, we can simply add an initializer after the declaration. However, if the variable is declared at the module level we have to add a module constructor (a Sub New()) and add the initializer there. That is because the variable could be used anywhere else in the program, like in other files. So, we cannot simply add an initializer before the first use of the variable.

Adding an Initializer for a Module Variable

As we said, the method to initialize a variable declared directly inside a module is the most complicated. So we are going to see that one.

Now we are going to see the first part, that collect the variables to initialize and create the statements to initialize them.

Logically the code is quite simple: we search for declaration of variables and check their type. If the type of a variable is one of a structure that requires an initializer, we add a statement to initialize it. The statements are stored in the list initInvocations, we do not add them right away.

The actual code seems a bit more complicated because of the Roslyn syntax. Roslyn is a very useful and powerful, but it makes a bit intricated to edit the source code. That is because by default everything is immutable. it is not hard to learn all the terminology of the different parts but requires a bit of time. However it is easy to understand it, while you are reading it in context.

For example, everything that has trivia in its name is unnecessary for the code, like whitespace or end of line characters. So, the code on line 16 means that we want the identifier of the type WithoutTrivia, without any starting or ending space.

In case you need a deeper introduction to Roslyn, you can see our article on Getting Started with Roslyn.

Initializing an Array of Structures

So, in the end, the only unclear part of this piece of code might be on the lines 20-27: we distinguish between the case of an array variable and a simple variable. Then we call the function CreateInitializer with the proper arguments. The reason become obvious once you see the transpiled code.

We obviously cannot call the method InitData directly on Examples because it is an array. Instead we can call it only on its individual elements.

We are going to see the function CreateInitializer later. Now we are going to see the rest of the method, that is how to create a constructor for the module.

The Module Constructor

The constructor for the module will contain all the initializations for the variables.

We create separately the delimiting instructions for the constructor (i.e., Sub New() and End Sub), then we add to the block of the Sub the statements for the initializations. We create the complete constructor on line 7.

The statements on lines 3-4 might seem a bit confusing, but they are actually quite simple. You can see more clearly with special formatting.

We are creating a Sub statement with the identifier New() and a space after the identifier. Then we add a newline both before and after the Sub statement.

Using the identifier to also add the empty parameter lists, might not be the cleaner way to add an empty parameter list. But it works. And the proper way is quite long.

The proper method to add list of parameters

Finally (line 9) we add the module constructor to the current module. Given that in Roslyn everything is immutable, we actually create a new module. We take the current module and add the module constructor to its Members field, then we assign the resulting new module to the current module.

Since we want to continue the visit, we call the method of the base class (line 12).

The CreateInitializer Method

The method that actually creates the initializer is quite short.

To initialize a simple field we just need to call the Init{name_of_Structure} method on the Structure (line 11). Instread to initialize an array we have to walk through all the elements of the array and call the Init{name_of_Structure} method on each of them.

Our example just works on arrays with one dimension (e.g., Data(10)) however, it is not hard to generalize the method to array of all dimensions. We would just have to add a nested for cycle for each dimension of the array.

Fixing a Subtle Difference

Now we are going to solve another issue due to a difference between VBA and VB.NET. In VBA you can start an array from the index 1, while in VB.NET you cannot. All arrays must start at index 0. All we have to do to solve this issue, is to check whether we have a fixed-size array that is initialized in a manner similar to the following example.

If that is the case, we check whether the first number is a literal that corresponds to 1. In case that is true, we change it to 0.

The VBARewriter class contains also the method VisitArgumentList that fixes this problem. We fix it here instead of the first pass because of the possible presence of array declarations as a field in a Type. If we solved this issue during the first-pass we could leave some Type fields wrong. That is because we would have copied the original array declaration of any array field in type without fixing it, so they could still contain an array that starts from 1.

For example, it could happen something like the following.

All we have to do is to find any 1 as first element in a range used inside an array declaration. Since the code is quite simple, you can see this method in the repository.

Compiling the Transpiled Code

Roslyn might make editing the original code a bit hard, but it makes compiling it very easy. We can see it by looking at the method CompileCode.

The steps are easy: we gather the source, then the assemblies and then we compile the code. To find the complete name of the assemblies we simply had to look manually in an existing project.

The constructor for the options variable (line 24) is more complicated than its necessary, you do not have to list all the options we picked. The only required one is the first: outputKind. However, if you are compiling programmatically the output of a transpiler it is frequent to want to ensure that all options are set the way you want them. This to avoid any false positive or unexpected changes in the results. There all self-explanatory options, except for deterministic.

The variable emit (line 35) contains the results of the compilation. The second part of the method deals with them.

If the compilation fails, we have to show errors to the user. We also save them to a file for logging purposes. In a real application, we would  send them to a logging system that would help us understanding when our transpiler fails. Collecting errors could also be useful to perform automatic fixes in case the users themselves made some common mistake.

In case compilation succeed the output directory contains the finished executable. However we might have to add any required .DLL file manually. For instance, if all our code uses a specific library we can add this way. This is needed for our Runtime code. That is also the reason because we copy a model SQLite database.

The Runtime

The Runtime we need is quite simple. Obviously your case might be different.

We add a couple of methods in the runtime to save data and printing on the console. The Debug class is usually available in VBA programs, while SaveData is a simple custom method that we have added to our runtime.

In the repository we have also included a simple example VBA file.

Conclusion

We have created a working transpiler. This is just a prototype, but in little more than 6 thousand words we have taught you how to create a transpiler, not bad. As we promised, there is nothing really complicated once you get passed the magic of the term transpiler. All you need to know are the principles of parsing and the important patterns, like how to use a visitor. The rest is just common programming knowledge.

The main thing that misses from a real transpiler is a good set of examples and tests. We are sure you already know how important testing is for software like parsers or transpilers. An error here can ruin the day of many programmers that uses your software.

There is no testing in this article simply because there is nothing different in testing a transpiler, compared to testing any other software.

 

Download the guide with 68 resources on Creating Programming Languages

68resources

Receive the guide to your inbox to read it on all your devices when you have time

Powered by ConvertKit