If you were using a Windows computer in the ’90s or early 2000s, many of the applications you ran were built with Visual Basic 6. But all good things come to an end: when Microsoft officially ended support for the VB6 IDE in 2008, an era concluded. Sic transit gloria mundi.

But VB6 applications didn’t disappear overnight, and today a number of crucial applications are still written in VB6.

Visual Basic for Applications (VBA) is a completely different beast. It has been the only way to script Excel, Access, and other Office tools for a long time. While it has not been discontinued, Microsoft has been gently pushing users to look for alternatives like TypeScript/Office Scripts and Python in Excel. The writing on the wall is pretty clear: it’s time to look for greener pastures.

So if you find yourself with loads of VB6 or VBA code that you want to analyze, map, or maybe migrate, where do you start? The answer is: by getting your hands on a parser for these languages. This article explains how to use Strumenta’s VB6 and VBA Parser, which parses VB6 and VBA source code and produces a structured syntax tree suitable for analysis, transformation, and migration.

What is the Visual Basic 6 and VBA Parser?

The Visual Basic 6 and VBA Parser is a piece of software that parses source code written in Visual Basic 6 and Visual Basic for Applications and produces an abstract syntax tree (AST). It can be obtained as a SaaS service (see CodeLumen) or as a library to embed in your applications.

The parser focuses on:

  • Accurately representing the syntactic structure of the language
  • Preserving source-level information
  • Supporting downstream tasks such as static analysis, documentation generation, and automated migration

The parser is designed as a foundational component. It provides the structured input required for more advanced processing stages.

Supported Languages and Dialects

The parser supports:

  • Visual Basic 6
  • Visual Basic for Applications (VBA)

VB6 and VBA share a large portion of their syntax, but differ in runtime environment and available libraries. The parser handles the shared language constructs while keeping the representation flexible enough to support different execution contexts.

References to host-specific elements (for example, the Excel object model in VBA) are parsed as well, as ordinary language constructs.

Installing the Parser

The VB6 and VBA Parser is distributed as a library and can be integrated into JVM-based projects.

Example using Gradle:

dependencies {
    implementation "com.strumenta:vb6-vba-parser:VERSION"
}

And using Maven:

<dependency>
  <groupId>com.strumenta</groupId>
  <artifactId>vb6-vba-parser</artifactId>
  <version>VERSION</version>
</dependency>

Parsing a VB6 or VBA File

Once the parser is available on the classpath, parsing a VB6 or VBA source file requires only a few lines of code.

Consider the following VBA example:

Public Function CalculateTotal(price As Double, quantity As Integer) As Double
    Dim total As Double
    total = price * quantity

    If total > 100 Then
        total = total * 0.9
    End If

    CalculateTotal = total
End Function

Parsing the Source Code

You can parse this file as follows:

import com.strumenta.vb6.parser.VB6Parser;
import com.strumenta.vb6.parser.ParsingResult;
import com.strumenta.vb6.ast.CompilationUnit;

import java.nio.file.Path;

public class Example {
    public static void main(String[] args) {
        VB6Parser parser = new VB6Parser();

        ParsingResult<CompilationUnit> result =
            parser.parse(Path.of("CalculateTotal.bas"));

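        if (result.isCorrect()) {
            // No syntax errors: the root of the AST is ready to be traversed.
            CompilationUnit ast = result.getRoot();
            System.out.println("Parsing completed successfully");
        } else {
            // Report every issue found while parsing.
            result.getIssues().forEach(System.err::println);
        }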
    }
}

If the source contains syntax errors, the parser reports them without failing abruptly, allowing you to inspect partial results when needed.

Inspecting the Abstract Syntax Tree

The abstract syntax tree (AST) produced by the parser exposes the structure of the VB6 or VBA program in a form that is easy to navigate programmatically.

At the top level, a CompilationUnit typically contains:

  • Module declarations
  • Class module declarations
  • Procedure declarations (Sub and Function)

Accessing Procedures

For example, to list all procedures defined in a module:

CompilationUnit ast = result.getRoot();

ast.getProcedures().forEach(proc -> {
    System.out.println(
        proc.getName() + " : " + proc.getClass().getSimpleName()
    );
});

This allows you to distinguish between Sub and Function procedures, and to inspect:

  • Their visibility (Public, Private)
  • Their parameters and return types

Inspecting Statements

Inside a procedure, you can traverse its body:

proc.getBody().getStatements().forEach(stmt -> {
    System.out.println(stmt.getClass().getSimpleName());
});

Statements such as assignments, conditionals, and loops are represented explicitly, making it possible to:

  • Analyze control flow
  • Extract business rules
  • Perform transformations
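Listing the top-level statements is only the first step, because conditionals and loops contain further statements. The following is a minimal sketch of a recursive walk. It does not use the parser's API: the Stmt record and the hand-built sample body are hypothetical stand-ins, assumed here only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; the parser's real node classes may differ.
final class TraversalSketch {

    /** A statement with a kind label (e.g. "If") and the statements nested inside it. */
    record Stmt(String kind, List<Stmt> children) {
        static Stmt leaf(String kind) {
            return new Stmt(kind, List.of());
        }
    }

    /** Depth-first, pre-order walk that collects the kind of every statement. */
    static List<String> collectKinds(List<Stmt> body) {
        List<String> kinds = new ArrayList<>();
        for (Stmt stmt : body) {
            kinds.add(stmt.kind());
            kinds.addAll(collectKinds(stmt.children()));
        }
        return kinds;
    }

    /** Hand-built body mirroring CalculateTotal: Dim, assignment, If with one assignment, assignment. */
    static List<Stmt> calculateTotalBody() {
        return List.of(
            Stmt.leaf("Dim"),
            Stmt.leaf("Assignment"),
            new Stmt("If", List.of(Stmt.leaf("Assignment"))),
            Stmt.leaf("Assignment"));
    }

    public static void main(String[] args) {
        collectKinds(calculateTotalBody()).forEach(System.out::println);
    }
}
```

With a real AST, the same pattern applies: visit a node, then recurse into whatever child statements it exposes.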

VB-Specific Aspects in the AST

The AST reflects several characteristics specific to VB6 and VBA:

  • Explicit representation of Dim, Public, and Private declarations
  • Optional typing and Variant types
  • Distinction between modules and class modules
  • Structured representation of If, For, While, and Select Case

These aspects are difficult to handle reliably using text-based approaches, but become straightforward once the code is parsed into a proper syntax tree.
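To illustrate why text-based approaches are fragile, the sketch below counts lines that contain the keyword If using a naive regular expression. The sample lines and the class name are made up for this illustration. Only one of the lines starts a real If statement, yet the pattern matches four of them.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: shows how a naive text search over-counts If statements.
final class TextSearchPitfall {

    // Naive pattern: the keyword "If" as a whole word, anywhere on the line.
    private static final Pattern NAIVE_IF = Pattern.compile("\\bIf\\b");

    /** Number of lines on which the naive pattern finds a match. */
    static long naiveIfCount(List<String> lines) {
        return lines.stream().filter(line -> NAIVE_IF.matcher(line).find()).count();
    }

    static List<String> sample() {
        return List.of(
            "' If the total is large, apply a discount", // comment, not code
            "label = \"If in doubt, ask\"",               // string literal
            "If total > 100 Then",                        // the only real If statement
            "    total = total * 0.9",
            "End If");                                    // closes the statement above
    }

    public static void main(String[] args) {
        System.out.println("Lines matched by the naive search: " + naiveIfCount(sample()));
    }
}
```

A syntax tree avoids the problem by construction: comments, string literals, and closing keywords are never represented as If statement nodes.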

Typical Use Cases

The VB6 and VBA Parser can be used in a variety of scenarios. In all of them, parsing is the essential first step.

Static Analysis and Code Metrics

Once VB6 or VBA code is parsed, its structure becomes explicit in the abstract syntax tree. This allows analyses that rely on structural information rather than text processing.

Typical metrics that can be derived directly from the AST include:

  • Number of modules, class modules, and procedures
  • Size and complexity of procedures, measured in statement count or nesting depth
  • Use of specific language constructs (for example GoTo, On Error, or Select Case)
  • Distribution of variable declarations and scopes

Because these analyses operate on the AST, they are resilient to formatting differences and coding style variations.
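As a rough sketch of how such metrics could be computed, the code below counts statements, measures nesting depth, and counts occurrences of a given construct. The Stmt record is a hypothetical stand-in for the parser's statement nodes, and the kind labels ("GoTo", "OnError") are assumptions made for this example.

```java
import java.util.List;

// Hypothetical stand-in types; not the parser's actual API.
final class MetricsSketch {

    record Stmt(String kind, List<Stmt> children) {
        static Stmt leaf(String kind) {
            return new Stmt(kind, List.of());
        }
    }

    /** Total number of statements, including nested ones. */
    static int statementCount(List<Stmt> body) {
        int count = 0;
        for (Stmt s : body) {
            count += 1 + statementCount(s.children());
        }
        return count;
    }

    /** Maximum nesting depth: 0 for an empty body, 1 for a flat body. */
    static int maxDepth(List<Stmt> body) {
        int deepest = 0;
        for (Stmt s : body) {
            deepest = Math.max(deepest, 1 + maxDepth(s.children()));
        }
        return deepest;
    }

    /** Number of statements of the given kind, at any nesting level. */
    static int countKind(List<Stmt> body, String kind) {
        int count = 0;
        for (Stmt s : body) {
            if (s.kind().equals(kind)) {
                count++;
            }
            count += countKind(s.children(), kind);
        }
        return count;
    }

    /** On Error, then a For loop containing an If (with a GoTo) and an assignment, then a GoTo. */
    static List<Stmt> sample() {
        return List.of(
            Stmt.leaf("OnError"),
            new Stmt("For", List.of(
                new Stmt("If", List.of(Stmt.leaf("GoTo"))),
                Stmt.leaf("Assignment"))),
            Stmt.leaf("GoTo"));
    }

    public static void main(String[] args) {
        List<Stmt> body = sample();
        System.out.println("statements = " + statementCount(body));
        System.out.println("max depth  = " + maxDepth(body));
        System.out.println("GoTo count = " + countKind(body, "GoTo"));
    }
}
```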

Dependency and Call Graph Extraction

The parser represents procedure declarations and procedure invocations as distinct nodes in the AST. This makes it possible to identify relationships between code elements.

By traversing the AST, you can:

  • Collect procedure definitions
  • Identify call expressions within procedure bodies
  • Build call graphs at the module or application level
  • Detect unreachable or unused procedures

This information is often required before performing refactoring or restructuring work.
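The steps above can be sketched as follows. The Procedure record is a hypothetical stand-in that assumes the callee names have already been collected from the call nodes of each procedure body. Names are compared exactly here, whereas VB identifiers are case-insensitive, so a real implementation would normalize them first.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in types; not the parser's actual API.
final class CallGraphSketch {

    /** A procedure and the names it calls, as collected from its body. */
    record Procedure(String name, List<String> callees) {}

    /** Maps each procedure name to the set of procedures it calls. */
    static Map<String, Set<String>> buildGraph(List<Procedure> procedures) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (Procedure p : procedures) {
            graph.put(p.name(), new LinkedHashSet<>(p.callees()));
        }
        return graph;
    }

    /** Procedures that cannot be reached from the entry points (iterative depth-first search). */
    static Set<String> unreachable(Map<String, Set<String>> graph, Set<String> entryPoints) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(entryPoints);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!visited.add(current)) {
                continue; // already explored
            }
            for (String callee : graph.getOrDefault(current, Set.of())) {
                pending.push(callee);
            }
        }
        Set<String> result = new LinkedHashSet<>(graph.keySet());
        result.removeAll(visited);
        return result;
    }

    static List<Procedure> sample() {
        return List.of(
            new Procedure("Main", List.of("LoadData", "Report")),
            new Procedure("LoadData", List.of("ParseRow")),
            new Procedure("Report", List.of()),
            new Procedure("ParseRow", List.of()),
            new Procedure("OldExport", List.of("ParseRow")));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = buildGraph(sample());
        System.out.println("Unreachable from Main: " + unreachable(graph, Set.of("Main")));
    }
}
```

The result should be read as a list of candidates rather than a verdict: event handlers are invoked by the runtime, and late-bound calls do not show up as direct references, so both need to be added as entry points or checked separately.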

Business Rule Discovery

In VB6 and VBA systems, business rules are frequently embedded in conditional logic and control-flow constructs.

The parser exposes:

  • Conditional statements (If, Select Case)
  • Loop constructs
  • Expressions used in conditions and assignments

This makes it possible to locate and extract sections of code that implement decision logic, without relying on naming conventions or manual inspection. At this stage, the extraction is purely syntactic; semantic interpretation can be layered on top if needed.
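A purely syntactic extraction of this kind might look like the sketch below, which collects every condition together with the procedure it appears in. The Stmt and Rule records are hypothetical stand-ins, and conditions are kept as plain text for brevity; a real AST would expose them as expression nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; not the parser's actual API.
final class RuleDiscoverySketch {

    /** condition is null for statements that carry none, such as assignments. */
    record Stmt(String kind, String condition, List<Stmt> children) {}

    /** A candidate business rule: where it was found and which condition guards it. */
    record Rule(String procedure, String kind, String condition) {}

    /** Collects every condition-bearing statement in the body, at any nesting level. */
    static List<Rule> findRules(String procedure, List<Stmt> body) {
        List<Rule> rules = new ArrayList<>();
        collect(procedure, body, rules);
        return rules;
    }

    private static void collect(String procedure, List<Stmt> body, List<Rule> out) {
        for (Stmt s : body) {
            if (s.condition() != null) {
                out.add(new Rule(procedure, s.kind(), s.condition()));
            }
            collect(procedure, s.children(), out);
        }
    }

    /** Hand-built body mirroring the CalculateTotal example. */
    static List<Stmt> calculateTotalBody() {
        return List.of(
            new Stmt("Dim", null, List.of()),
            new Stmt("Assignment", null, List.of()),
            new Stmt("If", "total > 100", List.of(
                new Stmt("Assignment", null, List.of()))),
            new Stmt("Assignment", null, List.of()));
    }

    public static void main(String[] args) {
        findRules("CalculateTotal", calculateTotalBody()).forEach(System.out::println);
    }
}
```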

Documentation Generation

The abstract syntax tree provides a structured view of the code that can be used to generate documentation automatically.

Using the AST, it is possible to:

  • Enumerate modules, classes, and procedures
  • Extract procedure signatures, parameters, and return types
  • Associate comments with the corresponding syntactic elements

The resulting documentation reflects the actual structure of the code, rather than a textual approximation.
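As an illustration, the sketch below renders a procedure signature and its description from structured data. The Param and Procedure records are hypothetical stand-ins for the information an AST would provide, and the output layout is an arbitrary choice made for this example.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in types; not the parser's actual API.
final class DocSketch {

    record Param(String name, String type) {}

    /** returnType is null for Sub procedures; comment is null when no comment is attached. */
    record Procedure(String visibility, String name, List<Param> params,
                     String returnType, String comment) {}

    /** Rebuilds a VB-style signature from the structured data. */
    static String signature(Procedure p) {
        String params = p.params().stream()
            .map(param -> param.name() + " As " + param.type())
            .collect(Collectors.joining(", "));
        String kind = p.returnType() == null ? "Sub" : "Function";
        String suffix = p.returnType() == null ? "" : " As " + p.returnType();
        return p.visibility() + " " + kind + " " + p.name() + "(" + params + ")" + suffix;
    }

    /** One documentation entry: the signature followed by an indented description. */
    static String render(Procedure p) {
        String description = p.comment() == null ? "(no description)" : p.comment();
        return signature(p) + "\n    " + description;
    }

    static Procedure calculateTotal() {
        return new Procedure(
            "Public",
            "CalculateTotal",
            List.of(new Param("price", "Double"), new Param("quantity", "Integer")),
            "Double",
            null);
    }

    public static void main(String[] args) {
        System.out.println(render(calculateTotal()));
    }
}
```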

Automated Migration to Modern Languages

In migration scenarios, the parser is used to obtain a precise representation of the source code that can be transformed programmatically.

The AST makes explicit:

  • Declarations
  • Statements
  • Expressions
  • Control-flow constructs

This representation can then be used as input to later phases such as semantic analysis, normalization, and code generation for target languages like C#, Java, or Python. The parser itself is responsible only for the syntactic step, but the correctness of subsequent phases depends on it.
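To give an idea of what a code generation phase does with such a representation, here is a deliberately tiny sketch that turns an If statement and an assignment into Java-like text. All the node types are hypothetical stand-ins, and only operators that are spelled the same in both languages are handled. A real migration also needs the semantic phases mentioned above, for example to turn the assignment to the function name into a return statement.

```java
import java.util.List;

// Hypothetical stand-in types; not the parser's actual API.
final class CodeGenSketch {

    interface Expr {}
    record Num(String text) implements Expr {}
    record Var(String name) implements Expr {}
    record BinOp(String op, Expr left, Expr right) implements Expr {}

    interface Stmt {}
    record Assign(String target, Expr value) implements Stmt {}
    record If(Expr condition, List<Stmt> thenBranch) implements Stmt {}

    /** Renders an expression; nested binary operations are parenthesized. */
    static String expr(Expr e) {
        if (e instanceof Num n) return n.text();
        if (e instanceof Var v) return v.name();
        if (e instanceof BinOp b) return operand(b.left()) + " " + b.op() + " " + operand(b.right());
        throw new IllegalArgumentException("Unsupported expression: " + e);
    }

    private static String operand(Expr e) {
        return e instanceof BinOp ? "(" + expr(e) + ")" : expr(e);
    }

    /** Renders a statement with the given indentation, one construct per line. */
    static String stmt(Stmt s, String indent) {
        if (s instanceof Assign a) {
            return indent + a.target() + " = " + expr(a.value()) + ";\n";
        }
        if (s instanceof If i) {
            StringBuilder sb = new StringBuilder(indent + "if (" + expr(i.condition()) + ") {\n");
            for (Stmt inner : i.thenBranch()) {
                sb.append(stmt(inner, indent + "    "));
            }
            return sb.append(indent).append("}\n").toString();
        }
        throw new IllegalArgumentException("Unsupported statement: " + s);
    }

    /** The discount rule from CalculateTotal: If total > 100 Then total = total * 0.9. */
    static Stmt discountRule() {
        return new If(
            new BinOp(">", new Var("total"), new Num("100")),
            List.of(new Assign("total", new BinOp("*", new Var("total"), new Num("0.9")))));
    }

    public static void main(String[] args) {
        System.out.print(stmt(discountRule(), ""));
    }
}
```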

Supporting Code Understanding Tools

The VB6 and VBA Parser can also be used as a component in systems that analyze or inspect codebases.

In such systems, the parser:

  • Converts source files into structured data
  • Enables indexing and traversal of code elements
  • Serves as input for further analyses, summaries, or visualizations

This allows higher-level tooling to operate on VB6 and VBA code at the level of language constructs rather than raw text.

Common Challenges with VB6 and VBA Code

VB6 and VBA were designed at a time when rapid application development was often valued more than strict typing, encapsulation, or static analyzability. As a consequence, code written in these languages tends to exhibit characteristics that make automated analysis and transformation more difficult than in more modern languages.

Code Issues

One of the most common issues is the extensive use of implicit typing and the Variant type. Variables can be declared without an explicit type, and even when types are present, values frequently flow through Variant-typed expressions. From a static analysis point of view, this means that type information is often incomplete or absent at the syntactic level. A parser cannot infer the runtime type of a Variant variable, but it can accurately record where type information is present, where it is missing, and how values flow syntactically through assignments and expressions. This structural information is a prerequisite for any later type inference or data-flow analysis.
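A simple analysis built on this information is listing the variables that end up as Variant without the author saying so. A classic case is Dim a, b As Integer, where only b is an Integer and a is a Variant. The sketch below assumes a hypothetical Decl record holding what a declaration node would expose, and it ignores type-declaration characters (such as total#) and Deftype statements, both of which a complete analysis would need to consider.

```java
import java.util.List;

// Hypothetical stand-in types; not the parser's actual API.
final class VariantSketch {

    /** declaredType is null when the declaration has no "As" clause. */
    record Decl(String name, String declaredType) {}

    /** Variables that default to Variant because no type was written. */
    static List<String> implicitVariants(List<Decl> declarations) {
        return declarations.stream()
            .filter(d -> d.declaredType() == null)
            .map(Decl::name)
            .toList();
    }

    /** Variables explicitly declared "As Variant". */
    static List<String> explicitVariants(List<Decl> declarations) {
        return declarations.stream()
            .filter(d -> "Variant".equalsIgnoreCase(d.declaredType()))
            .map(Decl::name)
            .toList();
    }

    /** Mirrors: Dim a, b As Integer / Dim c As Variant / Dim d */
    static List<Decl> sample() {
        return List.of(
            new Decl("a", null),
            new Decl("b", "Integer"),
            new Decl("c", "Variant"),
            new Decl("d", null));
    }

    public static void main(String[] args) {
        System.out.println("Implicit Variant: " + implicitVariants(sample()));
        System.out.println("Explicit Variant: " + explicitVariants(sample()));
    }
}
```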

Another source of complexity is late binding and dynamic behavior, especially when VB6 or VBA code interacts with COM objects or external libraries. Method calls and property accesses may only be resolved at runtime, depending on the actual object instance involved. While this limits what can be determined statically, these operations still appear in the source code as well-defined syntactic constructs. By representing calls, member accesses, and expressions explicitly in the abstract syntax tree, the parser allows later analysis phases to distinguish between statically resolvable constructs and those that require conservative handling due to dynamic binding.
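One conservative way to make that distinction is to look at the declared type of the receiver: member calls on variables declared As Object or As Variant are resolved at runtime, so they are treated as late-bound. The sketch below assumes a hypothetical MemberCall record and a map of declared types keyed by lowercase variable name. A receiver with no known declaration is also classified as late-bound, which is an assumption chosen to err on the safe side.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in types; not the parser's actual API.
final class BindingSketch {

    enum Binding { EARLY, LATE }

    /** A member access such as rs.MoveNext: receiver "rs", member "MoveNext". */
    record MemberCall(String receiver, String member) {}

    /** declaredTypes maps lowercase variable names to the type written in their declaration. */
    static Binding classify(MemberCall call, Map<String, String> declaredTypes) {
        String type = declaredTypes.get(call.receiver().toLowerCase(Locale.ROOT));
        boolean late = type == null // unknown receiver: handle conservatively
            || type.equalsIgnoreCase("Object")
            || type.equalsIgnoreCase("Variant");
        return late ? Binding.LATE : Binding.EARLY;
    }

    /** Mirrors: Dim xl As Object / Dim rs As ADODB.Recordset */
    static Map<String, String> sampleTypes() {
        return Map.of("xl", "Object", "rs", "ADODB.Recordset");
    }

    public static void main(String[] args) {
        List<MemberCall> calls = List.of(
            new MemberCall("xl", "Workbooks"),
            new MemberCall("rs", "MoveNext"));
        for (MemberCall call : calls) {
            System.out.println(call.receiver() + "." + call.member()
                + " -> " + classify(call, sampleTypes()));
        }
    }
}
```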

Environment Issues

VB6 and VBA codebases also tend to rely heavily on global state and loosely defined modular boundaries. Public variables and procedures declared at the module level are often accessed freely across the application, leading to implicit dependencies that are not immediately obvious when reading the code. By making scopes and declarations explicit in the syntax tree, the parser exposes these dependencies and allows tools to reason about coupling, data sharing, and cross-module interactions systematically.
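As a sketch of such reasoning, the code below reports which public module-level variables are referenced from other modules. The Module record is a hypothetical stand-in that assumes declarations and referenced names have already been collected from the AST. The matching is by exact name and ignores both shadowing by local declarations and VB's case-insensitivity, which a real analysis would handle through symbol resolution.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in types; not the parser's actual API.
final class SharedStateSketch {

    /** A module, the variables it declares Public, and the names its code refers to. */
    record Module(String name, Set<String> publicVariables, Set<String> referencedNames) {}

    /** Maps "Module.variable" to the other modules that reference that variable. */
    static Map<String, Set<String>> sharedState(List<Module> modules) {
        Map<String, Set<String>> usage = new TreeMap<>();
        for (Module owner : modules) {
            for (String variable : owner.publicVariables()) {
                Set<String> users = new TreeSet<>();
                for (Module other : modules) {
                    boolean isOwner = other.name().equals(owner.name());
                    if (!isOwner && other.referencedNames().contains(variable)) {
                        users.add(other.name());
                    }
                }
                if (!users.isEmpty()) {
                    usage.put(owner.name() + "." + variable, users);
                }
            }
        }
        return usage;
    }

    static List<Module> sample() {
        return List.of(
            new Module("Globals", Set.of("gCurrentUser", "gDebug"), Set.of()),
            new Module("Orders", Set.of(), Set.of("gCurrentUser", "total")),
            new Module("Reports", Set.of(), Set.of("gCurrentUser")));
    }

    public static void main(String[] args) {
        sharedState(sample()).forEach((variable, users) ->
            System.out.println(variable + " is used by " + users));
    }
}
```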

Finally, VBA code is tightly coupled to the host environment in which it runs, such as Excel, Access, or Word. Many identifiers in the code refer to objects provided by the host application rather than user-defined constructs. The behavior of the code therefore depends not only on the language itself, but also on the surrounding execution context. The parser remains deliberately agnostic with respect to host semantics: it parses these references as part of the language, without attempting to resolve them. This keeps the parsing phase independent and reusable, while still preserving all the information needed to integrate host-specific knowledge in later analysis stages.
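A later stage could integrate host knowledge along these lines: identifiers found in the AST are checked first against user-defined symbols and then against a catalog of host names. In the sketch below the catalog is a small, hand-picked subset of Excel's global members, and letting user symbols take precedence over host names is an assumption of this example.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a post-parsing step; not part of the parser's API.
final class HostCatalogSketch {

    enum Origin { USER_DEFINED, HOST, UNKNOWN }

    /** Illustrative subset of Excel global members, stored in lowercase. */
    static final Set<String> EXCEL_GLOBALS = Set.of(
        "application", "activeworkbook", "activesheet", "worksheets", "range", "cells");

    /** Both symbol sets are expected in lowercase, since VB identifiers are case-insensitive. */
    static Origin classify(String identifier, Set<String> userSymbols, Set<String> hostSymbols) {
        String key = identifier.toLowerCase(Locale.ROOT);
        if (userSymbols.contains(key)) {
            return Origin.USER_DEFINED; // assumed to take precedence over host names
        }
        if (hostSymbols.contains(key)) {
            return Origin.HOST;
        }
        return Origin.UNKNOWN;
    }

    public static void main(String[] args) {
        Set<String> userSymbols = Set.of("calculatetotal");
        for (String id : List.of("CalculateTotal", "Range", "ActiveSheet", "Foo")) {
            System.out.println(id + " -> " + classify(id, userSymbols, EXCEL_GLOBALS));
        }
    }
}
```

Keeping the catalog outside the parser means the same parsing step can serve Excel, Access, or Word code, with only the catalog changing.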

VB6 and VBA systems often combine all of these characteristics in the same codebase. A robust parser does not eliminate these challenges, but it makes them explicit by providing a precise structural representation of the source code. This is what enables further analysis, transformation, or migration to be approached in a controlled and systematic manner.

From Parsing to Semantic Analysis

Parsing provides structure, but not meaning. For advanced use cases such as migration or deep analysis, additional steps are required, including:

  • Symbol resolution
  • Type inference
  • Control-flow and data-flow analysis

The VB6 and VBA Parser is designed to integrate into such pipelines, providing the structured representation required for semantic enrichment and transformation.
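Symbol resolution, the first of these steps, is commonly implemented with a chain of scopes: a name is looked up in the procedure, then in the enclosing module, then among the globals. The sketch below shows the idea with a hypothetical Scope class; it is not part of the parser, and real VB name resolution has more rules than this simple chain.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a semantic step built on top of the AST; not the parser's API.
final class ScopeSketch {

    static final class Scope {
        private final Scope parent;
        private final Map<String, String> symbols = new HashMap<>();

        /** parent is null for the outermost (global) scope. */
        Scope(Scope parent) {
            this.parent = parent;
        }

        /** Registers a name with a short description of its declaration. */
        void declare(String name, String description) {
            symbols.put(key(name), description);
        }

        /** Looks the name up in this scope first, then in each enclosing scope. */
        Optional<String> resolve(String name) {
            for (Scope scope = this; scope != null; scope = scope.parent) {
                String found = scope.symbols.get(key(name));
                if (found != null) {
                    return Optional.of(found);
                }
            }
            return Optional.empty();
        }

        // VB identifiers are case-insensitive, so names are normalized before lookup.
        private static String key(String name) {
            return name.toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        Scope global = new Scope(null);
        global.declare("TaxRate", "Public Const in module Globals");
        Scope moduleScope = new Scope(global);
        Scope procedure = new Scope(moduleScope);
        procedure.declare("total", "local variable As Double");

        System.out.println(procedure.resolve("total"));
        System.out.println(procedure.resolve("TAXRATE"));
        System.out.println(procedure.resolve("missing"));
    }
}
```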

Conclusion

VB6 and VBA systems continue to play a critical role in many organizations. Understanding and modernizing these systems requires reliable tooling, starting with accurate parsing.

The VB6 and VBA Parser provides a solid foundation for analyzing, transforming, and modernizing Visual Basic code. The same parser described in this article is also used in platforms like CodeLumen to support large-scale exploration and understanding of legacy VB6 and VBA codebases.
