The rise of Large Language Models (LLMs) has opened up a lot of possibilities (some even good!), but they come with core limitations: knowledge cutoffs and hallucinations. This is why Retrieval-Augmented Generation (RAG) was developed. It allows you to analyze proprietary data and add new knowledge without retraining, while reducing hallucinations. If the LLM uses only the data you gave it to answer, there is less chance that it will hallucinate.

Key Takeaways

However, the default approach has issues when you need to ask questions about your code: it sees the code just as text. Instead, you want to analyze code structure and relations to build a better system for retrieving information from code. In this article we show how to improve your RAG using parsers and symbol resolvers.

What Exactly Is RAG?

In simple terms, RAG is a technique that uses a retrieval mechanism to supply information to an LLM. Instead of relying solely on the LLM’s internal knowledge, a RAG system first searches a curated knowledge base for relevant documents, then includes those documents in the prompt given to the LLM.

This process transforms the LLM from a generic answer-generator into a domain-specific expert by giving it a reliable, up-to-date, and verifiable source of information for every query.

Adding pertinent information also reduces hallucinations, since the LLM doesn’t need to cover the gaps in its knowledge by speculating. In a quiz, the historian Barbero asked participants to find out what Napoleon said to his brother before his coronation as emperor. In reality, he said: “If our father could see us!”. However, when people asked ChatGPT about it, it fabricated plausible-sounding statements that had no historical basis.

Why a RAG Instead of Trusting the LLM?

A RAG can reduce hallucinations and ensure that only a specific and known source of information is used. If you are analyzing something like a private codebase, you need a RAG, as the code was not part of the LLM’s training data.

Honestly, I find the idea of RAG kinda hacky: necessary given LLM limitations, but hacky nonetheless. The LLM doesn’t really have a separate internal store of knowledge; its understanding of concepts is mixed with the information itself. It cannot systematically abstract, so it cannot tell when it lacks the information to analyze a specific case, or when it needs to focus on something. It uses everything it knows to answer every question you ask it.

A RAG is essentially a workaround to increase the chances that the LLM looks at exactly what we want and interprets it the way we intend. We change the user’s original question by injecting into the prompt the instruction to answer it using just the information we deem relevant.

So, we are forcing the LLM to ground its answer in the facts we provide. It is not as powerful as we would want it to be (i.e., just look at all the meaningful data for this question), but it usually works, as long as we can select the right documents to provide context.

This approach is not just more reliable, but also much cheaper than the alternative of refining your model with extra training.

The Standard RAG Pipeline: A Technical Breakdown

A standard RAG pipeline is built in two main phases.

1. The Indexing Phase: Turning Data into Vectors

  • Collection: Your source documents (code, manuals, etc.) are collected.
  • Splitting: Documents are broken down into smaller, manageable chunks called Text Segments, because LLMs have a token limit.
  • Embedding: An Embedding Model converts each text segment into a numerical vector (an embedding). This vector should capture the semantic meaning of the text.
  • Storage: These embeddings are stored in an Embedding Store (a vector database).

2. The Retrieval Phase: Answering the Query

  • Query Embedding: The user’s query is also converted into an embedding using the same embedding model used during indexing.
  • Similarity Search: The query embedding is compared against all document embeddings in the store to find the most relevant documents (see the sketch after this list).
  • Augmentation: The text content of these relevant documents is added to the original user query, creating a final, augmented prompt.
  • Generation: This augmented prompt is sent to the LLM, which uses the provided context to generate a factual answer. Given that the information ends up in the prompt, we want to provide just the right information: too much information, and we distract the LLM (and make it slower or more expensive), too little, and it does not provide a good answer.
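
To make the similarity search step concrete, here is a toy sketch in Kotlin: cosine similarity over an in-memory list of (embedding, text) pairs. A real pipeline delegates this to an embedding store, but the principle is the same; the embed parameter stands in for the embedding model.

// Toy illustration of the similarity search step: cosine similarity over an
// in-memory list of (embedding, text) pairs. Real pipelines delegate this to
// a vector database.
fun cosine(a: FloatArray, b: FloatArray): Double {
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (kotlin.math.sqrt(normA) * kotlin.math.sqrt(normB))
}

fun retrieve(
    query: String,
    store: List<Pair<FloatArray, String>>, // (embedding, text segment)
    embed: (String) -> FloatArray,         // same model used at indexing time
    k: Int = 5
): List<String> {
    val queryEmbedding = embed(query)
    return store
        .sortedByDescending { (embedding, _) -> cosine(queryEmbedding, embedding) }
        .take(k)
        .map { (_, text) -> text }
}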

Mapping RAG Concepts to Elements in LangChain4J

We’ll use LangChain4J for our implementation, as it seems to be the most widespread library for building RAG pipelines in Kotlin. The core concepts map directly to LangChain4J classes:

  • ContentRetriever: The component responsible for taking the user query and finding the most relevant Text Segments (the chunks of information).
  • EmbeddingModel: Converts text (queries and documents) into numerical vectors. We use the BgeSmallEnV15QuantizedEmbeddingModel here, which seems to be the standard one.
  • EmbeddingStore: The vector database where the document embeddings are stored (e.g., InMemoryEmbeddingStore for simplicity).
  • RetrievalAugmentor: Ties everything together, combining the retrieved context with the user query to build the final prompt for the LLM.
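
To make the mapping concrete, here is a minimal sketch of how these pieces can be wired together. The Assistant interface, the Ollama endpoint, and the model name are assumptions on our part; EmbeddingData is the small holder class returned by the ingestion code shown later.

// Minimal wiring sketch: ContentRetriever -> RetrievalAugmentor -> AiServices.
// The Assistant interface and the local chat model are assumptions.
interface Assistant {
    fun chat(userMessage: String): String
}

fun createAssistant(data: EmbeddingData): Assistant {
    val contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(data.embeddingStore)
        .embeddingModel(data.embeddingModel)
        .maxResults(5)
        .build()
    val retrievalAugmentor = DefaultRetrievalAugmentor.builder()
        .contentRetriever(contentRetriever)
        .build()
    val chatModel = OllamaChatModel.builder() // local model; see the note on model choice below
        .baseUrl("http://localhost:11434")
        .modelName("gemma3") // placeholder model name
        .build()
    return AiServices.builder(Assistant::class.java)
        .chatModel(chatModel) // chatLanguageModel(...) in older LangChain4J versions
        .retrievalAugmentor(retrievalAugmentor)
        .build()
}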

Setting Up the Baseline RAG Pipeline

Our goal is to create a RAG to help document and understand a codebase. We will use the open-source project FTPAPI as our example codebase and will ask questions like:

Describe the procedure FTP_binaryMode and how it is used

Since we are looking at a project that uses RPG, we are going to use our RPG parser to analyze the code. The techniques remain the same whatever language or parser you want to use.

Here is the setup for a basic RAG pipeline in Kotlin using LangChain4J.

// Basic setup structure 
fun startRAG(path: String) {
    println("Using RAG without Parser")
    val assistant: Assistant = createAssistant(
        embedDocuments(path)
    )
    startConversationWith(assistant, defaultQuery)
}

A RAG relies on an LLM. For the answers you see in this article’s examples, we are going to use a local Gemma 3 model. It is smart enough to be useful, but dumb enough that we are not overestimating the results in a real-life scenario. For production use, you may want a commercial model. In any case, testing is easy: you can swap models once you have built a good pipeline.
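
For instance, moving from the local model to a commercial one is just a different builder (a sketch; the model name and the environment variable are placeholders):

// Swap the local model for a commercial one without touching the rest
// of the pipeline. Model name and API key variable are placeholders.
val chatModel = OpenAiChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("gpt-4o-mini")
    .build()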

The Basic Approach: Treating Code as Just Text

The simplest way to implement a RAG for code is to treat the code as a generic text document. This approach works for any textual document. LangChain4J also comes with default parsers for formats like PDF or Word documents.

Here’s how we define the embedding process, using the standard DocumentSplitter. This is the recommended DocumentSplitter for generic text. It tries to split the document into paragraphs first, and fits as many paragraphs into a single TextSegment as possible. If some paragraphs are too long, they are recursively split into lines, then sentences, then words, and then characters, until they fit into a segment.

private fun embedDocuments(documentPath: String?): EmbeddingData {

    // Load the source files as generic text documents
    val documents = getDocuments(File(documentPath), setOf(
        "sqlrpgle", "rpgle", "cpybk", "ddl", "rpglem", "rpgleinc")
    )

    val embeddingModel: EmbeddingModel = BgeSmallEnV15QuantizedEmbeddingModel()
    val embeddingStore: EmbeddingStore<TextSegment?> = InMemoryEmbeddingStore()

    val ingestor =
        EmbeddingStoreIngestor.builder()
            // Generic text splitter: segments of at most 1000 characters,
            // with a 200-character overlap between consecutive segments
            .documentSplitter(
                DocumentSplitters.recursive(1000, 200)
            )
            // Prepend the file name to each segment to help retrieval
            .textSegmentTransformer(TextSegmentTransformer { textSegment: TextSegment ->
                TextSegment.from(
                    textSegment.metadata().getString("file_name") + "\n" + textSegment.text(),
                    textSegment.metadata()
                )
            })
            .embeddingModel(embeddingModel)
            .embeddingStore(embeddingStore)
            .build()

    ingestor.ingest(documents)

    return EmbeddingData(embeddingStore, embeddingModel)
}

The key problem is the DocumentSplitters.recursive call: it divides the document into blocks of roughly 1000 characters (with a 200-character overlap), essentially cutting the code at arbitrary points. In short, it does not understand code syntax. We need to split documents because raw code files can be too large for the LLM’s context window. Furthermore, giving too much information can confuse the LLM.

The Flawed Retrieval Result

What is the result of this approach? Here are the documents that the standard content retriever found relevant.

Describe the procedure FTP_binaryMode and how it is used

Answer using the following information:
      *+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      *  FTP_binaryMode:  Set file transfer mode to/from binary
      *
      *    peSession = Session descriptor returned by FTP_conn
      *    peSetting = Setting of binary  *ON = Turn binary mode on
      *                                  *OFF = Turn binary mode off.
      *
      *    Returns -1 upon error, or 0 upon success.
      *+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     P FTP_binaryMode  B                   EXPORT
     D FTP_binaryMode  PI            10I 0
     D   peSession                   10I 0 value
     D   peSetting                    1A   const

     c                   callp     initFtpApi

     c                   if        selectSession(peSession) < 0
     c                   callp     SetSessionError
     c                   return    -1
     c                   endif

      *+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      *      Deprecated. See: FTP_binaryMode
      *+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      *  FTP_binary:  Set file transfer mode to/from binary
      *
      *    peSetting   = Setting of binary  *ON = Turn binary mode on
      *                                    *OFF = Turn binary mode off.
      *
      *     Returns -1 upon error, or 0 upon success.
      *+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     P FTP_Binary      B                   EXPORT
     D FTP_Binary      PI            10I 0
     D   peSetting                    1A   const

     D i               s             10I 0

     c                   for       i = 1 to MAX_SESSION
     c                   callp     FTP_binaryMode(i: peSetting)
     c                   endfor

     c                   return    0
     P                 E

     c                   callp     ftp_linemode(peSess: *on)
     c                   endif

This is not terrible, as the code is somewhat relevant, but it’s only part of the procedure we are interested in. This happens because the pipeline divides the document, essentially, at random intervals based on a character count.

We can also see the hacky nature of a RAG in full force. We are telling the LLM to answer our question just by looking at this bunch of text. Furthermore, we know that these are snippets of code from a larger project, but the LLM does not know or understand that. It can only work with what is in the context.

The Mediocre LLM Answer

Now, let’s see the answer the LLM generates from this chopped context.

## FTP_binaryMode Procedure Description

The `FTP_binaryMode` procedure is used to set the file transfer mode for FTP sessions. 

**Parameters:**

*   `peSession`: A session descriptor, which is a value returned by the `FTP_conn` procedure. This identifies the FTP session to modify.
*   `peSetting`:  A single character constant (`1A`) indicating the desired binary mode setting:
    *   `ON`: Turns binary mode on.
    *   `OFF`: Turns binary mode off.

**Return Value:**

*   `0`: Indicates success.
*   `-1`: Indicates an error occurred, typically because the provided `peSession` is invalid.  The error is set via a call to `SetSessionError`.

**How it's Used:**

The procedure is called to explicitly control whether files are transferred in binary or ASCII mode.  Binary mode is generally used for non-text files (images, executables, archives, etc.) to preserve data integrity.  ASCII mode is used for text files and may involve character set conversions.

The provided code also shows that `FTP_binaryMode` is used within another procedure `FTP_Binary`. `FTP_Binary` iterates through a range of sessions (up to `MAX_SESSION`) and calls `FTP_binaryMode` for each, effectively applying the setting to multiple sessions.  

**Important Note:** The documentation also indicates a deprecated procedure, `FTP_Binary`, which essentially just calls `FTP_binaryMode` in a loop. `FTP_binaryMode` is the preferred procedure to use.

Again, not a bad answer, but the quality is mediocre because the source documents provided by the RAG pipeline are also mediocre. Some code is missing from the procedure, and the fact that the code is cut abruptly makes it hard for the LLM to understand what is going on. Actually, look back at the retrieval result: it would be difficult for anybody to parse the information just by looking at those snippets of code appearing as a wall of text.

The Better Way: Using a Parser to Find Meaningful Segments

This is where Language Engineering comes into play. Instead of treating the code as generic text, we can use a Code Splitter based on a Parser to segment the document into meaningful parts, such as procedures, definitions, or classes.

This is the ingestion code for the improved RAG.

private fun embedDocumentsWithParser(documentPath: String?, codeSplitter: CodeSplitter): EmbeddingData {

    // Load only RPG-related source files
    val documents = getDocuments(File(documentPath), setOf(
        "sqlrpgle", "rpgle", "cpybk", "ddl", "rpglem", "rpgleinc")
    )

    val embeddingModel: EmbeddingModel = BgeSmallEnV15QuantizedEmbeddingModel()
    val embeddingStore: EmbeddingStore<TextSegment?> = InMemoryEmbeddingStore()

    val ingestor =
        EmbeddingStoreIngestor.builder()
            // The parser-based splitter replaces the generic recursive one
            .documentSplitter(codeSplitter)
            // As before, prepend the file name to each segment
            .textSegmentTransformer(TextSegmentTransformer { textSegment: TextSegment ->
                TextSegment.from(
                    textSegment.metadata().getString("file_name") + "\n" + textSegment.text(),
                    textSegment.metadata()
                )
            })
            .embeddingModel(embeddingModel)
            .embeddingStore(embeddingStore)
            .build()

    ingestor.ingest(documents)

    return EmbeddingData(embeddingStore, embeddingModel)
}

The only difference is that this approach uses a custom CodeSplitter that leverages a parser for the specific language (RPGLE, in this case) to understand its structure. This allows us to ensure that procedures and definitions are kept whole, regardless of their character count.

class CodeSplitter() : DocumentSplitter {
    [..]

    override fun split(document: Document): List<TextSegment?> {
        [..]
                // One segment per procedure, prefixed with its name and any
                // comment attached to it
                root.procedures.forEach {
                    val metadata = document.metadata().copy()
                    metadata.put("name", it.name.lowercase())
                    val commentText = getCommentForNode(it)
                    val text = if (commentText.isNotEmpty())
                        commentText + "\n" + it.sourceText
                    else
                        it.sourceText
                    segments.add(
                        TextSegment.from(
                            "Procedure Name: ${it.name}\n" + text,
                            metadata
                        )
                    )
                }
                segmentsStore.addAll(segments)
                segments
            } else {
                // Parsing failed: fall back to the whole document as one segment
                listOf(TextSegment.from(document.text()))
            }
        }
    }
}

This is the part of the CodeSplitter that creates segments for procedures. Like any DocumentSplitter, it implements a split method that accepts a document and returns the resulting segments. In our CodeSplitter we added a header specifying the type of the construct and its name, like Procedure Name: FTP_binaryMode, to make the code easier to find during the search. This wouldn’t be necessary if we were sure that the embedding model understood code structure, because that information is already in the code. However, we are not sure, so we added it. We also make sure that any comment related to the node is prepended to the procedure code.

Your Code May Vary

Of course, the specifics might change based on your language, parser, and needs. For instance, imagine you have large classes, too large to be provided as context to the LLM for every question. You might want to analyze the classes once and create a description of what the overall code does. Then you can provide this class description as context, in addition to the segment representing a specific function (see the sketch after the next paragraph). This is where the hacky nature of a RAG helps: you can provide both code and descriptions to the LLM in the same prompt.

In any case, the basic structure is fairly easy; the hard work is done by the parser before this phase. The parser finds the meaningful code structures; you just need to separate them into segments that make sense for your use case.
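
As a hedged sketch of the class-description idea mentioned above: summarize each oversized class once at indexing time and ingest the summary as an extra segment. ClassDeclaration and summarize are hypothetical names, not part of our parser.

// Hypothetical minimal model of a parsed class; a real parser's AST would
// provide something richer.
data class ClassDeclaration(val name: String, val sourceText: String)

// summarize() stands for a single chat-model call producing a natural-language
// description of the class; the size threshold is arbitrary.
fun describeLargeClasses(
    classes: List<ClassDeclaration>,
    summarize: (String) -> String
): List<TextSegment> =
    classes
        .filter { it.sourceText.length > 20_000 }
        .map { cls ->
            val metadata = Metadata()
            metadata.put("name", cls.name.lowercase())
            metadata.put("kind", "class-description")
            TextSegment.from("Class ${cls.name} overview:\n" + summarize(cls.sourceText), metadata)
        }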

The Cleaner Retrieval Result

Using this improved approach, the RAG finds these structured documents as relevant.

Answer using the following information:

*+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* FTP_binaryMode: Set file transfer mode to/from binary
*
* peSession = Session descriptor returned by FTP_conn
* peSetting = Setting of binary *ON = Turn binary mode on
* *OFF = Turn binary mode off.
*
* Returns -1 upon error, or 0 upon success.
*+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
P FTP_binaryMode B EXPORT
<full code of the procedure>

*+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* Deprecated. See: FTP_binaryMode
*+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* FTP_binary: Set file transfer mode to/from binary
*
* peSetting = Setting of binary *ON = Turn binary mode on
* *OFF = Turn binary mode off.
*
* Returns -1 upon error, or 0 upon success.
*+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
P FTP_Binary B EXPORT
<full code of the procedure>

This is much better. The procedure code is now provided in its complete form to the LLM. Notice, however, how the hacky nature of the RAG still shows: we also get the deprecated FTP_Binary procedure, because its name is semantically similar and our target procedure name is also mentioned in a comment. This is not the fault of the document splitter; it’s due to the kind of search (i.e., the ContentRetriever) we use. To change this aspect we would need to use a specialized EmbeddingModel.

This is the first step in building a smarter RAG using language engineering. Some issues still remain: the documents retrieved do not help the LLM (or us) understand how the code is used, since they are retrieved based only on a semantic search of their content.

The Improved LLM Answer

Let’s check the answer from the LLM with this better context.

## Description of the FTP_binaryMode procedure and its usage

The `FTP_binaryMode` procedure is a foundational part of setting the binary transfer mode for a specific FTP session. Here's a breakdown of how it works, based on the provided information:

**Purpose:**

The primary purpose of `FTP_binaryMode` is to enable or disable binary mode for a designated FTP session. Binary mode is crucial for transferring files accurately, especially non-text files, as it prevents any character translation or modification during the transfer.

**Procedure Details:**

*   **Name:** `FTP_binaryMode`
*   **Parameters:**
    *   `peSession`:  A 10-digit integer (10I) identifying the FTP session number to which the setting applies.
    *   `peSetting`: A 1-character constant (1A) that specifies whether binary mode should be turned *ON* or *OFF*.  Valid values are `*ON` or `*OFF`.
*   **Functionality:**
    1.  **Initialization:** It first calls the `initFtpApi` procedure, likely to initialize the FTP API.
    2.  **Session Validation:** It verifies the validity of the provided `peSession` using the `selectSession` procedure. If the session is invalid (indicated by a return value less than 0), it calls `SetSessionError` and returns -1, indicating failure.
    3.  **Setting Validation:** It checks if `peSetting` is either `*ON` or `*OFF`. If it's neither, it calls `SetError` to report an invalid setting and returns -1, indicating failure.
    4.  **Binary Mode Setting:** If both session and setting are valid, the value of `peSetting` (either `*ON` or `*OFF`) is assigned to the internal variable `wkBinary`.
    5.  **Return:**  The procedure returns 0, indicating successful completion.

**How it's Used:**

`FTP_binaryMode` is not typically called directly by the user. It's a lower-level procedure used by a higher-level procedure named `FTP_Binary`. 

The `FTP_Binary` procedure is designed to set the binary mode for *all* active FTP sessions. It iterates through all possible session numbers (from 1 to `MAX_SESSION`) and calls `FTP_binaryMode` for each session, passing the desired `peSetting` (`*ON` or `*OFF`).

**In summary:**

`FTP_binaryMode` is a core procedure that sets the binary mode for a single FTP session.  `FTP_Binary` uses `FTP_binaryMode` to apply the binary mode setting to all established FTP sessions.  This ensures that all file transfers happen in the designated transfer mode (binary or ASCII).

This is a good answer. It summarizes the code in a sensible way. However, it only explains how FTP_binaryMode is used by the deprecated FTP_Binary procedure. This partially answers the question, mostly by accident. Because the two procedure names were similar, the content retriever returned them both, and, by pure accident, FTP_Binary was also using FTP_binaryMode. So the LLM correctly used the wrong information we had given it.

A developer would probably recognize this as an artifact of development history; we would be more interested in how FTP_binaryMode is used directly by other active code. This information is useful if you are interested in maintaining the project, but not if you just want to use it as a client.

GraphRAG: There Is More Than Just Content

We’ve succeeded in feeding proper code to the LLM, but the RAG still lacks an understanding of the relationships between pieces of code. We need to tell the LLM: This procedure is called by that procedure, not just hope that the embedding search pulls them together.

Finding relationships is the job of a GraphRAG, a more structured approach to RAG pioneered by Microsoft. It creates a knowledge graph by analyzing the documents. For instance, if a document contains this line of text:

Bảo Đại was the last Emperor of Vietnam

These nodes and their relationship would be added to the knowledge base:

CREATE (person:Person {name: 'Bảo Đại'})-[:RULED]->(country:Country {name: 'Vietnam'})

The analysis of the documents and the creation of the knowledge base are usually done with an LLM. Basically, you set up a system that analyzes the documents, extracts the knowledge (entities and relationships), and saves this information in a graph database. You also ingest the documents as usual. Then, at retrieval time, you retrieve the documents and also add the related information found in the knowledge base.

This system relies on the LLM being able to extract knowledge from the document or query, and then to interrogate the knowledge base appropriately.

Trying and Failing At Using GraphRAG

In our case, we want to consider code segments (procedures, classes) as nodes, and the relationships between them (calls, inheritance, usage) as edges. During retrieval, the RAG can search the graph for the most relevant code structure and then send that structure, along with the code itself, to the LLM. That is the plan.

GraphRAG is supported in LangChain4J, relying on Neo4J. I built a Neo4J graph of the code relationships.

fun createGraphForProject(documentPath: String): Neo4jGraph {
    // Spin up a throwaway Neo4j instance (Testcontainers)
    val neo4jContainer = Neo4jContainer("neo4j:5.26")
        .withoutAuthentication()
        .withLabsPlugins("apoc")
    neo4jContainer.start()
    val driver: Driver = GraphDatabase.driver(neo4jContainer.boltUrl, AuthTokens.none())
    val graph: Neo4jGraph = Neo4jGraph.builder().driver(driver).build()
    val session: Session = driver.session()

    // Parse the codebase and resolve symbols to find usages
    val codebase = rpgCodebase(File(documentPath))
    val semantics = codebase.resolveSymbolsForRPG()
    val usages = semantics.usages
    val nodes = semantics.files.flatMap { file ->
        file.ast.walk().filter { it is Procedure || it is Subroutine }
    }
    nodes.forEach { node ->
        val named = node as Named // both Procedure and Subroutine are Named nodes
        // The containers of all the nodes that reference this procedure/subroutine
        val ids = usages.filter { u -> (u.key as? Named)?.name == named.name }
            .flatMap { it.value }
            .map { getParent(it) }
            .toSet()
        // getNameAndCodeFromNode returns the CREATE statement plus its parameters
        val data = getNameAndCodeFromNode(node)
        val parameters: Map<String, String> = mapOf(
            "name" to data[1],
            "code" to data[2]
        )
        session.run(data[0], parameters)
        // Add a USES relationship from each caller to this node
        ids.forEach { id ->
            session.run(
                "MATCH ${getNameFromNode(id, "a")}\n" +
                        "MATCH ${getNameFromNode(node, "b")}\n" +
                        "MERGE (a)-[:USES]->(b)"
            )
        }
    }

    graph.refreshSchema()
    return graph
}

As you can see, I didn’t try to make the LLM build the graph: I created the knowledge graph manually and stored it in Neo4J.

Then you just need to use a Neo4jText2CypherRetriever in your pipeline.

val graphRetriever = Neo4jText2CypherRetriever.builder()
        .graph(graph)
        .chatModel(chatModel)
        .build()
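
Since Neo4jText2CypherRetriever is just another ContentRetriever, it can be combined with the embedding-based retriever from before. Here is a sketch using a query router; embeddingRetriever stands for the retriever built earlier in the pipeline.

// Route each query to both retrievers; the augmentor merges the contents
// they return into the final prompt.
val retrievalAugmentor = DefaultRetrievalAugmentor.builder()
    .queryRouter(DefaultQueryRouter(embeddingRetriever, graphRetriever))
    .build()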

Nevertheless, the system didn’t really work. The problem lies in what I said before: this system relies on the LLM being able to interrogate the knowledge base appropriately.

Why It Failed

The system is, again, hacky but clever: the Neo4jText2CypherRetriever provided by LangChain4J first asks the LLM to translate the user question into a graph query.

Let’s see the prompt used behind the scenes.

Task:Generate Cypher statement to query a graph database.
Instructions
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.

Schema:
Node properties are the following:
:Procedure {code: STRING, name: STRING}, :Subroutine {code: STRING, name: STRING}

Relationship properties are the following:
:USES {}

The relationships are the following:
(:Procedure)-[:USES]->(:Procedure)


Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.

The question is: Describe the procedure FTP_binaryMode and how it is used

The prompt includes the automatically generated description of the schema of your graph database and then the question. So far, so good.
Now, the LLM generates this answer.

MATCH (p:Procedure {name: "FTP_binaryMode"})-[:USES]->(usedProcedure)
RETURN p, usedProcedure

Hopefully, you can see the subtle error. Honestly, I didn’t, until I saw the results. The query asks what code FTP_binaryMode uses, rather than what we asked for (i.e., the opposite: what uses FTP_binaryMode).
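
For reference, a query matching the intent of the question would traverse the relationship in the opposite direction:

MATCH (caller:Procedure)-[:USES]->(p:Procedure {name: "FTP_binaryMode"})
RETURN caller, p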

Promising But Not There Yet

Now, maybe a commercial model would work more reliably, or maybe a better graph schema would help. However, my take is that this is not reliable enough for production use. There is a correct answer here, and it is not the one you get when using the Neo4jText2CypherRetriever. Working with LLMs, you need to accept a bit of fragility in some places. However, for the system to work most of the time, you also need robustness in crucial places, where the LLM cannot compensate for mediocre information. If you can’t be sure to retrieve code that is actually related to the code you are interested in, you will confuse the LLM. Since the LLM is unable to understand that the retrieved code is unrelated to the searched code, it will return a confused or hallucinated answer. So, it will confuse or mislead the user in turn.

It’s a bummer because, if it worked reliably, we could also add more advanced relationships or searches.

For example, imagine that there is a utility method, like a method to normalize line endings, that a lot of code uses. By usage count and spread, it is an important piece of code, because it is referenced all over your codebase. However, you probably do not want to consider it when trying to understand what a specific piece of code does. You would want to retrieve the pieces of code that have a specific relationship to the method you are currently analyzing. Something like: the functions that are always and only called when that method is called.

This would be hard to do algorithmically in a general way for every language, code style, or request. So, it would be great if we could rely on an LLM to do it. Unfortunately, this is still not possible.

So, given that this approach gave us no additional power and less reliability, it is better to implement our GraphRAG for code manually.

The Engineered Solution: Using a Parser and Symbol Resolver

We already used a parser to split the code into meaningful segments. Now, we add a Symbol Resolver to connect symbols to their usage (e.g., connecting a procedure call with the procedure definition).

Instead of relying on the LLM to query a graph, we will manually augment the final prompt by explicitly adding the source code of the procedures that call our target procedure. This way we can build our knowledge graph for code.

The first thing we need to do is to modify the CodeSplitter to add information about which parts of the code use other parts.

class CodeSplitter() : DocumentSplitter {
    [..]

    override fun split(document: Document): List<TextSegment?> {
        [..]
                root.procedures.forEach {
                    val metadata = document.metadata().copy()
                    // Collect the names of the constructs that use this procedure
                    val ids: Set<String> = usages.filter { u -> (u.key as? Named)?.name == it.name }
                        .flatMap { it.value }
                        .map { usage ->
                            when (val node = getParent(usage)) {
                                is Procedure -> node.name
                                is Subroutine -> node.name
                                // Top-level code outside any procedure or subroutine
                                else -> "VirtualMainRoutine in ${(node.source as CodeBaseSource).relativePath}"
                            }
                        }.toSet()
                    metadata.put("usedIn", ids.joinToString(",").lowercase())
                    metadata.put("name", it.name.lowercase())
                    val commentText = getCommentForNode(it)
                    val text = if (commentText.isNotEmpty())
                        commentText + "\n" + it.sourceText
                    else
                        it.sourceText
                    segments.add(
                        TextSegment.from(
                            "Procedure Name: ${it.name}\n" + text,
                            metadata
                        )
                    )
                }
                segmentsStore.addAll(segments)
                segments
            } else {
                listOf(TextSegment.from(document.text()))
            }
        }
    }
}

We added this information in the usedIn metadata value. There are obviously cleaner ways to do it, but you get the idea: you need to associate the element in the segment with the other parts of the code that reference it.

Then we need to add a custom ContentInjector, which we call CodeInjector.

class CodeInjector(val segments: List<TextSegment>) : DefaultContentInjector() {
    [..]

    override fun inject(
        contents: List<Content?>?,
        chatMessage: ChatMessage?
    ): ChatMessage? {
        if (contents == null || contents.isEmpty()) {
            return chatMessage
        }
        // Find the segments of the constructs that use the retrieved code,
        // based on the usedIn metadata written by the CodeSplitter
        val related = contents.flatMap {
            it!!.textSegment().metadata().getString("usedIn")?.split(",") ?: emptyList()
        }.toSet().mapNotNull { r ->
            segments.firstOrNull { it.metadata().getString("name") == r }
        }
        val uses: MutableList<Content?> = mutableListOf()
        related.forEach {
            uses.add(DefaultContent(TextSegment(it.text(), it.metadata())))
        }

        // Build the final prompt with both the retrieved code and its callers
        val prompt: Prompt = createPrompt(chatMessage!!, contents.toMutableList(), uses)
        if (chatMessage is UserMessage && Utils.isNotNullOrBlank(chatMessage.name())) {
            return prompt.toUserMessage(chatMessage.name())
        }

        return prompt.toUserMessage()
    }
}

In the section shown, we override the inject method. We inspect the segments found relevant by the base ContentRetriever, the one doing the standard semantic search, and we check where those elements are used. We use the information in the metadata to retrieve the segments containing the elements that use that code. Basically, if the ContentRetriever finds that the user wants to know about the procedure FTP_binaryMode, we use the metadata to look for segments containing references to that procedure.

We also added a custom method createPrompt (not shown in the snippet above) to add these segments to the prompt, given that, obviously, the standard prompt that comes with LangChain4J does not know that we want to add this information. You can see the final prompt in the next section, together with the answer.
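
A minimal sketch of what createPrompt could look like follows; the wording mirrors the final prompt shown in the next section, so treat the details as an approximation of the real method.

// A minimal sketch of createPrompt: it lays out the question, the retrieved
// definitions, and the callers found through the usedIn metadata. The exact
// wording of the real method is approximated from the prompt shown below.
private fun createPrompt(
    chatMessage: ChatMessage,
    contents: List<Content?>,
    uses: List<Content?>
): Prompt {
    val question = (chatMessage as UserMessage).singleText()
    val definitions = contents.filterNotNull().joinToString("\n\n") { it.textSegment().text() }
    val callers = uses.filterNotNull().joinToString("\n\n") { it.textSegment().text() }
    return Prompt.from(
        """
        You are part of a RAG, so you need to answer questions about a specific code base.
        The user has the following request related to this project.

        $question

        The definition of the code is:

        $definitions

        For context, the previous constructs are used or called from this code:

        $callers

        Avoid any comment about missing information or mistake in the code provided for context.
        The user provided just the query. The code has been selected automatically based on the query, so it might be missing some parts.
        """.trimIndent()
    )
}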

The Context-Rich Retrieval Prompt

The final prompt presented to the LLM now explicitly includes the contextual code that uses the target procedure.

You are part of a RAG, so you need to answer questions about a specific code base.
The user has the following request related to this project.

Describe the procedure FTP_binaryMode and how it is used

The definition of the code is:

Procedure Name: FTP_binaryMode
     <full code of the procedure>

Procedure Name: FTP_Binary
     <full code of the procedure>

Definition of: FTP_binaryMode
     D FTP_binaryMode  PR            10I 0
     D   peSocket                    10I 0 value
     D   peSetting                    1A   const

For context, the previous constructs are used or called from this code:

Procedure Name: FTP_url_get_raw
     P FTP_url_get_raw...
     <full code of the procedure>

Procedure Name: FTP_url_get
     P FTP_url_get     B                   EXPORT
     <full code of the procedure>

Avoid any comment about missing information or mistake in the code provided for context.
The user provided just the query. The code has been selected automatically based on the query, so it might be missing some parts.

There are still some limitations in this approach. If you are using a local model, the code might exceed the context window of the model. This is unlikely for commercial models, but something to keep in mind if you go local.

It is also a prompt that looks at the code with fresh eyes every time, like a developer who knows nothing about the project, has no documentation, and can rely only on the code itself; checking usages is what such a developer would do as a first resort. So, it would be even better if we could also retrieve general descriptions of the overall purpose of the program and of the module (or any other grouping) in which the piece of code is used. We could even use the comments appearing in the code. However, these are all things we can do once we have a parser and a symbol resolver. We kept things simple in this tutorial to explain how the overall approach works.

The Final LLM Answer

The `FTP_binaryMode` procedure sets the file transfer mode for FTP connections, either to binary or from binary. 

**How it works:**

*   **Input:** It takes two parameters:
    *   `peSession`:  A session descriptor, likely an identifier for an established FTP connection (returned by a `FTP_conn` call).
    *   `peSetting`:  A flag that indicates the desired binary mode. `*ON` turns binary mode on, and `*OFF` turns binary mode off.
*   **Validation:** It validates that the `peSetting` parameter is either `*ON` or `*OFF`. If it's not, it returns an error.
*   **Setting the Mode:** It appears to simply assign the value of `peSetting` to a local variable `wkBinary`. Although the code doesn't *explicitly* show the setting taking effect, the intention is to store the setting for later use within the FTP session.
*   **Return Value:**  It returns `0` upon success and `-1` if an error occurs (invalid setting).

**How it's used:**

The `FTP_binaryMode` procedure is called within the `FTP_url_get_raw` and `FTP_url_get` procedures before retrieving a file. It's used to ensure that the FTP connection is configured with the correct transfer mode (binary or ASCII) based on whether the `peASCII` parameter is set or not.  Specifically, if `peASCII` is `*OFF`, `wwBinary` is set to `*ON` and passed to `FTP_binaryMode`. If `peASCII` is `*ON`, `wwBinary` is set to `*OFF` and then passed to `FTP_binaryMode`.  This allows for flexibility in how files are transferred.

This is a better answer. With the additional information about the procedures that call FTP_binaryMode, we get a clearer, more practical picture of how the code is actually used. It works better than the original RAG, proving that using language engineering — parsers and symbol resolvers — to understand a codebase can dramatically improve a RAG whose purpose is explaining code. It is not just the quality of the answer, but more importantly for a production project, its reliability.

One of the major problems of LLM-based systems in production is the non-deterministic nature of the results. Sometimes it works, sometimes it doesn’t. Using language engineering we can at least always give an LLM the best chance to succeed and even detect trouble scenarios.

Conclusion

We learned that simply throwing raw code at a RAG results in a mediocre and potentially misleading LLM answer. By leveraging parsers and symbol resolvers to provide the LLM with complete, meaningful segments and explicit contextual usage information, we transformed the RAG from a text search engine into a functional code intelligence tool.

However, challenges remain:

  1. Smarter Embeddings: Our current embedding search is still based on the semantic similarity of the text. If we could convert the user query into a more code-aware search (e.g., if the user mentions “procedure,” we only search among procedure definitions), the results would be even better (see the sketch after this list).
  2. Structured RAG: For questions about the overall behavior (e.g., “Show me the main procedure”), a simple RAG is insufficient. We would need a more structured RAG that could search a generated diagram or a high-level model of the codebase to explain relationships between classes, procedures, and modules, rather than just looking at specific pieces of code.
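
As a taste of the first idea, LangChain4J already lets you derive a metadata filter from the query. Here is a sketch; it assumes we stored a kind metadata value on each segment at splitting time, which our CodeSplitter does not do yet.

// Query-sensitive retrieval: if the user mentions "procedure", restrict the
// search to segments tagged kind = "procedure" at splitting time. The "kind"
// metadata key is an assumption; metadataKey comes from MetadataFilterBuilder.
val filteringRetriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(embeddingStore)
    .embeddingModel(embeddingModel)
    .dynamicFilter { query ->
        if (query.text().contains("procedure", ignoreCase = true))
            metadataKey("kind").isEqualTo("procedure")
        else
            null // no filter: search all segments
    }
    .build()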

At this stage of development, a RAG that exploits language engineering could be of great help in an IDE: imagine having an Explain Code item next to your usual Go To Definition or Find Usages. Major editors have moved to using Language Servers, so it would be feasible to rely on them to create a smarter Code RAG.

Summary

Language engineering provides crucial tools that let us turn unstructured code into structured knowledge, feeding the LLM the high-quality, meaningfully related data it needs to perform at its best.