Contributing Semantic Highlighting to Kolasu Language Servers

Introduction

Kolasu is an open-source Kotlin library that simplifies the development of language engineering tools. It has a companion repository that adds support for creating language servers following the Language Server Protocol. This article is about contributing semantic highlighting capabilities to those language servers.

One of the benefits of using the Language Server Protocol is the ability to semantically highlight code, an advanced form of highlighting that colors tokens using contextual information. However, this feature is currently missing in Kolasu. Not for long: in this tutorial, we will contribute semantic highlighting capabilities to Kolasu.

Furthermore, we will explain the steps to contribute to open-source projects following the GitHub flow workflow. If you want to skip to the end, all the code is available on a GitHub Pull Request. Otherwise, here is the agenda for today:

1. Precisely define the feature we want to contribute.
2. Fork the repository and implement the feature in a branch.
3. Test the implemented changes locally.
4. Contribute the changes back by opening a clean pull request.

Define the feature

The Language Server Protocol specification defines the messages related to semantic highlighting. In particular, it predefines a set of token types (which we will also call categories), such as variable, function, and keyword, and a set of token modifiers, such as declaration, static, and readonly.

Each token has exactly one type and any number of modifiers. For example, a constant variable could have the category variable and the modifier list [static, readonly].

To highlight a file, we need to list the category and modifiers of all the tokens in it. This list can become very long, and sending this large message can affect the language server’s performance. For that reason, the protocol specifies two optimizations:

1. Token encoding: The semantic token information is represented as relative integers instead of strings to save on message length.

2. Incremental highlighting: Only the semantic tokens for the portion of the code affected by the latest edit are recalculated, which saves significant computation in editor environments.

In this tutorial, we will add facilities to Kolasu to:

Create semantic tokens with categories and modifiers
Encode semantic token lists to integers following the LSP standard
Configure language servers to respond to semantic highlighting requests

We may revisit the incremental highlighting capabilities in a future article.

Implement the feature

Forking

Only people with write access can push changes to the original repository. However, we can fork it to get our own copy. Go to the repository page and click on Fork.

You will get a copy of the repository with full permissions. You can experiment and break things here; it won’t affect the original repository.

The goal is to implement the changes in a feature branch and open a pull request to merge them into the original repository. Let’s clone our fork and create a feature branch.

We are now ready to start writing code.

Data structures

First, we will formalize the data structures for semantic tokens. Since the repository is written in Kotlin, we will use enum classes to represent all the predefined token types and modifiers.
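As a rough illustration, the two enum classes could look like the sketch below. The names `SemanticTokenType`, `SemanticTokenModifier`, and `legendName` are assumptions made for this example and are not necessarily the ones used in the pull request; the string values are the predefined types and modifiers listed in version 3.17 of the protocol specification.

```kotlin
/** Predefined token types; legendName is the string the protocol uses. */
enum class SemanticTokenType(val legendName: String) {
    NAMESPACE("namespace"), TYPE("type"), CLASS("class"), ENUM("enum"),
    INTERFACE("interface"), STRUCT("struct"), TYPE_PARAMETER("typeParameter"),
    PARAMETER("parameter"), VARIABLE("variable"), PROPERTY("property"),
    ENUM_MEMBER("enumMember"), EVENT("event"), FUNCTION("function"),
    METHOD("method"), MACRO("macro"), KEYWORD("keyword"), MODIFIER("modifier"),
    COMMENT("comment"), STRING("string"), NUMBER("number"), REGEXP("regexp"),
    OPERATOR("operator"), DECORATOR("decorator")
}

/** Predefined token modifiers, in the order the specification lists them. */
enum class SemanticTokenModifier(val legendName: String) {
    DECLARATION("declaration"), DEFINITION("definition"), READONLY("readonly"),
    STATIC("static"), DEPRECATED("deprecated"), ABSTRACT("abstract"),
    ASYNC("async"), MODIFICATION("modification"),
    DOCUMENTATION("documentation"), DEFAULT_LIBRARY("defaultLibrary")
}
```

Keeping the protocol string next to each constant avoids deriving camelCase names such as typeParameter from the Kotlin identifiers.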

A semantic token is a portion of source code with a category and possibly multiple modifiers:
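A minimal sketch of such a data structure follows. Everything here is hypothetical: `Point`, `Span`, and `SemanticToken` are illustrative names rather than Kolasu classes, and the two enums are trimmed stand-ins for the full lists of predefined types and modifiers.

```kotlin
// Trimmed, hypothetical stand-ins for the full enums of types and modifiers.
enum class TokenType { VARIABLE, FUNCTION, KEYWORD }
enum class TokenModifier { DECLARATION, READONLY, STATIC }

/** A 0-based point in the source text (an assumption of this sketch). */
data class Point(val line: Int, val column: Int)

/** The portion of source code a token covers: start inclusive, end exclusive. */
data class Span(val start: Point, val end: Point)

/** A span of code with exactly one type and any number of modifiers. */
data class SemanticToken(
    val span: Span,
    val type: TokenType,
    val modifiers: Set<TokenModifier> = emptySet()
)
```

With these definitions, the constant variable from the earlier example would be `SemanticToken(span, TokenType.VARIABLE, setOf(TokenModifier.STATIC, TokenModifier.READONLY))`. A set is used for the modifiers because their order carries no meaning and duplicates make no sense.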

Let’s commit our changes and move to the next step.

Encoding

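We will also contribute the algorithm to convert a list of semantic tokens into the integer encoding the language server protocol expects. Following the specification, every semantic token is represented as five integers: the line delta relative to the previous token, the start character delta (relative to the previous token’s start when both are on the same line, otherwise relative to the start of the line), the token length, the token type as an index into the legend, and the token modifiers as a bitmask.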
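The relative encoding can be sketched as follows. This is an illustration of the scheme described in the specification, not the code from the pull request: `AbsoluteToken` and `encode` are hypothetical names, and the sketch assumes that the type index and modifier bitmask have already been computed.

```kotlin
/** A token with absolute, 0-based coordinates; type and modifiers are already integers. */
data class AbsoluteToken(
    val line: Int,
    val startChar: Int,
    val length: Int,
    val typeIndex: Int,
    val modifierBits: Int
)

/** Flattens tokens into five integers each, with positions relative to the previous token. */
fun encode(tokens: List<AbsoluteToken>): List<Int> {
    // The relative encoding only makes sense when tokens are in document order.
    val sorted = tokens.sortedWith(compareBy<AbsoluteToken>({ it.line }, { it.startChar }))
    val data = ArrayList<Int>(sorted.size * 5)
    var previousLine = 0
    var previousStart = 0
    for (token in sorted) {
        val deltaLine = token.line - previousLine
        // On a new line the start is absolute; on the same line it is relative.
        val deltaStart = if (deltaLine == 0) token.startChar - previousStart else token.startChar
        data.addAll(listOf(deltaLine, deltaStart, token.length, token.typeIndex, token.modifierBits))
        previousLine = token.line
        previousStart = token.startChar
    }
    return data
}
```

For example, tokens starting at (line 2, character 5), (line 2, character 10), and (line 5, character 2) get the position pairs (2, 5), (0, 5), and (3, 2). Because small deltas replace absolute positions, a local edit leaves most of the encoded list unchanged.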

Calculating the difference between the end and start characters is not enough for the current token’s length since it could span multiple lines. Thankfully, Kolasu provides a helper method for this, given the original source code text.
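To see why the source text is needed, here is one way such a length could be computed. This is not the Kolasu helper, whose name and signature are not shown here; `lineStartOffsets` and `spanLength` are hypothetical, and the sketch assumes 0-based lines and columns, lines separated by `\n`, and a length that counts the line breaks inside the span.

```kotlin
/** Offset of the first character of each line; lines are split on '\n' only. */
fun lineStartOffsets(text: String): List<Int> {
    val starts = mutableListOf(0)
    for ((index, char) in text.withIndex()) {
        if (char == '\n') starts.add(index + 1)
    }
    return starts
}

/** Number of characters between two 0-based (line, column) points, end exclusive. */
fun spanLength(text: String, startLine: Int, startColumn: Int, endLine: Int, endColumn: Int): Int {
    val starts = lineStartOffsets(text)
    val startOffset = starts[startLine] + startColumn
    val endOffset = starts[endLine] + endColumn
    return endOffset - startOffset
}
```

On a single line this reduces to the column difference. Across lines, the lengths of the intermediate lines only become known by looking at the text, which is why positions alone are not enough.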

The list of token modifiers is represented as a single bitmask integer. Each modifier is assigned a distinct bit value (1, 2, 4, and so on), so we can represent the modifier list by adding up the bit values of the modifiers it contains. Let’s commit the changes and move on.
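A small sketch of the bitmask computation, assuming the modifiers are declared in the order of the legend so that the n-th modifier owns bit n. The names `Modifier`, `bit`, and `modifierBitmask` are illustrative only.

```kotlin
/** Each modifier carries the bit it occupies in the bitmask. */
enum class Modifier(val bit: Int) {
    DECLARATION(1 shl 0), DEFINITION(1 shl 1), READONLY(1 shl 2), STATIC(1 shl 3),
    DEPRECATED(1 shl 4), ABSTRACT(1 shl 5), ASYNC(1 shl 6), MODIFICATION(1 shl 7),
    DOCUMENTATION(1 shl 8), DEFAULT_LIBRARY(1 shl 9)
}

/** Combines the modifiers into one integer; for distinct bits, OR equals addition. */
fun modifierBitmask(modifiers: Set<Modifier>): Int =
    modifiers.fold(0) { mask, modifier -> mask or modifier.bit }
```

For the modifier list [static, readonly], the result is 8 + 4 = 12. Taking a set as input rules out duplicates, which is the only case where adding and OR-ing would differ.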

Configuration

Finally, we would like to simplify the language server configuration for semantic highlighting. Here, it is essential to follow the project’s existing philosophy. The language server plugin follows the “configurable magic” philosophy: Everything is initialized with a default, sensible configuration, but it is always possible to customize it by overriding the default behavior.

By default, we will initialize the server with the capability for full semantic highlighting using all the predefined types and modifiers as the legend:
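The legend matters because the integers in the encoded tokens are positions in it. The sketch below models the capability with plain Kotlin classes to show that relationship; `Legend`, `SemanticTokensOptions`, and `fullHighlighting` are hypothetical simplifications, not the classes the language server plugin actually uses.

```kotlin
/** Mirrors the protocol's legend: positions in these lists define the encoded integers. */
data class Legend(val tokenTypes: List<String>, val tokenModifiers: List<String>)

/** Simplified model of the semantic tokens capability a server announces. */
data class SemanticTokensOptions(val legend: Legend, val full: Boolean, val range: Boolean = false)

/** Default configuration: full-document highlighting with the given legend. */
fun fullHighlighting(tokenTypes: List<String>, tokenModifiers: List<String>): SemanticTokensOptions =
    SemanticTokensOptions(Legend(tokenTypes, tokenModifiers), full = true)

/** Index a token type is encoded as, or null when the legend does not declare it. */
fun Legend.typeIndex(name: String): Int? =
    tokenTypes.indexOf(name).takeIf { it >= 0 }

/** Bit a modifier is encoded as: bit n for the n-th entry of the legend. */
fun Legend.modifierBit(name: String): Int? =
    tokenModifiers.indexOf(name).takeIf { it >= 0 }?.let { 1 shl it }
```

With a legend whose modifiers start with declaration, definition, readonly, the modifier readonly maps to bit value 4. The client and server must agree on the same legend, which is why it is sent once during initialization.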

We can also contribute the skeleton for responding to the textDocument/semanticTokens/full request, which computes and returns the list of encoded semantic tokens:

By providing this infrastructure, the language engineer needs only to implement the abstract semanticTokens function, which, given an AST, produces the list of all the semantic tokens in it. The library takes care of protocol specifics, but if necessary, the language engineer can customize all the defaults by overriding the initialize and semanticTokensFull methods.
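The division of labor can be sketched with an abstract class. This is a simplified, hypothetical model: the real skeleton receives request parameters and looks up the parsed document, whereas here `semanticTokensFull` takes the AST directly, and the tokens are assumed to arrive in document order.

```kotlin
/** A token with absolute, 0-based coordinates and already-encoded type and modifiers. */
data class Token(val line: Int, val startChar: Int, val length: Int, val typeIndex: Int, val modifierBits: Int)

abstract class HighlightingServer<Ast> {
    /** Language-specific part: list the tokens of a tree in document order. */
    abstract fun semanticTokens(ast: Ast): List<Token>

    /** Protocol-specific default; open so that a subclass can replace it. */
    open fun semanticTokensFull(ast: Ast): List<Int> {
        var previous = Token(0, 0, 0, 0, 0)
        return semanticTokens(ast).flatMap { token ->
            val deltaLine = token.line - previous.line
            val deltaStart = if (deltaLine == 0) token.startChar - previous.startChar else token.startChar
            previous = token
            listOf(deltaLine, deltaStart, token.length, token.typeIndex, token.modifierBits)
        }
    }
}

/** Toy language whose "AST" is a single word, highlighted as one token of type 1. */
class SingleWordServer : HighlightingServer<String>() {
    override fun semanticTokens(ast: String): List<Token> =
        listOf(Token(line = 0, startChar = 0, length = ast.length, typeIndex = 1, modifierBits = 0))
}
```

The subclass contains no protocol knowledge at all: it only says which tokens exist, and the base class decides how they travel over the wire.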

Test the feature

Publish to Maven Local

Before submitting the code for review, it is a good idea to check that it works as expected. To this end, we will publish a new version in our local Maven repository and use it in a language server project.

Running the publishToMavenLocal Gradle task builds the project and deploys it to the local file system. On Unix-like systems, the deployed artifacts are under the ~/.m2/repository/com/strumenta/kolasu folder:

Verify that the publishToMavenLocal task outputs these four artifacts. You can find the published version by opening the folders. At the time of writing, the version is 1.0.6-SNAPSHOT.

Use the published plugin

We will try the new version by adding semantic highlighting to the Kuki Language Server, a toy Kolasu Language Server for cooking recipes. This is the project that initially inspired the addition of semantic highlighting to Kolasu. For more context, read the syntactic vs semantic highlighting article. Since we won’t contribute the changes, we can directly clone the Kuki repository and work on our local copy:

First, we add mavenLocal as a plugin source in settings.gradle.kts:
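A sketch of what this could look like, using the standard Gradle Kotlin DSL. Which other repositories the Kuki project already declares is an assumption here; `gradlePluginPortal()` is included only because it is Gradle's default plugin source.

```kotlin
// settings.gradle.kts
// pluginManagement must be the first block in the settings file.
pluginManagement {
    repositories {
        mavenLocal()          // look for the locally published plugin first
        gradlePluginPortal()  // keep the default source for every other plugin
    }
}
```

Listing mavenLocal first makes Gradle prefer the snapshot we just published over any released version with the same coordinates.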

We will also need it as a library repository in build.gradle.kts:
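The corresponding fragment could look like this, again with the standard Gradle Kotlin DSL. The presence of `mavenCentral()` is an assumption about the existing build file.

```kotlin
// build.gradle.kts
repositories {
    mavenLocal()    // resolves the locally published library artifacts
    mavenCentral()  // assumed to be the project's existing repository
}
```

Both declarations are needed because Gradle resolves plugins and regular dependencies through separate repository lists.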

Now, set the plugin to version `1.0.6-SNAPSHOT` and refresh Gradle. If you use IntelliJ and have had previous versions installed, you may need to repair the IDE to refresh the dependencies cleanly.

With the new version ready, adding semantic highlighting reduces to implementing the semanticTokens method in the KukiLanguageServer class. This is, of course, language-specific. Here is a portion of how it looks for Kuki:

We navigate the abstract syntax tree and add the semantic tokens in document order, with the corresponding types and modifiers. For example, every ingredient listed at the top of the recipe is of type variable with modifier list [declaration].
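The idea can be illustrated with a toy recipe tree. None of the classes below come from the Kuki project; `Recipe`, `Ingredient`, `IngredientRef`, and `Token` are invented for this sketch, and treating references as variables without modifiers is an assumption.

```kotlin
/** Where a node sits in the source: 0-based line and character, plus its length. */
data class Span(val line: Int, val startChar: Int, val length: Int)

data class Ingredient(val name: String, val span: Span)     // declared at the top of the recipe
data class IngredientRef(val name: String, val span: Span)  // used later inside a step
data class Recipe(val ingredients: List<Ingredient>, val references: List<IngredientRef>)

data class Token(val span: Span, val type: String, val modifiers: List<String>)

/** Walks the toy tree and returns its tokens in document order. */
fun semanticTokens(recipe: Recipe): List<Token> {
    val declarations = recipe.ingredients.map { Token(it.span, "variable", listOf("declaration")) }
    val references = recipe.references.map { Token(it.span, "variable", emptyList()) }
    return (declarations + references)
        .sortedWith(compareBy<Token>({ it.span.line }, { it.span.startChar }))
}
```

Both kinds of node are plain identifiers for the lexer. Only the tree knows which one declares an ingredient and which one refers to it, and that is the information the modifier carries to the editor.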

To see it in action, we can run the createVscodeExtension and launchVscodeEditor tasks. This will open an instance of VS Code with the Kuki example files ready:

Whenever we edit a *.kuki file, the editor sends a textDocument/semanticTokens/full request to the language server. The server responds with the encoded list of semantic tokens, and the editor colors the tokens accordingly.

Thanks to semantic highlighting, we can distinguish ingredients from utensils and declarations from references, even though syntactically, they are all identifiers.

Contributing the feature

Now that we feel confident about our contribution, let’s publish it for review. Go to the original GitHub repository and open a draft Pull Request from our branch to the main branch. Here is an example:

I like reviewing the changed files to double-check that we didn’t miss anything or add anything unnecessary. In this case, the pull request is nice and small, simplifying the work for the reviewers.

This repository does not have any Continuous Integration checks at the moment. When a repository does have them, we should wait and verify that they all pass.

When everything is green, we can mark the Pull Request as Ready for Review and feel proud of our contribution to the open-source community!

Summary

We have learned how to implement semantic highlighting in language servers and contribute the feature to an open-source library. When this gets merged, Kolasu and its language server plugin will be closer to their goal of providing an open-source tool for simplifying language engineering projects.

How about contributing features you find missing? Here are some examples related to the topic:

Incremental semantic highlighting
Support for custom semantic categories and modifiers
An automated way to assign categories from ANTLR tokens

You may also have other features in mind. With open source, you can never go wrong; everything is welcome. Happy coding!