Integrating TextX and Monaco – A Non-Tutorial

Introduction

Or, is writing this tutorial a good idea?

We’ve written about Monaco and, separately, about TextX, in several past articles. However, we’ve received enough requests for a TextX + Monaco integration tutorial that we really had to write one.

The thing is, combining TextX with Monaco doesn’t really make much sense. Or does it? Monaco is a web-based code editor component, so it targets the browser; on the other hand, TextX is a Python toolkit for building languages – thus, in particular, it doesn’t run in the browser.

In this non-tutorial, we’ll learn how we might combine the two. We’ll see why it’s probably a bad idea in a production application. However, we’ll obtain valuable insights on porting code editors to the web and, possibly, dispel some common misconceptions about the Language Server Protocol.

A Brief Recap on Monaco and TextX

Let’s do a quick recap on the two technologies we’ll be talking about. Those who have read our past articles on Monaco and TextX may want to skip this part. Those who haven’t should read them eventually if they want to know more about Monaco and/or TextX.

Monaco is the editor component of Microsoft Visual Studio Code, or VSCode for short, that we’ve used in several commercial projects. VSCode is a code editor, or lightweight IDE, built using web technologies, that runs on the desktop thanks to Electron. Recently, VSCode has been ported to the web as well, with some limitations.

Microsoft automatically extracts Monaco from VSCode’s sources and packages it for use in web applications. It’s a powerful, mature code editor which is also popular because of the great number of readily supported languages and thanks to its extension API. In fact, it’s relatively easy to add support for new languages, including code completion, by plugging in other pieces of technology such as ANTLR.

TextX is an independent project that takes some broad ideas from Eclipse’s Xtext and brings them to a much simpler and leaner Python framework. In their own terms, TextX is a meta-language – a dedicated language (DSL) to describe other languages in fairly abstract terms. From such a description, TextX generates a parser capable of building an abstract syntax tree (AST). A parser and an AST are the first steps into an interpreter, compiler, code analysis tool, etc.

So we can use TextX to quickly get a working parser and an AST library; even Python developers or students who don’t have any language engineering experience can quickly get up to speed. Of course, such ease and speed impose some constraints on the languages we can design. For example, TextX automatically resolves containment relationships (e.g. “a block contains a variable number of statements”) and references to named elements. So, it’s a good fit for many block-structured languages, e.g. with a Java-like syntax, but you wouldn’t want to write an XML parser with it.

Also, as we’ve previously shown, we can easily integrate TextX with Visual Studio Code. In fact, TextX comes with a facility to generate a VSCode extension for a given TextX language. Since Monaco is extracted from VSCode, this could be a hint towards using the same approach with Monaco. In the following section, we’ll explore this option.

Our Objectives

So, having said that, we’ll set two different objectives that we want to explore in this tutorial:

to develop a solution for a single user on a single machine, i.e., a Monaco-based editor that happens to run in a browser, but which is not a web application;
to develop a solution for many users on the world-wide web, such as a cloud editor/IDE.

These two have greatly different requirements and constraints, so we’ll need to keep that in mind throughout the tutorial. However, they both fit the general topic of “Integrating TextX with Monaco”. Let’s now look at how we may approach each of the above problems.

The Language Server Protocol

Visual Studio Code extensions are JavaScript components that talk with the editor using its APIs. And, as we said earlier, the editor is ultimately Monaco, which VSCode embeds. However, many extensions use a minimal amount of VSCode APIs directly, and instead launch a “language server” that talks with VSCode using the Language Server Protocol.

In fact, the extensions that TextX creates use precisely this approach, so we may want to apply the same schema to integrate with Monaco, given its connection with VSCode. Let’s delve into the LSP to understand whether that’s a good idea.

The LSP is a protocol standard developed by Microsoft so that tools such as editors can gain “language intelligence” by talking to a dedicated piece of software. For example, an editor such as VSCode may gather syntax and semantic errors from a compiler for, say, C#. Rather than hooking VSCode to n compilers for n different languages, each with its own proprietary API, the LSP offers a standardized interface that VSCode implements only once.

Then, each vendor can implement the LSP and have their language available in every editor and IDE supporting the LSP. Therefore, the “m times n” problem of integrating m languages into n editors becomes an “m + n” problem – each language has to implement the LSP once and each editor has to develop an LSP client once.
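To put numbers on the “m times n” versus “m + n” argument, here is a back-of-the-envelope sketch in Python. The counts of languages and editors are arbitrary example values, not figures from any survey.

```python
def integrations_without_lsp(languages: int, editors: int) -> int:
    """Every language needs a bespoke integration with every editor."""
    return languages * editors


def integrations_with_lsp(languages: int, editors: int) -> int:
    """Each language ships one LSP server; each editor ships one LSP client."""
    return languages + editors


if __name__ == "__main__":
    m, n = 10, 5  # arbitrary example counts
    print(integrations_without_lsp(m, n))  # 50
    print(integrations_with_lsp(m, n))     # 15
```

The gap widens quickly. With 10 languages and 5 editors, we need 50 integrations without a common protocol and only 15 components with it.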

Furthermore, the LSP is not an all-or-nothing API; rather, each tool can declare which subset of the LSP it implements. Thus, language vendors can adopt the LSP gradually, and the protocol can continue to evolve without breaking existing implementations that do not yet support its new features.
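To illustrate this negotiation, the following Python sketch shows how a client might decide which features to enable. It inspects the `capabilities` object that a server returns in its `initialize` response. The capability names (`textDocumentSync`, `completionProvider`, `hoverProvider`) come from the LSP specification. The `supports` helper and the sample capabilities are hypothetical.

```python
from typing import Any, Dict


def supports(capabilities: Dict[str, Any], capability: str) -> bool:
    """Return True if the server advertises the given capability.

    Servers may advertise a capability as a boolean or as an options
    object, so anything present and not None/False counts as supported.
    """
    value = capabilities.get(capability)
    return value is not None and value is not False


# Hypothetical server that only declares full document synchronization.
# Push diagnostics (textDocument/publishDiagnostics) are sent by the server
# without a server-side capability flag, so they don't appear here.
server_capabilities = {
    "textDocumentSync": 1,  # 1 = Full: the client sends the whole text on change
}

for feature in ("textDocumentSync", "completionProvider", "hoverProvider"):
    print(feature, supports(server_capabilities, feature))
```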

Not Really a Server

So, we’re talking about integrating TextX, a Python library, with a Web client, using the Language Server Protocol. This looks like run-of-the-mill web application development, doesn’t it?

Except that it isn’t. The word “server” in Language Server Protocol is somewhat misleading. Many people assume that it has a meaning similar to “web server” – a software component running continuously on a networked machine, capable of “serving” many concurrent users. But a “language server” is really a different thing.

A web server, or more specifically a web service or application, typically has the following properties:

it has provisions for handling numerous concurrent requests from different users;
it’s stateless or it keeps limited state in the form of HTTP sessions, often relying on a database for persistence and to share data among users;
it has facilities for secure communication, authentication, and authorization.

Well, none of the above apply to the LSP, which is really more of an inter-process communication protocol than a network protocol. In fact, while it’s based on JSON-RPC, and it can theoretically work over HTTP(S), the LSP typically connects a “server” and a “client” which are two processes on the same machine, communicating over standard input and output. And, in contrast with the earlier list of characteristics of a web service, a language server:

is single-user and often single-threaded;
is stateful, since it keeps track of which files/documents are open in the editor, and most likely caches information to avoid reparsing the same content over and over;
has no provisions for secure communication, authentication, or authorization, since it assumes that the client is trusted and is under the control of the same user who is executing the server process.

So, while it may be technically possible to expose a language server protocol interface through a web server, it would not be a good fit for a multi-user, networked application (such as a web application). Therefore, we won’t be exploring this scenario in this article.
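To make the “inter-process” nature more concrete: the LSP base protocol frames each JSON-RPC message with a `Content-Length` header (the body size in bytes), then a blank line, then the UTF-8 encoded JSON body. This is what travels over the server’s standard input and output. The Python sketch below encodes and decodes such frames. It is a simplification that ignores the optional `Content-Type` header and most of the error handling a real implementation would need.

```python
import json
from typing import Any, Dict, Tuple


def encode_message(payload: Dict[str, Any]) -> bytes:
    """Frame a JSON-RPC payload as the LSP base protocol does."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> Tuple[Dict[str, Any], bytes]:
    """Parse one framed message; return it plus any remaining bytes."""
    header_blob, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header_blob.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    body, remainder = rest[:length], rest[length:]
    return json.loads(body.decode("utf-8")), remainder


# The first request a client sends, with minimal (empty) client capabilities.
request = {"jsonrpc": "2.0", "id": 1, "method": "initialize",
           "params": {"processId": None, "rootUri": None, "capabilities": {}}}
framed = encode_message(request)
print(framed.split(b"\r\n\r\n", 1)[0])
decoded, leftover = decode_message(framed)
print(decoded == request, leftover)
```

Nothing in this framing deals with users, sessions, or authentication. It simply assumes that whoever writes to the pipe is trusted.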

Single User, Single Machine

However, a solution based on the language server protocol could work in the single user, single machine scenario. It’s just probably not the most sensible choice. In fact, running a web application only to serve one local user increases the complexity of the solution and it’s less secure than simpler alternatives. Still, we can do it, it may make sense in some use cases, and we may learn something from it.

So, let’s look at how we might approach this. We’ll need to cover the two sides of the problem:

Hook Monaco to an LSP client;
Wrap and expose TextX with an LSP server.

The Client Side: Monaco and the Language Server Protocol

Since Monaco is extracted from VSCode, we can expect it to be somewhat compatible with the LSP; if not in its API, at least in the set of features it supports. Indeed, the folks at TypeFox have bridged the API gap between Monaco and the LSP with their project, monaco-languageclient.

monaco-languageclient used to be a component of the Theia IDE framework. Theia is a project that helps bring rich code editors or lightweight IDEs to the browser. It uses Monaco as the editor and allows reusing many existing VSCode extensions without modifications, thanks, in part, to the Language Server Protocol.

So, at one point, as part of project Theia, TypeFox developed monaco-languageclient. However, nowadays they apparently don’t use it anymore. Still, they’ve found new maintainers and, at the time of writing, the project is still active.

Therefore, we’ll set up a project with Monaco plus monaco-languageclient. Luckily, the latter comes with an example that we can start from. This comprises a web page with a Monaco instance in it, talking with a node.js server over WebSockets. Let’s check it out:

git clone https://github.com/TypeFox/monaco-languageclient.git
cd monaco-languageclient/example
yarn
yarn start

If all goes well, we’ll have built the client and server and started the latter. It listens on port 3000; that’s hardcoded in src/server.ts. So, if we point our browser to localhost:3000, we ought to see the example page with a Monaco editor instance in it.

As we can confirm by opening our browser’s developer tools, the Monaco instance is opening a WebSocket connection with the node.js server on port 3000. We can see the JSON messages exchanged when it validates the code as we type.

We’ve tagged this first version of the client as 01-client-side on Github.

The Server Side: Hooking With TextX

Now that we’ve covered the client part, let’s focus on the server side. We want to replace the sample server based on node.js with a Python server using TextX.

So, we may remember from our previous article on TextX that we can generate VSCode extensions for TextX languages. Indeed, those extensions use the LSP internally, thanks to a project called TextX-LS. We may enthusiastically want to start hacking with it right away, but we’re out of luck, for two reasons:

TextX-LS is not under active development anymore, at least at the time of writing;
And, anyway, it doesn’t support a real client-server scenario, as it’s only meant for VSCode extensions that run locally on the same machine, with no network sockets involved.

However, not all is lost! There’s another project called pygls that implements a Language Server in Python. It’s not specific to TextX, and we’ll have to hook them up together. But it comes with WebSocket support out of the box, and we know that it’s a valid choice because TextX-LS itself uses pygls.

In fact, we’ll likely end up copying some code from TextX-LS, purging it of its VSCode-specific parts. We’ll then expose its services over WebSockets thanks to pygls.

Heads up! We’re going to set up a server that has no form of security whatsoever. We’re doing this for educational purposes, but this is not a solution we would advise using in a production environment, not even on a personal computer for a single user. That’s because other users or processes on the same machine or network may freely connect to our server, which is not hardened for security.

Setting Up the Server Project

We’ll now set up the server project. In particular, regarding dependencies, we’ll follow the approach described in this Python Packaging Authority tutorial, so that it will be easy to install all the libraries that our solution requires, without polluting our operating system installation. This amounts to something like:

mkdir server
cd server
pip install --user pipenv
pipenv install textx "pygls[ws]"

Your system may require slightly different commands; please refer to the aforementioned tutorial. Note how we add the packages our server will depend on with the last line. Also note the use of “pygls[ws]” to request WebSocket support, which is optional and doesn’t come with the default installation of pygls.

Now, we can create a server.py file with the following contents:

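from pygls.server import LanguageServer

# A language server that, for now, only handles the basic LSP handshake
server = LanguageServer()
# Listen for WebSocket connections on localhost:3001 (runs until stopped)
server.start_ws('localhost', 3001)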

The pygls documentation erroneously mentions start_websocket but start_ws is the actual name of the method. With that, we’ll launch a pygls server on port 3001 (remember that port 3000 is already taken by the node.js server that publishes the page with Monaco):

python server.py

For now, this server can accept the basic LSP handshake, but it doesn’t do much more than that, and in particular, it doesn’t yet support anything specific to a TextX language.

We’ve tagged this first version of the server as 02-server-setup on Github.

Connecting Server and Client

Still, we can try connecting the example client with our barebones server. We’ll have to edit client.ts slightly, to accommodate the connection to a different port:

// create the web socket
const url = createUrl(`${location.hostname}:3001`);

Since the createUrl function assumes that it connects to the same host and port from which the HTML page was downloaded, we’ll have to modify it as well:

function createUrl(path: string): string {
    const protocol = location.protocol === 'https:' ? 'wss' : 'ws';
    return normalizeUrl(`${protocol}://${path}`);
}
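The scheme selection in createUrl is the only subtle part of this code. Browsers generally block insecure `ws://` connections from pages served over HTTPS as mixed content, hence `wss` in that case. For readers working mostly on the Python side, here is the same logic as a hypothetical helper. It is not part of the example project, and it omits the `normalizeUrl` step.

```python
def create_ws_url(page_protocol: str, host_and_port: str) -> str:
    """Pick the WebSocket scheme matching the protocol the page was served with."""
    # location.protocol in the browser looks like "http:" or "https:"
    scheme = "wss" if page_protocol == "https:" else "ws"
    return f"{scheme}://{host_and_port}"


print(create_ws_url("http:", "localhost:3001"))  # ws://localhost:3001
```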

If we compile and launch the client again, and open the network tab of the browser’s developer tools, we ought to see the successful WebSocket connection and a few messages exchanged between the client and the server:

Network panel in the browser's development tools, showing an exchange of JSON-RPC messages

With these messages, the client learns about the capabilities of the server and notifies it of having opened a document. We can see that the client no longer shows any validation errors, because the server doesn’t yet provide any diagnostic messages.
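The `textDocument/didOpen` notification in that exchange carries the whole text of the document. The Python sketch below shows its shape. The field names follow the LSP specification, while the URI, language id and text are made-up values. Being a notification, it has no `id`, and the server must not reply to it.

```python
import json


def did_open_notification(uri: str, language_id: str, text: str, version: int = 1) -> dict:
    """Build a textDocument/didOpen notification as a client would send it."""
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didOpen",
        # No "id" field: this is a notification, not a request.
        "params": {
            "textDocument": {
                "uri": uri,
                "languageId": language_id,
                "version": version,  # incremented by the client on every change
                "text": text,
            }
        },
    }


msg = did_open_notification("file:///tmp/example.txt", "plaintext", "hello")
print(json.dumps(msg, indent=2))
```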

We’ve tagged this version of the client as 03-client-server on Github.

Providing Diagnostics

Before we even start dealing with TextX, we have to add the validation capability to our language server. Every time the client opens a document – which, in our case, will be only when the web page has loaded – and whenever the user changes the code, the language server ought to update the client with a new set of validation issues.
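Conceptually, this is the stateful bookkeeping we discussed earlier. The server keeps the latest text of each open document and re-validates it on every notification. pygls already tracks open documents for us in `ls.workspace`. The stdlib-only Python sketch below, with hypothetical class and method names, only illustrates the idea. It assumes full document synchronization, where the client always sends the whole text.

```python
from typing import Callable, Dict, List


class DocumentStore:
    """Tracks open documents and re-validates them on open/change.

    `validate` is any function from source text to a list of diagnostics.
    """

    def __init__(self, validate: Callable[[str], List[str]]) -> None:
        self._docs: Dict[str, str] = {}
        self._validate = validate
        self.published: Dict[str, List[str]] = {}  # last diagnostics per URI

    def did_open(self, uri: str, text: str) -> None:
        self._docs[uri] = text
        self._publish(uri)

    def did_change(self, uri: str, new_text: str) -> None:
        # With full sync, each change carries the entire new text.
        self._docs[uri] = new_text
        self._publish(uri)

    def did_close(self, uri: str) -> None:
        self._docs.pop(uri, None)
        self.published.pop(uri, None)

    def _publish(self, uri: str) -> None:
        # A real server would send textDocument/publishDiagnostics here.
        self.published[uri] = self._validate(self._docs[uri])


def toy_validate(text: str) -> List[str]:
    """Hypothetical validator: flags documents containing 'TODO'."""
    return ["contains TODO"] if "TODO" in text else []


store = DocumentStore(toy_validate)
store.did_open("file:///example.txt", "some text")
store.did_change("file:///example.txt", "some text TODO")
print(store.published)  # {'file:///example.txt': ['contains TODO']}
```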

So, we’ll add the following code to server.py, just before the last start_ws instruction that starts the server:

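@server.feature(TEXT_DOCUMENT_DID_CHANGE)
def did_change(ls, params):
    """Re-validate the document every time the user edits it."""
    validate(ls, params)

@server.feature(TEXT_DOCUMENT_DID_OPEN)
def did_open(ls, params):
    """Validate the document as soon as the client opens it."""
    validate(ls, params)

def validate(ls, params):
    ls.show_message_log('Validating program...')

    # pygls keeps the current text of each open document in its workspace
    text_doc = ls.workspace.get_document(params.text_document.uri)

    source = text_doc.source
    diagnostics = []
    # TODO validate the code
    # Publishing an empty list clears any previous markers on the client
    ls.publish_diagnostics(text_doc.uri, diagnostics)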

This will require that we add a few imports as well, at the top of the file:

from pygls.lsp.methods import (TEXT_DOCUMENT_DID_CHANGE, TEXT_DOCUMENT_DID_CLOSE, TEXT_DOCUMENT_DID_OPEN)
from pygls.lsp.types import (Diagnostic, DiagnosticSeverity, Position, Range)

As we can see, we have set up the necessary boilerplate to be notified when a document is opened or changed, and we respond to the client with an empty list of diagnostic objects. So, while we obviously won’t see any markings on the editor yet, we can inspect the messages in the network pane in the browser’s development tools to check that, indeed, the server and the client are communicating.
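For reference, here is a Python sketch of the notification the server sends back, built by hand as a plain dictionary. The pygls `publish_diagnostics` call does this for us. The URI is a made-up example. In the LSP, an empty `diagnostics` array is how the server tells the client to clear the markers it previously published for that document.

```python
import json
from typing import List, Optional


def publish_diagnostics(uri: str, diagnostics: Optional[List[dict]] = None) -> dict:
    """Build a textDocument/publishDiagnostics notification.

    An empty list tells the client to clear any markers for this document.
    """
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/publishDiagnostics",
        "params": {"uri": uri, "diagnostics": diagnostics or []},
    }


print(json.dumps(publish_diagnostics("file:///example.turtle")))
```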

Integrating TextX

Now, we can finally teach the server about a language made with TextX, and show validation errors on the client once again. This will be our objective for this section.

As an example, we’ll reuse the Turtle language we’ve developed in our TextX tutorial. This is a toy declarative language for describing 2D graphical scenes drawn with Logo-style “turtle graphics”. As such, it requires a version of Python with the “turtle” module built into it; check your Python installation if you want to follow along with the code.

So, let’s copy the turtle.tx file from the previous tutorial to our server directory. This is the file that defines the language. We’ll then be able to load it into a TextX metamodel: a description of our language that TextX compiles from the sources that we feed it. Such a metamodel is the basis for parsing code, which in our case is written in our Turtle language, into Python objects that we can later query, transform, interpret, etc. We can obtain the metamodel with the following instruction in server.py:

from textx import metamodel_from_file

turtle_meta = metamodel_from_file("turtle.tx")

Now, we’re ready to provide validation messages. We’ll parse the source code with turtle_meta.model_from_str, which as the name implies creates a model (abstract syntax tree) from a source string. Also, this method will raise a TextXSyntaxError when it cannot parse the given text. We can catch that to provide the required diagnostics. Importantly, this means that TextX will only ever report the first parse error it finds. Other parsers, such as ANTLR-based parsers, are able to restart after an error and report multiple issues, which is particularly useful in an editor.

So, let’s translate the above paragraph into code for our validate function:

# At the top of server.py
from textx.exceptions import TextXSyntaxError

# Inside validate(), in place of the TODO comment
try:
    turtle_meta.model_from_str(source)
except TextXSyntaxError as err:
    # TextX positions are 1-based; LSP positions are 0-based
    diagnostics.append(Diagnostic(
        range=Range(
            start=Position(line=err.line - 1, character=err.col - 1),
            end=Position(line=err.line - 1, character=err.col)
        ),
        message=err.message,
        severity=DiagnosticSeverity.Error,
        source=type(server).__name__))

We can see how we convert TextX’s 1-based line and column numbers into the 0-based positions that the LSP expects. With the above code in place, we can finally see errors in our editor.
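Because this conversion is easy to get wrong, here it is in isolation. The hypothetical helper below uses plain dictionaries instead of pygls types. It builds the same one-character range as the code in validate. In the LSP, a range’s end position is exclusive, so the end character is the one right after the error position.

```python
from typing import Dict


def textx_pos_to_lsp_range(line: int, col: int) -> Dict[str, Dict[str, int]]:
    """Convert a 1-based TextX (line, col) into a 0-based, one-character LSP range."""
    start = {"line": line - 1, "character": col - 1}
    end = {"line": line - 1, "character": col}  # end is exclusive in the LSP
    return {"start": start, "end": end}


# An error reported by TextX at line 3, column 5 ...
print(textx_pos_to_lsp_range(3, 5))
# ... marks the fifth character of the third line in the editor:
# {'start': {'line': 2, 'character': 4}, 'end': {'line': 2, 'character': 5}}
```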

We’ve tagged the code developed so far as 04-text-integration on Github.

Error Reporting

Interestingly, TextX reports only the position of the first character where it encountered the error. Again, this is different from ANTLR and other parsers that report the entire problematic range, which would be better suited for an editor. We may want to modify the server so as to report a longer range; however, we cannot do it in a generic way, since it is tied to the language and possibly to each specific language construct. So, we won’t be showing that here.

We can also see how, by hovering with the mouse over the error marking, we get a nice message from TextX, that includes what the parser expected to find. The default message is good enough for this simple language; with more complicated grammars, listing the full set of expected alternatives could become unwieldy, and we’ll probably want to customize the error messages on the server. This is an issue that ANTLR-based parsers share with TextX: they report errors that are meaningful for simple cases, but could become less user-friendly for more complex languages or constructs.
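As a starting point for such customization, here is a hypothetical helper that keeps the message short by listing only the first few expected alternatives. It works on a plain list of strings, and the token names in the example are invented. Extracting that list from a TextX error is left out here, as it depends on the textX version in use.

```python
from typing import List


def summarize_expected(expected: List[str], limit: int = 3) -> str:
    """Build a short 'Expected ...' message from a list of alternatives."""
    unique = list(dict.fromkeys(expected))  # drop duplicates, keep order
    if not unique:
        return "Unexpected input"
    shown = ", ".join(unique[:limit])
    hidden = len(unique) - limit
    suffix = f" or {hidden} more" if hidden > 0 else ""
    return f"Expected {shown}{suffix}"


print(summarize_expected(["'draw'", "'move'", "ID", "INT", "'move'"]))
# Expected 'draw', 'move', ID or 1 more
```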

Moreover, we could get a semantic error instead. This happens when the syntax is correct, but we’re violating some other constraint that’s not encoded in the grammar. In that case, TextX raises a TextXSemanticError. Conveniently, both TextXSyntaxError and TextXSemanticError inherit from the same TextXError class, so we can just catch that.
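In code, this amounts to catching the common base class in validate. The pattern is sketched below with stand-in exception classes, so that the snippet runs without TextX installed. The class names mirror TextX’s, but in the real server we would import TextXError from textx.exceptions instead of defining it. The `collect_errors` and `bad_reference` functions are hypothetical.

```python
from typing import Callable, List, Optional


# Stand-ins mirroring TextX's hierarchy; the real classes live in textx.exceptions.
class TextXError(Exception):
    def __init__(self, message: str, line: Optional[int] = None, col: Optional[int] = None):
        super().__init__(message)
        self.message = message
        self.line = line
        self.col = col


class TextXSyntaxError(TextXError):
    pass


class TextXSemanticError(TextXError):
    pass


def collect_errors(parse: Callable[[], object]) -> List[str]:
    """Run `parse` and turn any TextX-style error into a list of messages."""
    try:
        parse()
    except TextXError as err:  # catches both syntax and semantic errors
        return [f"{type(err).__name__} at {err.line}:{err.col}: {err.message}"]
    return []


def bad_reference() -> None:
    # Simulates a reference to an element that doesn't exist.
    raise TextXSemanticError('Unknown object "foo"', line=2, col=7)


print(collect_errors(bad_reference))
```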

In TextX languages, there’s only one possibility for semantic errors out of the box: referring to the name of an element that doesn’t exist. However, it would be entirely possible to perform additional validation on the model, for example when we want to enforce other constraints that we cannot express declaratively in TextX. For that, we may use object processors, that is, user-defined functions that TextX automatically calls when it parses some piece of code into an object. An object processor may raise a semantic error if it detects that the model violates some constraint. Then, the editor would show such an error exactly as it does with built-in errors, with no changes required.

We’ve tagged this final version of the code as 05-semantic-errors on Github.

What We’ve Learned and Going Further

That concludes our journey. We’ve completed a successful, if basic, integration of TextX and Monaco. Going further, it would be interesting to explore how to go past error reporting, for example adding syntax coloring and code completion.

In this tutorial, we’ve attempted something which is probably not a very good idea in most scenarios. We’ve learned that TextX with the Language Server Protocol is not really fit for a network service with multiple users; not without developing substantial pieces of software that don’t exist at the time of writing. So, another direction we may want to explore is: what are the challenges in exposing a Language Server as an HTTP service for multiple users?

For now, we’ve just looked at an integration scenario where TextX and Monaco run with a single user. While, in general, there are better approaches to this problem, there are situations where it could make sense. For example, think of a Python application that uses TextX for some parts of its configuration. Also, suppose that we distribute it as a Docker image. Then, editing files inside the image is awkward for the users of the application. However, we can expose a language server on the Docker network and access it through a privileged container that publishes an instance of the Monaco editor through a web interface.

So, even if this is more of a technical exercise of the “what if…?” kind, it may still have practical applications.

As usual, all the code is on Github. Each section where we’ve introduced new code matches a tag in the repository.