Tutorials and issues on all aspects of creating software to analyse code

Migrating from ANTLR2 to ANTLR4

ANTLR is a popular parser generator or “compiler-compiler”, developed by Prof. Terence Parr and several contributors. It’s been around since 1992, as an evolution of PCCTS, and has gone through multiple major versions. The latest incarnation is the 4.x branch, first released in 2013. Here at Strumenta, we use ANTLR4 […]

Getting started with ANTLR: building a simple expression language

This post is part of a series. The goal of the series is to describe how to create a useful language and all the supporting tools: Building a lexer; Building a parser; Creating an editor with syntax highlighting; Build an editor with autocompletion; Mapping the parse tree to the abstract syntax tree; Model to model […]

Building and testing a parser with ANTLR and Kotlin

This post is part of a series. The goal of the series is to describe how to create a useful language and all the supporting tools: Building a lexer; Building a parser; Creating an editor with syntax highlighting; Build an editor with autocompletion; Mapping the parse tree to the abstract syntax tree; Model to model […]

Parsing SQL

You can find the code presented in this article in the companion repository. SQL is a language for handling data in a relational database. If you have worked with data, you have probably worked with SQL. It is in the same league as HTML: maybe you never learned it formally but you kinda know how to […]

Why you should not use (f)lex, yacc and bison

In the field of parsing, Lex and Yacc, as well as their respective successors flex and GNU Bison, have a sort of venerable status. You could still use them today, but you should not. In this article we will explain why they have problems and show you some alternatives. Lex and Yacc were […]

Pyleri: Parsing with Ease

Pyleri Tutorial: Parsing with Ease

Welcome to a tutorial on Pyleri, aka Python Left-Right Parser, a simple parsing tool to use when you need something more than a regular expression, but less than a full parser generator. In this tutorial we are going to show you how to use the tool and the basics of parsing. Why Learn Pyleri? […]

So Much Data, So Many Formats

So Much Data, So Many Formats: a Conversion Service

Data is a core resource for many activities. One important challenge in handling data is storing it in the right way. We need to choose a format that makes it easy to solve the problem at hand. When multiple problems are being solved using the same data, that could mean that the same data has to […]

Getting started with ANTLR in C++

ANTLR can generate parsers in many languages: Java, C#, Python (2 and 3), JavaScript, Go, Swift, and C++. We have written an article on using an ANTLR C# parser, and a mega tutorial that teaches you how to use ANTLR with Java, C#, Python and JavaScript. In this article we are […]

Guide to Natural Language Processing

Analyze and Understand Text: Guide to Natural Language Processing

What Can You Do With Natural Language Processing? Natural Language Processing (NLP) comprises a set of techniques for working with documents written in a natural language to achieve many different objectives. They range from simple ones that any developer can implement to extremely complex ones that require a lot of expertise. The following table illustrates […]

A Guide To Parsing: Algorithms And Terminology

We have already introduced a few parsing terms, while listing the major tools and libraries used for parsing in Java, C#, Python and JavaScript. In this article we make a more in-depth presentation of the concepts and algorithms used in parsing, so that you can get a better understanding of this fascinating world. We have […]

Do You Need a Parser?

We can design parsers for new languages, or rewrite parsers for existing languages built in house.

On top of parsers, we can then help you build interpreters, compilers, code generators, documentation generators, or translators (code converters) to other languages.