In this article, we present our Teradata SQL parser – a commercially licensed parser for Teradata SQL. You will learn why, when, and how you might want to use our ready-to-go parser.

In the following sections, we will first provide a short introduction to Teradata SQL – what it is and why you might want to have a parser for it. Then, we will briefly explain what a parser is and why ready-to-go parsers might represent the right solution for you. Finally, we will illustrate how our Teradata SQL parser can be integrated and used in both JVM and Python projects.

Farewell Teradata SQL

Teradata SQL is the Structured Query Language (SQL) dialect used to interact with Teradata databases. Like many other SQL dialects, Teradata SQL extends the ANSI SQL standard with language constructs targeting the needs of its specific use cases – large-scale data warehousing and analytics.
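
As a quick taste of the dialect, one well-known Teradata extension is the QUALIFY clause, which filters the rows produced by a window function directly – something standard ANSI SQL only allows through a subquery:

-- Teradata's QUALIFY clause: keep only the youngest person per name
SELECT Id, Name, Age
FROM People
QUALIFY ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Age ASC) = 1;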

Since the early 1980s, Teradata has established itself as a prominent provider of large-scale data warehousing and analytics solutions with its robust and highly parallel Database Management System (DBMS).

However, the demand for moving on-premises data warehouses to the cloud has increased in recent years, and strong competitors providing cheaper and more agile solutions have emerged, e.g. Snowflake. An increasing number of customers have therefore started considering migrating their Teradata solutions to more modern technologies.

Manual migrations might not represent a reasonable option for large-scale Teradata solutions. In these cases, (semi-)automated procedures are often preferred – and that is where a parser takes the stage (along with other precious components)!

Fantastic Parsers and Where to Find Them

The term parser is part of our daily jargon at Strumenta: we know what a parser is, what can be done with it, and how to build one to be proud of.

In short, a parser is a piece of software that understands the syntax of code written in some programming language – and can read and extract structured knowledge from it. This information is typically used in larger systems that analyze and manipulate source code, such as refactoring tools.
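
To make this concrete, here is a minimal Kotlin sketch – with hypothetical node names, for illustration only, not the ones our parser actually uses – of the kind of structured representation a parser can produce:

// Hypothetical, simplified AST node (illustration only)
data class SelectStatement(val columns: List<String>, val table: String)

// A parser could turn "SELECT Id, Name FROM People" into:
val ast = SelectStatement(columns = listOf("Id", "Name"), table = "People")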

Want to know more about parsing? Take a look at this article we wrote some time ago.

Choose any language and there is probably going to be an open-source parser available for it, so why bother buying a commercial one? Glad you asked!

Developing high-quality parsers is time-consuming and requires deep knowledge of the subject. We specialize in this field, and we have built tons of parsers for our customers over the years. In doing so, we have developed a solid methodology to ensure high quality in our products from inception to maintenance. Our ready-to-go parsers are battle-tested and continuously improved over time in terms of language coverage and use cases.

Curious about what methodology we use to build parsers? We like to call it Chisel – check it out!

Introducing our Teradata SQL Parser

Our Teradata SQL parser is implemented in Kotlin using ANTLR4 and Kolasu – the Kotlin implementation of our runtime libraries supporting the StarLasu methodology. The parser provides support for parsing Teradata SQL code from both strings and files and is distributed as a Java Archive (JAR) file.

It is possible to interact with the parser from the terminal through a simple Command-Line Interface (CLI). In the following snippets, we illustrate how this can be done using a file or string:

# Parsing from file...
java -jar <PARSER_JAR_PATH> parse <INPUT_FILE_PATH>
# Parsing from string...
java -jar <PARSER_JAR_PATH> parse "INPUT_STRING"

The parser can also be invoked programmatically if included as a dependency in a JVM project. It is worth mentioning that, while we are going to use Kotlin in the remaining sections of this article, our Teradata SQL parser can be used from Java and other JVM languages as well. The following code snippet illustrates a possible Gradle build configuration integrating the parser from its physical JAR file located in a libs folder:

dependencies {
    implementation files('libs/kolasu-teradata-sql-parser-<VERSION>.jar')
}

Once we’ve added the parser as a dependency, we can invoke it as follows:

// Declare the input file path
val path: Path = ...
// Create a char stream from the file path
val file: CharStream = CharStreams.fromPath(path)
// Create a new Teradata SQL parser instance
val parser: TeradataSqlKolasuParser = TeradataSqlKolasuParser()
// Parse the input file using the parser
val result: ParsingResult<TeradataSqlCompilationUnit> = parser.parse(file)
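
Since the parser accepts strings as well as files (as mentioned above), Teradata SQL code can also be fed to the same instance directly:

// Parse Teradata SQL code directly from a string
val stringResult: ParsingResult<TeradataSqlCompilationUnit> = parser.parse("SELECT * FROM People;")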

We also provide a Python implementation of the Teradata SQL parser – which wraps the original JAR file and handles data exchange to integrate it within the Python world. The Python parser is distributed as an ordinary wheel package.

In this case, the distributed archive can be placed in a libs folder and declared as a dependency. For example, in a setup.py file, we could write:

setup(
  # ...other configurations...
  install_requires=[
    # direct reference to the local wheel archive (note the file:// scheme)
    "teradata-sql-parser @ file://<archive_absolute_path>",
    # ...other dependencies...
  ]
)

Once done, it will be possible to invoke the parser as follows:

# required imports (teradata-sql-parser)
from teradata_sql.parser import TeradataSqlParser

# Declare the input file path
filename: str = ...
# Create a new Teradata SQL parser instance
parser = TeradataSqlParser()
# Parse the input file using the parser
parsing_result = parser.get_ast(filename)

Please note that, in this case, the Kotlin executable is wrapped in a Python project handling data exchange. A valid Java installation is therefore still required to execute the contained JAR file.

Extract Information from Teradata SQL Code

At this point, you might be wondering what the ParsingResult data structure returned by the parser contains and what can be done with it.

In addition to the root of the parsed AST, the Teradata SQL parser returns a list of the issues encountered during the parsing process. The parsing result simply bundles these two pieces of information.
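
In Kotlin, for instance, the two can be accessed as follows (note that, in Kolasu, the root is nullable, since parsing may fail before producing an AST):

// The two pieces of information carried by the parsing result
val root: TeradataSqlCompilationUnit? = parsingResult.root
val issues: List<Issue> = parsingResult.issues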

The first operation that can be performed on the parsing result is verifying that no error occurred – i.e. that the parsed source code is syntactically correct. This can be done by checking that the list of issues is empty, as illustrated in the following examples using Kotlin:

assertTrue { parsingResult.issues.isEmpty() }

and using Python:

assert len(parsing_result.issues) == 0 
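
If the list is not empty, the issues can be inspected to understand what went wrong – a minimal Kotlin sketch, relying on the message and position that Kolasu issues carry:

// Print out each issue along with its position in the source code
parsingResult.issues.forEach { issue ->
    println("Issue at ${issue.position}: ${issue.message}")
}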

Starting from the AST root, it is also possible to perform arbitrary traversals to implement various tasks. Let us consider the following CREATE TABLE statement defining a People table with Id, Name and Age columns:

CREATE TABLE People (Id INT, Name VARCHAR(255), Age INT);

Let us now imagine that we are interested in programmatically extracting the declared column names of the table and printing them to the standard output stream. The code snippet below illustrates how this can be achieved in Kotlin and Kolasu through the following steps:

  1. First, we parse the initial statement and extract an AST using our Teradata SQL parser;
  2. The parser returns a parsing result from which we retrieve the AST root and start traversing its nodes;
  3. Among the AST nodes, we focus our search on column definitions – i.e. TeradataSqlSimpleColumnPartition;
  4. From each column definition, we extract a string representation of its declared name by processing its TeradataSqlQualifiedIdentifier;
  5. Finally, we print out the extracted string representations to the standard output.

// store the input statement into a string
val input = "CREATE TABLE People(Id INT, Name VARCHAR(255), Age INT);"
// (1) parse the input string containing the create table statement
val parsingResult = TeradataSqlKolasuParser().parse(input)
// (2) traverse the obtained AST from its root node (the root is nullable)
parsingResult.root!!.walk()
    // (3) extract all column definition nodes
    .filterIsInstance<TeradataSqlSimpleColumnPartition>()
    // (4) extract the name from each column definition
    .map { col -> col.fName.fIdentifiers.joinToString(".") { it.fValue } }
    // (5) print out the name for each column definition
    .forEach { println("Found column with name $it") }

The same task can be performed using Pylasu and Python as follows:

# required imports (pylasu)
from pylasu.model.traversing import walk
# required imports (teradata-sql-parser)
from teradata_sql.ast import SimpleColumnPartitionDefinition
from teradata_sql.parser import TeradataSqlParser

# store the input statement into a string
source = "CREATE TABLE People(Id INT, Name VARCHAR(255), Age INT);"
# (1) parse the input string containing the create table statement
parsing_result = TeradataSqlParser().get_ast_from_string(source)
# (2) traverse the obtained AST from its root node and (3) extract column definitions
column_definitions = [node for node in walk(parsing_result.root)
                      if isinstance(node, SimpleColumnPartitionDefinition)]
# (4) extract the name from each column definition
column_names = [column_definition.column_name.name
                for column_definition in column_definitions]
# (5) print out the name for each column definition
for column_name in column_names:
    print(f"Found column with name {column_name}")

Executing the code on the initial statement will print out the following output:

Found column with name Id
Found column with name Name
Found column with name Age

Summary

In this article, we have seen how easy it is to use and be productive with our Teradata SQL parser. It is not perfect, but it has been battle-tested and used in production by various companies – and will continue to be.

If you are interested, please find out more about our parsers at Parser ready-to-go and schedule a meeting with us.

Read more

You can discover more about our ready-to-go Teradata SQL parser here.