Posts

Translate Javascript to C#

Problems and Strategies to Port from Javascript to C#

Let’s say you need to automatically port some code from one language to another: how are you going to do it? Is it even possible? Maybe you have already seen a conversion between similar languages, such as Java to C#. That sounds much simpler in comparison.

In this article we are going to discuss some strategies to translate Javascript to a very different language, such as C#. We will discuss the issues involved and sketch some possible solutions. We will not get as far as writing a converter: that would be far too complicated for an introduction to the topic. Let’s avoid putting together something terribly hacky just for the sake of typing some code.

Having said that, we are going to look at the problems you may find in converting one real Javascript project: fuzzysearch, a tiny but very successful library that checks whether the characters of one string appear, in order, inside another, which is useful for filtering data based on partial user input.

When it’s worth the effort

First of all you should ask yourself whether the conversion is worth the effort. Even if you were able to successfully obtain some runnable C#, you have to consider that the style and the architecture will probably be “unnatural”. As a consequence the project could be harder to maintain than if you had written it from scratch in C#.

This is a common problem even in carefully planned conversions, such as the one that produced Lucene.net, which started as a conversion from Java to C#. Furthermore, you will not be able to use the result without manual work for every specific project, because even the standard libraries are different. Look at the example: while you could just capitalize the length in haystack.length, you cannot just capitalize charCodeAt. You will have to map functions of the source language to different functions of the destination language.
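One way to picture that mapping is a lookup table keyed by the Javascript member name. The sketch below is only an illustration: the table MEMBER_MAP and the helper mapMember are made-up names, and a real converter would apply such rules to parse-tree nodes rather than to strings. It also assumes that indexing a C# string and casting the char to int is an acceptable stand-in for charCodeAt, which ignores the out-of-range case (NaN in Javascript, an exception in C#).

```javascript
// Hypothetical table: Javascript member name -> function producing C# source.
const MEMBER_MAP = {
  // a property that only needs capitalizing
  length: (target) => `${target}.Length`,
  // a method with no same-named C# counterpart: index the string, cast to int
  charCodeAt: (target, args) => `(int)${target}[${args[0]}]`,
};

// Returns C# source for `target.member(args)`, or null when no rule is known.
function mapMember(target, member, args = []) {
  const known = Object.prototype.hasOwnProperty.call(MEMBER_MAP, member);
  return known ? MEMBER_MAP[member](target, args) : null;
}

console.log(mapMember("haystack", "length"));          // haystack.Length
console.log(mapMember("needle", "charCodeAt", ["i"])); // (int)needle[i]
console.log(mapMember("needle", "normalize"));         // null
```

Returning null for unknown members is a deliberate choice here: it marks the places where manual work is still needed instead of silently emitting wrong C#.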

On the other hand, every language has areas of specialization that may interest you, such as Natural Language Processing in Python. And if you accept the fact that you will have to do some manual work, and you are very interested in one project, creating an automatic conversion will give you a huge head start. If instead you are interested in having a generic tool, you may want to concentrate on small libraries, such as the Javascript one in the example.

Parse with ANTLR

The first step is parsing, and for that you should just use ANTLR. There are already many grammars available, which may not necessarily be up to date, but they are much better than starting from scratch and they will give you an idea of the scale of the project. You should use visitors instead of listeners, because they allow you to control the flow more easily. You should parse the different elements into custom classes that can manage the small problems that arise. Once you have done this, generating C# should be easier.
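To give a feeling for why a visitor makes it easy to control the flow, here is a minimal sketch of the pattern. It does not use ANTLR at all: the tree is a hand-built object with hypothetical node shapes, and CSharpEmitter is a made-up class. The point is only that each visit method decides which children to visit and what to do with their results.

```javascript
// Hypothetical node shapes; an ANTLR parse tree would be richer than this.
const tree = {
  type: "if",
  test: {
    type: "binary", op: "===",
    left: { type: "id", name: "nlen" },
    right: { type: "id", name: "hlen" },
  },
  body: {
    type: "return",
    value: {
      type: "binary", op: "===",
      left: { type: "id", name: "needle" },
      right: { type: "id", name: "haystack" },
    },
  },
};

class CSharpEmitter {
  // Dispatch on the node type; every method returns a piece of C# source.
  visit(node) {
    const method = this["visit_" + node.type];
    if (!method) throw new Error("unsupported node: " + node.type);
    return method.call(this, node);
  }
  visit_id(node) {
    return node.name;
  }
  visit_binary(node) {
    // strict (in)equality is rewritten to the plain C# operators
    const op = node.op === "===" ? "==" : node.op === "!==" ? "!=" : node.op;
    return `${this.visit(node.left)} ${op} ${this.visit(node.right)}`;
  }
  visit_return(node) {
    return `return ${this.visit(node.value)};`;
  }
  visit_if(node) {
    return `if (${this.visit(node.test)}) { ${this.visit(node.body)} }`;
  }
}

const csharp = new CSharpEmitter().visit(tree);
console.log(csharp); // if (nlen == hlen) { return needle == haystack; }
```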

The small differences

There are things that you could just skip, such as the first and last lines: they most probably don’t apply to your C# project. But you must pay attention to the small differences. The var keyword has a different meaning in Javascript and C#. By coincidence it would work most of the time, and it would be quite useful to work around the lack of static typing in Javascript. But it’s not magic: you are just hoping that the compiler will figure the type out. And sometimes it’s not a one-to-one conversion. For instance, you can’t use var in C# the way it’s used in the initialization of the for loop, because an implicitly typed C# declaration cannot introduce more than one variable.

The continue followed by the label outer should be transformed into a goto, but a plain continue works just as in C#. A difference that could be fixed quite brutally is the strict equality comparison “===/!==”, which could be replaced with “==/!=” in most cases, since it exists to work around problems due to the dynamic typing of Javascript. In general you can do a pre-parse check and transform the original source code to avoid some problems, or even comment out some things that cannot be easily managed.
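For reference, the constructs discussed above (a var declaration with two variables in the for initializer, a labelled continue, and the strict equality operators) all show up in a function of the following shape. It is written here as an illustration of an in-order character check in the style of fuzzysearch, not as a verbatim copy of the library’s source; the name subsequenceMatch is made up.

```javascript
// Returns true when every character of needle appears in haystack, in order.
function subsequenceMatch(needle, haystack) {
  var hlen = haystack.length;
  var nlen = needle.length;
  if (nlen > hlen) {
    return false;
  }
  if (nlen === hlen) {
    return needle === haystack;
  }
  // two variables in one var declaration: a single C# var cannot express this
  outer: for (var i = 0, j = 0; i < nlen; i++) {
    var nch = needle.charCodeAt(i);
    while (j < hlen) {
      if (haystack.charCodeAt(j++) === nch) {
        continue outer; // next iteration of the labelled loop: a goto in C#
      }
    }
    return false; // haystack exhausted without finding the current character
  }
  return true;
}

console.log(subsequenceMatch("cwhl", "cartwheel")); // true
console.log(subsequenceMatch("lw", "cartwheel"));   // false
```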

I present you thy enemy: dynamic typing

The real problem is that Javascript uses dynamic typing while C# uses static typing. In Javascript anything could be anything, which leads to certain issues, such as the need for the aforementioned strict equality operator, but it’s very easy to use. In C# you need to know the type of your variables, because there are checks to be made at compile time. And this information is simply not available in the Javascript source code. You might think that you could just use the var keyword, but you can’t. The compiler must be able to determine the real type at compile time, something that will not always be possible. For example, you cannot use var to declare function parameters.

You can use the dynamic keyword, which defers the determination of the type to execution time. Still, this doesn’t fix all the problems, such as initialization. You may check the source code for literal initializations or, in theory, even execute the original Javascript from C# and find a way to determine the correct types. But that would be quite convoluted. You might get lucky, and in a small project such as our example you will, but not always.
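Checking literal initializations can be sketched in a few lines. The helper below, guessCSharpType, is a hypothetical name and a deliberately naive heuristic: it looks only at the source text of the initializer, recognizes simple booleans, decimal numbers and quoted strings, and falls back to dynamic for everything else (hexadecimal numbers, exponents, expressions, function calls).

```javascript
// Hypothetical helper: guess a C# type name from the source text of a
// Javascript literal initializer; falls back to "dynamic" when unsure.
function guessCSharpType(initializer) {
  const text = initializer.trim();
  if (/^(true|false)$/.test(text)) return "bool";
  // short decimal integers without a leading zero fit comfortably in an int
  if (/^-?(0|[1-9]\d{0,8})$/.test(text)) return "int";
  // any other plain decimal number is treated as a double
  if (/^-?(\d+\.?\d*|\.\d+)$/.test(text)) return "double";
  // a single quoted string with no embedded quote of the same kind
  if (/^"[^"]*"$|^'[^']*'$/.test(text)) return "string";
  return "dynamic";
}

console.log(guessCSharpType("0"));               // int
console.log(guessCSharpType("3.14"));            // double
console.log(guessCSharpType("'abc'"));           // string
console.log(guessCSharpType("haystack.length")); // dynamic
```

In the last case a human immediately knows the answer is an integer, which shows how quickly a purely textual check runs out of steam.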

There are also problems that are easier to manage than you might imagine. For instance, assigning a function to a variable is not something that you usually do as explicitly in C# as you do in Javascript. But it’s easy using delegate types and constructs such as Func. Of course you still have to determine the correct types of the arguments, if there are any, but it doesn’t add any other difficulty per se.

Not everything is an object and other issues

In Javascript “string” is a string, but not an object, while in C# everything is an object, with no exceptions. This is a relevant issue, but it’s less problematic than dynamic typing. For instance, to convert our example we just have to wrap the function in a custom class, which is not really hard. One obvious problem is that different languages have different libraries. Some will not be available in the destination language. On the other hand, some parts of the project might not be needed in the destination language, because there are already better alternatives. Of course you still have to change all the related code, or wrap the real library of the destination language in a custom class that mimics the original one.

Conclusion

There are indeed major difficulties, even for a small project, in transforming code from one language to another, especially when they are as different as Javascript and C#. But let’s imagine that you are interested in something very specific, such as a very successful library and its plugins. You want to port the main library and give the developers of the plugins a simpler way to port their work. There are probably many similarities in the code, so you can do most of the work of managing the typical problems and provide guidance for the remaining ones.

Converting code between languages so different in nature is not easy, that is for sure, but you can apply a mixed automatic/manual approach, converting a large amount of code automatically and fixing the corner cases manually. If you can also translate the tests, you can later refactor the code, once it is in C#, and improve its quality over time.

ANTLR and the web: a simple example

ANTLR on the web: why?

I started writing my first programs on MS-DOS, so I am very used to having my tools installed on my machine. However, in 2016 the web is ubiquitous, so our languages could be needed there too.

Possible scenarios:

  • ANTLR also on the web:
    • users may want to access files written in a DSL, and possibly make minor changes to them, from the web too, while still using their fat clients for complex tasks.
  • ANTLR only on the web:
    • you are dealing with domain experts who are reluctant to install IDEs, so they prefer to have a web application in which to write their DSL programs.
    • you want to offer a simple DSL to specify queries to be executed directly in the browser.

In the first case you can generate your ANTLR parser using both a Java target and a Javascript target, while in the second you could target just Javascript.

A simple example: a Todo list

The DSL we are going to use in this example will be super easy: it will represent a todo list, where each todo item is on a separate line and starts with an asterisk.
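To make the shape of the language concrete before involving ANTLR, here is a tiny hand-written recognizer for it. Both the sample text and the function parseTodos are hypothetical and are not the grammar or the generated parser used in the rest of the post; they only sketch the rule “one item per line, introduced by an asterisk”.

```javascript
// Hypothetical sample input in the spirit of the DSL described above.
const sample = "* buy milk\n* call the plumber\n";

// Hand-written, line-based recognizer, for illustration only.
function parseTodos(text) {
  const items = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // skip empty lines
    const match = /^\*\s*(.+)$/.exec(line);
    if (!match) {
      throw new Error(`line ${index + 1}: not a valid todo item`);
    }
    items.push(match[1].trim());
  });
  return items;
}

console.log(parseTodos(sample)); // [ 'buy milk', 'call the plumber' ]
```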

An example of a valid input:

And this is our grammar:

Using the ANTLR Javascript target

You need to install the ANTLR tool to generate the Javascript code for our parser. Instead of manually downloading ANTLR and its dependencies you can use a simple Gradle script. It also makes it very straightforward to update the version of ANTLR you are using.

You can now generate your parser by running gradle generateParser.

Ok, this one was easy.

Invoking the parser

Unfortunately the JS libraries we are using do not work when simply opening local files, which means that even for our little example we need to use HTTP. Our web server will just have to serve a bunch of static files. To do this I chose to write a super simple application in Flask. There are millions of alternatives for serving static files, so pick the one you prefer. I will not detail how to serve static files through Flask here, but the code is available on GitHub, and if you have issues with that you can add a comment to this post to let me know.

Our static files will include:

  • the generated parser we got by running gradle generateParser
  • the Antlr4 JS runtime
  • the JS library require.js
  • HTML and CSS

You can get the Antlr4 JS runtime from here. To avoid having to import tens of files manually we will use require.js. You can get the flavor of require.js we need from here.

We are going to add a textarea and a button. When the user clicks on the button we will parse the content of the textarea. Simple, right?

This is the HTML code for this design masterpiece:

First things first, import require.js:

By the way, we are not using jquery, I know this could be shocking.

Good, now we have to invoke the parser:

Cool, now our code is parsed, but we do not do anything with it. Sure, we can fire up the developer console in the browser and print some information about the tree, to verify it is working and to familiarize ourselves with the structure of the tree ANTLR is returning.

Display results

If we were building some kind of TODO application we may want to somehow represent the information the user inserted through the DSL.

Let’s get something like this:

To do so we basically need to add the function updateTree, which navigates the tree returned by ANTLR and builds some DOM nodes to represent its content.
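The DOM-building half of that idea can be sketched independently of ANTLR. In the snippet below the tree is a plain object with a hypothetical shape ({ items: [{ text }] }) rather than a real ANTLR parse tree, and buildTodoList is a made-up name, not the updateTree function from the repository. The document is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Hypothetical tree shape: { items: [{ text: "..." }, ...] }.
// `doc` is the browser's document, or any object offering createElement.
function buildTodoList(tree, doc) {
  const ul = doc.createElement("ul");
  for (const item of tree.items) {
    const li = doc.createElement("li");
    li.textContent = item.text; // textContent does not interpret HTML
    ul.appendChild(li);
  }
  return ul;
}

// In a page one could then write, for some container element:
//   container.appendChild(buildTodoList(tree, document));
```

Using textContent rather than innerHTML means that whatever the user typed in the textarea is shown as text and never interpreted as markup.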

Here you go!

Code

If it is not the first time you are reading this blog you will be suspecting that some code is coming. As usual, code is on GitHub: https://github.com/ftomassetti/antlr-web-example

Next steps

The next step is to perform error handling: we want to catch errors and point them out to the users. Then we may want to add syntax highlighting, by using ACE for example. This seems a good starting point:

https://github.com/antlr/antlr4/blob/master/doc/ace-javascript-target.md

I really think that simple textual DSLs could help to make several applications much more powerful. However, it is not straightforward to create a nice editing experience on the web. I would like to spend some more time playing with this.

An approach to UML diagrams and ER models bearable for a Software Engineer

As part of my current job at Groupon I have to create diagrams, those nice pictures which make project managers happy. I write basic UML diagrams (State diagrams and Activity diagrams) together with Entity-Relationship diagrams (yes, the ones for the DB).

Yes, people want these pictures and I have to create them

What is wrong with the previous process

I am a Software Engineer and I understand the importance of communication, therefore I understand how useful diagrams can be. However, I have to confess that I am always a bit suspicious when I interact with people too fond of them: I am always afraid of dealing with people who like to spend endless time discussing things, pretending to be able to build something and generally just wasting time. I am an Engineer, I like building things, not just talking about building things.

On the other hand, systems evolve and diagrams can easily end up being outdated. One thing that makes this problem harder to fix is that normally you need some specific tool to create diagrams, so when you need to update a diagram you have to install the right tool, start it, generate your image and update the document.

I do not like this particular process, I would really love to improve it.

My current process

I prefer text formats instead of nice WYSIWYG editors, that is because:

  • text is portable, while each WYSIWYG editor tends to have its own format
  • you can easily compare and merge text files
  • you do not waste endless time trying to convince the editor to do what you want or exploring all the menu items

So if I need to write diagrams I use text formats to describe them and then I generate the actual pictures from those text files. I feel the process is more controllable, repeatable and versionable and the other people get their pretty pictures.

Currently I am using plantuml for UML diagrams and erd for ER diagrams. Erd wins extra points because it is written in Haskell. There is also a nice website that offers a web editor for PlantUML: it is named PlantText.

Now, this solution has problems:

  • you still need to install software, at least for the ER diagrams (you can generate the UML diagrams using the planttext website).
  • there are no nice editors supporting the DSLs used to describe these diagrams
  • there is no integration between the web editor and my github repository
  • you need to update the images in the documents after having generated them

The ideal process

To fix the current process I would love to have a web application to edit the diagrams, and to have this web application talk with my GitHub repository, doing the versioning for me. I would also like this web application to generate the images on the fly, and my documents to support links to images exposed on the web. That would be great for two reasons:

  1. I would not have to update all the documents containing a diagram when the diagram changes. One problem is that many documents embed a copy of the image, not a reference. Another is that the server hosting the diagrams needs to be always up.
  2. I would know where to find the source of a diagram. I imagine for example that we could have an image available at, let’s say, http://diagrams.foo.com/diagram1.png and the web application to edit it at http://diagrams.foo.com/diagram1.png/edit.

It would be fantastic to have a process to commit changes and a git hook to generate the images, maybe even updating the existing documents.

What I started doing: syntax highlighting for PlantUML

Now, I am still far from having the ideal process in place and probably I will never get there: the effort of implementing it and the changes required to the current process would not justify it. However, I am starting to take some steps in that direction. In particular I am focusing on improving the web editor for UML by implementing syntax highlighting.

I have implemented Syntax Highlighting for a large part of PlantUML for the CodeMirror web editor. The code is available on GitHub and I have sent a pull request to plantuml-server.

Writing the syntax highlighting mode resembles writing a grammar; in fact my first thought was to write the grammar for ANTLR and then implement an automatic conversion from EBNF to a CodeMirror mode. However, the goals of a grammar and of a syntax highlighting system are different. The former is intended to parse correct files and stop when it finds errors (only very good grammars have strong error handling and are able to recover from a few errors), while a syntax highlighting system works on a document that is wrong most of the time: as you type the document is incorrect; only when you complete your statement is the document correct, until you type the next character and the document is wrong again. Syntax highlighting systems need to be very robust and tolerate a lot of errors.

This is a random piece of the mode I have defined.

Now, the basic idea is that you have a state machine whose states begin at start and go through things like class def or stereotype style. Depending on the state you interpret tokens differently. The point is that you should keep the number of states very limited. Remember, you want your syntax highlighting system to be robust and to provide some reasonable output as the user types. So your parser will not be as refined as the parser you would write for a compiler. You will end up instead with a few states: so few that it makes sense to define them manually, with human-comprehensible meanings, instead of using a parser generator as we would do for a compiler.
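To illustrate the idea in isolation, here is a self-contained sketch of such a state machine. It is not the CodeMirror mode from the pull request and does not use the CodeMirror API: the function highlightLine, the keyword set, and the choice of just two states are assumptions made for the example.

```javascript
// Hypothetical two-state highlighter for lines such as "class Car".
// States: "start" (expecting a keyword) and "class def" (expecting a name).
const KEYWORDS = new Set(["class", "interface", "enum"]);

function highlightLine(line) {
  let state = "start";
  const tokens = [];
  for (const word of line.split(/\s+/).filter(Boolean)) {
    let style = null; // null means "no highlighting"
    if (state === "start" && KEYWORDS.has(word)) {
      style = "keyword";
      state = "class def";
    } else if (state === "class def") {
      style = "def";
      state = "start";
    }
    tokens.push({ text: word, style });
  }
  return tokens;
}

console.log(highlightLine("class Car"));
// [ { text: 'class', style: 'keyword' }, { text: 'Car', style: 'def' } ]
console.log(highlightLine("clas Car"));
// [ { text: 'clas', style: null }, { text: 'Car', style: null } ]
```

Note that the misspelled input does not raise an error: the words are simply left unstyled, which is the kind of tolerance a highlighter needs while the user is still typing.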

Note that CodeMirror also provides a library to test your mode, and I really appreciate that. These are a few of my tests:

Consider the first test: it says that the first word (class) should be recognized as a keyword while the second (car) as definition (or def).

The only problem with writing this code is that the plantuml grammar is… suboptimal. It is used for a lot of different types of diagrams and it is not so clear to me. I would definitely not suggest it as an example of a well designed DSL.

What I want to do in the future

Once I am finished with the syntax highlighting I want to implement auto-completion. This would make it much easier for me to write UML diagrams: currently I always have to look up examples to figure out how to do things. Some support from the editor would help greatly. It would be fantastic to also have error reporting as you type, but that could be a bit more complicated to build.

The next step is to write a web application around the erd program. I started creating the project (erd-web-server), let’s see when I can find the time for playing a little more with Haskell…

Once I have done that I will work on the GitHub integration. I would like to access diagrams in my projects and to generate the images as part of a git web hook.

So there is plenty of room for improvement, and even an engineer can have fun with diagrams, especially when building the tool chain around them.