
The Best Programming Languages for Each Situation


There is a question that many people take as a sign that the questioner does not understand the subject at all. Some people even find it enraging. The question usually takes the form "What is the best X?": What is the best car? What is the best programming language? Yet it is also a question we ask ourselves every time we start a project or pick a car.

The difference, of course, is that we ask this question in our specific context. We want to know the best programming language for us, for the situation we are in.

We obviously cannot know your situation, but with this article we hope to provide useful information to whoever is thinking about entering a new field or just wants to know the current state of the art.


Changing Programming Language Has Costs

If you have a large codebase, you probably do not really choose a language for a new project. You just use the one you have used so far. The main factor is the cost of adding a new language: hiring new developers, changing your infrastructure and learning new best practices is simply too much for many companies.

Another problem is that, while it is reasonably possible to learn a new language in a month or two, it takes much longer to become proficient in it. So both developers and their employers prefer to take advantage of the knowledge they already have.

In such cases the answer to the best programming language question usually becomes "whatever you are already using". That is a boring answer, although it is not necessarily the wrong one.

When To Change Programming Language

Twitter outages were so frequent that the error page, the Fail Whale, became a mascot.

Business needs may force some companies to change programming language, as when Twitter moved from Ruby to Java and Scala. The performance of Ruby was simply not good enough and was impacting their ability to deliver their service. The change was costly, but necessary. Performance was not the only technical factor behind their choice.

No language is perfect. In fact, Scala seems to bring new problems of its own, such as its learning curve. But it is fair to say that, in some situations, it is not true that all languages are equal, which is a common refrain among some people.

When you have significant and peculiar needs, some languages are really better than others.

A Huge Amount Of Money Can Change The Language Itself

HHVM reduced the median page save time on Wikipedia by half.

Another famous case, Facebook and PHP, shows that money can help. PHP has been accused of many things, many of them true, especially when it comes to security. But few people would say that PHP is particularly slow. It is just that Facebook was, and is, so big that any improvement in performance can save it a significant amount of money.

PHP did not offer the best performance, but it was easy to work with and Facebook had a huge codebase in it. You might say that they had outgrown the language. But instead of changing their development language, they changed the language itself and the way it worked. They created not one but two technologies to improve its performance. The first was HipHop for PHP, a PHP-to-C++ transpiler. The second was the more famous HHVM, a virtual machine for PHP and Hack. Hack is a dialect of PHP that they also invented.

They could keep PHP because its performance was not that bad, nor did they need it to be that good. Twitter simply could not work with Ruby, while Facebook could have worked even with standard PHP. It would just have cost more.

A Few Criteria For Choosing A Language

While we all agree that there is no best programming language in absolute terms, we think that some programming languages are preferable for specific tasks. We think it is possible to set some reasonable criteria to guide professional developers and businesses alike. These criteria can help you choose the best programming language for your situation.

Good Enough Technical Qualities

Aside from the obvious performance requirements, the language must have technical qualities that suit your needs. For instance, if your software is heavily concurrent, you need a language with first-rate support for concurrency.

Another problem Twitter had was that the LAMP model of Ruby deployment did not support encapsulation well, so it was difficult to build separate and independent storage or search services. This was not a case of "Ruby is bad", but rather of "Ruby is designed for something else". Remember that technical qualities do not just mean the things you can measure and see, like performance or syntax. They also include how the language works behind the scenes.

So you do need to choose a language that fits your use case well, one that checks all the boxes. However, you do not necessarily need to pick the one that fits it best. Different needs cannot always be ranked. Even when they can, you might not know which one matters most to you.

For example, imagine that you can determine that language X would be better once you reach 5 million users, while language Y would make it easier and cheaper to reach 5 million users. How can you know if you will ever reach 5 million users? Maybe language X would be too costly to use at the start, and your company would fail before reaching that many users.

Consider technical requirements as a filter: your language must pass it, but it does not need to be the best possible language for each requirement.

Popularity

The language you choose should be popular enough. This can help you save money and time, especially thanks to open source development. For instance, if your company uses PHP, you can now take advantage of the hard work of Facebook developers and use HHVM to improve the performance of your software. Popularity also makes it more likely that you will find good ready-to-use libraries to speed up your development.

There are several ways to measure the popularity of a programming language:

  • the quantity of jobs available
  • the quantity of search engine searches
  • the quantity of GitHub projects

All of them are useful, but none of them is perfect. For instance, many JavaScript projects on GitHub consist of a single file and are currently inactive, simply because the language and its tools encourage you to create very small libraries. And how can you gauge the popularity of a language called Go (there is also another one called Go!) by looking at search engine results?

Community Fit

The language must also be popular in the community you belong to. A good community fit has many advantages: developers think the way you want them to think, and they usually also have non-programming skills or knowledge that you need. This means that you spend less time training them and run less risk of hiring the wrong developer.

PHP has improved a lot on the technical side, but it is still chosen by people who want programming to be easy. This means that there are many not-so-good PHP programmers, and it might be hard to find the good ones. Even worse, you may not find out whether they are good until they work on your project and you see the results.

On the other hand, not all people who work professionally with C and C++ have a computer science degree, but they probably all think like people who do. That is simply because if you do not think, plan and write C/C++ code properly, your code will make you suffer for your mistakes soon enough.

Also, some languages are typically embraced by certain communities, so there are more libraries covering their areas. Think about the number of libraries being developed to analyze data in Python. This is a consequence of the fact that many data scientists have embraced Python. You can benefit from this by adopting Python if you need to develop applications in that area.

The Best Programming Languages For Some Specific Contexts


We have made this list for pragmatic purposes. We did not try to find the best programming language for every possible niche. We simply listed the sectors for which we could find at least two programming languages that fit reasonably well.

Let’s see our list of the best programming languages, so we can all start quarreling about it.

For Enterprise And Industry-Minded Academia

The worlds of enterprise and academia are both full of unmovable objects.

While there is innovation, there is also a proper way of doing things, and there must be a good reason to change. By industry-minded academia we mean academia that is directly linked to real products and businesses. For instance, when we talked about parsing in Java, we found that many of the best tools came directly from academia. Research in any field of academia may end up in real products, but usually the path is more circuitous.

Java

It could be argued that C# is a slightly better language than Java and that the CLI is more flexible than the JVM. But Java is a good language and the JVM is a good platform. More importantly, everybody else uses it, so you should too. In other words, Java is good enough and very popular, which is especially relevant in the enterprise world. Java is also taught in many computer science courses, but its enterprise presence is the reason it is used in industry-minded academia.

In the enterprise world, changing platform does not just mean changing the codebase. It means finding a whole new set of experts and solutions for all the technical, regulatory and business problems, and that is frequently not worth the effort. Java also has many libraries and products that do not exist for anything else. There are even special Java platforms for the enterprise that add features specifically for that world.

Kotlin

If you are in a Java company, Kotlin might be the only way to move partially away from Java. Kotlin is a language designed to be safer and more productive than Java, while being easy to use and 100% interoperable with Java. It is developed by the well-known software development company JetBrains. Recently it became the second language (after Java) to be officially supported by Android. It is also used by Corda, a distributed ledger developed by a consortium of well-known banks.

C++

C++ was born as an improvement on C and it largely succeeded, at least in the eyes of the public. Linus Torvalds might not like it, but most people do. If your company needs a language with great performance and closeness to the hardware, it is a great choice. Compatibility with C is still important, but it is no longer easy to achieve, if only because both languages have kept evolving.

This long history has given C++ advantages similar to those Java has in the enterprise, namely that a lot of code is already written in it. It also has great performance. One reason is that the language is lower level than Java and most languages used by the average programmer. Another is that some of the best programmers in the world have dedicated their lives to optimizing its compilers. They have not just worked on the compilers' code, but have also studied and developed many optimization techniques.

This history has also brought some disadvantages: C++ has so many features that there is no real standard way of using it. You will probably have to enforce your own way of using it to avoid many maintenance problems down the line.

Theoretical Academia


We use the term theoretical academia to contrast it with the industry-minded academia we discussed before. We mean the kind of academic work that people consider truly academic: work that is more experimental and farther away from real applications, the stuff that will eventually prove to be useful.

Python

The only good general answer for this category is Python. It is not the only widely used language, but it is the only one used across many different fields. It has some important technical features, such as being easy to extend and embed. But it is popular mostly because it is designed to be beautiful and fun to use.

For instance, Python uses indentation instead of curly braces to delimit blocks of code. The language and its community foster a culture that rejects clever tricks in favor of code that is readable, simple and explicit. This makes Python easy to use and thus great for prototyping. In many ways, it is the language that best shows the importance of culture in programming.
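As a small illustration of both points, here is a hypothetical function (our own example, not taken from any library). Indentation alone delimits the loop and the conditional, and the code favors plain, explicit steps over a clever one-liner.

```python
def count_long_words(text, min_length=5):
    """Count the words in `text` that have at least `min_length` characters."""
    count = 0
    for word in text.split():
        # The indented lines below belong to the for loop; no braces are needed.
        if len(word) >= min_length:
            count += 1
    # Dedenting closes both blocks, so this line runs once, after the loop.
    return count


print(count_long_words("Python uses indentation to delimit blocks"))  # prints 4
```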

A good example of what we mean by theoretical academia is artificial intelligence. And in fact Python is very popular in several areas of artificial intelligence, from machine learning, with libraries like scikit-learn and TensorFlow, to natural language processing, with NLTK. Python also has great support for scientific computing, with SciPy, and for numerical work in general, with NumPy.

The Bad Part

Unfortunately, there are downsides to using Python: many practical aspects are not taken care of. For one, since it is an interpreted language, its performance is not great compared to C++, although it is not terrible either. This is one of the reasons that led to the creation of several alternative implementations. Other implementations target the JVM or .NET, or even make Python run on microcontrollers.

Another problem is that there is not yet a standard and easy way to deploy Python applications, so deploying them is surprisingly complex.

The migration to version 3, released in 2008, is problematic and still ongoing. Version 3 is incompatible with version 2, and this has resulted in the longest migration in recent memory.

Julia

Julia is a language designed to address the needs of high-performance scientific computing, although it can also be used for more mundane web development. In very short terms, it is a better-performing and less popular version of Python. It also has other technical advantages, like good support for concurrency and parallelism and the ability to call C, Fortran and Python code directly. Given their respective communities and intended uses, Julia is often used together with Python.

Many Other Small Languages

There are many other small languages used in parts of academia or industry when complex mathematics is needed. One such language is R, which is widely used in statistics and data analysis. R has some useful features for these sectors, like its own documentation format and the fact that many standard functions are written in R itself, so users can easily inspect them. But its biggest advantage is the wide availability of libraries and expertise for statistics and data analysis.

MATLAB is both a language and a computing environment. This means that it can be readily used by people who are not programmers by trade, a category that includes most scientists. It is widely used for technical computing in industry and academia. Being proprietary software, it literally has a price tag. A famous competitor of MATLAB is Wolfram Mathematica, which is behind the notable Wolfram Alpha, a computational knowledge engine.

Both R and MATLAB are technically programming languages, but for the most part they are not used as such, or by programmers. They are used by scientists and industry experts for research and development.

Concurrency And Reliable Software

The fundamental issues of concurrency affect all software. For example, coordinating access to shared resources is a problem that could affect any program. In practice, though, most developers tend to ignore it, because the basic solutions provided by most programming languages are good enough. Where it could really start to affect them, as in web development and database access, very smart people have already done a lot of the work for them.
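To make "coordinating access to shared resources" concrete, here is a minimal Python sketch using the standard `threading` module. Several threads increment a shared counter, and a lock ensures that each read-modify-write happens as one indivisible step. The names `SharedCounter` and `run_workers` are our own, chosen only for illustration.

```python
import threading


class SharedCounter:
    """A counter that several threads can safely increment."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same value and
        # both write back value + 1, losing one of the updates.
        with self._lock:
            self.value += 1


def run_workers(num_threads=4, increments_per_thread=10_000):
    counter = SharedCounter()

    def work():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished
    return counter.value


print(run_workers())  # prints 40000
```

A basic primitive like this lock is what most languages offer, and for most programs it is good enough.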

When you start to have really strong requirements in terms of concurrency, you might adopt a functional programming language.

However, if your business depends on it, you want something more: a language explicitly geared toward developing highly reliable concurrent applications.

Erlang

Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.

Robert Virding, one of the creators of Erlang

When you absolutely need to develop distributed, fault-tolerant, reliable, soft real-time, concurrent applications, you choose Erlang. It is an open-source language developed by the telecommunications company Ericsson precisely for such needs. The language itself has the technical features you would expect, like being a functional programming language and implementing the actor model.
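The actor model is easier to grasp with an example. The following is a rough Python imitation, not Erlang code. Here an actor is a thread that owns its state privately and changes it only in response to messages taken from its mailbox, which is a `queue.Queue`. The `CounterActor` class and its message format are hypothetical, invented for this sketch. Real Erlang processes are also far lighter than operating-system threads.

```python
import queue
import threading


class CounterActor:
    """A toy actor: private state, a mailbox, and one thread processing messages."""

    def __init__(self):
        self._count = 0                  # state owned by the actor alone
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop)
        self._thread.start()

    def send(self, message):
        # Other threads never touch _count directly; they only send messages.
        self._mailbox.put(message)

    def _loop(self):
        # Messages are handled one at a time, in the order they arrived.
        while True:
            kind, payload = self._mailbox.get()
            if kind == "add":
                self._count += payload
            elif kind == "get":
                payload.put(self._count)  # reply through the caller's queue
            elif kind == "stop":
                break

    def join(self):
        self._thread.join()


actor = CounterActor()
for n in range(1, 11):
    actor.send(("add", n))
reply = queue.Queue()
actor.send(("get", reply))
print(reply.get())  # prints 55
actor.send(("stop", None))
actor.join()
```

Because only the actor's own thread ever reads or writes `_count`, no lock is needed. The mailbox does the coordinating.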

However, the real advantage of Erlang is that the whole ecosystem is developed with that objective in mind. For instance, it has interesting features for supervising processes and organizing them. In fact, Erlang is not just a language, but also a runtime system and a set of ready-to-use components, like a distributed database. It has tools to check the quality of the code and to inspect the execution of the program. The whole system is called the Open Telecom Platform (OTP).
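Supervision is another idea that a sketch can clarify. In OTP, a supervisor starts worker processes and restarts them, according to a strategy, when they crash. The Python function below loosely mimics a supervisor with a single worker and a restart limit. The names `supervise`, `make_flaky_worker` and `max_restarts` are hypothetical. The sketch also ignores real OTP features such as restart intensity over a time window and supervision trees.

```python
def supervise(worker, max_restarts=3):
    """Run `worker`, restarting it after a crash, up to `max_restarts` times.

    Returns the worker's result. If the worker keeps crashing, the supervisor
    gives up and re-raises the error, loosely like an OTP supervisor
    escalating to its own parent.
    """
    restarts = 0
    while True:
        try:
            return worker()
        except Exception as error:
            if restarts >= max_restarts:
                raise  # too many crashes: escalate instead of looping forever
            restarts += 1
            print(f"worker crashed ({error!r}), restart {restarts}/{max_restarts}")


def make_flaky_worker(failures):
    """Build a worker that crashes `failures` times before succeeding."""
    state = {"calls": 0}

    def flaky_worker():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("simulated crash")
        return "done"

    return flaky_worker


# Prints two crash messages, then "done".
print(supervise(make_flaky_worker(failures=2)))
```

The worker contains no error-recovery code of its own. It is allowed to crash, and the decision to restart lives in one place, the supervisor. That is the spirit of Erlang's "let it crash" approach.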

In short, Erlang is a battle-tested language and platform to create reliable and concurrent applications.

Elixir

Erlang was developed for the enterprise, which means that it might be technically flawless but not that pretty or productive. To address this, José Valim created Elixir, a language built on top of the Erlang platform. For instance, Elixir added support for extensibility. You could think of Elixir as Erlang for people who are used to Ruby or Python.

It is gaining popularity and is used by companies like Pinterest and Moz, but it does not yet have a very large number of users.

Go

Google is certainly another company that needs to develop concurrent applications, so it is not surprising that people who work there created Go. It started as an experiment by some engineers to develop a language that improved on C++ and Java. It is less an evolution of these languages than a re-imagining of them, with some added benefits.

It was designed to scale to large systems and to be usable without an IDE, while also being productive and especially good at networking and concurrency. Besides a well-thought-out design, it has specific features for concurrency, like the lightweight processes called goroutines.

All its authors have expressed a dislike for the complexity of C++. So, in a way, it is a language designed to persuade C programmers to enter the new century.

Although it was developed at Google, it would be incorrect to say that it is backed by Google in the same way Microsoft backed C#. People working at Google have developed other languages, too. So it is more a language that many people like, including some people who work at Google. In fact, it is also used by Docker and Dropbox.

System Software


System software is software meant to be used primarily by other programs. It has specific needs: performance and closeness to the hardware. If you are writing system software, you are the one who has to deal with all the hardware differences so that everybody else can work more productively. You also have to offer the best possible performance, because everything else depends on you. Not all programs need 1% more efficiency, but some do, and you are writing your software for them too.

C

C was a revolutionary language that essentially eliminated all of its contemporary competitors, except in a few niches. It is still loved by many of its users, but everybody else finds some issues with it. Some of them, like manual memory management and low productivity, might be necessary for performance. Others are mostly a legacy of its age that nobody would put in a contemporary language. These include the lack of any feature to support large-scale programs and the arcane preprocessor.

Nonetheless, there is simply no alternative with as many libraries and as much optimization work as C. No other language has as many expert developers in system programming. And its users must be experts, because the language does nothing to help you develop good, large applications.

The compilers for C are so efficient that the language has also found a second life as an intermediate language and as the implementation language of other languages. For example, the main implementations of Python and PHP are written in C.

C++

C++ is also widely used for system programming. It has the advantage of better supporting large applications, for instance with object-oriented programming. But some people find this and other additions simply unnecessary for system programming.

Rust

Rust is a new language for system programming sponsored by Mozilla. It is open source and thus open to contributions from the community. But the design of the language evolved together with the development of Servo, an experimental browser layout engine. It is loved by many of its users, and in fact it is the most loved programming language according to Stack Overflow's developer survey.

Rust is also designed to offer great support for concurrency, memory safety and large-scale applications. For instance, in safe Rust you cannot have null or dangling pointers, which can cause memory-related bugs in C or C++. It uses structs and traits instead of classes, and its performance is comparable to C++.

It is much less popular than C or C++ and it has not yet been used in widespread applications. Mozilla plans to eventually use it in Firefox (although not directly through Servo). At the moment, the most important software that uses it is probably Tor, an anonymity network for which security is literally the raison d’être.

Game Development

Game development might seem the furthest thing from system programming, since games are written for people to play, not for other programs to use. Nonetheless, there is a similar need for performance. This is useful for some demanding games, but it is mandatory for building a game engine, the software used to simplify the development of games.

Game development also has a diametrically opposite need: productivity. Many businesses value productivity, but in game development it is a matter of life or death. No other software industry depends so heavily on a good launch for its revenue, and a game developer could simply fail before even launching the game.

The combination of these requirements has led to an industry that is heavily reliant on specific libraries, game engines, SDKs and tools.

C++

The most used programming language for professional game development is C++, which marries good productivity with performance. It also has the most experienced developers. Some software may be written in C, but most developers prefer C++. Notable examples of software built for C++ are the DirectX technologies and the Unreal Engine.

C#

If you do not need optimal performance, or you prefer easier development and cross-platform support, C# is a good choice. It is really a matter of libraries and SDKs. Microsoft initially created XNA to make game development more accessible on Windows and Xbox, but XNA was later supplanted by the cross-platform MonoGame. C# is also the preferred language for the widespread game engine Unity.

Rapid And Productive Web Development


Try as you might, you cannot escape WordPress

There is a lot of web development going on. It is probably the most widespread field of development. Everybody needs to develop web applications, or at least applications that interact with the internet. So the resources are there, both in terms of available libraries and available programmers. Almost every popular language powers some large website, proving that it can be done professionally. Heck, there are even C++ frameworks for rapid web development.

So the rationale for choosing one language over another mostly depends on the community you need and the languages you know. For example, if your application is related to technical computing, you might choose Python.

If you have no preference though, we can present some default choices.

JavaScript

JavaScript is obviously the only choice for client-side web development, but it is also used on the server side. There are countless libraries for JavaScript, in every possible style of development. There are web frameworks for all kinds of applications, from ambitious ones to simple ones, so you can find something that suits your needs and your style.

There are fundamentally two reasons for the spread of JavaScript to the server: the optimization of its runtime environments and the competence of its developers.

Multitudes of developers have spent a lot of time optimizing the performance and quality of JavaScript engines and of the tools used with the language. Probably only C and C++ can claim a similar level of investment. So it makes sense to take advantage of these efforts and use JavaScript everywhere.

JavaScript can count on some of the very best developers, who have learned both to write code fast and to write code optimized for performance. You want to use their skills everywhere you can. And the other side of the web, the server, needs very similar expertise.

On the other hand, JavaScript might be the only language that is used, for the most part, by people with no formal training or education in it. These are people who have never even read a JavaScript manual, but have sort of learned it by copying and pasting. Some started using jQuery before they started using JavaScript. They also contribute to the spread of JavaScript, because it is the only thing they know.

This means that you might have to take some time to find the right developers for your project.

TypeScript

New web frameworks have made it easier to build large and robust applications. Despite that, some people think JavaScript itself is not enough. For them, TypeScript might be the ideal compromise. It is a superset of JavaScript with static typing and class support that transpiles to plain JavaScript. This means that you can reuse the JavaScript infrastructure for deployment.

It is developed primarily by Microsoft, but it is also used by Google for Angular (not to be confused with AngularJS).

PHP

PHP was designed as a set of tools to simplify and speed up server-side web development. You could accuse PHP of being a badly designed language. But you would be wrong: it was not designed at all, and it was not even intended to be a language.

Rasmus Lerdorf on the creation of PHP:

I really don’t like programming. I built this tool to program less so that I could just reuse code. […] I don’t know how to stop it, there was never any intent to write a programming language […] I have absolutely no idea how to write a programming language, I just kept adding the next logical step on the way

On the other hand, this is a testament to the power of the just-get-things-done philosophy. PHP did work. It was not beautiful or secure, but it worked where nothing else could. Indeed, until a few years ago it was the only easy way to do server-side web development. It is easy to deploy and it has a huge range of libraries for all web development needs. For these reasons, even today, content management systems are a PHP stronghold.

Nowadays the field is full of competitors. PHP is still the most widely used server-side language, but it is no longer the only option. The language has also improved a lot, and PHP 7 is a properly designed language. There are many good web development frameworks, like Laravel and Phalcon, that encourage robust, secure and well-thought-out development.

At the same time, PHP still suffers from its beginnings. You can write good PHP code, but fewer of its users do so, because PHP still attracts many users who just want programming to be easy. So you have to take care to find the good PHP developers.

Apple Software

The glass is compatible only with Apple iWater

Despite its tendency to describe its software with an overabundance of words like "magic", Apple writes mundane software, like the rest of us. So it may seem strange to dedicate a specific section to Apple software. It makes sense to do so because the company has a strong vision to inspire developers, which is often also an order to follow.

You can certainly develop for Apple platforms with mainstream languages like C#, since there is some third-party support for that. But Apple makes it more convenient to use its own languages, for which it provides better support and first-party tools, like the Xcode IDE. Besides the need to learn a new language, the disadvantage is that these languages have some peculiar features. For instance, they use ARC (Automatic Reference Counting) for memory management instead of the more common garbage collection.
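The difference between reference counting and garbage collection is easier to see in a sketch. With ARC, the compiler inserts the retain and release calls for you. The toy Python class below makes those calls explicit. `RefCounted` is a hypothetical class, and its `retain` and `release` methods only echo the names of Objective-C's manual retain/release calls; this is not an Apple API. The sketch also leaves out the compiler's role and weak references.

```python
class RefCounted:
    """Toy reference-counted object: freed as soon as its count drops to zero."""

    def __init__(self, name, freed_log):
        self.name = name
        self.count = 1              # the creator holds the first reference
        self.freed_log = freed_log

    def retain(self):
        self.count += 1
        return self

    def release(self):
        self.count -= 1
        if self.count == 0:
            # Deallocation happens immediately and deterministically,
            # not later when a garbage collector decides to run.
            self.freed_log.append(self.name)


freed = []
image = RefCounted("image", freed)  # count = 1
cache_ref = image.retain()          # a cache keeps a second reference: count = 2
image.release()                     # the creator is done: count = 1
print(freed)                        # prints []
cache_ref.release()                 # the cache is done: count = 0, object freed
print(freed)                        # prints ['image']
```

One known weakness of plain reference counting is that two objects that retain each other never reach zero. This is why Swift and Objective-C also offer weak references.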

Swift

Objective-C without the C

Swift is the newer language and the more productive one. It is already the most used and most loved programming language among Apple developers. It uses a more familiar syntax and supports more programming paradigms, for instance functional programming. So you should probably use Swift.

Objective-C

Apple might decide at any moment to deprecate anything in its stack, but it is unlikely to do so for Objective-C, simply because a lot of its own codebase and infrastructure is written in it. So you could choose it if you needed slightly better performance.

But Objective-C is one of the most dreaded programming languages. It could be unkindly described as a worse version of C++. It was made to address concerns similar to the ones that led to C++, only in a weirder way.

You Have To Build Many Things With One Technology

There are occasions when you have to build something, but you do not know exactly what the final product will look like.

Or you have to build many things, but you are required to pick one technology. This could happen when working with or for small businesses, or as a freelance developer.

It can also happen when you are not sure how your product will develop, because you cannot anticipate the market. You might start by building a desktop application, then your clients demand that certain features be accessible to multiple users, so you add a web application, then…

C#

When you have unstable and varied requirements, you need a language that is adaptable to all situations. That means not just being a general-purpose programming language in theory, but one that has actually been used to build all sorts of things, with tools that can help you with everything. C# is the best language in such cases.

It was not meant to be a revolutionary language, but basically a better version of Java and C++. Since all three languages have kept evolving, we could argue forever about whether it has succeeded and find a new answer every year. Technically, C# supports some features that few other languages have, like LINQ, which makes it easy to query data.

A few examples of real-world uses of C#: desktop software (Windows), games (MonoGame, Unity), web development (ASP.NET Core), mobile (Xamarin) and even embedded systems (.NET Micro Framework). Some of this software (e.g. .NET Micro Framework, Unity) is written in C/C++, but it is meant to support the use of C#.

Platform

An important difference is that, while Java was always designed with portability in mind, C# was designed by Microsoft for its own platforms. This has been both an advantage and a disadvantage. The disadvantage is that it was mostly used on Microsoft platforms and shaped by Microsoft's needs. The advantage is that Microsoft used it for everything from desktop software to web development.

This has certainly changed: .NET Core is now designed and actively supported to be cross-platform, and Microsoft no longer uses C# for everything. On the other hand, Microsoft keeps developing what many consider the best IDE of all: Visual Studio.

The CLI standard is independent from C#, while the Java Virtual Machine (JVM) was designed very much around Java. This means that an implementation of the CLI, like the .NET Framework or Mono, can support many other languages as first-class citizens, like Visual Basic.NET and F#. While this is not an advantage of C# per se, it comes with choosing the language. It is useful because you can keep maintaining one platform and infrastructure while pairing C# with another language better suited to a specific subsystem.

JavaScript

JavaScript is a language created for client-side web development, so it might seem odd to see it as a language for all things. And indeed it is. But this is not a suggestion, rather the statement of a fact, or an act of surrender: JavaScript will be used for everything. It first became available for server-side web development, with Node.js, then for mobile, for example with Apache Cordova, then for desktop, with Electron and WinJS, and for games, with WebGL.

There are people who have built parser generators in JavaScript or even entire operating systems for JavaScript.

We have already seen the reasons that led to the spread of JavaScript: optimization and developers' expertise. While that expertise may be less applicable outside the web, the will and the support of many companies are there. So we will all have to deal with the fact that some people are willing to use JavaScript for almost everything. At this point any resistance seems futile.

Two Interesting Languages

There are two languages that we would like to talk about. They are interesting because they show what can make a language useful and popular in a diverse community. Both were created in academia and both are great languages on technical grounds.

Prolog

Prolog is a general-purpose logic programming language used for artificial intelligence and computational linguistics. It is still quite popular in those fields, but it never really left academia. This is due to its very design. Logic programming is a programming paradigm based on formal logic: you state some facts, or truths, about elements and then the program itself finds the solution. You say things like:

  • A is true
  • B is false only if A is true and C is a cat
  • C is a dog

Then you ask the program to tell you what B is.

This is neat, but nobody has really found a way to make it work. There are hardly any artificial intelligence programs useful in real life that are built on this logical approach. Most contemporary development in artificial intelligence relies on machine learning or similar non-logical approaches.

Another problem is technical. Prolog is a fifth-generation programming language. In this context, “generation” does not denote chronology: a fifth-generation language is one in which the developer states the constraints of the problem and the language itself finds a strategy to solve it. In the case of Prolog, the constraints are logical statements.

Well, it turns out that designing a general and efficient algorithm that automatically finds a solution is really hard. So the resulting programs are not efficient, which makes it impractical to write large-scale applications with them.

There are newer languages that attempt to improve on Prolog and make it usable in the real-world, like Mercury, but they are not in widespread use.

Haskell

Haskell was designed to be the C of functional programming languages: the definitive purely functional programming language. It largely succeeded, but for some time it remained confined to academia and the kind of industry that requires heavy use of advanced mathematics, like finance. Notice that this does not necessarily mean numerical computing.

The functional programming paradigm relies on having programming functions that behave like mathematical functions. This means that a function does not modify outside data or have observable interactions with the outside context. Observable interactions are things like writing data to a file or raising an exception. Technically, this means that a function has no side effects. This also makes it possible to have functions as first-class citizens: functions are values like any other and, for example, you can use them as arguments of other functions.
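To make the two ideas concrete, here is a small sketch in Java rather than Haskell (our own illustration; `square` and `applyTwice` are made-up names): `square` is pure because its result depends only on its argument and it touches nothing outside, while `applyTwice` receives another function as an argument, treating it as a first-class value.

```java
import java.util.function.UnaryOperator;

class PureFunctions {
    // Pure: the result depends only on the argument; no I/O, no shared state is modified.
    static int square(int x) {
        return x * x;
    }

    // Higher-order: receives a function as a parameter and applies it twice.
    static <T> T applyTwice(UnaryOperator<T> f, T value) {
        return f.apply(f.apply(value));
    }

    public static void main(String[] args) {
        // The method reference turns square into a value that can be passed around.
        int result = applyTwice(PureFunctions::square, 3);
        System.out.println(result); // prints 81
    }
}
```

Because `square` is pure, calling it twice in a row can never interfere with anything else in the program, which is exactly the property that matters for concurrency.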

Expert programmers have probably already noticed that these are desirable features for avoiding many problems created by concurrency. The rising popularity of the internet has increased the need to deal with such problems, so Haskell has grown in popularity. Facebook and Microsoft use it and there are even web frameworks for this language. On the other hand, it is not specifically designed for concurrency, or for any industrial use, so it is less pragmatic than Erlang or Go.

What Does All This Mean

By the standards of academia, Prolog and Haskell are both successful languages. Haskell is a usable and popular functional programming language, and Prolog, in a sense, proved that logic programming does not work in practice and that it is not easy to use it to solve real problems. That counts as a success, because even a negative answer to an important question is worth the effort.

Of course, by the standards of the real world, Prolog is not that successful, despite still being the most popular logic programming language. We personally think that Prolog is one of the most interesting languages ever created, but even we could not find a place for it in production code.

This does not just mean that technical features are not enough. That is true, but Prolog and Haskell did have a community and specific uses in mind, in addition to technical features.

It serves to illustrate a strange fact: developers create the cutting edge of technology, yet they benefit from being late adopters. You cannot predict which languages will be successful, nor why. You might spend a lot of time learning and developing with a language, but your community may choose another one. Sometimes the best programming language simply changes. So we cannot guarantee that these will be the best languages forever, only that right now many smart people think they are.

Even the real-world success of Haskell might be temporary because, if history teaches us anything, a language like F# could easily take its place.

F#

From the business perspective, the primary role of F# is to reduce the time-to-deployment for analytical software components in the modern enterprise

Being purely functional restricts the appeal of Haskell and makes it difficult for the average programmer to understand. It may be easier to adopt a language like F#, which is primarily functional but also supports traditional programming paradigms. Like C#, F# can run on any platform that implements the CLI standard. This also allows it to interact with other languages that run on the same platform and to reuse their code.

Good Languages That We Could Not Fit Anywhere

There are obviously many other programming languages that we could not fit in this analysis. One of the reasons is our approach: some languages are not that popular, and some have not won over a specific community. Sometimes there was simply a better alternative. These reasons led to the exclusion of languages like Ruby, Perl, Object Pascal and Visual Basic.NET.

There are also languages used in very specific niches, for example:

  • Ada for the US military and related industries
  • Fortran for numerical computing, especially with supercomputers
  • COBOL for finance and related industries

We did not mention them because these are very specific, so if you need to work with them you probably already know them.

Summary

Talking about the best programming language is risky. It is easy to enrage somebody and hard to say something useful to a large audience. That is why we tried to clearly explain our rationale and provide pragmatic information. We talked about technical features when necessary, but we mostly reported on the community and the best use each language has.

We also tried to make the information accessible to programmers who are not familiar with a certain field, and to non-programmers. So if you are working for a small business or thinking about starting one, this article might help you make a decision that considers both technical and business aspects.

The Simple Way to Find the Correct Syntax

It happens to the best of us: you are writing some code in a language that you use sporadically and you start asking yourself what the correct syntax for this or that is. You know many languages and you start wondering: how do you iterate over the elements of a collection? Is it foreach, for .. in or something else? You know the answer, you just need a second or two… it’s a matter of pride! You are a professional and you are going to remember it…

Let’s not kid ourselves: you are going to search for it, in a search engine or directly on Stack Overflow. Or maybe you are going to search for a cheat sheet on DuckDuckGo.

A Better Alternative (Sometimes)

It works, but it’s not always perfect. Sometimes you have to browse different questions or sift through a long and erudite answer about the performance profile of the different ways of looping over the characters in a string. Which is cool, but you just need to refresh your memory on how to do it, not how to optimize a 40-line script to death. So now there is an interesting alternative: SyntaxDB. It’s both a search engine for syntax and a reference. Actually, it calls itself a reference, but it’s a bit better than a standard reference and a bit worse.

SyntaxDB Python Reference

As you can see, it’s better in the sense that it provides some guidance, something that is not always available in references, and a generic structure shared by many different languages. It’s also a bit worse because it’s not complete: the level of completeness varies a lot. You can’t even say that it covers all the basics (like strings) for all the languages. It seems that the author inserted the constructs that came to his mind, then tried to give them a structure and chose some languages to cover. And that is probably pretty much what happened, since it is the work of a single author. A very competent one, indeed, but it’s not a service from a company.

The Future is Bright

Don’t get me wrong: it’s a good idea, it is implemented well and it actually works if you need to find the syntax of a for loop in Python. Also, the author is working on letting everybody contribute:

Something I’m very happy and surprised with is the number of people asking how they can help. The most requested feature for SyntaxDB has been a way to let you developers contribute, and I’m happy to say that’s the next feature I’ll be working on!

from the About page

It’s just that not all the content is there yet. Plus, it seems that only one version of each language is supported, which can be a real problem for languages such as Python, which, for some reason, is still very much split between versions 2 and 3.

Another good reason to follow the service is that it is well designed and built for the contemporary world. What I mean is that there is already an API, and its implementation follows the Swagger/OpenAPI Specification. There are already integrations with editors/IDEs such as Visual Studio Code and Atom. There is even one for the DuckDuckGo search engine and a bot for Slack. I think it’s off to a good start; it just needs some time to develop.

 

A template system for Google Docs: Google Drive automation and PDF generation with Google Execution API

My consulting business is gaining steam and I am starting to be annoyed by the administrative steps. Basically, when I need to prepare a new invoice I have to:

  • copy a template I have in my Google Drive
  • fill in the data
  • download a PDF
  • archive the PDF on Dropbox
  • send the PDF to the customer
  • if the customer is in the European Union (outside France), fill in a declaration for the “Douane” (French customs). This is what is called an “Intrastat” declaration in some places

Now, this is not the most exciting and creative part of the job so I automated some of the process.

Right now I have a script that can create a copy of the template, fill it, generate the PDF and download it. I still need to automate the part in which I upload the PDF to Dropbox, but for now I can just copy the PDF into my local Dropbox folder.

Google Developers Console

Now, this is the boring part and it is easy to miss something. I will try to recollect from memory what I did and wish you good luck. After all I do not want to make things too easy. That would be boring, wouldn’t it?

First, visit https://console.developers.google.com and create a project.

Then add permissions to that project, selecting the Google Drive API and the Google Apps Script Execution API.

Finally go to credentials and generate the “OAuth client ID” credentials. You should get a file to download. It is a JSON file containing the credentials for your project.

Good, enough with the boring bits, let’s start to program.

Loading the data

For now the data for the invoices is kept in a simple JSON file. Later I could store it in a Google Spreadsheet document. At that point I could trigger the invoice generation when I add some data to that file.

For now, instead, my script takes 2 parameters:

  1. the name of the file containing the data
  2. the number of the invoice I want to generate

So the usage scenario is this: first of all I open the data file and add the data for the new invoice. Typically I copy the data from a previous invoice for the same customer and I adapt it. Then I close the file and run the script specifying the number of the new invoice to generate. If I need it I could also regenerate an old invoice just by running the script with the number of that invoice, without the need to touch the data file.
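A rough Java sketch of this usage scenario (not the author's actual script, which is on GitHub): the two arguments are checked and the invoice is selected by its number, which is what makes regenerating an old invoice trivial. The `Invoice` record and its fields are hypothetical, and reading the JSON file, which would need a JSON library, is left out.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical shape of one invoice entry in the data file.
record Invoice(int number, String customer, double amount) {}

class InvoiceArgs {
    // Validates the two parameters (data file name, invoice number) and returns the number.
    static int parseInvoiceNumber(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: <data file> <invoice number>");
        }
        return Integer.parseInt(args[1]);
    }

    // Picks the invoice to generate; an old invoice is regenerated just by passing its number.
    static Optional<Invoice> findInvoice(List<Invoice> invoices, int number) {
        return invoices.stream()
                .filter(invoice -> invoice.number() == number)
                .findFirst();
    }
}
```

In a `main` method this would be used as `findInvoice(invoices, parseInvoiceNumber(args))`, with `invoices` read from the file named in `args[0]`.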

This is the code which parses the arguments and loads the data:

An example of a data file:

 

Finding the template and cloning it

In my Google Drive I have a directory named DriveInvoicing which contains a Google Doc named Template. Here is the first page:


The second page contains uninteresting legalese in both French and English: French because I am supposed to write my invoices in French, given that I am located in France; English because most of my clients do not speak any French.

The code to locate the template file is this:

Copying the template and filling it

First of all we create a copy of the template:

Then we execute a Google Script on it:

Finally we download the document as a PDF:

The script which fills in the data is this:

It is written in the online editor for Google Apps Script:


What we got

This is the final result:


ENOUGH, GIMME THE CODE!

Code is available on GitHub: https://github.com/ftomassetti/DriveInvoicing

Recognizing hand-written rectangles in an image

Machine learning for points classification?

Last time we saw how to identify key points in an image. I was then thinking of using machine learning techniques to recognize the role played by each point. I played for a while with Weka, a tool which makes it very easy to experiment with different machine learning algorithms. To identify the features to use in the classification I used this strategy:

  • I drew two concentric circles around each point of interest: one close and one farther away
  • I identified the intersections between the concentric circles and the contour to which the key point belonged
  • I split each circle into 12 parts and counted how many intersections fell into each of those parts
  • I then used those 24 values for the classification
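A minimal sketch of that feature vector, assuming the intersections have already been computed and are given as (x, y) pairs: each intersection falls into one of 12 angular sectors of 30° around the key point, and the two 12-bin histograms (inner circle first, then outer circle) are concatenated into the 24 values. `CircleFeatures` and its methods are illustrative names, not the ones used in the SketchModel repository.

```java
import java.util.List;

class CircleFeatures {
    static final int SECTORS = 12;

    // Index (0..11) of the 30° sector containing (px, py), as seen from the key point (cx, cy).
    static int sectorOf(double cx, double cy, double px, double py) {
        double angle = Math.atan2(py - cy, px - cx);   // in (-π, π]
        if (angle < 0) angle += 2 * Math.PI;            // normalize to [0, 2π)
        int sector = (int) (angle / (2 * Math.PI / SECTORS));
        return Math.min(sector, SECTORS - 1);           // guard against rounding right at 2π
    }

    // 24 values: intersection counts per sector for the inner circle, then for the outer one.
    // Each intersection is a double[]{x, y}.
    static int[] features(double cx, double cy, List<double[]> inner, List<double[]> outer) {
        int[] result = new int[2 * SECTORS];
        for (double[] p : inner) result[sectorOf(cx, cy, p[0], p[1])]++;
        for (double[] p : outer) result[SECTORS + sectorOf(cx, cy, p[0], p[1])]++;
        return result;
    }
}
```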

..if simple heuristics can do…

However, I realized that there was no actual need for machine learning. I could instead use very simple heuristics. After all, I was just looking for the corners of the rectangles, so I considered only points which had two intersections with both the closer and the farther circle. Then I considered the angle formed by these intersections, basically looking for something around 90°. Then, considering the orientation of the corner, I classified it as a top-left, top-right, bottom-left or bottom-right corner.
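The corner heuristic could be sketched like this. It assumes the caller has already kept only key points with exactly two intersections on each circle, and it describes the corner by the two directions from the key point to those intersections. The tolerance around 90° is a parameter left to the caller, image coordinates are assumed (y grows downward), and all names are illustrative.

```java
import java.util.Optional;

enum CornerType { TOP_LEFT, TOP_RIGHT, BOTTOM_LEFT, BOTTOM_RIGHT }

class CornerClassifier {
    // (dx1, dy1) and (dx2, dy2) point from the key point towards the two places where its
    // contour crosses a circle. Image coordinates: y grows downward.
    static Optional<CornerType> classify(double dx1, double dy1, double dx2, double dy2,
                                         double toleranceDegrees) {
        double cos = (dx1 * dx2 + dy1 * dy2) / (Math.hypot(dx1, dy1) * Math.hypot(dx2, dy2));
        double angle = Math.toDegrees(Math.acos(Math.max(-1, Math.min(1, cos))));
        if (Math.abs(angle - 90) > toleranceDegrees) {
            return Optional.empty();                  // not close enough to a right angle
        }
        // The sum of the two arms points towards the inside of the corner.
        double sx = dx1 + dx2;
        double sy = dy1 + dy2;
        if (sx > 0 && sy > 0) return Optional.of(CornerType.TOP_LEFT);      // arms go right and down
        if (sx < 0 && sy > 0) return Optional.of(CornerType.TOP_RIGHT);     // arms go left and down
        if (sx > 0 && sy < 0) return Optional.of(CornerType.BOTTOM_LEFT);   // arms go right and up
        if (sx < 0 && sy < 0) return Optional.of(CornerType.BOTTOM_RIGHT);  // arms go left and up
        return Optional.empty();                      // inside direction lies on an axis: ambiguous
    }
}
```

For example, arms pointing right `(1, 0)` and down `(0, 1)` make `classify(1, 0, 0, 1, 15)` return `TOP_LEFT`.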

Once I had classified the points, I started looking for top-left corners and for matching bottom-right corners. I just took the closest one in the right direction. Once I have a pair of top-left and bottom-right corners, I know where to look for the missing corners: the top-right corner is supposed to have an x equal to that of the bottom-right point and a y equal to that of the top-left corner, and vice versa for the bottom-left point. If I find these two points where I expect them, I consider the rectangle complete.
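A sketch of the pairing step, using hypothetical `Point` and `Rect` records and a caller-supplied distance tolerance: for every top-left corner it takes the closest bottom-right corner that lies below and to the right, predicts where the other two corners should be, and accepts the rectangle only if corners are found near both predictions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical point type; image coordinates, y grows downward.
record Point(double x, double y) {
    double distanceTo(Point o) { return Math.hypot(x - o.x, y - o.y); }
}

// A recognized rectangle, identified by its top-left and bottom-right corners.
record Rect(Point topLeft, Point bottomRight) {}

class RectangleFinder {
    // True if some corner in the list lies within `tolerance` of the expected position.
    static boolean hasCornerNear(List<Point> corners, Point expected, double tolerance) {
        return corners.stream().anyMatch(c -> c.distanceTo(expected) <= tolerance);
    }

    static List<Rect> find(List<Point> topLefts, List<Point> topRights,
                           List<Point> bottomLefts, List<Point> bottomRights, double tolerance) {
        List<Rect> rects = new ArrayList<>();
        for (Point tl : topLefts) {
            // Closest bottom-right corner "in the right direction": below and to the right.
            Optional<Point> br = bottomRights.stream()
                    .filter(p -> p.x() > tl.x() && p.y() > tl.y())
                    .min(Comparator.comparingDouble(tl::distanceTo));
            if (br.isEmpty()) continue;
            // Top-right: x of the bottom-right, y of the top-left; bottom-left: the other way round.
            Point expectedTopRight = new Point(br.get().x(), tl.y());
            Point expectedBottomLeft = new Point(tl.x(), br.get().y());
            if (hasCornerNear(topRights, expectedTopRight, tolerance)
                    && hasCornerNear(bottomLefts, expectedBottomLeft, tolerance)) {
                rects.add(new Rect(tl, br.get()));
            }
        }
        return rects;
    }
}
```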

Finally, I just have to check whether I recognized overlapping rectangles: in that case I throw away the smaller ones.
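Discarding overlaps can be sketched by sorting the candidates from largest to smallest and keeping a rectangle only if it does not overlap any rectangle already kept. This ordering is one simple way to throw away the smaller ones, not necessarily what the original code does; `Box` is a hypothetical axis-aligned rectangle type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical axis-aligned rectangle: top-left (x1, y1), bottom-right (x2, y2).
record Box(double x1, double y1, double x2, double y2) {
    double area() { return (x2 - x1) * (y2 - y1); }

    boolean overlaps(Box o) {
        return x1 < o.x2 && o.x1 < x2 && y1 < o.y2 && o.y1 < y2;
    }
}

class OverlapFilter {
    static List<Box> keepLargest(List<Box> boxes) {
        List<Box> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingDouble(Box::area).reversed());
        List<Box> kept = new ArrayList<>();
        for (Box b : sorted) {
            // A box survives only if it does not overlap any larger box already kept.
            if (kept.stream().noneMatch(b::overlaps)) {
                kept.add(b);
            }
        }
        return kept;
    }
}
```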

This algorithm is not perfect but I get decent results:


Why I am not using OpenCV

When we are manipulating images OpenCV is the obvious answer; however, I did not get good results with it. It seems that the typical algorithms for detecting rectangles are confused by the fact that the contours I found are not rectangles. This is because of the connections between rectangles (the lines linking them). I tried a few things but I did not get any good results. In addition to that, OpenCV is written in C/C++, which basically means that deploying it is much more cumbersome. My current solution is Java-based, which means that I can easily run it on every possible platform without headaches. I will have another look at OpenCV and I am very open to suggestions. In fact, a friend of mine just gave me a couple of nice ideas to try.

Code, where is the code?

You know, words are nice and all but the only thing that really matters is code. You can grab it on GitHub, here: https://github.com/ftomassetti/SketchModel

Functional programming for Java: getting started with Javaslang

Java is an old language and there are many new kids on the block who are challenging it on its own terrain (the JVM). However, Java 8 arrived and brought a couple of interesting features. Those features made it possible to write amazing new frameworks like the Spark web framework or Javaslang.

In this post we take a look at Javaslang, which brings functional programming to Java.

Functional programming: what is that good for?

It seems that all the cool developers want to do some functional programming nowadays, just as they wanted to use object-oriented programming before. I personally think functional programming is great to tackle a certain set of problems, while other paradigms are better in other cases.

Functional programming is great when:

  • you can pair it with immutability: a pure function has no side effects and it is easier to reason about. Pure functions imply immutability, which drastically simplifies testing and debugging. However, not all solutions are nicely represented with immutability. Sometimes you just have a huge piece of data that is shared between several users and you want to change it in place. Mutability is the way to go in that case.
  • you have code which depends on inputs, not on state: if something depends on state rather than on input, it sounds more like a method than a function to me. Functional code should ideally make very explicit which information it uses (so it should use just parameters). That also means more generic and reusable functions.
  • you have independent logic, which is not highly coupled: functional code is great when it is organized in small, generic and reusable functions
  • you have streams of data that you want to transform: this is, in my opinion, the easiest place to see the value of functional programming. Indeed, streams received a lot of attention in Java 8.
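As an illustration of that last point, here is a plain Java 8 stream pipeline (JDK only, no Javaslang; `normalizeTags` is a made-up example): every step is a small function that depends only on its input, and the original list is never modified.

```java
import java.util.List;
import java.util.stream.Collectors;

class StreamExample {
    // Normalizes tags: trims them, drops blanks, lower-cases, removes duplicates and sorts.
    // The input list is left untouched; a new list is returned.
    static List<String> normalizeTags(List<String> tags) {
        return tags.stream()
                .map(String::trim)
                .filter(tag -> !tag.isEmpty())
                .map(String::toLowerCase)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }
}
```

With the input `" Java", "java ", "", "FP"` the pipeline yields `["fp", "java"]`, and each step can be tested in isolation.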

Discuss the library

As you can read on javaslang.com:

Java 8 introduced λ which dramatically increases the expressiveness of our programs, but “Clearly, the JDK APIs won’t help you to write concise functional logic (…)” (jOOQ™ blog)

Javaslang™ is the missing part and the best solution to write comprehensive functional Java 8+ programs.

This is exactly how I see Javaslang: Java 8 gave us the enabling features to build more concise and composable code, but it did not take the last step. It opened a space and Javaslang arrived to fill it.

Javaslang brings to the table many features:

  • currying: currying can be used to implement the partial application of functions
  • pattern matching: let’s think of it as the dynamic dispatching of functional programming
  • failure handling: because exceptions are bad for function composition
  • Either: this is another structure which is very common in functional programming. The typical example is a function which returns a value when things go well and an error message when they do not
  • tuples: tuples are a nice lightweight alternative to objects and perfect to return multiple values. Just do not be lazy and use classes when it makes sense to do so
  • memoization: this is caching for functions
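To get a feel for two of these features without pulling in the library, here is a JDK-only sketch (deliberately not Javaslang's own API): a curried addition built from nested `Function`s, which gives partial application for free, and a naive `memoize` helper that caches results in a `HashMap`. The helper is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class CurryAndMemo {
    // Currying: a two-argument function expressed as a function returning a function.
    static final Function<Integer, Function<Integer, Integer>> add = a -> b -> a + b;

    // Memoization: wraps f so each distinct input is computed only once (not thread-safe).
    static <T, R> Function<T, R> memoize(Function<T, R> f) {
        Map<T, R> cache = new HashMap<>();
        return input -> cache.computeIfAbsent(input, f);
    }

    public static void main(String[] args) {
        Function<Integer, Integer> addFive = add.apply(5); // partial application
        System.out.println(addFive.apply(3));              // prints 8
        Function<Integer, Integer> square = memoize(x -> x * x);
        System.out.println(square.apply(12));              // computed: prints 144
        System.out.println(square.apply(12));              // from the cache: prints 144
    }
}
```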

For developers with experience in functional programming this will all sound very familiar. For the rest of us, let’s take a look at how we can use this stuff in practice.

Ok, but in practice how can we use this stuff?

Obviously, showing an example for each of the features of Javaslang is far beyond the scope of this post. Let’s just see how we could use some of them, and in particular let’s focus on the bread and butter of functional programming: function manipulation.

Given that I am obsessed with manipulation of Java code we are going to see how we can use Javaslang to examine the Abstract Syntax Tree (AST) of some Java code. The AST can be easily obtained using the beloved JavaParser.

If you are using gradle your build.gradle file could look like this:

We are going to implement very simple queries: queries that can be answered just by looking at the AST, without solving symbols. If you want to play with Java ASTs and solve symbols, you may want to take a look at this project of mine: java-symbol-solver.

For example:

  • find classes with a method with a given name
  • find classes with a method with a given number of parameters
  • find classes with a given name
  • combining the previous queries
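Here is a self-contained sketch of these queries, in which each query is a `java.util.function.Predicate` over a type declaration and combining queries is just `and`/`or`. Since JavaParser is not on the classpath here, `TypeDecl` and `MethodDecl` are hypothetical stand-ins for its `TypeDeclaration` and `MethodDeclaration`, and JDK predicates take the place of Javaslang.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-ins for JavaParser's MethodDeclaration and TypeDeclaration.
record MethodDecl(String name, int parameterCount) {}
record TypeDecl(String name, List<MethodDecl> methods) {}

class Queries {
    static Predicate<TypeDecl> hasMethodNamed(String methodName) {
        return t -> t.methods().stream().anyMatch(m -> m.name().equals(methodName));
    }

    static Predicate<TypeDecl> hasMethodWithParameters(int count) {
        return t -> t.methods().stream().anyMatch(m -> m.parameterCount() == count);
    }

    static Predicate<TypeDecl> isNamed(String typeName) {
        return t -> t.name().equals(typeName);
    }

    // Runs any query, simple or combined, over the types of a "compilation unit".
    static List<TypeDecl> find(List<TypeDecl> types, Predicate<TypeDecl> query) {
        return types.stream().filter(query).collect(Collectors.toList());
    }
}
```

Note that `and` combines the queries at the type level: `hasMethodNamed("foo").and(hasMethodWithParameters(2))` matches a type that has a method named `foo` and some method with two parameters, not necessarily the same one.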

Let’s start with a function which given a CompilationUnit and a method name returns a List of TypeDeclarations defining a method with that name. For people who never used JavaParser: a CompilationUnit represents an entire Java file, possibly containing several TypeDeclarations. A TypeDeclaration can be a class, an interface, an enum or an annotation declaration.

getTypesWithThisMethod is very simple: we take all the types in the CompilationUnit (cu.getTypes()) and we filter them, selecting only the types which have a method with that name. The real work is done in hasMethodNamed.

In hasMethodNamed we start by creating a javaslang.collection.List from our java.util.List (List.ofAll(typeDeclaration.getMembers())). Then we consider that we are only interested in the MethodDeclarations: we are not interested in field declarations or other stuff contained in the type declaration. So we map each method declaration to either Option.of(true), if the name of the method matches the desired methodName, or to Option.of(false) otherwise. Everything that is not a MethodDeclaration is mapped to Option.none(). Note that we do that in two steps: first the method is mapped to an Option<String>, then the Option<String> is mapped to an Option<Boolean>.

So, for example, if we are looking for a method named “foo” in a class which has three fields, followed by methods named “bar”, “foo” and “baz”, we will get a list of:

Option.none(), Option.none(), Option.none(), Option.of(false), Option.of(true), Option.of(false)

The next step is to map both Option.none() and Option.of(false) to false and Option.of(true) to true. Note that we could have done that immediately instead of chaining two map operations. However, I prefer to do things in steps. Once we get a list of true and false values we need to derive one single value out of it, which should be true if the list contains at least one true, and false otherwise. Obtaining a single value from a list is called a reduce operation. There are different variants of this kind of operation: I will let you look into the details 🙂
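The steps just described can be sketched with the JDK alone, using `java.util.Optional` in place of Javaslang's `Option` and a stream in place of its `List`. The `Member`, `FieldMember` and `MethodMember` types are hypothetical stand-ins for JavaParser's member declarations.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for the members of a JavaParser TypeDeclaration.
interface Member {}
record FieldMember(String name) implements Member {}
record MethodMember(String name) implements Member {}

class HasMethodNamed {
    // Step 1 helper: the name of a method declaration, or empty for any other member.
    static Optional<String> nameIfMethod(Member member) {
        if (member instanceof MethodMember) {
            return Optional.of(((MethodMember) member).name());
        }
        return Optional.empty();
    }

    static boolean hasMethodNamed(List<Member> members, String methodName) {
        return members.stream()
                .map(HasMethodNamed::nameIfMethod)                   // Optional<String>
                .map(name -> name.map(n -> n.equals(methodName)))    // Optional<Boolean>
                .map(matches -> matches.orElse(false))               // empty and false -> false
                .reduce(false, (a, b) -> a || b);                    // true if any element is true
    }
}
```

`anyMatch` would collapse the last two steps into one call; they are kept separate here to mirror the explanation.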

We could rewrite the latest method like this:

Why would we want to do so? It seems (and it is) much more complicated, but it shows us how we can manipulate functions, and it is an intermediate step towards code which is more flexible and powerful. So let’s try to understand what we are doing.

First a quick note: the class Function1 indicates a function taking one parameter. The first generic parameter is the type of the parameter accepted by the function, while the second one is the type of the value returned by the function. Function2 takes instead 2 parameters. You can understand how this goes on 🙂

We:

  • reverse the order in which parameters can be passed to a function
  • create a partially applied function: this is a function in which the first parameter is “fixed”

So we create our originalFunctionReversedAndCurriedAndAppliedToMethodName just by manipulating the original function hasMethodNamed. The original function took 2 parameters: a TypeDeclaration and the name of the method. Our elaborated function takes just a TypeDeclaration. It still returns a boolean.

We then simply transform our function into a predicate with this tiny function, which we could reuse over and over:
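The same manipulation can be sketched with plain JDK functional interfaces. Javaslang's `Function2` offers these operations out of the box; here `flip`, `curry` and `toPredicate` are hand-written helpers with names of our choosing, and `hasMethodNamed` is simplified to check a list of method names instead of a `TypeDeclaration`.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

class FunctionManipulation {
    // Simplified stand-in for hasMethodNamed: (method names of a type, wanted name) -> boolean.
    static boolean hasMethodNamed(List<String> methodNames, String methodName) {
        return methodNames.contains(methodName);
    }

    // Reverses the order of the two parameters.
    static <A, B, R> BiFunction<B, A, R> flip(BiFunction<A, B, R> f) {
        return (b, a) -> f.apply(a, b);
    }

    // Turns a two-parameter function into a chain of one-parameter functions.
    static <A, B, R> Function<A, Function<B, R>> curry(BiFunction<A, B, R> f) {
        return a -> b -> f.apply(a, b);
    }

    // Adapts a function returning Boolean to a Predicate.
    static <T> Predicate<T> toPredicate(Function<T, Boolean> f) {
        return f::apply;
    }

    static Predicate<List<String>> hasMethodNamedPredicate(String methodName) {
        BiFunction<List<String>, String, Boolean> original = FunctionManipulation::hasMethodNamed;
        // Reverse, curry, then fix the first parameter, which is now the method name.
        Function<List<String>, Boolean> applied = curry(flip(original)).apply(methodName);
        return toPredicate(applied);
    }
}
```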

Now, this is how we can make it more generic:

Ok, now we could also generalize hasMethodWithName:

After some refactoring we get this code:

Now let’s see how it can be used:

The source file we used in these tests is this one:

This is of course a very, very, very limited introduction to the potential of Javaslang. What I think is important to get, for someone new to functional programming, is the tendency to write very small functions which can be composed and manipulated to obtain very flexible and powerful code. Functional programming can seem obscure when we start using it, but if you look at the tests we wrote I think they are rather clear and descriptive.

Functional Programming: is all the hype justified?

I think there is a lot of interest in functional programming, but if that becomes hype it could lead to poor design decisions. Think about the time when OOP was the new rising star: the Java designers went all the way, forcing programmers to put every piece of code in a class, and now we have utility classes with a bunch of static methods. In other words, we took functions and asked them to pretend to be classes to earn our OOP medal. Does it make sense? I do not think so. Perhaps it helped to be a bit extremist to strongly encourage people to learn OOP principles. That is why, if you want to learn functional programming, you may want to use functional-only languages like Haskell: they really, really, really push you into functional programming, so that you can learn the principles and use them when it does make sense to do so.

Conclusions

I think functional programming is a powerful tool and it can lead to very expressive code. It is not the right tool for every kind of problem, of course. It is unfortunate that Java 8 comes without proper support for functional programming patterns in the standard library. However, some of the enabling features have been introduced in the language, and Javaslang is making it possible to write great functional code right now. I think more libraries will come later, and perhaps they will help keep Java alive and healthy for a little longer.

 

Note: thanks to Lorenzo Bettini for pointing out a couple of mistakes

A tutorial on using Sql2o with Spark and other updates

A few weeks ago I wrote a tutorial on getting started with Spark (the Java web framework). A few readers appreciated it and it was linked by the Jetbrains blog, republished by DZone and republished by the new Spark tutorials blog.

After that, David Åse and I chatted a bit and we decided to work together on a few tutorials to publish on the Spark tutorials blog. So today we publish the first of hopefully a long list: Spark and Databases: Configuring Spark to work with Sql2o in a testable way.

Content of the tutorial on Sql2o + Spark

  • see when to use an ORM and when not to

  • how to organize the code that accesses the database and integrate it with the controllers

  • how to use Sql2o

  • put everything together and improve the BlogService we started in the first post on Spark.

At the end we will have something like this:


Plans for the future

David is a great guy who, among other things, rewrote the Spark website (it looks cool, eh?). I asked him how he got involved in Spark and we are working on a short interview, similar to the one I had with Luca Barbato: I think it is always inspiring to learn how people started giving back to the open-source community.

Reviewing, reviewing, reviewing

For the rest of the week I have been fairly busy doing technical reviews for two books from the Pragmatic Bookshelf (did I already tell you that I love their books?). It required a fair amount of effort, but I learned a few things on topics I would not normally have time for, so I am fairly happy.

Getting started with Spark: it is possible to create lightweight RESTful applications also in Java

Recently I have been writing a RESTful service using Spark, a web framework for Java (which is not related to Apache Spark). When we planned to write this, I was ready for the unavoidable Javaesque avalanche of interfaces, boilerplate code and deep hierarchies. I was very surprised to find out that an alternative world exists also for developers confined to Java.

In this post we are going to see how to build a RESTful application for a blog, using JSON to transfer data. We will see:

  • how to create a simple Hello world in Spark
  • how to specify the layout of the JSON object expected in the request
  • how to send a post request to create a new post
  • how to send a get request to retrieve the list of posts

We are not going to see how to insert this data in a DB. We will just keep the list in memory (in my real service I have been using sql2o).

Note: I wrote a bunch of other tutorials on Spark. Take a look at the Spark tutorials website.

A few dependencies

We will be using Maven so I will start by creating a new pom.xml throwing in a few things. Basically:

  • Spark
  • Jackson
  • Lombok
  • Guava
  • Easymock (used only in tests, not presented in this post)
  • Gson

Spark hello world

Do you have all of this? Cool let’s write some code then.

And now we can run it with something like:

Let’s open a browser and visit http://localhost:4567/posts. Here we just want to do a simple GET. To perform POST requests you may want to use the Postman plugin for your browser or just run curl. Whatever works for you.

Using Jackson and Lombok for awesome descriptive exchange objects

In a typical RESTful application we expect to receive POST requests with JSON objects as part of the payload. Our job will be to check that the payload is well-formed JSON, that it corresponds to the expected structure, that the values are in the valid ranges, etc. Kind of boring and repetitive. We could do that in different ways. The most basic one is to use Gson:

We probably do not want to do that.

A more declarative way to specify what structure we expect is to create a specific class.

And then we could use Jackson:

In this way Jackson automatically checks for us whether the payload has the expected structure. We may also want to verify that additional constraints are respected: for example, that the title is not empty and that at least one category is specified. We could create an interface just for validation:
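Here is a minimal sketch of that validation step. It assumes a hypothetical interface called Validable with a single isValid() method. It also assumes the payload carries title, categories and content fields. These names are illustrative and not taken from the original code. A hand-written NewPostPayload could look like this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical validation contract: a payload reports whether its values are acceptable.
interface Validable {
    boolean isValid();
}

// Hand-written payload for POST /posts; the field names are assumptions for this sketch.
class NewPostPayload implements Validable {
    private String title;
    private List<String> categories = new ArrayList<>();
    private String content;

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public List<String> getCategories() { return categories; }
    public void setCategories(List<String> categories) { this.categories = categories; }

    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }

    // The JSON structure is checked by the parser; here we check the extra constraints:
    // a non-empty title and at least one category.
    @Override
    public boolean isValid() {
        return title != null && !title.isEmpty()
                && categories != null && !categories.isEmpty();
    }
}
```

A library such as Jackson would typically populate a bean like this through its no-argument constructor and setters, for example via something like new ObjectMapper().readValue(body, NewPostPayload.class). The route would then answer with an error status when isValid() returns false. Note how much of the class is made of getters and setters.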

Still, we have a bunch of boring getters and setters. They are not very informative and just pollute the code. We can get rid of them using Lombok. Lombok is an annotation processor that adds repetitive methods for you (getters, setters, equals, hashCode, etc.). You can think of it as a plugin for your compiler that looks for annotations (like @Data) and generates methods based on them. If you add it to your dependencies Maven will be fine, but your IDE may not give you auto-completion for the methods that Lombok adds. You may want to install a plugin. For IntelliJ IDEA I am using Lombok Plugin version 0.9.1 and it works great.

Now you can revise the class NewPostPayload as:

Much nicer, eh?

A complete example

We need to do basically two things:

  1. insert a new post
  2. retrieve the whole list of posts

The first operation should be implemented as a POST (it has side effects), while the second one should be a GET. Both of them are operations on the posts collection, so we will use the endpoint /posts.
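Since the data is kept in memory, a minimal stand-in for the database could be a small class with one method per operation. The class name Model, the nested Post type and the method names createPost and getAllPosts are assumptions for this sketch. They are not necessarily what the original service uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory store standing in for the database.
class Model {
    // One stored blog post; immutable once created.
    static final class Post {
        final int id;
        final String title;
        final String content;
        final List<String> categories;

        Post(int id, String title, String content, List<String> categories) {
            this.id = id;
            this.title = title;
            this.content = content;
            this.categories = Collections.unmodifiableList(new ArrayList<>(categories));
        }
    }

    private int nextId = 1;
    // TreeMap keeps posts ordered by id; ids only grow, so this is creation order.
    private final Map<Integer, Post> posts = new TreeMap<>();

    // Backs POST /posts: stores the post and returns the generated id.
    synchronized int createPost(String title, String content, List<String> categories) {
        int id = nextId++;
        posts.put(id, new Post(id, title, content, categories));
        return id;
    }

    // Backs GET /posts: returns a snapshot of all stored posts.
    synchronized List<Post> getAllPosts() {
        return new ArrayList<>(posts.values());
    }
}
```

Both methods are synchronized because the embedded web server may serve requests on several threads at once. The POST /posts handler would call createPost with the fields of a validated payload and return the new id. The GET /posts handler would serialize the result of getAllPosts to JSON.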

Let’s start by inserting a post. First of all we will parse the JSON payload of the request:

And then see how to retrieve all the posts:

And the final code is:

 

Using PostMan to try the application

You may want to use curl instead, if you prefer the command line. I like not having to escape my JSON and having a basic editor, so I use PostMan (a Chrome plugin).

Let’s insert a post. We specify all the fields as part of a JSON object inserted in the body of the request. We get back the ID of the post created.


Then we can get the list of the posts. In this case we use a GET (no body in the request) and we get the data of all the posts (just the one we inserted above).


Conclusions

I have to say that I was positively surprised by this project. I was ready for the worst: this is the kind of application that requires basic logic and a lot of plumbing. I found out that Python, Clojure and Ruby all do a great job for this kind of problem, while the times I wrote simple web applications in Java the logic was drowned in boilerplate code. Well, things can be different. The combination of Spark, Lombok, Jackson and Java 8 is really tempting. I am very grateful to the authors of these pieces of software, they are really improving the life of Java developers. I consider it also a lesson: great frameworks can frequently improve things much more than we think.

Edit: I received a suggestion to improve one of the examples from the good folks on Reddit. Thanks! Please keep the good suggestions coming!

Getting started with Docker from a developer’s point of view: how to build an environment you can trust

Lately I have spent a lot of thought on building repeatable processes that can be trusted. I think that there lies the difference between being a happy hacker cracking out code for the fun of it and a happy hacker delivering something you can count on. What makes you a professional is a process that is stable, is safe and permits you to evolve without regressions.

As part of this process I focused more on Continuous Integration and on techniques for testing. I think a big part of having a good process is to have an environment you can control, easily configure and replicate as you want. Have you ever updated something on your development machine and all hell broke loose? Well, I do not like that. Sure, there are a few tools we can use:

  • Virtualenv when working on Python, to isolate the libraries you want to access
  • RVM and Gemfiles to play with different versions of Ruby/JRuby + libraries for different projects
  • Cabal, which lets you specify project-specific sets of libraries for Haskell projects (and BTW good luck with that…)
  • Maven to specify which version of the Java compiler you want to use and which dependencies you need

These tools help a lot, but they are not nearly enough. Sometimes you have to access shared libraries, sometimes you need a certain tool (Apache httpd? MySQL? PostgreSQL?) installed and configured in a certain way, for example:

  • you may need to have an Apache httpd configured on a certain port, for a certain domain name
  • you may need a certain set of users for your DB, with specific permissions set
  • you may need to use a specific compiler, maybe even a specific version (C++11, anyone?)

There are many things that you may need to control to have a fully replicable environment. Sometimes you can just use some scripts to create that environment and distribute those scripts. Sometimes you can give instructions, listing all the steps to replicate that environment. The problem is that other contributors could fail to execute those steps, and your whole environment could be messed up when you update something in your system. When that happens you want a button to click to return to a known working state.

You can easily end up with environments slightly different from those of your team members or from the production environment, and inconsistencies start to creep in. Moreover, if you have a long setup process, it could take you a long time to recreate the environment on a new machine. When you need to start working on another laptop for whatever reason, you want to be able to do that easily. When you want someone to start contributing to your open-source projects, you want to lower the barriers.

It is for all these reasons that recently I started playing with Docker.

What is Docker and how to install it

Basically you can imagine Docker as a sort of lightweight alternative to VirtualBox or other similar hypervisors. Running on a Linux box, you can create different sort-of virtual machines, all using the kernel of the “real” machine. However, you can fully isolate those virtual machines, installing specific versions of the tools you need, specific libraries, etc.

Docker runs natively only on Linux. To use it under Mac OS X or Windows you need to create a lightweight virtual machine running Linux, and Docker will run on that virtual machine. However, the whole mess can be partially hidden using boot2docker. It means some additional headaches, but you can survive that if you have to. If I can, I prefer to ssh into a Linux box and run Docker there, but sometimes that is not the best solution.

To install docker on a Debian derivative just run:

Our example: creating two interacting docker containers

Let’s start with a simple example: let’s suppose you want to develop a PHP application (I am sorry…) and you want to use MySQL as your database (sorry again…).

We will create two Docker containers: on the first one we will install PHP, on the second one MySQL. We will make the two containers communicate and access the application from the browser on our host machine. For simplicity we will run PhpMyAdmin instead of developing any sample application in PHP.

The first Docker container: PHP

Let’s start with something very simple: let’s configure a Docker image to run httpd under CentOS 6. Let’s create a directory named phpmachine and create a file named Dockerfile.

Note that this is a very simple example: we are not specifying a certain version of httpd to be installed. When installing some other software we may want to do that.

From the directory containing the Dockerfile run:

This command will build an image as described by the instructions. As a first step it will download a CentOS 6 image to be used as the base of this machine.

Now running docker images you should find a line similar to this one:

You can now start a container from this image and log into it with this command:

Once you are logged into the container you can start Apache and find out the IP of the docker machine running it:

Now, if you type that IP in a browser you should see something like this:


Cool, it is up and running!

Let’s improve the process so that 1) we can start the httpd server without having to use the console of the Docker container, and 2) we do not have to figure out the IP of the container.

To solve the first issue just add this line to the Dockerfile:

Now rebuild the image and start the container like this:

In this way port 80 of the Docker container is mapped to port 80 of the host machine, so we no longer need the container’s IP: you can now open a browser and use the localhost or 127.0.0.1 address.

Wonderful, now let’s get started with the MySQL server.

The second Docker container: MySQL server

Let’s create a Dockerfile in another directory and add a script named config_db.sh in the same directory.

Note: we are not persisting the data of our MySQL DB in any way, so every time we start a new container from this image we lose everything.

Now we can build the image:

Then we can run it:

And we can connect from our “real box” to the mysql server running in the docker container:

Does everything work as expected so far? Cool, let’s move on.

Make the two docker containers communicate

Let’s assign a name to the mysql container:

Now let’s start the PHP container, telling it about the MySQL container:

From the console of the phpmachine you should be able to ping dbhost (the name under which the phpmachine can reach the mysql container). Good!

In practice a line is added to the /etc/hosts file of the phpmachine, associating dbhost with the IP of our mysqlmachine.

Installing PHPMyAdmin

We are using PHPMyAdmin as a placeholder for some application that you may want to develop. When you develop an application you want to edit it on your development machine and make it available to the Docker container. So, download PhpMyAdmin version 4.0.x (later versions require MySQL 5.5, while CentOS 6 uses MySQL 5.1) and unpack it in some directory, say ~/Downloads/phpMyAdmin-4.0.10-all-languages. Now you can run the Docker container with PHP like this:

This will mount the directory with the source code of PhpMyAdmin on /var/www/html in the phpmachine, which is the directory Apache httpd is configured to serve.

At this point you need to rename config.sample.inc.php to config.inc.php and change this line:

In this way the phpmachine should use the db on the mysqlmachine.

Now you should be able to visit localhost and see a form. There, insert the credentials for the DB (myuser, myuserpwd) and you should be all set!


How does Docker relate to Vagrant, Ansible, Chef and Puppet?

There are a few other tools that could help with managing virtual machines and sort-of virtual machines. If you are a bit confused about the relations between the different tools, this is an oversimplified summary:

  • Vagrant is a command line utility to manage virtual machines; these are complete simulations of a machine, while Docker uses the kernel of the Docker host, resulting in much lighter “virtual machines” (our Docker containers)
  • Ansible, Chef and Puppet are ways to manage the configuration of these machines (operationalising processes), and they could be used in conjunction with Docker. Ansible seems much lighter than Chef and Puppet (but slightly less powerful). It is gaining momentum among Docker users and I plan to learn more about it.

This post gives some more details about the relations between these tools.

Conclusions

In our small example we could play with a realistic simulation of the final production setup, which we suppose is composed of two machines running CentOS 6. By doing so we have figured out a few things (e.g., we have packages for MySQL 5.1, and this forces us not to use the latest version of PhpMyAdmin; we know the complete list of packages we need to install; etc.). In this way we can reasonably expect very few surprises when deploying to the production environment. I strongly believe that having fewer surprises is extremely good.

We could also deploy the Docker containers themselves if we wanted to (I have not tried that yet).

Update: I am happy the guys at Docker cited this article in their weekly newsletter, thanks!

Portability: stories of what can go wrong when you run your code on another machine

In the last year I faced many surprises when running some well-tested code on my dev servers or my laptops. It is curious (and scary) how code that has been widely used in production (sometimes for years) can still hide portability issues, so that the first time you try that piece of software in slightly different conditions the unexpected happens.

I have experienced that both when working on some open-source projects and in some very big companies. The difference probably is that such problems tend to emerge sooner in open-source projects, if there is an active userbase, while in companies that control their development environment these little time bombs can remain silent and strike long after being put in place. In the following I list a few categories of portability issues that caused me problems.

Locale configuration

This is something we constantly overlook, but a lot of libraries make assumptions based on the locale configured on the current machine. If you are on a unix-ish box (Linux, BSD, Mac, etc.) open a console and run locale. You will get a list of variables such as LANG and the various LC_* settings.


These environment variables could affect the way dates, or even numbers, are parsed. For example, in Italian we use the comma instead of the dot to separate the integer from the fractional part of numbers, so “12.14” may fail to parse, or be misread, if your locale is set to Italian, while it is parsed correctly if it is set to UK English. Americans also expect the month to precede the day in dates. So:

02/01/2015

Could be the 1st of February for an American or the 2nd of January in most European countries. The way it is parsed could depend on the locale configuration.
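Here is a small Java sketch of both effects. The class and field names are made up for illustration. It passes the locale explicitly so the effect is visible on any machine. Code that calls NumberFormat.getInstance() without arguments gets the same behaviour silently from the machine’s default locale. With Java’s NumberFormat, the Italian case is worse than a parse failure: the dot is treated as a grouping separator, so the text is quietly read as a different number.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LocaleSensitiveParsing {
    // Two explicit patterns for the same digits: month first (US) or day first (most of Europe).
    static final DateTimeFormatter MONTH_FIRST = DateTimeFormatter.ofPattern("MM/dd/yyyy");
    static final DateTimeFormatter DAY_FIRST = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // Parses a number with the conventions of the given locale.
    static double parseNumber(String text, Locale locale) throws ParseException {
        return NumberFormat.getInstance(locale).parse(text).doubleValue();
    }

    public static void main(String[] args) throws ParseException {
        // In UK English '.' is the decimal separator.
        System.out.println(parseNumber("12.14", Locale.UK));    // 12.14
        // In Italian '.' is the grouping separator, so the same text is read as 1214.
        System.out.println(parseNumber("12.14", Locale.ITALY)); // 1214.0

        System.out.println(LocalDate.parse("02/01/2015", MONTH_FIRST)); // 2015-02-01
        System.out.println(LocalDate.parse("02/01/2015", DAY_FIRST));   // 2015-01-02
    }
}
```

The date part uses two explicit patterns rather than locale data. It shows that the same digits yield two different, equally valid dates.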

You will notice that the locale configuration also contains a default encoding (UTF-8 in my case), so I imagine that text encoding problems are also possible. I have not faced them yet this year, but I will keep an eye open for them.

Locale configuration… over SSH

A variant of the previous problem (or a multiplier of it) is that the locale configuration can be transferred when ssh-ing into a machine. By default, if you connect, let’s say, from a machine with an Irish locale to a machine with a US locale, the console opened will be configured with the Irish locale. Imagine how fun it is to debug this problem: a colleague of yours (with an American locale) sshes into that machine and does not see any problem, then you ssh into it and run into the problem, which magically appears just for Irish folks (should we suspect Leprechauns?).

How can you avoid that? Simple: you can either prevent the client from sending the environment configuration or prevent the server from accepting it. To prevent the client from sending it, open your /etc/ssh_config (/etc/ssh/ssh_config on most Linux distributions) and look for the SendEnv lines (typically SendEnv LANG LC_*).

Now, remove these bad boys and save yourself some headaches. To prevent the server from accepting it, look at the configuration of the ssh daemon (sshd_config) and remove the corresponding AcceptEnv lines.

Bonus solution: fix your software to not depend on the locale configuration

Poor man’s solution: force the locale to the holy working value (typically en_US.UTF-8) before compiling or running the locale-dependent/buggy application
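In Java, one way to apply the bonus solution is to pass an explicit locale (here Locale.ROOT) wherever text is meant for machines rather than humans. For parsing, you can use locale-independent methods such as Double.parseDouble. The class and method names below are hypothetical. For a JVM application, the poor man’s equivalent would be forcing the default locale at startup, for example with the user.language and user.country system properties.

```java
import java.util.Locale;

class ExplicitLocale {
    // Formats a number for machines (logs, files, protocols): always '.' as decimal separator.
    static String formatForMachines(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    // Parses a machine-written number; Double.parseDouble never consults the default locale.
    static double parseFromMachines(String text) {
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        Locale.setDefault(Locale.ITALY); // simulate a machine configured with an Italian locale
        System.out.println(String.format("%.2f", 12.14)); // 12,14 (the default locale leaks in)
        System.out.println(formatForMachines(12.14));     // 12.14
        System.out.println(parseFromMachines("12.14"));   // 12.14
    }
}
```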

Timezone

I found out that some tests passed only if they were run in a certain timezone… hint: it was not the timezone I was in.

Why was it happening? Because some functions had a hard-coded timezone, while others did not. It was very confusing to solve this issue, because a value obtained from parsing a date like 1/1/2015 ended up being transformed into 2/1/2015 (2nd of January) after a few steps. So, make sure you are not silently using the current timezone in some places and a hard-coded one (say, UTC) in others. Or be ready to deal with weird bugs. I wonder what happens when daylight saving time starts or ends… fun times.
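The following Java sketch shows the mechanism in a simplified form. It does not reproduce the exact bug above, where the date moved forward. A calendar date is turned into an instant using one zone and read back using another. The helper roundTrip is a hypothetical name. In real code the first zone is often an implicit ZoneId.systemDefault(), which makes the result depend on the machine.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

class TimezoneMixup {
    // Turns a calendar date into an instant using one zone, then reads it back using another.
    // When the two zones differ, the date can silently move by a day.
    static LocalDate roundTrip(LocalDate date, ZoneId writeZone, ZoneId readZone) {
        Instant instant = date.atStartOfDay(writeZone).toInstant();
        return instant.atZone(readZone).toLocalDate();
    }

    public static void main(String[] args) {
        LocalDate newYear = LocalDate.of(2015, 1, 1);
        // Mixed zones: midnight in Rome (UTC+1 in winter) is still 31 December in UTC.
        System.out.println(roundTrip(newYear, ZoneId.of("Europe/Rome"), ZoneOffset.UTC)); // 2014-12-31
        // One explicit zone on both sides: the date survives the round trip.
        System.out.println(roundTrip(newYear, ZoneOffset.UTC, ZoneOffset.UTC));            // 2015-01-01
    }
}
```

Using one explicit zone on both sides keeps the date stable, whichever zone the machine is configured with.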

Version dependent implementations

Sometimes the problem is that you are doing something really stupid and do not realize it, because it happens to work on a very specific configuration. Those are among my favourite bugs. Suppose for example that you write a test checking if a certain value is present as the first element of an array. So far so good. The problem is that this array is obtained by iterating over a Set, which does not give any guarantees about the order of the iterated elements (they are not sorted by any known and sensible function and they are not necessarily in the same order they were inserted).
As long as you run your tests on a machine with the same architecture and the same version of the standard libraries (the same JDK in this case), you do not notice any issue. You will not notice it until a new version of the JDK is released that returns the values of that Set implementation in a different order (absolutely legit). And now your tests do not pass. Have fun finding out the root cause.
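Here is a minimal Java sketch of the pitfall and of two safer alternatives. The class name, method names and sample values are illustrative. HashSet documents no iteration order, LinkedHashSet guarantees insertion order, and TreeSet guarantees sorted order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrdering {
    // Fragile: HashSet does not specify its iteration order, so the "first" element
    // is an accident of the implementation and may change across JDK versions.
    static String firstOfHashSet(List<String> values) {
        return new HashSet<>(values).iterator().next();
    }

    // Robust: LinkedHashSet iterates in insertion order.
    static String firstInserted(List<String> values) {
        return new LinkedHashSet<>(values).iterator().next();
    }

    // Robust: TreeSet iterates in sorted (natural) order.
    static String smallest(List<String> values) {
        return new TreeSet<>(values).first();
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("spark", "java", "rest");
        System.out.println(firstInserted(tags)); // spark
        System.out.println(smallest(tags));      // java
        // A test that only needs membership should check membership, not position.
        Set<String> unordered = new HashSet<>(tags);
        System.out.println(unordered.contains("spark")); // true
    }
}
```

When a test only needs to know that a value is present, asserting membership with contains is usually the most honest choice. When order matters, the collection type should be one that promises it.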

Compilers

This would deserve a series of posts of its own. I experienced this while working on C++ code using some features from C++11. In particular, I was trying to make the same codebase work on:

  • GCC
  • Clang
  • MinGW
  • Visual C++

I was very surprised by the warnings (and even errors) that some compilers report on code that other compilers are perfectly fine with. The worst thing was that one function (a pretty important one) of the standard library was not available on one particular platform. I found out only after I had started using that function all over the place. When I tried to port my application to the new compiler, I ended up making the feature that used that function unavailable/crappy on that platform. Definitely not satisfying, but at least I remembered why I stopped programming in C++. The advantages of the JVM are easily overlooked. And in the end everything is easier to port than C++ code.

Conclusions

This sort of issue makes me wonder how software can work at all: the number of possible errors that can go unnoticed is simply mesmerising. I think the only answer is to release, test and stress your code in every possible way, and to be ready anyway to face all sorts of problems leading to interesting debugging sessions. If you have talented, well-educated and patient developers, maybe your code will work as desired a reasonable portion of the time. Maybe.

 

Getting started with Frege: Hello World and basic setup using Maven

I spent a couple of hours playing with Frege (Haskell on the JVM) and there do not seem to be many tutorials available. I am trying to help by writing this simple Hello World tutorial.

The code is available on Github: https://github.com/ftomassetti/frege-tutorial/tree/01_HelloWorld

Update: Frege has some very useful documentation at http://www.frege-lang.org/doc/… where … represents the package, or module, name. For example, if one needs some reference for the frege.java.util.Regex package, one looks at http://www.frege-lang.org/doc/frege/java/util/Regex.html

Frege source code

The code is very simple for our little hello world example. In this tutorial we focus mainly on configuring our environment.

We declare the name of the module to be HelloWorld. It will affect the name of the Java class produced.

The third line defines the type signature of the main function, while the fourth line defines main as a call to putStrLn in the IO monad. In practice, you have to perform the operations which affect the real world (like reading from a file or writing to the screen) in the IO monad, typically inside a do block. The reason is that the compiler treats them differently from pure functions, which can be optimized in several ways (laziness, memoization, etc.) while “real-world operations” cannot.

Writing the POM (Maven configuration file)

First let’s take a look at the whole file:

The dependencies contain frege, no surprises here:

We then use two plugins, to compile Frege code and Java code:

Finally, we save the classpath used by Maven in a file (classpath.conf) by using the maven-dependency-plugin.

The classpath.conf file will be useful for running the application using the run.sh script.

Running the application, the run.sh script

To run the application we need the frege jar and the classes generated from our frege source code.

Compiling and running HelloWorld

After cloning the repository, you can simply run:

The result should be something like:

[federico@normandie frege-tutorial]$ sh run.sh
Hello world. Frege is a lot of fun!
runtime 0.001 wallclock seconds.