Things that Can Go Wrong when Running Code on Another Machine

In the last year I ran into many surprises when running well-tested code on my dev servers or laptops. It is curious (and scary) how code that has been widely used in production, sometimes for years, can still hide portability issues, so that the first time you run it under slightly different conditions the unexpected happens.

I have experienced this both when working on open-source projects and in some very big companies. The difference is probably that such problems tend to emerge sooner in open-source projects with an active user base, while in companies that control their development environment these little time bombs can stay silent and go off long after they were put in place. Here is a list of a few categories of portability issues that caused me problems.

Locale Configuration

This is something we constantly overlook, but a lot of libraries make assumptions based on the locale configured on the current machine. If you are on a Unix-like box (Linux, BSD, Mac, etc.), open a console and run locale. You will get a list of variables such as LANG and the LC_* family, each set to a locale name.

These environment variables can affect the way dates, or even numbers, are parsed. For example, in Italian we use the comma instead of the dot to separate the integer part of a number from the fractional part. So 12.14 may fail to parse if your locale is set to Italian and parse fine if it is set to UK English. Another example is that Americans expect the month to precede the day in dates. So:

02/01/2015

could be the 1st of February for an American or the 2nd of January in most European countries. The way it is parsed can depend on the locale configuration.
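To make this concrete, here is a minimal Java sketch (the class and method names are mine, purely illustrative). It parses the same strings under different conventions. Note that on the JDKs I know of, the Italian parse does not fail but silently reads the dot as a grouping separator, which is arguably worse than an error. For dates, the sketch spells out the pattern explicitly, which is one way to avoid relying on the machine's locale at all.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LocaleParsingDemo {
    // Parses a number using the conventions of the given locale.
    static Number parseNumber(String text, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    // Parses a date with an explicit pattern, so the result does not
    // depend on the locale configured on the machine.
    static LocalDate parseDate(String text, String pattern) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern, Locale.ROOT));
    }

    public static void main(String[] args) {
        // Same text, two locales: the results are not the same number.
        System.out.println(parseNumber("12.14", Locale.UK));
        System.out.println(parseNumber("12.14", Locale.ITALY));

        // Same text, two conventions: month first vs. day first.
        System.out.println(parseDate("02/01/2015", "MM/dd/yyyy"));
        System.out.println(parseDate("02/01/2015", "dd/MM/yyyy"));
    }
}
```

The point is not the specific values but that the caller, not the environment, decides how the text is interpreted.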

You will notice that the locale configuration also contains a default encoding (UTF-8 in my case), so I would imagine that encoding problems with text are possible too. I have not faced them yet this year, but I will keep an eye out for them.
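I have not hit this one myself, so take the following only as a sketch of how such a problem could look in Java (class and method names are hypothetical). Before JDK 18 the JVM's default charset was derived from the environment, so code that decodes bytes without naming a charset can behave differently from one machine to another. Passing the charset explicitly removes that dependency.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Decodes bytes with an explicitly named charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // An Italian word ending in an accented e, written with an escape
        // so that the encoding of this source file does not matter.
        String original = "perch\u00e9";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        System.out.println("default charset: " + Charset.defaultCharset());
        // Decoding with the charset used for encoding gives back the original text.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(original));
        // Decoding the same bytes as Latin-1 produces different (garbled) text.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```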

Locale Configuration… Over SSH

A variant of the previous problem (or a multiplier of it) is that the locale configuration can be transferred when you ssh into a machine. By default, if you connect from, let's say, a machine with an Irish locale to a machine with a US locale, the console you get will be configured with the Irish locale. Imagine how fun it is to debug this: a colleague of yours (with the American locale) sshes into that machine and does not see any problem, then you ssh into it and run into a problem that magically appears just for Irish folks (should we suspect leprechauns?).

How can you avoid that? Simple: either prevent the client from sending the environment configuration or prevent the server from accepting it. To prevent the client from sending it, open your /etc/ssh_config (on many Linux distributions it is /etc/ssh/ssh_config) and look for these lines:

# Site-wide defaults for some commonly used options. For a comprehensive
# list of available options, their meanings and defaults, please see the
# ssh_config(5) man page.

Host *
SendEnv LANG LC_*

Now remove these bad boys (the SendEnv line) and save yourself some headaches. To prevent the server from accepting it, look at the configuration of the ssh daemon (sshd): the corresponding directive in sshd_config is AcceptEnv.

Bonus solution: fix your software so it does not depend on the locale configuration.

Poor man's solution: force the locale to the holy working value (typically en_US.UTF-8) before compiling or running the locale-dependent/buggy application.
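Both solutions have a rough equivalent inside a Java program. The sketch below is only an illustration under my own naming (formatFixed and formatDefault are hypothetical helpers): the first method passes an explicit locale and so ignores the environment, while main shows the poor man's variant of pinning the JVM default locale once at startup.

```java
import java.util.Locale;

class LocaleIndependentFormatting {
    // Formats with a fixed, explicit locale: the output is the same on every machine.
    static String formatFixed(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    // Formats with whatever default locale the JVM picked up from the environment.
    static String formatDefault(double value) {
        return String.format("%.2f", value);
    }

    public static void main(String[] args) {
        // Poor man's approach: pin the default locale before anything else runs.
        Locale.setDefault(Locale.US);
        System.out.println(formatDefault(12.14));
        // Bonus approach: never consult the default locale in the first place.
        System.out.println(formatFixed(12.14));
    }
}
```

Pinning the default is quick but global, so it also affects any library running in the same JVM. Passing the locale explicitly is more work but keeps the decision local to the code that needs it.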

Timezone

I found out that some tests passed only when they were run in a certain timezone… hint: it was not the timezone I was in.

Why was it happening? Because some functions used a hard-coded timezone while others did not. It was very confusing to track down, because a value obtained by parsing a date like 1/1/2015 ended up transformed into 2/1/2015 (the 2nd of January) after a few steps. So make sure you are not silently using the current timezone in some places and a hard-coded one (say, UTC) in others. Or be ready to deal with weird bugs. I wonder what happens when summer time starts or ends… fun times.
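Here is a small Java sketch of how mixing zones can shift a date by a day. It is not the original code, just a reconstruction of the mechanism with made-up names: one method interprets a wall-clock value in hard-coded UTC, another turns the resulting instant back into a date in some other zone. In the buggy scenario that second zone would be ZoneId.systemDefault(); here it is a parameter so the effect is reproducible on any machine.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

class TimezoneDriftDemo {
    // One part of the code interprets the wall-clock value in hard-coded UTC.
    static Instant parseAsUtc(LocalDateTime wallClock) {
        return wallClock.toInstant(ZoneOffset.UTC);
    }

    // Another part converts the instant back to a calendar date in a given zone.
    static LocalDate dateIn(Instant instant, ZoneId zone) {
        return instant.atZone(zone).toLocalDate();
    }

    public static void main(String[] args) {
        // 8 pm on the 1st of January 2015, interpreted as UTC.
        Instant instant = parseAsUtc(LocalDateTime.of(2015, 1, 1, 20, 0));

        // Read back in UTC, the date is still the 1st of January.
        System.out.println(dateIn(instant, ZoneId.of("UTC")));
        // Read back in Tokyo (UTC+9), it is already the 2nd of January.
        System.out.println(dateIn(instant, ZoneId.of("Asia/Tokyo")));
    }
}
```

A test that compares the resulting date against a fixed expectation will pass or fail depending on where it runs, which matches the symptom described above.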

Version Dependent Implementations

Sometimes the problem is that you are doing something really stupid and do not realize it, because it happens to work on one very specific configuration. Those are among my favorite bugs. Suppose, for example, that you write a test checking that a certain value is the first element of an array. So far so good. The problem is that this array is obtained by iterating over a Set that gives no guarantees about the order of the iterated elements (they are not sorted by any known and sensible function, and they are not necessarily in the order in which they were inserted).
Now, as long as you run your tests on machines with the same architecture and the same version of the standard libraries (the same JDK in this case), you do not notice any issue. You will not notice it until a new version of the JDK is released that returns the values of that Set implementation in a different order (absolutely legit). And now your tests do not pass. Have fun finding the root cause.
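A minimal Java sketch of the trap and of one possible way out (the names are illustrative, not taken from the original code). The first helper copies a Set into a list in whatever order iteration produces. The second imposes an order of its own, so a test on the first element no longer depends on the Set implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetOrderDemo {
    // Fragile: the list order is whatever the Set implementation happens to produce.
    static List<String> inIterationOrder(Set<String> values) {
        return new ArrayList<>(values);
    }

    // Stable: the order is defined by this code, not by the Set implementation.
    static List<String> inSortedOrder(Set<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        Set<String> values = new HashSet<>(Arrays.asList("banana", "apple", "cherry"));

        // Do not assert on get(0) here: HashSet makes no ordering promise.
        System.out.println(inIterationOrder(values));
        // Here get(0) is well defined: it is the smallest element.
        System.out.println(inSortedOrder(values).get(0));
    }
}
```

If the order does not matter to the code under test, the simpler fix is to assert that the value is contained in the result rather than that it comes first.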

Compilers

This would deserve a series of posts of its own. I experienced it while working on C++ code using some C++11 features. In particular, I was trying to make the same codebase work on:

gcc
clang
mingw
visual c

I was very surprised by the warnings (and even errors) that some compilers report on code that other compilers are perfectly fine with. The worst thing was that one function of the standard library (a pretty important one) was not available on one particular platform. I figured that out only after I had started using the function all over the place, when I tried to port my application to a new compiler. I ended up making the feature that relied on that function unavailable/crappy on that platform. Definitely not satisfying, but at least I remembered why I stopped programming in C++. The advantages of the JVM are easily overlooked. In the end, everything is easier to port than C++ code.

Conclusions

This sort of issue makes me wonder how software can work at all: the number of possible errors that can go unnoticed is simply mesmerizing. I think the only answer is to release, test, and stress your code in any way possible, and still be ready to face all sorts of problems leading to interesting debugging sessions. If you have talented, well-educated and patient developers, maybe your code will work as desired a reasonable portion of the time. Maybe.
