Ideas and tips on how to make open source projects successful, and my open source projects

An introduction to Spark, your next REST Framework for Java

This is a post I wrote for the Java Advent. It was initially published here.

Today we’re going to look into a refreshing, simple, nice and pragmatic framework for writing REST applications in Java. It will be so simple, it won’t even seem like Java at all.

We’re going to look into the Spark web framework. No, it’s not related to Apache Spark. Yes, it’s unfortunate that they share the same name.

I think the best way to understand this framework is to build a simple application, so we’ll build a simple service to perform mathematical operations.

We could use it like this:

Screenshot from 2015-11-26 14-57-18

Note that the service is running on localhost at port 4567 and the resource requested is “/10/add/8”.

Set up the Project Using Gradle (what’s Gradle?)

Now we can run:

  • ./gradlew idea to generate an IntelliJ IDEA project
  • ./gradlew test to run tests
  • ./gradlew assemble to build the project
  • ./gradlew launch to start our service

Great. Now, Let’s Meet Spark

Do you think we can write a fully functional web service that performs basic mathematical operations in fewer than 25 lines of Java code? No way? Well, think again:

In our main method we just say that when we get a request which contains three parts (separated by slashes) we should use the Calculator route, which is our only route. A route in Spark is the unit which takes a request, processes it, and produces a response.

Our calculator is where the magic happens. It looks in the request for the parameters “left”, “operatorName” and “right”. Left and right are parsed as long values, while the operatorName is used to find the operation. For each operation we have a Function (Function2<Long, Long, Long>: two Long arguments and a Long result) which we then apply to our values (left and right). Cool, eh?

Function2 is an interface which comes from the Javaslang project.
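
As a rough illustration of the lookup-and-apply idea described above, here is a minimal sketch that uses only the JDK. It is not the post's original listing: `BinaryOperator<Long>` stands in for Javaslang's `Function2`, the Spark route wiring is left out, the class and method names are made up, and only the "add" operation name is taken from the post (the other names are assumptions).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Illustrative stand-in for the Calculator route: plain JDK only, no Spark or Javaslang.
class CalculatorSketch {

    // Operation name -> function on two longs. BinaryOperator<Long> plays the role of Function2.
    // Only "add" appears in the post; the other names are assumptions.
    static final Map<String, BinaryOperator<Long>> OPERATIONS = new HashMap<>();
    static {
        OPERATIONS.put("add", (a, b) -> a + b);
        OPERATIONS.put("sub", (a, b) -> a - b);
        OPERATIONS.put("mult", (a, b) -> a * b);
        OPERATIONS.put("div", (a, b) -> a / b);
    }

    // Mirrors what the route does with the three path segments:
    // parse the operands as longs, look up the operation by name, apply it.
    static String handle(String left, String operatorName, String right) {
        long l = Long.parseLong(left);
        long r = Long.parseLong(right);
        return String.valueOf(OPERATIONS.get(operatorName).apply(l, r));
    }

    public static void main(String[] args) {
        // Same inputs as the "/10/add/8" request shown earlier.
        System.out.println(handle("10", "add", "8")); // prints 18
    }
}
```

Keeping the operations in a map means that supporting a new operation is a one-line change.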

You can now start the service (./gradlew launch, remember?) and play around.

The last time I checked Java was more verbose, redundant, slow… well, it is healing now.

Ok, but what about tests?

So Java can actually be quite concise, and as a Software Engineer I celebrate that for a minute or two, but shortly after I start to feel uneasy… this stuff has no tests! Worse than that, it doesn’t look testable at all. The logic is in our calculator class, but it takes a Request and produces a Response. I don’t want to instantiate a Request just to check if my Calculator works as intended. Let’s refactor a little:

We just separate the plumbing (taking the values out of the request) from the logic and put it in its own method: calculate. Now we can test calculate.
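
The split between plumbing and logic could look roughly like the sketch below. It is an approximation under stated assumptions: a plain `Map<String, String>` stands in for Spark's request parameters, the class name is hypothetical, and only two operations are registered to keep it short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

class TestableCalculatorSketch {

    private static final Map<String, BinaryOperator<Long>> OPERATIONS = new HashMap<>();
    static {
        OPERATIONS.put("add", (a, b) -> a + b);
        OPERATIONS.put("div", (a, b) -> a / b);
    }

    // Plumbing: read raw strings from the request parameters.
    // A Map stands in for Spark's Request here (an assumption for the sketch).
    static String handle(Map<String, String> params) {
        long left = Long.parseLong(params.get("left"));
        long right = Long.parseLong(params.get("right"));
        return String.valueOf(calculate(params.get("operatorName"), left, right));
    }

    // Logic: no web types involved, so a unit test can call it directly.
    static long calculate(String operatorName, long left, long right) {
        return OPERATIONS.get(operatorName).apply(left, right);
    }

    public static void main(String[] args) {
        System.out.println(calculate("add", 10, 8)); // prints 18
        try {
            calculate("div", 1, 0);
        } catch (ArithmeticException e) {
            // Integer division by zero is not handled yet, it simply propagates.
            System.out.println("division by zero throws: " + e.getMessage());
        }
    }
}
```

A test only needs to call `calculate` with plain values; no request object has to be built.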

I feel better now: our tests prove that this stuff works. Sure, it will throw an exception if we try to divide by zero, but that’s how it is.

What does that mean for the user, though?

Screenshot from 2015-11-26 15-14-08

It means this: a 500. And what happens if the user tries to use an operation which does not exist?

Screenshot from 2015-11-26 15-14-30

What if the values are not proper numbers?

Screenshot from 2015-11-26 15-16-01

Ok, this doesn’t seem very professional. Let’s fix it.

Error handling, functional style

To fix two of the cases we just have to use one feature of Spark: we can match specific exceptions to specific routes. Our routes will produce a meaningful HTTP status code and a proper message.
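
To make the idea concrete without depending on Spark, here is a small JDK-only sketch of mapping exception types to responses. It does not use Spark's API: `HttpResult`, `run` and the messages are hypothetical names chosen for illustration, and the status codes follow the ones mentioned in this post (400 for bad input, 500 when nothing matches).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

class ExceptionMappingSketch {

    // Hypothetical response holder: just a status code and a body.
    static final class HttpResult {
        final int status;
        final String body;
        HttpResult(int status, String body) { this.status = status; this.body = body; }
    }

    // Exception type -> handler producing a meaningful status and message.
    private static final Map<Class<? extends RuntimeException>, Function<RuntimeException, HttpResult>> HANDLERS =
            new LinkedHashMap<>();
    static {
        HANDLERS.put(NumberFormatException.class, e -> new HttpResult(400, "Operands must be whole numbers"));
        HANDLERS.put(ArithmeticException.class, e -> new HttpResult(400, "Cannot compute: " + e.getMessage()));
    }

    // Runs a route; a thrown exception is translated by the first matching handler, else 500.
    static HttpResult run(Supplier<String> route) {
        try {
            return new HttpResult(200, route.get());
        } catch (RuntimeException e) {
            for (Map.Entry<Class<? extends RuntimeException>, Function<RuntimeException, HttpResult>> entry
                    : HANDLERS.entrySet()) {
                if (entry.getKey().isInstance(e)) {
                    return entry.getValue().apply(e);
                }
            }
            return new HttpResult(500, "Internal error");
        }
    }

    public static void main(String[] args) {
        HttpResult result = run(() -> String.valueOf(Long.parseLong("abc")));
        System.out.println(result.status + " " + result.body);
    }
}
```

The route code stays free of try/catch blocks; the translation from exception to status code lives in one place.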

We have still to handle the case of a non-existent operation, and this is something we are going to do in ReallyTestableCalculator.

To do so we’ll use a typical functional pattern: we’ll return an Either. An Either is a collection which can have either a left or a right value. The left typically represents some sort of information about an error, like an error code or an error message. If nothing goes wrong the Either will contain a right value, which could be all sorts of stuff. In our case we will return an Error (a class we defined) if the operation cannot be executed, otherwise we will return the result of the operation in a Long. So we will return an Either<Error, Long>.
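
The following sketch shows the Either idea with a tiny hand-rolled class, so that it runs with the JDK alone. It is not Javaslang's `Either` and not the post's original code: the nested `Either` type, the `CalcError` class (named that way to avoid clashing with `java.lang.Error`) and the choice to carry an HTTP status inside the error are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

class EitherSketch {

    // Minimal Either: holds exactly one of a left (error) or a right (result) value.
    static final class Either<L, R> {
        private final L left;
        private final R right;
        private final boolean isRight;

        private Either(L left, R right, boolean isRight) {
            this.left = left;
            this.right = right;
            this.isRight = isRight;
        }
        static <L, R> Either<L, R> left(L value) { return new Either<>(value, null, false); }
        static <L, R> Either<L, R> right(R value) { return new Either<>(null, value, true); }
        boolean isRight() { return isRight; }
        L getLeft() { return left; }
        R getRight() { return right; }
    }

    // Hypothetical error type: a status code plus a message for the user.
    static final class CalcError {
        final int httpStatus;
        final String message;
        CalcError(int httpStatus, String message) { this.httpStatus = httpStatus; this.message = message; }
    }

    private static final Map<String, BinaryOperator<Long>> OPERATIONS = new HashMap<>();
    static {
        OPERATIONS.put("add", (a, b) -> a + b);
        OPERATIONS.put("sub", (a, b) -> a - b);
    }

    // Unknown operations become a left value instead of an exception.
    static Either<CalcError, Long> calculate(String operatorName, long left, long right) {
        BinaryOperator<Long> operation = OPERATIONS.get(operatorName);
        if (operation == null) {
            return Either.left(new CalcError(404, "Unknown operation: " + operatorName));
        }
        return Either.right(operation.apply(left, right));
    }

    public static void main(String[] args) {
        Either<CalcError, Long> result = calculate("pow", 2, 3);
        System.out.println(result.isRight() ? "result " + result.getRight() : "error " + result.getLeft().message);
    }
}
```

The caller has to look at which side is present, so the error case cannot be forgotten the way an unchecked exception can.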

Let’s test this:

The result

We got a service that can be easily tested. It performs mathematical operations. It supports the four basic operations, but it could be easily extended to support more. Errors are handled and the appropriate HTTP codes are used: 400 for bad inputs and 404 for unknown operations or values.

Conclusions

When I first saw Java 8 I was happy about the new features, but not very excited. However, after a few months I am seeing new frameworks come up which are based on these new features and have the potential to really change how we program in Java. Stuff like Spark and Javaslang is making the difference. I think that now Java can remain simple and solid while becoming much more agile and productive.

You can find many more tutorials on the Spark tutorials website.

5 key aspects for a successful open-source project

I love open-source: for me it is a great way to develop any product, to acquire new skills, to have fun and to make something useful for the community. I am not an open-source rock-star (at least not yet :D) but I have created and contributed to tens of projects (take a look at my GitHub profile). Some of them got a bit of attention, like WorldEngine, JavaParser or EffectiveJava. I am also an avid open-source user: almost daily I have to choose some open-source program or library to use or to contribute to. So I evaluate open-source projects regularly. I am also lucky enough to be in touch with many open-source developers, some of whom I have interviewed for this blog.

So I want to share a few aspects I think are fundamental for the success of open-source projects.

The basic principles

One issue I have seen in many open-source projects is the committers’ lack of time: someone gets excited about a new idea and builds a project, the project gets some traction, and after a while life happens. The patches to review pile up, questions remain unanswered for a long time and new features take forever to be implemented. I do not claim I have a solution for that but I think that we can alleviate the problem by:
1) automating as much as possible: write documentation once instead of answering the same question ten times, run tests automatically instead of correcting the same errors over and over, etc.
2) getting more contributors to share the burden of maintaining the project. To do that you need to create a “contributor-friendly” project.

To translate these general principles into practice I think there are five aspects to consider.

1. Automated tests

I think automated tests are very important in software development in general, but they are especially important in open-source because they offer a way to automatically evaluate a contribution. In all of my projects I use some tool for continuous integration: each time I receive a pull request the tests are run automatically and a symbol shows clearly whether all the tests pass with the new patch or something needs to be fixed. It is great because it helps new contributors gain confidence. It also helps contributors who have very little time to reduce the time spent on code reviews: tests already guarantee a minimum quality standard. In this way the human reviewer can focus on other aspects such as architectural coherence and respect for stylistic guidelines.

2. Documentation

I know, no one likes to write documentation but it is a necessity for a variety of reasons.

First of all it is a way to communicate decisions long after the discussions which led to them. Today I discuss some design decision in my project with other contributors, I write it down, and a new contributor joining six months from now can read it. This helps to maintain the architectural integrity of the project.

Then there are users who just want to benefit from the project. Without documentation to read they will either never adopt your project or keep asking the same questions over and over.

It is important to understand that there are different kinds of documentation for different audiences. The two main ones are users and contributors. For user documentation I suggest using readthedocs, while for contributors I normally use some form of wiki. There are also specific formats for certain domains. For example, Java libraries should use Javadoc, like we do for JavaParser.

3. Reactiveness

A few days ago I received an e-mail about a bug I reported over three years ago for an Eclipse project.

Screenshot from 2016-01-10 09-50-48

It was the first comment I received on this issue. The problem is probably not present anymore, but I cannot help verify that because in the meantime I stopped using Eclipse. That is what happens when the first answer arrives over three years after the bug report.

Providing a proper solution to a problem can be difficult and require an effort that we are not able to spend immediately. However I think that it is vital to react to user requests, questions and bug reports in a timely manner, perhaps just acknowledging that we received their message and that we are going to look into it. If we do not do so the user will get the feeling of facing a dead project, with no one available to help or develop the product.

I realize this is much easier to say than to apply and I admit there are issues which have been open for far too long in the projects in which I am involved. However sometimes just providing a short and partial answer can help, and a few times after a short answer from me the users themselves provided a solution and eventually became contributors.

In any case, if you still reply after two and a half years, there will be projects which do worse than you…

4. Community

If you are doing things right your open-source project should solve exactly one problem. Normally your project could be a useful piece in a larger solution to a larger problem, perhaps part of a tool-chain. By finding out which other projects are related you could find ways to collaborate and benefit from each other, providing more value.

For example, we built this great world generator named WorldEngine and we realized that the maps generated could be used inside tiled games. So we found out that Tiled is a great editor for tiled maps which has its own format named TMX. There are libraries to manipulate TMX files for a lot of programming languages. We understood that writing an exporter to TMX for our world generator would permit a lot of game designers to use our project, so we started working on it.

Screenshot from 2016-01-10 10-17-52

We discussed it on the Tiled forum and we got useful feedback.

Finding related projects and starting to collaborate can increase the value of all the projects involved and lead to a mix of contributors. It basically means new energy, new ideas and new users.

5. Be open to new contributions

When you start working on an open-source project you are very excited and it is amazing to see your baby growing. At that point you are very attached to it and you would like to maintain control over it, to be sure that it evolves according to your ideas and really remains your project. I think instead you should involve other people as early as possible and give up as much control as possible. There are at least two reasons for that. The first one is that other people have great ideas too. If you do not give contributors space to propose ideas and improve your project, it will remain constrained by your time and your energy. The second is that you want your project to survive you. Hopefully you will not be hit by a bus any time soon, but you could end up having less time to dedicate to the project because of a new job or perhaps because you start focusing more on some other open-source project. That is perfectly fine if someone else can keep the project alive as you step down.

I know it is hard to not be in control but I think it is necessary to grow your project. For example WorldEngine was born from merging two projects: lands, which was my little project, and WorldSynth, created by Bret Curtis. We merged our projects and started maintaining WorldEngine together. Eventually we received amazing contributions from other people. This is an incomplete list:

  • Evan Sampson contributed the amazing implementation of the Holdridge life zones model and greatly improved the ancient-looking-map, biome, precipitation and temperature generators. Thanks a million!
  • Ryan contributed the Windows binary version and discussed Lands on Reddit, bringing a lot of users. Thanks a million!
  • stefan-feltmann made Lands depend on pillow instead of PIL (which is deprecated). This could also help when moving to Python 3. Thanks a million!
  • Russell Brinkmann helped save the generation parameters in the generated world (so that we can use them to generate the same world again, for example), improved the command line options and added tracing information (useful for understanding the performance of the various generation steps)
  • Joshua Coppola implemented the satellite view. Thanks a lot, it looks gorgeous!
  • Stephan made WorldEngine make heavy use of numpy, helping to speed up the generation. He also made world-generation much more reproducible and helped improve compatibility with Python 3.

It was possible because we were open to new contributions. Sure, WorldEngine is not my own little project where I control every single decision. It is owned by a community, a small community perhaps, but definitely a community able to produce something much better than what I could by myself.

Conclusions

These are just some aspects that I think we should consider when running an open-source project. I tried to stay on practical aspects and provide examples but I am sure there are many other aspects I did not touch. I would suggest taking a look at this interview on the problems with open-source and this post on making your project friendly for newcomers. This is very important because you are going to need help along the way.

I think that open-source projects are mostly run and used by professionals. Participation in open-source projects is becoming more and more a way to evaluate developers. While participating wins over not participating, I think that it is not enough: we should bring the same professionalism we apply to our daily job to open-source projects too, with the additional constraint of having a limited amount of time to split between several projects.
It is not easy and personally I see there are many ways in which I could improve. I just hope this list can help you and me to make our projects a bit better.
I would love to hear your stories about open-source.

Walkmod: automatically refactor code to apply code conventions

I am very interested in tools which support the software development process by automating the boring bits. One system which goes in that direction is Walkmod: a smart tool which can refactor Java projects and enforce code conventions. It is quite a powerful tool and it should get more and more attention.

TL;DR: Tools like Sonar find issues in your code, Walkmod fixes them for you

I had the chance to get in touch with the lead developer, Raquel Pau, because we are both contributors to JavaParser. I could not resist so I started asking her a lot of questions about the project and I thought it would be nice to share her answers. So, here they are.

At the end of this post there is also an example of applying walkmod on JavaParser to remove unused imports.

Hi Raquel, can you tell us something about yourself and your interests?

I am a software engineer from Barcelona, and my interests are related to software modeling with specification languages, metamodeling transformations, language interpreters and compilers. Apart from that, I also like to understand how people express their thoughts and passions through art.

Can you tell us which problem Walkmod is trying to solve?

Correct the code automatically according to a set of code conventions.

In which scenarios do you think it makes sense to use Walkmod? Is it suitable only for large projects or can small projects also benefit from it?

All projects with more than one developer involved can benefit from Walkmod. However, it is easy to see that the magnitude of the project has a direct repercussion on the number of developers.

Here are a few interesting adoption stories:

  • MetricStream, a company with a worldwide presence, has been using Walkmod for the last 10 months and has created its own plugins for Walkmod to automatically refactor code. They were mainly interested in correcting a set of Sonar issues automatically and using Walkmod they have been able to correct thousands of lines of code. Right now, they are working to publish their plugins on GitHub. Apart from MetricStream, ThoughtWorks is also using some Walkmod libraries in one of its open source projects, which also helps to improve the quality of our tool.
  • On the other side, there are consultancy companies. They are in an interesting position because they work on the same codebase with their customers. I am aware of a few of them that have also used Walkmod (I am not sure I can mention their names). In one case they also developed plugins to integrate Walkmod with other tools (i.e., Gradle).
  • Open source Java projects like JUnit, Guava, Arquillian and Apache projects could especially benefit from Walkmod because there are a lot of developers involved and they change all the time. In fact, even if they have some code conventions specified in some documents, these are difficult to manage and review. However, I have learned from them that applying code conventions to the whole project (e.g., applying the Eclipse formatter according to their own rules) in a single commit implies too much code to review and, consequently, risk. Therefore, they reject this kind of commit. In fact, this is the main reason why the current version of Walkmod allows:
    • To apply code conventions to a set of files.
    • To apply a set of code conventions without rewriting the whole source file.

Federico: I see. Yes, I have been doing a lot of code reviews recently and it is tiring and distracting when the code to review is polluted by a lot of minor changes related to style. I can see the benefits of having these changes applied automatically and just real changes being examined during code reviews. It will make life much easier and bring more attention to the real changes, which are often overlooked because of all the attention dedicated to the number of spaces between operators or finding the occasional tab character that slips into a commit.

How does Walkmod compare to existing solutions?

Currently, companies check that people are applying their conventions using PMD or Sonar. However, this kind of software just checks whether a rule is not satisfied, but never corrects the code, even if the problem is very easy to solve.

On the other hand, IDEs have a set of generic and automatic quick fixes (e.g., remove unused imports). However, there are a lot of editors and everybody has their preferred one; consequently, especially in open source projects, you can’t ensure that people execute the quick fixes of a specific editor before pushing the code.

Federico: Cool, Walkmod is not only more powerful than most automatic quick fixes available in editors, but it can also be triggered from the command line, which permits consistency in a project even if everyone is using a different IDE.

How easy is it to customize and develop new transformations? I have used M2M languages like QVT before and they were painful.

I have had experience designing model-to-model and model-to-text transformations (e.g., using ATL). Considering this experience I decided not to couple Walkmod to any specific transformation language. I think that if we expect people to contribute code conventions through code transformations, Walkmod should not require them to learn any additional language. Therefore, I have designed 3 ways to write code transformations:

  1. Using the Visitor pattern: people just need to add a method called visit for the type of element that they want to modify. Afterwards, developers should upload their visitor as a Java library to the Maven repository.
  2. Using templates: many Java developers have experience working with template technologies such as JSPs, Velocity, FreeMarker or Groovy. We selected the Groovy template system (GStringTemplateEngine) as the default template technology for Walkmod.
  3. Using scripting: Groovy is the scripting language preferred by Java developers and it is easy to integrate with Maven.
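
To give a feel for the first option, here is a toy, JDK-only sketch of a convention written as a visitor. It does not use Walkmod's real AST or visitor classes: `Node`, `Visitor`, `SourceFile`, `ImportDecl` and `UnusedImportRemover` are hypothetical names, and the "AST" is reduced to a list of imports plus the set of type names referenced in the file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class VisitorSketch {

    interface Node { void accept(Visitor visitor); }

    // No-op defaults: a transformation overrides only the visit method
    // for the node type it wants to modify.
    interface Visitor {
        default void visit(SourceFile file) { }
        default void visit(ImportDecl importDecl) { }
    }

    static final class ImportDecl implements Node {
        final String qualifiedName;
        ImportDecl(String qualifiedName) { this.qualifiedName = qualifiedName; }
        String simpleName() { return qualifiedName.substring(qualifiedName.lastIndexOf('.') + 1); }
        @Override public void accept(Visitor visitor) { visitor.visit(this); }
    }

    static final class SourceFile implements Node {
        final List<ImportDecl> imports = new ArrayList<>();
        final Set<String> referencedNames = new HashSet<>(); // simple type names used in the body
        @Override public void accept(Visitor visitor) { visitor.visit(this); }
    }

    // The convention "no unused imports" expressed as a visitor.
    static final class UnusedImportRemover implements Visitor {
        @Override
        public void visit(SourceFile file) {
            file.imports.removeIf(imp -> !file.referencedNames.contains(imp.simpleName()));
        }
    }

    public static void main(String[] args) {
        SourceFile file = new SourceFile();
        file.imports.add(new ImportDecl("java.util.List"));
        file.imports.add(new ImportDecl("java.util.Map"));
        file.referencedNames.add("List");
        file.accept(new UnusedImportRemover());
        System.out.println(file.imports.size() + " import(s) left");
    }
}
```

The transformation itself is a single `visit` method; everything else is the tree it walks over.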

Who should develop the transformations? Should software architects do that, or individual developers?

Conventions should be managed by the project leader, but transformations can be created by any Java developer.

Do you think that transformations tend to be very general and can be shared across projects or are they more typically project specific?

I think that in general, software architectures are the composition of several generic solutions configured or parameterized for a specific project. So, I think that if transformations are well coded, they tend to be general and the code conventions of a project consist of configuring these generic transformations.

How does Walkmod integrate with other tools? Is it typically used with CI tools like Jenkins or Travis?

Walkmod can be integrated in Forge, Eclipse, Maven or Gradle and it has been designed to be executed locally. If developers add Walkmod as a Maven or Gradle plugin and run it from a Continuous Integration tool, they can just be notified if some source files do not follow their conventions.

Are transformations whitespace preserving? Do they preserve comments?

Currently Walkmod allows multiple ways to apply the changes to the code: using the Eclipse formatter or applying just the minimum changes made by the transformation. In the second scenario, the transformations are whitespace preserving.

All comments are always preserved, like any other part of the source code, during the parsing process.

Are transformations based only on the AST or are they aware of symbol resolution? Can I define rules like “all the classes extending this base class should have a default constructor”?

Transformations are aware of symbol resolution if the visitor class has the annotation @RequiresSemanticAnalysis. In that case, when your code transformation is executed, all declaration types and expressions have the reflection element that the node is referencing (java.lang.Class, java.lang.reflect.Method, java.lang.reflect.Field, etc.).
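
To illustrate what having a `java.lang.Class` at hand makes possible, here is a small JDK-only sketch of the rule from the question (“classes extending this base class should have a default constructor”). It only uses standard reflection and does not touch Walkmod's API: `Base`, `WithDefault`, `WithoutDefault` and `violatesRule` are hypothetical names for the example.

```java
import java.lang.reflect.Constructor;

class DefaultConstructorCheck {

    // Hypothetical base class and two subclasses used only for the example.
    static class Base { }
    static class WithDefault extends Base { WithDefault() { } }
    static class WithoutDefault extends Base { WithoutDefault(int value) { } }

    // True when the type extends Base but declares no zero-argument constructor.
    static boolean violatesRule(Class<?> type) {
        if (type == Base.class || !Base.class.isAssignableFrom(type)) {
            return false; // the rule only concerns subclasses of Base
        }
        for (Constructor<?> constructor : type.getDeclaredConstructors()) {
            if (constructor.getParameterCount() == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(violatesRule(WithDefault.class));    // prints false
        System.out.println(violatesRule(WithoutDefault.class)); // prints true
    }
}
```

A purely syntactic check would have to guess which classes extend the base class; with resolved types the question is answered by `isAssignableFrom`.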

How difficult would it be to use Walkmod for other languages, like Python or Ruby?

The Walkmod architecture is completely independent of the programming language because it is completely extensible through plugins. If people would like to work with another language, they should create plugins with the implementation for some parts of the process (e.g., the parsing process).

Walkmod is an open-source tool, can you tell us something about the community? What kind of users do you have? What feedback are you getting?

Walkmod was announced in 2014 and since then our community has included MetricStream, ThoughtWorks and developers who work in consultancy services.

How could the community help Walkmod? Do you need more plugins, help with the development, more documentation or more success stories being spread about Walkmod?

Mainly, I would like the community to contribute by writing plugins for Walkmod and sharing their feedback.

What are the plans for the future of Walkmod?

Our plan is to improve the configuration style, making it easier to write and to create new plugins. Moreover, we are working on creating a service around the tool.

How is your experience with open-source? Were there negative or positive surprises?

I have been positively surprised by people who thank you for creating the product and for any help you give them in solving their problems. The best experience I have had was presenting Walkmod at Devoxx UK, when one guy came to me personally to say thank you for my support in creating the Maven plugin for Walkmod, which was an issue he had reported some months before.

So, hopefully that should be enough to convince you to start using Walkmod: the next step is to visit www.walkmod.com and give it a try!

Example: using walkmod on the JavaParser source code to remove unused imports

Ok, we described how cool walkmod is, listed a bunch of features, etc. etc. Let’s see how it works in practice.

I first of all downloaded walkmod from their website (current version is 1.2.0, available here). I unzipped it and set a few environment variables:

Then I configured walkmod to perform one single operation: remove the useless imports. I just created a file named walkmod.xml in the root of the project.

At this point all I had to do was to run walkmod apply from the root of the project:

walkmod

It took a little while to retrieve some dependencies but then it went through the source code of the project quite fast and found three files to correct. As a natural reflex I ran the tests and they all passed: the corrections done by walkmod are correct. I then checked the changed files and I noticed that walkmod reformatted them preserving all comments (good) but using an indentation of three spaces, which struck me as odd.

Luckily it is super easy to customize the behavior of the formatter: open walkmod-1.2.0/config/formatter.xml and go through the properties to understand what you need to change. For example I was not happy with the tabulation size and the indentation size (those two properties having value 3 in the screenshots below):

Screenshot from 2015-08-25 16:26:51

Screenshot from 2015-08-25 16:24:30

Now, you could just change the values in place if you are going to use walkmod for one simple project. If you plan to use it for several projects which have different formatting guidelines, that does not work. You can instead specify a formatter configuration in your walkmod file (read here for details).

Then I remembered that JavaParser has its own formatting configuration for Eclipse and Walkmod uses the same format so I just specified the path to this file in the configuration. I reverted my changes on the source code (so that the duplicate imports were back) and re-ran Walkmod: I obtained correct and nicely formatted files.

And the story ended with a new Pull-Request being sent to JavaParser:

Screenshot from 2015-08-25 16:38:43

I have to say that my impression of Walkmod was quite positive and I will look into using it regularly to ensure the code is formatted correctly and no cruft (like unused imports) creeps into the projects that I collaborate with.

In the last few days they have also launched walkmodhub: it is a service that you can use as a webhook for GitHub. Each time you push some code to your repository walkmod will run and send you a Pull-Request if there are violations of your coding guidelines. I think I am going to love this.

 

Interview with David Åse from the Spark web framework project

I think that there are a lot of people looking for ways to get involved in Open-Source projects. I thought I could help by collecting a few stories from people who already started giving back to the community. A few weeks ago I talked with Luca Barbato and today I am going to talk with David Åse.

How David and I met

Recently I started using the Spark web framework, and I wrote a tutorial on it: Getting started with Spark: it is possible to create lightweight RESTful applications also in Java. David saw that post and contacted me. After a few emails, we decided to work together on a series of tutorials for Spark to be published on sparktutorials. While talking with David I learned more about his role in the Spark project and I thought it would be interesting to share.

So let’s get started with the questions:

Hi David, tell us a bit about yourself

Hi! My name is David. I work as a Software Engineer in the UX/UI division of a global telecommunications company, where I’m allowed to do things like create Lemmings-based analytics visualizations, or build a device lab made from LEGOs. When I’m not playing with Lemmings or LEGOs, I do design and web programming with a strong focus on delivering high performance services (~1 second perceived load time for GPRS connections). I hold a Master’s Degree in Computer Science from the Norwegian University of Science and Technology, but I studied music in high school and my parents are both artists.

Is Spark the first open-source project you got involved in?

The first serious one, yes. My master’s thesis was an open source project which someone else took over, and I created some free mIRC scripts when I was a kid, but Spark is the first project I’ve worked on that’s being used by thousands of people every day.

How did you find out about Spark?

I was looking for a simple Java framework to set up a prototype at work. I had previously worked with Spring, JAX-RS and Play Framework, but I wanted something lighter and simpler. I was googling for lightweight Java web frameworks when I saw Spark. At first I dismissed the project as outdated/dead due to how the website looked, and googled some more. After a little while I came back to Spark again, and I decided to give it a shot when I noticed the website said the project was recently rewritten for Java 8.

How did you get involved?

After having worked with Spark for a day, I was very impressed with how easy everything was and how right it felt. I was worried that other people would (like I did) judge Spark by its cover and miss out. So, I sent Per (note: Per refers to Per Wendel, the creator and maintainer of Spark) the following email:

email

A very intensive three days later, this commit showed up on GitHub:

commit

How did you help?

I completely redesigned and reimplemented the website, then tried to promote it.

For the design part I focused on eliminating unneeded content, only leaving the most important bits. I created a massive banner for the index page to really grab the attention of our visitors, communicating what I think are the main selling points of Spark: Java 8 and “minimal effort”. For the other pages I wanted it to be very clean, so I left everything white. It’s as minimalist as Spark itself.

For the implementation part I focused on writing search engine optimized content and following best practices regarding optimization and accessibility. The page scores 100/100 in mobile usability and 87-94/100 in speed using Google PageSpeed Insights, which makes Google like us more and places us higher up in the search results (we didn’t have to worry about Mobilegeddon!). Note: Mobilegeddon refers to the abrupt downgrade Google gave to websites because of their poor performance on mobile usability, read here for details.

After I was pleased with the look and performance of the website, I tried to spread the word online. This was the hard part. I created social media accounts and posted to various Java forums online. The most successful was a post to reddit, which I think got us about a thousand visitors in a few days (which is a lot for a Java web framework).

Can you talk about the effects of rewriting the website?

It’s hard to say since we did not have analytics on the old page, but I’ve used Alexa and Ahrefs to estimate the past website traffic. When I joined Spark, its popularity had fallen from rank 800.000 to about 1.200.000 on Alexa, and it was losing more backlinks than it was gaining. Since then we’ve been on a steady climb up. We’re currently hovering around rank 400.000, and the number of referencing pages/domains has doubled. The number of visitors to our webpage has increased by 30% comparing Q4 2014 to Q1 2015, so it looks like everything is going the right way. We’ve also improved our Google search position a lot, which is important since about 65% of our traffic is from Google.

How do you get feedback?

I rely a lot on my friends, colleagues and my girlfriend. I appreciate brutally honest feedback, which can be hard to get from strangers. Other than that I use analytics data a lot to see how the site is performing and how users are behaving, and make changes accordingly.

What plans do you have in the future?

We are currently evaluating if we can establish a dedicated Spark team with paid developers. We recently ran a user survey which gave us a pretty good understanding of who uses Spark and for what, and if our users would be willing to sponsor the project in return for extended support.

If he decides to go that way, I will work part time on the project, expanding the webpage functionality in order to provide better documentation, migration guides and tutorials. If not, I will contribute when I have the time, as I do now.

Are there any other projects which you find interesting?

Of the lesser known projects, I am a big fan of Intercooler. While I do like the concept behind Angular and the like, I just don’t think we’re quite there yet. Especially considering low end devices in emerging markets, going full JavaScript is just too slow.

How was your experience giving back to the community? Did it help you in any way?

I learned a lot about the importance and benefits of analytics and having an “online presence”, which I think a lot of smaller open source projects could be better at. There seems to be sort of an “if we build it, they will come” mentality, but people are usually set in their ways and they need to be convinced that your project is worth looking into.

Federico: I fully agree with this. I think everyone is very busy and we have to help them find out immediately what we are providing, and Spark is doing a great job in this respect. “A tiny Sinatra inspired framework for creating web applications in Java 8 with minimal effort” is a clear and effective description of Spark.

What suggestions would you give to people who want to contribute to Open-source, but don’t know where to start?

As I started frequenting reddit (while trying to build Spark’s online presence), I noticed that people sometimes post about wanting to contribute to open source projects in programming language subreddits. These threads usually rank pretty high for a while, so I would just suggest doing that. If you have Java skills and you want to contribute, just go to /r/java and ask for project suggestions. Otherwise, if you already use open source software, there’s almost always a “Contact” or “Contribute” tab you could click on their webpage.

Federico: I should probably start adding a “Contribute” section to the README.md of my projects, or maybe a Contributing.md file, as several projects are starting to do.

P.S. In the last few days David has released a new project called j2html: it is a library to build HTML pages programmatically, and the source is available on GitHub. I find it quite useful when I have to throw in some snippets of HTML for which it is not worth the hassle of adding a template engine. Give it a try!

Conclusions

I found David’s story very interesting because it shows us how complex the Open-Source world is, and how many different things we can do to contribute. He is a technical person and rewrote the Spark website making it amazing, but he also focused on promoting the framework, finding different channels and communicating on all of them, finding ways to monitor the improvements he was doing and recruiting other volunteers (like me :D).

I also like very much the fact that he found ways to contribute focusing on aspects that the maintainer did not consider. I think this is what is great about having many people involved in one project: everyone contributes according to his/her own specific skills and the result is so much more than the sum of the single parts.

As an encouragement to you: There are many different ways to help Open-Source projects, you just have to find one that aligns with your skillset!

Releasing JavaParser 2.1

The other day the guys involved in JavaParser gave me the honor of releasing our new version: 2.1

The community on GitHub took over the project, which was previously hosted on Google Code and abandoned at some point. Nicholas Smith, among other things, rewrote all the tests to use JBehave and wrote detailed instructions for performing the release on Maven Central using Sonatype. To release the new version I just had to follow his instructions and get the permissions from Sonatype.

So now you can add it to your projects using:

What is new

We have 30 closed issues or pull requests

Including but not limited to:

  • a lot of bug fixing
  • improved test coverage
  • correctly support different encodings
  • improvement to the documentation
  • fix some issues with lambdas
  • removing some major performance issues
  • introduced the NamedNode interface

And now the community is already working on the next release, which will probably be JavaParser 3.0. Exciting times are coming.

How do people get started contributing to open-source? A few questions to Luca Barbato, contributor to Gentoo, MPlayer, Libav, VLC, cairo/pixman

I hear from a lot of people who are interested in open-source and in giving back to the community. I think it can be an exciting experience and it can be positive in many different ways: first of all, more contributors mean better open-source software being produced, and that is great, but it also means that the people involved can improve their skills and learn more about how successful projects get created.

So I wondered why many developers do not take the first step: what is stopping them from sending the first patch or the first pull-request? I think that often they do not know where to start, or they think that contributing to the big projects out there is intimidating, something to be left to an alien form of life, some breed of extra-good programmers totally separated from the common fellows writing code in the world we experience daily.

I think that hearing the stories of a few developers who have made major contributions to top-level projects could help overcome these misconceptions. So I asked a few questions to a dear friend of mine, Luca Barbato, who has contributed to Gentoo and VLC, among others.

Let’s start from the beginning: when did you start programming?

I started dabbling stuff during high school, but I started doing something more consistent at the time I started university.

What was your first contribution to an open-source project?

I think either patching the ati-drivers to work with the 2.6 series or hacking cloop (an early kernel module for compressed loops) to use lzo instead of gzip.

What are the main projects you have been involved in?

Gentoo, MPlayer, Libav, VLC, cairo/pixman

How did you start being involved in Gentoo? Can you explain the roles you have covered?

Daniel Robbins invited me to join, I thought “why not?”

During the early times I took care of PowerPC and [Altivec](http://en.wikipedia.org/wiki/AltiVec), then I focused on the toolchain due to the fact that gcc and binutils tended to break software in funny ways, then on multimedia since Altivec was mainly used there. I have been part of the Council a few times, I used to be a recruiter (if you want to join Gentoo feel free to contact me anyway, we love to have more people involved) and I’m involved with community relations lately.

Note: Daniel Robbins is the creator of Gentoo, a Linux distribution. 

Are there other less famous projects you have contributed to?

I have minor contributions in quite a bit of software, due to the fact that in Gentoo we try our best to upstream our changes and I like to send fixes back to the software I like to use.

What are your motivations to contribute to open-source?

Mainly because I can =)

Who helped you to start contributing? From whom have you learned the most?

Daniel Robbins was surely one of the first to ask me directly to help.

You learn from everybody so I can’t name a single person among all the great people I met.

How did you get to know Daniel Robbins? How did he help you?

I was a Gentoo user, I happened to do stuff he deemed interesting and he asked me to join.

He involved me in quite a number of interesting projects, some worked (e.g. Gentoo PowerPC), some (e.g. Gentoo Games) not so much.

Do your contributions to open-source help your professional life?

In some ways they do. Contrary to what people assume, I’m only seldom paid to improve the projects I care about the most, but at the same time having them working helps me when I need them in my professional work.

How do you face disagreement on technical solutions?

I’m a fan of informed consensus, otherwise prototypes (as in “do, test and then tell me back”) work the best.

To contribute to OSS, are technical skills or diplomatic/relational skills more important?

Both are needed at different times; open source is not just software, you MUST get along with people.

Have you found different ways to organize projects? What works best in your opinion? What works worst?

Usually the main problem is dealing with poisonous people, doesn’t matter if it is a 10-people project or a 300+-people project. You can have a dictator, you can have a council, you can have global consensus, poisonous people are what makes your community suffer a lot. Bonus points if the poisonous people get clueless fans giving them additional voices.

Did you ever send a patch for the Linux kernel?

Not really, I’m not fond of that coding style so usually other people correct the small bugs I stumble upon before I decide to polish my fix so it is acceptable =)

Do you have any suggestions for people looking to get started contributing to open-source?

Pick something you use, scratch your own itch first, do not assume other people are infallible or heroes.

ME: I certainly agree with that, it is one of the best pieces of advice. However, if you cannot find anything suitable, at the end of this post I wrote a short list of projects that could use some help.

Can you tell us about your best and your worst moments contributing to OSS?

The best moment is recurring and it is when some user thanks you since you improved his or her life.

The worst moment for me is when some rabid fan claims I’m evil because I’m contributing to Libav and even praises FFmpeg for something originally written in Libav in the same statement, happened more than once.

What are you working on right now and what plans do you have for the future?

Libav, plaid, bmdtools, commonmark. In the future I might play a little more with [rust](http://www.rust-lang.org/).

Thanks Luca! I would be extremely happy if this short post could give someone the last push they need to contribute to an existing open-source project or start their own: I think we could all use more, better, open-source software. So let’s write it.

One thing I admire in Luca is that he is always curious and ready to jump on the next challenge. I think this is the perfect attitude to become an OSS contributor: just start playing around with the things you like and talk to people, and you could find more possibilities to contribute than you could imagine.

…and one final thing: Luca is also the author of open-source recipes: he created the recipes of two types of chocolate bars dedicated to Libav and VLC. You can find them on the borgodoro website.

I suggest taking a look at his blog.

A few open-source projects you could consider contributing to

Well, just in case you are eager to start writing some code and you are looking for some projects to contribute to, here are a few, written with different technologies. If you want to start contributing to any of those and you need directions just drop me a line (federico at tomassetti dot me) and I would be glad to help!

  • If you are interested in contributing to Libav, you can take a look at this post: there I explained how I submitted my first patch (approved in the meantime!). It is written in C.

  • You could also be interested in plaid: it is a Python web application to manage git patches sent by e-mail (there are a few projects using this model, like libav or the Linux kernel)

  • WorldEngine: a world generator written in Python

  • Plate-tectonics: a library for plate tectonics simulation, written in C++

  • JavaParser: a Java parser, written in Java

  • Incremental Java parser: an incremental Java parser, written in Scala

Continuous Integration on Linux and Windows: Travis and AppVeyor

Recently I worked on improving the testing and the Continuous Integration (CI) configuration for a few open-source projects. In particular I have spent some time on WorldEngine, a world generator written in Python, which uses a C++ extension named plate-tectonics.

There have been several issues, the main two are:

  • the deployment of the application on Windows is problematic
  • the applications do not behave in the exact same way on Linux and Mac OS X

To mitigate these issues I invested some time in writing better tests, improving my usage of Travis (CI for Linux) and starting to use AppVeyor (CI for Windows). While my solution is still not perfect, I feel I am far better protected from regressions on the different platforms and I have a more reliable development process.

Travis

Travis is well-known in the open-source community because of three nice qualities:

  1. It is free for open-source projects
  2. It integrates seamlessly with GitHub
  3. It is very easy to use

Getting started with Travis is very easy: you simply register and connect your GitHub profile. You then select the projects on which you want to activate Travis.

At this point you will see a list of your projects. The green or red color used for the project names makes it immediately evident which projects need to be fixed. You can also take a look at a specific build and see what caused it to fail.

Screen Shot 2015-03-02 at 11.09.46

 

Every time you push to GitHub, whether to master or to another branch, a build is started. If the branch you are building is used in a pull request, a badge is shown indicating whether the build failed or succeeded.

Screen Shot 2015-03-03 at 10.17.11

Travis, setting up different C++ compilers

While having your tests all passing on one platform is good, having them passing on different platforms is great. For example, it is a very good thing to verify that your C++ code compiles correctly with both gcc and clang. This is particularly important if support for C++11 is needed, because it can be affected by using the wrong version of the compiler. You can do that by creating a .travis.yml file containing these lines:

Now, what you typically want to do is install specific versions of the compilers, to have a completely controlled environment rather than just using whatever happens to be installed on the machine Travis is offering you. Doing that is pretty simple: you just use apt-get to install whatever you need.

Given you are a smart guy I am sure you can adapt this example to your specific case.

Travis, setting up different Python versions

Let’s build our application with a few different versions of Python:

In this case we do not manually install Python but we rely on Travis having the correct versions already installed. To find out more about Python on Travis read here.

A useful trick is to use a different requirements file depending on the Python version (damn you Python 3!):
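
A minimal sketch of that trick, written in Python rather than as Travis configuration. The two file names (requirements2.txt and requirements3.txt) and the helper function are illustrative assumptions, not taken from the actual build scripts:

```python
import sys


def requirements_file(version_info=sys.version_info):
    """Pick a requirements file name based on the major Python version.

    The file names are hypothetical; use whatever names your project has.
    """
    if version_info[0] == 2:
        return "requirements2.txt"
    return "requirements3.txt"


# Print the file that matches the interpreter running this script.
print(requirements_file())
```

The printed name could then be passed to pip install -r in the install step of the build.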

My Travis files

These are a couple of complete Travis files I am using.

 plate-tectonics (C++)

WorldEngine (Python, using C++ extensions)

Civs (Clojure)

Javaparser (Java)

Ok, you got the picture. Travis is awesome and you can use it with a lot of different languages. It is easy to get started with (see the Civs script) but it is also flexible and powerful, if you need it to be.

AppVeyor

Recently I was very bothered by problems with building plate-tectonics and WorldEngine on Windows. Luckily Bret (who is maintaining WorldEngine with me) pointed me to AppVeyor, which is basically Travis for Windows.

This is how we configured it for our project, so that it can build our library on 6 different versions of Python:

  • Python 2.7, 32 bit
  • Python 2.7, 64 bit
  • Python 3.3, 32 bit
  • Python 3.3, 64 bit
  • Python 3.4, 32 bit
  • Python 3.4, 64 bit

One feature of AppVeyor is really great: it stores the artifacts generated during the build.

Screen Shot 2015-03-03 at 10.07.42

We use AppVeyor to generate the Windows binary wheels for our library and then we upload them to PyPI. The process of uploading the files to PyPI is manual at the moment because we want to have some control over it (we do not want to upload a new version every time we push).

Badges

Badges are nice, and they let you check the status of your project at a glance.

It may sound childish, but those small virtual stickers can motivate you to fix problems as they arise, because you want to be proud of your green status on both Travis and AppVeyor.

Screen Shot 2015-03-03 at 10.28.23

Conclusions and thoughts for improvements

I think that CI is fundamental to giving a solid basis to your projects. I sleep better since I started using it.

However, there is still a lot of room for improvement. Here are a few ideas:

  • I am still missing a CI solution for Mac OS X

  • I could use Docker under Travis to verify the building process under different distros

  • I sometimes use CI in a trial and error fashion: if I do not have access to a Windows machine I just cut a separate branch and push furiously to it, to trigger builds on AppVeyor and collect feedback on the building process under Windows. This seems silly… but it works for me 🙂

Bonus

If you want to use Travis with Perl you can read this.

[Building Binary Wheels for Windows using Appveyor](https://packaging.python.org/en/latest/appveyor.html) is an interesting read for Python developers targeting Windows users.

How to contribute to Libav (VLC): just got my first patch approved

I happened to have a few hours free and I was looking for some coding to do. I thought about VLC, the media player which I have enjoyed using so much over the years, and I decided that I wanted to contribute in some way.

To start helping in such a complex project there are a few steps involved. Here I describe how I got my first patch accepted. In particular I wrote a patch for libav, the library behind VLC.

The general picture

I started by reading the wiki. It is a very helpful starting point, but the process to set up the environment and send a first patch was not yet 100% clear to me, so I got in touch with some of the developers of libav to understand how they work and how I could start lending a hand with something simple. They explained to me that the easiest way to start is by solving issues reported by static analysis tools and style checkers. They use uncrustify to verify that the code adheres to their style guidelines and they run Coverity to check for potential issues like memory leaks or null dereferences. So I:

  • started looking at some Coverity issues
  • found something easy to address (a very simple null dereference)
  • prepared the patch
  • submitted the patch

After a few minutes the patch was approved by a committer, ready to be merged. The day after it made its way to the master branch. Yeah!

Download source code, build libav and run the tests

First of all, let’s clone the git repository:

Alternatively you could use the GitHub mirror, if you want to.

At this point you may want to install all the dependencies. The instructions are platform specific; you can find them here. If you are on Mac OS X be sure to install yasm, because nasm does not work. If you have both installed, configure will (correctly) pick up yasm. Just be sure to run configure after installing yasm.

If everything goes well you can now build libav by running:

Note that it is fine to build in-tree (no need to build in a separate directory).

Now it is time to run the tests. You will have to specify a directory in which to download some samples, which are later used by the tests. Let’s assume you want to put your samples under ~/libav-samples:

Did everything run fine? Good! Let’s start to patch then!

Write the patch

First of all we need to find an open issue. Visit the Coverity page for libav at https://scan.coverity.com/projects/106. You will have to ask for access and wait for someone to grant it to you. Once you are able to log in you will see a screen like this:

Screenshot from 2015-02-14 19:39:06

Here, this seems an easy one! The variable oggstream has been allocated by av_mallocz (basically a wrapper for malloc) but the return value has not been checked. If the allocation fails a NULL pointer is returned, and when we try to access it on the next line things are going to end unpleasantly. What we need to do is check the return value of av_mallocz and, if it is NULL, return an error. The appropriate error to return in this case is AVERROR(ENOMEM). To get this information… you have to start reading code, getting familiar with the way this codebase does business.

Libav follows strict rules about the comments in git commits: use git log to look at previous commits and try to use the same style.

Submitting the patch

I think many of you are familiar with GitHub and the whole process of submitting a patch for review. GitHub is great because it made that process so easy. However there are some projects (notably including the Linux kernel) which adopt another approach: they receive patches by e-mail.

Git has a feature that lets you submit a patch by e-mail with a simple command (git send-email). The patch will be sent to the mailing list, discussed, and if approved the e-mail will be downloaded, processed through git and committed to the official repository. Does it sound cumbersome? Well, it does to me, spoiled as I am by GitHub and similar tools but, you know, when in Rome do as the Romans do, so…

Sending patches using gmail with 2-factor authentication enabled

Now, many of you are using gmail and many of you have enabled 2-factor authentication (right? If not, you should). If this is your case you will get an error along these lines:

Here you can find how to create an application-specific password for this purpose: https://support.google.com/accounts/answer/185833 The name of the application that I had to create was smtp://f.tomassetti@gmail.com@smtp.gmail.com:587. Note that I used the same name specified in the previous error message.

What if I need to correct my patch?

If things go well an e-mail with your patch will be sent to the mailing list, someone will look at it and accept it. Most of the time you will receive suggestions about possible adjustments to improve your patch. When that happens you want to submit a new version of your patch in the same thread which contains the first version of the patch and the e-mails commenting on it.

To do that you want to update your patch (typically using git commit --amend) and then run something like:

Of course you need to find out the message-id of the e-mail to which you want to reply. To do that in gmail select the “Show original” item from the contextual menu for the message and in the screen opened look for the Message-Id header.
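
If you save the original message to a file, the header can also be extracted programmatically. The following is a minimal sketch using Python's standard email module; the sample message and its addresses are made up for illustration:

```python
from email import message_from_string


def extract_message_id(raw_message):
    """Return the Message-Id header of a raw e-mail, or raise if it is absent."""
    msg = message_from_string(raw_message)
    message_id = msg["Message-Id"]  # header lookup is case-insensitive
    if message_id is None:
        raise ValueError("no Message-Id header found")
    return message_id.strip()


# A made-up raw message, similar to what "Show original" displays.
raw = (
    "From: someone@example.org\n"
    "Subject: [PATCH] example\n"
    "Message-ID: <1234.abcd@example.org>\n"
    "\n"
    "body\n"
)

print(extract_message_id(raw))
```

The extracted value (angle brackets included) is what you would pass to the --in-reply-to option of git send-email.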

Tools to manage patches sent by e-mail

There are also web applications which are used to manage the patches sent by e-mail. Libav is currently using Patchwork for managing patches. You can see it deployed at: https://patches.libav.org/project/libav-devel/list/. Another tool is currently being developed to replace Patchwork. It is named Plaid and I tried to help a little bit with that too 🙂

Conclusions

Mine has been a very small contribution, and in the future I hope to be able to do more. But being a maintainer of other open-source projects I have learned that even small help is useful and appreciated, so for today I feel good.

Screenshot from 2015-02-14 22:29:48

Please, if I am missing something, help me correct this post.

Resurrect a C++ codebase and create a proper open-source project out of it

Our interests are often the spark that starts a pet project. For example I am interested in world generators and because of that I created Lands: an application which simulates different physical phenomena and produces different maps as output (for elevation, rivers, biomes, etc.). After many experiments I finally understood that a critical component of a world generator is the plate tectonics simulation. Now, writing a plate tectonics simulation is not an easy task: it requires a lot of research and a lot of tuning to obtain realistic results. Moreover it is not easy to achieve decent performance: generating a small world (let’s say 512 x 512 cells) could easily take several minutes even on a recent and powerful machine, so the code needs to be reasonably optimized.

Luckily I found a very interesting project to be used as a base for Lands. The project was built as part of a master thesis and it is called platec. I started by creating Python bindings for this project (pyplatec), given that Lands is written in Python and platec is written in C++. After a while I needed to improve a few things in that project. For example platec can generate only square maps with a side which is a power of 2 (e.g., a map of size 512 x 512 can be generated but one of 511 x 511, 513 x 513 or 800 x 600 cannot). Initially I was just rescaling the maps generated by platec to the desired size, but it caused distortions (especially when the map had a width/height ratio very different from 1).
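
To make the distortion concrete, here is a small Python sketch. It assumes the square map is generated at the smallest power-of-2 side that covers the requested size (an assumption for illustration; the helper names are hypothetical and not part of platec or Lands) and computes how much each axis has to be scaled:

```python
def is_power_of_two(n):
    """True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def enclosing_square_side(width, height):
    """Smallest power-of-two side that covers both requested dimensions."""
    side = 1
    while side < max(width, height):
        side *= 2
    return side


def rescale_factors(width, height):
    """Horizontal and vertical scale factors needed to shrink the square map."""
    side = enclosing_square_side(width, height)
    return width / side, height / side


print(is_power_of_two(512), is_power_of_two(513))  # True False
print(enclosing_square_side(800, 600))             # 1024
print(rescale_factors(800, 600))                   # (0.78125, 0.5859375)
```

When the two factors differ, as for 800 x 600, features are squeezed more along one axis than the other, which is the kind of distortion described above.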

The project did not seem maintained (no forum, no way to report issues, no recent releases) so I wrote to the original author and then I created my fork. It may sound silly but I also wanted to have it on GitHub instead of SourceForge. My fork, plate-tectonics, simply started with the code from platec. I just threw away the code for the UI (I was interested only in using the code as a library for Lands).

While the code was doing most of what I wanted there were a few things missing to make it a “proper” open-source project:

  1. Build system: there was a Makefile, but I wanted a cross-platform build system
  2. Tests: the project did not contain tests at all
  3. Automated builds: I love them for two reasons: 1) they force me to create a completely repeatable build process 2) they verify that my code runs and my tests pass not only on my machine (ever forgotten a git add?)
  4. Documenting: I wanted to spare the next contributors, and especially myself, the trouble I had understanding the code when I look at it again in a few months

So I invested some time on these aspects before writing the few changes I needed.

Build system

I decided to use CMake, which supports almost all platforms and compilers (I got Mac OS X, Linux and Windows covered, so I am happy enough). I had never used CMake before but it is sort of decent. I miss a better system for dependency management. Forget Maven or Gems: you have to obtain the code of your dependencies yourself. For example the suggested way to include Google Test (a test framework) is to just copy the code into your project. It seems sub-optimal to me… There are still libraries that have to be installed by the user using platform specific tools (apt-get, yum, brew, you name it): CMake can try to verify whether the libraries are installed but it cannot provide any help in actually installing them. There is definitely a lot to be desired here. I would be very happy to hear about alternatives.

Tests

Tests were very important, also because the code was very complex and C++ can behave in mysterious ways from time to time. It was a codebase with very few comments and big methods and classes. I needed some sort of black-box tests to check I was not going to break anything while refactoring. I started using Google Test, but I could also have used CppUnit. Currently I have decent tests for the code I added. I test the existing code just by checking that some generated worlds keep being exactly the same as before. From time to time I have to break this absolute constraint; in that case I just run the tool, generate a new world, look at it and decide if it can be used as the new reference. This approach was also necessary to test the generation of worlds which could not be generated before (like the non-square worlds).
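
The idea of comparing generated worlds against a stored reference can be sketched in a few lines of Python. This is only an illustration of the technique: generate_world is a toy stand-in for the real simulation, and the actual tests in plate-tectonics are written with Google Test in C++.

```python
import hashlib
import random


def generate_world(seed, width, height):
    """Toy stand-in for a world generator: deterministic for a given seed."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(width)] for _ in range(height)]


def world_digest(world):
    """Hash the dimensions and all cells so worlds can be compared cheaply."""
    h = hashlib.sha256()
    width = len(world[0]) if world else 0
    h.update("{}x{};".format(width, len(world)).encode("ascii"))
    for row in world:
        h.update(bytes(row))  # every cell is an int in 0..255
    return h.hexdigest()


def matches_reference(seed, width, height, reference_digest):
    """Regenerate the world and check it is identical to the reference."""
    return world_digest(generate_world(seed, width, height)) == reference_digest


# Record the reference once, after inspecting the world and accepting it...
reference = world_digest(generate_world(3, 8, 4))

# ...then assert in every test run that the same inputs give the same world.
print(matches_reference(3, 8, 4, reference))
```

When a change intentionally alters the output, the reference digest is regenerated after inspecting the new world by hand.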

Automate builds

I love Travis. Unfortunately it uses just Linux machines (so I cannot verify that my code works on Mac or Windows) but it is still a very good sentinel. I can use it to build my application with a couple of different compilers (clang and gcc) and it does not hurt. Unfortunately when I tried to compile the code on Windows I faced errors that I could not discover using Travis, so there is room for improvement. Any suggestions?

Documenting

On this side there is still a lot to be done. I just added a few comments here and there, removed comments written in Finnish (unfortunately I do not speak the language :D) and wrote a minimal README. I also added a link to the original master thesis, which helped me a lot in understanding the codebase.

Conclusions

I am happy about the status of plate-tectonics: it is far from being perfect but I managed to make the main changes I wanted to make. I definitely need to document and test the project better. However, in the last few months I thought about all the things I wanted to see in the open-source projects I interacted with and I tried to put them in plate-tectonics. I realized that code is fundamental, but there are still many things to build around it which transform it from a bunch of files in a directory into a proper open-source project.

I would love to hear about similar experiences with resurrecting open-source projects and in general about which aspects you think are the most relevant for the success of a similar project.

Old-style map

Have you ever taken a look at some maps from the Middle Ages? Or at some map of Middle-earth?
I simply love them, therefore I spent a couple of hours adapting my map generator to generate something similar.
These are the first attempts.

kamora_old

yann_old japurr_old