Representing relationships as first-class citizens in an Object-oriented programming language

As beginners we used to write very large functions and then giant God classes. As we improve our skills, our classes become smaller but more numerous: we get a swarm of small, focuses classes that collaborate to create the final system. In other words we shift the complexity from the relations between the components of a class to the relations between classes. Indeed most Design Patterns just describe how different classes should be related.

We shift the complexity from the relations between the

components of a class to the relations between classes

We have different kinds of relationships between classes: aggregation and composition, unidirectional and bidirectional associations. We have constraints on associations and we could have attributes associated to these relations. However we do not have language support for relations.

This is very different in meta-modelling languages or frameworks, such EMF, which instead permit to describe relations. When building complex models, for example Abstract Syntax Trees, a good representation of relations helps a lot.

So I am considering adding support for relations in Turin (my very own programming language).

Example: an AST

Consider an abstract syntax tree. The basic relation is a composition relation, so each node can contain zero or more children nodes, while all nodes have one single parent node (or none, for the root of the tree).

Suppose we can represent this relation like this:

Ideally we would like to use it like this:

So we would expect a fully bidirectional relation: when we update one endpoint, the other one is automatically aware of the change:

  • if we set the parent of A to be B, the children of B will list A
  • if we add A to the children of B, the parent of A will be B

And each node can have one parent so:

No book-keeping is needed: all the parties involved have always the most updated picture. Normally in Java you would have a field parent and a list of children of each single node and changing the parent of a node would mean:

  1. alerting the old parent so that it can remove the node from its children
  2. alerting new new parent so that it can add the node to its children
  3. update the parent field in the node

Boring and error prone. In Turin instead it just works out of the box. Multiple endpoints (such as children) are seen as auto-updating lists while single endpoints (such as parent) are instead seen as a sort of Optional (with the possibility to set the value, not just verifying if it is present and read it).

Relationships and subsets

That is great, however we would like to have something more. We would like to have different specializations of those relationships. For example:

  • a Method (which is a Node) can have multiple FormalArguments and one return type (TypeUsage)
  • a FormalArgument (which is a Node) has one type (TypeUsage)

Now, we would like to add a FormalArgument instance as the param of a method and see it both as part of the “params” and of the “children” of that Node.

In other words “params” would represent a subset of the “children” of Method. Also “returnType” would be a subset (with exactly one element) of the “children” of method.

Note that I want to be able to:

  • add a node into a subset (e.g., params). I should see that node both among the params and the children
  • add a node directly among the children: I should see the node as part of the children but not as part of any subset

Status

This is the syntax I am working on and most of the implementation for relationships is ready. Time to battle test it!

I am still working on relationships subsets but many tests are already passing and I should be ready to commit it soon.

For now I plan to support only relations with two endpoints which can be either single or multiple. In the future relations could hold values or have additional constraints, like we have for fields. I am still exploring the idea and experimenting how it works in practice, after all there are not many programming languages supporting relationships out there 🙂

A few references

No, I am not the first one to think about relations as first-class citizens in object-oriented programming languages:

A few useful references to understand more on the theoretical level:

I have to study better myself these papers, and then be sure to deliver a good compromise to make easier to think about some classes of problems which involves relationships and associations. After all, this is the goal of Turin: be a tool which support thinking about problems and solutions.

3 Comments

  1. I’m the original author of the ClassBuilder tool and I’m wondering for years why the the whole concept of defining the relationships directly into a programming language is not picked up.
    Several new languages has been developed and none of them seem to have this focus.
    ClassBuilder uses intrusive list to implement the relations, which have several advantages over the more traditional implementations with container classes using templates/generics.
    The disadvantage is more complex code.
    This disadvantage is handled with ClassBuilder, since it generates the code, using macro’s so the actual code made by the programmer itself is not overwhelmed with the generated code. If the code would be expanded it would be more then 90% of the code for a normal not too complex class.
    The impact of the relations is only a few lines of code containing the macro’s and is generated automatically from e.g. UML diagrams.
    Object lifetime is also managed via the relationships. Over the years that I use the tool, having a memory problem is a very very very rare thing.
    Thus if those principles are build into a language it can be a big win. It should be combined with Garbage Collection or ARC with almost no impact on performance compared to e.g. C++. Since in the cases that are heavy for GC languages and I see here fail in practice are all cases that there are many objects in a defined data model. By using the known relations between the objects the CG is not needed for 99% or more of the objects. The CG or ARC is only needed for stuff outside the actual in core data model or objects that are not “owned”. In practice this would be almost none and more or less only the variables that would be on the stack in a non GC language.
    So I’m glad to see that at least some persons do see that there is still a big improvement step possible in the area of programming languages. Via a language and its tooling it is also much better possible to guarantee the consistency and the constrains. Now e.g. there must be a ConstructorInclude method called in every constructor and DescructorInclude method in the destructor. By default they are generated, but they can be removed. I can give a warning if they are missing, which I do, but they can be ignored. By actually putting it in the language those loopholes that are there can be avoided.
    I do not understand why something so obvious as this, is not picked up by more people. So my thumps up for your work!

  2. Thank you for your kinds words! I did not know about ClassBuilder but I think it is something very interesting. If you would like to talk about it in more details I would love to publish such post here. It could be very interesting for me.

  3. Pingback: Creating Usable JVM Languages: An Overview-IT大道

Leave a Reply

Your email address will not be published. Required fields are marked *