
Python reflection: how to list modules and inspect functions

Recently I have been playing with some ideas about applying static analysis to Python and building a Python editor in JetBrains MPS.

To do any of this I would first need to build a model of Python code. We have recently seen how to parse Python code; however, we still need to consider all the packages our code uses. Some of those could be built in or implemented through C extensions, which means we do not have Python code for them. In this post I look into retrieving a list of all modules and then inspecting their contents.

My strategy is to use reflection, writing scripts in Python. I will then invoke those scripts from inside JetBrains MPS (and therefore from Java code). However, that is the topic of a future post.

Listing modules

Listing top-level modules is relatively easy if you know how to do it. This script prints a list of all top-level modules:
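Something along these lines does the job (a minimal sketch combining the interpreter's built-in modules with everything pkgutil can find on the search path):

```python
import pkgutil
import sys

# Built-in modules (compiled into the interpreter) plus everything
# importable from the current search path.
modules = set(sys.builtin_module_names)
for finder, name, ispkg in pkgutil.iter_modules():
    modules.add(name)

for name in sorted(modules):
    print(name)
```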

Now we need to look inside modules to find sub-modules. For performance reasons I want to do that only when it is needed:
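A minimal sketch of that lazy lookup: import the package only when asked and scan its __path__ for children.

```python
import importlib
import pkgutil

def list_submodules(package_name):
    package = importlib.import_module(package_name)
    # Plain modules have no __path__; only packages can contain submodules.
    if not hasattr(package, "__path__"):
        return []
    return sorted(name for _, name, _ in pkgutil.iter_modules(package.__path__))

for name in list_submodules("xml"):
    print(name)
```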

For example for xml I get:
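With the sketch above the result is roughly this (the exact list can vary slightly between Python versions):

```
dom
etree
parsers
sax
```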


Examining module contents and recognizing functions

Now, given a module, I need to list all of its contents. I can load the module by name and iterate over it, printing information about the elements found.
I want to distinguish between classes, submodules (which I will ignore for now), functions and simple values.
Builtin functions need to be treated differently: to access their information I need to parse their documentation. Not cool, not cool at all.
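A sketch of how this can look with the inspect module (the classification logic below is my reconstruction of the approach just described):

```python
import importlib
import inspect

def describe_module(module_name):
    module = importlib.import_module(module_name)
    for name, element in inspect.getmembers(module):
        if inspect.isclass(element):
            kind = "class"
        elif inspect.ismodule(element):
            kind = "submodule"  # ignored for now
        elif inspect.isbuiltin(element):
            # Builtins written in C often expose no introspectable signature,
            # so fall back to the first line of the docstring, which usually
            # shows the call form.
            doc = (inspect.getdoc(element) or "").splitlines()
            kind = "builtin function: %s" % (doc[0] if doc else "(no documentation)")
        elif inspect.isfunction(element):
            kind = "function%s" % inspect.signature(element)
        else:
            kind = "value of type %s" % type(element).__name__
        print("%s : %s" % (name, kind))

describe_module("os")
```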

This is what I get for the module os:
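An abbreviated excerpt (the real output is much longer and the exact entries depend on the Python version):

```
F_OK : value of type int
curdir : value of type str
environ : value of type _Environ
getcwd : builtin function: Return a unicode string representing the current working directory.
linesep : value of type str
path : submodule
sep : value of type str
walk : function(top, topdown=True, onerror=None, followlinks=False)
```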

Of course for functions I want to build a model of their interface (which parameters they take, which ones are optional, which ones are variadic, and so on). We have the information needed here; it is just a matter of transforming it into a representable form.
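For example, a rough sketch of such a model built on top of inspect.signature (the field names are just my choice):

```python
import inspect
import os

def parameter_model(func):
    model = []
    for param in inspect.signature(func).parameters.values():
        model.append({
            "name": param.name,
            "optional": param.default is not inspect.Parameter.empty,
            "variadic": param.kind in (inspect.Parameter.VAR_POSITIONAL,
                                       inspect.Parameter.VAR_KEYWORD),
        })
    return model

print(parameter_model(os.walk))
```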

Conclusions

I still need to build a model of the imported classes, but I am starting to have a decent model of the elements I can import in my Python code. This would make it easy to verify which import statements are valid. Of course this can be used in combination with virtualenvs and requirements files: given a list of requirements I would install them in a virtualenv and build the model of the modules available in that virtualenv. I could then statically verify which imports would work in that context.

Dynamic, static, optional, structural typing and engineering challenges

How an engineer should NOT look at technical matters

Dynamic versus static typing is one of those confrontations that seems to attract zealots. It really saddens me to see how people tend to vehemently defend their side simply because they are not able to understand the other position. In my opinion, if you do not get why both static and dynamic typing are great in certain situations, you are an incomplete software developer. You have chosen to look at just a portion of the picture.

An engineer should be open-minded: it is ok to have your opinions, but when you keep bashing ideas that have made millions of developers very productive, chances are that you are missing something. Try being humble; you could end up learning something.

Ok, so what?

I have had very good experiences using Java and Haskell (on the static side) and Python and Ruby (on the dynamic side). There is no way you can obtain the succinctness of Ruby code, or the amazing possibilities offered by its metaprogramming features, with statically typed languages. On the other hand, I really end up missing static typing when I have to refactor large chunks of code. Tests help (when they are there), but they are often not enough.

Now, there are approaches that try to get the best of both worlds. One solution is to combine languages for different parts of the system: perhaps you could write the infrastructure of your platform using Java and the business logic using JRuby. Why not?

However, that means you still have some challenges at the language boundaries: you need to master several tools, learn the best practices for the different languages, and so on. Therefore it would be interesting to get the advantages of both worlds in one single language.

A compromise?

A very partial solution is to use static typing with type inference: it does not give you the same power you get with dynamically typed languages, but you get some of the succinctness. Not a giant leap, but it helps close the gap. Types are still calculated at compile time, and that means you could have to streamline your solution so that the compiler can understand it too. However, you would not have to write out all the types yourself. Also, when refactoring you have to change types in fewer places.
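A tiny sketch of the idea, using Scala just for illustration:

```scala
object InferenceDemo extends App {
  // The types are still fixed at compile time; you just do not spell them out.
  val numbers = List(1, 2, 3)          // inferred as List[Int]
  val doubled = numbers.map(_ * 2.0)   // inferred as List[Double]
  println(doubled)
}
```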

Type inference sounds like a great idea, but it has its drawbacks: try writing some Haskell code and pretty soon you will be puzzled about the type the compiler derived for a value. In practice I need to add some annotations to make some of the types explicit, otherwise I get lost pretty quickly.

Another approach to combine static and dynamic typing is to start from a dynamic language and try to make it more static-like. There are two options for that: optional typing and structural typing.

Basically, optional typing means that you can add type annotations to certain parts of your dynamic language, so that part of it can be statically checked while the rest keeps being dynamic. This can be helpful both to catch errors and to improve performance, because it enables the compiler to optimize some calls. Look at Typed Clojure and PEP 484: Type Hints for Python.
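For example, with PEP 484-style hints a Python function can look like this (a minimal sketch; mypy is one of the checkers able to verify it):

```python
def greet(name: str, times: int = 1) -> str:
    return ("Hello %s! " % name) * times

greet("Alice", 2)   # fine
greet(42, 2)        # still runs ("Hello 42! Hello 42! "), but a static
                    # checker such as mypy flags the int/str mismatch
```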

Structural typing means having duck types: a type is the list of operations it supports. Basically, you can look at the code of your method and see which operations it performs on its parameters. Does it invoke the method foo on the parameter bar? Cool, that means that when this method is invoked I should ensure that bar has some type that provides the method foo. The name of the class does not matter; what matters is its structure (i.e., the fields and methods it has). In principle a good idea, but imho it can get confusing quite soon.

It is supported in Scala, where you can write this:
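For instance (a small illustrative sketch; the method and class names are just examples):

```scala
import scala.language.reflectiveCalls

object StructuralTypingDemo extends App {
  // The parameter only needs to provide a quack() method,
  // regardless of which class it belongs to.
  def makeItQuack(duck: { def quack(): String }): Unit = println(duck.quack())

  class Duck   { def quack(): String = "quack" }
  class Person { def quack(): String = "I'm quacking!" }

  makeItQuack(new Duck)     // ok: Duck structurally provides quack()
  makeItQuack(new Person)   // ok too: the class name is irrelevant
}
```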

However, it seems verbose, doesn't it? With structural type inference perhaps we could avoid the verbose type declaration and still have some safety at compile time. At this time I am not aware of any language that gets this exactly right. Any suggestions?

Can’t I just have Ruby with a super smart compiler catching my stupid errors?

No, so far you can’t.

People have tried to do that: think about the Crystal language, which aims to be a statically typed version of Ruby, or Mirah, which tried to be a statically typed version of JRuby. It is tempting to think that you can take a nice language like Ruby, just add a little bit of static typing, and get the best of both worlds. The problem is that it does not work that way, as the authors of Crystal figured out. Eventually that particular feature ends up having an impact on other aspects of the language and forces you to rethink it globally.

I think this is often the case in engineering projects: a customer thinks you can just squeeze in another feature, and he does not get the impact on other aspects such as security or architectural integrity. We as engineers have to juggle many balls and take decisions considering all of them. Adding another ball could force us to rethink our strategy or cause all the balls to fall to the ground.