Written by Federico Tomassetti
in Code processing

    Recently I have been playing with some ideas about applying static analysis to Python and building a Python editor in JetBrains MPS.

    To do any of this I first need to build a model of Python code. We have recently seen how to parse Python code, but we still need to consider all the packages our code uses. Some of those could be builtin or implemented through C extensions, which means we do not have Python code for them. In this post I look into retrieving a list of all modules and then inspecting their contents.

    My strategy is to use reflection, writing scripts in Python. I will then invoke those scripts from inside JetBrains MPS (and so from Java code). However, this is the topic of a future post.

    Listing modules

    Listing top modules is relatively easy if you know how to do it. This script prints a list of all top level modules:

    import pkgutil
    for p in pkgutil.iter_modules():
        # each entry is (module_finder, name, ispkg); print the name
        print(p[1])
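
    As far as I can tell, pkgutil.iter_modules() only reports modules that can be found on sys.path, so modules compiled into the interpreter itself (such as sys) do not show up. A minimal sketch, assuming Python 3, that merges both sources is shown here; the helper name top_level_modules is made up for illustration:

```python
import pkgutil
import sys


def top_level_modules():
    """Return a sorted list of importable top-level module names.

    Hypothetical helper: combines modules found on sys.path with the
    modules compiled into the interpreter.
    """
    # Modules built into the interpreter binary (e.g. 'sys').
    names = set(sys.builtin_module_names)
    # Modules and packages discoverable on sys.path.
    names.update(info.name for info in pkgutil.iter_modules())
    return sorted(names)


if __name__ == "__main__":
    for name in top_level_modules():
        print(name)
```

    The set removes duplicates in case a name is reported by both sources.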

    Now we need to look inside modules to find sub-modules. For performance reasons I want to do that only when it is needed:

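    import pkgutil
    import sys
    def explore_package(module_name):
        loader = pkgutil.get_loader(module_name)
        # loader.filename is the package directory (Python 2 pkgutil.ImpLoader)
        for sub_module in pkgutil.walk_packages([loader.filename]):
            _, sub_module_name, _ = sub_module
            qname = module_name + "." + sub_module_name
            print(qname)
            # recurse to find the sub-modules of this sub-module
            explore_package(qname)
    explore_package(sys.argv[1])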
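
    The same on-demand exploration can be sketched for Python 3, where loaders no longer expose a filename attribute. This is only a sketch under that assumption: the function name list_submodules is hypothetical, and it relies on importlib.util.find_spec to locate the package directories without importing the sub-modules themselves.

```python
import importlib.util
import pkgutil


def list_submodules(package_name):
    """Return the qualified names of all sub-modules of a package.

    Hypothetical helper: returns an empty list for plain modules and
    for names that cannot be found.
    """
    spec = importlib.util.find_spec(package_name)
    # Plain modules have no search locations, only packages do.
    if spec is None or spec.submodule_search_locations is None:
        return []
    names = []
    for info in pkgutil.iter_modules(list(spec.submodule_search_locations)):
        qname = package_name + "." + info.name
        names.append(qname)
        if info.ispkg:
            # Descend into sub-packages only.
            names.extend(list_submodules(qname))
    return names


if __name__ == "__main__":
    for qname in list_submodules("xml"):
        print(qname)
```

    Note that find_spec imports the parent packages of a dotted name, so exploring a deep package does execute the __init__ code of its ancestors.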

    For example for xml I get:


    Examining module contents and recognizing functions

    Now given a module I need to list all its contents. I can load the module by name and iterate over it, printing information about the elements found.
    I want to distinguish between classes, submodules (which I will ignore for now), functions and simple values.
    Builtin functions need to be treated differently: to access their information I need to parse their documentation. Not cool, not cool at all.

    import sys
    import inspect
    def describe_builtin(obj):
        """ Describe a builtin function """
        # Built-in functions cannot be inspected by
        # inspect.getargspec. We have to try and parse
        # the __doc__ attribute of the function.
        docstr = obj.__doc__
        args = ''
        if docstr:
            items = docstr.split('\n')
            if items:
                # the first docstring line looks like "name(arg1, arg2) -> result"
                func_descr = items[0]
                s = func_descr.replace(obj.__name__,'')
                idx1 = s.find('(')
                idx2 = s.find(')',idx1)
                if idx1 != -1 and idx2 != -1 and (idx2>idx1+1):
                    args = s[idx1+1:idx2]
        return args
    package_name = sys.argv[1].strip()
    mymodule = __import__(package_name, fromlist=['foo'])
    for element_name in dir(mymodule):
        element = getattr(mymodule, element_name)
        if inspect.isclass(element):
            print("class %s" % element_name)
        elif inspect.ismodule(element):
            # submodules are ignored for now
            pass
        elif hasattr(element, '__call__'):
            if inspect.isbuiltin(element):
                sys.stdout.write("builtin_function %s" % element_name)
                data = describe_builtin(element)
                data = data.replace("[", " [")
                data = data.replace("  [", " [")
                data = data.replace(" [, ", " [")
                sys.stdout.write(data.replace(", ", " "))
                print("")
            else:
                try:
                    # getargspec was removed in Python 3.11; on Python 3
                    # use inspect.getfullargspec or inspect.signature
                    data = inspect.getargspec(element)
                    sys.stdout.write("function %s" % element_name)
                    for a in data.args:
                        sys.stdout.write(" ")
                        sys.stdout.write(a)
                    if data.varargs:
                        sys.stdout.write(" *")
                        sys.stdout.write(data.varargs)
                    print("")
                except TypeError:
                    # callable objects that are not Python functions
                    pass
        else:
            print("value %s" % element_name)
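
    The classification step on its own can be sketched for Python 3 as a function that returns (kind, name) pairs instead of printing them, which is easier to consume from another tool. The name classify and the pair format are assumptions made for this illustration, not part of the script above.

```python
import inspect
import types


def classify(module):
    """Return a list of (kind, name) pairs for the contents of a module.

    Hypothetical helper: kinds are 'class', 'module',
    'builtin_function', 'function' and 'value'.
    """
    result = []
    for name in dir(module):
        element = getattr(module, name)
        if inspect.isclass(element):
            kind = "class"
        elif inspect.ismodule(element):
            kind = "module"
        elif inspect.isbuiltin(element):
            kind = "builtin_function"
        elif callable(element):
            kind = "function"
        else:
            kind = "value"
        result.append((kind, name))
    return result


if __name__ == "__main__":
    # Build a small throwaway module so the example is self-contained.
    demo = types.ModuleType("demo")
    demo.Klass = type("Klass", (), {})
    demo.func = lambda x: x
    demo.builtin = len
    demo.value = 3
    for kind, name in classify(demo):
        print(kind, name)
```

    Classes are tested first because they are callable too, so checking callable() earlier would report them as functions.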

    This is what I get for the module os:

    value EX_CANTCREAT
    value EX_CONFIG
    value EX_DATAERR
    value EX_IOERR
    value EX_NOHOST
    value EX_NOINPUT
    value EX_NOPERM
    value EX_NOUSER
    value EX_OK
    value EX_OSERR
    value EX_OSFILE
    value EX_PROTOCOL
    value EX_SOFTWARE
    value EX_TEMPFAIL
    value EX_USAGE
    value F_OK
    value NGROUPS_MAX
    value O_APPEND
    value O_ASYNC
    value O_CREAT
    value O_DIRECT
    value O_DIRECTORY
    value O_DSYNC
    value O_EXCL
    value O_LARGEFILE
    value O_NDELAY
    value O_NOATIME
    value O_NOCTTY
    value O_NOFOLLOW
    value O_NONBLOCK
    value O_RDONLY
    value O_RDWR
    value O_RSYNC
    value O_SYNC
    value O_TRUNC
    value O_WRONLY
    value P_NOWAIT
    value P_NOWAITO
    value P_WAIT
    value R_OK
    value SEEK_CUR
    value SEEK_END
    value SEEK_SET
    value ST_APPEND
    value ST_MANDLOCK
    value ST_NOATIME
    value ST_NODEV
    value ST_NOEXEC
    value ST_NOSUID
    value ST_RDONLY
    value ST_RELATIME
    value ST_WRITE
    value TMP_MAX
    value WCONTINUED
    builtin_function WCOREDUMPstatus
    builtin_function WEXITSTATUSstatus
    builtin_function WIFCONTINUEDstatus
    builtin_function WIFEXITEDstatus
    builtin_function WIFSIGNALEDstatus
    builtin_function WIFSTOPPEDstatus
    value WNOHANG
    builtin_function WSTOPSIGstatus
    builtin_function WTERMSIGstatus
    value WUNTRACED
    value W_OK
    value X_OK
    class _Environ
    value __all__
    value __builtins__
    value __doc__
    value __file__
    value __name__
    value __package__
    function _execvpe file args env
    function _exists name
    builtin_function _exitstatus
    function _get_exports_list module
    function _make_stat_result tup dict
    function _make_statvfs_result tup dict
    function _pickle_stat_result sr
    function _pickle_statvfs_result sr
    function _spawnvef mode file args env func
    builtin_function abort
    builtin_function accesspath mode
    value altsep
    builtin_function chdirpath
    builtin_function chmodpath mode
    builtin_function chownpath uid gid
    builtin_function chrootpath
    builtin_function closefd
    builtin_function closerangefd_low fd_high
    builtin_function confstrname
    value confstr_names
    builtin_function ctermid
    value curdir
    value defpath
    value devnull
    builtin_function dupfd
    builtin_function dup2old_fd new_fd
    value environ
    class error
    function execl file *args
    function execle file *args
    function execlp file *args
    function execlpe file *args
    builtin_function execvpath args
    builtin_function execvepath args env
    function execvp file args
    function execvpe file args env
    value extsep
    builtin_function fchdirfildes
    builtin_function fchmodfd mode
    builtin_function fchownfd uid gid
    builtin_function fdatasyncfildes
    builtin_function fdopenfd [mode='r' [bufsize]]
    builtin_function fork
    builtin_function forkpty
    builtin_function fpathconffd name
    builtin_function fstatfd
    builtin_function fstatvfsfd
    builtin_function fsyncfildes
    builtin_function ftruncatefd length
    builtin_function getcwd
    builtin_function getcwdu
    builtin_function getegid
    function getenv key default
    builtin_function geteuid
    builtin_function getgid
    builtin_function getgroups
    builtin_function getloadavg
    builtin_function getlogin
    builtin_function getpgidpid
    builtin_function getpgrp
    builtin_function getpid
    builtin_function getppid
    builtin_function getresgid
    builtin_function getresuid
    builtin_function getsidpid
    builtin_function getuid
    builtin_function initgroupsusername gid
    builtin_function isattyfd
    builtin_function killpid sig
    builtin_function killpgpgid sig
    builtin_function lchownpath uid gid
    value linesep
    builtin_function linksrc dst
    builtin_function listdirpath
    builtin_function lseekfd pos how
    builtin_function lstatpath
    builtin_function majordevice
    builtin_function makedevmajor minor
    function makedirs name mode
    builtin_function minordevice
    builtin_function mkdirpath [mode=0777]
    builtin_function mkfifofilename [mode=0666]
    builtin_function mknodfilename [mode=0600 device]
    value name
    builtin_function niceinc
    builtin_function openfilename flag [mode=0777]
    builtin_function openpty
    value pardir
    builtin_function pathconfpath name
    value pathconf_names
    value pathsep
    builtin_function pipe
    builtin_function popencommand [mode='r' [bufsize]]
    function popen2 cmd mode bufsize
    function popen3 cmd mode bufsize
    function popen4 cmd mode bufsize
    builtin_function putenvkey value
    builtin_function readfd buffersize
    builtin_function readlinkpath
    builtin_function removepath
    function removedirs name
    builtin_function renameold new
    function renames old new
    builtin_function rmdirpath
    value sep
    builtin_function setegidgid
    builtin_function seteuiduid
    builtin_function setgidgid
    builtin_function setgroupslist
    builtin_function setpgidpid pgrp
    builtin_function setpgrp
    builtin_function setregidrgid egid
    builtin_function setresgidrgid egid sgid
    builtin_function setresuidruid euid suid
    builtin_function setreuidruid euid
    builtin_function setsid
    builtin_function setuiduid
    function spawnl mode file *args
    function spawnle mode file *args
    function spawnlp mode file *args
    function spawnlpe mode file *args
    function spawnv mode file args
    function spawnve mode file args env
    function spawnvp mode file args
    function spawnvpe mode file args env
    builtin_function statpath
    builtin_function stat_float_times [newval]
    class stat_result
    builtin_function statvfspath
    class statvfs_result
    builtin_function strerrorcode
    builtin_function symlinksrc dst
    builtin_function sysconfname
    value sysconf_names
    builtin_function systemcommand
    builtin_function tcgetpgrpfd
    builtin_function tcsetpgrpfd pgid
    builtin_function tempnam [dir [prefix]]
    builtin_function times
    builtin_function tmpfile
    builtin_function tmpnam
    builtin_function ttynamefd
    builtin_function umasknew_mask
    builtin_function uname
    builtin_function unlinkpath
    builtin_function unsetenvkey
    builtin_function urandomn
    builtin_function utimepath (atime mtime
    builtin_function wait
    builtin_function wait3options
    builtin_function wait4pid options
    builtin_function waitpidpid options
    function walk top topdown onerror followlinks
    builtin_function writefd strin

    Of course for functions I want to build a model of their interface (which parameters they take, which ones are optional, which ones are variadic and so on). We have the information needed here; it is just a matter of transforming it into a representable form.
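
    One possible representable form, sketched here for Python 3, is a list of small records, one per parameter. This assumes inspect.signature is available; the Param record and the function_interface name are hypothetical. On recent Python versions inspect.signature also works for many builtin functions, and it raises ValueError when no signature is available, which the sketch reports as None.

```python
import inspect
from collections import namedtuple

# Hypothetical record describing a single parameter.
Param = namedtuple("Param", ["name", "optional", "variadic"])


def function_interface(func):
    """Return a list of Param records, or None if no signature is available."""
    try:
        sig = inspect.signature(func)
    except (TypeError, ValueError):
        # TypeError: not a callable; ValueError: no signature available.
        return None
    params = []
    for p in sig.parameters.values():
        # *args and **kwargs style parameters are variadic.
        variadic = p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
        # A parameter with a default value is optional.
        optional = p.default is not p.empty
        params.append(Param(p.name, optional, variadic))
    return params


if __name__ == "__main__":
    def sample(a, b=1, *args, **kw):
        pass

    for param in function_interface(sample):
        print(param)
```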


    I still need to build a model of the imported classes, but I am starting to have a decent model of the elements I can import in my Python code. This would make it easy to verify which import statements are valid. Of course this can be used in combination with virtualenvs and requirements files: given a list of requirements I would install them in a virtualenv and build the model of the modules available in that virtualenv. I could then statically verify which imports would work in that context.
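
    A rough sketch of such a check, assuming Python 3 and that the script runs with the interpreter of the virtualenv under test: it parses the source with ast, collects the absolute imports and reports those whose top-level package cannot be found. The function name unresolved_imports is hypothetical, and only the top-level package is checked, so a missing sub-module of an installed package would go unnoticed.

```python
import ast
import importlib.util


def unresolved_imports(source):
    """Return the imported module names whose top-level package is not found."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) are skipped in this sketch.
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            # find_spec returns None when a top-level module cannot be located.
            if importlib.util.find_spec(top_level) is None:
                missing.append(name)
    return missing


if __name__ == "__main__":
    code = "import os\nimport no_such_module_xyz\n"
    print(unresolved_imports(code))
```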
