OpenMP exception propagation

While implementing OpenMP exception propagation I overlooked something: each OpenMP thread that is not a Python thread and that executes a with gil block actually allocates and deallocates a new threadstate every time the block is entered and exited. This is because the first call to PyGILState_Ensure() allocates the threadstate, and the last call to PyGILState_Release() deallocates it. Normally Python has already done this for you, but in the outermost parallel section every thread except the master thread is not a Python thread. So if we have a parallel section containing a with gil block, we need to ensure the GIL (thereby allocating the thread state) once at the start of the section and then immediately release it again with PyEval_SaveThread():

{
    declare exc_info
    declare parallel_error_indicator

    #pragma omp parallel
    {
        /* ensure a threadstate is allocated
           for the entire parallel section */
        PyGILState_STATE _gilstate = PyGILState_Ensure();

        /* release the GIL immediately */
        Py_BEGIN_ALLOW_THREADS

        #pragma omp for
        for (...) {
            if (!parallel_error_indicator) {

                loop body

                goto no_error;
            error_label:
                {
                    PyGILState_STATE _gilstate = PyGILState_Ensure();
                    save exception info in exc_info
                    PyGILState_Release(_gilstate);
                    set parallel error indicator
                }
            no_error:;
                #pragma omp flush(parallel_error_indicator)
            } // end skip-loop-body if
        } // end omp for
        
        Py_END_ALLOW_THREADS
        PyGILState_Release(_gilstate);

    } // end omp parallel

    // after parallel section
    if (parallel_error_indicator) {
        PyGILState_STATE _gilstate = PyGILState_Ensure();
        restore exception info
        PyGILState_Release(_gilstate);
        goto parent error label
    }
}

Here ‘loop body’ is the user’s code, which contains a with gil block that may raise an exception. If an exception occurs, the generated code will contain a goto to our error_label. We acquire and release the GIL in the surrounding parallel section rather than in the loop, to avoid doing it for every iteration. Hence we cannot use the concise #pragma omp parallel for anymore.

We also need to save and restore the filename, lineno and C-lineno, and make them private variables in each parallel section, because in the time between raising the exception, setting the shared variables and releasing the GIL, another OpenMP thread may acquire the GIL, raise another exception and overwrite the line information. We simply remember only the first exception that was raised and propagate that outwards.

Alternatively we could remember the last exception instead, initialize our exc_info variables to NULL, do a Py_XDECREF, and leave the line information shared. However, the OpenMP compiler on my OS X machine (gcc 4.2) does not support OpenMP 3.0 (it was released before the 3.0 specification) and seems to be a little buggy: feeding it a trivial 10-line OpenMP program with certain combinations of privatization made gcc itself segfault. With this alternative approach it generates code that often resets the line information (and the filename) to 0 and NULL after the parallel section, which segfaults the program when it tries to build a traceback with a NULL filename (a print shows that the shared variable is assigned to multiple times, and we hold the GIL, so there is no race). Seeing that a later gcc version generates code that can execute such a function 10 million times without any problem, I can only assume that 4.2 is indeed a little buggy and that my generated code was correct. In any case, making the variables private and propagating them explicitly gives us code that works with all compilers, so it seems like the better choice.


Fused Types Syntax

I’m back to continue work on the gsoc! Today I changed the syntax for fused types according to the discussion on the mailing list. You can now write

    cdef fused my_fused_type:
        int
        float
        ...

and


    my_fused_type = cython.fused_type(int, float, ...)

in pure mode. The plan then is to merge the current fused function on top of Vitja's work on the new CyFunction. Until that is merged I'll try to support fused types as part of def functions and not just as part of cdef or cpdef functions.
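
As a quick illustration (the helper names below are my own invention, not part of Cython), such a fused type could then be used like an ordinary type in a function signature, with one specialization generated per constituent type:


    cdef fused my_fused_type:
        int
        float

    cdef my_fused_type twice(my_fused_type value):
        # one specialization is generated per type in my_fused_type
        return value * 2

    def use_it():
        cdef int i = 3
        cdef float f = 1.5
        print twice(i)   # dispatches to the int specialization
        print twice(f)   # dispatches to the float specialization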


Fused Types w/ Runtime Dispatch

If you have a fused function that is exposed to Python, it also needs to be specialized from Python (in Cython space all this functionality already works). There are two ways to do this:

  • indexing
  • directly calling

For indexing from Python space, you can use (Cython) types or strings. So if you have a function in Cython like


    def f(cython.integral x, cython.floating y):
        print x, y

then it can be indexed as follows:


    module.f[cython.long, cython.double]
    module.f[int, float]
    module.f["int, long double"]

Please note that this example is for demonstration only; in practice one could just use the largest types needed here.
All these index operations return a new specialized version of the function.
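
For example (purely illustrative, reusing the hypothetical module and f from above and assuming cython and the compiled module are imported), an indexed specialization can be stored and called like a regular function:


    f_long_double = module.f[cython.long, cython.double]
    f_long_double(10, 2.0)   # prints: 10 2.0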

You can also call the function immediately, like


    module.f(10, 2.0)

and the specialization with the largest matching types will be called, if it can be inferred from the types of the arguments; otherwise a TypeError is raised.
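
For instance (illustrative only), a call whose arguments match none of the fused types would fail:


    module.f(10, 2.0)      # ok: dispatches to the largest matching specialization
    module.f("ten", 2.0)   # "ten" matches no integral type -> TypeError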

In an attempt to support all this, “binding Cython functions” have also been made a bit more like actual Python functions. Vitja has added quite a bit more than just a __dict__, and functions should then also be pickleable.

Because these binding functions can currently only be used for Python (def and cpdef) functions and methods of normal (non-extension) classes, the fused version (which will be a subclass) will bind differently depending on whether it’s in a normal class or in an extension class. In a normal class the methods expect self to be in the args tuple or, depending on the signature, as the second argument to the C function. In an extension class, however, self is expected to be passed in as part of the PyCFunctionObject, through the m_self attribute (i.e., PyMethod_New vs PyDescr_NewMethod). Unfortunately, we cannot simply decide at binding time which to use, because the function needs to remain subscriptable after binding. So we have to implement tp_call: for extension methods we bind self as m_self (so for unbound calls we need to take the tail of the args tuple), and for normal methods we need to type-check args[0] (‘self’) for unbound calls.


Fused Types Syntax

For fused types we recently decided to go for another syntax. So the current syntax is


    ctypedef cython.fused_type(float, double, long double) floating_t

and we want to change it to


    cdef fused floating_t:
        float
        double
        long double

I think this syntax is a lot better: it doesn’t look awkward, it’s easy to add types, and you don’t have to worry about breaking lines.


with gil

The with gil branch brings some neat features. You can now write ‘with gil:’ and put any GIL-requiring code inside that block. Exceptions are propagated through nogil code, and try/finally may now be used in nogil code if there is a with gil block inside the try.
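
As a minimal sketch (the function name and error condition are made up), this is the kind of code that now compiles and behaves as expected:


    cdef int compute(int i) except -1 nogil:
        # try/finally around a with gil block is now allowed in nogil code
        try:
            with gil:
                if i < 0:
                    raise ValueError("negative input")  # propagated through nogil code
        finally:
            pass  # cleanup here still runs without the GIL
        return i * 2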

Of course, if with gil is used in a cdef function that is called from a non-Python thread, it is the user’s responsibility to call PyEval_InitThreads() beforehand, in the same way it’s needed for ‘with gil’ functions.

Whenever the user (c)imports cython.parallel, PyEval_InitThreads() is called on the user’s behalf, as only the OpenMP master thread is allowed to call PyGILState_Ensure(), but we need any OpenMP thread to be able to acquire the GIL.


OpenMP break, continue, return and exceptions

Firstly, my gsoc schedule is a bit unusual, as I will be largely unavailable for heavy coding until the second week of July.

So I’ve been working on OpenMP support, and now it is possible to use break, continue and return in parallel code, by trapping them inside the parallel construct and then propagating them after the parallel construct. Of course, if you’re returning from multiple threads at the same time, it’s unspecified which value will be returned. Also, if you break out of these sections, the index variable has an undefined value afterwards (but it is known to be within the range of the prange parameters).
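
A small sketch of the kind of code this enables (the function and argument names here are mine):


    from cython.parallel import prange

    def find_index(int n, int target):
        cdef int i
        for i in prange(n, nogil=True):
            if i == target:
                break   # trapped inside the parallel block, propagated afterwards
        # after a break, i has an unspecified value within the prange bounds
        return i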

Today Robert merged my with gil branch, so you can now even use GIL-requiring code inside parallel sections, and have exceptions propagate out of them. And of course you can still surround those with gil blocks with a nogil-mode try/finally.
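
And a sketch of the with gil case inside a parallel loop (again, the names and the error condition are made up):


    from cython.parallel import prange

    def check(int n):
        cdef int i
        for i in prange(n, nogil=True):
            if i == 3:              # pretend this iteration hit an error
                with gil:
                    raise ValueError("bad value at index %d" % i)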


Fused Types

So the OpenMP branch is merged, and there’s a pending pull request for some additional functionality, mainly the initialization of thread-private variables to invalid values, like NaN.

I fixed a bug in the withgil branch where warnings issued for division with differing C and Python semantics would segfault the program. This was because the code relied on the global error indicators rather than on the local ones, which needed to be passed in as arguments.

So now I’m going to continue with Fused Types; some things that remain are support for cpdef functions, and perhaps support for fused types in structs and extension classes.
