OpenMP exception propagation

While implementing OpenMP exception propagation I overlooked something: every OpenMP thread that is not a Python thread allocates and deallocates a new thread state each time it enters and exits a with gil block. This is because the first call to PyGILState_Ensure() allocates the thread state and the last call to PyGILState_Release() deallocates it. Normally Python has already done this for you, but no thread other than the master thread of the outermost parallel section is a Python thread. So if a parallel section contains a with gil block, we need to call PyGILState_Ensure() once at the start of the section to allocate the thread state, and then immediately release the GIL again with PyEval_SaveThread():

    declare exc_info
    declare parallel_error_indicator

    #pragma omp parallel
    {
        /* allocate a threadstate that lives for
           the entire parallel section */
        PyGILState_STATE _gilstate = PyGILState_Ensure();

        /* release the GIL immediately */
        PyThreadState *_save = PyEval_SaveThread();

        #pragma omp for
        for (...) {
            if (!parallel_error_indicator) {

                loop body

                goto no_error;
            error_label:
                {
                    PyGILState_STATE _gilstate = PyGILState_Ensure();
                    save exception info in exc_info
                    set parallel error indicator
                    PyGILState_Release(_gilstate);
                }
            no_error:;
                #pragma omp flush(parallel_error_indicator)
            } // end skip-loop-body if
        } // end omp for

        /* reacquire the GIL, then free the threadstate */
        PyEval_RestoreThread(_save);
        PyGILState_Release(_gilstate);
    } // end omp parallel

    // after parallel section
    if (parallel_error_indicator) {
        PyGILState_STATE _gilstate = PyGILState_Ensure();
        restore exception info
        PyGILState_Release(_gilstate);
        goto parent error label
    }

Here ‘loop body’ is the code from the user containing a with gil block with a possible exception. In case of an exception it will contain a goto to our error_label. We acquire and release the GIL in the surrounding parallel section to avoid doing it for every iteration. Hence we cannot use the concise #pragma omp parallel for anymore.
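To illustrate why the parallel and for directives are split, here is a minimal sketch in plain C with no Python involved. The counter `setups` is a hypothetical stand-in for the PyGILState_Ensure() / PyEval_SaveThread() pair, and `run_region` and `region_stats` are names made up for this example. The point is only that code placed between the two directives runs once per thread rather than once per iteration.

```c
#include <assert.h>

/* Hypothetical bookkeeping for this sketch only. */
typedef struct {
    int setups;   /* how often the per-thread setup ran */
    long total;   /* result computed by the loop */
} region_stats;

static region_stats run_region(int n)
{
    region_stats stats = {0, 0};
    int setups = 0;
    long total = 0;
    int i;

    assert(n >= 0);

    #pragma omp parallel
    {
        /* Runs once per thread. In the generated code this is where
           the threadstate would be allocated and the GIL released. */
        #pragma omp atomic
        setups += 1;

        /* The iterations are then divided among the existing threads. */
        #pragma omp for reduction(+:total)
        for (i = 0; i < n; i++) {
            total += i;
        }
    }

    stats.setups = setups;
    stats.total = total;
    return stats;
}
```

Calling `run_region(1000)` should report one setup per thread in the team (a single setup if the compiler ignores the pragmas), however many iterations the loop has. With the combined `#pragma omp parallel for` there is no place to put that per-thread code.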

We also need to save and restore the filename, lineno and C-lineno, and make them private variables in each parallel section. Otherwise, between the moment one thread raises an exception and the moment it stores the information in the shared variables, another OpenMP thread may acquire the GIL, raise its own exception and overwrite the line information. We simply remember only the first exception that was raised and propagate that outwards.
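The first-exception-wins scheme with private position information can be sketched in plain C as follows. This is an illustration under assumptions, not the code Cython generates: `error_pos`, `body` and `run_loop` are hypothetical names, the loop body "raises" on positive multiples of 7, and an OpenMP critical section plays the role of the GIL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical position record: filename and line of the failure. */
typedef struct {
    const char *filename;
    int lineno;
} error_pos;

/* Hypothetical loop body: fails on positive multiples of 7 and
   records where, using the iteration index as the "line". */
static int body(int i, error_pos *pos)
{
    if (i > 0 && i % 7 == 0) {
        pos->filename = "example.pyx";
        pos->lineno = i;
        return -1;
    }
    return 0;
}

/* Returns -1 and fills *out with the first recorded failure, else 0. */
static int run_loop(int n, error_pos *out)
{
    int error_indicator = 0;            /* shared flag */
    error_pos shared_pos = {NULL, 0};   /* shared, written at most once */
    int i;

    assert(out != NULL);

    #pragma omp parallel
    {
        /* Declared inside the region, so each thread has its own copy. */
        error_pos local_pos = {NULL, 0};

        #pragma omp for
        for (i = 0; i < n; i++) {
            #pragma omp flush(error_indicator)
            if (!error_indicator) {
                if (body(i, &local_pos) != 0) {
                    /* Stand-in for holding the GIL. */
                    #pragma omp critical
                    {
                        if (!error_indicator) {
                            /* Only the first failure is kept. */
                            shared_pos = local_pos;
                            error_indicator = 1;
                        }
                    }
                }
            }
        }
    }

    if (error_indicator) {
        *out = shared_pos;
        return -1;
    }
    return 0;
}
```

Because each thread fills in its own `local_pos` and copies it to the shared record only while holding the lock, the filename and line number that reach the caller always belong to the same failure. Which failing iteration wins depends on thread scheduling.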

Alternatively we could remember the last exception: initialize our exc_info variables to NULL, do a Py_XDECREF on them, and leave the line information shared. However, the OpenMP compiler on my OSX (gcc 4.2) does not support OpenMP v3.0 (it was released before the 3.0 specification) and seems to be a little buggy: feeding it a trivial 10-line OpenMP program with certain combinations of privatization made gcc itself segfault. With this alternative approach it seems to generate code that often resets the line information and the filename to 0 and NULL after the parallel section, so the program segfaults when it tries to build a traceback with a NULL filename. A print shows that the shared variable is assigned multiple times, and since we hold the GIL there is no race. Seeing that a later gcc version generates code that can execute such a function 10 million times without any problem, I can only assume that 4.2 is indeed a little buggy and that my generated code was correct. However, by making the variables private and propagating them explicitly we get code that works with all compilers, so it seems like the better choice.


About markflorisson

Blog for the Cython gsoc 2011