Parallel C++ thread with OpenMP

OpenMP offers an easy way to parallelize expensive computations in C++. It is a set of compiler directives that parallelize loops across multiple processors and vectorize them for SIMD units, with only small changes to the code. Combined with SWIG and Tcl, it lets the C++ code do the number crunching while Tcl controls the process.

Since Tcl_Obj is not thread-safe, all data that the parallel code touches must be stored in a C struct or, most conveniently, in a C++ object. SWIG can generate the interface, because it creates Tcl "objects" from C++ classes with little to no effort. Here is an example which computes a dot product in parallel:


======
#ifdef SWIG
%module dotpro

%include exception.i
%include typemaps.i
%include "std_vector.i"

namespace std {
    %template(fvec) vector<float>;
}

%{
#include "dotpro.hpp"
%}

#else
#include <vector>
#endif

// here comes the C++ header file 
typedef std::vector<float> fvec;

class dotpro {
        fvec a;
        fvec b;

public:
        // copy the input data into the object; the parallel code only touches these copies
        dotpro(const fvec& a_, const fvec& b_) : a(a_), b(b_) {}

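        double dotproduct() {
                // use the shorter length so that mismatched inputs cannot read out of bounds
                size_t l = a.size() < b.size() ? a.size() : b.size();
                double result = 0;
                // this loop runs in parallel via OpenMP; each thread sums into a
                // private copy of result, and the copies are added up at the end
                #pragma omp parallel for reduction(+:result)
                for (size_t i = 0; i < l; i++) {
                        result += a[i]*b[i];
                }

                return result;
        }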
};


======
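To see what the reduction clause does independently of SWIG and Tcl, the loop can be tried in a plain C++ file. The following is only a minimal sketch: the function names dot_serial and dot_parallel are made up for this illustration and are not part of dotpro.hpp. Compiled without -fopenmp, the pragma is ignored and both functions run serially, so the code builds either way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not in dotpro.hpp): plain serial dot product for reference.
double dot_serial(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());  // both vectors must have the same length
    double result = 0.0;
    for (std::size_t i = 0; i < a.size(); i++) {
        result += static_cast<double>(a[i]) * b[i];
    }
    return result;
}

// Hypothetical helper (not in dotpro.hpp): same loop with an OpenMP reduction.
// Each thread accumulates into its own private copy of result (initialized
// to 0), and OpenMP adds the private copies together after the loop.
double dot_parallel(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    double result = 0.0;
    #pragma omp parallel for reduction(+:result)
    for (std::size_t i = 0; i < n; i++) {
        result += static_cast<double>(a[i]) * b[i];
    }
    return result;
}
```

Without the reduction clause, all threads would update the same result variable at once, which is a data race. Note that the parallel version may add the terms in a different order than the serial one, so for general floating-point input the two results can differ in the last digits.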


Save as dotpro.hpp and compile like this on macOS:


======
swig -c++ -tcl8 dotpro.hpp
clang-omp++ -DUSE_TCL_STUBS -dynamiclib -fopenmp dotpro_wrap.cxx -o dotpro.dylib -ltclstub8.5

======

On Linux:
======
swig -c++ -tcl8 dotpro.hpp
g++ -DUSE_TCL_STUBS -shared -fopenmp dotpro_wrap.cxx -o dotpro.so -ltclstub8.5

======

Then in Tcl (on Linux, load dotpro.so instead of dotpro.dylib):

======
(Programmieren) 49 % load dotpro.dylib
(Programmieren) 50 % dotpro d {1.0 2.0 3.0} {4.0 5.0 6.0}
_80196bfcb97f0000_p_dotpro
(Programmieren) 51 % d dotproduct
32.0

======


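The last call, d dotproduct, then runs in parallel. By default the OpenMP runtime usually starts one thread per available CPU core; the OMP_NUM_THREADS environment variable overrides that.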
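The number of threads can also be queried and set from inside the C++ code through the OpenMP runtime library. The sketch below assumes two helper functions, max_threads and set_threads, which are invented for this example and do not appear in dotpro.hpp; they could be added as methods of the class if control from Tcl is wanted. The _OPENMP guard keeps the code compilable when -fopenmp is not given.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: upper bound on the threads a parallel region would use.
int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;  // compiled without OpenMP: everything runs serially
#endif
}

// Hypothetical helper: request n threads for subsequent parallel regions.
void set_threads(int n) {
    assert(n >= 1);  // OpenMP requires a positive thread count
#ifdef _OPENMP
    omp_set_num_threads(n);
#else
    (void)n;  // nothing to configure in a serial build
#endif
}
```

Calling set_threads(1) before dotproduct is a simple way to compare the parallel timing against a serial run without recompiling.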

The key is to store all data in C structs (the std::vector in this case), because Tcl_Obj is not thread-safe, and to convert it on the way in and out, which SWIG does with little effort.