
I intend to use a buffer std::vector<size_t> buffer(100), one per thread, in a parallelized loop, as suggested by this code:

std::vector<size_t> buffer(100);
#pragma omp parallel for private(buffer)
for(size_t j = 0; j < 10000; ++j) {
    // ... code using the buffer ...
}

This code does not work: although there is a buffer for every thread, those buffers have size 0.

How can I allocate the buffer at the beginning of each thread? Can I still use #pragma omp parallel for? And can I do it more elegantly than this:

std::vector<size_t> buffer;
#pragma omp parallel for private(buffer)
for(size_t j = 0; j < 10000; ++j) {
    if(buffer.size() != 100) {
        #pragma omp critical
        buffer.resize(100);
    }
    // ... code using the buffer ...
}
  • I think I see the problem now. The vector is not being properly copy-constructed into the OpenMP region. I'm not sure what the OpenMP standard says about copy-construction of private variables into the threads.
    – Mysticial
    Commented Mar 11, 2013 at 22:28
  • If you want them to be separate, just declare the vector inside the OpenMP region.
    – Mysticial
    Commented Mar 11, 2013 at 22:29
  • Thanks for the clarification. I know how to declare the vector inside the OpenMP region if I parallelize the loop manually. But does this also work with #pragma omp parallel for?
    – Max Flow
    Commented Mar 11, 2013 at 22:33
  • You would have to add a second layer: one with omp parallel, then one with omp for. Declare the vector inside the first level, but outside the for-loop.
    – Mysticial
    Commented Mar 11, 2013 at 22:35
  • That's a solution I was looking for. Thanks!
    – Max Flow
    Commented Mar 11, 2013 at 22:44

2 Answers


The question and the accepted answer have been around for a while; here is some further information that provides additional insight into OpenMP and might therefore be helpful to other users.

In C++, the private and firstprivate clauses handle class objects differently:

From the OpenMP Application Program Interface v3.1:

private: the new list item is initialized, or has an undefined initial value, as if it had been locally declared without an initializer. The order in which any default constructors for different private variables of class type are called is unspecified.

firstprivate: for variables of class type, a copy constructor is invoked to perform the initialization of list variables.

In other words, private calls the default constructor, whereas firstprivate calls the copy constructor of the corresponding class.

The default constructor of std::vector constructs an empty container with no elements; this is why the buffers have size 0.
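To see the difference between the two clauses, here is a minimal sketch that counts how many loop iterations observe a 100-element buffer under each clause. The function names count_sized_private, count_sized_firstprivate and openmp_enabled are made up for this illustration. It assumes the code is compiled with OpenMP enabled (e.g. -fopenmp); without it the pragmas are ignored and both loops simply read the original vector.

```cpp
#include <cstddef>
#include <vector>

// True when compiled with OpenMP support (e.g. -fopenmp). Without it the
// pragmas are ignored and the loops below run serially on the original buffer.
bool openmp_enabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// private(buffer): each thread's copy is default-constructed, i.e. empty.
std::size_t count_sized_private(std::size_t n, std::size_t iterations) {
    std::vector<std::size_t> buffer(n);
    std::size_t hits = 0;
#pragma omp parallel for private(buffer) reduction(+ : hits)
    for (std::size_t j = 0; j < iterations; ++j) {
        if (buffer.size() == n) ++hits;
    }
    return hits;
}

// firstprivate(buffer): each thread's copy is copy-constructed from the original.
std::size_t count_sized_firstprivate(std::size_t n, std::size_t iterations) {
    std::vector<std::size_t> buffer(n);
    std::size_t hits = 0;
#pragma omp parallel for firstprivate(buffer) reduction(+ : hits)
    for (std::size_t j = 0; j < iterations; ++j) {
        if (buffer.size() == n) ++hits;
    }
    return hits;
}
```

Compiled with OpenMP, count_sized_private(100, 10000) should return 0, while count_sized_firstprivate(100, 10000) returns 10000.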

To answer the question, here is another solution that does not require splitting the OpenMP region:

std::vector<size_t> buffer(100, 0);  
#pragma omp parallel for firstprivate(buffer)
for (size_t j = 0; j < 10000; ++j) {
  // use the buffer
}
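As a minimal sketch of how the copied buffer might then be used, the hypothetical function below (sum_with_firstprivate is an invented name, and the workload is an invented stand-in for "use the buffer") writes scratch values into the per-thread copy in every iteration and accumulates them in a reduction. Because each thread writes only its own copy, no critical section is needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical workload: each iteration writes j + k into buffer[k] and adds
// up the buffer. The per-thread copy is only scratch space, so the result
// does not depend on how iterations are distributed over threads.
unsigned long long sum_with_firstprivate(std::size_t iterations, std::size_t buf_size) {
    std::vector<std::size_t> buffer(buf_size, 0);  // copied once into each thread
    unsigned long long total = 0;
#pragma omp parallel for firstprivate(buffer) reduction(+ : total)
    for (std::size_t j = 0; j < iterations; ++j) {
        for (std::size_t k = 0; k < buffer.size(); ++k) {
            buffer[k] = j + k;
        }
        for (std::size_t value : buffer) {
            total += value;
        }
    }
    return total;
}
```

The vector is copied once per thread rather than once per iteration, so allocation cost does not grow with the loop count.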

EDIT: A word of caution regarding private variables in general. The thread stack size is limited and, unless set explicitly via the environment variable OMP_STACKSIZE, compiler dependent. If you use private variables with a large memory footprint, stack overflow may become an issue.
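To make the distinction concrete, the sketch below contrasts two buffer types as private data; the names StackBuffer, HeapBuffer and the helper functions are made up for this example. A std::array stores all its elements inside the object, so a firstprivate copy lives entirely on each thread's stack, whereas a std::vector copy keeps only a small handle there and allocates its elements on the heap. If large inline buffers are unavoidable, the stack of the OpenMP threads can be enlarged at launch, for example OMP_STACKSIZE=64M ./program (where ./program is a placeholder).

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kBufSize = 100;
// Elements stored inline: a private copy occupies kBufSize * sizeof(size_t) bytes of stack.
using StackBuffer = std::array<std::size_t, kBufSize>;
// Elements stored on the heap: a private copy keeps only the vector object on the stack.
using HeapBuffer = std::vector<std::size_t>;

// Stack bytes taken by one private copy of each buffer type.
std::size_t stack_bytes_array() { return sizeof(StackBuffer); }
std::size_t stack_bytes_vector() { return sizeof(HeapBuffer); }

// Scratch-buffer loop with a firstprivate std::array: if kBufSize grows large
// enough, this is where a small thread stack would overflow.
unsigned long long sum_with_stack_buffer(std::size_t iterations) {
    StackBuffer buffer{};  // zero-initialized, copied onto each thread's stack
    unsigned long long total = 0;
#pragma omp parallel for firstprivate(buffer) reduction(+ : total)
    for (std::size_t j = 0; j < iterations; ++j) {
        for (std::size_t k = 0; k < buffer.size(); ++k) buffer[k] = j + k;
        for (std::size_t value : buffer) total += value;
    }
    return total;
}
```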

  • Thanks for this. This is exactly what I suspected regarding objects and private/firstprivate. But regarding your note on the stack size, it's completely irrelevant since std::vector allocates memory on the heap, and the actual vector container only needs to store like 3 pointers on the stack (in most implementations). Commented Feb 5, 2014 at 22:23
  • This answer is better because it reduces the number of copies of buffer from 10000 to the number of threads. Commented May 16, 2015 at 4:03
  • Can I do something like declare an array of vectors up-front, and then have each thread only access the vector in one "slot" of that array? I have done this for simple variables like ints but I'm not feeling confident that it is safe to do this with vectors.
    – Ben Farmer
    Commented Nov 25, 2015 at 17:18
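Regarding the array-of-vectors question in the last comment: one slot per thread works, provided the outer container is fully sized before the parallel region, is never resized inside it, and each thread touches only the slot matching its own thread number. The sketch below assumes that pattern; the helper names max_team_size, thread_index and sum_with_slot_buffers are hypothetical, and the code falls back to a single slot when OpenMP is not enabled.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Upper bound on the team size of the next parallel region (1 without OpenMP).
int max_team_size() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Number of the calling thread within its team (0 without OpenMP).
int thread_index() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

unsigned long long sum_with_slot_buffers(std::size_t iterations, std::size_t buf_size) {
    // All slots are allocated up front; the outer vector is never resized
    // inside the parallel region, so references into it stay valid.
    std::vector<std::vector<std::size_t>> buffers(
        static_cast<std::size_t>(max_team_size()), std::vector<std::size_t>(buf_size));
    unsigned long long total = 0;
#pragma omp parallel for reduction(+ : total)
    for (std::size_t j = 0; j < iterations; ++j) {
        // Each thread only ever touches the slot matching its own thread number.
        std::vector<std::size_t>& buffer = buffers[static_cast<std::size_t>(thread_index())];
        for (std::size_t k = 0; k < buffer.size(); ++k) buffer[k] = j + k;
        for (std::size_t value : buffer) total += value;
    }
    return total;
}
```

Neighbouring slots may end up sharing a cache line (false sharing), which can cost performance but does not affect correctness.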

Split the OpenMP region as shown in this question.

Then declare the vector inside the outer region, but outside the for-loop itself. This creates one local vector for each thread.

#pragma omp parallel
{
    // Declared inside the parallel region: each thread constructs its own buffer.
    std::vector<size_t> buffer(100);

#pragma omp for
    for(size_t j = 0; j < 10000; ++j) {

        // ... code using the buffer ...

    }
}
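A minimal sketch of the same split-region pattern with the loop body filled in. The function name sum_with_region_buffer and the workload (writing j + k into the buffer and summing it) are invented placeholders for "code using the buffer". The buffer is constructed once per thread at the top of the parallel region, and the omp for loop then reuses it for every iteration assigned to that thread.

```cpp
#include <cstddef>
#include <vector>

unsigned long long sum_with_region_buffer(std::size_t iterations, std::size_t buf_size) {
    unsigned long long total = 0;  // shared in the region, reduced by omp for
#pragma omp parallel
    {
        // Declared inside the parallel region: one buffer per thread, built once.
        std::vector<std::size_t> buffer(buf_size);

#pragma omp for reduction(+ : total)
        for (std::size_t j = 0; j < iterations; ++j) {
            for (std::size_t k = 0; k < buffer.size(); ++k) buffer[k] = j + k;
            for (std::size_t value : buffer) total += value;
        }
    }
    return total;
}
```

Compared with firstprivate, this version never copies a master buffer; each thread constructs its own, which also makes it easy to size the buffer per thread if needed.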
