
OpenMP threading analysis

Colleagues,

I am threading a large piece of code that repeatedly calls a routine (in the present case, hundreds of times). Within that routine I have used an !$omp parallel do construct to thread a doubly nested do loop, with the threading directives surrounding the outer loop. I use a "schedule(dynamic)" clause to balance the work load (says here), and specify the use of 4 cores. I could use some help interpreting VTune performance data. The time required by "_kmp_barrier" is by far the largest block of time. For example, VTune reports:
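For reference, the structure described above looks roughly like the following. This is a simplified sketch, not the actual code: the names work_mod, process_grid, a, n and m are placeholders, the loop body stands in for the real computation, and num_threads(4) is only one assumed way of requesting four threads.

```fortran
! Sketch only: all names below are placeholders, not the actual routine.
module work_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine process_grid(a, n, m)
    integer, intent(in) :: n, m
    real(dp), intent(inout) :: a(n, m)
    integer :: i, j

    ! Directives surround the outer loop only; each thread runs the
    ! inner loop serially for the outer iterations it is handed.
    ! schedule(dynamic) hands out outer iterations one at a time
    ! (default chunk size 1) as threads become free.
    !$omp parallel do schedule(dynamic) num_threads(4) default(shared) private(i, j)
    do j = 1, m
      do i = 1, n
        a(i, j) = a(i, j) + real(i + j, dp)   ! stand-in for the real work
      end do
    end do
    !$omp end parallel do
    ! "end parallel do" carries an implicit barrier: every thread waits
    ! here until the slowest one has finished.
  end subroutine process_grid
end module work_mod
```

The calling code then invokes process_grid (or its real equivalent) hundreds of times, so a parallel region is opened and closed, with its barrier, on every call.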

  1. _kmp_barrier      17.53
  2. _kmp_x86_pause     4.52
  3. my routine         4.05

VTune also reports a histogram showing wall time against the number of cores running simultaneously. As I might expect, the largest time is for 1 core running (the master thread). But the wall times for 2, 3, and 4 cores running simultaneously are nowhere near the same: '2' is 5x larger than '3', and '4' is virtually no time. I would have expected the wall time for 2 and 3 cores running simultaneously to be small, and the wall time for 4 cores running simultaneously to be the 2nd largest, after '1'.

Does this mean the work load is wildly imbalanced?

Why is so much time taken up by the omp barrier?

David

 

