Channel: Intel® Fortran Compiler
Fast memcpy and memset

Hi,

 

I am working with code that contains several statements where an array is assigned a constant value or one array is copied to another. VTune shows that the sequential version of the copy takes the same time as the parallel version; that is, increasing the number of OpenMP threads from 1 to 16 has no effect.

To reduce this copying overhead, I looked at the compiler opt-report, which gives the following suggestions for a few memset and memcpy calls:

remark #34014: optimization advice for memcpy: increase the source's alignment to 16 (and use __assume_aligned) to speed up library implementation

remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation

 

I tried this. For some arrays that are intent(out) arguments of the function, I placed an assume_aligned directive just above their first use, but the opt-report still shows remark #34014. I did the same for some other local arrays; for them the remark is also still shown, together with an additional message:

remark #34014: optimization advice for memcpy: increase the source's alignment to 16 (and use __assume_aligned) to allow inline implementation
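
For concreteness, the pattern described above looks roughly like the following sketch. The module, routine, and argument names are hypothetical and are not taken from the actual code. The `!DIR$ ASSUME_ALIGNED` line is an Intel-specific directive that other compilers treat as a comment, and whether the whole-array assignments are really lowered to memset/memcpy library calls depends on the compiler version and options.

```fortran
module copy_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical routine: zero-fill one array and copy another.
  subroutine fill_and_copy(n, src, dst, work)
    integer,  intent(in)  :: n
    real(dp), intent(in)  :: src(n)
    real(dp), intent(out) :: dst(n)
    real(dp), intent(out) :: work(n)
    ! Assertion to the compiler that the arrays start on 16-byte
    ! boundaries. This does not align anything by itself: the caller
    ! must pass storage that really is aligned.
    !DIR$ ASSUME_ALIGNED src:16, dst:16, work:16
    work = 0.0_dp   ! constant fill, a candidate for a memset-style call
    dst  = src      ! whole-array copy, a candidate for a memcpy-style call
  end subroutine fill_and_copy
end module copy_demo
```

Because the directive is only a promise, the actual arguments on the calling side also have to be allocated with that alignment, for example (assuming Intel Fortran) with `!DIR$ ATTRIBUTES ALIGN: 16 :: a, b` on the declarations or a compiler option such as `-align array16byte`.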

 

Any suggestions as to what could be wrong?

 

Thanks,

Amlesh

