Hi,
I am working with a code which has several statements where an array is assigned some constant value or one array is copied to other. I can see that the sequential version of the copy is taking same time as the parallel version (shown by VTune). (ie, increasing the openMP threads has no effect from 1 to 16 threads).
To reduce the copying overhead mentioned above, I saw that the compiler opt-report is giving the following suggestions for few memset and memcpy instructions -
remark #34014: optimization advice for memcpy: increase the source's alignment to 16 (and use __assume_aligned) to speed up library implementation
remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
I tried doing the above, that is, for some arrays (which are intent(out) to the function), I am using assume_aligned directive just above their first usage, but still the above remark #34014 is shown by the compiler opt-report. Also, I did the above for some other local arrays, but for them also, the above remark #34014 is shown by the compiler, and an additional message is also being shown -
remark #34014: optimization advice for memcpy: increase the source's alignment to 16 (and use __assume_aligned) to allow inline implementation
Any suggestion as to what could be wrong?
Thanks,
Amlesh