Hoping to minimize the number of ISA target options I need, I compared AVX and AVX2 performance on Haswell to characterize where differences can be expected.
One of the bigger advantages of AVX2 is support for stride -1 vectorization.
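To make "stride -1" concrete, here is a minimal sketch of the kind of loop I mean. The function name and signature are made up for illustration and are not from any of my test cases. One operand is read backwards while the other is walked forwards, so a vectorized version has to reverse the element order inside each vector register. My understanding is that AVX2 helps here because it adds full-width cross-lane permutes, whereas with plain AVX the reversal takes more shuffle work; whether a given compiler actually vectorizes this loop depends on the compiler and options.

```c
#include <stddef.h>

/* Hypothetical example: a[i] += b[n-1-i].
 * a is traversed with stride +1, b with stride -1.
 * restrict tells the compiler the arrays do not overlap,
 * which it needs to know before it can vectorize. */
void add_reversed(float *restrict a, const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] += b[n - 1 - i];
}
```

Building this once with the AVX target and once with the AVX2 target, then comparing the vectorization reports or the generated assembly, is one way to see where the two differ.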
I have a few cases where AVX is faster, apparently because the AVX code splits 256-bit loads to optimize misaligned access.