You are writing convoluted code and hoping the compiler will recognize it and convert it internally to the form I posted. Sometimes it does, sometimes it doesn't. In this case it generates reasonable code but, for some reason, doesn't vectorize it. WTF.
I prefer to just add an alignment specification and move on, assuming I don't care about portability. If portability matters, reread my original post ;)
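For illustration, here is a minimal sketch of what "just add an alignment specification" can look like with GCC or Clang. It assumes the loop in question is an element-wise `float` addition (the thread doesn't show the actual code), and `add_aligned` is a hypothetical name. `__builtin_assume_aligned` is a GCC/Clang extension, which is exactly the portability trade-off being discussed; `_Alignas` is standard C11.

```c
#include <stddef.h>

/* Hypothetical helper: a[i] += b[i] for i in [0, n).
 * __builtin_assume_aligned (GCC/Clang extension) promises the compiler
 * that both pointers are 32-byte aligned, so the vectorizer can use
 * aligned loads and skip the alignment-peeling prologue.
 * Passing a pointer that is NOT 32-byte aligned is undefined behavior. */
static void add_aligned(float *restrict a, const float *restrict b, size_t n)
{
    float *pa = __builtin_assume_aligned(a, 32);
    const float *pb = __builtin_assume_aligned(b, 32);
    for (size_t i = 0; i < n; i++)
        pa[i] += pb[i];
}
```

The caller has to keep the promise, e.g. by declaring the buffers as `_Alignas(32) float x[8];`.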
It's not convoluted. It's actually clear and well-defined, which makes it easier to reason about.
I'd call compiler-specific alignment attributes more arcane, more convoluted, and more susceptible to future bugs.
Vectorization isn't a panacea. You need to benchmark to be sure; lacking that, I expect GCC to optimize code better than you can. If you disagree, please write a manually vectorized version that handles unaligned addition and post your results :)
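For reference, a hand-vectorized addition that tolerates unaligned pointers might look roughly like this minimal sketch. It assumes GCC or Clang (it uses their `vector_size` extension rather than x86 intrinsics, so it isn't tied to one ISA), `float` data, and non-overlapping arrays; `add_unaligned` is a hypothetical name. The `memcpy` calls are the standard way to express unaligned loads and stores, and compilers typically lower them to unaligned vector moves.

```c
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: 4 floats packed into 16 bytes. */
typedef float v4f __attribute__((vector_size(16)));

/* Hypothetical helper: a[i] += b[i], with no alignment requirement.
 * Assumes a and b do not overlap (hence restrict). */
static void add_unaligned(float *restrict a, const float *restrict b, size_t n)
{
    size_t i = 0;
    /* Main loop: 4 lanes at a time via unaligned load/add/store. */
    for (; i + 4 <= n; i += 4) {
        v4f va, vb;
        memcpy(&va, a + i, sizeof va);   /* unaligned load */
        memcpy(&vb, b + i, sizeof vb);   /* unaligned load */
        va += vb;                        /* lane-wise add */
        memcpy(a + i, &va, sizeof va);   /* unaligned store */
    }
    /* Scalar tail for the remaining 0-3 elements. */
    for (; i < n; i++)
        a[i] += b[i];
}
```

Whether this actually beats GCC's output for the plain scalar loop is exactly what a benchmark would have to show.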