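What if you put the array in a struct and made it a union of a uint32_t and the uint8_t array? A union takes the strictest alignment of its members, so wouldn't the uint32_t member force the compiler to align the byte array to 4 bytes as well (on platforms where uint32_t is 4-byte aligned)?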
I suggest this because it would be portable and wouldn't need any compiler-specific extensions.
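
Here is a minimal sketch of what I mean, assuming a C11 compiler. The names `aligned_buf_t`, `BUF_LEN`, `buf`, and `bytes_are_word_aligned` are made up for illustration. The static asserts only check the idea at compile time; the actual alignment of `uint32_t` is up to the platform (4 bytes is typical, not guaranteed).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 16 /* hypothetical buffer size, a multiple of 4 */

/* A union is aligned at least as strictly as its most strictly
 * aligned member, so the byte array inherits uint32_t's alignment. */
typedef union {
    uint32_t words[BUF_LEN / 4]; /* only here to raise the alignment */
    uint8_t  bytes[BUF_LEN];     /* the array you actually use */
} aligned_buf_t;

/* Compile-time checks of the assumption above. */
_Static_assert(alignof(aligned_buf_t) >= alignof(uint32_t),
               "union must be at least as aligned as uint32_t");
_Static_assert(offsetof(aligned_buf_t, bytes) == 0,
               "byte array must start at the beginning of the union");

static aligned_buf_t buf;

/* Returns 1 if the byte array starts at an address that is a
 * multiple of uint32_t's alignment, 0 otherwise. */
static int bytes_are_word_aligned(const aligned_buf_t *b)
{
    return ((uintptr_t)b->bytes % alignof(uint32_t)) == 0;
}

/* Run-time sanity check for the static buffer. */
static void check_buf(void)
{
    assert(bytes_are_word_aligned(&buf));
}
```

You would then pass `buf.bytes` wherever the byte array is needed. Note that this only guarantees the alignment; reading through `words` what was written through `bytes` is a separate question.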