Right, using MOVUPS for any floating-point type, double or single (and for AES instructions too, btw), is OK and recommended; MOVDQU should be used with integer types. MOVUPS is as fast as MOVAPS for _aligned_ data starting with Nehalem (aka Core i7 / Xeon 5500 etc.).
In AVX, there is an interesting and important paradigm change, however: LD+OP instructions (an operation that takes its source operand directly from memory) no longer generate alignment exceptions.
i.e. in SSE:
ADDPS xmm0, [rsp+10] is the equivalent of MOVAPS xmm1, [rsp+10]; ADDPS xmm0, xmm1;
while in AVX:
VADDPS xmm0, xmm0, [rsp+10] <=> VMOVUPS xmm1, [rsp+10]; VADDPS xmm0, xmm0, xmm1;
So, in AVX, to keep exception behavior uniform (more precisely, exception-free) and independent of the compiler's code generation, it is strongly recommended to avoid the VMOVAPS/VMOVDQA instructions and the _mm_load_xx() intrinsics, and to always use the VMOVUPS/VMOVDQU instructions and the _mm_loadu_xx() intrinsics instead. This is neutral for performance and will never surprise you (or your customer) with an exception (crash) if the data passed to the instructions sometimes happens to be misaligned.
Having said that, for the best performance results, please keep aligning your data.
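
To make the recommendation concrete, here is a minimal C sketch using the unaligned load/store intrinsics. The helper name add4_unaligned is made up for illustration, and the example uses the 128-bit SSE intrinsics so it builds without extra compiler flags; the same pattern applies to the 256-bit _mm256_loadu_ps / _mm256_storeu_ps when building for AVX.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper (not from any library): adds four floats from a and b
 * and writes the four sums to out. None of the pointers has to be 16-byte
 * aligned, because only the unaligned load/store intrinsics are used. */
static void add4_unaligned(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);             /* unaligned load (MOVUPS / VMOVUPS) */
    __m128 vb = _mm_loadu_ps(b);             /* unaligned load */
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* unaligned store of a + b */
}
```

For example, if buf is a 16-byte-aligned float array, calling add4_unaligned(buf + 1, buf + 5, out) reads from addresses that are deliberately misaligned by 4 bytes and still works. Swapping in _mm_load_ps there would be the case that can crash, depending on how the compiler emits the load.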