If you aren't willing to say anything about how you defined your class, it will be difficult to help you. If you want to compile with packed attributes, or to make the layout interoperable among various compilers, you will likely need to place all member objects in descending order of alignment requirement (all 16-byte-aligned objects ahead of 8-byte-aligned ones, and so on). Alternatively, you could ensure that each object occupies a multiple of 16 bytes, but don't rely on the compiler and linker to supply the padding.
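As a rough illustration of the ordering advice above, here is a minimal sketch. Since the original class was never posted, every name in it (Vec16, Ordered, ALIGNED16, is_aligned16, check_ordered_layout) is hypothetical, and Vec16 is only a stand-in for __m128i so the sketch does not depend on the SSE headers. The __attribute__ branch is an assumed GCC/Clang equivalent of the MSVC attribute.

```cpp
#include <cassert>
#include <cstddef>   // offsetof
#include <cstdint>   // std::int32_t, std::uintptr_t

// Assumption: MSVC spelling on Windows, GCC/Clang spelling elsewhere.
#if defined(_MSC_VER)
#  define ALIGNED16 __declspec(align(16))
#else
#  define ALIGNED16 __attribute__((aligned(16)))
#endif

// Hypothetical 16-byte-aligned stand-in for __m128i.
struct ALIGNED16 Vec16 { std::int32_t lane[4]; };

// Members in descending order of alignment requirement: 16, 8, 4, 1.
// The tail is padded by hand so the size is a multiple of 16 without
// relying on the compiler to insert padding.
struct Ordered {
    Vec16        v;       // offset 0, 16 bytes
    double       d;       // offset 16, 8 bytes
    std::int32_t i;       // offset 24, 4 bytes
    char         c;       // offset 28, 1 byte
    char         pad[3];  // explicit padding up to 32 bytes
};

static_assert(offsetof(Ordered, d) == 16, "d follows the 16-byte member");
static_assert(offsetof(Ordered, i) == 24, "i follows d with no gap");
static_assert(offsetof(Ordered, c) == 28, "c follows i with no gap");
static_assert(sizeof(Ordered) % 16 == 0, "size is a multiple of 16");

// True when the address is a multiple of 16.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// A non-heap instance gets its alignment from the type itself.
void check_ordered_layout() {
    Ordered o = Ordered();
    assert(is_aligned16(&o.v));
}
```

With this ordering there are no gaps between members, so a packed build and a default build should agree on the offsets.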
Yelling at other people won't encourage them to help you.
If a local __m128i variable happens to be aligned to 16 bytes, it is most likely so by accident. To be sure that a local or global variable is always aligned on the Windows platform, you need to declare it with the __declspec(align(n)) attribute, where n is the desired alignment.
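A minimal sketch of that attribute on a global and on a local variable follows. The ALIGN16 macro, g_table, is_aligned16 and local_buffer_is_aligned are made-up names for illustration, and the non-MSVC branch is an assumption added only so the sketch also builds with GCC or Clang.

```cpp
#include <cassert>
#include <cstdint>   // std::uintptr_t

// Assumption: __declspec(align(16)) on MSVC, the GCC/Clang attribute elsewhere.
#if defined(_MSC_VER)
#  define ALIGN16 __declspec(align(16))
#else
#  define ALIGN16 __attribute__((aligned(16)))
#endif

// Global variable with an explicit 16-byte alignment request.
ALIGN16 float g_table[4] = {0.0f, 1.0f, 2.0f, 3.0f};

// True when the address is a multiple of 16.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Local (stack) variable with the same request.
bool local_buffer_is_aligned() {
    ALIGN16 float local[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    return is_aligned16(local);
}

// Both checks should hold on every call, not just by luck.
void check_aligned_variables() {
    assert(is_aligned16(g_table));
    assert(local_buffer_is_aligned());
}
```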
Class members cannot be reliably aligned with __declspec(align(n)), because class objects can be allocated dynamically via the new operator. It is the new operator that returns memory aligned only to 8 bytes, so the object, and every member inside it, ends up unaligned.
The only solution is to overload the new and delete operators with your own code, which allocates and returns aligned memory for the class object.
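Something along these lines, as a sketch only. Particle, aligned_alloc16, aligned_free16 and heap_particle_is_aligned are hypothetical names, the lanes array is a stand-in for a __m128i member, and the posix_memalign branch is an assumption for non-Windows compilers. On Windows the sketch uses _aligned_malloc and _aligned_free from <malloc.h>. The member keeps its alignment attribute so the class layout is right; the overloaded operators make the heap block match it.

```cpp
#include <cassert>
#include <cstddef>    // std::size_t
#include <cstdint>    // std::int32_t, std::uintptr_t
#include <new>        // std::bad_alloc
#include <stdlib.h>   // posix_memalign, free (non-Windows branch)
#if defined(_MSC_VER)
#  include <malloc.h> // _aligned_malloc, _aligned_free
#  define ALIGN16 __declspec(align(16))
#else
#  define ALIGN16 __attribute__((aligned(16)))
#endif

// Returns a 16-byte-aligned block, or a null pointer on failure.
inline void* aligned_alloc16(std::size_t size) {
#if defined(_MSC_VER)
    return _aligned_malloc(size, 16);
#else
    void* p = 0;
    return posix_memalign(&p, 16, size) == 0 ? p : 0;
#endif
}

// Releases a block obtained from aligned_alloc16.
inline void aligned_free16(void* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    free(p);
#endif
}

class Particle {
public:
    // Class-specific new/delete: every heap Particle is 16-byte aligned.
    static void* operator new(std::size_t size) {
        void* p = aligned_alloc16(size);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) { aligned_free16(p); }

    // The array forms need the same treatment for new Particle[n].
    static void* operator new[](std::size_t size) {
        void* p = aligned_alloc16(size);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete[](void* p) { aligned_free16(p); }

    ALIGN16 std::int32_t lanes[4];  // stand-in for a __m128i member
    int id;
};

// Allocates one Particle on the heap and reports whether its
// 16-byte member landed on a 16-byte boundary.
bool heap_particle_is_aligned() {
    Particle* p = new Particle;
    bool ok = reinterpret_cast<std::uintptr_t>(&p->lanes) % 16 == 0;
    delete p;
    return ok;
}
```

Note that this only covers objects created directly with new. If a Particle is embedded in another heap-allocated class, that outer class needs the same overloads.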
@Intel: This question gets asked every few months on the forum (I already answered it once here). Someone should put the answer in some FAQ or sticky the thread.