Your question is not clear. Are you asking how x0 (an __m128i variable) gets loaded with the 4 consecutive elements x00, x01, x02, x03?
The answer also depends on the data types of x00, x01, x02 and x03.
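If the four elements already fill 128 bits (e.g. 32-bit ints), you can use a simple load instruction (SSE2):
_mm_load_si128((const __m128i *)data), which requires a 16-byte-aligned address, or _mm_loadu_si128(), which does not.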
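
As a rough sketch of the two SSE2 loads, assuming the elements are 32-bit ints on an x86 target (the helper names load4_aligned, load4_unaligned and sum_lanes are made up for illustration):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: load four 32-bit elements.
   The address must be 16-byte aligned, otherwise the load may fault. */
static inline __m128i load4_aligned(const int32_t *p)
{
    return _mm_load_si128((const __m128i *)p);
}

/* Hypothetical helper: same load, but with no alignment requirement. */
static inline __m128i load4_unaligned(const int32_t *p)
{
    return _mm_loadu_si128((const __m128i *)p);
}

/* Hypothetical helper: add up the four 32-bit lanes so the
   loaded contents can be checked from scalar code. */
static inline int32_t sum_lanes(__m128i v)
{
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

If you cannot guarantee the alignment of the pointer, the unaligned version is probably the safer default.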
E.g. if they are char (8 bits each) and the data is packed, you can also use the SSE4.1 zero-extension instructions (PMOVZX):
__m128i x0 = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(*Input)); where Input is a pointer to a 32-bit integer that holds the 4 packed elements.
Similarly, if each element is a short (16 bits), use _mm_cvtepu16_epi32(). For 32-bit ints there is _mm_cvtepu32_epi64(), which widens only two elements, since two 64-bit lanes already fill the register.
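
A minimal sketch of the widening loads, assuming unsigned elements, an x86 CPU with SSE4.1, and GCC or Clang (the target attribute stands in for compiling with -msse4.1). The names widen_u8x4, widen_u16x4 and TARGET_SSE41 are made up for illustration:

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics (PMOVZX family) */
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang. Enables SSE4.1 for the function it is applied to,
   so the file also builds without -msse4.1. */
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

/* Hypothetical helper: zero-extend four packed 8-bit values to four 32-bit lanes. */
TARGET_SSE41
static inline void widen_u8x4(const uint8_t *src, uint32_t out[4])
{
    int32_t packed;
    memcpy(&packed, src, sizeof packed);   /* read exactly 4 bytes */
    __m128i v = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(packed));
    _mm_storeu_si128((__m128i *)out, v);
}

/* Hypothetical helper: zero-extend four packed 16-bit values to four 32-bit lanes. */
TARGET_SSE41
static inline void widen_u16x4(const uint16_t *src, uint32_t out[4])
{
    /* _mm_loadl_epi64 reads exactly 8 bytes into the low half of the register. */
    __m128i v = _mm_cvtepu16_epi32(_mm_loadl_epi64((const __m128i *)src));
    _mm_storeu_si128((__m128i *)out, v);
}
```

For signed data the sign-extending counterparts (_mm_cvtepi8_epi32, _mm_cvtepi16_epi32) would be used in the same way.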