If I understand your code correctly, you have a bunch of AVX instructions and a few SSE instructions in the middle that work on 4 elements. If this code is written with intrinsics, there should not be any penalty. The reason is that almost all 4-element float (128-bit SSE) instructions also have an AVX (VEX) encoding. When you compile with AVX enabled, the compiler is smart enough to generate the 128-bit AVX forms even though you are using the old SSE intrinsics. So in the end you have fully AVX code, a mix of 256-bit and 128-bit instructions.
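As a rough illustration of that kind of mixing, here is a minimal sketch in C. The function name `sum8` is made up for this example, and the `target("avx")` attribute is a GCC/Clang extension I am assuming so that the function is built with AVX enabled even without `-mavx` on the command line. The `_mm_*` calls are the old SSE intrinsics, but inside an AVX-enabled function the compiler should emit the VEX-encoded 128-bit forms (e.g. `vaddps xmm`), so no legacy SSE instruction ends up next to the 256-bit code.

```c
#include <immintrin.h>

/* Hypothetical helper: horizontal sum of 8 floats.
 * target("avx") is a GCC/Clang extension (an assumption of this sketch). */
__attribute__((target("avx")))
float sum8(const float *p)
{
    __m256 v  = _mm256_loadu_ps(p);             /* 256-bit AVX load */
    __m128 lo = _mm256_castps256_ps128(v);      /* lower 4 floats (a cast only) */
    __m128 hi = _mm256_extractf128_ps(v, 1);    /* upper 4 floats */

    /* "SSE" intrinsics from here on, working on 4 elements. */
    __m128 s = _mm_add_ps(lo, hi);              /* 4 partial sums */
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));     /* fold upper pair onto lower pair */
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1)); /* add element 1 onto element 0 */
    return _mm_cvtss_f32(s);                    /* element 0 holds the total */
}
```

You can check the generated assembly (for example with `gcc -O2 -S`) to confirm that the 128-bit operations carry the `v` prefix.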
The issue only arises when you have a precompiled SSE library (built without the AVX switch) that gets linked into your AVX code, so that your AVX code ends up jumping into legacy SSE code. Then you can incur a penalty, but you can easily avoid it by issuing vzeroupper before the library call if you are not sure.
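A minimal sketch of the vzeroupper approach, using the `_mm256_zeroupper()` intrinsic. Both function names are hypothetical: `legacy_sse_scale4` is only a stand-in for a routine that would really live in the precompiled SSE library, defined here so the example is self-contained, and `target("avx")` is again an assumed GCC/Clang extension.

```c
#include <immintrin.h>

/* Stand-in for a routine from a precompiled SSE-only library (hypothetical). */
void legacy_sse_scale4(float *p, float k)
{
    _mm_storeu_ps(p, _mm_mul_ps(_mm_loadu_ps(p), _mm_set1_ps(k)));
}

/* Hypothetical caller: 256-bit work first, then a call into the SSE code. */
__attribute__((target("avx")))
void add8_then_scale4(float *dst, const float *a, const float *b, float k)
{
    __m256 r = _mm256_add_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b));
    _mm256_storeu_ps(dst, r);     /* results are in memory from here on */

    _mm256_zeroupper();           /* vzeroupper: clear the upper halves of the ymm registers */
    legacy_sse_scale4(dst, k);    /* scales only the first 4 elements */
}
```

Compilers often insert vzeroupper before such calls on their own when AVX is enabled, so the explicit call may be redundant, but it is cheap and harmless if you are not sure.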