The information you're interested in, namely which instructions can be issued in the same cycle via ports 0, 1 and 5, is essentially listed in Table 2-2 of the Intel 64 Architecture Optimization manual: http://developer.intel.com/products/processor/manuals/index.htm
Note that PADD is not limited to a single issue port (there is more than one SIMD ALU), whereas an instruction with a throughput of 1 cycle is generally constrained to one specific issue port. You should also be aware that the dependencies between instruction operands impose further constraints on parallelism. If you're writing micro-kernels for experimentation, keep in mind that the out-of-order engine will try to issue multiple micro-ops per cycle as far as a number of factors permit. It's more practical to verify what actually happened via performance monitoring events than to try to predict it from instruction latency, throughput and port bindings (there are some pointers in Appendix B of the Optimization manual). Those factors (latency/throughput and port bindings) are only part of the conditions the OOO engine needs to achieve instruction-level parallelism.
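As an illustration of the dependency point, here is a minimal sketch of such a micro-kernel. It assumes an x86-64 target with SSE2 and GCC or Clang; the empty `__asm__ volatile` statement is a GCC/Clang extension used only to stop the compiler from folding the loop into a single multiply. The function names `dependent_chain` and `independent_chains` are hypothetical. Both perform the same number of `_mm_add_epi32` (PADDD) operations: the first forms a single dependency chain, so each add must wait for the previous result, while the second spreads the adds over four independent accumulators that the out-of-order engine is free to issue in parallel.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; _mm_add_epi32 compiles to PADDD */

/* One dependency chain: every add consumes the result of the previous one,
   so throughput is bounded by PADDD latency, not by the number of ports. */
__m128i dependent_chain(long iters, __m128i inc) {
    __m128i acc = _mm_setzero_si128();
    for (long i = 0; i < iters; ++i) {
        acc = _mm_add_epi32(acc, inc);
        /* GCC/Clang-specific: emits no instruction, but hides acc's value
           from the optimizer so the loop is not collapsed. */
        __asm__ volatile("" : "+x"(acc));
    }
    return acc;
}

/* Four independent chains doing the same total number of adds.
   Assumes iters is a multiple of 4 (hypothetical simplification). */
__m128i independent_chains(long iters, __m128i inc) {
    assert(iters % 4 == 0);
    __m128i a0 = _mm_setzero_si128(), a1 = a0, a2 = a0, a3 = a0;
    for (long i = 0; i < iters; i += 4) {
        /* No add below depends on another in this iteration, so the
           OOO engine may dispatch them to different SIMD ALU ports. */
        a0 = _mm_add_epi32(a0, inc);
        a1 = _mm_add_epi32(a1, inc);
        a2 = _mm_add_epi32(a2, inc);
        a3 = _mm_add_epi32(a3, inc);
        __asm__ volatile("" : "+x"(a0), "+x"(a1), "+x"(a2), "+x"(a3));
    }
    /* Combine the partial sums once, outside the hot loop. */
    return _mm_add_epi32(_mm_add_epi32(a0, a1), _mm_add_epi32(a2, a3));
}
```

Both functions return the same per-lane sum (iters times the increment, modulo 2^32). Running each with a large iteration count under a tool such as Linux `perf stat` and comparing cycle counts (and, where your CPU exposes them, per-port micro-op events) shows how much parallelism the hardware actually extracted, rather than what the port tables predict. Event names differ between microarchitectures, so check the documentation for your specific CPU.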