I would like to test the network latency/bandwidth of each node that I am running on in parallel. I think the simplest way to do this would be to have each node test itself.
Use two systems...
Barring that, see if you can use a router (not a switch) that maps one IP address to another. This requires two Ethernet ports on the host (not unusual for most server boards), each with a different IP address. In a real configuration you will likely have a switch or router between the nodes anyway; in other words, the router's overhead in the test will be similar to the overhead of a real connection.
As Jim mentioned, you need at least 2 nodes to measure performance between a pair of nodes.
I_MPI_FABRICS lets you specify the transports for intra-node and inter-node communication separately, so you can set I_MPI_FABRICS=tcp:tcp to use sockets for all communication (intra-node traffic will then go over the loopback interface instead of the optimized shared-memory transport).
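For example, a run forcing the socket transport everywhere might look like this (the binary name and process count are placeholders for your own setup):

```shell
# Force sockets for both intra-node and inter-node traffic.
# Intra-node messages will go over loopback rather than shared memory.
export I_MPI_FABRICS=tcp:tcp
mpirun -n 2 ./your_mpi_app
```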
In general, to measure latency/bandwidth you can use existing benchmarks, for example IMB (Intel MPI Benchmarks).
For example, using "IMB-MPI1 uniband" you can create multiple pairs of processes communicating in parallel. This benchmark measures unidirectional bandwidth between 2 nodes, and with a correct setup you should see bandwidth numbers close to the theoretical limit.
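As a rough sanity check on uniband output, you can compare the measured bandwidth against the link's theoretical limit. A small sketch (the 100 Gb/s link speed and 11000 MB/s measurement below are made-up illustration values, not from the thread):

```python
def link_limit_mbytes_per_sec(gbits_per_sec: float) -> float:
    """Theoretical payload ceiling of a link, ignoring protocol overhead."""
    return gbits_per_sec * 1000 / 8  # 1 Gb/s = 125 MB/s

def efficiency(measured_mbytes_per_sec: float, gbits_per_sec: float) -> float:
    """Fraction of the theoretical limit actually achieved."""
    return measured_mbytes_per_sec / link_limit_mbytes_per_sec(gbits_per_sec)

# Example: a 100 Gb/s link tops out at 12500 MB/s, so a measured
# 11000 MB/s is 88% of the theoretical limit.
print(link_limit_mbytes_per_sec(100))    # 12500.0
print(round(efficiency(11000, 100), 2))  # 0.88
```

In practice numbers somewhat below the ceiling are normal because of protocol and software overhead; numbers far below it usually point to a misconfigured fabric.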
"IMB-MPI1 pingpong" can be used to measure latency.
Also, to get a broader picture of the cluster you can use mpiGraph (https://sourceforge.net/projects/mpigraph/).