The main factor is the mismatch between the rate at which one kernel writes into the channel and the rate at which the other reads from it. If the two rates are expected to be similar, a shallow depth of a few entries (fewer than 20) will suffice, and it will not use Block RAMs either. If, however, the rates are expected to be very different, keep increasing the depth and measure the performance at each step until it stops improving.
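When you see a similar amount of stalling on both the read and the write side, the source of the stalls is not the channel but something else. Likewise, increasing the channel depth will not help if you only have stalls on the read side and none on the write side: a read stalls only when the channel is empty, so the writer is the bottleneck and extra capacity cannot fill the channel any faster.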
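
To get a feel for both points before spending time on hardware compiles, here is a rough software sketch in Python. It is not the FPGA toolchain and not cycle-accurate for any real device: it is a toy model of a blocking FIFO between a writer and a reader. The access patterns, the depth list, and the names `simulate`, `sweep`, `BURSTY_WRITER`, `SLOW_WRITER` and `STEADY_READER` are all made up for illustration. The first sweep has a writer and a reader with the same average rate but different burstiness, the second has a writer that is simply slower than the reader.

```python
def simulate(depth, producer, consumer, items):
    """Toy cycle-level model of a blocking FIFO channel (an assumption,
    not a model of any specific device).

    producer / consumer are lists of booleans that repeat cyclically.
    True means "attempt a channel operation on this step", False means
    "do one cycle of other work". A stalled kernel retries the same step.
    Returns (cycles, write_stalls, read_stalls).
    """
    occupancy = written = read = 0
    p_step = c_step = 0
    write_stalls = read_stalls = cycles = 0
    while read < items:
        cycles += 1
        # Both sides see the occupancy from the start of the cycle.
        can_read = occupancy > 0
        can_write = occupancy < depth

        if consumer[c_step % len(consumer)]:
            if can_read:
                occupancy -= 1
                read += 1
                c_step += 1
            else:
                read_stalls += 1      # channel empty: reader waits
        else:
            c_step += 1

        if written < items:
            if producer[p_step % len(producer)]:
                if can_write:
                    occupancy += 1
                    written += 1
                    p_step += 1
                else:
                    write_stalls += 1  # channel full: writer waits
            else:
                p_step += 1
    return cycles, write_stalls, read_stalls


# Hypothetical access patterns, chosen only for illustration.
BURSTY_WRITER = [True] * 8 + [False] * 8    # 8 writes in a row, then 8 idle cycles
SLOW_WRITER = [True, False, False, False]   # one write every 4 cycles
STEADY_READER = [True, False]               # one read every 2 cycles
DEPTHS = (1, 2, 4, 8, 16, 64)


def sweep(producer, consumer, items=800):
    """Run the same workload at every depth in DEPTHS."""
    return {d: simulate(d, producer, consumer, items) for d in DEPTHS}


bursty = sweep(BURSTY_WRITER, STEADY_READER)
slow = sweep(SLOW_WRITER, STEADY_READER)

for name, results in (("bursty writer", bursty), ("slow writer", slow)):
    print(name)
    for d, (cycles, w_stalls, r_stalls) in results.items():
        print(f"  depth={d:3d} cycles={cycles} "
              f"write_stalls={w_stalls} read_stalls={r_stalls}")
```

In the bursty case the cycle count drops as the depth grows and then flattens out, which is the point where you would stop increasing the depth. In the slow-writer case only the read side stalls and every depth gives the same result. On real hardware you would do the same sweep by recompiling with different depths and reading the stall numbers from the profiler.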