This is easier to answer if what matters to you is established, successful practice in your application area. Of the models you mention, I think only MPI covers distributed memory (cluster computing), and yes, it is useful on shared-memory multi-socket platforms too. Combinations of OpenMP and MPI have proven themselves for data-parallel applications over a number of years, and have been extended successfully to newer architectures such as CUDA and MIC, and to some extent to tasking. Several of the models you mentioned work only with C++, do not interoperate with other parallel programming languages, and are too new for anyone to answer all your questions about them. In that group, only TBB is reasonably well established; a web search will give you an idea of what has been done with it successfully. The amount of continuing work and change in this area suggests that many people believe superior models are within reach.