Add atomic accesses to that list and you've pretty much covered the gamut of synchronization options. The cost of each is usually measured in how long it takes--how long it holds the lock and so serves as a potential target for contention. Atomics are the lightest weight and are a fundamental construct for building the other mechanisms, but they are limited to controlling non-blocking access to a single memory element. With an atomic read-modify-write cycle you can employ operations such as test-and-set or compare-exchange to build a lock (monitor) or P&V operators (semaphore) that can do more than a single-element update, but these have a bigger footprint and so may cause more contention.
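As a rough sketch of that last point, here's a toy spinlock built from one atomic flag using compare-exchange--the read-modify-write primitive mentioned above. The class and method names are just illustrative, not any real library API:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLockDemo {
    static final AtomicBoolean locked = new AtomicBoolean(false);
    static long counter = 0;  // protected by the lock below

    static void acquire() {
        // compareAndSet is the compare-exchange: atomically flip
        // false -> true; spin until we're the thread that succeeds.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();  // hint to the CPU that we're busy-waiting
        }
    }

    static void release() {
        locked.set(false);  // atomic (volatile) store releases the lock
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                acquire();
                counter++;   // now safe: more than a single-element update
                release();
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter);  // 200000 with the lock in place
    }
}
```

Note the tradeoff the paragraph describes: the one-flag atomic is cheap, but once you wrap a larger region in acquire/release, every other thread that wants in has to spin.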
The "synchronized" and "lock" keywords are the Java and C# mechanisms, respectively, for the locks and monitors described above: constructs built into the language that define the same kinds of structures, so they are in the same vein.
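Here's what that looks like on the Java side--the language gives you the monitor without your writing the lock yourself, since every object carries an intrinsic lock (the C# "lock" statement is the analogous construct). The class below is just an illustration:

```java
public class Account {
    private long balance = 0;

    // The whole method body runs under this object's monitor.
    public synchronized void deposit(long amount) {
        balance += amount;
    }

    public synchronized long getBalance() {
        return balance;
    }

    public static void main(String[] args) throws InterruptedException {
        Account acct = new Account();
        Runnable r = () -> { for (int i = 0; i < 50_000; i++) acct.deposit(1); };
        Thread a = new Thread(r), b = new Thread(r);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(acct.getBalance()); // 100000 -- no lost updates
    }
}
```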
Going to question 4 first, adding constructs to protect objects from unsynchronized accesses adds instructions and overhead to the code. If this was single-threaded code before, executing it after installing thread safety will slow it down by at least the time it takes to run the extra instructions, even without contention from other threads. The performance difference depends on how big the protected section is relative to the added synchronization instructions: a bigger protected section means the cost of those synchronization instructions can be amortized over more of the original instructions, so they have a lesser impact on single-threaded code.
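The amortization point can be sketched with two loops that do identical work but differ in lock granularity--the first pays the lock/unlock cost on every element, the second pays it once for the whole batch. This is purely illustrative; the names and sizes are made up:

```java
public class LockGranularity {
    private static final Object lock = new Object();

    static long sumFineGrained(int[] data) {
        long sum = 0;
        for (int x : data) {
            synchronized (lock) {  // lock/unlock cost paid per element
                sum += x;
            }
        }
        return sum;
    }

    static long sumCoarseGrained(int[] data) {
        synchronized (lock) {      // one lock/unlock amortized over the loop
            long sum = 0;
            for (int x : data) sum += x;
            return sum;
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1_000];
        java.util.Arrays.fill(data, 1);
        System.out.println(sumFineGrained(data));   // 1000
        System.out.println(sumCoarseGrained(data)); // 1000, fewer lock operations
    }
}
```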
Unfortunately, what's good for single-threaded code is not so good for multi-threaded code. That same protected section whose larger size gives you more code over which to amortize the cost of synchronization instructions also means the lock for the protected code is held longer, and so is much more likely to be a point of contention among the threads accessing it.
This additional cost to ensure thread safety may be one reason why there are plenty of Java and .NET classes that are not thread-safe. There is a performance penalty to pay. And it's not just Java and .NET. STL container classes are generally not thread-safe and therefore perform a little better than their thread-safe counterparts in, say, Boost or Intel Threading Building Blocks on single-threaded code. But they blow chunks when you throw multiple threads at them, creating the potential for all kinds of havoc. There's also a discipline associated with designing code for scalability and thread safety, which is a hurdle many programmers have not surmounted yet.
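You can see the same split inside Java's own library: ArrayList, like the STL containers, does no locking of its own, while Collections.synchronizedList wraps it in a thread-safe (and somewhat slower) equivalent. A small sketch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListSafety {
    // A bare ArrayList here could lose updates or throw with two writers;
    // the synchronized wrapper trades some speed for safety.
    static final List<Integer> list =
            Collections.synchronizedList(new ArrayList<>());

    public static void main(String[] args) throws InterruptedException {
        Runnable writer = () -> {
            for (int i = 0; i < 10_000; i++) list.add(i);
        };
        Thread t1 = new Thread(writer), t2 = new Thread(writer);
        t1.start(); t2.start();
        t1.join(); t2.join();

        System.out.println(list.size()); // 20000 with the synchronized wrapper
    }
}
```

Swap in a plain `new ArrayList<>()` and you get exactly the havoc described above: the count comes up short, or the list throws, depending on how the threads interleave.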
There are plenty of places on the web where you can learn more about these details. You might start with this Wikipedia article and go from there.
Sorry about the delayed response. I don't visit this forum as often as I might.
Yes, I've noticed the slant of Microsoft's .NET threading and documentation towards support of asynchronous callbacks and wondered about it myself. Even the nascent thread pool seems canted towards reusable threads for callbacks. There is a Tech Preview, Microsoft Parallel Extensions to the .NET Framework 3.5, which may offer more, but I haven't had time to evaluate it myself. Beyond I/O completion detection in high-performance applications of asynchronous I/O (like InfiniBand and interfaces of its ilk), I haven't seen much application of asynchronous callbacks.