Recently I did some work to determine where latency was being added for a particular use case. Luckily, I was able to build my test case by subclassing every part I wanted to instrument. The instrumentation was simple: each subclass placed timestamps into a hash map, keyed by the name of the location in the code. Obviously this is not the best way of doing it, because you don't always have the opportunity to subclass the most important locations. More flexible solutions would be:
1) JInspired's JXInsight
2) An aspect-oriented library, or standard Java instrumentation, to instrument every method
3) DTrace with Java
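
For reference, the subclassing approach I started with can be sketched roughly as follows. This is a minimal illustration, not the code I actually used: `MessageHandler`, `TimedMessageHandler`, `mark` and the location names are hypothetical stand-ins, and it assumes everything runs on a single thread (a plain map is not safe to share across threads).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LatencyProbe {
    // Timestamps in nanoseconds, keyed by the name of the code location.
    // Assumption: single-threaded test; this map is not thread-safe.
    static final Map<String, Long> TIMESTAMPS = new LinkedHashMap<>();

    // Record the current time under the given location name.
    static void mark(String location) {
        TIMESTAMPS.put(location, System.nanoTime());
    }

    // Hypothetical component standing in for the code being measured.
    static class MessageHandler {
        String handle(String message) {
            return message.toUpperCase();
        }
    }

    // Test-only subclass: same behaviour, plus a timestamp on entry and exit.
    static class TimedMessageHandler extends MessageHandler {
        @Override
        String handle(String message) {
            mark("handle.enter");
            try {
                return super.handle(message);
            } finally {
                mark("handle.exit");
            }
        }
    }

    // Elapsed nanoseconds between two recorded locations.
    static long elapsedNanos(String from, String to) {
        return TIMESTAMPS.get(to) - TIMESTAMPS.get(from);
    }

    public static void main(String[] args) {
        MessageHandler handler = new TimedMessageHandler();
        String result = handler.handle("ping");
        System.out.println(result + " took "
                + elapsedNanos("handle.enter", "handle.exit") + " ns");
    }
}
```

The limitation is visible in the sketch: it only works because `handle` can be overridden and because the test gets to choose which instance is created. A final class, a private method, or an object constructed deep inside a library cannot be timed this way, which is where the options listed above come in.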