I learnt recently that Sandy Bridge uses a micro-instruction cache similar to the execution trace cache in NetBurst. From what I know, many of the simpler x86 instructions (those that translate into 4 micro-instructions or fewer) are decoded by hardwired decoders without any help from the microcode ROM. That is a fairly fast and simple process involving no table lookup. So why would one cache the decoded micro-instructions and then look them up later, when the decoder circuitry can translate them more easily?
Is the cache used only for the more complex instructions that require a microcode ROM lookup? Or is there some higher wisdom involved?