Branch prediction and speculative execution
Most modern processors include sophisticated branch prediction circuitry. This circuitry attempts to guess which branch will be executed (say, the block that follows if versus the block that follows else) and starts performing those calculations in advance. This keeps more of the processor's pipeline busy: before the first result is ready to be committed, several later computations may already be in flight. If the guess turns out to be wrong, that speculative work is discarded and the processor resumes from the correct branch.
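To make the if/else case concrete, here is a minimal sketch of a loop whose branch depends on the data it reads. The function name sum_at_or_above and its parameters are hypothetical, chosen only for illustration. Whether the compiler emits an actual conditional jump here (rather than a branchless or vectorized form) depends on the compiler and its optimization settings.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: adds up only the values that are at or above `threshold`.
std::int64_t sum_at_or_above(const std::vector<int>& values, int threshold) {
    std::int64_t total = 0;
    for (int v : values) {
        // Data-dependent branch: if this compiles to a conditional jump,
        // the predictor must guess its outcome before the comparison resolves.
        if (v >= threshold) {
            total += v;
        }
    }
    return total;
}
```

If the values follow a regular pattern (for example, sorted input), the predictor can usually guess well. If they are effectively random around the threshold, there is little for it to learn from.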
To illustrate the power of speculative execution, we’re going to look at an example of a “clamp loop,” which iterates through a span of data and replaces values larger than 255 with 255. A similar example appears in Chandler Carruth’s excellent CppCon 2017 talk Going Nowhere Faster, where he gives far more detail about what is happening here (https://youtu.be/2EWejmkKlxs?si=diXN9-4gO9X0JfUM...