Now that we know something about how the permutation test works on a single voxel, I'd like you to consider the problem of multiple voxels.
In this case, we need to shuffle the observations for each voxel simultaneously, and shuffle them so that the same elements are transposed at each voxel. We need to compute an empirical distribution for the maximal correlation that can arise by chance over the entire image when the null hypothesis holds.
Once we've constructed this null distribution, we identify the voxel whose actual correlation -- that is, the corelation between its unshuffled data and the ideal waveform -- is greatest. We rank this voxel within the null distribution, and, if the resulting tail probability is significant, we define this voxel as activated.
This creates a problem, though, if we want to use the same null distribution to test other voxels. We've tacitly assumed that all the data that have gone into our estimate of the null distribution have come from voxels where the null hypothesis holds. And now we're saying that at one of those voxels, the null hypothesis is rejected. So our null distribution is contaminated by data from an activated voxel. If we want our permutation test to realise the greatest possible statistical power, we must re-compute the null distribution, this time excluding data from the activated voxel. As you can see from the outline, this procedure is very time-consuming: in order to derive each of the M data points in the null distribution, we need to re-shuffle T time points at each of the N voxels in the image -- and we need to re-do all of this work each time another voxel is discovered to be activated. If the number of activated voxels is some fraction of the total number of voxels N, then the entire procedure takes quadratic time.