I think the FFT is often used to actually speed up convolution; it isn't slower than computing the convolution of your kernel shifted to every pixel position.
> [FFT] isn't slower than computing convolution of your kernel shifted to every pixel position.
For a fixed kernel size and sufficiently large images, full-image FFT is slower than naive convolution: the FFT scales as O(n log n) in the image size n, while naive convolution with a fixed kernel scales as O(n). This can be avoided by using blocked FFT instead of full-image FFT:
> If one sequence is much longer than the other, zero-extension of the shorter sequence and fast circular convolution is not the most computationally efficient method available.[13] Instead, decomposing the longer sequence into blocks and convolving each block allows for faster algorithms such as the overlap–save method and overlap–add method.[14] A hybrid convolution method that combines block and FIR algorithms allows for a zero input-output latency that is useful for real-time convolution computations.[15]
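The overlap-add method from the quote can be sketched in a few lines of numpy. This is a minimal illustration, not an optimized implementation (real code would round the FFT size up to a power of two and use a library routine such as scipy.signal.oaconvolve); the block size of 256 is an arbitrary choice:

```python
import numpy as np

def overlap_add_convolve(signal, kernel, block_size=256):
    # Overlap-add: split the long signal into blocks, FFT-convolve each
    # block with the (short) kernel, and sum the overlapping tails.
    m = len(kernel)
    fft_size = block_size + m - 1               # linear-convolution length per block
    kernel_f = np.fft.rfft(kernel, fft_size)    # transform the kernel only once
    out = np.zeros(len(signal) + m - 1)
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        # Zero-padding to fft_size makes the circular convolution linear.
        conv = np.fft.irfft(np.fft.rfft(block, fft_size) * kernel_f, fft_size)
        seg = conv[:len(block) + m - 1]         # valid part (last block may be short)
        out[start:start + len(seg)] += seg      # neighboring tails overlap and add
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(31)
assert np.allclose(overlap_add_convolve(x, h), np.convolve(x, h))
```

Each block only costs an FFT of length block_size + kernel_size - 1, so the total work no longer depends on the log of the full signal length.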
I see; I wasn't thinking about kernels much smaller than the image itself. So, if you don't use blocked FFT, does it only make sense to use FFT convolution when your kernel size is greater than log n?
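Roughly, yes: direct convolution costs about n·k multiply-adds, while full-signal FFT convolution costs about c·n·log2(n) for some constant c covering the three transforms and the pointwise product. A toy cost model makes the crossover concrete; the constant c = 3 here is an assumed, illustrative value, not a measured one:

```python
import math

def direct_cost(n, k):
    # One multiply-add per kernel tap per output sample.
    return n * k

def fft_cost(n, c=3.0):
    # Three length-n transforms plus a pointwise product; c is an
    # assumed constant absorbing implementation details.
    return c * n * math.log2(n)

n = 1 << 20  # ~1M samples, log2(n) = 20
for k in (4, 16, 64, 256):
    print(k, direct_cost(n, k) / fft_cost(n))
```

Under this model the FFT wins once k exceeds roughly c·log2(n) (here 60 taps for a million-sample signal), which matches the "kernel size > log n" intuition up to a constant.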
You may be right, but I think in practice people use direct convolution for small kernels and FFT convolution for large kernels (blocked FFT if the signal is substantially longer than the kernel). I haven't looked into the math behind this intuition, though.
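For the large-kernel case, the full-signal FFT variant is short enough to sketch directly; libraries such as scipy even pick the method for you (scipy.signal.choose_conv_method). A minimal numpy version, checked against direct convolution:

```python
import numpy as np

def fft_convolve(x, h):
    # Full-signal FFT convolution: zero-pad both inputs to the
    # linear-convolution length so circular convolution equals linear.
    n = len(x) + len(h) - 1
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

x = np.random.default_rng(2).standard_normal(4096)
h = np.random.default_rng(3).standard_normal(512)   # kernel large relative to log2(4096)
assert np.allclose(fft_convolve(x, h), np.convolve(x, h))
```

With a 512-tap kernel the direct method does ~512 multiplies per output sample, while the FFT route does a constant-factor multiple of log2 of the padded length, so the FFT side wins easily here.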
https://en.wikipedia.org/wiki/Convolution#Fast_convolution_a...