With GPU frameworks like GPUImage or ArrayFire, I always wonder how bus transfers impact overall performance. The best GPU code is useless if most of the time is spent transferring data from the CPU to the GPU and back again. That's why it's typically a bad idea to just "outsource" a particular computation to the GPU "because it's faster". At least that was a big concern back when I did GPGPU (about 5 years ago), which is why I always thought GPGPU libraries were a bad idea.
Can anyone shed some light on this matter? Has anything changed in the last few years? Are bus transfers still a concern? If yes, how does GPUImage handle them?
I would expect a developer to look fully into the bottlenecks in their application before applying something like this. For example, if they often have complex, multi-stage image processing pipelines, then offloading the entire pipeline to the GPU might result in a significant speedup.
In addition (iirc), CPU-GPU buses have become quite a bit faster in the last 5 years. They're still a large bottleneck, yes, but for expensive, highly parallel computations on small pieces of data they don't completely dominate the computation cost.
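To make the pipeline point concrete, here is a rough sketch of a purely additive cost model. The stage and transfer times are hypothetical numbers picked for illustration, not measurements, and `pipeline_time_ms` is a made-up helper rather than part of any framework. It compares copying data to and from the GPU around every stage with copying once around the whole pipeline.

```python
def pipeline_time_ms(stage_ms, transfer_ms, round_trip_per_stage):
    """Total time for a chain of GPU stages under a simple additive model."""
    # One upload plus one download counts as a single round trip.
    round_trips = len(stage_ms) if round_trip_per_stage else 1
    return sum(stage_ms) + round_trips * 2 * transfer_ms


# Hypothetical numbers, chosen only for illustration.
stages = [4.0, 6.0, 10.0]  # per-stage GPU compute time in ms
transfer = 3.0             # one-way bus transfer time in ms

per_stage = pipeline_time_ms(stages, transfer, round_trip_per_stage=True)
whole = pipeline_time_ms(stages, transfer, round_trip_per_stage=False)
print(f"round trip per stage: {per_stage} ms, whole pipeline on GPU: {whole} ms")
```

Under these assumptions the compute time is identical in both cases; only the number of bus crossings changes, which is why keeping intermediate results on the GPU matters more as the pipeline gets longer.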
EDIT:
I've also noticed that this framework uses OpenGL (ES) for its offloading. Given that, the computation could easily be offloaded to an integrated (i.e. non-discrete) GPU that shares memory with the CPU, which removes most of the data movement cost.
In my work it's not a big deal, and modern GPUs can overlap computation with data transfers in both directions. For example, a 5 MP image takes ~3 ms to upload to the GPU, but my processing takes ~20 ms and writing the file takes another ~50 ms. So I'm still I/O bound even if I optimize the GPU DMA.
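As a quick sanity check on those figures, the sketch below works out each step's share of the total. It assumes the three steps run strictly one after another with no overlap, which is a simplification; the dictionary keys are just labels chosen here.

```python
# Timings quoted in the answer above, in milliseconds.
timings_ms = {"upload": 3.0, "gpu_processing": 20.0, "file_write": 50.0}

# Assumption: the steps run serially, so the total is a plain sum.
total = sum(timings_ms.values())
shares = {name: t / total for name, t in timings_ms.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of {total:.0f} ms")

# Upper bound on what a free upload would save.
best_case_saving = timings_ms["upload"] / total
print(f"best-case saving from a zero-cost upload: {best_case_saving:.1%}")
```

With the upload at roughly 4% of the total, even a zero-cost transfer would barely change the end-to-end time, while the file write accounts for more than two thirds of it.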
On iOS and other mobile platforms, at least, the GPU and CPU generally share the same memory, so it is possible to create buffers that are accessible by both (on iOS you would use CVOpenGLESTextureRef or CVMetalTextureRef).