With GPU Frameworks like GPUImage or Arrayfire.com I always wonder how bus trans...

14113 · on April 18, 2016

I would expect a developer to look fully into the bottlenecks in their application before applying something like this. For example, if they often have complex, multi-stage image processing pipelines, then offloading the entire pipeline to the GPU might result in a significant speedup.

In addition (iirc) CPU-GPU buses have got quite a bit faster in the last 5 years. They're still a large bottleneck, yes, but for expensive, highly parallel computations on small pieces of data they don't completely dominate the computation cost.

EDIT:

I've also noticed that this framework uses OpenGL(ES) for its offloading. Given that, the computation could easily be offloaded to an embedded (i.e. non-discrete) GPU, eliminating the data movement cost.
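
The trade-off described above can be put into a rough cost model: offloading pays off when transfer time plus GPU time is less than the CPU time it replaces. The sketch below is only an illustration. The function name `offload_speedup` and every number in it (stage times, 40 MB moved, 8 GB/s bus) are made-up assumptions, not measurements of GPUImage or any real hardware.

```python
def offload_speedup(cpu_ms, gpu_ms, bytes_moved, bus_gb_per_s):
    """Return CPU time divided by (bus transfer time + GPU time).

    A value above 1.0 means offloading is faster under this simple model,
    which ignores launch overhead, latency, and any overlap of copy and compute.
    """
    transfer_ms = bytes_moved / (bus_gb_per_s * 1e9) * 1e3
    return cpu_ms / (transfer_ms + gpu_ms)


# Hypothetical numbers: 20 MB uploaded plus 20 MB read back over an 8 GB/s bus.
BYTES_MOVED = 40e6
BUS_GB_PER_S = 8.0

# One cheap stage: the transfer eats most of the gain.
single_stage = offload_speedup(cpu_ms=10.0, gpu_ms=1.0,
                               bytes_moved=BYTES_MOVED, bus_gb_per_s=BUS_GB_PER_S)

# Five stages kept on the GPU: same transfer, five times the work.
five_stages = offload_speedup(cpu_ms=50.0, gpu_ms=5.0,
                              bytes_moved=BYTES_MOVED, bus_gb_per_s=BUS_GB_PER_S)

# Shared-memory (embedded) GPU, modelled here as zero bytes moved.
shared_memory = offload_speedup(cpu_ms=10.0, gpu_ms=1.0,
                                bytes_moved=0, bus_gb_per_s=BUS_GB_PER_S)

print(f"single stage:  {single_stage:.2f}x")
print(f"five stages:   {five_stages:.2f}x")
print(f"shared memory: {shared_memory:.2f}x")
```

With these assumed inputs the fixed transfer cost is amortised as more stages stay on the GPU, and it disappears from the model entirely when no copy is needed.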

frozenport · on April 18, 2016

In my work it's not a big deal, and modern GPUs support synchronous computation and bidirectional data transfer. For example, a 5 MP image takes ~3ms to upload to the GPU but my processing takes ~20ms, and writing the file takes another ~50ms. So I'm still IO bound even if I optimize the GPU DMA.
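
To make the arithmetic behind those figures explicit, here is a small sketch that uses only the three approximate timings quoted above. The helper names (`stage_shares`, `serial_frame_ms`, `pipelined_frame_ms`) are hypothetical, and the pipelined case assumes the stages overlap perfectly, which real systems only approximate.

```python
# Approximate per-image timings quoted in the comment, in milliseconds.
stages_ms = {"upload": 3.0, "gpu_processing": 20.0, "file_write": 50.0}


def stage_shares(stages):
    """Fraction of the total serial time spent in each stage."""
    total = sum(stages.values())
    return {name: ms / total for name, ms in stages.items()}


def serial_frame_ms(stages):
    """Time per image when the stages run one after another."""
    return sum(stages.values())


def pipelined_frame_ms(stages):
    """Time per image when stages overlap perfectly: the slowest stage wins."""
    return max(stages.values())


for name, share in stage_shares(stages_ms).items():
    print(f"{name}: {share:.1%} of serial time")

# Pretend the upload became free to see how little it would matter.
no_upload = dict(stages_ms, upload=0.0)
print("serial saving (ms):", serial_frame_ms(stages_ms) - serial_frame_ms(no_upload))
print("pipelined saving (ms):", pipelined_frame_ms(stages_ms) - pipelined_frame_ms(no_upload))
```

Under these numbers the upload is about 4% of the serial time, so removing it completely saves 3 ms out of 73 ms, and saves nothing at all once the stages overlap, because the 50 ms file write sets the pace.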

_xzxj · on April 18, 2016

With iOS and other mobile platforms, at least, the GPU and CPU generally share the same memory, so it is possible to create buffers that are accessible by both (in the case of iOS you would use CVOpenGLESTextureRef or CVMetalTextureRef).