Why can't these models run on the GPU while using CPU RAM for storage? That way, people with performant-but-memory-starved GPUs could still benefit from the GPU's compute speed while having enough system RAM to hold the model. I know it is possible to provide system-RAM-backed GPU objects.
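To illustrate what I mean by system-RAM-backed GPU objects: CUDA's managed (unified) memory lets a kernel operate on an allocation whose pages can physically reside in host RAM and migrate to the GPU on demand, so the working set can exceed dedicated VRAM. This is just a minimal sketch of the mechanism, not a claim about how any particular model runtime uses it; the kernel and sizes here are made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Toy kernel standing in for "model computation": scale every weight.
__global__ void scale(float *w, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) w[i] *= s;
}

int main() {
    const int n = 1 << 20;
    float *weights;
    // Managed allocation: visible to both CPU and GPU. Pages migrate
    // on demand, so the allocation can be larger than VRAM (backed by
    // system RAM), at the cost of page-migration overhead.
    cudaMallocManaged(&weights, n * sizeof(float));

    for (int i = 0; i < n; ++i) weights[i] = 1.0f;  // CPU writes directly

    scale<<<(n + 255) / 256, 256>>>(weights, n, 0.5f);  // GPU reads/writes
    cudaDeviceSynchronize();

    printf("weights[0] = %f\n", weights[0]);
    cudaFree(weights);
    return 0;
}
```

The catch, as I understand it, is performance: when the model doesn't fit in VRAM, pages are continually faulted back and forth over PCIe, which is far slower than on-device memory bandwidth, so "it runs" and "it runs fast" are different questions.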