HF also wrote a blog post on how you can mess around with the model in a Python notebook using their excellent Diffusers library: https://huggingface.co/blog/if
I knew the model would have trouble fitting into a GPU with 16 GB of VRAM, but "you need to load and unload parts of the model pipeline to/from the GPU" is not a workaround I expected.
At that point it's probably better to write a guide on how to easily set up a VM with an A100 instead of trying to squeeze the model into a Colab GPU.
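For what it's worth, Diffusers does expose this offloading via methods like `enable_model_cpu_offload()`, and the underlying idea is simple: only keep one stage of the pipeline on the GPU at a time, so peak memory is bounded by the largest single stage rather than the sum of all of them. A toy sketch (stage names, sizes, and the `run_staged_pipeline` helper are all made up for illustration, not the real Diffusers API):

```python
# Toy model of staged CPU offloading: each pipeline stage is "loaded"
# onto the device, run, then freed before the next stage starts.
# Peak device memory is therefore max(stage sizes), not their sum.
def run_staged_pipeline(stages, x):
    loaded = 0  # memory currently "on device"
    peak = 0    # worst-case memory seen at any point
    for name, size, fn in stages:
        loaded += size              # load this stage's weights
        peak = max(peak, loaded)
        x = fn(x)                   # run the stage
        loaded -= size              # unload before the next stage
    return x, peak

# Hypothetical stages with made-up "GB" sizes, loosely mirroring a
# text encoder -> diffusion UNet -> upscaler pipeline.
stages = [
    ("text_encoder", 8, lambda v: v + 1),
    ("unet",         5, lambda v: v * 2),
    ("upscaler",     3, lambda v: v - 1),
]
out, peak = run_staged_pipeline(stages, 1)
# peak is 8 (the largest stage), not 16 (the total)
```

The trade-off is obvious from the sketch: you pay for the repeated host-to-device transfers in wall-clock time, which is exactly why it feels more like a workaround than a solution.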