PyTorch themselves used nanoGPT training as a demo for this: https://pytorch.org/blog/accelerating-large-language-models/