This is a very cool idea.

We are doing something similar except we are also predicting the nodes.

In the end, the winning approach will likely combine both. A predicted graph structure would serve as a high-level guide so that long text doesn't lose focus, while everything is still written with full context using something like Compressive Transformers or Expire-Span.