I am working on question answering based on long documents / bundles of document...

I am working on question answering based on long documents / bundles of documents, 100+ pages, and I took a similar approach. I first summarize each page, give it a title and extract a list of subsections. Then I put all the summaries together and I ask the model to provide a hierarchical index. It will organize the whole bundle into a tree. At querying time I combine the path in the tree as additional context.