With embeddings, you essentially can. Split the book into sections, embed each section, and then when you build a prompt, prepend the N sections whose embeddings are most similar to the embedding of the question.
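The retrieval step above can be sketched in a few lines. This is a toy, offline version: the bag-of-words `embed` is a stand-in for a real learned embedding model, and all the section texts are made up for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" so the sketch runs offline;
    # a real system would call a learned embedding model here.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_n_sections(sections, question, n=2):
    # Rank every section by similarity to the question, keep the top N
    q = embed(question)
    return sorted(sections, key=lambda s: cosine(embed(s), q), reverse=True)[:n]

sections = [
    "The whale hunt begins in the Pacific.",
    "Ishmael reflects on the sea and solitude.",
    "Ahab nails a gold doubloon to the mast.",
]
question = "Who nailed the doubloon to the mast?"
context = top_n_sections(sections, question, n=1)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
```

In a real pipeline you'd precompute and index the section embeddings once, then only embed the question at query time.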
What if the question is "What are the main themes of this work?"
Or anything where the answer isn't 'close' to the words used in the question?
How well does this work vs giving it the whole thing as a prompt?
I assume worse, but I'm not sure how this approach compares to giving it the full text in the prompt, or to splitting it into N sections, running on each, and then summarizing the results.
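The "split into N sections, run on each, then summarize" idea is the map-reduce pattern. A minimal sketch of the orchestration, using a hypothetical `fake_llm` stand-in so it runs without an API:

```python
def map_reduce_summary(sections, llm):
    # Map step: summarize each section independently
    partials = [llm("Summarize:\n" + s) for s in sections]
    # Reduce step: combine the partial summaries into one final answer
    return llm("Combine these summaries:\n" + "\n".join(partials))

# Stand-in "model" so the sketch is runnable: just echoes the last line.
# A real run would swap in an actual LLM call here.
fake_llm = lambda prompt: prompt.splitlines()[-1]

result = map_reduce_summary(["chapter one text", "chapter two text"], fake_llm)
```

The trade-off versus embeddings retrieval: map-reduce sees the whole document (so it can answer "main themes" questions), but it costs one model call per section plus a combine pass.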
Any material comparing the different embedding models? I'm working on information retrieval from government documents, and without any ML experience it's daunting.
You pretty much summed up the drawbacks of the embeddings approach. In my experience it's pretty hard to extract the relevant parts of text, especially when the text is uniform.
I don't think it's as much of a band-aid as it first appears since this roughly mimics how a human would do it.
The problem is that humans do continuous information retrieval and storage, whereas the current crop of embedding systems is static and mostly one-shot.
Humans have limited working memory: we quickly forget short-term memories (unless they're highly significant), and long-term memory fades selectively if not reactivated or intense.
This weird leaky memory has advantages and disadvantages. Forgetting is useful: it removes garbage.
Machine models could vary the balance of these temporal memory types, apply dropout, etc. We may get some weird behavior.
I would guess we will see many innovations in how memory is stored in systems like these.
The real gain would be if we could use the 100K context windows instead of this "embeddings trick". Embeddings work only when the answer lives in a short part (or parts) of the document. If the user asks something like "What are the main ideas?" or "Summarize the document", or any question that needs context from large portions of the book/PDF/file, then the trick of stuffing just short passages into the prompt will not work. But as long as large context windows are expensive, we'll need to keep using embeddings and a few text fragments.