
With embeddings, you essentially can. Group the book into sections, embed each section, then when you do a prompt, add in the N most similar embedded sections to your prompt.
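The approach described above can be sketched in a few lines. This is an illustrative toy, not anyone's production pipeline: `embed` here is just a bag-of-words stand-in for a real embedding model (a sentence-transformer or an embeddings API), but the chunk-embed-retrieve shape is the same.

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in embedding: a bag-of-words count vector. A real system
    # would call an embedding model and get a dense vector instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_n_sections(sections, query, n=2):
    # Embed each section, embed the query, keep the n most similar.
    scored = [(cosine(embed(s), embed(query)), s) for s in sections]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:n]]

# Hypothetical book sections, grouped ahead of time.
book_sections = [
    "The whale pursues the ship across the ocean.",
    "Chapter on the economics of 19th-century whaling.",
    "A storm batters the crew near the cape.",
]

question = "What happens during the storm?"
context = top_n_sections(book_sections, question, n=1)
prompt = "\n\n".join(context) + "\n\nQuestion: " + question
```

In a real system the section embeddings are computed once and stored in a vector index; only the query is embedded per request.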


What if the question is "What are the main themes of this work?"

Or anything where the answer isn't 'close' to the words used in the question?

How well does this work vs giving it the whole thing as a prompt?

I assume worse, but I'm not sure how this approach compares to giving it the full thing in the prompt, or splitting it into N sections, running on each, and then summarizing.


That is solved by hypothetical embeddings.

Background: https://summarity.com/hyde

Demo: https://youtu.be/elNrRU12xRc?t=1550 (or try it on findsight.ai and compare results of the "answer" vs the "state" filter)

For even deeper retrieval, consider late-interaction models such as ColBERT.
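The core HyDE idea is small enough to sketch: instead of embedding the question, you have an LLM write a *hypothetical answer* and embed that, so retrieval matches sections that look like answers rather than like questions. Everything below is illustrative, assuming nothing about the linked implementation: `generate_hypothetical_answer` stands in for an LLM call, and `embed` is a crude character-trigram stand-in for a real embedding model.

```python
def generate_hypothetical_answer(question):
    # A real system would prompt an LLM: "Write a passage that answers:
    # {question}". Here we hard-code a plausible answer for the sketch.
    return "The main themes are obsession, fate, and man's struggle with nature."

def embed(text):
    # Stand-in embedding: the set of character trigrams.
    # A real model would return a dense vector.
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def jaccard(a, b):
    # Set-overlap similarity, standing in for cosine on dense vectors.
    return len(a & b) / len(a | b) if a | b else 0.0

def hyde_retrieve(sections, question, n=2):
    # Key idea: embed a hypothetical *answer*, not the question itself.
    hypothetical = generate_hypothetical_answer(question)
    q_vec = embed(hypothetical)
    scored = sorted(sections, key=lambda s: jaccard(embed(s), q_vec), reverse=True)
    return scored[:n]

sections = [
    "Ahab's obsession drives the crew toward fate.",
    "A recipe for ship's biscuit.",
]
best = hyde_retrieve(sections, "What are the main themes of this work?", n=1)
```

Note how "What are the main themes of this work?" shares almost no vocabulary with the thematic section, but the generated answer does; that's the whole trick.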


I'm not understanding how that works compared to having the full text.

Does the embedding structure somehow expose the themes? And if so, is it more that the embeddings are answering the question by how they group things?


Any material comparing the different embedding models? I'm working on information retrieval from government documents, and without any ML experience it's daunting.


You pretty much summed up the drawbacks of the embeddings approach. In my experience it's pretty hard to extract the relevant parts of text, especially when the text is uniform.


You could do multi-level summaries, etc., but yeah, this is all just band-aids around token limits.
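The multi-level summary idea looks roughly like this: summarize chunks, then summarize groups of summaries, until the result fits the token limit. A minimal sketch, assuming a `summarize` stand-in (a real system would call an LLM here; this one just truncates so the shape of the recursion is visible):

```python
def summarize(text, limit=200):
    # Placeholder: a real system would prompt an LLM to compress `text`
    # to roughly `limit` tokens. Truncation keeps the sketch deterministic.
    return text[:limit]

def chunked(items, size):
    # Yield consecutive groups of `size` items.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def multilevel_summary(sections, token_limit=200, fan_in=4):
    # Repeatedly merge groups of `fan_in` summaries into one summary
    # until a single summary fits within the limit.
    level = sections
    while len(level) > 1 or len(level[0]) > token_limit:
        level = [summarize(" ".join(group), token_limit)
                 for group in chunked(level, fan_in)]
    return level[0]
```

Each level loses detail, which is exactly the band-aid quality being pointed out: the model never sees the whole text at once.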


I don't think it's as much of a band-aid as it first appears since this roughly mimics how a human would do it.

The problem is that humans have continuous information retrieval and storage, whereas the current crop of embedding systems is static and mostly one-shot.


Humans have limited working memory; short-term memories fade quickly (unless they're super significant), and long-term memory fades selectively if it isn't reactivated or significant (intense).

This weird leaky memory has advantages and disadvantages. Forgetting is useful, it removes garbage.

Machine models could vary the balance of temporal types, dropout, etc. We may get some weird behavior.

I would guess we will see many innovations in how memory is stored in systems like these.


The real gain would be if we could use the 100K context windows rather than this "embeddings trick". Embeddings only work when the answer sits in one or a few short parts of the document. If the user asks something like "What are the main ideas?" or "Summarize the document", or any question that needs context from large portions of the book/PDF/file, then the embeddings trick of putting just short passages in the prompt won't work. But as long as large context windows are expensive, we have to keep using embeddings and a few text parts.
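The tradeoff described above boils down to a simple branch. A hedged sketch, with all names hypothetical: `count_tokens` crudely approximates a real tokenizer, and `naive_retrieve` stands in for the embedding search discussed upthread.

```python
def count_tokens(text):
    # Crude approximation; real systems use the model's own tokenizer.
    return len(text.split())

def build_prompt(document, question, context_window=100_000, retrieve=None):
    # If the whole document fits in the context window, just stuff it in.
    if count_tokens(document) + count_tokens(question) < context_window:
        return f"{document}\n\nQuestion: {question}"
    # Otherwise fall back to the "embeddings trick": short retrieved passages.
    passages = retrieve(document, question)
    return "\n\n".join(passages) + f"\n\nQuestion: {question}"

def naive_retrieve(document, question):
    # Hypothetical retriever: in practice this is the embedding search
    # from upthread; here it just returns the first sentence.
    return [document.split(".")[0] + "."]

small = build_prompt("A short document.", "What is this?")
big = build_prompt("The answer is 42. " * 30_000, "What is this?",
                   retrieve=naive_retrieve)
```

Whole-document questions ("summarize this") only get good context on the first branch; the fallback branch is exactly where they degrade.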



