What if the question is "What are the main themes of this work?"
Or anything where the answer isn't 'close', in embedding space, to the words used in the question?
How well does this work versus giving the model the whole text as a prompt?
I assume worse, but I'm not sure how this approach compares to putting the full text in the prompt, or to splitting it into N sections, running the model on each, and then summarizing the results.
Is there any material comparing the different embedding models? I'm working on information retrieval from government documents, and without any ML experience it's daunting.
You pretty much summed up the drawbacks of the embeddings approach. In my experience it's quite hard to extract the relevant parts of a text, especially when the text is uniform.
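To make that failure mode concrete, here's a toy bag-of-words cosine similarity. It's a crude stand-in for a real embedding model (which captures far more semantics), but it illustrates why a "main themes" question retrieves nothing useful: it shares almost no vocabulary with any passage.

```python
import math
from collections import Counter

def embed(text):
    # Toy 'embedding': just word counts (a real model would use a neural encoder).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

passages = [
    "The whale pulled the boat for three days before it tired.",
    "Ahab nailed a gold coin to the mast as a reward.",
]
query = "What are the main themes of this work?"
scores = [cosine(embed(query), embed(p)) for p in passages]
# Every score is low: no single passage is 'close' to the question's words,
# even though the whole text clearly has themes.
```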
I don't think it's as much of a band-aid as it first appears since this roughly mimics how a human would do it.
The problem is that humans have continuous information retrieval and storage, whereas the current crop of embedding systems is static and mostly one-shot.
Humans have limited working memory: we quickly forget short-term memories (unless they're highly significant), and our long-term memory fades selectively if not reactivated or intense.
This weird leaky memory has advantages and disadvantages. Forgetting is useful: it removes garbage.
Machine models could vary the balance of these temporal memory types, apply dropout, etc. We may get some weird behavior.
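A minimal sketch of that leaky-memory idea, with everything (class name, half-life, reactivation boost, forgetting floor) made up for illustration rather than taken from any real system:

```python
class LeakyMemory:
    """Toy memory store: an item's strength halves every `half_life`
    time units, recalling it reactivates (boosts) it, and items that
    fade below `floor` are forgotten entirely."""

    def __init__(self, half_life=60.0, floor=0.05):
        self.half_life = half_life
        self.floor = floor
        self.items = {}  # key -> (strength, last_touched)

    def store(self, key, now, significance=1.0):
        # More 'intense' items start stronger, so they survive longer.
        self.items[key] = (significance, now)

    def recall(self, key, now):
        if key not in self.items:
            return None
        strength, last = self.items[key]
        faded = strength * 0.5 ** ((now - last) / self.half_life)
        if faded < self.floor:
            del self.items[key]  # forgotten: the garbage-removal upside
            return None
        # Reactivation refreshes the memory, per the point above.
        self.items[key] = (min(faded * 2.0, 10.0), now)
        return faded
```

Varying `half_life` per item would give you the short-term/long-term balance in one structure.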
I would guess we will see many innovations in how memory is stored in systems like these.