Historically, yes - uppercase came first. Even when lowercase showed up, the key combination needed to produce lowercase letters was harder to use, so it took more time for them to catch up.
This is a Google Colab notebook (it works in Jupyter too). The notebook can connect to live data sources (e.g. Kafka) and you can run your analysis on them. You could also do a CSV replay if you have timestamped data entries.
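If it helps, here is a minimal sketch of what such a CSV replay could look like in plain Python (no specific framework assumed; the file name, column name, and speedup factor are made up for illustration):

```python
import csv
import time
from datetime import datetime

def replay_csv(path, speedup=60.0):
    """Yield rows of a timestamped CSV in order, sleeping between rows so the
    replay roughly follows the original event spacing (scaled down by `speedup`)."""
    with open(path, newline="") as f:
        # Assumes an ISO-8601 "timestamp" column; adapt the parsing to your data.
        rows = sorted(csv.DictReader(f),
                      key=lambda r: datetime.fromisoformat(r["timestamp"]))
    prev_ts = None
    for row in rows:
        ts = datetime.fromisoformat(row["timestamp"])
        if prev_ts is not None:
            time.sleep(max((ts - prev_ts).total_seconds() / speedup, 0.0))
        prev_ts = ts
        yield row

# Feed the replayed rows into whatever analysis you would run on the live source.
for event in replay_csv("events.csv"):
    print(event)
```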
Jan, can you explain briefly how the deduplicator checks if the new answer is significantly different? Is there code in the repository we can take a look at?
Sure: when a new response is produced because some source documents have changed, we ask an LLM to compare the two responses and tell us whether they are significantly different. Even a simplistic prompt, like the one used in the example, would do:
Are the two following responses deviating?
Answer with Yes or No.
First response: "{old}"
Second response: "{new}"
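As a rough sketch of how that check could be wired up (this is illustrative, not the actual repository code; it assumes the OpenAI chat API and a placeholder model name):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

DEDUP_PROMPT = (
    "Are the two following responses deviating?\n"
    "Answer with Yes or No.\n"
    'First response: "{old}"\n'
    'Second response: "{new}"'
)

def is_significantly_different(old: str, new: str) -> bool:
    """Ask the LLM whether the new answer deviates from the old one."""
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user",
                   "content": DEDUP_PROMPT.format(old=old, new=new)}],
    )
    return reply.choices[0].message.content.strip().lower().startswith("yes")
```

The pipeline would then only push the updated answer onward when this returns True.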
That's a good idea. The deduplication criterion is easy to change: using an LLM is faster to get started with, but after a while a corpus of decisions is created and can be used either to select another mechanism or, e.g., to train one on top of BERT embeddings.
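For example, once that corpus exists, a drop-in alternative could be an embedding-similarity check like the sketch below (the sentence-transformers model and the threshold are placeholders; in practice you would tune or learn the threshold from the logged decisions):

```python
from sentence_transformers import SentenceTransformer
from numpy import dot
from numpy.linalg import norm

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def is_significantly_different(old: str, new: str, threshold: float = 0.85) -> bool:
    """Flag the new answer as a deviation when its embedding similarity
    to the old one drops below the threshold."""
    old_vec, new_vec = model.encode([old, new])
    similarity = dot(old_vec, new_vec) / (norm(old_vec) * norm(new_vec))
    return similarity < threshold
```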
I feel that there are too many moving pieces here, especially for prototyping. There was a much simpler app I took a look at in a recent Hacker News post: https://news.ycombinator.com/item?id=36894142
They still have work to do on different connectors (e.g. PDF, etc.), but the simple real-time document pipeline is what helps a lot.