Also, I don't use ChatGPT to rewrite blog posts and don't like people who do. Its style is annoying, and if ChatGPT is producing the content anyway, I might as well ask it directly myself. For code I don't care much, as long as it works.
Artists correctly realized the threat to their future economic viability and made up reasons it was morally bad. Programmers are currently stuck in an earlier stage, insistent that it can never replace them because [various things].
Destroying a profession without a plan to help those displaced is morally bad. It's also inevitable. The most obvious mistake on both sides of the AI argument is denying that something can be used for both good and bad, can have devastating effects, and yet can still be an advancement that is going to happen. This isn't cognitive dissonance - it's reality.
Advanced programmers don't use AI assistants because they don't help with writing complex new code; they're mostly good at completing well-known, more or less simple tasks. And I hope such people understand the danger posed by "artificial intelligence" in many cases and in many forms.
You could just run a local LLM over every document and ask it "is this related to this query". I don't think you actually want to wait a week (and holding all the documents you might ever want to search would run to petabytes).
(the reasonable way is embedding search, which runs much faster with some precomputation, but you still have to store things)
A better way would be to ask the LLM to generate keywords (or queries). And then use old school techniques to find a set of documents, and then filter those using another LLM.
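That pipeline is simple to sketch. Here's a toy version, where `generate_keywords` and `llm_filter` are placeholders for the two LLM calls (a real system would prompt a model for keywords and for a yes/no relevance judgment - here they're just token-overlap stand-ins so the plumbing is runnable):

```python
from collections import defaultdict

docs = {
    1: "UDP packets and NAT traversal through middleboxes",
    2: "baking sourdough bread at home",
    3: "firewall rules for TCP and UDP traffic",
}

# Old-school part: a classic inverted index, token -> set of doc ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for tok in text.lower().split():
        index[tok].add(doc_id)

def generate_keywords(query):
    # Stand-in for the first LLM call ("list search keywords for
    # this query"). Here we just tokenize the query itself.
    return query.lower().split()

def llm_filter(query, text):
    # Stand-in for the second LLM call ("is this document relevant
    # to the query? yes/no"). Here: crude token overlap.
    return len(set(query.lower().split()) & set(text.lower().split())) >= 2

def search(query):
    keywords = generate_keywords(query)
    candidates = set()
    for kw in keywords:
        candidates |= index.get(kw, set())   # cheap index lookup
    # Only the small candidate set gets the expensive LLM pass.
    return sorted(d for d in candidates if llm_filter(query, docs[d]))
```

The point of the shape: the index lookup is cheap and narrows petabytes down to a handful of candidates, so the expensive per-document LLM call only runs on that handful.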
How is that better than embeddings? You’re using embeddings to get a finite list of keywords, throwing out the extra benefits of embeddings (support for every human language, for instance), using a conventional index, and then going back to embeddings space for the final LLM?
That whole thing can be simplified to: compute and store embeddings for docs, compute embeddings for query, find most similar docs.
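In code, the whole loop fits in a few lines. The `embed` function here is a deliberately dumb stand-in (a hashed bag-of-words vector) for a real embedding model such as a sentence transformer - the surrounding precompute/compare logic is the same either way:

```python
import math
import zlib

def embed(text, dim=256):
    # Toy stand-in for a real embedding model: hash each lowercase
    # token into a fixed-size bag-of-words vector, then L2-normalize
    # so that a plain dot product equals cosine similarity.
    v = [0.0] * dim
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

docs = [
    "UDP packets and NAT traversal through middleboxes",
    "baking sourdough bread at home",
    "classic inverted index search over text",
]
# Precompute once and store; this is the part that costs disk space.
doc_vecs = [embed(d) for d in docs]

def search(query, k=2):
    q = embed(query)  # only the query is embedded at search time
    ranked = sorted(range(len(docs)),
                    key=lambda i: -cosine(doc_vecs[i], q))
    return [docs[i] for i in ranked[:k]]
```

At scale you'd swap the linear scan for an approximate nearest-neighbor index, but the three steps - embed docs ahead of time, embed the query, rank by similarity - don't change.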
Ah, I had interpreted “old school search” to mean classic text indexing and Boolean style search. I’d argue that if it’s using embeddings and cosine similarity, it’s not old school. But that’s just semantics.
They couldn't have built it on anything but UDP, because the world is now filled with poorly designed firewall/NAT middleboxes that won't route anything other than TCP, UDP and, optimistically, ICMP.