Somewhat off-topic but congratulations, you have re-implemented (in the small) the search engine that powers HN. I say this as an employee of the company behind that engine, and I say this as someone who is intimately familiar with the codebase and with how the technology works.
With a little more effort towards scalability, and with some input from a devops co-founder, you'd be on your way to launching a state-of-the-art, market-competitive search engine.
My employer won't be too happy seeing me post this, but I don't care, this needs to be said.
What's special about this particular cosine similarity search? There are loads of these! Like every other day on HN since ChatGPT was released. Several PDF chatbots. There are various overfunded vector search companies. Some are open source. And then there is Elasticsearch etc.
We are not at all in disagreement. In fact, you are making the same point I'm making, which is that there is nothing special about any of the commercially available search engines today. With access to OpenAI's embeddings API and to a vector database, a mid-level engineer can build a highly scalable search engine in a few weeks. As a startup, it makes sense today to build your own search engine rather than buy off the shelf.
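To make the "nothing special" point concrete, here is a minimal sketch of the core of a cosine similarity search in plain Python. The vectors are toy values I made up; in a real system they would come from an embeddings API, and the in-memory dict would be replaced by a vector database. The names `search` and `TOY_INDEX` are hypothetical, just for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat it as matching nothing.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=3):
    """Brute-force search: score every document, return the top_k best."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" (assumption: real ones have hundreds
# or thousands of dimensions and come from an embedding model).
TOY_INDEX = {
    "doc-search": [0.9, 0.1, 0.0],
    "doc-cooking": [0.0, 0.2, 0.9],
    "doc-databases": [0.7, 0.6, 0.1],
}

results = search([1.0, 0.2, 0.0], TOY_INDEX, top_k=2)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

This is a linear scan over every document, which is fine for small collections. The part a vector database adds is approximate nearest-neighbor indexing, so the search does not have to touch every vector.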
The only thing companies like Elasticsearch and Algolia still have is their pre-existing customer bases, a few thin layers of marketing, and some network effects. Search engine companies are effectively marketing companies nowadays.
That explains why the corrupt management at my company thinks it's a better strategy to throw hissy fits on social media in the general direction of OpenAI than to work on actually building useful tech.
I see what you mean. Interesting topic! Some startups will just want to move fast and use their funding to have someone else solve it. I work somewhere where we use Azure search and honestly I don't think anyone has even talked about it for years. It just sits there doing its thing! And not having to maintain that, and getting feature enhancements automatically (a double-edged sword, yes), is good. You made that same point though: we are an existing search customer. But I am sure if we were building again we wouldn't roll our own. The problem Algolia will have is the immense competition. Why leave your fuzzy favourite cloud for search?