Hacker News | osmarks's comments

I was briefly looking into using SMT for Minecraft autocrafting, but it turns out you can do integer linear programming and the mapping is easier.
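To illustrate the ILP formulation: craft counts become integer decision variables, recipes become linear balance constraints, and you minimize raw inputs. The recipes below (1 log → 4 planks, 2 planks → 4 sticks) are real, but the solver is a toy brute force over small integer counts purely to show the shape of the problem; a real setup would hand the same constraints to an actual ILP solver such as scipy.optimize.milp or PuLP.

```python
from itertools import product

# Decision variables: how many times to run each recipe.
# n_planks crafts: each consumes 1 log, produces 4 planks.
# n_sticks crafts: each consumes 2 planks, produces 4 sticks.
# Objective: at least 8 sticks from the fewest logs.
best = None
for n_planks, n_sticks in product(range(10), repeat=2):
    planks = 4 * n_planks - 2 * n_sticks  # net planks must be non-negative
    sticks = 4 * n_sticks
    logs_used = n_planks                  # one log per plank craft
    if planks >= 0 and sticks >= 8:       # feasibility constraints
        if best is None or logs_used < best[0]:
            best = (logs_used, n_planks, n_sticks)

print(best)  # → (1, 1, 2): one log, one plank craft, two stick crafts
```

A real solver replaces the nested loop, but the constraints it receives are exactly these linear inequalities over integer craft counts.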


Also, I don't use ChatGPT to rewrite blog posts, and I don't like people who do. Its style is annoying, and if ChatGPT is producing the content, I might as well ask it directly myself. For code I don't care much, so long as it works.


Artists correctly realized the threat to their future economic viability and made up reasons it was morally bad. Programmers are currently stuck in an earlier stage, insistent that it can never replace them because [various things].


Destroying a profession without a plan to help those displaced is morally bad. It's also inevitable. The most obvious mistake of everybody on both sides of AI arguments is denying that something can be used for both good and bad, will have devastating effects, and yet is an advancement that must happen. This isn't cognitive dissonance - it's reality.


Advanced programmers don't use AI assistants because they don't help with writing complex new code; they're mostly good at completing well-known, more or less simple tasks. And I hope that such people understand the danger posed by "artificial intelligence" in many cases and in many forms.


This is sort of true currently, but extrapolate the trend.


You could just run a local LLM over every document and ask it "is this related to this query". I don't think you actually want to wait a week (and holding all the documents you might ever want to search would run to petabytes).

(the reasonable way is embedding search, which runs much faster with some precomputation, but you still have to store things)


A better way would be to ask the LLM to generate keywords (or queries). And then use old school techniques to find a set of documents, and then filter those using another LLM.
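A minimal sketch of the "old school" half of that pipeline: a toy inverted index mapping terms to document ids. The LLM keyword-generation step is stubbed out with a hard-coded list (in practice an LLM would expand the user's query), and the final LLM relevance filter is omitted; the documents and keywords here are invented for illustration.

```python
from collections import defaultdict

docs = {
    1: "fast vector search with embeddings",
    2: "classic boolean retrieval over an inverted index",
    3: "cooking pasta at home",
}

# Old-school part: an inverted index, term -> set of doc ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

# Stand-in for the LLM step that would generate these from the query.
keywords = ["retrieval", "index"]

# Union the postings lists into a candidate set; a second LLM pass
# would then filter these candidates for actual relevance.
candidates = set().union(*(index[k] for k in keywords))
print(sorted(candidates))  # → [2]
```

The point is that the expensive LLM only sees the small candidate set, not the whole corpus.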


How is that better than embeddings? You’re using embeddings to get a finite list of keywords, throwing out the extra benefits of embeddings (support for every human language, for instance), using a conventional index, and then going back to embeddings space for the final LLM?

That whole thing can be simplified to: compute and store embeddings for docs, compute embeddings for query, find most similar docs.
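That three-step pipeline can be sketched with toy vectors. The 3-dimensional "embeddings" below are made up for illustration; a real system would get them from an embedding model and store them in a vector index rather than a dict, but the ranking step is the same cosine-similarity search.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Precomputed (toy) document embeddings.
doc_embeddings = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.1],
    "doc_c": [0.0, 0.2, 0.9],
}

# Embed the query, then rank documents by similarity.
query_embedding = [0.8, 0.2, 0.1]
ranked = sorted(doc_embeddings,
                key=lambda d: cosine(query_embedding, doc_embeddings[d]),
                reverse=True)
print(ranked[0])  # → doc_a
```

At scale the exhaustive `sorted` is replaced by an approximate nearest-neighbour index, but the interface is identical: vectors in, most similar documents out.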


Yes, you can do the "old school search" part with embeddings.


Ah, I had interpreted “old school search” to mean classic text indexing and Boolean style search. I’d argue that if it’s using embeddings and cosine similarity, it’s not old school. But that’s just semantics.



The entire library of Congress is like 10TB. You don’t need anything near petabytes until you get out of text into rich media.


Common Crawl is petabytes. Anna's Archive is about a petabyte, but it includes PDFs with images.


There is at least one organization doing actual embedding-based search (Exa). I wrote about this a bit: https://docs.osmarks.net/hypha/osmarks.net_web_search_plan_%....


Yes, I know Exa and have used them in the past for some side projects. Great product.

But that's still an index. Google also internally uses a combination of traditional and semantic/embedding-based indices.


Most of these are just an EPYC server platform, some cursed risers and multiple PSUs (though cryptominer server PSU adapters are probably better). See https://nonint.com/2022/05/30/my-deep-learning-rig/ and https://www.mov-axbx.com/wopr/wopr_concept.html.


Looks like a fire hazard :)


The WOPR one is the best read, IMO.


They couldn't have built it on anything but UDP, because the world is now filled with poorly designed firewall/NAT middleboxes which will not route anything other than TCP, UDP, and (optimistically) ICMP.


CommonMark mostly fixes this.


At the expense of being extremely complex.


Preserving the semantic content is helpful if you think you might want to switch the rendering later.


I solve this for my usecases with custom Markdown rendering which accepts a few new block elements (via a markdown-it plugin). https://github.com/osmarks/website/blob/master/src/index.js

