Hacker News | osmarks's comments

I was briefly looking into using SMT for Minecraft autocrafting, but it turns out you can do integer linear programming and the mapping is easier.
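To illustrate the ILP formulation: craft counts become integer decision variables, recipes become linear balance constraints, and you minimize raw inputs. The recipes below (1 log → 4 planks, 2 planks → 4 sticks) are real, but the solver is a toy brute force over small integer counts purely to show the shape of the problem; a real setup would hand the same constraints to an actual ILP solver such as scipy.optimize.milp or PuLP.

```python
from itertools import product

# Decision variables: how many times to run each recipe.
# n_planks crafts: each consumes 1 log, produces 4 planks.
# n_sticks crafts: each consumes 2 planks, produces 4 sticks.
# Objective: at least 8 sticks from the fewest logs.
best = None
for n_planks, n_sticks in product(range(10), repeat=2):
    planks = 4 * n_planks - 2 * n_sticks  # net planks must be non-negative
    sticks = 4 * n_sticks
    logs_used = n_planks                  # one log per plank craft
    if planks >= 0 and sticks >= 8:       # feasibility constraints
        if best is None or logs_used < best[0]:
            best = (logs_used, n_planks, n_sticks)

print(best)  # → (1, 1, 2): one log, one plank craft, two stick crafts
```

A real solver replaces the nested loop, but the constraints it receives are exactly these linear inequalities over integer craft counts.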


Also, I don't use ChatGPT to rewrite blog posts, and I don't like people who do. Its style is annoying, and if ChatGPT is producing the content, I might as well ask it directly myself. For code I don't care much, so long as it works.


Artists correctly realized the threat to their future economic viability and made up reasons it was morally bad. Programmers are currently stuck in an earlier stage, insistent that it can never replace them because [various things].


Destroying a profession without a plan to help those displaced is morally bad. It's also inevitable. The most obvious mistake of everybody on both sides of AI arguments is denying that something can be used for both good and bad, will have devastating effects, and yet is an advancement that must happen. This isn't cognitive dissonance - it's reality.


Advanced programmers don't use AI assistants because they don't help with writing complex new code; they're mostly good at completing well-known, more or less simple tasks. And I hope that such people understand the danger posed by "artificial intelligence" in many cases and in many forms.


This is sort of true currently, but extrapolate the trend.


You could just run a local LLM over every document and ask it "is this related to this query". I don't think you actually want to wait a week (and holding all the documents you might ever want to search would run to petabytes).

(the reasonable way is embedding search, which runs much faster with some precomputation, but you still have to store things)


A better way would be to ask the LLM to generate keywords (or queries). And then use old school techniques to find a set of documents, and then filter those using another LLM.
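A minimal sketch of the "old school" half of that pipeline: a toy inverted index mapping terms to document ids. The LLM keyword-generation step is stubbed out with a hard-coded list (in practice an LLM would expand the user's query), and the final LLM relevance filter is omitted; the documents and keywords here are invented for illustration.

```python
from collections import defaultdict

docs = {
    1: "fast vector search with embeddings",
    2: "classic boolean retrieval over an inverted index",
    3: "cooking pasta at home",
}

# Old-school part: an inverted index, term -> set of doc ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

# Stand-in for the LLM step that would generate these from the query.
keywords = ["retrieval", "index"]

# Union the postings lists into a candidate set; a second LLM pass
# would then filter these candidates for actual relevance.
candidates = set().union(*(index[k] for k in keywords))
print(sorted(candidates))  # → [2]
```

The point is that the expensive LLM only sees the small candidate set, not the whole corpus.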


How is that better than embeddings? You’re using embeddings to get a finite list of keywords, throwing out the extra benefits of embeddings (support for every human language, for instance), using a conventional index, and then going back to embeddings space for the final LLM?

That whole thing can be simplified to: compute and store embeddings for docs, compute embeddings for query, find most similar docs.
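That three-step pipeline can be sketched with toy vectors. The 3-dimensional "embeddings" below are made up for illustration; a real system would get them from an embedding model and store them in a vector index rather than a dict, but the ranking step is the same cosine-similarity search.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Precomputed (toy) document embeddings.
doc_embeddings = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.1],
    "doc_c": [0.0, 0.2, 0.9],
}

# Embed the query, then rank documents by similarity.
query_embedding = [0.8, 0.2, 0.1]
ranked = sorted(doc_embeddings,
                key=lambda d: cosine(query_embedding, doc_embeddings[d]),
                reverse=True)
print(ranked[0])  # → doc_a
```

At scale the exhaustive `sorted` is replaced by an approximate nearest-neighbour index, but the interface is identical: vectors in, most similar documents out.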


Yes, you can do the "old school search" part with embeddings.


Ah, I had interpreted “old school search” to mean classic text indexing and Boolean style search. I’d argue that if it’s using embeddings and cosine similarity, it’s not old school. But that’s just semantics.



The entire library of Congress is like 10TB. You don’t need anything near petabytes until you get out of text into rich media.


Common Crawl is petabytes. Anna's Archive is about a petabyte, but it includes PDFs with images.


There is at least one organization doing actual embedding-based search (Exa). I wrote about this a bit: https://docs.osmarks.net/hypha/osmarks.net_web_search_plan_%....


Yes, I know Exa and have used them in the past for some side projects. Great product.

But that's still an index. Google also internally uses a combination of traditional and semantic/embedding-based indices.


Most of these are just an EPYC server platform, some cursed risers and multiple PSUs (though cryptominer server PSU adapters are probably better). See https://nonint.com/2022/05/30/my-deep-learning-rig/ and https://www.mov-axbx.com/wopr/wopr_concept.html.


Looks like a fire hazard :)


The WOPR one is the best read, IMO.


They couldn't have built it on anything but UDP, because the world is now filled with poorly designed firewall/NAT middleboxes which will not route anything other than TCP, UDP, and (optimistically) ICMP.


CommonMark mostly fixes this.


At the expense of being extremely complex.


Preserving the semantic content is helpful if you think you might want to switch the rendering later.


I solve this for my usecases with custom Markdown rendering which accepts a few new block elements (via a markdown-it plugin). https://github.com/osmarks/website/blob/master/src/index.js

