Hacker News | vimgrinder's comments

In case it helps someone: if you are having trouble reading long articles, try text-to-audio with line highlighting. It helps a lot. It has cured my lack of attention.


No trouble reading the article. Those slides though. Make my eyes hurt :(


they were constantly referred to in the text :/ impossible to skip


with all the AI stuff going around, can't GitHub just scan repos for malicious code like this?


very excited for this. my current fav setup on my mac mini for text processing is a Gemma 9B + Gemma 2B combo with speculative decoding. great times, with all this getting dropped left and right.
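For anyone unfamiliar with speculative decoding: a small "draft" model cheaply proposes several tokens, and the big model only verifies them. A toy sketch of the accept/reject loop, with stand-in arithmetic "models" rather than Gemma:

```python
def draft_model(prefix, k=4):
    # toy drafter: cheaply proposes the next k tokens
    return [(prefix[-1] + 1 + i) % 10 for i in range(k)]

def target_model(prefix):
    # toy "large" model: the authoritative next token for a prefix
    return (prefix[-1] + 1) % 10

def speculative_step(prefix, k=4):
    """Draft k tokens with the small model, keep each one the
    large model agrees with, and substitute the large model's
    token at the first disagreement."""
    accepted = []
    for tok in draft_model(prefix, k):
        expected = target_model(prefix + accepted)
        if tok == expected:
            accepted.append(tok)       # cheap token verified, keep it
        else:
            accepted.append(expected)  # take the target's token, stop drafting
            break
    return prefix + accepted

# These toy models agree by construction, so all k drafts are accepted;
# with real models only a prefix of the draft typically survives.
print(speculative_step([3]))  # -> [3, 4, 5, 6, 7]
```

The win is that one verification pass over k drafted tokens can emit several tokens at once, instead of one big-model call per token.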


The first lecture is so good. Not only from the perspective of content, but in how Zhao explains how to think about learning as a student. ty for the recommendation.


why don't they just block the OBS project and let users install it in an unofficial manner, removing themselves as the middleman? I mean, they have certain, let's say, guidelines, but why enforce them in this weird manner?


So much respect for this guy. He is like Neo of the matrix, bridging the gap between humans and machines. I have so far learned the following for free from his repos/videos:

1. minGPT, nanoGPT (transformers)

2. NLP (makemore series)

3. tokenizers (his YouTube)

4. RNNs (from his blog)

There are many domains that don't have a Karpathy, and we don't hear about them. So glad we have this guy to spread his intuitions on ML.


There are also different styles of teaching and learning. Karpathy likes to start from first principles and incrementally assemble the building blocks.

Whereas for example Jeremy Howard's style resonates a lot more with how I enjoy learning, very much a "let's build it" and then tinker around to gain intuition on how things inside the box are working.

I see the benefit in both approaches and perhaps Karpathy is more methodical and robust. But I just find Howard's top-down style a lot easier to stay motivated with when I am learning on my own time.


Definitely agree. I think a lot of people get hung up on the math in ML but honestly there are so many other things you could spend time on, and there are opportunity costs for everything.

So I say, build the thing, figure out where the shortcomings in your knowledge are, and continue refining. One of those things will inevitably be math. Maybe it will be signal processing one week or Spark fundamentals the next. And there are always interesting papers coming out.


I definitely avoided ML for years just due to the math. But having a chatbot that can explain math with examples in any style you want definitely changed my opinion about math and ML in general. A big barrier to math, imo, is how it's written: not explained in a fun way with lots of examples. I certainly don't have a mathy brain, but I do get things when they're explained with examples (and certainly find it hard to come up with my own examples while fighting with the symbols).


Will check out Jeremy's lectures. I actually use his fastbook notebooks a lot for self-study.

Karpathy's style, for me, hits the right level of abstraction to spark curiosity about the subject. After watching his lectures, I generally go on to more material, and never really stop there.


"Neo of the matrix", what a great analogy! Made my day and gave me a good laugh, thanks.

He is for sure a cool guy.


he has earned it haha.


5. How to solve a Rubik's Cube


saw that video just now, thanks for this.


This is cool, and timely (I wanted a neat repo like that).

I have also been working for the last 2 weeks on a GPT implementation in C. It eventually turned out to be really slow (without CUDA), but it taught me how much memory management and data management there is when implementing these systems. You are running a loop billions of times, so you need to preallocate the computational graph and so on. If anyone wants to check it out, it's ~1500 LOC in a single file:

https://github.com/attentionmech/gpt.c/blob/main/gpt.c
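The preallocation point generalizes beyond C: allocate the graph's buffers once, then overwrite them in place on every iteration instead of allocating inside the hot loop. A minimal sketch of the pattern (illustrative only, not code from the repo):

```python
import array

HIDDEN = 1024
STEPS = 3

# Allocate activation and gradient buffers once, up front,
# instead of creating fresh ones inside the training loop.
acts = array.array('f', [0.0] * HIDDEN)
grads = array.array('f', [0.0] * HIDDEN)

def forward_backward(step):
    # toy "compute": writes into the preallocated buffers in place
    for i in range(HIDDEN):
        acts[i] = float(step + i)
        grads[i] = acts[i] * 0.1

for step in range(STEPS):
    forward_backward(step)  # no per-iteration allocation

print(len(acts))  # -> 1024: same buffers, reused every step
```

In C the same idea is usually an arena or a set of malloc'd tensors sized at startup; the loop then only reads and writes, never allocates or frees.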


I love this paradigm of reasoning done by one model and the actual work by another. It opens up avenues for specialization, and eventually smaller players working on more niche things.


Most people I've talked with don't grasp how big an event this is. I consider it almost comparable to what early versions of Linux did to the OS ecosystem.


Agreed: Worked on a tough problem in philosophy last night with DeepSeek on which I have previously worked with Claude. DeepSeek was at least as good and I found the output format better. I also did not need to provide a “pre-prompt” as I do with Claude.

And it's free to use, and FOSS.

Yep, game changer that opens the floodgates.


I never tried the $200-a-month subscription, but it just solved a problem for me that neither o1 nor Claude was able to solve, and did it for free. I like everything about it better.

All I can think is "Wait, this is completely insane!"


Something is off about this comment and the account it belongs to being 7 days old. Please post the problem/prompt you used so it can be cross-checked.


That is probably because they have not tried the model yet. I tried it and was stunned. It's not better in all areas yet, but where it is better, it is so much better than Claude or anything from OpenAI.


Agreed. It's worse than competitors at code completion / fill-in-the-blanks / "coding_completion" (it introduced bugs in functions it didn't need to modify), and at language, but it is stellar elsewhere:

- excellent and very detailed answers for highly technical info searches, like "Is there a C++ proposal to make std::format constexpr?"

- excellent at logic where it gets the answer correct on the first try: "Alice has 2 sisters and 1 brother. How many sisters does Alice's brother have?"

- excellent at reverse-engineering (the prompt looked like: "here's a bunch of Arm ASM with reg names, the regs correspond to this particular datasheet, explain")
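The sibling puzzle above has a one-line closed form that makes it easy to check a model's answer: the brother shares Alice's sisters, and Alice herself is also his sister. A trivial sanity check:

```python
alice_sisters = 2
alice_brothers = 1  # the brother being asked about

# Alice's brother has all of Alice's sisters, plus Alice herself.
brother_sisters = alice_sisters + 1
print(brother_sisters)  # -> 3
```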


Plus, the speed at which it replies is amazing too. Claude/Chatgpt now seem like inefficient inference engines compared to it.


I've been trying it through OpenRouter today and it seems quite slow, but this may just be a scaling thing. I tried the chat when it first came out and that was extremely fast.


Yea, scaling might be harder for them, or they may have more tricks up their sleeves when it comes to serving the model.


Precisely. This lets any of us have something that until the other day would have cost hundreds of millions of dollars. It's as if Linus had published Linux 2.0, gcc, binutils, libc, etc. all on the same day.


people are doing all sorts of experiments and reproducing the "emergence" (sorry, it's not the right word) of backtracking; it's all so fun to watch.


If you check the failure section of their paper, they also tried other methods, like MCTS and PRM, which are what other labs have been obsessing over but couldn't move on from (that includes the bigshots). The only team I'm aware of that tried verifiable rewards is Tulu, but they didn't scale it up and just left it there.

This sort of thing is imo similar to what OpenAI did with the transformer architecture, i.e. Google invented it but couldn't scale it in the right direction, and DeepMind got busy with Atari games. They all had the pieces, yet only OpenAI could do it. It seems to come down to research leadership in choosing which methods to invest in. But yeah, with the budgets the big labs have, they can easily try 10 different techniques and brute-force it all, but it seems they are too opinionated about methods and not urgent enough about outcomes.

[paper] https://arxiv.org/pdf/2501.12948 [tulu] https://x.com/hamishivi/status/1881394117810500004


I found the following thread more insightful than my original comment (wish I could edit that one). A researcher explains why RL didn't work before this: https://x.com/its_dibya/status/1883595705736163727


Related: https://twitter.com/voooooogel/status/1884089601901683088#m

Also https://epoch.ai/gradient-updates/how-has-deepseek-improved-... has a summary of all the architectural improvements DeepSeek made to increase performance.


That's interesting. I suppose it could even be possible to test his theories. Just apply the exact same training methodology to smaller models or slightly easier problems and study what happens.


