Hacker News | vimgrinder's comments

In case it helps someone: if you are having trouble reading long articles, try text-to-audio with line highlighting. It helps a lot. It has cured my lack of attention.


No trouble reading the article. Those slides though. Make my eyes hurt :(


they were constantly referred to in the text :/ impossible to skip


with all the AI stuff going around, can't GitHub just scan repos for malicious code like this?


very excited for this. my current fav setup on my mac mini for text processing is a Gemma 9B + Gemma 2B combo with speculative decoding. great times, with all this getting dropped left and right.
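For anyone unfamiliar with speculative decoding: a small "draft" model cheaply proposes several tokens, and the big model only verifies them. A toy sketch of the accept/reject loop, with stand-in arithmetic "models" rather than Gemma:

```python
def draft_model(prefix, k=4):
    # toy drafter: cheaply proposes the next k tokens
    return [(prefix[-1] + 1 + i) % 10 for i in range(k)]

def target_model(prefix):
    # toy "large" model: the authoritative next token for a prefix
    return (prefix[-1] + 1) % 10

def speculative_step(prefix, k=4):
    """Draft k tokens with the small model, keep each one the
    large model agrees with, and substitute the large model's
    token at the first disagreement."""
    accepted = []
    for tok in draft_model(prefix, k):
        expected = target_model(prefix + accepted)
        if tok == expected:
            accepted.append(tok)       # cheap token verified, keep it
        else:
            accepted.append(expected)  # take the target's token, stop drafting
            break
    return prefix + accepted

# These toy models agree by construction, so all k drafts are accepted;
# with real models only a prefix of the draft typically survives.
print(speculative_step([3]))  # -> [3, 4, 5, 6, 7]
```

The win is that one verification pass over k drafted tokens can emit several tokens at once, instead of one big-model call per token.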


The first lecture is so good. Not only from the perspective of content, but in how Zhao explains how to think about learning as a student. ty for the recommendation.


why don't they just block the OBS project and let users install it in an unofficial manner, removing themselves as the middleman? I mean, they have certain, let's say, guidelines, but why enforce them in this weird manner?


So much respect for this guy. He is like Neo of the matrix, bridging the gap between humans and machines. I have so far learned the following for free from his repos/videos:

1. minGPT, nanoGPT (transformers)

2. NLP (makemore series)

3. tokenizers (his YouTube)

4. RNNs (from his blog)

There are many domains that don't have a Karpathy, and we don't hear about them. So glad we have this guy to spread his intuitions on ML.


There are also different styles of teaching and learning. Karpathy likes to start from first principles and incrementally assemble the building blocks.

Whereas for example Jeremy Howard's style resonates a lot more with how I enjoy learning, very much a "let's build it" and then tinker around to gain intuition on how things inside the box are working.

I see the benefit in both approaches and perhaps Karpathy is more methodical and robust. But I just find Howard's top-down style a lot easier to stay motivated with when I am learning on my own time.


Definitely agree. I think a lot of people get hung up on the math in ML but honestly there are so many other things you could spend time on, and there are opportunity costs for everything.

So I say, build the thing, figure out where the shortcomings in your knowledge are, and continue refining. One of those things will inevitably be math. Maybe it will be signal processing one week or Spark fundamentals the next. And there are always interesting papers coming out.


I definitely avoided ML for years just due to the math. But having a chatbot that can explain math with examples in any style you want definitely changed my opinion about math and ML in general. A big barrier to math, imo, is how it's written: not explained in a fun way with lots of examples. I certainly don't have a mathy brain, but I do get things when they're explained with examples (and certainly find it hard to come up with my own examples while fighting with the symbols).


Will check out Jeremy's lectures. I actually use his fastbook notebooks a lot for self-study.

Karpathy's style, for me, hits the right level of abstraction to spark curiosity about the subject. After watching his lectures, I generally go on to more material, and never really stop there.


"Neo of the matrix", what a great analogy! Made my day and gave me a good laugh, thanks.

He is for sure a cool guy.


he has earned it haha.


5. How to solve a Rubik's Cube


saw that video just now, thanks for this.


This is cool, and timely (I wanted a neat repo like that).

I have also been working for the last 2 weeks on a GPT implementation in C. It eventually turned out to be really slow (without CUDA), but it taught me how much memory management and data management there is when implementing these systems. You are running a loop billions of times, so you need to preallocate the computational graph and so on. If anyone wants to check it out, it's ~1500 LOC in a single file:

https://github.com/attentionmech/gpt.c/blob/main/gpt.c
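The preallocation point generalizes beyond C: allocate the graph's buffers once, then overwrite them in place on every iteration instead of allocating inside the hot loop. A minimal sketch of the pattern (illustrative only, not code from the repo):

```python
import array

HIDDEN = 1024
STEPS = 3

# Allocate activation and gradient buffers once, up front,
# instead of creating fresh ones inside the training loop.
acts = array.array('f', [0.0] * HIDDEN)
grads = array.array('f', [0.0] * HIDDEN)

def forward_backward(step):
    # toy "compute": writes into the preallocated buffers in place
    for i in range(HIDDEN):
        acts[i] = float(step + i)
        grads[i] = acts[i] * 0.1

for step in range(STEPS):
    forward_backward(step)  # no per-iteration allocation

print(len(acts))  # -> 1024: same buffers, reused every step
```

In C the same idea is usually an arena or a set of malloc'd tensors sized at startup; the loop then only reads and writes, never allocates or frees.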


I love this paradigm of reasoning done by one model and the actual work by another. It opens up avenues for specialization, and eventually smaller players working on more niche things.


Most people I've talked with don't grasp how big an event this is. I consider it almost comparable to what early versions of Linux did to the OS ecosystem.


Agreed: Worked on a tough problem in philosophy last night with DeepSeek on which I have previously worked with Claude. DeepSeek was at least as good and I found the output format better. I also did not need to provide a “pre-prompt” as I do with Claude.

And it's free to use, and FOSS.

Yep, game changer that opens the floodgates.


I never tried the $200-a-month subscription, but it just solved a problem for me that neither o1 nor Claude was able to solve, and did it for free. I like everything about it better.

All I can think is "Wait, this is completely insane!"


Something is off about this comment and the account it belongs to being 7 days old. Please post the problem/prompt you used so it can be cross-checked.


That is probably because they have not tried the model yet. I tried it and was stunned. It's not better in all areas yet, but where it is better, it is so much better than Claude or anything from OpenAI.


Agreed. It's worse than competitors at code completion / fill-in-the-blanks / "coding_completion" (it introduced bugs in functions it didn't need to modify), and at language, but it is stellar elsewhere:

- excellent and very detailed answers for highly technical info searches, like "Is there a C++ proposal to make std::format constexpr?"

- excellent at logic where it gets the answer correct on the first try: "Alice has 2 sisters and 1 brother. How many sisters does Alice's brother have?"

- excellent at reverse-engineering (the prompt looked like: "here's a bunch of Arm ASM with reg names, the regs correspond to this particular datasheet, explain")
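The sibling puzzle above has a one-line closed form that makes it easy to check a model's answer: the brother shares Alice's sisters, and Alice herself is also his sister. A trivial sanity check:

```python
alice_sisters = 2
alice_brothers = 1  # the brother being asked about

# Alice's brother has all of Alice's sisters, plus Alice herself.
brother_sisters = alice_sisters + 1
print(brother_sisters)  # -> 3
```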


Plus, the speed at which it replies is amazing too. Claude/Chatgpt now seem like inefficient inference engines compared to it.


I've been trying it through OpenRouter today and it seems quite slow, but this may just be a scaling thing. I tried the chat when it first came out and that was extremely fast.


Yea, scaling might be harder for them, or they may have more tricks up their sleeves when it comes to serving the model.


Precisely. This lets any of us have something that until the other day would have cost hundreds of millions of dollars. It's as if Linus had published Linux 2.0, gcc, binutils, libc, etc. all on the same day.


people are doing all sorts of experiments and reproducing the "emergence" (sorry, it's not the right word) of backtracking; it's all so fun to watch.


If you check the failure section of their paper, they also tried other methods, like MCTS and PRM, which are what other labs have been obsessing over but couldn't move on from (that includes the bigshots). The only team I'm aware of that tried verifiable rewards is Tulu, but they didn't scale it up and just left it there.

This sort of thing is imo similar to what OpenAI did with the transformer architecture, i.e. Google invented it but couldn't scale it in the right direction, and DeepMind got busy with Atari games. They all had the pieces, yet only OpenAI could do it. It seems to come down to research leadership in choosing which methods to invest in. But yeah, with the budgets the big labs have, they can easily try 10 different techniques and brute-force it all, but it seems they are too opinionated about methods and not urgent enough about outcomes.

[paper] https://arxiv.org/pdf/2501.12948 [tulu] https://x.com/hamishivi/status/1881394117810500004


I found the following thread more insightful than my original comment (wish I could edit that one). A researcher explains why RL didn't work before this: https://x.com/its_dibya/status/1883595705736163727


Related: https://twitter.com/voooooogel/status/1884089601901683088#m

Also https://epoch.ai/gradient-updates/how-has-deepseek-improved-... has a summary of all the architectural improvements DeepSeek made to increase performance.


That's interesting. I suppose it could even be possible to test his theories. Just apply the exact same training methodology to smaller models or slightly easier problems and study what happens.


