Something I wonder is: what happened to asm.js? It got killed by WASM. In a way this is good, since WASM is a "better" solution, being a formal bytecode machine description. On the other hand, asm.js would not have the same limitations, e.g. with respect to DOM interaction or the debates on how to integrate garbage collection: since you stay squarely in the JS VM, you get these things for free.
Basically, in some ways it was a superior idea: benefit from the optimizations we are already doing for JS, but define a subset that is a good compilation target and for which we know the JS VM already performs pretty optimally. So apart from defining the subset, there is no extra work to do. On the other hand, I'm sure there are JS limitations that you inherit. And your "binaries" are probably a bit larger than WASM. (But, I would guess, highly compressible.)
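To make "define a subset" concrete, here's roughly what asm.js looks like: ordinary JS where every type is pinned down by coercions ("|0" for int32, unary "+" for double), so the VM can compile it ahead of time without guards or deopts. A toy module (the names here are mine, purely illustrative):

    function ToyModule(stdlib, foreign, heap) {
      "use asm";  // opts this function into the asm.js subset
      function add(x, y) {
        x = x | 0;             // parameter declared int32 via coercion
        y = y | 0;
        return (x + y) | 0;    // return type declared int32 the same way
      }
      function scale(a) {
        a = +a;                // parameter declared double
        return +(a * 2.5);     // return type double
      }
      return { add: add, scale: scale };
    }

And because it's still plain JS, a browser with no special asm.js support just runs it like any other code, which is exactly the graceful degradation WASM gave up.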
I guess the good news is that you can still use this approach. It's just that no one does, because WASM stole its thunder. Again, not sure if this is a good or a bad thing, but interesting to think about... for instance, whether we could have gotten to the current state much faster by fully adopting asm.js instead of diverting resources into a new runtime.
I find it really interesting that it uses a hybrid of Mamba and Transformer layers. Is it the only significant model right now that uses SSM layers, at least partially? This must contribute to lower VRAM requirements, right? Does it impact how KV caching works?
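My understanding, as a back-of-the-envelope sketch (all the dimensions below are made up for illustration): an attention layer's KV cache grows linearly with context length, while a Mamba-style SSM layer carries a fixed-size recurrent state, so every attention layer you swap for an SSM layer removes a cache that scales with sequence length.

    // Hypothetical dims, fp16 (2 bytes/element), batch size 1.
    const dModel = 4096, nHeads = 32, headDim = dModel / nHeads;
    const seqLen = 32768, bytes = 2;

    // Attention layer: caches K and V, each [seqLen, nHeads, headDim],
    // so memory grows linearly with context length.
    const kvPerLayer = 2 * seqLen * nHeads * headDim * bytes;

    // SSM layer: fixed recurrent state, roughly [dInner, dState],
    // independent of context length.
    const dInner = 2 * dModel, dState = 16;
    const ssmPerLayer = dInner * dState * bytes;

    console.log(kvPerLayer / 2 ** 20);   // 512 MiB per layer at 32k context
    console.log(ssmPerLayer / 2 ** 20);  // 0.25 MiB per layer, constant

So KV caching itself works the same as always, but only the attention layers need it; the SSM layers contribute a small constant instead, which is where the long-context VRAM savings would come from.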
Maybe what they should do in the future is automatically provide AI reviews for all papers and state that the reviewers' job is to correct any problems and fill in details that were missed. That would encourage manual review of the AI's work, and would also let authors predict, in a structured way, what kind of feedback they'll get. (E.g., say the standard prompt used was made public: authors could then optimize their submission for the initial automatic review, forcing the human reviewer to fill in the gaps.)
ok of course the human reviewers could still use AI here but then so could the authors, ad infinitum..
A lot of "generative" work is like this. You can come up with benchmarks galore, but at the end of the day, how a model "feels" only comes out through actual usage. Just read /r/localllama for opinions on which models are "benchmaxed," as they put it: it seems to be common knowledge in the local LLM community that many models perform well on benchmarks without that reflecting how good they actually are.
In my case, I was until recently working on TTS, and this was a huge barrier for us. We used all the common signal-quality and MOS-simulation models that judge so-called "naturalness," "expressiveness," etc. But none of them really helped us decide when one model was better than another, or when a model was "good enough" for release. Our internal evaluations correlated poorly with them, and we even disagreed quite a bit within the team on the quality of the output. This made hyperparameter tuning, as well as commercial planning, extremely difficult, and we suffered greatly for it. (Notice my use of the past tense here..)
Having good metrics is really key, and I'm now at the point where I'd go as far as to say that if good metrics don't exist, it's almost not even worth working on something. (Almost.)
I would love to read more, but apart from not finding much time lately, when I do read, it's fiction. Occasionally I have read a textbook on a topic I'm really interested in, and I've read blogs and articles on various sciency themes, but when it comes to books, I have just never been very into non-fiction. I don't try often, but when I do, I get one or two chapters in and just... fail to pick it up again.
I know that non-fiction would be "good for me," particularly reading more on topics I'm less knowledgeable about, like finance, business, and politics. Personal growth. However, I do find that fiction expands my perspective and even, somehow, my knowledge, but it's different from non-fiction, less direct. I don't read for that explicitly, although I do like the effect. I read because... I guess because it's nice for my brain to be somewhere else. I don't know. But non-fiction has never done it for me; my mind just gets bored, I think, trying to absorb what someone else wants me to know. Even when I find the topic interesting.
I guess there are people who like non-fiction and people who like fiction, and they often cross over, but I think most people lean one way or the other. I can see there being positives and negatives to either side. People who read both equally must be rare? Or maybe that's just my impression.
I think this depends heavily on which non-fiction you're reading, particularly when contrasted with which fiction.
I don't think reading the same self-help books as a bunch of CEOs who see themselves as bold outsiders to the system will actually benefit you; it didn't make them self-aware.
Fiction contains information and ideas; it helps you expand your horizons, and that's generally a good thing. As long as you're not reading a very limited subset of fiction, it will be beneficial.
Reading science fiction has given me ideas I would never have had otherwise. I can comfortably say it has expanded my narrow mind. Even pulp space opera helped here!
Apart from that, taking the time to grok the architecture or the top-rated issues of open-source projects helps make you a better developer, or at least helps you avoid obvious mistakes when coding some new feature of your own.
It is a strange phenomenon, though, these walls of text that LLMs output, when you consider that one thing they're really good at is summarization, and that if they're trained on bug-report data, you'd think they would reproduce its style and conciseness.
Is it mainly post-training that causes this behaviour? They seem to do it for everything, as if they're heavily biased toward super-verbose output these days. Maybe it's something to do with reasoning models being trained to produce longer outputs?