lgessler's comments | Hacker News

Novels are fictional too. So long as they're not taken too literally, archetypes can be helpful mental prompts.


If you're really just doing traditional NER (identifying non-overlapping spans of tokens that refer to named entities), then you're probably better off using encoder-only (e.g. https://huggingface.co/dslim/bert-large-NER) or encoder-decoder (e.g. https://huggingface.co/dbmdz/t5-base-conll03-english) models. These models aren't making headlines anymore because they're not decoder-only, but for established NLP tasks like this that don't involve generation, I think there's still a place for them, and I'd expect that at equal parameter counts they significantly outperform decoder-only models at NER, depending on the nature of the dataset.
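
For what it's worth, a minimal sketch of this with the Hugging Face transformers pipeline looks something like the following (the example sentence is made up; "token-classification" and "aggregation_strategy" are just the standard pipeline knobs):

    # Minimal sketch: NER with an encoder-only model via the transformers pipeline
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="dslim/bert-large-NER",
        aggregation_strategy="simple",  # merge sub-word pieces into entity spans
    )

    print(ner("Barack Obama was born in Hawaii."))
    # -> a list of dicts like {"entity_group": "PER", "word": "Barack Obama", ...}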


I recommend having a look at 16.3 onward here if you're curious about this: https://web.stanford.edu/~jurafsky/slp3/16.pdf

I'm not familiar with Whisper in particular, but typically what happens in an ASR model is that the decoder, speaking loosely, sees "the future" (i.e. the audio after the chunk it's trying to decode) in a sentence like this, and also has the benefit of a language model guiding its decoding so that grammatical productions like "I like ice cream" are favored over "I like I scream".
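
If it helps to see the "language model guiding its decoding" part concretely, here's a rough sketch of LM rescoring (not what Whisper actually does internally, just the general idea): score candidate transcripts with a small causal LM and prefer the one it finds more probable.

    # Rough sketch: rescore candidate transcripts with a causal LM and keep the
    # one with the lowest per-token loss (i.e., the most "grammatical" one).
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def lm_loss(text):
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            return lm(ids, labels=ids).loss.item()  # mean cross-entropy per token

    candidates = ["I like ice cream", "I like I scream"]
    print(min(candidates, key=lm_loss))  # expect "I like ice cream"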


In my (poor) understanding, this can depend on hardware details. What are you running your models on? I haven't paid close attention to this with LLMs, but I've tried very hard to get non-deterministic behavior out of my training runs for other kinds of transformer models and was never able to on my 2080, 4090, or an A100. The PyTorch docs have a note saying that, in general, complete reproducibility can't be guaranteed: https://docs.pytorch.org/docs/stable/notes/randomness.html

Inference on a generic LLM may not be subject to these non-determinisms even on a GPU though, idk
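
For reference, the standard knobs from that PyTorch randomness page for forcing determinism where it's supported look roughly like this (a sketch; the seed value is arbitrary):

    # Standard settings for (approximately) deterministic PyTorch runs,
    # per the reproducibility notes linked above.
    import os, random
    import numpy as np
    import torch

    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required by some cuBLAS ops

    random.seed(0)
    np.random.seed(0)
    torch.manual_seed(0)  # also seeds CUDA RNGs on all devices

    torch.use_deterministic_algorithms(True)  # error out on nondeterministic ops
    torch.backends.cudnn.benchmark = False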


Ah. I've typically avoided CUDA except for a couple of really big jobs so I haven't noticed this.


Sure, this is a common sentiment, and one that works for some courses. But for others (introductory programming, say) I have a really hard time imagining an assignment that could not be one-shot by an LLM. What can someone with 2 weeks of Python experience do that an LLM couldn't? The other issue is that LLMs are, for now, periodically increasing in their capabilities, so it's anyone's guess whether this is actually a sustainable attitude on the scale of years.


I'm a professor at an R1 university teaching mostly graduate-level courses with substantive Python programming components.

On the one hand, I've caught some students red-handed (ChatGPT generated their exact solution and they were utterly unable to explain the advanced Python that was in their solution) and had to award them 0s for assignments, which was heartbreaking. On the other, I was pleasantly surprised to find that most of my students are not using AI to generate their programming assignment submissions wholesale--or at least, if they're doing so, they're putting in enough work to make it hard for me to tell, which is still something I'd count as work that gets them to think about code.

There is the more difficult matter, however, of using AI to work through small-scale problems, debug, or explain. On the view that it's kind of analogous to using StackOverflow, this semester I tried a generative AI policy where I give a high-level directive: you may use LLMs to debug or critique your code, but not to write new code. My motivation was that students are going to be using this tech anyway, so I might as well ask them to do it in a way that's as constructive for their learning process as possible. (And I explained exactly this motivation when introducing the policy, hoping that they would be invested enough in their own learning process to hear me.) While I still do end up getting code turned in that is "student-grade" enough that I'm fairly sure an LLM couldn't have generated it directly, I do wonder what the reality of how they really use these models is. And even if they followed the policy perfectly, it's unclear to me whether the learning experience was degraded by always having an easy and correct answer to any problem just a browser tab away.

Looking to the future, I admit I'm still a bit of an AI doomer when it comes to what it's going to do to the median person's cognitive faculties. The most able LLM users engage with them in a way that enhances rather than diminishes their unaided mind. But from what I've seen, the more average user tends to want to outsource thinking to the LLM in order to expend as little mental energy as possible. Will AI be so good in 10 years that most people won't need to really understand code with their unaided mind anymore? Maybe, I don't know. But in the short term I know it's very important, and I don't see how students can develop that skill if they're using LLMs as a constant crutch. I've often wondered if this is like what happened when writing was introduced, and capacity for memorization diminished as it became no longer necessary to memorize epic poetry and so on.

I typically have term projects as the centerpiece of the student's grade in my courses, but next year I think I'm going to start administering in-person midterms, as I fear that students might never internalize fundamentals otherwise.


> had to award them 0s for assignments, which was heartbreaking

You should feel nothing. They knew they were cheating. They didn't give a crap about you.

Frankly, I would love to have people failing assignments they can't explain even if they did NOT use "AI" to cheat on them. We don't need more meaningless degrees. Make the grades and the degrees mean something, somehow.


> > had to award them 0s for assignments, which was heartbreaking

> You should feel nothing. They knew they were cheating. They didn't give a crap about you.

Most of us (a) don't feel our students owe us anything personally and (b) want our students to succeed. So it's upsetting to see students pluck the low-hanging, easily picked fruit of cheating via LLMs. If cheating were harder, some of these students wouldn't cheat. Some certainly would. Others would do poorly.

But regardless, failing students and citing them for plagiarism feels bad, even though basically all of us would agree on the importance and value of upholding standards and enforcing principles of honesty and integrity.


> Fermentation is completely stopped in a regular fridge, you need higher temperature for fermentation.

My understanding as a hobbyist brewer and fermenter is that this is not true. Fermentation is greatly slowed at lower temperatures, but it doesn't stop entirely above freezing. Lager beers, for example, go from sugary wort to beer at around 35F. And kimchi matures at fridge temperatures in ways that I'm pretty sure are caused by fermentation.


I think this is both a harmful and irrational attitude. Why focus on trivial mechanical errors and disparage the authors for them instead of what's much more important, i.e., the substance of the work? And in dismissing work for such trivial reasons, you risk ignoring things you might have otherwise found interesting.

In an ideal world would second-language speakers of English proofread assiduously? Of course, yes. But time is finite, and in cases like this, so long as a threshold of comprehensibility is cleared, I always give the benefit of the doubt to the authors and surmise that they spent their limited resources focusing on what's more important. (I'd have a much different opinion if this were marketing copy instead of a research paper, of course.)


>in dismissing work for such trivial reasons, you risk ignoring things you might have otherwise found interesting

Not dismissing work for trivially avoidable mistakes risks wasting your precious, limited lifespan investing effort into nonsense. These signals are useful and important. If they couldn't be bothered to proofread, what else couldn't they be bothered to do?

>spent their limited resources focusing on what's more important

Showing that you give a crap is important, and it takes seconds to run through a spell checker.


Well, it's not exactly a research paper, more an overview of the problem and suggested techniques, but it'd still be interesting to hear some criticism based on the content rather than the (admittedly odd) failure to run it through a spell checker. I do wonder why it was written in English, apparently targeting a western audience.

Two of the authors are from "Shanghai AI Labs" rather than being students, so one might hope it had at least been proofread and passed some sort of muster.


With all respect and love to the OP, I must admit that I laughed out loud when I saw the AWS architectural diagram and wondered whether this might be a joke. Personally, I'd have implemented this as a few dozen lines of Python living as a cron job (or even as a long-lived process with a schedule), but I'm no pedigreed engineer.
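
To be concrete, something in the spirit of this sketch is what I have in mind; the job body is a hypothetical placeholder, since I don't know the details of OP's pipeline:

    # Sketch of the "long-lived process with a schedule" version. The job body
    # is a placeholder (e.g. fetch data, call the GPT API, send an email).
    import time
    from datetime import datetime, timedelta

    def run_job():
        print(f"[{datetime.now()}] running the daily job")

    def next_run(now, hour=8):
        # next 8:00 AM local time strictly after `now`
        run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        return run if run > now else run + timedelta(days=1)

    while True:
        time.sleep(max(0.0, (next_run(datetime.now()) - datetime.now()).total_seconds()))
        run_job()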


Fair enough! As mentioned earlier, one reason I used AWS/Terraform is for personal learning. It may not be the most efficient approach, but I built it this way because it was the most enjoyable for me. :)


I do the same on my personal projects. Big over-engineered projects for learning purposes :-)


If you're using Terraform on AWS as a learning experience I hope you're using a pre-paid card.


> With all respect and love to the OP, I must admit that I laughed out loud when I saw the AWS architectural diagram

OP actually did it more efficiently than most! You should see the AWS suggested architecture. It uses something like 10 different AWS services.

My company actually set out to solve this very problem. We offer cloud cron hosting that's more reliable than the AWS architecture but requires just a few lines of code. Literally, this is all you have to do:

    @DBOS.scheduled('* * * * *')
    @DBOS.workflow()
    def example_scheduled_workflow(scheduled_time: datetime, actual_time: datetime):
        DBOS.logger.info("I am a workflow scheduled to run once a minute.")

https://github.com/dbos-inc/dbos-demo-apps/blob/main/python/...


I think this is where Cloudflare shines. They just focussed on the essentials with Workers (“serverless”) at the core of everything instead of VPS at the core of everything.


Yes, DBOS has a similar philosophy. Strip away all the hard and annoying parts and let you just code. Our other philosophy is "just do it in Postgres". :)

FWIW you can't really do the same thing on Cloudflare Workers -- their crons are "best effort", and you'd still need to get storage somewhere else. With DBOS the storage is built right in.


Cloudflare Durable Objects have alarms you can use to imitate cron, and have storage built-in (there's even support for SQLite databases attached to DOs in beta)


You’re not kidding about AWS’s own architecture diagrams.


Although if you drew that out you'd have about the same.

Cron trigger.

Process.

GPT API.

Database for persistence.

Email sender.

Which part of that wouldn't you have?


This is a great fit for Google Apps Script.


Who likes to learn a niche scripting language that only works on one platform?


It's ordinary JavaScript. It interacts with many Google services, including Gmail, without you having to maintain servers or set up authentication. It's perfect for small glue tasks like sending yourself emails or anything that interacts with Sheets. You wouldn't use it if you weren't trying to use a Google service.


I think GP is saying that just lines on the same plot would have been less deceptive, whereas the plot that's actually there has fake data points. Readers are used to the former being purely illustrative, whereas the latter almost always means there's real measurement involved.

