I feel like the "that's just a few coffees" metric is getting out of hand. By this metric, my current work laptop, purchased used from a local reseller, was "a few coffees".
Also, I'm surprised how often on here I see people argue about price differences that are literally as much as I spend on entire computers.
I don't have a really special use case; I just use it whenever I think it will give better results than googling or thinking, or when I don't feel like getting annoyed by cookie popups.
And I don't think GPT-3 was the best, but it felt like it actually listened.
Now I tell it: "You did this and this wrong, I specifically told you the exact opposite. Can you please do what I asked?"
And then it says something like: "Oh yes, my bad, you are right and very very smart to have caught that, you must be a super genius. I will now do what you asked."
Then it does the same wrong thing again. And again, and again.
I ask it to fix a mistake, it tells me it fixed it, and gives back exactly the same thing with more errors.
It also feels like it forgets mid-conversation way faster than it used to.
> I ask it to fix a mistake, it tells me it fixed it, and gives back exactly the same thing with more errors.
> It also feels like it forgets mid-conversation way faster than it used to.
Hmm, I don't observe this. Hard to say.
You probably know this already, but be sure not to reuse an AI conversation across different contexts (having a single chat for both cooking and coding is a no-no). Starting a new chat is often better.
If it forgets what you said, it sounds a bit like you use one chat for too long, or you use too small a model (fast, air, haiku, nano, etc.).
...you sound like a typical Opus person :P Just use Anthropic's flagships if you want good instruction following, focus over long convos, and proper handling of corrections when it gets something wrong.
Holy moly, I made a simple coding prompt and the amount of reasoning output could fill a small book.
> create a single html file with a voxel car that drives in a circle.
Compared to GLM 4.7 / 5 and Kimi 2.5 it took a while. The output was fast, but because it wrote so much I had to wait longer. The output was also more bare-bones compared to the others.
That's been my experience as well. Huge amounts of reasoning. The model itself is good, but even if it gives you tokens twice as fast as another model, the added amount of reasoning may make it slower in the end.
It's a massive accelerator for my dumb and small hobby projects. If they take too long I tend to give up and do something else.
Recently I designed and 3D printed a case for a Raspberry Pi, some encoders and buttons, and a touchscreen, just to control a 500 EUR audio effects pedal (Eventide H9).
The official Android app was one of the worst apps I have ever used. They even blocked paste on the login screen...
Few people have this FX box, and even fewer would need my custom controller for it; I built it for an audience of one. But thanks to LLMs it was not that big of a deal. It allowed me to concentrate on what was fun.
In the thinking section it didn't really register the car and washing the car as being necessary, it solely focused on the efficiency of walking vs driving and the distance.
When most people refer to “GLM” they refer to the mainline model. The difference in scale between GLM 5 and GLM 4.7 Flash is enormous: one runs acceptably on a phone, the other on $100k+ hardware minimum. While GLM 4.7 Flash is a gift to the local LLM crowd, it is nowhere near as capable as its bigger sibling in use cases beyond typical chat.
What's the use case for Zai/GLM? I'm currently on Claude Pro, and Zai looks about 50% more expensive after the first 3 months, and according to their chart GLM 4.7 is not quite as capable as Opus 4.5?
I'm looking to save on costs because I use it so infrequently, but PAYG seems like it'd cost me more in a single session per month than the monthly cost plan.
If you pay for the whole year, GLM4.7 is only $7/mo for the first year. And until a few days ago, they had a fantastic deal that ran for almost 2 months where it was less than $3/mo for the first year. I grabbed it, and have been using it exclusively for personal coding since. It's good enough for me.
The other claimed benefit is a higher quota of tokens.
It's cheap :) It seems they stopped it now, but for the last 2 months you could buy the Lite plan for a whole year for under 30 USD, while Claude is ~19 USD per month. I bought 3 months for ~9 USD.
I use it for hobby projects. Casual coding with Open Code.
If price is not important Opus / Codex are just plain better.
> The Lite / Pro plan currently does not include GLM-5 quota (we will gradually expand the scope and strive to enable more users to experience and use GLM-5). If you call GLM-5 under the plan endpoints, an error will be returned. If you still wish to experience GLM-5 at this stage and are willing to pay according to the Pricing, you can call it through the General API endpoint (i.e., https://api.z.ai/api/paas/v4/chat/completions), with the deduction priority being [Platform Credits - Account Balance] in sequence.
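For reference, a call to that General API endpoint might look like the sketch below. This is a minimal, hypothetical example assuming an OpenAI-style chat-completions payload; the model name "glm-5", the `ZAI_API_KEY` environment variable, and the bearer-token header are all assumptions, not taken from their docs.

```python
# Hypothetical sketch: calling GLM-5 via the General API endpoint quoted
# above. Assumes an OpenAI-compatible chat-completions payload; the model
# name "glm-5" and the auth header format are assumptions.
import json
import os
import urllib.request

ENDPOINT = "https://api.z.ai/api/paas/v4/chat/completions"

def build_request(prompt, api_key, model="glm-5"):
    """Build (but do not send) a chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

if __name__ == "__main__":
    # Only sends if a key is configured; per the quote above, deductions
    # follow [Platform Credits -> Account Balance] in sequence.
    key = os.environ.get("ZAI_API_KEY", "")
    req = build_request("hello", key)
    print(req.full_url)
    if key:
        with urllib.request.urlopen(req) as resp:
            body = json.loads(resp.read())
            print(body["choices"][0]["message"]["content"])
```

Since the plan endpoints return an error for GLM-5, the point is just that you swap the base URL to the general one and pay per token from credits/balance instead of the subscription quota.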
Cool, thanks. Did you try it out, how's the performance? I saw on openrouter that the stealth model was served at ~19t/s. Is it any better on their endpoints?
When someone says he drank a few coffees, I would never have guessed it was 32.