Hacker News | thesurlydev's comments

Can you share how you're running it?

I've been using it with opencode. You can use either your Kimi Code subscription (flat fee), a moonshot.ai API key (per token), or OpenRouter to access it. OpenCode works beautifully with the model.

Edit: as a side note, I only installed opencode to try this model and I gotta say it is pretty good. I didn't think it'd be as good as Claude Code, but it's just fine. Been using it with Codex too.


I tried to use opencode for Kimi K2.5 too, but recently they changed their pricing from 200 tool requests per 5 hours to token-based pricing.

I can only speak to the tool-request-based pricing, but anecdotally opencode took something like 10 requests in 3-4 minutes where the Kimi CLI took 2-3.

So I personally like and stick with the Kimi CLI for Kimi coding. I haven't tested opencode again under the new token-based pricing, but I do suspect opencode might chew through more tokens.

Kimi CLI's pretty good too imo. You should check it out!

https://github.com/MoonshotAI/kimi-cli


I like Kimi-cli but it does leak memory.

I was using it for multi-hour tasks scripted via a self-written orchestrator on a small VM and ended up switching away from it because it would run slower and slower over time.


Running it via https://platform.moonshot.ai -- using OpenCode. They have super cheap monthly plans at kimi.com too, but I'm not using it because I already have codex and claude monthly plans.
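If you go the per-token route, the platform.moonshot.ai API is OpenAI-compatible, so a standard client works outside of OpenCode too. A minimal Python sketch, assuming the OpenAI-compatible endpoint at api.moonshot.ai; the K2.5 model id below is a placeholder, so check the console for the exact name:

  # Hedged sketch: calling Kimi through Moonshot's OpenAI-compatible API.
  # The model id is a placeholder -- look up the exact K2.5 identifier.
  from openai import OpenAI

  client = OpenAI(
      api_key="sk-...",                       # platform.moonshot.ai API key
      base_url="https://api.moonshot.ai/v1",  # OpenAI-compatible endpoint
  )

  resp = client.chat.completions.create(
      model="kimi-k2.5",  # placeholder model id
      messages=[{"role": "user", "content": "Summarize this repo's README."}],
  )
  print(resp.choices[0].message.content)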

Where? https://www.kimi.com/code starts at $19/month, which is the same as the big boys.

So there's a free plan at moonshot.ai that gives you some number of tokens without paying?

> Can you share how you're running it?

Not OP, but I've been running it through Kagi [1]. Their AI offering is probably the best-kept secret in the market.

[1] https://help.kagi.com/kagi/ai/assistant.html


Doesn't list Kimi 2.5 and seems to be chat-only, not API, correct?

> Doesn't list Kimi 2.5 and seems to be chat-only, not API, correct?

Yes, it is chat only, but that list is out of date: Kimi 2.5 (with or without reasoning) is available, as are ChatGPT 5.2, Gemini 3 Pro (Preview), etc.



To save everyone a click:

> The 1.8-bit (UD-TQ1_0) quant will run on a single 24GB GPU if you offload all MoE layers to system RAM (or a fast SSD). With ~256GB RAM, expect ~10 tokens/s. The full Kimi K2.5 model is 630GB and typically requires at least 4× H200 GPUs. If the model fits, you will get >40 tokens/s when using a B200. To run the model in near full precision, you can use the 4-bit or 5-bit quants. You can use any higher just to be safe. For strong performance, aim for >240GB of unified memory (or combined RAM+VRAM) to reach 10+ tokens/s. If you’re below that, it'll work but speed will drop (llama.cpp can still run via mmap/disk offload) and may fall from ~10 tokens/s to <2 token/s. We recommend UD-Q2_K_XL (375GB) as a good size/quality balance. Best rule of thumb: RAM+VRAM ≈ the quant size; otherwise it’ll still work, just slower due to offloading.
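The closing rule of thumb is easy to sanity-check before downloading anything: compare the quant file size against your combined RAM+VRAM. A tiny Python sketch using only the numbers quoted above:

  # Back-of-the-envelope check for the rule of thumb above:
  # RAM + VRAM should roughly cover the quant size; if not, llama.cpp still
  # runs via mmap/disk offload but throughput drops sharply.
  def expected_speed(quant_size_gb: float, ram_gb: float, vram_gb: float) -> str:
      if ram_gb + vram_gb >= quant_size_gb:
          return "fits in RAM+VRAM: roughly 10+ tokens/s"
      return "needs disk offload: may drop to <2 tokens/s"

  # UD-Q2_K_XL is quoted at 375GB above.
  print(expected_speed(375, ram_gb=256, vram_gb=24))  # short of 375GB -> offload
  print(expected_speed(375, ram_gb=384, vram_gb=48))  # covers it -> ~10 tok/s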


I'm running the Q4_K_M quant on a Xeon with 7x A4000s and I'm getting about 8 tok/s with small context (16k). I need to do more tuning, and I think I can get more out of it, but it's never gonna be fast on this suboptimal machine.

You can add one more GPU so you can take advantage of tensor parallelism. I get the same speed with 5 3090s with most of the model in 2400 MHz DDR4 RAM, 8.5 tok/s almost constant. I don't really do agents, just chat, and it holds up to 64k.

That is a very good point and I would love to do it, but I built this machine in a desktop case and the motherboard has seven slots. I did a custom water cooling manifold just to make it work with all the cards.

I'm trying to figure out how to add another card on a riser hanging off a SlimSAS port, or maybe I could turn the bottom slot into two vertical slots. The case (Fractal Meshify 2 XL) has room for a vertically mounted card that wouldn't interfere with the others, but I'd need to make a custom riser with two slots on it to make it work. I dunno, it's possible!

I also have an RTX Pro 6000 Blackwell and an RTX 5000 Ada. I'd be better off pulling all the A4000s and throwing both of those cards in this machine, but then I wouldn't have anything for my desktop. Decisions, decisions!


The pitiful state of GPUs. $10K for a sloth with no memory.

Been using K2.5 Thinking via Nano-GPT subscription and `nanocode run` and it's working quite nicely. No issues with Tool Calling so far.

Yeah, I too am curious. Claude Code is so good, and the ecosystem so "it just works", that I'm willing to pay them.

I tried Kimi K2.5 and at first I didn't really like it. I was critical of it, but then I started liking it. The model has also kind of replaced how I use ChatGPT, and I really love Kimi 2.5 the most right now (although Gemini models come close too).

To be honest, I do feel like Kimi K2.5 is the best open-source model. It's not the best model overall right now, though, but it's really price-performant and might be a good fit for many use cases.

It might not be the complete SOTA that people say, but it comes pretty close, and it's open source. I trust the open-source part because other providers can also run it, among a lot of other things (also considering that, iirc, ChatGPT recently dropped some old models).

I really appreciate Kimi for still open-sourcing their full SOTA models and then releasing research papers on top of them, unlike Qwen, which has kept its best model closed source.

Thank you Kimi!


You can plug another model in place of Anthropic ones in Claude Code.

That tends to work quite poorly because Claude Code does not use standard completions APIs. I tried it with Kimi, using litellm[proxy], and it failed in too many places.

You can try Kimi's Anthropic-compatible API.

Just connect Claude Code to Kimi's API endpoint and everything works well.

https://www.kimi.com/code/docs/en/more/third-party-agents.ht...
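Claude Code typically picks up a custom endpoint from environment variables (ANTHROPIC_BASE_URL plus an API key/auth token); the linked docs cover the exact values. The same Anthropic-compatible endpoint can also be exercised directly from the Anthropic SDK. A sketch, where both the base URL and the model id are assumptions to confirm against those docs:

  # Hedged sketch: Kimi's Anthropic-compatible endpoint via the Anthropic SDK.
  # The base URL and model id are assumptions -- confirm both in the docs above.
  import anthropic

  client = anthropic.Anthropic(
      api_key="sk-...",                              # Kimi/Moonshot key
      base_url="https://api.moonshot.ai/anthropic",  # assumed compatible base
  )

  msg = client.messages.create(
      model="kimi-k2.5",  # placeholder model id
      max_tokens=1024,
      messages=[{"role": "user", "content": "Refactor this function for clarity."}],
  )
  print(msg.content[0].text)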


It worked very well for me using Qwen3 Coder behind a LiteLLM proxy. Most other models just fail in weird ways though.

opencode is a good alternative that doesn't flake out in this way.

If you don't use Anthropic models there's no reason to use Claude Code at all. OpenCode gives so much more choice.

For signal-to-noise reasons, I start with Claude Code reviewing a PR. Then I selectively choose what I want to bubble up to the actual review. Oftentimes there's additional context not available to the model, or it's just nitpicky.

Wait, so you have the LLM review, then you review (selectively choose) the proposed review, then you (or the LLM?) review the reviewed review? But oftentimes (so, the majority?) the initial LLM review is useless, so you're reviewing reviews that won't pass review...

Sounds incredibly pointless. But at least you're spending those tokens your boss was forced to buy so the board can tell the investors that they've jumped on the bandwagon, hooray!


Pretty cool and related to another line of work I'm following from Steve Yegge: https://medium.com/@steve-yegge/welcome-to-gas-town-4f25ee16...


Supabase seems to be killing it. I read somewhere they are used by ~70% of Y Combinator startups. I wonder how many of those eventually move to self-hosting.


I had a lot of fun reading the articles about Gas Town, although I started to lose track of the odd naming. Odd only because the names make sense to Steve and others who have seen the Mad Max and Waterworld movies.

I promptly gave Claude the text of the articles and had it rewrite them using idiomatic distributed-systems naming.

Fun times!


Care to share that with the rest of the class? I'd love to hear what those idiomatic distributed systems names are!


Ran it through ChatGPT:

  Town            = Central orchestrator / control plane
  Rig             = Project or workspace namespace
  Polecat         = Ephemeral worker job
  Refinery        = Merge queue manager
  Witness         = Worker health monitor
  Crew            = Persistent worker pool
  Beads           = Persistent work items / tasks
  Hooks           = Work queues / task slots
  GUPP            = Work processing guarantee
  Molecules/Wisps = Structured, persistent workflows
  Convoys         = Grouped feature work units
https://chatgpt.com/share/695c6216-e7a4-800d-b83d-fc1a22fd8a...


Thank you! Is this the future? Everyone gets to have their own cutesy translation of everything? I could want "kubectl apply" to have a Tron theme while my coworker wants a Disney theme. Is the runbook going to be in Klingon if I'm fluent in that?


I hope not. Homebrew is a great example of why boring tools shouldn't invent quirky terminology.


Before I clicked on this I was optimistic and thought this was going to be about how we've turned a corner and the web stack pendulum is now swinging back to the easier days before frontend frameworks.


Same! Right there with "every day must begin with coffee"


A web app platform written in Rust with the primary focus on zero-dependency apps, using Pingora as a forward and reverse proxy. Targeting Hetzner for hosting and Cloudflare for DNS. I love Rust but don't like the long compile times, which led me down this rabbit hole (zero dependencies make for fast compiles).


For a while, the O'Reilly subscription was included in the $99/yr ACM membership. Then they stopped offering O'Reilly for a bit. Then they brought it back as part of the $75 skills add-on.

I feel like this (the discount via ACM) is a little-known secret that more folks should know about. Hopefully this post helps spread the word.


Immediately read this as "prostate" and proceeded to spit out my coffee. Carry on

