
This is something I have found missing in my current workflow when reviewing PRs, particularly in the age of large AI-generated PRs.

I think most reviewers do this to some degree by looking at points of interest. It'd be cool if this could look at your prior reviews and try to learn your style.

Is this the correct commit to look at? https://github.com/manaflow-ai/cmux/commit/661ea617d7b1fd392...


https://github.com/manaflow-ai/cmux/blob/main/apps/www/lib/s...

This file has most of the logic, the commit you linked to has a bunch of other experiments.

> look at your prior reviews and try to learn your style.

We're really interested in this direction too. One idea is setting up a DSPy system that automatically fits reviews to your preferences.
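
A rough sketch of what that might look like (the signature, fields, and model id here are hypothetical, not our actual implementation):

    import dspy

    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported model id

    class FitReviewStyle(dspy.Signature):
        """Flag the hunks in a diff this reviewer would most likely comment on."""
        diff: str = dspy.InputField(desc="unified diff of the PR")
        past_reviews: list[str] = dspy.InputField(desc="the reviewer's prior comments")
        flagged_hunks: list[str] = dspy.OutputField()
        rationale: str = dspy.OutputField()

    reviewer = dspy.ChainOfThought(FitReviewStyle)
    result = reviewer(
        diff="--- a/app.py\n+++ b/app.py\n...",        # placeholder diff
        past_reviews=["nit: prefer early returns",     # placeholder history
                      "watch for N+1 queries here"],
    )
    print(result.flagged_hunks, result.rationale)

The draw of DSPy would be optimizing that prompt against a labeled set of your past reviews (e.g., with dspy.BootstrapFewShot) rather than hand-tuning it.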


Thank you. This is a pretty cool feature that is just scratching the surface of a deep need, so keep at it.

Another perspective where this exact feature would be useful is in security review.

For example, there are many static security analyzers that look for patterns; they're useful when you break a well-known, clearly predefined rule.

However, there are situations that static tools miss, where a highlight tool like this could help bring a reviewer's eyes to a high-risk "area", e.g. "scrutinize this code more closely because it handles user input and there's a chance of SQL injection here", etc.
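
For instance, a sketch of how such a pass might be prompted (the risk categories and JSON shape are just illustrative assumptions):

    import json
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    PROMPT = (
        "You are assisting a security review. For the diff below, list the "
        "hunks a human should scrutinize most, each with a risk category "
        "(e.g. injection, authz, deserialization) and a one-line reason. "
        'Respond with JSON: {"findings": [{"hunk": ..., "risk": ..., "reason": ...}]}'
    )

    def highlight_risks(diff: str) -> list[dict]:
        resp = client.chat.completions.create(
            model="gpt-4o",
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": PROMPT},
                {"role": "user", "content": diff},
            ],
        )
        return json.loads(resp.choices[0].message.content)["findings"]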

I think that would be very useful as well.


This is a very interesting idea that we’ll definitely look into.


Looks pretty neat, and certainly addresses a missing element in the current AI workflow.

Question: What happens to our data - i.e. the code and context sent to your service?


We log basic request metadata (timestamps, model used, token counts). Prompts and messages are not logged unless you explicitly opt in. We don't store tool results. Note that the underlying model provider you use may store data separately, depending on your agreement with them.


Great stuff! You're missing a few bass drum notes in "When the Levee Breaks"


You can click on the pattern, then click on the link below "Create a copy" and add missing bass drums. It's like forking a drum pattern lol.


Is there a link to the contract somewhere?



The security section is good to see. Thanks for that!


Funny that you mention that. I honestly thought not too many people would care about it.

Though I'm by no means a security specialist. Please let me know where I can improve the section!


I've used LLMs in this capacity, and it's awesome. It quickly becomes a crutch.


Last year, I built a DM for myself using the OpenAI API and an ElevenLabs voice generator. I asked it to set my character in Baldur's Gate so it could draw on the huge amount of DnD source material it had been trained on.
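
The core loop was roughly this shape (a simplified sketch, not my exact code; the model name, voice ID, and API key are placeholders):

    import requests
    from openai import OpenAI

    client = OpenAI()
    history = [{"role": "system", "content":
                "You are a DnD dungeon master. The setting is Baldur's Gate."}]

    def dm_turn(player_input: str) -> bytes:
        history.append({"role": "user", "content": player_input})
        reply = client.chat.completions.create(model="gpt-4o", messages=history)
        text = reply.choices[0].message.content
        history.append({"role": "assistant", "content": text})
        # ElevenLabs text-to-speech: returns MP3 bytes to play back
        audio = requests.post(
            "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID",  # placeholder voice id
            headers={"xi-api-key": "ELEVENLABS_API_KEY"},            # placeholder key
            json={"text": text},
        )
        return audio.content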

A few takeaways:

1. An LLM based DM can give the player essentially infinite richness and description on anything they ask for.

2. There is difficulty in setting the rules for the LLM to follow that match the DnD rulebook. But this is possible to solve for. Also, I found the LLM to be too pliable as a DM. I kept getting my way, or getting my hand held through scenarios. Maybe this is a feature?

3. My conversation quickly began to approach the context window for the LLM and some RAG engineering is very necessary to keep the LLM informed about the key parts of your history.

4. Most importantly, I found that I most enjoy the human connection that I get through DnD and an LLM with a voice doesn't really satisfy that.


> Also, I found the LLM to be too pliable as a DM. I kept getting my way, or getting my hand held through scenarios. Maybe this is a feature?

LLMs are fine-tuned to be "helpful assistants", so they're basically sycophantic.


This was my experience too. The short context and the optimism bias make ChatGPT the wrong solution.

It starts well and then NPCs become inconsistent and the DM basically lets you craft the story by constantly doing a "yes and".

It becomes boring because the stakes feel so low.


> My conversation quickly began to approach the context window for the LLM and some RAG engineering is very necessary to keep the LLM informed about the key parts of your history

Assuming we're talking about GPT-4o, that 128k context window theoretically corresponds to somewhere around 73,000 words. People talk at around 100 words per minute in conversation, so that would be about 730 minutes of context, or about 12 hours. The Gemini models can do up to 2 million tokens of context... which we could extrapolate to 11,400 minutes of context (190 hours), which might be enough?
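
The back-of-the-envelope math, for reference:

    # Reproducing the estimate above: ~73k words per 128k tokens, 100 wpm speech.
    words_per_token = 73_000 / 128_000

    def hours_of_context(tokens: int) -> float:
        return tokens * words_per_token / 100 / 60  # words -> minutes -> hours

    print(hours_of_context(128_000))    # ~12.2 hours (GPT-4o)
    print(hours_of_context(2_000_000))  # ~190 hours (Gemini)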

I would say GPT-4o was only good up to about 64k tokens the last time I really tested large context stuff, so let's call that 6 hours of context. In my experience, Gemini's massive context windows are actually able to retain a lot of information... it's not like there's only 64k usable or something. Google has some kind of secret sauce there.

One could imagine architecting the app to use Gemini's Context Caching[0] to keep response times low, since it wouldn't need to re-process the entire session for every response. The application would just spin up a new context cache in the background every 10 minutes or so and delete the old one, reducing the amount of recent conversation that would have to be re-processed each time to generate a response.
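
A sketch of that with the google-generativeai SDK (the refresh cadence and transcript variables are illustrative):

    import datetime
    import google.generativeai as genai
    from google.generativeai import caching

    genai.configure(api_key="GEMINI_API_KEY")  # placeholder

    session_so_far = "..."  # everything but the last ~10 minutes (placeholder)
    recent_tail = "..."     # the uncached recent conversation (placeholder)

    # Cache the bulk of the session; rebuild this in the background every ~10 minutes.
    cache = caching.CachedContent.create(
        model="models/gemini-1.5-pro-001",
        system_instruction="You are the dungeon master for this campaign.",
        contents=[session_so_far],
        ttl=datetime.timedelta(minutes=15),
    )
    model = genai.GenerativeModel.from_cached_content(cached_content=cache)

    # Each turn only re-processes the short recent tail.
    response = model.generate_content(recent_tail)
    print(response.text)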

I've just never seen RAG work particularly well... and fitting everything into the context is very nice by comparison.

But one alternative to RAG would be a form of context compression: you could give the LLM several tools/functions for managing its own context. The LLM would be instructed to use these tools to record (and update) the names and details of the characters, places, and items the players encounter, the important events that have occurred during the game, and who the current players are, along with their items and abilities. This "memory" would then be provided in the context in place of a complete conversational record, alongside just the most recent 15 or 30 minutes of conversation.
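
In, e.g., the OpenAI function-calling format, such a memory tool might look like this (hypothetical schema):

    # Hypothetical tool schema for LLM-managed campaign memory.
    memory_tools = [{
        "type": "function",
        "function": {
            "name": "upsert_memory",
            "description": "Record or update a fact about a character, place, "
                           "item, player, or event for future turns.",
            "parameters": {
                "type": "object",
                "properties": {
                    "kind": {"type": "string",
                             "enum": ["character", "place", "item", "player", "event"]},
                    "name": {"type": "string"},
                    "facts": {"type": "string",
                              "description": "Current state; replaces any prior entry."},
                },
                "required": ["kind", "name", "facts"],
            },
        },
    }]
    # Each turn: send the memory store plus the last ~15-30 minutes of transcript,
    # and apply any upsert_memory calls to a dict keyed by (kind, name).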

> I found the LLM to be too pliable as a DM.

I haven't tried using an LLM as a DM, but in my experience, GPT-4o is happy to hold its ground on things. This isn't like the GPT-3.5 days where it was a total pushover for anything and everything. I believe the big Gemini models are also stronger than the old models used to be in this regard. Maybe you just need a stricter prompt for the LLM that tells it how to behave?

I also think the new trend of "reasoning" models could be very interesting for use cases like this. The model could try to (privately) develop a more cohesive picture of the situation before responding to new developments. You could already do this to some extent by making multiple calls to the LLM, one for the LLM to "think", and then another for the LLM to provide a response that would actually go to the players.
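
The two-call version could be as simple as this (prompts are illustrative):

    from openai import OpenAI

    client = OpenAI()

    def dm_respond(history: list[dict]) -> str:
        # Pass 1: private reasoning, never shown to the players.
        plan = client.chat.completions.create(
            model="gpt-4o",
            messages=history + [{"role": "user", "content":
                "Privately assess the current situation and plan the next scene."}],
        ).choices[0].message.content
        # Pass 2: the narration the players actually hear, conditioned on the plan.
        return client.chat.completions.create(
            model="gpt-4o",
            messages=history + [{"role": "system", "content":
                f"Your hidden notes: {plan}\nNarrate the next beat for the players."}],
        ).choices[0].message.content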

One could also imagine giving the LLM access to other functions that it could call, such as the ability to play music and sound effects from a pre-defined library of sounds, or to roll the dice using an external random number generator.

> 4. Most importantly, I found that I most enjoy the human connection that I get through DnD and an LLM with a voice doesn't really satisfy that.

Sure, maybe it's not something people actually want... who knows. But, I think it looks pretty fun.[1]

One of the harder things with this would be helping the LLM learn when to speak and when to just let the players talk amongst themselves. A simple solution could just be to have a button that the players can press when they want, which will then trigger the LLM to respond to what's been recently said, but it would be cool to just have a natural flow.

[0]: https://ai.google.dev/gemini-api/docs/caching

[1]: https://www.youtube.com/watch?v=9oBdLUEayGI


I did something similar, but tried to get several agents to play a DnD round together. It basically worked, but was insipid.


I come back to it from time to time and play for a week. It looks like it's been getting more accessible over time, with a new UI update having been rolled out recently.

There's definitely a certain mindset that helps make it more enjoyable. Imagination being key! From the technical side, the developers have peeled back the curtain on how they've created parts of the game:

https://www.youtube.com/watch?v=U03XXzcThGU&ab_channel=Logo%...

https://www.youtube.com/watch?v=jV-DZqdKlnE&ab_channel=GDC20...


Can someone familiar with South Korean politics give us some context for what is happening here?


1. Very low trust in politicians after it was revealed that former president Park Geun-hye was secretly a crypto-Christian cult member despite claiming to be an atheist.

2. South Korea has a bizarre large-scale "gender war" going on that extends into mainstream society. Imagine the Western online MRA/redpill/incel vs. radfem circles, but as core identities in national politics.

For some reason a high-trust society has decided to become an ultra-low-trust society where trust is being eradicated all the way down to the nuclear family.


Re: 2. The "gender war" is greatly exaggerated and heavily astroturfed. Marriage rates have been dropping for decades because Koreans in their 20s and 30s cannot meet the economic expectations of their parents. (Korean norms require a condo before marriage, while the going rate for condos is 30x the median salary. Young people usually start their careers at below-median salaries.)

The way the "gender war" appeared was that Yoon was more popular among men, and this was reported in the international news, then Korean news reporters reported on the international news, legitimizing the story of a gender gap.

This primed Korean journalists to look for further signs of conflict between the genders, which were then amplified out of proportion by international journalists looking for a story. Korean journalists see the international stories as more trustworthy, and now they report as if there is a gender war.

There is a heavy selection bias among journalists to look for spicy gender stories, where the actual participants are the fringe of an online "movement". The Korean press club doesn't seem to understand or account for these biases. In real life there isn't much "war".


crypto-Christian cult

Had to look that up and wasn’t disappointed…


Seemingly nothing to do with Bitcoin either :-)


I was disappointed b/c it was not clear. OP could have instead stated:

"...former president Park Geun-hye was secretly a religious cult member."

which is shorter and clearer.


But "crypto" is Greek, everybody loves subrosa communication.


> Imagine the Western online MRA/redpill/incel vs. radfem circles but as core identities in national politics.

I have zero idea about SK, but... "woke vs. not woke" has very much become a core identity in Western politics. The last US election proved that, and what's going on here in Germany, especially with Markus Söder, isn't funny anymore either [1]; we've got elections looming in about three months.

[1] https://www.zeit.de/politik/deutschland/2024-11/markus-soede...


Trump would've won even if his platform had "let's add fluoride to the water to turn the frogs gay".

Pretty much every "Western" election voted out the incumbent due to either a poor economy or poor messaging about the economy. The swing PA voter didn't vote out Biden because their daughter was dominating trans kids in sports; they want grocery prices down.


In the context of western politics, "woke vs not woke" is just today's name for progressive (or even status quo) vs reactionary politics. What's being referred to in SK is somewhat more specific to incels and the 4B movement (which is more radical than how westerners have been using it).


I've read in multiple places that 4B is just an online thing, maybe a couple of thousand women, that has blown up in Western news media but is not a real-life thing in Korea.


You are correct. Western news media wants it to be bigger than it is right now; if you check the last month or so, the only people talking about the 4B movement are Western liberals. I wonder why Western media outlets have been obsessed with 4B since Trump came into power. Dubious motives.


Trump hasn't come into power yet. He doesn't take office until near the end of January.


Koreans don't do things by halves.


Is there anywhere good I could read about that "gender war"? I've heard about it before, but usually only narrativized as one-sided (e.g. discussing feminist policies)


It was the religious thing and not all the other scandals that caused the low trust? That's fascinating!


Corruption probe, a weird cult, and now apparently a coup d’etat in the making.


Blender has the famed Donut Tutorial, which is a blast if you're even slightly interested in learning about it.

Does Unreal have something similar?


Your First Hour in Unreal Engine 5.2: https://www.youtube.com/watch?v=peUO_55ck4o (link to course is in description)

