Hacker News | mklond's comments

Apologies for that. We had about 8 keys in rotation, but eventually ran out of phone numbers for creating new OpenAI accounts, and fresh accounts have super low rate limits for the first 2 days. We've since had a rate limit increase, so this should be less of an issue.
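For anyone curious, rotating a pool of keys like this can be as simple as round-robin selection. This is a minimal sketch, not the actual setup described above; the key names are placeholders:

```python
import itertools

# Hypothetical key pool; real keys would come from a secrets store,
# not hard-coded strings.
API_KEYS = ["sk-key-1", "sk-key-2", "sk-key-3"]

_key_cycle = itertools.cycle(API_KEYS)

def next_key() -> str:
    """Return the next key in round-robin order, spreading requests
    across accounts so no single key hits its rate limit first."""
    return next(_key_cycle)
```

A real setup would also skip keys that have recently returned a 429 response, rather than cycling blindly.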

Will release a new level soon as well :-)

PS: in case it wasn’t clear I’m on the Lakera team.


for activations you can just use https://smspva.com/


Gets trickier at the higher levels, but all of Gandalf's defenses are hand crafted at the moment. Can probably be made much more secure. Lots of interesting discussions happening here: https://news.ycombinator.com/item?id=35905876


Prompt injection beautifully explained by a fun game.

https://gandalf.lakera.ai

Goal of the game is to design prompts to make Gandalf reveal a secret password.


Discussed here:

Gandalf – Game to make an LLM reveal a secret password - https://news.ycombinator.com/item?id=35905876 - May 2023 (267 comments)


That's really cool. I got the first three pretty quickly but I'm struggling with level 4.


Level 4 is where it starts getting harder, since it evaluates both the input and the output.

see https://news.ycombinator.com/item?id=35905876 for creative solutions (spoiler alert!)
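The two-stage check described above can be sketched as a pair of filters: one on the prompt before it reaches the model, one on the answer before it reaches the user. This is only an illustration of the pattern; the password and banned phrases here are made up, not Gandalf's actual rules:

```python
# Illustrative secret and blocklist -- not Gandalf's real defenses.
SECRET = "COCOLOCO"
BANNED_INPUT = ["password", "secret", "ignore previous"]

def check_input(prompt: str) -> bool:
    """Reject prompts that obviously ask for the secret."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in BANNED_INPUT)

def check_output(answer: str) -> bool:
    """Reject answers that leak the secret, even if it is
    spelled out with spaces or punctuation in between."""
    collapsed = "".join(ch for ch in answer.upper() if ch.isalpha())
    return SECRET not in collapsed

def guard(prompt: str, model_answer: str) -> str:
    """Run both checks around a (hypothetical) model call."""
    if not check_input(prompt):
        return "I can't talk about that."
    if not check_output(model_answer):
        return "I was about to reveal the password, but I caught myself."
    return model_answer
```

Simple string filters like these are exactly what creative prompts in the linked thread get around, e.g. by asking for the password backwards or in a riddle.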


Live stats from #Gandalf:

Ratio of successful prompts:

Level 1: 54%
Level 2: 22%
Level 3: 9%
Level 4: 2%
Level 5: 13%
Level 6: 21%
Level 7: 1.5%

-> Don’t give up at level 4: if you crack that, you have a good shot at making it to Level 7. But will you be one of the lucky few to beat Gandalf Level 7?


Wait.. Level 1 is only 54%??


Transformers seem great. IMO the big challenge is still to better understand how they perform in operation.


Thanks for your comment. The whole field of run-time monitoring is concerned with this problem. It's a tough one to crack when the distribution changes are subtle, but you can and should at least check simple data attributes for consistency.
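"Check simple data attributes for consistency" can mean something as basic as comparing a live batch against summary statistics logged at training time. A minimal sketch, assuming you stored those statistics ahead of time (the names and thresholds here are illustrative):

```python
import statistics

# Hypothetical statistics logged when the model was trained.
TRAIN_STATS = {"mean": 0.0, "stdev": 1.0, "min": -4.0, "max": 4.0}

def check_batch(values: list[float], tolerance: float = 3.0) -> list[str]:
    """Compare simple attributes of a live batch against training-time
    statistics; return a list of warnings (empty if all consistent)."""
    warnings = []
    if min(values) < TRAIN_STATS["min"] or max(values) > TRAIN_STATS["max"]:
        warnings.append("values outside training range")
    batch_mean = statistics.fmean(values)
    if abs(batch_mean - TRAIN_STATS["mean"]) > tolerance * TRAIN_STATS["stdev"]:
        warnings.append("batch mean drifted from training mean")
    return warnings
```

Checks like these won't catch subtle distribution shift, which is the hard part mentioned above, but they are cheap and catch the obvious breakage first.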


Would love to hear what testing techniques other people use for machine learning. Are there any great testing frameworks out there that people use?


Thanks for this resource. I haven't read it, but will definitely have a look at what the EU has come up with.


"High-risk AI systems should bear the CE marking to indicate their conformity with this Regulation so that they can move freely within the Union"

That would be something!


That every programmer today can build and train an ML model is one of the biggest advancements of ML engineering in the past 10 years.

But as you say it's GIGO: the difficulty today is to know what to feed it and to know what that means for the real life performance. There are no great tools for that yet.


> the difficulty today is to know what to feed it and to know what that means for the real life performance.

This has always been the difficulty.

Generalization is the fundamental problem in machine learning. Making tools easily available has led to exponential growth in applications as more people play with them (many without understanding what they are doing or why), but predictably hasn't led to exponential growth in successful applications.

