Because the article shows it isn't Gemini that is the issue, it is the tool calling. When Gemini can't get to a file (because it is blocked by .gitignore), it then uses cat to read the contents.
I've watched this with GPT-OSS as well. If the tool blocks something, it will try other ways until it gets it.
How can an LLM be at fault for something? It is a text prediction engine. WE are giving them access to tools.
Do we blame the saw for cutting off our finger?
Do we blame the gun for shooting ourselves in the foot?
Do we blame the tiger for attacking the magician?
The answer to all of those things is: no. We don't blame the thing doing what it is meant to be doing no matter what we put in front of it.
It was not meant to give access like this. That is the point.
If a gun randomly goes off and shoots someone without someone pulling the trigger, or a saw starts up when it’s not supposed to, or a car’s brakes fail because they were made wrong - companies do get sued all the time.
But the LLM can't execute code. It just predicts the next token.
The LLM is not doing anything. We are placing a program in front of it that interprets the output and executes it. It isn't the LLM, but the IDE/tool/etc.
So again, replace Gemini with any Tool-calling LLM, and they will all do the same.
When people say ‘agentic’ they mean piping that token to various degrees of directly into an execution engine. Which is what is going on here.
And people are selling that as a product.
If what you are describing was true, sure - but it isn’t. The tokens the LLM is outputting is doing things - just like the ML models driving Waymo’s are moving servos and controls, and doing things.
It’s a distinction without a difference if it’s called through an IDE or not - especially when the IDE is from the same company.
That causes effects which cause liability if those things cause damage.
Because it misses the point. The problem is not the model being in a cloud. The problem is that as soon as "untrusted inputs" (i.e. web content) touch your LLM context, you are vulnerable to data exfil. Running the model locally has nothing to do with avoiding this. Nor does "running code in a sandbox", as long as that sandbox can hit http / dns / whatever.
The main problem is that LLMs share both "control" and "data" channels, and you can't (so far) disambiguate between the two. There are mitigations, but nothing is 100% safe.
Sorry, I didn't elaborate. But "completely local" meant not doing any network calls unless specifically approved. When llm calls are completely local you just need to monitor a few explicit network calls to be sure.
The LLM cannot actually make the network call. It outputs text that another system interprets as a network call request, which then makes the request and sends that text back to the LLM, possibly with multiple iterations of feedback.
You would have to design the other system to require approval when it sees a request. But this of course still relies on the human to understand those requests. And will presumably become tedious and susceptible to consent fatigue.
This is the reason I use AWS wrappers (render.com/fly.io) for small projects.
It may be more expensive but you can't pop the free tier/selected machine.
Really? I am trying to put my customer/mum hat on and but is this really true.
How many OS would you need to really support? 4 or 5?
On top of that docs/knowledge would be more standardised. Google/peers/family would help people more.
Tell me you've never worked in tech support without telling me you've never worked in tech support.
My experience in a tech support center for a software company is that, for the kind of person who calls in to customer support, them having made a Google search first is not something that should ever be assumed. And usually whole offices were chronic customer support users or none of them were—peer support, when present, is already sufficient, and in the offices where everyone is clueless having a system-native color picker isn't going to fix it.
> On top of that docs/knowledge would be more standardised.
If everyone started using the native widgets at once, then maybe external docs would be more helpful, but until that happens your software-specific documentation becomes much harder.
How do you take screenshots of the color picking flow for your documentation? If you just pick a browser to screenshot then you will get calls from people using a different browser who are confused that it looks different for them. If you screenshot every supported browser then your documentation becomes much more expensive to create and maintain.
I once had a problem with my laptop, which was a problem with the drive. I pulled it out and duplicated the problem on a different laptop, so I needed to get a replacement. I kept mum and went through all the steps I was instructed to (reboot with this or that key held down, etc) until finally support said “well sorry, we’ll have send you a box for you to send back the laptop”. It would have been useless and annoying to the person on the other end of the phone to dry to skip all that. Like doctors, they must deal with a lot of people who studied at the university of Google and think they know it all.
I have a few times sent in bug reports on software I had previously worked on myself. Again, just file it like any other bug. Usually the bug just gets fixed (or not) but I did once get mail from a former colleague who said he was assigned my bug and how the hell was I? Sadly he also told me, “we aren’t going to fix it.” :-/
Of course most of the time I don't know any more than the next schlub. Otherwise I wouldn't have called.
I'm at a point where I just treat the front-line CS person like a fellow engineer and tell them exactly what's wrong, and why I know that.
I've actually had pretty good success with this strategy, though it really depends on the company. Framework laptops and system76 for example were both phenomenal with this approach. The first reply I got from them was either an engineer or someone who talked to one, or someone very experienced in CS who would be a good candidate for engineering.
Worst case if the CS person has no idea what I'm talking about, then we start from scratch but at least they know I'm not a dumbass they can BS :-)
Most of them have to walk through a decision tree in response to their computer and don’t know about the domain anyway, so I don’t want to waste their time.
In my experience this is, unfortunately, true. I see it from both sides and would prefer the native implementations myself, but I've never worked with a customer who agrees.
I start by saying, your customer who uses an iPhone is never going to use an Android, and vice versa, so there is no need to keep them consistent and identical looks and design wise. You should use the native items as much as possible because a random user is more likely to understand the common system version than they will understand your bespoke version. Use the native share icon, don't use the iOS share icon on Android, etc.
Also iOS tends to be way more consistent than android, windows, etc, so there could be a case for iOS native and 'company consistent' for everything else, especially if you're in the USA. iOS people pay more and it could be worth it to have two branches for customer support if it leads to total better conversions and thus more profits. Your business's core competency is not making UI toolkits, it's selling whatever your making. Leverage the literal billions of dollars apple and google invest into the core UX toolkits.
The parent was complaining about needing "either [...] a mobile phone [...] or some custom device", which is not true. And sure, Authy is a third-party; but it's not the only option, and you can implement your own (TOTP is not that complicated).
And TOTP has much better user experience than raw keys, especially for beginners who might mix the public/private parts, and experts who want hardware protection.
Actually my main concern is the reliance on 3rd parties - requiring a mobile phone is an implicit reliance on a lot of 3rd parties that IMO should not have any business where/how i authenticate myself.
I don't know about TOTP but if it can be completely independent from 3rd parties and can be used locally like private+public key signatures can then i guess it is fine.
I think react native changed the hybrid landscape and they jumped onboard. As a dev who works with RN a lot I find SwiftUI to be a pleasure. Just a lot of legacy to deal with
Pretty much all community types come from a single repository with a consistent naming scheme. For example, if you use “lodash”, types are available via “@types/lodash” from DefinitelyTyped (the DT repo is now maintained by the typescript team). This is even tracked on npm. VS code can just see if “@types/$package” exists and then prompt you to add it
I'd argue that if you're not packaging your software or testing your software's dependencies, either you're doing something extremely exotic that lies far outside anyone's happy path or "dylib error" should not even be a keyword in your vocabulary.
DLL Hell ceased to be a practical concern over a decade ago, particularly given that Windows provides tight control over its dynamic linking search order.